## A Beginner’s Guide to Learning Data Science Fundamentals

I’m a data scientist by profession with an actuarial background.

I graduated with a degree in Criminology, and it was during university that I fell in love with the power of statistics. A typical problem would involve estimating the likelihood of a house on a street being burgled, given that there has already been a burglary on that street. For the layman, this is part of the predictive policing techniques used to tackle crime. More technically, it involves a non-Markovian counting process called the “Hawkes process”, which models “self-exciting” events (such as crimes, future stock price movements, or even the popularity of political leaders).
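For readers curious what “self-exciting” means in practice, here is a minimal Python sketch (our own illustration, not the author's model) of the conditional intensity of a Hawkes process with an exponential kernel, λ(t) = μ + Σ α·e^(−β(t − tᵢ)), summed over past events tᵢ < t. Each past burglary temporarily raises the rate of further burglaries, and that boost fades over time. The parameter values and event times are hypothetical.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity of a Hawkes process with an exponential kernel.

    mu:    baseline rate of events (e.g. burglaries per day on one street)
    alpha: jump in the rate caused by each past event
    beta:  how quickly that extra risk decays over time
    Only events strictly before time t contribute.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )
    return mu + excitation


# Hypothetical example: two burglaries on days 1.0 and 3.0.
burglaries = [1.0, 3.0]
for day in [0.5, 3.5, 10.0]:
    rate = hawkes_intensity(day, burglaries, mu=0.1, alpha=0.5, beta=1.0)
    print(f"day {day}: intensity {rate:.3f}")
```

Shortly after a burglary the intensity is well above the baseline `mu`; long after the last event it drifts back towards `mu`.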

Being able to predict the likelihood of future events (like crimes in this case) was the main thing that drew me to Statistics. On a philosophical level, it’s really a quest for the “truth of things”, unfettered by the inherent cognitive biases humans are born with (there are 25 I know of).

Arguably, Actuaries are the original Data Scientists, turning data into actionable insights since the 18th century, when Alexander Webster and Robert Wallace used death records to build a predictive model of the life expectancy of Church of Scotland ministers. The result was a fund that provided cover to the widows and children of deceased ministers, a forerunner of modern life insurance.

Of course, Alan Turing’s contribution cannot be ignored: his work eventually gave us the computational power needed to carry out statistical testing on entire populations, and with it, Machine Learning was born. To be fair, the history of Data Science deserves an entire blog of its own; more on that later.

The aim of this series of blogs is to help anyone daunted by the task of learning the very basics of the Statistics and Mathematics used in Machine Learning. There are tonnes of online resources that simply list the topics but rarely explain why you need to learn them and to what extent. This series will attempt to address that problem by adopting a “first principles” approach. It’s best to come back to this article a second time after gaining the very basics of each topic discussed below:

#### We will be discussing:

• Central Limit Theorem
• Bayes’ Theorem
• Probability Theory
• Point Estimation – MLEs
• Confidence Intervals
• P-values and Significance Tests

This list is by no means exhaustive of the statistical and mathematical concepts you will need in your career as a data scientist. Nevertheless, it provides a solid grounding going into more advanced topics.

#### Central Limit Theorem

Central Limit Theorem (CLT) is perhaps one of the most important results in all of Statistics. Essentially, it allows us to make large-sample inferences about the population mean (μ), as well as about a population proportion (p).

#### So what does this really mean?

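Consider a sample of n observations (X1, X2, X3, …, Xn), drawn independently from the same population with mean μ and finite variance σ², where n is a large number, say 100. The sample has its own mean, x̅. If we repeated the exercise many times, each sample would have its own sample mean, giving us a whole collection of sample means. The Central Limit Theorem now states that, for large n:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \;\approx\; N\!\left(\mu,\ \frac{\sigma^2}{n}\right) \qquad \text{and} \qquad \sum_{i=1}^{n} X_i \;\approx\; N\!\left(n\mu,\ n\sigma^2\right)$$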

Try to visualise the distribution of lots of averages… Essentially, if we take a large number of samples and compute the average of each, the Central Limit Theorem tells us the distribution of those averages. The beauty of it is that we don’t have to know the parent distribution the observations came from (as long as it has a finite variance). The averages all tend to a Normal… eventually!

Similarly, if we add up independent and identically distributed (iid) random variables, the distribution of their sum will also tend to a Normal.
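A quick simulation makes both statements concrete. The minimal sketch below (our own illustration, using only Python's standard library) draws repeated samples of size 100 from an Exponential(1) distribution, which is strongly skewed and nothing like a Normal. It then checks that the sample means centre on μ = 1 with a spread close to σ/√n = 0.1, and that the sums centre on nμ = 100 with a spread close to σ√n = 10. The parent distribution, sample size and number of repetitions are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_sample_means(n, num_samples, rate=1.0, seed=42):
    """Draw num_samples samples of size n from an Exponential(rate)
    distribution (clearly skewed) and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(num_samples)
    ]


n = 100
means = simulate_sample_means(n=n, num_samples=5000)

# Exponential(1) has mu = 1 and sigma = 1, so the CLT predicts the sample
# means are roughly N(1, 1/n): standard deviation 1/sqrt(100) = 0.1.
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))

# The sums of the same samples should be roughly N(n*mu, n*sigma^2):
# mean 100, standard deviation 10.
sums = [m * n for m in means]
print("mean of sums:", round(statistics.fmean(sums), 1))
print("sd of sums:  ", round(statistics.stdev(sums), 1))

# A crude normality check: about 68% of means should lie within one sd of mu.
within_one_sd = sum(abs(m - 1.0) <= 0.1 for m in means) / len(means)
print("share within one sd:", round(within_one_sd, 3))
```

Try swapping `expovariate` for another skewed distribution: the averages still pile up into the familiar bell shape.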

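Very often in your work as a data scientist, quantities built from many observations (averages, totals, proportions) will tend to a Normal distribution even when the underlying distribution is unknown. Now you can visualise how and, more importantly, why!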

Stay tuned to DexLab Analytics for more articles discussing the topics listed above in depth.

About the Author: Nish Lau Bakshi is a professional data scientist with an actuarial background and a passion for using the power of statistics to tackle pressing, everyday problems.

## The Impact of Big Data on the Legal Industry

The importance of big data is soaring. Each day, the profound impact of data analytics can be felt across myriad domains of digital services, courtesy of the endless stream of information they generate. Yet few people stop to consider how big data is influencing some of society’s most important professions, including law. In this blog, we dig into how big data is impacting the legal profession and transforming the dreary judicial landscape across the globe.

#### Importance of Big Data

Information is challenging our legal frameworks. Though technology has transformed our lives completely, many of the country’s most powerful people and institutions are still unsure how to harness big data technology and reap its benefits. Those in power remain baffled about the role of data. The information age moves at a frantic pace, and recent court cases highlight that the Supreme Court is having a tough time taming big data.

However, on a positive note, they have identified the reason for the slowdown and are joining the bandwagon, upgrading their digital skills and adopting tech modernization strategies. Data analytics is a growing area of relevance, and the nation’s biggest legal authorities and departments must leverage it. From tracking employee behavior to scanning through case histories, big data is being employed everywhere. In fact, criminal defense lawyers say big data is changing their courtroom approaches, which traditionally revolved around a fixed set of evidence. Today, more and more of that evidence is digital.

#### Boon for Law Enforcement Officials

The technology of big data has proved to be a welcome change for the army of law enforcement officials, because it lets them prosecute a large number of criminals efficiently and quickly. Officials can now scan through piles of data at super-fast speed and pick out scam artists, hackers and delinquents. Police officers are also using it to identify threats and round up criminals before they can get away.

Moreover, prosecutors are leveraging droves of data to gather evidence that supports their legal arguments in court, and that is helping them win cases. For example, federal prosecutors recently served a warrant on Microsoft to gain access to its data, which was essential to their case.

#### Big Data Transforming Legal Research

Most significantly, big data is transforming the legal profession by changing the way scholars research and analyze court proceedings. For example, big data has been used to study Supreme Court arguments, and such studies suggest that the arguments are becoming increasingly peculiar in their own ways.

Such research methods are likely to lead the show as big data technology becomes cheaper and more widely adopted. In the near future, big data is going to be applied across a plethora of industry verticals, and we are excited to witness the results.

As a matter of fact, you don’t have to wait long to see how big data changes the legal landscape. In this flourishing age of round-the-clock information exchange, the change will take no time.

The blog has been sourced from —  e27.co/how-big-data-is-impacting-the-legal-world-20190408

## Big Data Analytics for Event Processing

Courtesy of the cloud and the Internet of Things, big data is gaining prominence and recognition worldwide. Large volumes of data are being stored on robust platforms such as Hadoop. As a result, much-hyped data frameworks are being combined with ML-powered technologies to discover interesting patterns in those datasets.

#### Defining Event Processing

In simple terms, event processing is the practice of tracking and analyzing a steady stream of data about events in order to derive relevant insights about what is happening in the real world, in real time. However, the process is not as easy as it sounds: turning insights and patterns into meaningful actions quickly, while handling operational market data in real time, is no mean feat. This is known as the ‘fast data’ approach, and it works by embedding patterns that emerged from previous data analysis into the processing of new transactions as they happen.
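The ‘fast data’ idea can be sketched in plain Python: learn a simple pattern offline from historical data, then apply it to each new event as it arrives, keeping only a small sliding window of recent state. The threshold rule (mean plus three standard deviations), the window size, the alert rule and the sample numbers are all hypothetical choices; production systems would run this kind of logic inside a stream processor such as Apache Storm or Spark.

```python
from collections import deque
from statistics import fmean, pstdev


def learn_threshold(historical_amounts, k=3.0):
    """Offline step: derive a simple pattern (mean + k standard deviations)
    from previously analysed data."""
    return fmean(historical_amounts) + k * pstdev(historical_amounts)


def process_stream(events, threshold, window_size=5, burst_limit=2):
    """Online step: score each event as it arrives, keep a sliding window of
    recent flags, and record an alert whenever too many recent events look
    unusual."""
    recent_flags = deque(maxlen=window_size)
    alerts = []
    for i, amount in enumerate(events):
        recent_flags.append(amount > threshold)
        if sum(recent_flags) >= burst_limit:
            alerts.append(i)
    return alerts


# Hypothetical transaction amounts.
history = [10, 12, 11, 9, 13, 10, 11, 12]
stream = [11, 12, 30, 10, 45, 11, 12, 10, 11, 9]
threshold = learn_threshold(history)
alerts = process_stream(stream, threshold)
print(f"threshold {threshold:.2f}, alerts at positions {alerts}")
```

Only the small window of flags is kept in memory, which is what lets this style of processing keep up with an unbounded stream.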

#### Employing Analytics and ML Models

In some instances, it is crucial to analyze data that is still in motion. For that, predictions must be proactive and made in real time. Random forests, logistic regression, k-means clustering and linear regression are some of the most common machine learning techniques used for such needs. Below, we’ve listed the analytical purposes for which organizations are leveraging the power of predictive analytics:

Developing the Model – Companies ask data scientists to build a comprehensive predictive model; in the process, the data scientists may try different types of ML algorithms and approaches to fulfill the purpose.

Validating the Model – It is important to validate a model to check that it works as intended. At times, handling new data inputs can give data scientists a tough time. Once validated, the model must still meet improvement standards before it is deployed for real-time event processing.
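The develop-then-validate cycle described above can be illustrated with hold-out validation: fit a model on part of the data, then score it on data it has never seen. The sketch below uses a one-variable least-squares line and synthetic data as stand-ins for a real predictive model; the 80/20 split, the mean-absolute-error metric and the “always predict the training mean” baseline are our own assumptions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mean_abs_error(ys, preds):
    return sum(abs(y - p) for y, p in zip(ys, preds)) / len(ys)


def holdout_validate(xs, ys, train_fraction=0.8, seed=0):
    """Shuffle, split into training and validation sets, fit on the training
    set only, and report validation error next to a naive baseline that
    always predicts the training mean."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    cut = int(train_fraction * len(idx))
    train, valid = idx[:cut], idx[cut:]
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    valid_y = [ys[i] for i in valid]
    model_mae = mean_abs_error(valid_y, [a + b * xs[i] for i in valid])
    train_mean = sum(ys[i] for i in train) / len(train)
    baseline_mae = mean_abs_error(valid_y, [train_mean] * len(valid))
    return (a, b), model_mae, baseline_mae


# Hypothetical data: y = 2 + 3x plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
(a, b), model_mae, baseline_mae = holdout_validate(xs, ys)
print(f"fitted: y = {a:.2f} + {b:.2f}x")
print(f"validation MAE: model {model_mae:.2f} vs baseline {baseline_mae:.2f}")
```

A model that cannot beat the naive baseline on unseen data is not ready for deployment, however good it looks on the training set.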

#### Apache Spark

Ideal for both batch and streaming data, Apache Spark is an open-source parallel processing framework. It is simple, easy to use and well suited to machine learning, since it is a cluster-computing framework that distributes work across many machines.

#### Hadoop

If you are looking for an open-source batch processing framework, Hadoop is one of the best you can get. It supports distributed processing of large-scale data sets across clusters of computers using a single programming model, and it also boasts an incredibly versatile library.
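Hadoop’s core programming model is MapReduce: a map step emits key-value pairs, the framework groups them by key (the shuffle), and a reduce step combines each group. The single-process Python sketch below only mimics those three phases on an in-memory list to show the idea; it is not Hadoop code and uses no Hadoop API. On a real cluster, the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does
    between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key (here, by summing counts)."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


documents = ["big data needs big tools", "Hadoop processes big data"]
print(word_count(documents))
```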

#### Apache Storm

Apache Storm is a cutting-edge, open-source big data processing framework that supports distributed, real-time stream processing. It makes it fairly easy to reliably process unbounded streams of data in real time.

#### IBM InfoSphere Streams

IBM InfoSphere Streams is a highly functional platform for developing and executing applications that process information in data streams. It also speeds up data analysis, improving the overall speed of business decision-making and insight generation.

If you are interested in reading more such blogs, you must follow us at DexLab Analytics.

## Cryptojacking: How Businesses Can Protect Systems from This Latest Cyber Threat

The rising threat of cyber attacks and the growing sophistication of these crimes have created a frightening security situation all over the world. Cases of data and privacy breaches are increasing every day. Both the private and public sectors are at risk, and understandably, the average internet user is very paranoid. Cybercriminals keep inventing new ways to take advantage of security vulnerabilities in systems.

Cryptojacking is one such cyber threat, and it has targeted countless unsuspecting users all around the world. In India in particular, cryptojacking has become a pressing problem. According to a recent study by Quick Heal Technologies, nearly 3 million cryptojacking cases were reported between January and May 2018.

#### What is Cryptojacking?

Cryptojacking is a method of hacking into systems and illegally using them to mine cryptocurrency. Malicious scripts are loaded into machines without the knowledge of owners. The group or individual that loads the malicious program reaps the rewards of cryptomining activities, while the owner of the machine isn’t provided any kind of compensation.

Cryptojacking attacks are typically carried out in one of two ways:

In the first process, a cryptomining code is installed in the compromised system by means of an infected file.

In the second method, a website or online ad is infected with a JavaScript-based cryptomining script. When users visit the page or load the ad, the script executes automatically in their browser.

#### Why is Cryptojacking So Challenging for Businesses?

The malicious script diverts processing power from the compromised machine to unauthorized cryptocurrency mining. This affects the computer in the following ways:

• Slows down the system
• Causes the machine to lag
• Some applications become completely inaccessible
• Resource-intensive operations related to cryptomining damage the hardware of an infected system and at times even cause it to crash repeatedly.

Cryptojacking is a serious business hazard. These disruptions result in downtime and IT tickets, which cost the business a lot of money; globally, businesses lose billions of dollars to IT downtime. Infected systems also consume large amounts of electricity, so operational costs increase significantly. Bottom line: cryptojacking eats into business revenue, a loss that can be avoided with the right precautions.

#### How to Protect from Cryptojacking Attacks?

The modern cyber-attack landscape evolves every minute. In the face of such dynamism, it is absolutely essential to adopt a multi-layered approach for preserving IT security. The need of the hour is to invest in advanced security solutions. These solutions must include the following features:

Endpoint Security: To protect endpoints from cryptojacking, a robust endpoint security solution with cutting-edge features such as behavior-based detection and antivirus is necessary.

Web Filtering: Web filtering includes a set of tools that can be customized to safeguard your business network from suspicious websites. Untrusted websites are blocked, and users are prevented from accessing them.

Network Monitoring: This is a tool that is able to detect huge surges in processor activity, which is a well-known symptom of a cryptojacked device. It helps network administrators keep a close eye on data anomalies.

Mobile Device Management (MDM): Business users depend on mobile phones for conveniently carrying out activities. Hence, deploying a robust MDM solution is important for preventing this type of hijacking.
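The network-monitoring idea above (spotting surges in processor activity) can be sketched in a few lines. This is a toy illustration under our own assumptions, not how any commercial monitoring product works: it learns a baseline from a period assumed to be clean, flags readings well above it, and only treats the pattern as suspicious when the surge is sustained, since mining keeps the processor busy for long stretches. The margin, run length and readings are hypothetical.

```python
from statistics import fmean, pstdev


def surge_threshold(baseline_samples, k=3.0, min_margin=5.0):
    """Learn 'normal' CPU load (in %) from a period assumed to be clean.
    A reading counts as a surge if it exceeds the baseline mean by more than
    k standard deviations, or by min_margin points if that is larger."""
    margin = max(k * pstdev(baseline_samples), min_margin)
    return fmean(baseline_samples) + margin


def flag_surges(samples, threshold):
    """Return the indices of readings above the threshold."""
    return [i for i, value in enumerate(samples) if value > threshold]


def is_sustained(flagged, min_run=3):
    """True if at least min_run consecutive readings were flagged.
    Mining keeps the CPU busy for long stretches, whereas a brief spike
    (e.g. opening a large file) usually does not."""
    best = run = 0
    previous = None
    for i in flagged:
        run = run + 1 if previous is not None and i == previous + 1 else 1
        best = max(best, run)
        previous = i
    return best >= min_run


# Hypothetical per-minute CPU readings: a quiet baseline period, then live data
# containing one brief spike and a sustained jump to around 95%.
baseline = [18, 22, 20, 19, 21, 23, 20, 18, 22, 21]
live = [20, 60, 21, 19, 95, 97, 96, 98, 96]
threshold = surge_threshold(baseline)
flagged = flag_surges(live, threshold)
print(f"threshold {threshold:.1f}%, flagged minutes {flagged}, sustained: {is_sustained(flagged)}")
```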

Apart from these, businesses must ensure basic security hygiene, such as installing a web security solution to keep visitors to their website safe and promptly applying the latest security patches. Security vendors are also turning to AI: SecBI, for example, has developed an artificial intelligence solution that analyzes network data and identifies cryptojacking threats.

For more blogs on the latest technical innovations, follow the premier big data Hadoop training institute— DexLab Analytics.

## How Can Big Data Tools Complement a Data Warehouse?

Most people believe that they are above average. Businesses feel the same way about their best asset: data. They want to believe that their big data is above average and perfect for implementing advanced big data tools. But that’s not always the case.

#### Do you really need big data tools?

In the data world, big data tools like Hadoop, Spark and NoSQL are like freight trains delivering goods. Freight trains are powerful, but they have limited routes and a slow start. They are great for delivering goods in bulk on a regular basis. However, if you need a swift delivery, a freight train might not be the best choice.

So first of all, it is important to understand whether there’s a big data scenario in your business or not.

A 100-fold increase in data velocity, volume or variety indicates that you have a big data situation at hand. For example, if data velocity increases from thousands of transactions per hour to hundreds of thousands, or if the number of data sources shoots up from dozens to hundreds, you can safely conclude that your business is dealing with big data.
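The 100-fold rule of thumb above is simple arithmetic, sketched below. The figures are hypothetical, and the rule is a heuristic from this article rather than a formal definition of big data.

```python
def dimensions_over_threshold(changes, factor=100):
    """changes maps a data dimension to a (before, after) pair measured in the
    same units; return the dimensions that grew at least `factor`-fold."""
    return [
        name for name, (before, after) in changes.items()
        if after >= factor * before
    ]


# Hypothetical figures for one business.
changes = {
    "velocity": (5_000, 500_000),  # transactions per hour
    "volume": (2, 40),             # terabytes stored
}
flagged = dimensions_over_threshold(changes)
print("big data situation" if flagged else "traditional tools may suffice", flagged)
```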

In such scenarios, you are likely to get frustrated with traditional SQL tools. Handling such massive data sets effectively calls for either a complete revamp or, at the very least, moderate tuning of your existing tools.

#### What tools to use?

The tool to use depends on the task at hand. For primary business outcomes like sales and payments, traditional reporting tools within the data warehouse architecture are suitable. For secondary business outcomes, like following the customer journey in detail, tracking browsing history and monitoring device activity, big data tools within the data warehouse architecture are necessary. In the data warehouse, these events are then aggregated into models that summarize the business processes.

#### Incorporating Big Data Tools in Data Warehouse

Consider an alarm company with sensors connected through the internet across an entire country. Storing the response of every individual sensor in a SQL data warehouse would be very expensive while adding little value. An alternative is to retain this information in a cheaper data lake environment and later aggregate it into the data warehouse. For example, the company could define which sensor events constitute a person locking up a house. A fact table recording departures and arrivals could then be stored in the data warehouse as an aggregate event.
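A minimal sketch of the alarm-company idea, with hypothetical data and rules: raw readings (as they might sit in a data lake) are mapped to business events by a rule table and rolled up into a small departures/arrivals fact table ready to load into the warehouse. The field names, sensor states and rules are invented for illustration; at scale, this step would run in a big data engine rather than plain Python.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw sensor readings as they might sit in a data lake:
# (house_id, ISO timestamp, sensor, reading)
raw_events = [
    ("H1", "2019-04-01T08:02:11", "front_door", "locked_from_outside"),
    ("H1", "2019-04-01T08:02:15", "motion_hall", "no_motion"),
    ("H1", "2019-04-01T18:45:03", "front_door", "unlocked_from_outside"),
    ("H2", "2019-04-01T07:30:00", "front_door", "locked_from_outside"),
    ("H2", "2019-04-01T12:10:42", "front_door", "unlocked_from_outside"),
    ("H2", "2019-04-01T13:05:09", "front_door", "locked_from_outside"),
]

# Business rule (an assumption): which raw readings count as departures or arrivals.
EVENT_RULES = {
    ("front_door", "locked_from_outside"): "departure",
    ("front_door", "unlocked_from_outside"): "arrival",
}


def build_fact_rows(events):
    """Aggregate raw readings into (house_id, date, event_type, count) fact rows.
    Readings that match no rule (e.g. motion sensors) are ignored."""
    counts = Counter()
    for house, ts, sensor, reading in events:
        event_type = EVENT_RULES.get((sensor, reading))
        if event_type is not None:
            day = datetime.fromisoformat(ts).date().isoformat()
            counts[(house, day, event_type)] += 1
    return sorted((h, d, e, n) for (h, d, e), n in counts.items())


for row in build_fact_rows(raw_events):
    print(row)
```

Six raw readings shrink to four fact rows here; with thousands of sensors reporting every few seconds, the reduction is what makes warehouse storage affordable.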

There are many other use cases. Some are given below:

Sum up and filter IoT data: A leading bed manufacturer uses biometric sensors in its range of luxury mattresses. Apache Hadoop could be used to store the individual sensor readings, and Apache Spark could be employed to aggregate and filter the signals. The aggregated data in the data warehouse can then be used to create time-trended reports whenever boundary metrics are exceeded.

Merge real-time data with past data: Financial institutions need live access to market data. However, they also need to store that data and use it later to identify historical trends. Tools like Apache Kafka or Amazon Kinesis are important for merging these two types of data, because with them the data can be streamed directly to visualization tools with hardly any delay.
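The “sum up and filter” use case above can be illustrated in miniature: average the raw readings per sensor and per hour, then keep only the aggregates that cross a boundary metric. In the scenario described, Spark would do this at scale; the plain-Python sketch below only shows the logic. The mattress IDs, hourly grain, heart-rate readings and boundary of 80 are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_readings(readings):
    """Sum up: average the raw readings per (sensor_id, hour)."""
    buckets = defaultdict(list)
    for sensor_id, hour, value in readings:
        buckets[(sensor_id, hour)].append(value)
    return {key: fmean(values) for key, values in buckets.items()}


def filter_exceeding(aggregates, boundary):
    """Filter: keep only the aggregates that cross the boundary metric."""
    return {key: avg for key, avg in aggregates.items() if avg > boundary}


# Hypothetical heart-rate readings (beats per minute) from two mattresses.
readings = [
    ("M1", 1, 58), ("M1", 1, 62), ("M1", 2, 60),
    ("M2", 1, 88), ("M2", 1, 96), ("M2", 2, 64),
]
hourly = aggregate_readings(readings)
print(filter_exceeding(hourly, boundary=80))
```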

The ultimate goal is to form a balance between the two sides of the data pipeline. While it is important to collect as much raw data about customers as possible, it is equally important to use the right tool for the right job.

## The 8 Leading Big Data Analytics Influencers for 2018

Big data is one of the most talked about technology topics of the last few years. As big data and analytics keep evolving, it is important for people associated with it to keep themselves updated about the latest developments in this field. However, many find it difficult to be up to date with the latest news and publications.

If you are a big data enthusiast looking for ways to get your hands on the latest data news, then this blog is the ideal read for you. In this article, we list the top 8 big data influencers of 2018. Following these people and their blogs and websites shall keep you informed about all the trending things in big data.

#### Kirk Borne

A well-known name in the field of analytics, Kirk Borne has grown steadily more popular over the last couple of years: from 2016 to 2017, his following grew by 30 thousand. He is currently the principal data scientist at Booz Allen; before that, he worked with NASA for a decade. Kirk was also appointed by the US president to share his knowledge of data mining and of how to protect oneself from cyber attacks. He has participated in several TED talks, so interested readers should listen to those talks and follow him on Twitter.

#### Ronald Van Loon

He is an expert not only on big data but also on Business Intelligence and the Internet of Things, and he writes articles on these topics to help readers become familiar with these technologies. Ronald writes for important organizations like Dataconomy and DataFloq and has over a hundred thousand followers on Twitter. Currently, he works as a big data educator at Simplilearn.

#### Hilary Mason

She is a big data professional who juggles multiple roles. Hilary is a data scientist at Accel, Vice President at Cloudera, and a speaker and writer in this field. Back in 2014, she founded a machine learning research company called Fast Forward Labs. Clearly, she is a big data analytics influencer that everyone should follow.

#### Carla Gentry

Currently working at Samtec Inc., she has helped many big companies draw insights from complicated data and increase profits. Carla is a mathematician, an economist, the owner of Analytic Solution, a social media enthusiast, and a must-follow expert in this field.

#### Vincent Granville

Vincent Granville’s thorough understanding of topics like machine learning, BI, data mining, predictive modeling and fraud detection makes him one of the best influencers of 2018. Vincent also co-founded Data Science Central, the popular online platform for gaining knowledge on big data analytics.

#### Merv Adrian

Presently Research Vice President at Gartner, he has over 30 years of experience in the IT sector. His current work focuses on upcoming Hadoop technologies, data management and data security problems. By following Merv’s blogs and Twitter posts, you will stay informed about important industry issues that are sometimes not covered in his Gartner research publications.

#### Bernard Marr

Bernard has earned a good reputation in the big data and analytics world. He publishes articles on platforms like LinkedIn, Forbes and Huffington Post on a daily basis. Besides being a major speaker and strategic advisor for top companies and the government, he is also a successful business author.

#### Craig Brown

With over twenty years of experience in this field, he is a renowned technology consultant and subject matter expert. Craig is also the author of Untapped Potential, a book that explains the path of self-discovery.

If you have read the entire article, then one thing is very clear: you are a big data enthusiast! So why not build your career in the big data analytics industry?

## Study: Demand for Data Scientists is Sky-Rocketing; India Leads the Show

Last year, demand for data scientists in India surged by more than 400%, as medium to large-scale companies increasingly put their faith in data science capabilities to build next-generation products that are well integrated, highly personalized and extremely dynamic.

#### Companies in the Limelight

At the same time, India accounted for almost 10% of open data scientist job openings worldwide, making it the next data science hub after the US. This striking revelation comes at a time when job creation in the Indian IT sector has slowed, so the boom in data science jobs provides a silver lining. According to the report, Microsoft, JPMorgan, Deloitte, Accenture, EY, Flipkart, Adobe, AIG, Wipro and Vodafone are some of the top companies that hired the highest number of data scientists this year. Besides data scientists, they also advertised openings for analytics managers, analytics consultants and data analysts, among others.

#### City Stats

Moving from companies to cities, Bengaluru leads the show with the highest number of data analytics and data science jobs, accounting for almost 27% of the total, up from 25% last year. It is followed by Delhi NCR and Mumbai. Owing to a rise in the number of start-ups, 14% of job openings were also posted from Tier-II cities.

#### Notable Sectors

A large chunk of data science jobs originated in the banking and financial sector, which generated 41% of them. Other industries that followed suit are Energy & Utilities and Pharmaceutical & Healthcare, both of which saw a significant increase in job creation over the last year.

Get hands on training on data science from DexLab Analytics, the promising big data hadoop institute in Delhi.

#### Talent Supply Index (TSI) – Insights

Another study, the Talent Supply Index (TSI) by Belong, suggested that the rise in demand is a result of data science being employed in one area or another across industries with a burgeoning online presence, as seen in targeted advertising, product recommendations and demand forecasting. Interestingly, businesses sit on a massive pile of information collected over the years from partners, customers and internal sources. Analyzing such massive volumes of data is the key.

Shedding further light on the matter, Rishabh Kaul, Co-Founder, Belong shared, “If the TSI 2017 data proved that we are in a candidate-driven market, the 2018 numbers should be a wakeup call for talent acquisition to adopt data-driven and a candidate-first approach to attract the best talent. If digital transformation is forcing businesses to adapt and innovate, it’s imperative for talent acquisition to reinvent itself too.”

Significantly, recruiters are paying more attention to skill-based hiring than to technology- and tool-based training. Python is the most in-demand skill, mentioned in 39% of all posted data science and analytics jobs, followed by R at 25%.

#### Last Notes

The analytics job landscape in India is changing drastically. Companies are constantly seeking worthy candidates who are well-versed in particular fields of study, such as data science, big data, artificial intelligence, predictive analytics and machine learning.

For more information – go to their official website now.

## Predicting World Cup Winner 2018 with Big Data

Is there any way to predict who will win World Cup 2018?

Could big data be used to decipher the internal mechanisms of this beautiful game?

How to collect meaningful insights about a team before supporting one?

#### Data Points

Opta Sports and STATS help predict which teams will perform better; these two sports data companies have answers to all of the above questions. Their objective is to collect data and interpret it for their clients, mainly sports teams, federations and, of course, the media, which are always hungry for data insights.

How do they do it? Opta’s marketing manager, Peter Deeley, shares that for each football match his company collects as many as 2,000 individual data points, mostly focused on ‘on-ball’ actions. Generally, a team of three analysts operates from the company’s data hub in Leeds; they record everything happening on the pitch and the positions on the field where each interaction takes place. Clients receive the data live, which is why Gary Lineker, the former England player, is able to share information like possession and shots on goal at half time.

A similar procedure is followed at Stats.com. Paul Power, a data scientist at Stats.com, explains that they don’t rely only on humans for data collection but also on the latest computer vision technologies. Though computer vision can be used to log many sorts of data, it can never replace human beings altogether. “People are still best because of nuances that computers are not going to be able to understand,” adds Paul.

#### Who is going to win?

In this section, we tackle the most important question of this season: which team is going to win this time? STATS is not too eager to publish its predictions this year. It believes they are a very valuable piece of information, and it doesn’t want to upset its clients by spilling the beans.

On the other hand, we do have a prediction from Opta. According to Opta, five-time World Cup champions Brazil have the highest chance of taking home the trophy, at 14.2%. Opta also has a soft spot for Germany, giving them an 11.4% chance of bringing back the cup once again.

If it’s about prediction and accuracy, we can’t help but mention EA Sports, which correctly predicted the eventual winner of each of the last three World Cups. Using comprehensive data about the players and team ratings in FIFA 2018, the company ran a simulation of the tournament in which France came out as the winner, defeating Germany in the final. Since it correctly predicted Germany in 2014 and Spain in 2010, this new prediction is a good catch.
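EA Sports has not published how its simulation works, so the sketch below only illustrates the general Monte Carlo idea behind simulating a tournament: give each team a strength rating, turn rating differences into match-win probabilities, replay the bracket thousands of times and count how often each team lifts the trophy. The Elo-style win formula, the four-team bracket and the ratings are all our own assumptions, not real FIFA or Opta numbers.

```python
import random
from collections import Counter


def win_probability(rating_a, rating_b):
    """Elo-style probability that team A beats team B (no draws in a knockout)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def play_knockout(teams, ratings, rng):
    """Play one single-elimination bracket; teams are paired in list order,
    so the number of teams must be a power of two."""
    remaining = list(teams)
    while len(remaining) > 1:
        next_round = []
        for a, b in zip(remaining[0::2], remaining[1::2]):
            winner = a if rng.random() < win_probability(ratings[a], ratings[b]) else b
            next_round.append(winner)
        remaining = next_round
    return remaining[0]


def simulate(teams, ratings, runs=20_000, seed=7):
    """Monte Carlo: replay the bracket many times and return each team's win share."""
    rng = random.Random(seed)
    wins = Counter(play_knockout(teams, ratings, rng) for _ in range(runs))
    return {team: wins[team] / runs for team in teams}


# Hypothetical ratings for a four-team bracket.
ratings = {"Brazil": 2100, "Germany": 2050, "France": 2080, "Spain": 2000}
bracket = ["Brazil", "Spain", "France", "Germany"]
for team, share in sorted(simulate(bracket, ratings).items(), key=lambda kv: -kv[1]):
    print(f"{team}: {share:.1%}")
```

Even the strongest team here wins well under half of the simulated tournaments, which is why published forecasts such as Opta’s give the favourite a modest percentage rather than a certainty.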

So, can big data predict the World Cup winner? We’d say yes, to some extent.

The blog has been sourced from – https://www.techradar.com/news/world-cup-2018-predictions-with-big-data-who-is-going-to-win-what-and-when

## Fintech Companies: How They Are Revolutionizing the Banking Industry?

The world of technology is expanding rapidly, and so is the world of finance. Fintech is the new buzzword, and its extensive use of cutting-edge algorithms, big data solutions and AI is transforming the traditional banking sector.

Nevertheless, there are many obstacles that fintech companies need to overcome before they can create a fully complementary system that bridges the gap between fintech and traditional banking.

### Ezbob and LaaS

Innovation takes time to settle in, but with a little effort, banks can strike gold like never before. New transparency laws and digital standards are being introduced, and if banks are quick to embrace this new technology, they can achieve success very easily. Not every fintech is determined to cause discomfort to banks; in fact, a lot of fintech startups offer incredible services to attract new customers.

One of them is ezbob, a robust platform that partners with multiple major banking institutions and streamlines an old process with cutting-edge technology. The platform delivers a smooth, automatic lending process for banks’ customers by sorting data accumulated from more than 25 sources in real time. Currently, it is leading the Lending-as-a-Service (LaaS) industry, which is deemed to be the future of the banking sector.

LaaS is one of the key transforming agents behind a new trend in the banking sector. It shows how everyone, including customers and partners, can benefit when efficiency is improved. Real-time decisions are crucial: they let bankers turn their attention to the bigger picture while technology takes care of the other factors.

### The Art of Regulations

Conversely, fintech startups should be wary of regulations. Even though technology is fast decentralizing the whole framework and disrupting the institutional banking sector, fintech companies should focus on regulation and be patient with all the innovation taking place around them. Banks need time to accept the potential of fintech innovation, but once they do, they stand to gain much more from adopting these technologies.

The aftermath of the 2008 financial crisis has made it relatively easier for fintech startups to remain compliant and be more accountable. One of the latest regulations concerns e-invoicing, requiring organizations to send digital invoices through a common system. This measure is expected to save businesses and governments billions of dollars.

Other reforms passed recently include PSD2 (the revised Payment Services Directive), which has systematized mobile and internet payments, and AMLD, the Anti-Money Laundering Directive. The latter hurts those who don’t want to be accountable for their income or who are involved in terrorist activities.

### Conclusion

As closing thoughts, we can all see that the financial sector has been the largest consumer of big data technology. According to Gartner, 64% of financial services companies used big data in 2013, and the figures are still rising.

To be the unicorn among the horses, it’s high time to acquire big data Hadoop skills. This new-age skill will take you a long way, provided you get certified from a reputable institute.

A Special Alert: DexLab Analytics is offering #SummerSpecial 10% off on in-demand courses of big data hadoop, data science, machine learning and business analytics. Enroll now for #BigDataIngstion: the new on-going admission drive!

The blog has been sourced from – http://dataconomy.com/2017/10/rise-fintechpreneur-matters

