
Machine Learning Algorithms – With Python (Part I)

Posted on June 17, 2020 (updated June 22, 2020) by Dexlab

Our industry experts introduce beginners to Machine Learning Algorithms with Python. In this blog, we will go through various Machine Learning Algorithms to understand the concepts better. This is the first part of a series.

Machine Learning, a subset of Artificial Intelligence, is a process of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that computing systems can learn from data, identify patterns in them and make intelligent decisions with minimal human intervention.

Parametric and Non-Parametric ML Algorithms

We first divide the mathematical methods for decision making into two groups: parametric and non-parametric algorithms. A parametric algorithm assumes a functional form, while a non-parametric algorithm does not.

A functional form is a fixed formula such as Y=F(X): for a given input value, you get a fixed output value. That means that if the data set changes, the results do not vary much. In non-parametric algorithms, by contrast, a small change in the data set can result in a large change in the results.

We do not want this. In investments, for instance, a massive swing in results is undesirable. There are various ways to address this difficulty. For example, in statistics you must have learnt the Central Limit Theorem: as the sample size increases, the distribution of the sample mean approaches the normal distribution, even when the underlying data are not normally distributed.

Here is an experiment on decision making with the help of a non-parametric algorithm. We first take a random sample and apply an algorithm to it to get a result. We repeat this process several times and take the average of the results. In this way, the variation in our results goes down considerably, and we get a central tendency.

Take, for example, stock market data, where prices appear largely random. There is no fixed pattern to them; the market is a man-made phenomenon. We can make predictions from a data set only when it contains a pattern, and prediction becomes much more difficult in the absence of a clear one. In such a case, we take thousands of samples and work through them to get a result before investing. We can use an ensemble of decision trees, such as a Random Forest, for this.
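The repeat-and-average experiment described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the video: the population values, the sample size of 30 and the 50 repeats are all assumed for the example, and it averages plain sample means instead of training a Random Forest.

```python
import random
import statistics

def single_result(population, sample_size, rng):
    """One experiment: draw a random sample and return its mean."""
    sample = rng.sample(population, sample_size)
    return statistics.fmean(sample)

def averaged_result(population, sample_size, repeats, rng):
    """Repeat the experiment several times and average the results."""
    return statistics.fmean(
        single_result(population, sample_size, rng) for _ in range(repeats)
    )

rng = random.Random(42)
# Hypothetical data with no obvious pattern (assumed for illustration).
population = [rng.gauss(100, 20) for _ in range(10_000)]

# Compare the spread of 200 single results with 200 averaged results.
single_runs = [single_result(population, 30, rng) for _ in range(200)]
averaged_runs = [averaged_result(population, 30, 50, rng) for _ in range(200)]

spread_single = statistics.stdev(single_runs)
spread_averaged = statistics.stdev(averaged_runs)
print(f"spread of single results:   {spread_single:.2f}")
print(f"spread of averaged results: {spread_averaged:.2f}")
```

The spread of the averaged results should come out much smaller than the spread of the single results, which is the reduction in variation that the experiment relies on.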

Supervised and Unsupervised Algorithms

Secondly, we can classify ML algorithms as supervised or unsupervised. Suppose we have employee data under the heads Name, Age, Gender, Salary and Period of Service, and we are asked to predict the period of service of an employee from the data under the other heads, based on existing employee records.

In this example, the period of service is the Target. The fields on the basis of which the prediction is made (Name, Age, Gender, Salary) are the Input. A model in which the target variable is specified is called a supervised machine learning algorithm. In the simplest case, with a single input X1, the prediction follows a formula of the form Y = B0 + B1X1, where B0 is the intercept and B1 is the coefficient of X1.
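To make the formula Y = B0 + B1X1 concrete, here is a small sketch that estimates B0 and B1 by ordinary least squares. The employee figures are hypothetical, and using Age as the single input is an assumption made only for this example.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares estimates of B0 and B1 for Y = B0 + B1 * X1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept
    return b0, b1

# Hypothetical records: age (input) and period of service in years (target).
age = [25, 30, 35, 40, 45]
service_years = [2, 5, 8, 11, 14]

b0, b1 = fit_simple_linear_regression(age, service_years)
predicted = b0 + b1 * 50  # predicted period of service for a 50-year-old
print(f"B0 = {b0:.1f}, B1 = {b1:.1f}, prediction at age 50 = {predicted:.1f}")
```

Because the target values are supplied during fitting, this is a supervised model; with several inputs the same idea extends to Y = B0 + B1X1 + B2X2 and so on.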

In unsupervised learning, the target variable is not provided, and all we can do is divide the historical data into clusters. By contrast, Google Translate runs on a supervised model, as do chatbots. Data is not only the new oil, it is everything. A time of data colonisation will come, in which the organisation with the best data will rule. The better the data, the better our ML models. Who has the best data sets in the world? Google and Amazon, among others.

So this is it, about supervised and unsupervised machine learning. For more on this, do watch our intensive video tutorial on ML algorithms.

(This post covers the first 28 minutes of the video.)

This is the first blog of the series. Stay tuned to DexLab Analytics as we cover the rest of the video in our upcoming blogs!

Top Programming Languages That AI Engineers Can Choose From in 2020

Posted on June 16, 2020 by Dexlab

Artificial Intelligence, the science of making computers function with the intelligence of the human brain sans the intervention of human beings, is the biggest find of the century. It is powering everything from our personal email to space exploration.

It is, thus, imperative to discuss the very platforms that make AI a reality. Computer programming languages are evolving fast, and no single language meets all the needs of an AI engineer. So we shall examine the plus points of the most popular programming languages to choose from this year.

Python

Python is an easy-to-learn programming language that helps AI novices enter the world of programming easily. Python not only has an excellent repository of libraries and strong community support on the Internet, it is also extremely flexible as a programming language. Platform independence and the extensive frameworks most needed for Deep Learning and Machine Learning are among its advantages. Some of its most widely used libraries are TensorFlow, Scikit-Learn, PyTorch, Keras, Spark MLlib, MXNet and Theano.

Java

Java, one of the most widely used programming languages in the world, has two decades' worth of testimonials by AI engineers behind it. It is user friendly, flexible and platform independent, and is therefore widely sought after for developing AI models. It has some very strong libraries, such as TensorFlow, Deep Java Library, Kubeflow and OpenNLP, among others.

R

R, first released in 1993 and made open source in 1995, is currently maintained by the R Core Team. It is an implementation of the S programming language. It helps AI engineers develop statistical software and carry out data analysis. What makes it robust, as a language for developers, is the fact that it facilitates the crunching of large numbers. In this regard it scores over Python. Moreover, says a report, with R you can work in various programming paradigms such as functional programming, vector computation and object-oriented programming.

Prolog

Prolog, short for “programming in logic” (from the French programmation en logique), is especially suited to building AI models and developing NLP packages. For instance, Prolog is used to build chatbots very effectively. ELIZA, the first ever chatbot, predates Prolog (it was written in the 1960s in MAD-SLIP), but ELIZA-style chatbots are a classic Prolog programming exercise. “Prolog offers two approaches for implementing AI that has been in practice for a long time and is well-known among data scientists and researchers: The Symbolic Approach that includes rule-based expert systems, theorem provers and constraint-based approaches and the Statistical Approach that includes neural nets, data mining and machine learning.”

There are many more languages to choose from. Lisp, Julia and Haskell are some of the strong and worthy languages AI engineers can use besides the ones listed above. Every programming language has its own merits and demerits. It is up to the AI engineer to choose wisely after conducting thorough research and doing due diligence. Dexlab Analytics, a premier artificial intelligence training institute in Gurgaon, suggests the use of Python and R for building AI models.

How AI is Transforming The IT Industry

Posted on June 15, 2020 by Dexlab

Artificial Intelligence, the science of making computers function with human-like intelligence, has taken the world by storm. It has transformed the biggest of businesses and industries, from healthcare to agriculture to space exploration. Artificial Intelligence has already become the biggest find of the century.

Information Technology, related to all things computers, software and data transmission, cannot be untouched by artificial intelligence. AI has already brought several advantages to the IT sector, a subject that we are going to examine in this blog brought to you by DexLab Analytics, a premier artificial intelligence training institute in Gurgaon.

Secure Systems

Security of data is of prime importance in today’s world where data is the new oil. Both government and private organisations are therefore striving to better protect the tons of data they are privy to. Through the use of algorithms, AI can provide the necessary security and help create a layered secure system. Not only that, it can also help detect security breaches and potential threats. According to a report, AI and Machine Learning have become crucial to data security in the IT industry.

Productivity Enhancement

In the IT industry, the most important thing developers are expected to perfect is programming. However, they face numerous challenges during their stints in IT companies, with bugs and erroneous code getting in the way of their goals. AI can help here: a series of algorithms can be used to help programmers write better, bug-free code. By judging the structure of the code, AI powered systems can provide suggestions that improve productivity and save time in the production process.

Automating Backend Processes

AI is a great enabler of automation, work that can be accomplished without or with minimal human intervention. With the use of deep learning techniques, AI can go a long way in helping automate back end processes in IT companies. This will not only help save costs but also increase accuracy and reduce human effort. “AI enabled methods improve over time as the algorithms adjust to enhance productivity and learn from mistakes,” says a report.

Application Deployment

Software deployment involves various stages, which makes version control crucial at the deployment stage. AI is known for its predictive capabilities, so it can be used to predict problems during the versioning stage. This eases the entire chain of processes, because programmers and developers do not have to wait until the last stage to learn about hiccups or to improve the application’s processes.

Server Optimization

In IT offices and workplaces, servers are often loaded with millions of requests. They in turn have to serve as many web pages, a process that can make them slow and unresponsive. AI, as a service, can help solve this problem by optimising the host server to improve customer service and enhance operations. As the demand for IT increases across business sectors, AI will increasingly be used to integrate staffing demands and to blend current business functions seamlessly with technological ones.


Classical Inferential Statistics: Theory of Sampling (Part -1)

Posted on June 12, 2020 by Dexlab

Contents:

Introduction
Basic Building Blocks of Classical Sampling Theory
Types of Sampling
Sampling Distribution: Overview
Conclusion

1. Introduction:

Predictive models are developed over a specific time period and on a certain set of records. However, implementation happens on a mutually exclusive time period (the out-of-time sample). Therefore, the models need to be trained and validated on different datasets: (1) model development data (training data), (2) in-sample validation data and (3) out-of-time validation data. A predictive model is considered robust if its performance remains more or less stable in the out-of-time samples.

An important observation follows from the description above: the entire data (the Population) is never accessible for model development and hence is unknown. Models are developed on subsets (Samples) which are representative of the entire data. Representativeness of the samples is important to ensure robust model performance.

This blog explores the key concepts related to creating representative samples from the population. Section 2 describes the basic components of the classical sampling theory, Section 3 describes the key types of sampling, Section 4 introduces the concept of Sampling Distribution and Section 5 concludes with a summary of the key findings.

2. Basic Building Blocks of Classical Sampling Theory:

Introduction To Population and Sample

The two basic blocks of Classical Sampling Theory are (1) the Population and (2) Samples. The Population is defined as the base of all the observations which are eligible to be studied to address key questions relating to a statistical investigation or a business problem, irrespective of whether it can be accessed or not. In practice, the entire population is always unknown, since part of it cannot be accessed for reasons such as data archiving problems, data permissions and data accessibility. A representative subset of the population is called a sample. The distribution of the variables in the sample is used to form an idea about the respective distribution of the variables in the population.

In practice, any predictive modelling exercise uses samples, since it cannot practically use the population. The population is not accessible for the following reasons:

Observation Exclusions used in models: Observation Exclusions are used in predictive models to remove unnecessary observations, which are redundant for analysis. For example, when developing a credit risk model, observations which are bankrupts or frauds are removed from the analysis, since frauds and bankrupts are a part of operational risk.
Variable Exclusions used in the models: Variable Exclusions are used in predictive models to remove unnecessary variables which are redundant for analysis. For example, when developing a credit risk model, market-oriented variables and operational variables are excluded.
Robustness Check of the developed models: The developed models are validated on multiple samples, such as in-sample validation data and out-of-time validation samples.

Therefore, only a fraction of the dataset is available for model development. Hence, the population is always unknown, irrespective of the dataset, and the key statistical distributions of the population are unknown as well.

Mathematical Framework To Describe The Sampling Theory Framework:

Let X be an N x k matrix (where N is the total number of rows, i.e. observations, and k is the total number of columns, i.e. variables) which is normally distributed with mean μ and variance σ². The population mean μ and variance σ² are both unknown numerical features of the population distribution. These are called Parameters: functional forms of all the population observations.

The key objective of the Classical Sampling Theory is to provide appropriate guidelines for analysing the population parameters based on the statistical moments of the sample. The statistical moments of the sample are called Estimators. Estimators are functional forms of all the sample observations. For example, let us assume a subset of size ‘n’ is extracted from X such that the sample S is an n x k matrix which is normally distributed. Let x̄ and s² be the sample mean and the sample variance respectively. These descriptive moments are called statistics. A Statistic is an estimator with a sampling distribution (detailed discussion: Section 4). The key objective of the classical sampling theory is to estimate the population parameters using the sample statistics, such that any difference between the two measures is statistically insignificant and can be considered an outcome of sampling fluctuations.
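The relationship between parameters and estimators can be illustrated with a short simulation. This is only a sketch under assumed values: the population is generated artificially (in practice it is never fully observed), and the mean of 50, standard deviation of 10 and sample size of 500 are hypothetical.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population, generated only so the parameters can be computed.
population = [rng.gauss(50, 10) for _ in range(20_000)]
mu = statistics.fmean(population)          # population mean (parameter)
sigma2 = statistics.pvariance(population)  # population variance (parameter)

# A simple random sample of size n, drawn without replacement.
n = 500
sample = rng.sample(population, n)
x_bar = statistics.fmean(sample)   # sample mean, the estimator of mu
s2 = statistics.variance(sample)   # sample variance (n - 1 denominator), the estimator of sigma^2

print(f"mu     = {mu:.2f}   x_bar = {x_bar:.2f}")
print(f"sigma2 = {sigma2:.2f}   s2    = {s2:.2f}")
```

The small gaps between each parameter and its estimate are the sampling fluctuations referred to above; drawing a different sample would give slightly different values of x̄ and s².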

Classical Sampling Methods:

3. Types of Sampling

Broadly, there are two types of sampling methods discussed under the Classical Sampling theory: (i) Random Sampling (ii) Purposive Sampling. The different types of sampling and a brief description of each is provided in the figure below:

Applications Of Sampling Methods:

In real-world predictive modelling exercises, Stratified Random Sampling has wider appeal than Simple Random Sampling. Business datasets contain different categorical variables such as Product Type, Branch Size category, Gender and Income Group. While splitting the total data into development data and validation data, it is important to ensure that the key categorical variables are represented in the samples. This ensures the representativeness of the sample and the robustness of the model. In this case, Stratified Random Sampling is preferred over Simple Random Sampling. The use of Simple Random Sampling is limited to cases where the data is symmetric and little heterogeneity is observed in the distribution of the values of the variables. The following examples discuss the applications of the Classical Sampling methods:

Example 01: Splitting the Model Development Data into Training and Validation Datasets

Models, once developed, need to be validated. The standard practice is to divide the data in a 70:30 proportion. The models are trained on 70% of the observations and validated using the remaining 30%. To ensure the robustness of the model, the distribution of the target variable should be similar in both the development and validation datasets. Therefore, the target variable is used as the strata variable.
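A stratified 70:30 split with the target as the strata variable might look like the sketch below. The function name `stratified_split`, the record layout and the 10% default rate are all hypothetical, chosen only to show that the target distribution is preserved in both parts.

```python
import random
from collections import defaultdict

def stratified_split(rows, strata_key, train_fraction=0.7, seed=0):
    """Split rows into development and validation sets, stratum by stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[strata_key(row)].append(row)

    train, validation = [], []
    for stratum_rows in by_stratum.values():
        shuffled = stratum_rows[:]
        rng.shuffle(shuffled)  # random order within the stratum
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        validation.extend(shuffled[cut:])
    return train, validation

# Hypothetical credit data: 1,000 accounts, 10% of which defaulted.
rows = [{"id": i, "default": 1 if i % 10 == 0 else 0} for i in range(1000)]
train, validation = stratified_split(rows, strata_key=lambda r: r["default"])

train_rate = sum(r["default"] for r in train) / len(train)
valid_rate = sum(r["default"] for r in validation) / len(validation)
print(len(train), len(validation), train_rate, valid_rate)
```

Because each stratum is split separately, the default rate is the same in the development and validation data, which a simple random split would only achieve approximately.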

Example 02: Bootstrapping Analysis

Bootstrapping exercises make extensive use of Simple Random Sampling with Replacement. Bootstrapping is a nonparametric resampling method used to assign measures of accuracy to sample estimates.
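As a rough illustration of bootstrapping, the sketch below estimates the standard error of a sample mean by resampling with replacement and compares it with the textbook formula s/√n. The observed sample is simulated, and the helper name `bootstrap_standard_error` and the 2,000 resamples are assumptions for the example.

```python
import random
import statistics

def bootstrap_standard_error(sample, statistic, n_resamples=2000, seed=0):
    """Standard deviation of a statistic across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        # Simple random sampling with replacement, same size as the sample.
        resample = rng.choices(sample, k=n)
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)

rng = random.Random(1)
sample = [rng.gauss(0, 1) for _ in range(100)]  # hypothetical observed sample

se_boot = bootstrap_standard_error(sample, statistics.fmean)
se_formula = statistics.stdev(sample) / len(sample) ** 0.5
print(f"bootstrap SE: {se_boot:.3f}   formula SE: {se_formula:.3f}")
```

The two figures should be close for the mean; the appeal of the bootstrap is that the same function works for statistics such as the median, for which no simple formula is available.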

4. Sampling Distribution: Overview

The sampling distribution of a statistic may be defined as the probability law which the statistic follows if repeated random samples of a fixed size are drawn from a specified population. If a number of samples, each of size n, are taken from the same population, and the value of the statistic is calculated for each sample, a series of values of the statistic is obtained. If the number of samples is large, these may be arranged into a frequency table. The frequency distribution of the statistic that would be obtained if the number of samples, each of the same size (say n), were infinite is called the Sampling distribution of the statistic. The table below shows a Sampling Distribution and its associated frequency distribution:
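The construction described above can be approximated by simulation with a large but finite number of samples. In this sketch the population (die rolls), the sample size n = 25 and the 5,000 repetitions are hypothetical choices, and the sample mean is used as the statistic.

```python
import random
import statistics
from collections import Counter

rng = random.Random(7)
# Hypothetical population: 10,000 rolls of a six-sided die.
population = [rng.randint(1, 6) for _ in range(10_000)]

n = 25              # fixed sample size
num_samples = 5000  # large, but finite, number of repeated samples

# Value of the statistic (the sample mean) for each repeated random sample.
sample_means = [
    statistics.fmean(rng.choices(population, k=n)) for _ in range(num_samples)
]

# Frequency table of the statistic, binned to the nearest 0.25.
table = Counter(round(m * 4) / 4 for m in sample_means)
for value in sorted(table):
    print(f"{value:5.2f}  {table[value]:5d}")

mean_of_means = statistics.fmean(sample_means)
```

The printed frequencies peak around the population mean and thin out on both sides, which is the bell shape the Central Limit Theorem leads one to expect for sample means.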

5. Conclusion:

The blog, brought to you by DexLab Analytics, a premier institute conducting statistical analytics courses in Gurgaon and business analysis training in Delhi, introduces the basic concepts of the Classical Sampling Framework. The objective here has been to explore the broad tenets of sampling theory, such as the different methods of sampling, their uses and their respective advantages and disadvantages. Stratified Random Sampling is a more versatile sampling method than Simple Random Sampling. The concept of Sampling Distribution has been introduced but not discussed in detail. This is to be the subject matter of the next blog: Sampling Distributions and their importance in Sampling theory.

Basic of Statistical Inference Part-I from Dexlab Analytics

Akash Dasgupta
Research Associate, DexLab Analytics

Stay Home and Upskill to Beat the Impact of a Global Recession

Posted on June 11, 2020 (updated June 12, 2020) by Dexlab

The US economy, as it was officially announced by the United States National Bureau of Economic Research on June 8, entered a recession in February after hitting a peak of economic activity and growth. This is the first time the US economy has undergone a recession since the global financial crisis of 2008-09, says a report.

In the US alone, 19.6 lakh COVID-19 cases have been reported to date, with 1.1 lakh deaths recorded, the highest for any country in the world. In such a dire situation, the silver lining seems to be that this recession, intensified by the lockdown the country has imposed on itself to slow the spread of the disease, might be deep but short lived, The New York Times reported.

Irrespective of when the recession will end, poverty levels have already begun spiking the world over. The World Bank has said that, “the highest share of countries in 150 years would enter recessions at the same time. As many as 90% of the 183 economies examined are expected to suffer from falling levels of gross domestic product (GDP) in 2020, even more than the 85% of nations suffering from recession during the Great Depression of the 1930s”, The Guardian reported.

This will lead to a dramatic rise in levels of poverty the world over. However, India might fare better on the global front for more reasons than one. Some economists feel “the (Indian) economy may do better than some other developing economies, which are heavily dependent on world trade” because of “lower dependence on exports (that) means less exposure to the decline in world trade. This and the low price of crude oil, our biggest import, may mean that we don’t suffer an external shock”.

In such circumstances, it is advisable that you stay home and not despair. Doing nothing but fretting will only add to your woes and not help the situation. Neither will binge-watching web series help. Instead, what you can do is ready yourself for a post COVID-19 world. You can do this primarily by upskilling yourself, i.e. upgrading your skill set.

The only way to do this is remotely, through the online classes available by the dozen. In fact, celebrities like Shakira have begun taking online classes (she in ancient philosophy) this lockdown, while others like director Kevin Smith have finished old pending projects. The best skills to upgrade would, however, be those pertaining to computer science courses like big data, machine learning, deep learning or even credit risk modelling. These high-in-demand courses will look good on your résumé and instantly add to your employability wherever you plan to move to next.

In India, DexLab Analytics, a premier institute offering some of the best credit risk modelling training courses and R programming courses in Gurgaon, suggests you try and learn a new programming language or enrol in a new business analytics course so your résumé stands stronger than it was before the lockdown. This will help you beat competition when you will be searching for work opportunities post the lockdown.

ROC AUC for Multi-Class Classification: Release Highlights for Scikit-learn 0.22

Posted on June 10, 2020 (updated August 10, 2020) by Dexlab

Today we are going to learn about the new features in version 0.22 of Scikit-learn, a machine learning library in Python. Through this video tutorial, we aim to learn about the much talked about addition whereby the ROC AUC score supports multi-class classification. Prior to this version, Scikit-learn did not have a function to plot the ROC curve.

To access our previous tutorial on the plotting of the ROC curve, click here.

The ROC-AUC score function can also be used in multi-class classification. Two averaging strategies are currently supported: the one-vs-one (OvO) algorithm computes the average of the pairwise ROC AUC scores and the one-vs-rest (OvR) algorithm computes the average of the ROC AUC scores for each class against all other classes.

In both cases, the multiclass ROC AUC scores are computed from probability estimates that a sample belongs to a particular class according to the model. The OvO and OvR algorithms support weighting uniformly (average='macro') and weighting by prevalence (average='weighted').

To begin with, we import make_classification, SVC and roc_auc_score. We specify the number of classes we want in the make_classification function, then fit the SVC classifier and finally call roc_auc_score. The classifier gives us a predicted probability for every class (the predicted class being the one with the highest probability), and the ROC AUC score is computed from these probabilities. When we run it we get a ROC AUC score of 0.99.
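To show what the one-vs-rest strategy with uniform (macro) averaging computes, here is a small pure-Python sketch. It is not the code sheet from the tutorial: the helper names `binary_auc` and `ovr_macro_auc` and the probability estimates are hypothetical, made up for illustration.

```python
def binary_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def ovr_macro_auc(y_true, y_prob):
    """One-vs-rest ROC AUC per class, then the unweighted (macro) average."""
    n_classes = len(y_prob[0])
    per_class = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]  # class c against the rest
        scores = [row[c] for row in y_prob]            # estimated P(class c)
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / n_classes, per_class

# Hypothetical probability estimates for six samples and three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_prob = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.4, 0.35, 0.25],
    [0.1, 0.3, 0.6],
    [0.3, 0.4, 0.3],
]
score, per_class = ovr_macro_auc(y_true, y_prob)
print(per_class, round(score, 4))
```

In Scikit-learn 0.22 and later, the equivalent call would be roc_auc_score(y_true, y_prob, multi_class='ovr', average='macro'), with y_prob usually taken from a classifier's predict_proba output.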

The code sheet is provided in a Github repository here.

For more on this, do watch the video attached herewith. This tutorial was brought to you by DexLab Analytics, a premier Machine Learning institute in Gurgaon.

Watch the video here.

Dexlab Analytics Starts National Level Training On Data Analysis Using OpenAir package of R

Posted on June 9, 2020 by Dexlab

From Saturday, 6th June 2020, a team of senior consultants at DexLab Analytics has been conducting a national level training for more than 40 participants who are research scholars, MPhil students and professors from colleges like IIT, CSIR, BHU and NIT, among others. This one of a kind, crowd-funded training is being conducted on “Environment Air pollution Data Analysis using OpenAir package of R”.

The training is a result of the lockdown, during which DexLab Analytics is working on its upskilling initiatives for professionals and subject matter experts across India. The training is being conducted in DexLab Analytics’ TraDigital format: real-time, online, classroom-styled, instructor-led training.

The attendees will be taking up these interactive classes from the safety and comfort of their homes. They will be getting assignments, learning material and recordings virtually.

The one-month-long training will be conducted in R Programming, Data Science and Machine Learning using R Programming from the perspective of Environmental Science. DexLab Analytics is conducting this training module in line with the tenets of ‘Atmanirbhar India’.

DexLab Analytics is a leading data science training institute in India with a vast array of state-of-the-art analytics courses, attracting a large number of students nationwide. It offers high-in-demand professional courses like Big Data, R Programming, Python, Machine Learning, Deep Learning, Data Science, Alteryx, SQL, Business Analytics, Credit Risk modeling, Tableau, Excel etc. to help young minds be data-efficient. It has its headquarters in Gurgaon, NCR.

For more information, click here –

www.prlog.org/12825521-dexlab-analytics-starts-national-level-training-on-data-analysis-using.html

Using Deep Learning To Track Tropical Cyclones: A Study

Posted on June 5, 2020 by Dexlab

The severe cyclonic storm Nisarga approached the Maharashtra coast around Alibagh in Raigadh with “a sustained wind speed of 100-110 kmph” on June 3, 2020. Then it made landfall at Alibagh at around noontime. Landfall simply means that the storm, after having intensified over the ocean, has moved on to land.

Though the storm mellowed down in intensity as it approached the Maharashtra coast, government bodies took all precautions and evacuation work was done in advance on the basis of forecasts done by meteorologists and scientists.

To save lives and property, it is imperative to predict cyclones and the intensity with which they will strike. Deep Learning, a branch of artificial Intelligence, is helping scientists make breakthroughs in the science of forecasting cyclones.

Image Source: outlookindia.com

Existing Storm Forecast Models

Most conventional dynamical models make accurate short term predictions but they are computationally demanding and “current statistical forecasting models have much room for improvement given that the database of past hurricanes is constantly growing”, says a report.

A tropical cyclone forecast involves the prediction of several interrelated features like track, intensity, rainfall and storm surge. Current hurricane and cyclone forecasts have advanced over the years, but they are largely statistical in nature. The main limitation of this method is the complexity and non-linearity of atmospheric systems.

Deep Learning Models

Recurrent Neural Networks in deep learning models have been, of late, used to study increasingly complicated systems instead of the traditional methods of forecasting because they promise more accuracy. RNNs are a class of artificial neural networks where the modification of weights allows the model to learn intricate dynamic temporal behaviours, says another report.

An RNN capable of modelling the complex non-linear temporal relationships of a hurricane or a cyclone could increase the accuracy of cyclone path forecasts.

Machine Learning

Generally speaking, there are two methods or approaches to detecting extreme weather events like tropical cyclones – the data driven method which includes machine learning and the model driven approach which includes numerical simulation.

“The model-driven approach has the limitation that the prediction error increases with lead time because numerical models are inherently dependent on initial values. On the other hand, machine learning, as a data-driven approach, requires a large amount of high-quality training data,” says a report.

High quality data is easy to procure, given the large amounts of data generated daily by weather stations the world over. So the machine learning method is easier to work with and to generate results from.

Conclusion

So what was difficult earlier, namely finding suitable metrics to study and detect the path of tropical cyclones, has now become easier, and scientists have been able to improve the accuracy of their predictions through the use of neural networks and artificial intelligence in general. For more on the subject, do read our blog here and here. Dexlab Analytics is a premier Deep Learning training institute in Delhi.

How Artificial Intelligence Powers Earthquake Prediction

Posted on June 4, 2020 by Dexlab

Artificial Intelligence is the key to the future of weather forecasting, a fact well known. But did you know it is also powering earthquake prediction the world over? Yes. Artificial Intelligence techniques like machine learning are gradually being enlisted in forecasting seismic activity.

While earthquake prediction has not yet become an exact science, efforts are on to make improvements and make forecasts reliable. For this, AI powered neural networks, the same technology behind the success of driverless cars and digital assistants, are being used to enhance research based on seismic data.

Neural Networks

A report says that, “Scientists say seismic data is remarkably similar to the audio data that companies like Google and Amazon use in training neural networks to recognize spoken commands on coffee-table digital assistants like Alexa.”

When it comes to studying earthquakes, it is the computer, a fast and able machine, looking for patterns in mountains of data rather than relying on the weary eyes of a scientist. Also, instead of a sequence of words, what the computer is studying is a sequence of ground-motion measurements.

Image Source: cbs8.com

Studying Aftershocks

Scientists in the US have experimented with neural networks to accelerate earthquake analysis, and they produced results and studies 500 times faster than they could in the past. AI is useful not only in studying earthquakes but also in forecasting earthquake aftershocks.

In fact, researchers say it is a time of great scientific advancement, so much so, that “technology can do as well as — or better than — human experts”.

Image Source: smithsonianmag.com

Artificial Intelligence

Geophysicist Paul Johnson’s team in the US has been studying earthquakes for quite some time now and it has made advancements in “using pattern-finding algorithms similar to those behind recent advances in image and speech recognition and other forms of artificial intelligence, (where) he and his collaborators successfully predicted temblors in a model laboratory system — a feat that has since been duplicated by researchers in Europe”, says a report.

Now Mr Johnson’s team has published a paper wherein artificial intelligence has been used to study slow slip earthquakes in the Pacific Northwest. While advancements are being made in the field of studying slow slip earthquakes, it is the bigger and more potent ones that really need to be studied. But they are rare. So the question remains – Will Machine Learning be able to analyse a small data set and predict with confidence the next big earthquake?

Machine Learning

Researchers claim “that their (machine learning) algorithms won’t actually need to train on catastrophic earthquakes to predict them.” Studies conducted recently suggest “seismic patterns before small earthquakes are statistically similar to those of their larger counterparts”. So, a computer trained on hundreds and thousands of those small temblors might be able to predict the big ones.

For more on artificial intelligence, and its varied applications, do peruse the DexLab Analytics website today. DexLab Analytics is a premier institute in India offering Machine Learning courses in Delhi.
