Machine Learning Using Python Archives - DexLab Analytics

Gradient Boosting In scikit-learn 0.22 For Handling Missing Values

Posted on August 10, 2020 by Dexlab

This tutorial session continues our series on scikit-learn 0.22 and the new features that have been added to the library. Here we look at gradient boosting that can handle missing values on its own, and clear up a common misconception about what gradient boosting can and cannot do with incomplete data.

The earlier notion was that GBM, the gradient boosting algorithm, could not handle missing values, because the gradient boosting estimators in scikit-learn did not accept them. That notion does not hold for gradient boosting in general: the XGBoost (XGB) library handles missing values natively, and it has been found to perform better this way than the usual approach of imputing the missing values first.

Now back to how scikit-learn 0.22 deals with missing values. The gradient boosting algorithm has been enhanced so that you no longer have to handle missing values yourself; the algorithm handles them on its own.

So let us look at how native support for missing values in gradient boosting works.

The ensemble estimators ensemble.HistGradientBoostingClassifier (for classification) and ensemble.HistGradientBoostingRegressor (for regression) now have native support for missing values (NaNs). This means there is no need to impute data before training or predicting.

To see how this is done, follow the complete code sheet, which you can find here.

As you go through the code you will come across the word enable, and you may wonder why it is there. The reason is that this estimator is still under development.

In scikit-learn 0.22, estimators that are still experimental have to be switched on with an extra import before they can be used; for this one the line is from sklearn.experimental import enable_hist_gradient_boosting. Once development is complete, this extra line will no longer be needed.
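As a rough illustration of the steps above, here is a minimal sketch rather than the code sheet itself. The four-row data set is made up for the example, and the try/except around the enabling import is our own assumption so that the snippet also runs on later scikit-learn versions where the estimator is no longer experimental.

```python
import numpy as np

# In scikit-learn 0.22 the estimator is experimental and must be enabled first.
# Assumption: the import is wrapped in try/except so the sketch still runs on
# later versions in case the enabling module is no longer available.
try:
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
except ImportError:
    pass
from sklearn.ensemble import HistGradientBoostingClassifier

# Made-up toy data: a single feature whose last value is missing (NaN).
X = np.array([0.0, 1.0, 2.0, np.nan]).reshape(-1, 1)
y = np.array([0, 0, 1, 1])

# min_samples_leaf=1 only because this toy data set has just four rows.
clf = HistGradientBoostingClassifier(min_samples_leaf=1).fit(X, y)

# No imputation step was needed before fit or predict.
predictions = clf.predict(X)
print(predictions)
```

Note that the NaN goes straight into fit and predict; no imputer appears anywhere in the snippet.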

The video attached below will further explain how the algorithm works.

There will be more informative tutorial sessions like this, so to stay updated keep following the DexLab Analytics blog.

Watch the video here.

KNN Imputer – Release Highlights for Scikit-learn 0.22

Posted on July 1, 2020 (updated August 10, 2020) by Dexlab

Today we are going to learn about a new feature of scikit-learn version 0.22 called KNN imputation. It lets us fill in missing values using k-Nearest Neighbours (KNN). To track our tutorials on other new releases from scikit-learn, read our blog here and here.

Introduction

Each sample’s missing values are imputed using the mean value from the nearest neighbours found in the training set. Two samples are considered close if the features that neither sample is missing are close. By default, a Euclidean distance metric that supports missing values, nan_euclidean_distances, is used to find the nearest neighbours.

Input and Output

First we import the libraries we need, such as NumPy. Then we create a small data set with as many rows as we wish, leaving some values missing. Next, following the usual scikit-learn procedure, we create a KNNImputer object and decide how many neighbours we want. Finally we pass the input values to imputer.fit_transform and get back the same data with each missing value replaced by an estimate based on its nearest neighbours.
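The procedure just described can be sketched as follows. The four-row array is a small made-up input, and n_neighbors=2 is simply the value chosen for this illustration; it is not necessarily what the code sheet uses.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Made-up input with two missing values (NaN).
X = [[1, 2, np.nan],
     [3, 4, 3],
     [np.nan, 6, 5],
     [8, 8, 7]]

# Create the imputer object and choose the number of neighbours.
imputer = KNNImputer(n_neighbors=2)

# fit_transform returns a copy of X with the NaNs filled in.
X_filled = imputer.fit_transform(X)
print(X_filled)
# [[1.  2.  4. ]
#  [3.  4.  3. ]
#  [5.5 6.  5. ]
#  [8.  8.  7. ]]
```

In the first row, the two nearest rows that have a value in the third column hold 3 and 5, so the gap is filled with their mean, 4.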

The code sheet for this tutorial is provided in a GitHub repository here.

For more on this, do watch the video attached herewith. This tutorial was brought to you by DexLab Analytics. DexLab Analytics is a premier Machine Learning institute in Gurgaon.

Watch the video here.

Machine Learning Algorithms – With Python (Part II)

Posted on June 22, 2020 by Dexlab

In the first part of this blog, we covered Parametric and Non-Parametric Machine Learning algorithms and Supervised and Unsupervised Machine Learning Algorithms. If you haven’t gone through it yet, check it out here: dexlabanalytics.com/blog/machine-learning-algorithms-with-python-part-i

In this blog we are going to learn about Semi-Supervised Machine Learning algorithms.

What are Semi Supervised ML algorithms?

Semi-supervised algorithms are those in which the target has been specified for only part of the historical data. The way to go about solving this is to build a model on the portion of the historical data that has the target specified, and then apply this model to the rest of the data to predict the missing targets. Now combine the two sets of data, so that every row has a target (known or predicted), and build the final model on this combined data.
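The steps above can be sketched in a few lines. This is only an illustration under our own assumptions: the data is synthetic, the split of 60 labelled rows out of 200 is arbitrary, and logistic regression stands in for whatever model you prefer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumption): 200 rows, of which only the first 60 have a known target.
X, y_true = make_classification(n_samples=200, n_features=5, random_state=0)
labelled = np.arange(200) < 60

# Step 1: build a model on the labelled portion only.
base_model = LogisticRegression(max_iter=1000).fit(X[labelled], y_true[labelled])

# Step 2: apply it to the rest of the data to predict the missing targets.
pseudo_labels = base_model.predict(X[~labelled])

# Step 3: combine known and predicted targets into one target variable.
y_combined = y_true.copy()
y_combined[~labelled] = pseudo_labels

# Step 4: build the final model on the full data set with the combined target.
final_model = LogisticRegression(max_iter=1000).fit(X, y_combined)
print("agreement with the true targets:", final_model.score(X, y_true))
```

Because the synthetic data comes with the true targets for every row, the last line can check how well the final model does; with real semi-supervised data that check is not available.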

New Nomenclature

In the equation Y = B0 + B1X, Y is called the Target Variable in machine learning, while in statistics it is called the Dependent Variable. X is called the Feature or Attribute, whereas in statistics it is called the Independent Variable. B0 and B1 are called Weights, while in statistics they are called Coefficients (Intercept and Slope, respectively).

In the equation Ŷ - Y = error, where Ŷ is the predicted value, the error for a single record is called the Residual in statistics. In machine learning these errors are summarised by a Cost Function, such as the mean squared error. And the elements of the historical data set that are known as Records or Observations in statistics are known as Instances in machine learning.
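To see these terms side by side, here is a small sketch with made-up numbers. It fits Y = B0 + B1X with NumPy and then computes the per-instance errors and a cost; the use of mean squared error as the cost is our choice for the example.

```python
import numpy as np

# Made-up data for illustration: Y is roughly 1 + 2X.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # feature / independent variable
Y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])   # target / dependent variable

# np.polyfit returns the highest-degree coefficient first: slope, then intercept.
B1, B0 = np.polyfit(X, Y, deg=1)          # weights / coefficients

Y_hat = B0 + B1 * X                       # predicted values
residuals = Y_hat - Y                     # error per instance (residuals)
cost = np.mean(residuals ** 2)            # cost function: mean squared error

print("B0 (intercept):", B0)
print("B1 (slope):", B1)
print("cost (MSE):", cost)
```

Each entry of residuals belongs to one instance, while cost is a single number for the whole data set.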

What is Bias Variance Trade-Off?

In parametric algorithms like linear regression, several assumptions are made before building a model. These can be things like including only those inputs that have a relationship with the target variable, or requiring that the error be random. The benefit is that Ŷ, the predicted result, is consistent and shows little variance.

Now take a Decision Tree or any other non-parametric machine learning algorithm: a small change in the data set can cause a large change in the predicted values. But, unlike parametric ML algorithms, non-parametric algorithms make no such basic assumptions. So, in such a case, the error, measured as the mean squared error, is a combination of the square of the bias and the variance.

MSE = Bias² + Variance

Decreasing one (say, the square of the bias) will usually lead to an increase in the other (the variance), and vice versa.

In this case, we need to balance or trade off the two – the square of the bias and the variance.

While the bias cannot be changed much for a given type of model, we can control the variance by increasing or decreasing the parameters of the model, that is, by making it more or less complex.
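The decomposition of the mean squared error can be checked numerically. The sketch below is our own toy experiment, not part of the original session: it repeatedly estimates a known value with a deliberately biased estimator (a sample mean shrunk by a factor of 0.8, a hypothetical choice) and compares the two sides of the formula.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 5.0

# Hypothetical estimator: the sample mean shrunk toward zero by a factor of 0.8.
# The shrinkage introduces bias but lowers the variance of the estimate.
estimates = np.array([
    0.8 * rng.normal(loc=true_value, scale=2.0, size=10).mean()
    for _ in range(5000)
])

mse = np.mean((estimates - true_value) ** 2)
bias_squared = (estimates.mean() - true_value) ** 2
variance = estimates.var()  # population variance (ddof=0)

print("MSE:              ", mse)
print("Bias^2 + Variance:", bias_squared + variance)
```

The two printed numbers agree up to floating-point rounding. Changing the shrink factor moves error between the bias term and the variance term, which is the trade-off described above.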

What is Overfitting and Underfitting?

Overfitting is the condition in which the accuracy on the training data set is noticeably higher than the accuracy on the unseen test data set, because the model has learnt the training data too closely. This is an undesirable condition. Underfitting is the opposite problem: the model is too simple to capture the pattern, so accuracy is low on the training data as well as on the unseen test data. This is also undesirable. What we aim for is a high and roughly equal accuracy on both the training and the test data.

To limit overfitting we must:

Use a resampling technique to estimate model accuracy by repeating the experiment on different splits of the data and then averaging the accuracy figures.
Hold back a validation data set to test your model on, and compare several candidate models trained on the training data set.

We would like to conclude our second part of this tutorial here. For more on this, visit the third blog on Machine Learning Algorithms with Python.
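Both techniques can be sketched with scikit-learn. The data set (iris), the decision tree and its max_depth of 3, and the five folds are our own assumptions for the illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold back a validation set that the model never sees during training.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = DecisionTreeClassifier(max_depth=3, random_state=0)

# Resampling: 5-fold cross-validation on the training portion, then average.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("mean cross-validated accuracy:", cv_scores.mean())

# Compare accuracy on the training data with accuracy on the held-back data.
model.fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
val_acc = model.score(X_val, y_val)
print("training accuracy:", train_acc, "validation accuracy:", val_acc)
```

A large gap between the training accuracy and the validation accuracy is the sign of overfitting described above.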

(Translated from 28:00 – 1:19:00)

Machine Learning Algorithms – With Python (Part I)

Posted on June 17, 2020 (updated June 22, 2020) by Dexlab

Our industry experts introduce beginners to Machine Learning Algorithms with Python. In this blog, we will go through various Machine Learning Algorithms to understand the concepts better. This is the first part of a series.

Machine Learning, a subset of Artificial Intelligence, is a process of data analysis that automates analytical model building. It is based on the idea that computing systems can learn from data, identify patterns in them and make intelligent decisions with minimal human intervention.

Parametric and Non-Parametric ML Algorithms

We first divide the mathematical methods for decision making into two sections: parametric and non-parametric algorithms. Parametric algorithms have a functional form, while non-parametric algorithms have none.

A functional form is a simple formula like 2+2=4 or Y=F(X): if you input a value, you get a fixed output value. That means that if the data set changes, there is not much variation in the results. But in non-parametric algorithms, a small change in the data set can result in a large change in the results.

But we do not desire this. We do not want such massive changes in results when making investments, for instance. There are various ways to solve this difficulty. For example, in statistics you must have learnt the Central Limit Theorem: as the sample size increases, the distribution of the sample mean approaches the normal distribution.

Here is an experiment on decision making with the help of a non-parametric algorithm. We first take a random sample and apply an algorithm to it to get a result. We repeat this process several times and take the average of the results. In this way, the variation in our results goes down considerably, and we get a central tendency.

Take for example stock market data, where prices appear totally random. There is no fixed pattern to it; it is a man-made phenomenon. We can make predictions from a data set only when there is a particular pattern in it, and it becomes that much more difficult in the absence of a clear pattern. In such a case, we take thousands of samples and work on them to get a result before investing. We can use an ensemble of Decision Trees, such as Random Forest, for this.
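The experiment can be imitated with a few lines of NumPy. Everything here is an assumption made for illustration: the population is synthetic, the "algorithm" is simply the median of a random sample, and the counts (50 rows per sample, 25 repeats per average) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up, skewed population standing in for real data.
population = rng.exponential(scale=10.0, size=100_000)

def one_result():
    """Take one random sample and apply the 'algorithm' (here: the median)."""
    return np.median(rng.choice(population, size=50))

# Results from a single sample each, versus averages over 25 repeated samples.
single_results = np.array([one_result() for _ in range(500)])
averaged_results = np.array([
    np.mean([one_result() for _ in range(25)]) for _ in range(500)
])

print("spread of single results:  ", single_results.std())
print("spread of averaged results:", averaged_results.std())
```

The averaged results vary far less from run to run than the single results. This is the same idea that a Random Forest applies to decision trees.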

Supervised and Unsupervised Algorithms

Now, secondly, we can classify ML algorithms as supervised or unsupervised. Suppose we have data under the sub-heads Name, Age, Gender, Salary and Period of Service. Now consider a model in which we are asked to predict the period of service of an employee from the data under the rest of the sub-heads, based on existing employee data.

In this example, the period of service is the Target. The fields on the basis of which the prediction will be made (Name, Age, Gender, Salary) are the Input. A model like this, where the target variable is specified, is termed a supervised machine learning algorithm. We do this according to a formula such as Y = B0 + B1X1.

In unsupervised learning, the target variable is not provided, and all we can do is divide the historical data into clusters. Google Translate, for example, runs on a supervised model, as do chatbots. Data is not only the new oil, it is everything. And there will come a time of data colonisation whereby the organisation with the best data will rule. The better the data, the better our ML models. Who has the best data sets in the world? Google and Amazon, among others, do.
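To make the difference concrete, here is a small sketch. The employee records are entirely made up, only Age and Salary are used as inputs, and linear regression and k-means with two clusters are our own choices of algorithm for the two cases.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Made-up employee records for illustration: [age, salary in thousands].
X = np.array(
    [[25, 30], [28, 35], [35, 50], [40, 62], [50, 80], [55, 90]], dtype=float
)
# Target for the supervised case: period of service in years (also made up).
y = np.array([1, 3, 8, 12, 20, 25], dtype=float)

# Supervised: the target is specified, so we fit Y = B0 + B1*X1 + B2*X2.
reg = LinearRegression().fit(X, y)
predicted_service = reg.predict(np.array([[45.0, 70.0]]))
print("predicted period of service:", predicted_service[0])

# Unsupervised: no target is given, so we can only group similar records.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster of each employee:", clusters)
```

The regression needs y in order to learn anything; the clustering never sees it.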

So this is it, about supervised and unsupervised machine learning. For more on this, do watch our intensive video tutorial on ML algorithms.

(Translated till first 28:00 minutes)

This is the first blog of the series. Stay tuned with Dexlab Analytics as we cover the rest of the video in our upcoming blogs!

How AI is Powering Manufacturing in 2020

Posted on May 19, 2020 by Dexlab

The world has seen a transformation in its economic activities since the coronavirus pandemic broke out. Economies have come to a grinding halt and manufacturing has dipped. Now what nations need is resilience and strength to carry on production in all sectors. What they are most depending on is the power of Artificial Intelligence to enhance the manufacturing process and help save money and drive down costs.

Here are some examples of how AI is powering the manufacturing sector in 2020.

AI is being used to transform machinery maintenance and quality in manufacturing operations today, according to Capgemini.
Caterpillar’s Marine Division is using machine learning to analyze data on how often its shipping equipment should be cleaned, helping it save thousands of dollars.
The BMW Group is using AI to study images of manufacturing components and spot deviations from the standard production procedure in real time.

In fact, a study shows that in the four earlier global economic downturns, companies using AI were actually successful in increasing both sales and profit margins. Companies are all striving to utilize human experience, insights and AI techniques to give manufacturing a fillip in these times of crisis.

Manufacturing using AI in real-time

Real-time monitoring of the manufacturing process is advantageous because it helps in sorting out production bottlenecks, tracking scrap rates and meeting customer deadlines, among other things. The huge cache of data generated can be used to build machine learning models.

Supervised and unsupervised machine learning algorithms can study real-time data from multiple production shifts within seconds and detect process, product and workflow patterns that were not known before. A report suggests 29% of AI implementations in manufacturing are for maintaining machinery and production assets.

Detecting Outages

It was found that the most popular use of AI in manufacturing is predicting when equipment is likely to fail and suggesting optimal times to conduct maintenance. Companies like General Motors analyze images of their robots from cameras mounted above to spot anomalies and possible failures in the production line and thus preempt outages.

Optimizing Design

General Motors uses AI algorithms to produce optimized product designs. With the help of AI and ML algorithms, General Motors can achieve the goal of rapid prototyping. Designers provide definitions of the functional needs, raw materials, manufacturing methods and other constraints, and the company, along with Autodesk, has customized Dreamcatcher to optimize for weight and other vital criteria. In this way, AI comes together with human endeavor to produce first-class product designs that cost less.

Inconsistencies

Nokia has begun using a video application in one of its factories in Oulu, Finland, that uses machine learning to alert an assembly operator to inconsistencies in the production of electronic items. This helps preempt a poor production process and saves the company a lot of money.

There are many other production processes AI is helping revolutionize. Only time will tell how much of the manufacturing sector AI will power. But this technological advancement is surely making an impact on economies worldwide. Meanwhile, for more details, do peruse the DexLab Analytics website. DexLab Analytics is a premier machine learning institute in Gurgaon.

Stacking Regressor – Latest Releases of Scikit-Learn 0.22

Posted on May 18, 2020 (updated August 10, 2020) by Dexlab

Today we are going to learn about the new releases from Scikit-learn version 0.22, a machine learning library in Python. First we learn how to install it on our systems. Then, we come to the much talked about new release called stacking regression.

Now, how does stacking regression work? Well, you have been using machine learning algorithms like Decision Tree or Random Forest. Have you heard of the Voting Classifier? It is an ensemble algorithm in scikit-learn. An ensemble algorithm is a combination of two or more algorithms, put together to make a stronger model.

When working on a set of data, we apply all these algorithms to get predicted values. In the Voting Classifier, the final prediction is then decided by a vote among those predicted values. Stacking is different: what we do in it is stack together the predicted values to make a new input.

Initially, we make predictions using various algorithms separately. Their outputs are then concatenated. Then we use this combined output as a new input and apply a final algorithm to it to predict the target variable. This method is known as stacking regression.
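The method described above can be sketched with the StackingRegressor class in scikit-learn. The data set (the diabetes data bundled with scikit-learn), the two base estimators and the RidgeCV final estimator are our own assumptions for this illustration; the session's own data set is the one in the repository.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two base algorithms whose predictions will be stacked together.
estimators = [
    ("tree", DecisionTreeRegressor(max_depth=4, random_state=0)),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
]

# The final estimator takes the stacked predictions as its new input.
stack = StackingRegressor(estimators=estimators, final_estimator=RidgeCV())
stack.fit(X_train, y_train)

# transform() exposes the new input: one column of predictions per estimator.
stacked_input = stack.transform(X_test)
stacked_predictions = stack.predict(X_test)
print("shape of the stacked input:", stacked_input.shape)
print("R^2 on the test data:", stack.score(X_test, y_test))
```

The stacked input has one column per base estimator, which is exactly the concatenated output that the final estimator learns from.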

We try this out on a data set that can be taken from a GitHub repository, the link to which is given below.

Then we use two algorithms as estimators and use stacking regression to build a model. For more on this, do watch the video attached herewith. This tutorial was brought to you by DexLab Analytics. DexLab Analytics is a premier Machine Learning institute in Gurgaon.

5 Chatbots You Should Know About

Posted on May 8, 2020 by Dexlab

Chatbots or “conversational agents” are software applications that mimic or imitate written or spoken human speech for the purposes of facilitating a conversation or interaction with a human being.

With the advancement of machine learning technology and NLP, these applications have become some of the most ubiquitous software out there.

“Today’s chatbots are smarter, more responsive, and more useful – and we’re likely to see even more of them in the coming years… chatbots are used most commonly in the customer service space, assuming roles traditionally performed by living, breathing human beings such as Tier-1 support operatives and customer satisfaction reps.”

Conversational agents are becoming a common occurrence partly because barriers to entry in creating chatbots, such as the need for sophisticated programming knowledge, have largely disappeared.

How Chatbots work

The crux of chatbot technology is natural language processing or NLP, the same technology “that forms the basis of the voice recognition systems used by virtual assistants such as Google Now, Apple’s Siri, and Microsoft’s Cortana.” “Chatbots process the text presented to them by the user…infer what they mean and/or want, and determine a series of appropriate responses based on this information.”

Here are 5 companies using chatbots for various roles like marketing, communicating with marginalized groups and patients suffering from sleeplessness and memory loss.

Endurance

Russian technology company Endurance developed a companion chatbot to help dementia patients cope with decreased verbal ability. Many patients with Alzheimer’s disease use the chatbot as a conversation partner. In turn, the chatbot identifies deviations in the patient’s conversational patterns that might indicate a problem with memory and recollection.

Casper

Casper’s Insomnobot 3000 is a conversational agent that aims to help insomniacs by posing as a companion to talk to while the rest of the world sleeps. However, at this point, “Insomnobot 3000 is a little rudimentary.”

UNICEF

International child advocacy nonprofit UNICEF is using chatbots to help people living in developing countries speak out about the most urgent needs in their communities. The bot, named U-Report, focuses on large-scale data gathering via polls. UNICEF then uses feedback as the basis for potential policy recommendations.

MedWhat

This chatbot aims at making medical diagnoses faster, easier, and more transparent for both patients and physicians. MedWhat is powered by a highly sophisticated machine learning system that offers increasingly accurate responses to user questions based on behaviors that it “learns” by interacting with human beings. It also acts as a vast repository of medical journals and medical advice.

Roof Ai

Roof Ai is a chatbot that helps real-estate marketers to “automate interacting with potential leads and lead assignment via social media”. The bot identifies potential leads via social media and responds immediately, irrespective of the time of the day. “Based on user input, Roof Ai prompts potential leads to provide a little more information, before automatically assigning the lead to a sales agent.”

To learn more about machine learning powered technology, follow DexLab Analytics. DexLab Analytics is a premier institute for Machine Learning training in Gurgaon.

The Impact of latitude on The Spread of COVID-19 (Part-I)

Posted on April 29, 2020 (updated May 4, 2020) by Dexlab

The COVID-19 pandemic has hit us hard as a people and forced us to bow down to the vagaries of nature. As of April 29, 2020, the number of persons infected stands at 3,139,523 while the number of persons dead stands at 218,024 globally.

This essay is about detecting geographical variations in the mortality rate of the COVID-19 epidemic. It explores a specific range of latitudes along which a rapid spread of the infection has been detected with the help of data sets on Kaggle. The findings are Dexlab Analytics’ own. Dexlab Analytics is a premier institute that trains professionals in Python for data analysis.

For the code sheet and data used in this study, click below.

The instructor has imported all Python libraries and the visualisation of data hosted on Kaggle has been done through a heat map. The data is listed on the basis of country codes and their latitudes and there is a separate data set based on the figures from the USA alone.

Fig. 1.

The instructor has compared data from amongst the countries in one scenario and among states in the USA in another scenario. Data has been prepared and structured under these two heads.

Fig. 2.

The instructor has prepared the data according to the mortality rate of each country and it is updated to the very day of working on the data, i.e. the latest updated figures are presented in the study. When the instructor runs the program, a heat map is produced.

For more on this, do go through the half-hour-long video attached herewith. The rest of the essay will be featured in subsequent parts of this series of articles.

A Guide to Free Ebooks on Statistics and Machine Learning

Posted on April 24, 2020 by Dexlab

Machine Learning is a science that has to be taught and studied. For this, it is imperative to have the best books on the subject at hand. However, most books on the subject are expensive and not easily accessible. This is only fair given the amount of hard work that goes into writing these books.

In a situation this critical, it is best to rely on the good old Internet for assistance. There are some good Samaritans who have chosen to make their works freely available to all. Here is a great guide to free ebooks available online so you can brush up on your concepts and be industry ready at the earliest.

Think Stats – Probability and Statistics for Programmers by Allen B Downey

For the free ebook click here http://www.greenteapress.com/thinkstats.

This is an introduction to statistics and probability for those who have a basic grounding in Python programming. “It’s based on a Python library for probability distributions (PMFs and CDFs). To make things easier for the reader, most of the exercises have short programs,” says a report.

Bayesian Reasoning and Machine Learning by David Barber

For the free ebook click here http://web4.cs.ucl.ac.uk/staff/D.Barber/textbook/091117.pdf.

When it comes to Bayesian statistics, this book is a classic. “This takes a Bayesian statistics approach to machine learning.” This is a book worth checking out for anyone getting into the machine learning field and trying to make a career out of the subject.

An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

For the free ebook click here http://faculty.marshall.usc.edu/gareth-james.

This popular entry is an introduction to data science through machine learning. “This book gives clear guidance on how to implement statistical and machine learning methods for newcomers to this field. It’s filled with practical real-world examples of where and how algorithms work. For those with an inclination towards R programming, this book even has practical examples in R.”

Understanding Machine Learning by Shai Shalev-Shwartz and Shai Ben-David

For the free ebook click here https://www.cse.huji.ac.il/~shais/UnderstandingMachineLearning/index.html.

“This book gives a structured introduction to machine learning. It looks at the fundamental theories of machine learning and the mathematical derivations that transform these concepts into practical algorithms. Following that, it covers a list of ML algorithms, including…stochastic gradient descent, neural networks, and structured output learning.”

A Programmer’s Guide to Data Mining by Ron Zacharski

For the free ebook click here http://guidetodatamining.com.

This book has chapters covering recommendation systems. “It takes a…visually entertaining look at social filtering and item-based filtering methods and how to use machine learning to implement them. Other concepts like Naive Bayes and Clustering are also covered. There is a chapter on Unstructured Text and how to deal with it, in case you are thinking about getting into Natural Language Processing. Examples in Python are also available in case you want to practice.”

For more on Machine Learning, do peruse the DexLab Analytics website. DexLab Analytics is a premier institute offering Machine Learning courses in Delhi.
