In our previous blog we studied the basic concepts of Linear Regression and its assumptions; now let's try to understand practically how it works.
Given below is a dataset for which we will try to generate a linear function i.e.
y=b0+b1Xi
Where,
y= Dependent variable
Xi= Independent variable
b0 = Intercept (coefficient)
b1 = Slope (coefficient)
To find the beta coefficients (b0 and b1) we use the following formulas:
b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
b0 = ȳ − b1x̄
Let’s start the calculation stepwise.
First let's find the mean of x and y, and then find the difference between each observation and its mean, i.e. (x − x̄) and (y − ȳ).
Now calculate the values of (x − x̄)² and (y − ȳ)². The deviations are squared to remove the negative signs; otherwise the sum of each column would be 0.
Next we need to see how income and consumption vary together, i.e. (x − x̄)(y − ȳ).
Now all that is left is to substitute the values calculated above into the formulas to obtain b0 and b1.
Once we have the values of the beta coefficients, we can compute ŷ, the predicted value of the dependent variable.
We now need to find the difference between the predicted ŷ and the observed y, which is called the residual or the error term.
To remove the negative signs, let's square the residuals.
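The blog's worked table and calculations are shown as images, so here is a minimal sketch of the same step-wise calculation in Python; the income (x) and consumption (y) values below are hypothetical stand-ins for the original dataset.

```python
import numpy as np

# Hypothetical income (x) and consumption (y) data standing in for the blog's table
x = np.array([100, 120, 140, 160, 180], dtype=float)
y = np.array([70, 85, 95, 110, 120], dtype=float)

x_bar, y_bar = x.mean(), y.mean()      # means of x and y

dx, dy = x - x_bar, y - y_bar          # (x - x̄) and (y - ȳ)
dx_sq, dy_sq = dx**2, dy**2            # squared deviations
dxdy = dx * dy                         # joint variation (x - x̄)(y - ȳ)

b1 = dxdy.sum() / dx_sq.sum()          # slope
b0 = y_bar - b1 * x_bar                # intercept

y_hat = b0 + b1 * x                    # predicted values ŷ
residuals = y - y_hat                  # error term
print(b0, b1, (residuals**2).sum())    # coefficients and the sum of squared residuals
```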
What is R² and adjusted R²?
R², also known as the goodness of fit, compares the variation explained by the regression with the total variation in y: it equals one minus the ratio of the sum of squared differences between the observed y and the predicted ŷ to the sum of squared differences between the observed y and the mean ȳ. Adjusted R² additionally penalises explanatory variables that do not improve the model.
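Continuing with the hypothetical arrays from the sketch above, R² can be computed like this:

```python
ss_res = ((y - y_hat)**2).sum()        # residual sum of squares
ss_tot = ((y - y.mean())**2).sum()     # total sum of squares
r_squared = 1 - ss_res / ss_tot        # goodness of fit
print(r_squared)
```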
Hopefully, you have now understood how to solve a Linear Regression problem and will apply what you have learned in this blog. You can also follow the video tutorial attached below the blog. You can expect more such informative posts if you keep following the DexLab Analytics blog. DexLab Analytics provides Data Science certification courses in Gurgaon.
Today's blog explores another vital statistical concept, Linear Regression. Let's begin. Linear regression is commonly used in statistics for predictive modeling. It tries to model the relationship between an independent (explanatory) variable X and a dependent (explained) variable Y by fitting a linear equation (Y = b0 + b1X + Ui) to the observed data.
Assumptions of linear regression
Ui is a random real variable, where Ui is the difference between the observed dependent variable Y and predicted Y variable.
The mean of Ui in any particular period is zero.
The variance of Ui is constant in each period, i.e. for all values of X, Ui shows the same dispersion around its mean.
The variable Ui has a normal distribution, i.e. the values of Ui (for each Xi) have a bell-shaped symmetrical distribution about their zero mean.
The random terms of different observations are independent i.e the covariance of any Ui with any other Uj is equal to zero.
Ui is independent of the explanatory variable X.
Xi are a set of fixed values in the hypothesised process of repeated sampling which underlies the linear regression model.
In case there is more than one explanatory variable, they are not perfectly linearly correlated.
The Linear Regression equation can be written as:
Y = b0 + b1X + Ui
Where,
Y is the dependent variable
X is the independent variable.
b0 is the intercept (where the line crosses the vertical y-axis)
b1 is the slope
Ui is the error term (the difference between the observed Y and the predicted Y), also called the residual or white noise.
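As a quick illustration of the equation above, here is a minimal sketch that simulates data of the form Y = b0 + b1X + Ui and recovers the coefficients with NumPy; the true coefficient values and the sample size are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 200)            # fixed explanatory values
U = rng.normal(0, 1, 200)              # error term with zero mean and constant variance
Y = 2.0 + 0.5 * X + U                  # true b0 = 2.0, true b1 = 0.5

b1_hat, b0_hat = np.polyfit(X, Y, 1)   # degree-1 fit returns [slope, intercept]
print(b0_hat, b1_hat)                  # estimates close to 2.0 and 0.5
```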
Simple linear regression is estimated by Ordinary Least Squares (OLS), whose estimators have the following properties:-
Unbiased estimator:- E(b̂) = b, i.e. an estimator is unbiased if its bias is 0: E(b̂) − b = 0.
Minimum Variance:- An estimator is best when it has the smallest variance compared with any other estimator obtained from another econometric method.
Efficient estimator:- An estimator is efficient when it has both of the previous properties, i.e. it is unbiased and has minimum variance.
Linear estimator:- The estimator is a linear function of the sample observations.
Best, Linear, Unbiased estimator (BLUE):- An estimator that is linear, unbiased and has the smallest variance among all linear unbiased estimators.
Minimum mean squared error (MSE) estimator:- It is a combination of the unbiasedness and minimum variance properties. An estimator is a minimum MSE estimator if it has the smallest mean square error.
With that the discussion on Linear Regression wraps up here, hopefully it cleared away any confusion you might have and helped you get a grasp on the concept. We have a video discussion on this same topic, which is attached below this blog, check it out for further reference.
Internet of Things or IoT devices are all the rage now; staying connected to the internet, these devices procure and exchange data using the sensors embedded in them. The data being generated in copious amounts needs to be processed, and in comes IoT analytics. This platform is basically concerned with analyzing the large amounts of data generated by these devices. The interconnectivity of devices is helping different sectors stay in sync with the world, and the timely extraction of data is of utmost significance now as it delivers actionable insights. This is a highly skilled job responsibility that can only be handled by professionals who have done an artificial intelligence course in delhi.
This particular domain is in its nascent stage and still growing; however, it is needless to point out that IoT analytics holds the clue to business success, as it not only enables organizations to extract information from heterogeneous data but also helps in data integration. With IoT devices generating almost 5 quintillion bytes of data, it is high time organizations started investing in developing an IoT analytics platform and building a data expert team comprising individuals with a background in Machine Learning Using Python. Now let's have a look at the ways IoT analytics can boost business growth.
Optimized automated work environment
IoT analytics can optimize the automated work environment. Manufacturing companies, in particular, can keep track of procedures without involving human employees, thereby lessening the chances of error and enhancing the accuracy of predicting machine failure, with sensors monitoring the equipment, tracing every single issue in real time and sending alerts to make way for predictive maintenance. As a result, the production flow goes on smoothly without developing any glitch.
Increasing productivity
In an organization, gauging the activity of employees assumes huge significance as it directly impacts the productivity of the company. With sensors strategically placed to monitor employee activity, performance, moods and other data points, this job gets easier. The data later gets analyzed to give the management valuable clues that enable them to make necessary modifications in policies.
Bettering customer experience
Regardless of the nature of your business, you would want to make sure that your customers derive the utmost satisfaction. With IoT data analytics in place you are able to trace their preferences, thanks to the data streaming from devices where they have already left a digital footprint of their shopping and searching patterns. This in turn enables you to offer tailor-made services or products. Monitoring customer behavior could also lead to devising information-based marketing strategies.
Staying ahead by predicting trends
One of the crucial aspects of IoT analytics is its ability to predict future trends. As smart sensors keep tracking data on customer behavior and product performance, it becomes easier for businesses to analyze future demand and anticipate how current trends will give way to emerging ones, enabling businesses to be ready. Having access to such estimates prepares not just businesses but entire industries to be future-ready.
Smarter resource management
Efficient utilization of resources is crucial to any business, and IoT analytics can help in a big way by making predictions on the basis of real-time data. It allows companies to assess their current resource allocation plan and make adjustments for optimal usage of the available resources, channeling them in the right direction. It also aids in disaster planning.
Ever since we went digital, the streaming of large quantities of data has become a reality, and this is going to continue in the coming decades. Since most of the data generated this way is unstructured, there need to be cutting-edge platforms like IoT analytics available to manage and process the data so that industries can make informed decisions. Accessing Data Science training would help individuals planning on making a career in this field.
In this blog we will be introducing you to the Gaussian Distribution, or Normal Distribution. Knowledge of the distribution of your data is quite important as it tells you the trend your data follows, and continuous observation of that trend helps you predict future observations more accurately.
One of the most important distributions in statistics is the Gaussian Distribution, also known as the Normal Distribution, in which the mean, median and mode of the data are equal or almost equal. The idea behind this is that the data you collect should not have a very high standard deviation.
How to generate normally distributed data in Python
First we will import all the necessary libraries
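The blog's import cell is shown as an image; a typical set of imports for this walkthrough would be:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
```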
Now we will use the .normal() method from the Numpy library to generate the data, where 50 is the mean, .1 is the standard deviation and 500 is the number of observations to be generated.
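A sketch of that data-generation step:

```python
# mean = 50, standard deviation = .1, number of observations = 500
data = np.random.normal(50, .1, 500)
```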
To plot the data and have a look at the data distribution we will be using .distplot() method from the Seaborn library and to make our plot visually better we will be using .set_style() method to change the background of our graph.
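A sketch of the plotting step; the label and title text below are placeholders, and note that newer Seaborn releases replace .distplot() with .histplot()/.displot().

```python
sns.set_style("whitegrid")                           # change the background of the graph
sns.distplot(data)                                   # plot the distribution of the data
plt.xlabel("Values", fontsize=14)                    # axis labels via Matplotlib
plt.ylabel("Density", fontsize=14)
plt.title("Normally distributed data", fontsize=16)  # graph title
plt.show()
```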
In the above lines of code we are also using the Matplotlib library to add axis labels and a title to the graph. We are also using the fontsize argument to adjust the size of the font.
The above graph is a bell-shaped curve with its peak at the center of the graph. This reflects one of the most important assumptions of the Gaussian distribution, that the curve is symmetric about the center; some of the other assumptions are:-
Assumptions of the Gaussian distribution:
The mean, median and mode are equal.
Exactly half of the values are to the left of the center and exactly half of the values are to the right.
The total area under the curve is 1.
It has a continuous probability distribution.
68% of the data lies between −1 and +1 standard deviations from the mean.
95% of the data lies between −2 and +2 standard deviations from the mean.
99.7% of the data lies between −3 and +3 standard deviations from the mean.
The last three assumptions can be illustrated with the help of the standard normal distribution.
What is standard normal distribution?
The standard normal distribution, also known as the Z-score distribution, is a special case of the normal distribution in which the normally distributed data is converted into deviations from the mean measured in units of standard deviation. The mean of such a distribution is 0 and its standard deviation is 1.
Let’s see how we can achieve the standard normal distribution in Python.
We will be using the same normally distributed data as above.
First we will be calculating the mean and standard deviation of the data we created with the help of the above code by using .mean() and .std() method.
Now to calculate the Z-score we will first make an empty list and then append the calculated values one by one in that list with the help of a for-loop.
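A sketch of the calculation described above, reusing the data array generated earlier:

```python
mean = data.mean()                     # mean of the generated data
std = data.std()                       # standard deviation of the generated data

z_score = []                           # empty list to collect the standardized values
for i in data:
    z_score.append((i - mean) / std)   # subtract the mean, then divide by the standard deviation
```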
As you can see in the above code we are first subtracting the value from the mean and then dividing it by the standard deviation.
Now let's see what the calculated data looks like visually.
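A minimal plotting sketch for the standardized values:

```python
sns.distplot(z_score)                  # distribution of the Z-scores
plt.xlabel("Z-score", fontsize=14)
plt.show()
```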
When we look at the above graph we can clearly see that the data lies at most 3 standard deviations away from the mean.
For further explanation check out the video attached down the blog.
So, with this we come to the end of today’s discussion on Gaussian distribution, hopefully, you found this explanation helpful. For such informative posts keep an eye on the Dexlab Analytics blog. Dexlab Analytics is a premier institute that offers cutting edge courses such as credit risk analysis course in Delhi.
When we take a look at a video or a bunch of images, we know what's what just by taking one look; it is an innate ability that developed gradually in us. Well, sophisticated technologies such as object detection can do that too. It might sound futuristic, but it is happening now, in reality. Object detection is a technique of the AI subset computer vision that is concerned with identifying objects and defining them by placing them into distinct categories such as humans, cars, animals etc.
It combines machine learning and deep learning to enable machines to identify different objects. The terms image recognition and object detection are often used interchangeably, but the two techniques are different: object detection can detect multiple objects in an image or a video. The demand for trained experts in this field is pretty high, and having a background in deep learning for computer vision with python can help one build a dream career.
Object detection has found applications across industries. Let’s take a look at some of these applications.
Tracking objects
It is needless to point out that in the field of security and surveillance, object detection will play an even more important role. With object tracking it becomes easier to track a person in a video. Object tracking could also be used to track the motion of a ball during a match. It plays a crucial role in traffic monitoring as well.
Counting the crowd
Crowd counting or people counting is another significant application of object detection. During a big festival or in a crowded mall this application comes in handy as it helps in dissecting the crowd and measuring different groups.
Self-driving cars
Another unique application of the object detection technique is definitely self-driving cars. A self-driving car can only navigate through a street safely if it can detect all the objects on the road, such as people, other cars and road signs, in order to decide what action to take.
Detecting a vehicle
On a road full of speeding vehicles, object detection can help in a big way by tracking a particular vehicle and even its number plate. So, if a car gets into an accident or breaks traffic rules, it is easier to identify that particular car using an object detection model, thereby decreasing the rate of crime while enhancing security.
Detecting anomaly
Another useful application of object detection is definitely spotting an anomaly and it has industry specific usages. For instance, in the field of agriculture object detection helps in identifying infected crops and thereby helps the farmers take measures accordingly. It could also help identify skin problems in healthcare. In the manufacturing industry the object detection technique can help in detecting problematic parts really fast and thereby allow the company to take the right step.
Object detection technology has the potential to transform our world in multiple ways. However, the models still need to be developed further so that these can be applied across devices and platforms in real-time to offer cutting-edge solutions. Pursuing a Python Certification course can help develop the required skills needed for making a career in the field of machine learning.
This is the 3rd part of the ongoing Regex or, regular expression in Python series where we are discussing how to handle textual data. In the second part we introduced you to the re library and in this third segment, we are going to be discussing how to substitute characters or, words with re library.
Re library has a wide range of methods to deal with textual data, one such method is .sub() which helps us substitute alphabets or words based on the patterns we build. This method can be used with .match() method and .search() method, both having differences in the way they extract a pattern.
Difference between .match() & .search()
.match() :- This method extracts the required text only at the beginning.
.search() :- This can extract the required string from anywhere in the text, but only at its first occurrence.
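Since the blog's code cells are shown as images, here is an illustrative example of the difference; the text string below is our own, not the one from the original screenshot.

```python
import re

text = "Hi, Hello dexlab...!"          # illustrative text in which "Hello" is not at the beginning

pattern = re.compile("Hello")
print(pattern.match(text))             # None, because .match() only looks at the start of the text
print(pattern.search(text))            # <re.Match object; span=(4, 9), match='Hello'>
```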
In the above code you can see that even though the word "Hello" is in the middle of the text, we are able to fetch it because of this special attribute of the .search() method. Here we are again using "Hello dexlab…!" as part of the example, and .compile() is being used to create and apply our pattern.
Now suppose we want to substitute the word "Hello" with another string, "Hi". For that we will have to use the .sub() method from the re library, and there are two ways to use this method, directly or indirectly.
First let’s see the direct method.
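An illustrative version of the direct method, using the same sample text:

```python
print(re.sub("Hello", "Hi", "Hello dexlab...!"))   # 'Hi dexlab...!'
```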
In the above line of code we first mention what we want to substitute and with what and then we add the text.
The second way to do this is to first use the .compile() method to build the pattern and then use that pattern to substitute the letter or word.
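The exact pattern from the original screenshot is not reproduced here, so the pattern below is an assumed equivalent that matches the description in the next line.

```python
pattern = re.compile("[A-Z][a-z]+")                # one uppercase letter followed by lowercase letters
print(pattern.sub("Hi", "Hello dexlab...!"))       # 'Hi dexlab...!'
```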
The above pattern in the .compile() method states that a word whose first letter is uppercase, followed by lowercase letters, is to be substituted with the word "Hi". This pattern can match any string with the same characteristics, for example:-
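Again an illustrative string:

```python
print(pattern.sub("Hi", "Pello dexlab...!"))       # 'Hi dexlab...!' - "Pello" fits the same pattern
```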
Look at the text used in the .sub() method: now, instead of "Hello", we have "Pello", which has the same characteristics and is likewise substituted with the word "Hi". But one must not forget that this pattern can also be used in the .sub() method directly; the use of the .compile() method is optional and serves only to create object-based code.
So, this wraps up the discussion on how to substitute characters or, words with re library. Hopefully, you found this blog informative, if you wish to find more Python Certification course topics, keep following the Dexlab Analytics blog.
This is the second part of the Regex series, where we will be continuing on by introducing you to another library in python, the re library. The previous blog introduced you to the Regex library in Python, and you learned about meta-characters and literals which could be used for creating patterns. This particular segment is about introducing the re library to help you with textual data analysis.
Re library in python holds the key to deal with all the problems relating to textual data analysis. This library provides a range of methods that can help you build patterns and extract or substitute the desired string. For example, suppose you want to change all the negative words to positive in a novel, for that all you need to have is a soft copy of the same and then you can import the re library and use its predefined methods first to make a pattern to extract the words and then substitute it to make the required changes.
Here one such method which we are going to use today from the re library is .compile() combined with .match() method to build and extract the pattern with the help of literals and meta-characters explained in the previous blog.
.compile() and .match() Methods
.compile() is used to build the pattern. You can use the meta-characters and literals within the parentheses to build the pattern of the word which you want to extract or change. This is practically the first step without which not much can be done in the re library. But why do we need a pattern and what makes it necessary? To answer this question I am going to use a few steps:-
The first thing to do is to observe the word or the string and see what makes the specific word or words you want to extract different from the rest of the text, i.e. is the word a combination of digits, letters or special characters, is there a special character before and after the word, etc.
Now recall all the meta-characters we have studied so far.
Combine them with the .compile() method, which basically helps us bring the meta-characters together.
Now all there is left for us to do is apply the pattern on the text.
Therefore knowledge of meta-characters is necessary to form patterns to manipulate your textual data.
Now let’s see how to import re library and practically solve few of the Regex questions.
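The import is a single line:

```python
import re
```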
Use the above code to import the Regex library.
Question 1: How to make a pattern to extract the entire string “Hello dexlab…!!”?
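The original screenshot is not reproduced here, but a pattern consistent with the explanation that follows would be:

```python
pattern = re.compile(".*")                         # . is any character, * repeats it zero or more times
result = pattern.match("Hello dexlab...!!")
print(result)                                      # the whole string is matched
```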
In the above code we are using a combination of . (period) and *, where
. (period) means match an alpha-numeric or special character
* means match the preceding character zero or more times
When combined together, this means that we can match alpha-numeric or special characters zero or more times. Therefore the end result is that the entire string is matched.
The .match() method has a special property: it can match a pattern only at the beginning of the line. Suppose I want to extract "Hello" from a string; the .match() method will only work when the word is at the 0 index.
Question: How to extract and match only special characters?
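An illustrative attempt, using a made-up string that begins with special characters:

```python
text = "  @@ dexlab"                   # made-up text beginning with special characters
print(re.compile(r"\W").match(text))   # matches only the single space at index 0
```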
The above code matches only the space at the 0th index, but not all of the consecutive special characters. To make a pattern that can match all the special characters we can use *:
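Continuing with the same made-up string:

```python
print(re.compile(r"\W*").match(text))  # now all the leading special characters '  @@ ' are matched
```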
You can try ? and + to check the difference it makes on the output.
Question: How to extract numbers and special characters?
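Again with a made-up string, this time numbers followed by special characters:

```python
text = "123@ dexlab"                   # made-up text: numbers followed by special characters
print(re.compile(r"\d*").match(text))  # matches only '123'
```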
Now you must be wondering why the above code did not recognize the special characters after the numbers, like @ and the space. Here you must remember that the output you get is based on the pattern you make, and in the above code we mentioned nothing that matches anything after the numbers. So we further need to expand our pattern.
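One way to expand it (again an illustrative pattern):

```python
print(re.compile(r"\d*\W*").match(text))   # matches '123@ ' - the digits plus the special characters after them
```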
Question: How to extract only the output?
You can use the slice operator [] on the match object, with index 0, to extract only the matched text.
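For example, continuing from the previous match:

```python
result = re.compile(r"\d*\W*").match(text)
print(result[0])                       # 123@  <- just the matched text, not the whole match object
```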
You must have picked up the fundamentals of the re library from the blog, watch the video attached below to follow the tutorial step by step. Follow the Regex series to gain expertise in textual data handling. Dexlab Analytics blog has more interesting and informative posts on Python Programming training.
Python is a versatile programming language and it has a rich library. In the visualization series we introduced you to different libraries used for data visualization purposes. Now, we introduce you to the Regex library in Python for handling textual data.
In Python, Regex is a library for performing pattern recognition on textual data; it provides a range of methods which, when used with the right pattern, give us the desired results. For example, if you want to change the spelling of colour to color in your text, you can easily do so with the help of a given method, provided that you form the pattern correctly.
Types of textual data in Regex
Literals:- In Python, literals are characters or words with their original meaning intact; for example, the word dog means a literal dog and there is no hidden meaning behind that word.
Meta-characters:- These are words or characters which hold a special meaning; for example, \n means a new line and \t means tab-separated values.
Given below are a few of the meta-characters used in Python, with their meanings:-
\w – Matches alpha-numeric characters i.e. \w= 1, \w= a, \w\w= a1
\W– Matches special characters i.e. \W= %
Dog[ogn] – Matches a single character from within the square brackets i.e. Dogo, Dogg, Dogn
Dog(ogn) – Matches the entire string within the parenthesis i.e. Dogogn
Dog(ogn|aaa) – Matches either ogn or aaa i.e. Dogogn or Dogaaa
* – Matches the preceding character 0 or more times i.e. tre* = tree, tre* = tr, tre* = treeeeee
? – Matches the preceding character 0 or 1 time i.e. colou?r = color, colou?r = colour
+ – Matches the preceding character 1 or more times i.e. tre+ = tree, tre+ = treee, tre+ ≠ tr
. – Matches any single alpha-numeric or special character i.e. tre. = tree, tre. = tre#, tre. = tre1, tre. ≠ tre#1
The above meta-characters, alone or in combination, are used to form a pattern which is then used for text mining. For example, tre.* means match tre followed by anything zero or more times, so it can match tre#1 or just tre.
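Here is an illustrative snippet (our own examples, not from the blog) showing a few of these meta-characters in action with the re library, which the next blog in this series covers in detail.

```python
import re

print(re.findall(r"colou?r", "color colour"))      # ['color', 'colour'] - ? makes the u optional
print(re.findall(r"tre*", "tr tre tree"))          # ['tr', 'tre', 'tree'] - * allows zero or more e's
print(re.findall(r"tre+", "tr tre tree"))          # ['tre', 'tree'] - + needs at least one e
print(re.findall(r"Dog[ogn]", "Dogo Dogg Dogn"))   # ['Dogo', 'Dogg', 'Dogn']
print(re.findall(r"tre.", "tree tre# tre1"))       # ['tree', 'tre#', 'tre1'] - . matches any one character
```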
Watch the video tutorial attached below to learn more about the fundamentals of this library.
Hopefully you found the discussion on Regex library helpful and at the end of it you must have become familiar with the way this particular library works. To learn more about python for data analysis, keep on exploring Dexlab Analytics blog, where you will always find informative posts.
In our fifth installment of the visualization series using Python programming language, we introduce you to another powerful library in Python that is the Pandas_bokeh library. So, let’s find out what you can achieve with Pandas_bokeh library.
Pandas_bokeh is a library which can help you create interactive graphs in Python. One can zoom in, zoom out, select a certain portion of the graph to see, move the plot left, right and center, create tabs to see a single plot at a time, create multiple plots at a time, and create widgets like dropdown lists, check boxes, radio buttons, sliders etc. It is similar to the Shiny app used in the R programming language, but simpler and faster.
How to install pandas_bokeh?
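The blog's install cell is shown as an image; it would look something like this (the package is published on PyPI as pandas-bokeh):

```python
!pip install pandas-bokeh
```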
In the above code we are turning our Jupyter notebook code cell into a command line by using !, and then we can use pip (the Python package installer) to install the library.
How to create a simple line plot using pandas_bokeh ?
The first thing to do is import the libraries which we will be using to create a line plot.
We will be creating our own dataset here, and for that we need to import the Numpy and Pandas libraries. We will also be importing the .figure() method from Bokeh's plotting module to create the canvas on which we will build our graph from scratch, and the .output_notebook() method to visualize our graph in the Jupyter notebook; to visualize the graph in a new tab and save it at the same time we can use the .output_file() method.
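A sketch of the import cell described above; the output helpers and show are imported here from bokeh.io (bokeh.plotting also re-exports them in older releases).

```python
import numpy as np
import pandas as pd

from bokeh.plotting import figure
from bokeh.io import show, output_notebook, output_file

output_notebook()                      # render Bokeh plots inside the Jupyter notebook
```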
The Dataset we are creating here will have three columns ‘Days’, ‘Sales’ and ‘Date’.
We will be creating a dataset with a hundred observations in each column, so we use the .rand() method to generate a hundred random numbers, and that will be our 'Sales' column data. For our 'Days' column we create a loop which runs a hundred times, each time adding a day name to a variable c that starts as an empty string; the .split() method is then used to turn that string into a list.
For the 'Date' column we generate a hundred consecutive dates, and at last we assemble the data frame using the .DataFrame() method:
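The blog's code cells are shown as images, so here is an approximate reconstruction of the dataset-creation steps; the day names, start date and random seed below are our own choices.

```python
np.random.seed(0)

sales = np.random.rand(100)                       # 'Sales': one hundred random numbers

day_names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
c = ""                                            # empty string that collects one day name per loop pass
for i in range(100):
    c = c + day_names[i % 7] + " "
days = c.split()                                  # .split() turns the long string into a list of 100 day names

dates = pd.date_range(start="2020-01-01", periods=100)   # 'Date': one hundred consecutive dates

df = pd.DataFrame({"Days": days, "Sales": sales, "Date": dates})
df.head()
```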
Now to create two line graphs on a single canvas we will be using object-oriented programming.
To build the graph on the jupyter notebook we are using .output_notebook() method and in case you want to plot the graph on a new tab you can use .output_file(“filename.html”) method.
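An approximate reconstruction of the plotting code described below; the colours, legend labels and title text are our own choices.

```python
# Two separate data frames: Monday's and Friday's sales
df_d = df[df["Days"] == "Monday"]
df_d1 = df[df["Days"] == "Friday"]

# Build the canvas (plot_width/plot_height were renamed width/height in Bokeh 3.x)
p = figure(x_axis_type="datetime",
           x_axis_label="Date", y_axis_label="Sales",
           plot_width=800, plot_height=400,
           title="Sales on Monday and Friday", title_location="above")

# Add one line per data frame
p.line(df_d["Date"], df_d["Sales"], color="blue", legend_label="Monday")
p.line(df_d1["Date"], df_d1["Sales"], color="red", legend_label="Friday")

show(p)
```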
In the above lines of code we create two separate data frames, df_d and df_d1, containing Monday's and Friday's sales and dates respectively. Now all we need to do is build a canvas using the .figure() method with a few other arguments: x_axis_type to define the data type of the x axis, x_axis_label and y_axis_label to set the graph labels, plot_width and plot_height to adjust the width and height of the canvas, and title and title_location to set the title and its position. Once we have our canvas ready we can use the .line() method and add the x-axis and y-axis data to plot our graphs.
To interact with your graph you can use the icons on the right hand side corner which will help you zoom in and out, look at a certain part of the graph, scroll to zoom in and out, save your plot and reset the changes made by you using the side icons.
The video tutorial attached below will help you gain better understanding.
At the end of this segment you must have become familiar with the nuances of the Pandas_bokeh library. As you continue on with the series, you will realize that you are becoming an expert in visualization. On Dexlab Analytics blog, you will find interesting blogs on various topics related to Python certification training.