 Python certification Archives - DexLab Analytics | Big Data Hadoop SAS R Analytics Predictive Modeling & Excel VBA

## Statistical Application in R & Python: Normal Probability Distribution Gauss, the famous French Mathematician is responsible for developing one of the most significant distributions in all of statistics, i.e. – The Normal Distribution. Please refer to the blog on Central Limit Theorem: www.dexlabanalytics.com/blog/the-almighty-central-limit-theorem. It will help you fully grasp the significance of the Normal Distribution. However, if you want to revisit our series of blogs by following it from the start, you can reach STATISTICAL APPLICATION IN R & PYTHON: CHAPTER 1 – MEASURE OF CENTRAL TENDENCY right now!

Essentially, the Normal Distribution provides “approximations” to most other distributions such as the Binomial, Poisson, Gamma, Exponential, etc. This is to say as sample sizes get statistically large enough, most distributions approximate into a normal shaped curve.

Every distribution has important features known as its “parameters”. Normal distribution has two parameters. These are Mean ( ) and Variance (σ²). The normal distribution has a bell-shaped curve, where the probability of likelihood peaks at its mean in the middle.

The Normal Distribution has vast practical applications in the field of Business, Finance, Medicine, and Physics and so on. Things like weights, heights, IQ scores follow the Normal Distribution.

Normal Distribution, Gaussian distribution, is a continuous probability distribution and is defined by the Probability Density Function (PDF). Where,  #### Application:

Assume that the credit score fits a Normal Distribution.

Suppose Mr. Arjun’s last 10 month’s credit score are:

789, 635, 739, 687, 724, 810, 817, 735, 819, 820

What is the probability that the percentage of credit score will 825 or more in the 11th month?

 Months Credit Score January 789 February 635 March 739 April 687 May 724 June 810 July 817 August 735 September 819 October 820 #### Calculating Normal Distribution in R: If we go to calculate Normal Probability Distribution in R, we can predict that the probability of the 11th month credit score will be 825 or greater than that is 14.60%, whereas in another case, the probability of the 11th month credit score will be 825 or less than that is 85.40%.

#### Calculate Normal Distribution in Python:

Make a data frame of the data and calculate Mean and Standard Deviation for calculate Normal Distribution. Now, we can easily calculate Normal Distribution in Python So, in calculating the Normal Probability Distribution in Python, we can predict that the probability of the 11th month credit score will be 825 or greater than that is 14.60%, whereas in another case, the probability of the 11th month credit score will be 825 or less than that is 85.40%.

#### Conclusion:

Normal Distribution is used for calculating parameters. It is represented by the bell curve, where the total area of the curve is 1. Normal Distribution has its use in Finance, Business, Salaries, Blood Pressures, Measurement etc and many other fields.

Here, we have used Normal Distribution to predict Mr. Arjun’s 11th month credit score, and set the target (825). By Normal Distribution we can predict the percentage of possibility to achieve the target.

Calculating Binomial Distribution might be tricky for many but with Dexlab Analytics it won’t be hassle anymore. So, get hold of our STATISTICAL APPLICATION IN R AND PYTHON: CALCULATING BINOMIAL DISTRIBUTION blog, to get around all your problems.

## Statistical Application in R and Python: Calculating Binomial Distribution In this blog, we will take a look at the Binomial distribution. This blog is among the series of blogs through which you’ll have a vivid idea of the Statistical Application using R and Python. Statistical Application In R & Python: Chapter 1 – Measure Of Central Tendency is the first of such blogs.

The binomial distribution is an extension of the Bernoulli distribution. In Bernoulli, we have only one parameter, i.e. the probability of success.

Now, consider a case where we have “n” number of trials and we want to predict the probability of success from it. This is the Binomial case.

Binomial distribution has two parameters, i.e.: number of trails (n) AND probability of success (p). The mean of the binomial is a product of its two parameters, i.e. n multiplied by p. It is a discrete probability distribution. Here, each trial is assumed to have only two outcomes, either success or failure.

If X be a discrete random variable (taking only non-negative values), it is said to be following binomial distributions with a probability mass function as:-  #### Application:

A food shop starts a offer for a festive season, They have 12 different baskets, each basket has 5 combos and only 1 of them is non-veg. Find the probability of having 4 or less non-veg combos, if a consumer tries every combos at random.

Since, only 1 out of 5 combos is non-veg, the probability of choose a non-veg combos by random is 1/5 = 0.2 #### Calculate Binomial Distribution in R: In R the probability of one non-veg combos choose by random in 5 is 13.28%, whereas the probability of four or less combos choose by random in a twelve baskets is 92.44%

#### Calculate Binomial Distribution in Python: In Python the probability of one non-veg combos choose by random in 5 is 16.66%.

#### Conclusion:-

Binomial Distribution is the process by which we can calculate the probability of success from “n” number of trails. In Binomial Distribution we can find only two outcomes like “Yes” or “No”.

Dexlab Analytics is a pioneering institute of Data Science, with peerless trainers to help you ease your journey with Python Certification, R Programming Certification and Big Data Certification along with numerous other advanced and/or career oriented courses in Computer Science.

## How to Parse Data with Python Before we begin with our Python tutorial on how to parse data with Python, we would like you to download this machine learning data file, and then get set to learn how to parse data.

The data set we have provided in the above link, mimics exactly the way the data was when we visited the web pages at that point of time, but the interesting thing about this is we need not visit the page even. We actually have the full HTML source code, so it is just like parsing the website without the annoying bandwidth use.

Now, the first thing to do when we start is to correspond the date to our data, and then we will pull the actual data.

Here is how we start:

```import pandas as pd
import os
import time
from datetime import datetime

path = "X:/Backups/intraQuarter"```

Looking for a Machine Learning course online? We have Big Data courses that will bring big dreams to reality.

As given above, we are importing the Pandas for the Pandas module, OS, that is so we can interact with the directories, date and time for managing the date and time information.

Furthermore, we will finally define the path, which is the path to the intraQuarter folder than one will need to unzip the original zip file, which you just downloaded from the website.

```def Key_Stats(gather="Total Debt/Equity (mrq)"):
statspath = path+'/_KeyStats'
stock_list = [x for x in os.walk(statspath)]
#print(stock_list)```

We began our functions, with the specification that we are going to try to collect all the Debt/equity values.

The path to the stats directory is Statspath.

To list all the contents in the directory, you can use stock_list which is a fast one-liner for the loop that uses os.walk.

Then the next step is to do this:

```    for each_dir in stock_list[1:]:
each_file = os.listdir(each_dir)
if len(each_file) > 0:```

Mentioned above is a cycling through of directory (which is every stock ticker). Then the next step is to list “each_file”, which is each file within that very stock’s directory. If in case the length of each_file which is in fact is a list of all of the files in the stock’s directory, is greater than 0 only then will we want to proceed. However, there are some stocks with no files or data:

```            for file in each_file:

date_stamp = datetime.strptime(file, '%Y%m%d%H%M%S.html')
unix_time = time.mktime(date_stamp.timetuple())
print(date_stamp, unix_time)
#time.sleep(15)

Key_Stats()```

Finally, at the end, we must run a loop that pulls the date_stamp, from each file. All our files are actually stored under their ticket, with a file name for the exact date and time from which the information is being taken out.

It is from there that we will explain to date-time what the format for our date stamp is, and then we will convert it to a Unix time stamp.

To know more about data parsing or anything else in python, learn Machine Learning Using Python with the experts at DexLab Analytics.

This post originally appeared onpythonprogramming.net/parsing-data-website-machine-learning

.

+91 931 572 5902