Software tools : SAS, R, Python etc Archives - Page 2 of 5 - DexLab Analytics | Big Data Hadoop SAS R Analytics Predictive Modeling & Excel VBA

How to Structure Python Programs? An Extensive Guide

Python is an extremely readable and versatile high-level programming language. It supports both object-oriented and functional programming. It is generally described as an interpreted language: statements are executed one at a time, and when the interpreter encounters an error it stops and reports an error message to the user. These qualities have made Python widely used in areas such as machine learning, text mining, and data analysis.

In the Structure of Python

Computer languages have a structure just like human languages. Therefore, even in Python, we have comments, variables, literals, operators, delimiters, and keywords.

To understand the program structure of Python, we will look at the following in this article:

  1. Python Statement
    • Simple Statement
    • Compound Statement
  2. Multiple Statements Per Line
  3. Line Continuation
    • Implicit Line Continuation
    • Explicit Line Continuation
  4. Comments
  5. Whitespace
  6. Indentation
  7. Conclusion

Python Statement

A statement in Python is a logical instruction that the interpreter reads and executes. The interpreter executes statements sequentially, one by one. A statement could be, for example, an assignment or an expression. Statements are usually written so that each one occupies a single line.

Simple Statements

A simple statement is one that contains no other statements. Therefore, it lies entirely within a logical line. An assignment is a simple statement that assigns values to variables. Unlike in some other languages, a plain assignment with = in Python is a statement and cannot be part of an expression (Python 3.8 added a separate := operator for assignment expressions).
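As a minimal sketch of the idea above, the lines below are all simple statements; the variable names are purely illustrative and not taken from the article.

```python
# Each line below is one simple statement on one logical line.
count = 10              # assignment statement
count += 5              # augmented assignment statement
total = count * 2       # assignment whose right-hand side is an expression
print(total)            # expression statement (a function call)
```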

Compound Statement

A compound statement contains one or more other statements and controls their execution. A compound statement has one or more clauses, aligned at the same indentation. Each clause has a header starting with a keyword and ending with a colon (:), followed by a body, which is a sequence of one or more statements. The body is also known as a block. When it contains multiple statements, they should be placed on separate logical lines after the header line, indented four spaces to the right.
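The header-and-body layout described above might look like the following sketch. The function name classify and the sample values are assumptions made for illustration.

```python
def classify(n):
    # "if", "elif" and "else" are clauses of a single compound statement;
    # each header ends with a colon and is followed by an indented body.
    if n < 0:
        label = "negative"
    elif n == 0:
        label = "zero"
    else:
        label = "positive"
    return label

for value in (-2, 0, 7):      # "for" is another compound statement
    print(value, classify(value))
```

Note that def is itself a compound statement: its header ends with a colon and its body is the indented block that follows.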

Multiple Statements per Line

Although it is not considered good practice, multiple statements can be written on a single line in Python, and it is advisable to avoid doing so. If it is necessary, the statements can be written on one line with a semicolon (;) as the separator between them.
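A short sketch of both styles, using made-up variable names, shows why the one-statement-per-line form is usually preferred:

```python
# Three simple statements on one line, separated by semicolons.
x = 1; y = 2; z = x + y
print(z)

# The same code, one statement per line, is usually easier to read.
x = 1
y = 2
z = x + y
print(z)
```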

Line Continuation

In Python there may be cases where a single statement is so long that it does not fit in the editor or browser window and one needs to scroll left or right to read it. This can happen with an assignment statement that has many terms or with the definition of a lengthy nested list. Such long lines of code are generally considered poor practice.

To maintain readability, it is advisable to split the long statement into parts across several lines. In Python code, a statement can be continued from one line to the next in two different ways: implicit and explicit line continuation.

Implicit Line Continuation

This is the more straightforward technique for line continuation. In implicit line continuation, one can split a statement across lines inside parentheses ( ), brackets [ ], or braces { }. The part of the statement to be split needs to be enclosed in one of these constructs.
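The three enclosing constructs mentioned above can be sketched as follows; the names total, matrix and point are illustrative only.

```python
# Parentheses let an expression span several lines.
total = (1 + 2 + 3 +
         4 + 5 + 6)

# Brackets do the same for a nested list ...
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]

# ... and braces for a dictionary.
point = {
    "x": 10,
    "y": 20,
}

print(total, matrix[1][2], point["y"])
```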

Explicit Line Continuation

In cases where implicit line continuation is not readily available or practicable, there is another option. This is referred to as explicit line continuation or explicit line joining. Here, one places the line continuation character (\) at the end of a line to continue the statement on the next line.
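A minimal sketch of explicit line joining, with illustrative variable names. The backslash must be the last character on the line, with nothing after it, not even a comment.

```python
# A backslash at the very end of a line joins it with the next line.
total = 1 + 2 + 3 + \
        4 + 5 + 6

is_valid = total > 10 and \
           total < 100

print(total, is_valid)
```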

Comments

A comment is text that doesn’t affect the outcome of the code; it is just a piece of text that lets someone know what you have done in a program or what is being done in a block of code. This is especially helpful when someone analyzes existing code to fix a bug or change its logic: by reading a comment, one can understand the purpose of the code much faster than by going through the code itself.

There are two types of comments in Python.
1. Single line comment
2. Multi-line comment

Single line comment

In Python, one can use the special character # to start a comment. Everything from the # to the end of the line is ignored by the interpreter.

Multi-line comment

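To have a multi-line comment in Python, one can use triple double quotation marks (""") at the beginning and the end of the comment. Strictly speaking, this creates a string literal that is never used rather than a true comment, so another common option is to start each line with #.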
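Both comment styles can be sketched as below. The function area and its arguments are hypothetical and exist only to show where a docstring goes.

```python
# This is a single-line comment.
width = 4  # A comment can also follow code on the same line.

"""
This triple-quoted block is a string literal that is never assigned
or used, so it has no effect and can serve as a multi-line comment.
"""

def area(width, height):
    """Return the area of a rectangle.

    A triple-quoted string placed first in a function is a docstring.
    """
    return width * height

print(area(width, 5))
```

Unlike a # comment, a docstring is kept at run time and can be read through area.__doc__ or help(area).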

Whitespace

One can improve the readability of code with the use of whitespace. Whitespace is necessary for separating keywords from variables or from other keywords. Within a line, extra whitespace is mostly ignored by the Python interpreter; whitespace at the start of a line is the exception, because it defines indentation.

Indentation

Most programming languages allow indentation for better code formatting but do not enforce it. In Python, however, it is mandatory to obey the indentation rules. Typically, we indent each line in a block of code by four spaces (or by the same amount throughout the block). Indentation is also what marks the body of each clause in a compound statement.
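The sketch below shows nested blocks at increasing indentation levels, followed by a deliberately mis-indented snippet that Python rejects. The function name count_even and the string bad_source are assumptions for illustration.

```python
def count_even(numbers):
    total = 0                  # one level: body of the function
    for n in numbers:
        if n % 2 == 0:         # two levels: body of the for loop
            total += 1         # three levels: body of the if statement
    return total               # back to one level: runs after the loop

print(count_even([1, 2, 3, 4]))

# The body of "if" is not indented here, so compiling it fails.
bad_source = "if True:\nprint('x')\n"
raised = False
try:
    compile(bad_source, "<example>", "exec")
except IndentationError as exc:
    raised = True
    print("IndentationError:", exc.msg)
```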

Conclusion

So, this article was all about how to structure the Python program. Here, one can learn what constitutes a valid Python statement and how to use implicit and explicit line continuation to write a statement that spans multiple lines. Furthermore, one can also learn about commenting Python code, and about the use of whitespace and indentation to enhance the overall readability.

We hope this article was helpful to you. If you are interested in similar blogs, stay glued to our website, and keep following all the news and updates from Dexlab Analytics.

 

Interested in a career in Data Analyst?

To learn more about Data Analyst with Advanced excel course – Enrol Now.
To learn more about Data Analyst with R Course – Enrol Now.
To learn more about Big Data Course – Enrol Now.

To learn more about Machine Learning Using Python and Spark – Enrol Now.
To learn more about Data Analyst with SAS Course – Enrol Now.
To learn more about Data Analyst with Apache Spark Course – Enrol Now.
To learn more about Data Analyst with Market Risk Analytics and Modelling Course – Enrol Now.

Statistical Application in R & Python: EXPONENTIAL DISTRIBUTION

In this blog, we will explore the Exponential distribution. We will begin by questioning the “why” behind the exponential distribution instead of just looking at its PDF formula to calculate probabilities. If we can understand the “why” behind every distribution, we will have a head start in figuring out its practical uses in our everyday business situations.

Much could be said about the Exponential distribution. It is an important distribution used quite frequently in data science and analytics. Besides, it is also a continuous distribution with one parameter “λ” (Lambda). Lambda as a parameter in the case of the exponential distribution represents the “rate of something”. Essentially, the exponential distribution is used to model the decay rate of something or “waiting times”.

For instance, you might be interested in predicting answers to the below-mentioned situations:

  • The amount of time until the customer finishes browsing and actually purchases something in your store (success).
  • The amount of time until the hardware on AWS EC2 fails (failure).
  • The amount of time you need to wait until the bus arrives (arrival).

In all of the above cases, if we can estimate a robust value for the parameter lambda, then we can make predictions using the probability density function of the distribution:

$$f(x; \lambda) = \lambda e^{-\lambda x}, \quad x \ge 0$$

Application:-

Assume that a telemarketer spends on “average” roughly 5 minutes on a call. Imagine they are on a call right now. You are asked to find out the probability that this particular call will last for 3 minutes or less.

 

 

Below we have illustrated how to calculate this probability using Python and R.

Calculate Exponential Distribution in R:

In R, we calculate the exponential distribution with a mean call time of 5 minutes and find that the probability that the call lasts 3 minutes or less is about 45.12%. This is to say that there is a fairly good chance that the call will end before it hits the 3-minute mark.

Calculate Exponential Distribution in Python:

We get the same result using Python.
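The original code listing is not reproduced in the text, so the following is only a minimal sketch of the calculation using Python's standard library. It assumes the rate is the reciprocal of the 5-minute mean from the example; the variable names are illustrative.

```python
import math

mean_call_minutes = 5            # average call length from the example
rate = 1 / mean_call_minutes     # lambda = 0.2 per minute
t = 3                            # threshold in minutes

# CDF of the exponential distribution: P(X <= t) = 1 - exp(-lambda * t)
p_within_3 = 1 - math.exp(-rate * t)
print(round(p_within_3, 4))
```

If SciPy is installed, scipy.stats.expon.cdf(3, scale=5) should give the same value, since the scale parameter there is the mean 1/λ.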

Conclusion:

We use exponential distribution to predict the amount of waiting time until the next event (i.e., success, failure, arrival, etc).

Here, with the help of the exponential distribution, we estimated the probability that a telemarketer's call will last 3 minutes or less when the mean call time is 5 minutes. Similarly, the exponential distribution is of particular relevance to business problems that involve a continuous rate of decay, for instance, when modelling how quickly batteries will run out.

Hopefully, this blog has enabled you to gather a better understanding of the exponential distribution. For more such interesting blogs and useful insights into the technologies of the age, check out the best Analytics Training institute Gurgaon, with extensive Data Science Courses in Gurgaon and Data analyst course in Delhi NCR.

Lastly, let us know your opinions about this blog through your comments below and we will meet you with another blog in our series on data science blogs soon.

 


An All-Inclusive Guide on Python and its Changing Trends

Python is an extremely readable and versatile high-level programming language. Many companies such as Google, YouTube, Dropbox use the language for developing applications. It also finds its use extensively in diverse fields as in Python for data analysis, Machine Learning Using Python, Natural Language Processing, Web Development, Scientific Computing, Image processing, Robotics, Computer Vision and many more.

It supports both Object-oriented programming and Functional programming. Python is generally referred to as an interpreted language which implies that each line of code is executed one by one and if the interpreter finds an error, it stops immediately with an error message on the screen.

Another important feature of Python is its interactive prompt. A Python statement can be typed and immediately executed, which is in sharp contrast to compiled languages, where code must be compiled before it can run.

What are Python 2.x and Python 3.x?

There are two main versions of Python: Python 2.x and Python 3.x. Someone who is new to Python might be confused about which version to use. The choice is now straightforward, and it makes sense to migrate from Python 2 to Python 3, as the Python Software Foundation has formally announced that Python 2 will reach end of life (EOL) on January 1st, 2020.

Key differences between Python 2.x and Python 3.x

This article discusses the differences between these two versions of Python, making Python 3 less confusing for a new programmer.

  1. Print Function

In Python 2, print is a statement. There is no need for parentheses.

In Python 3, print is a function. It needs parentheses.
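A brief sketch of the difference; the Python 2 form is shown only as a comment because it does not run under Python 3.

```python
# Python 2 only (a SyntaxError in Python 3):
#     print "Hello, World"

# Python 3: print is a function, so the argument goes in parentheses.
print("Hello, World")

# Being a function, print also accepts keyword arguments.
print("a", "b", "c", sep="-", end="!\n")
```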

  2. Integer Division

In Python 2, if the division operator (/) is applied to two integers, the result is an integer, for example: 7/3 = 2.

In Python 3, if the division operator (/) is applied to two integers, the result is a float that keeps the fractional part, for example: 7/3 = 2.333...

To get an integer result, a different operator is used: the floor division operator (//), for example: 7//3 = 2.
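The behaviour under Python 3 can be checked with a few lines; the variable names are illustrative.

```python
true_div = 7 / 3       # true division always returns a float
floor_div = 7 // 3     # floor division returns an int for int operands
negative = -7 // 3     # floor division rounds toward negative infinity
remainder = 7 % 3      # remainder that pairs with floor division

print(true_div, floor_div, negative, remainder)
```

The negative case is worth noting: -7 // 3 gives -3 rather than -2, because the result is rounded down, not toward zero.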

 3. Unicode Support

The two versions of Python handle strings (sequences of characters) differently.

Python 2 uses the ASCII encoding standard by default. ASCII is limited to 128 characters, which restricts the characters that ordinary Python 2 strings can represent. Using Unicode in Python 2 requires extra syntax: for example, text with special characters has to be written with the u prefix or passed through the unicode() function.

In Python 3, Unicode is the default. The Unicode standard is much more versatile and supports over 128,000 characters. No extra syntax is needed for Unicode values: every string is Unicode text, and UTF-8 is the default source encoding.
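As a small sketch of Python 3 behaviour, the sample string below is made up for illustration. It shows that text and its UTF-8 bytes are separate types.

```python
text = "naïve café ₹100"        # str holds Unicode text in Python 3
print(text, len(text))          # len counts characters (code points)

encoded = text.encode("utf-8")  # encode to bytes explicitly when needed
print(type(encoded).__name__, len(encoded))

decoded = encoded.decode("utf-8")   # decoding restores the original text
print(decoded == text)
```

The byte string is longer than the text because non-ASCII characters take more than one byte in UTF-8.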

  4. Range Function

In Python 2, the range function returns a list of numbers.

In Python 2, the xrange class represents an iterable that provides the same numbers lazily, one at a time, without building a list.

In Python 3, the original range function is removed and xrange is renamed to range.

In Python 3, the range object needs to be converted to a list if someone wants the same result that the range function provides in Python 2.
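A minimal sketch of the Python 3 behaviour described above; the names numbers and big are illustrative.

```python
numbers = range(5)
print(numbers)            # a lazy range object, not a list
print(list(numbers))      # convert explicitly to get a list

# A range computes values on demand, so a huge range costs little memory.
big = range(10**9)
print(len(big), big[-1])
```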

  5. input() Method

What is generally expected from the input() method is that it reads input as a string, which can then be converted into any datatype as required.

Python 2 has both the input() and raw_input() methods for taking input. The difference is that raw_input() reads the input as a string, while input() evaluates what is typed as a Python expression: text inside quotes becomes a string, and a bare number becomes an integer.

In Python 3, there is no raw_input() method. Its behaviour is provided by input() in Python 3.

If someone still wants input() to behave as it did in Python 2, this can be achieved by passing the result to the eval() function.
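The sketch below illustrates both approaches under Python 3. The helper names and the reader parameter are hypothetical; the parameter exists only so the example can run without a keyboard by substituting a stand-in for input.

```python
def read_number(reader=input):
    """Read a line of text and convert it to an int explicitly."""
    raw = reader()            # input() always returns a str in Python 3
    return int(raw)

def read_like_python2(reader=input):
    """Mimic Python 2's input(): evaluate the typed text as an expression.

    eval() runs arbitrary code, so only use it on input you trust.
    """
    return eval(reader())

# Simulated keyboard input via a stand-in reader (for demonstration only).
print(read_number(lambda: "42"))
print(read_like_python2(lambda: "[1, 2] + [3]"))
```

Explicit conversion with int() or float() is usually the safer choice, because eval() will execute whatever expression the user types.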

There are many other differences between Python 2 and Python 3, such as:

  1. next() Method

In Python 2, the .next() method is used, while in Python 3 the built-in next() function is used to get the next element of an iterator.

  2. Raising Exception

To raise an exception in Python 3, the argument must be given in parentheses, as in raise ValueError("message"), while Python 2 also accepted the form raise ValueError, "message".

  3. Handling Exception

Exception handling has also changed. In Python 3 the “as” keyword is required to bind the exception to a name, as in except ValueError as e, while Python 2 also accepted except ValueError, e.
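The three differences listed above can be combined into one short Python 3 sketch; the function name first_item and the error message are illustrative.

```python
def first_item(items):
    iterator = iter(items)
    try:
        return next(iterator)          # Python 3: built-in next(), not .next()
    except StopIteration as exc:       # Python 3: "as" binds the exception
        # Python 3: the exception is called with its argument in parentheses
        raise ValueError("no items to read") from exc

print(first_item([10, 20, 30]))

try:
    first_item([])
except ValueError as err:
    print("caught:", err)
```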

So, if someone is a beginner, it is strongly recommended to use Python 3 because it is the future of Python, and January 1, 2020, will be the last day of support for Python 2. This means that no improvements will be made after that day, even if someone finds a security problem in it.

It is highly recommended to upgrade to Python 3. Several tools can help Python 2 users port their code to Python 3, get a feel for Python 3, and see how it differs from Python 2. The code can be ported by using tools like “Futurize” and “Modernize”. Also, if someone wants to check Python 3 compatibility as part of their tests, then “caniusepython3.check()” can be used.

As a final note, everyone must look for upgrading their Python version to Python 3 to understand the subtleties of the new version and usher in the future. However, if you are interested in Deep learning for computer vision with Python and similar courses, then opt for the premium Python training institute in Delhi now!


R Vs Python: A Debate Forever

In this blog, we revisit the age-old question of which is better for data science: R programming or Python programming.

To be very honest, this question does not have a strict answer to it. However, in this blog we will lay down the key components of both the languages to give you a clearer picture. In the end, please decide for yourself and leave your comments in the section below.

The aim of this blog is to objectively put forward the pros and cons of both languages strictly from the perspective of data science.

We will discuss only three main components, which are as follows:

  • Syntax
  • Performance
  • Applicability

There are other metrics, such as industry trends and adoption in recent years, which are beyond the scope of this blog. However, you can safely declare Python the clear winner as far as those perspectives are concerned.

So let’s get started:

Syntax

Both R and Python are object-oriented languages. This is to say that data is stored in objects, with the idea of using those objects later in the analysis. However, when it comes to the syntax, i.e., the grammar of programming, R and Python are indeed very different.

R Programming

R programming is better suited to seasoned coders who have prior coding experience. The syntax is similar to that of older languages such as C, C++, or Java, and the fundamental rules follow those of the C programming language. Semicolons are optional in R; they are needed only to separate multiple statements written on the same line.

Python

Python on the other hand, is the language more adaptable to the new generation of programmers. You can come from a non-programming background and still learn Python with relative ease.

Python is one of the most user-friendly languages for beginners. The syntax is designed to prioritize readability over conciseness. In layman’s terms, Python code reads much like plain written English. For this reason, it is really popular amongst beginners in Data Science.

Performance

When it comes to programming, performance is essentially measured by speed.

R Programming

As far as the general consensus goes, R is considerably slower. The reason usually given is that R was initially designed to be used by statisticians for data analysis, so it stresses precision more than speed.

Python

Python, on the other hand, is relatively faster than R. It offers the same level of precision while running at a higher speed.

Note – The speed is taken into account independent of packages and libraries.

Applicability

Lastly, we will discuss the popular domains in which these languages are used.

R Programming

As mentioned above, R was developed specifically for statisticians. For this reason, R is mainly used in various research organizations and academia in general. However, R is now quickly being absorbed in the enterprises as well, mainly because of its popularity and the availability of a large number of packages for statistical computation.

Python

As Python is a general-purpose programming language, we can use it to build many different kinds of applications. For example, we can use Python to build web applications with popular frameworks like Django or Flask.

Lately, Python is becoming popular amongst data scientists as the language of choice, given the simplicity of its syntax and the speed and performance it has to offer. In the last few years there has been a sharp rise in the adoption of Python over R in Data Science.

So, there you have it folks. Decide for yourself now! We will meet you soon in the next blog.

Dexlab Analytics is a pioneering institute of Data Science and Big Data Analytics with all-inclusive Big data courses in Delhi along with numerous other efficacious courses like Hadoop certification in Delhi, R programming courses in Gurgaon and Python for Data Analysis under experienced trainers and professionals.

 


Statistical Application in R & Python: Poisson Distribution

Continuing with the series of blogs, the first of which was Statistical Application In R & Python: Normal Probability Distribution, here we bring you a post on how you can calculate the Poisson distribution effortlessly using R & Python. So, stay tuned!

The Poisson distribution is a discrete probabilistic model for counting events. It has only one parameter (lambda or “m”), which is essentially the average rate of occurrence. The Poisson distribution is used to model the “number of anything”. Its probability mass function is given by the expression below.

If m is the mean number of occurrences per interval, then the probability of having x occurrences within a given interval is:

$$P(X = x) = \frac{e^{-m} m^{x}}{x!}, \quad x = 0, 1, 2, \dots$$

Application:

A business firm receives on average 6.5 telephone calls per day during the time period 11:00 – 11:15 A.M. Find the probability that on a certain day, the firm receives exactly 9 calls during the same period.

The random variable x is the number of telephone calls received during the period 11:00 – 11:15 A.M., and x is assumed to follow a Poisson distribution. The parameter m is equal to the mean of the distribution, i.e. m = 6.5, and x = 9, so the equation is:

$$P(X = 9) = \frac{e^{-6.5} \, 6.5^{9}}{9!} \approx 0.0858$$

Calculate Poisson Distribution in R:

So, calculating the Poisson distribution in R, we find that the probability of receiving exactly 9 calls, against an average of 6.5 calls, in the given time period (11:00 A.M. – 11:15 A.M.) is 8.58%.

Calculate Poisson Distribution in Python:

So, calculating the Poisson distribution in Python, we find the same result: the probability of receiving exactly 9 calls, against an average of 6.5 calls, in the given time period (11:00 A.M. – 11:15 A.M.) is 8.58%.
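The original code listing is not reproduced in the text, so the following is a minimal sketch that evaluates the Poisson formula directly with Python's standard library, using m = 6.5 and x = 9 from the example. The variable names are illustrative.

```python
import math

m = 6.5      # mean number of calls in the 11:00 - 11:15 window
x = 9        # number of calls whose probability we want

# Poisson PMF: P(X = x) = exp(-m) * m**x / x!
p_exactly_9 = math.exp(-m) * m**x / math.factorial(x)
print(round(p_exactly_9, 4))
```

If SciPy is available, scipy.stats.poisson.pmf(9, 6.5) should return the same probability.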

Conclusion:

Companies can use the Poisson distribution to contrive effective steps to improve their operational efficiency. For instance, an analysis done with the Poisson distribution might reveal how a company can arrange staffing in order to be able to handle the peak periods efficiently, when the customer service calls keep on pouring.

In this problem, we see that if the business firm receives on average 6.5 telephone calls per day during the time period 11:00 A.M. – 11:15 A.M., then the probability that the firm receives exactly 9 calls in the same period is 8.58%.

Dexlab Analytics is the best Python training institute in Delhi, bringing you the all-inclusive courses of Python for Data Analysis and R Predictive Modelling Certification, among others to start your career in Data Science and Analytics.

 


Statistical Application in R & Python: Normal Probability Distribution

Gauss, the famous German mathematician, is credited with developing one of the most significant distributions in all of statistics, i.e. the Normal Distribution. Please refer to the blog on the Central Limit Theorem: www.dexlabanalytics.com/blog/the-almighty-central-limit-theorem. It will help you fully grasp the significance of the Normal Distribution. However, if you want to revisit our series of blogs by following it from the start, you can reach STATISTICAL APPLICATION IN R & PYTHON: CHAPTER 1 – MEASURE OF CENTRAL TENDENCY right now!

Essentially, the Normal Distribution provides “approximations” to most other distributions such as the Binomial, Poisson, Gamma, Exponential, etc. This is to say that as sample sizes get large enough, sums and averages drawn from most distributions approach a normal-shaped curve.

Every distribution has important features known as its “parameters”. The normal distribution has two parameters: the Mean (μ) and the Variance (σ²). The normal distribution has a bell-shaped curve, where the probability density peaks at the mean in the middle.

The Normal Distribution has vast practical applications in the field of Business, Finance, Medicine, and Physics and so on. Things like weights, heights, IQ scores follow the Normal Distribution.

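The Normal Distribution, also called the Gaussian distribution, is a continuous probability distribution and is defined by the Probability Density Function (PDF):

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

Where μ is the mean and σ is the standard deviation (σ² is the variance).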

Application:

Assume that the credit score fits a Normal Distribution.

Suppose Mr. Arjun’s credit scores for the last 10 months are:

789, 635, 739, 687, 724, 810, 817, 735, 819, 820

What is the probability that the credit score will be 825 or more in the 11th month?

Months       Credit Score
January      789
February     635
March        739
April        687
May          724
June         810
July         817
August       735
September    819
October      820

 

Calculating Normal Distribution in R:

Calculating the Normal Probability Distribution in R, we find that the probability that the 11th month credit score will be 825 or more is 14.60%, whereas the probability that it will be 825 or less is 85.40%.

Calculate Normal Distribution in Python:

Make a data frame of the data and calculate the Mean and Standard Deviation needed for the Normal Distribution.

Now, we can easily calculate Normal Distribution in Python

So, calculating the Normal Probability Distribution in Python, we find that the probability that the 11th month credit score will be 825 or more is 14.60%, whereas the probability that it will be 825 or less is 85.40%.
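The original listing (which uses a data frame) is not reproduced in the text. As a stand-in, here is a minimal sketch with Python's standard statistics module instead of a data frame. It assumes the sample standard deviation (dividing by n − 1) is the one intended, which is what reproduces the 14.60% figure.

```python
from statistics import NormalDist, mean, stdev

scores = [789, 635, 739, 687, 724, 810, 817, 735, 819, 820]

mu = mean(scores)        # 757.5
sigma = stdev(scores)    # sample standard deviation (divides by n - 1)

dist = NormalDist(mu, sigma)
p_at_most_825 = dist.cdf(825)
p_at_least_825 = 1 - p_at_most_825

print(round(p_at_most_825, 4), round(p_at_least_825, 4))
```

NormalDist requires Python 3.8 or later. With SciPy, scipy.stats.norm.cdf(825, loc=mu, scale=sigma) should give the same cumulative probability.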

Conclusion:

The Normal Distribution is used to calculate probabilities from two parameters, the mean and the standard deviation. It is represented by the bell curve, where the total area under the curve is 1. The Normal Distribution is used in Finance and Business, and for quantities such as salaries, blood pressure, and other measurements, among many other fields.

Here, we have used Normal Distribution to predict Mr. Arjun’s 11th month credit score, and set the target (825). By Normal Distribution we can predict the percentage of possibility to achieve the target.

Calculating the Binomial Distribution might be tricky for many, but with Dexlab Analytics it won’t be a hassle anymore. So, get hold of our STATISTICAL APPLICATION IN R AND PYTHON: CALCULATING BINOMIAL DISTRIBUTION blog, to get around all your problems.

 


Python vs. Scala: Which is Better for Data Analytics?

Data Science and Analytics have become synonymous with progress in the field of computer science. As these technologies rise, much depends on the programming languages that support their growth.

Python is now among the most significant languages in the world of technology, and Scala is another versatile language well known to researchers and tech enthusiasts. These two languages are among the most talked about in the industry today, and both are used extensively in data analytics and data science. The debate over which of the two to choose has been constant, so here we discuss both of them in brief to help you with your choice!

Python

Python is really one of the most popular languages in the industry. The open-source nature of the language makes it a popular choice for scripting and automation works. 

Besides, Python is powerful, effective, and easy to learn. Moreover, Python offers efficient high-level data structures and supports object-oriented programming.

Advantages

  • Easy to learn and effective too.
  • Exhaustive support from active communities.
  • Python enjoys built-in support for the datatypes.

Disadvantages

  • Python programs typically run more slowly than equivalent programs written in languages like C or Java.

Scala

If you want an object-oriented, functional programming language, then Scala would certainly be your first choice. It was basically built for the Java Virtual Machine (JVM) and remains the most compatible programming language with Java code till date.

Advantages

  • Scala can utilise the majority of the JVM libraries, thus helping them to be embedded in the enterprise code.
  • It shares an array of readable syntax features of the popular languages, like Ruby.
  • Scala brags about numerous incredible features like string comparison advancements, pattern matching and its likes.

Disadvantages

  • Scala has a limited number of users in its communities, which means fewer interactions and slower growth.
  • At times the type-information in Scala is really complex to comprehend. This difficulty can be attributed to the functional and object-oriented nature of the language.

We hope that this article gives you a brief insight into two of the most in-demand programming languages: Python and Scala.

Now, if you want to enrol yourself in Computer vision course Python, you can reach us right at Dexlab Analytics, the most reputable institute for Big Data Analytics. Also, if you are looking for all-inclusive Deep learning for computer vision Course, turn no further than our premium institute to shoot your career up!

 


Statistical Application in R and Python: Calculating Binomial Distribution

In this blog, we will take a look at the Binomial distribution. This blog is among the series of blogs through which you’ll have a vivid idea of the Statistical Application using R and Python. Statistical Application In R & Python: Chapter 1 – Measure Of Central Tendency is the first of such blogs.

The binomial distribution is an extension of the Bernoulli distribution. In Bernoulli, we have only one parameter, i.e. the probability of success.

Now, consider a case where we have “n” independent trials and we want the probability of a given number of successes among them. This is the Binomial case.

The binomial distribution has two parameters: the number of trials (n) and the probability of success (p). The mean of the binomial is the product of its two parameters, i.e. n multiplied by p. It is a discrete probability distribution. Here, each trial is assumed to have only two outcomes, either success or failure.

A discrete random variable X (taking only non-negative integer values) is said to follow a binomial distribution with parameters n and p if its probability mass function is:

$$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \qquad x = 0, 1, \dots, n$$


Application:

A food shop launches an offer for the festive season. There are 12 different baskets, each basket has 5 combos, and only 1 combo in each basket is non-veg. Find the probability of getting 4 or fewer non-veg combos if a customer picks one combo at random from every basket.

Since only 1 out of 5 combos is non-veg, the probability of choosing a non-veg combo at random is 1/5 = 0.2.

Calculate Binomial Distribution in R:

In R, the probability of getting exactly four non-veg combos across the twelve baskets is 13.29%, whereas the probability of getting four or fewer non-veg combos across the twelve baskets is 92.74%.

Calculate Binomial Distribution in Python:

In Python, the probability of choosing one non-veg combo at random from 5 is reported as 16.66%.
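
The application above can also be worked through with the standard library alone. The following is a minimal sketch, assuming n = 12 baskets and p = 0.2 as stated in the problem; the helper names `binom_pmf` and `binom_cdf` are hypothetical and are not taken from any library.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k): sum of the probability mass function from 0 to k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


# Assumed setup from the application: 12 baskets, 1 non-veg combo out of 5 in each.
n, p = 12, 0.2

print(f"mean n*p  = {n * p:.1f}")                # 2.4
print(f"P(X = 4)  = {binom_pmf(4, n, p):.4f}")   # 0.1329
print(f"P(X <= 4) = {binom_cdf(4, n, p):.4f}")   # 0.9274
```

Summing the probability mass function term by term keeps the link to the formula visible; for large n a statistics library would be the more practical choice.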

Conclusion:-

The Binomial Distribution lets us calculate the probability of a given number of successes in “n” trials. Each trial has only two possible outcomes, like “Yes” or “No”.

Dexlab Analytics is a pioneering institute of Data Science, with peerless trainers to help you ease your journey with Python Certification, R Programming Certification and Big Data Certification along with numerous other advanced and/or career oriented courses in Computer Science.

 


Statistical Application of R & Python: Know Skewness & Kurtosis and Calculate it Effortlessly


This blog will broaden your understanding of Statistical Application using R & Python. You have perhaps already been calculating Geometric Mean using R & Python and are aware of the Application of Harmonic Mean using R & Python. If you are eager to learn about Skewness & Kurtosis and their application using R and Python, then this is the right place.

Skewness:

Skewness is a metric that tells us where the values of a dataset are concentrated on an ascending scale, and therefore how asymmetric the distribution is.

Skewness is of two kinds: Positive skew and Negative skew. A positively skewed dataset will have most of the values concentrated at the beginning of the scale, with a tail stretching toward the higher values. For example, suppose a woman is asked to rate 100 tinder profiles based on looks on a scale of 1 – 10, 1 being the ugliest and 10 being the most handsome. If most of her ratings fall at the low end of the scale, the resulting ratings are positively skewed, which is to say that she is a harsh critic of looks.

Now, consider the opposite case: suppose the wealth of the 1% richest people were plotted on a scale of $0 – $200 billion and most of the values fell near the end of the scale. That would be an example of a negatively skewed dataset.

In essence, skewness is based on the third central moment about the mean and gives us a feel for where the data set values are located. It is recommended to go through STATISTICAL APPLICATION IN R & PYTHON: CHAPTER 1 – MEASURE OF CENTRAL TENDENCY to have an understanding of the Central Tendency and its measures. Having no skewness means the data set is fairly symmetrical, as in a bell shaped curve.
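
One commonly used sample formula for skewness is sketched here. It is given as an assumption: it is the bias-adjusted form based on the sample standard deviation and a cubed term in the summation, and it is not necessarily the exact formula every package uses.

$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{s} \right)^{3}$$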

Where n is the sample size, Xi is the ith X value, X̄ is the average and s is the sample standard deviation. Note that the exponent in the summation is “3”.

Kurtosis:

Kurtosis is a statistical measure used to describe the distribution of observed data around the mean, in particular how heavy its tails are. It is sometimes referred to as the volatility of volatility. Kurtosis is generally used in the statistical field to describe trends in charts. High kurtosis appears in a chart with fat tails and a sharp peak, while low kurtosis appears in a chart with skinny tails and a low, even distribution.

Kurtosis for a normal distribution is 3.  Most software packages use the formula:
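
One widely used sample formula of this kind is sketched here as an assumption, written with n as the sample size, Xi as the ith value, X̄ as the average and s as the sample standard deviation:

$$\text{Kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{s} \right)^{4} - \frac{3(n-1)^{2}}{(n-2)(n-3)}$$

This form adjusts for the sample size and subtracts the normal-distribution value, so a normal distribution scores about 0 rather than 3. Note the exponent in the summation, which is 4.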


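The types of kurtosis are:-

  • Mesokurtic: kurtosis equal to 3, as for a normal distribution.
  • Leptokurtic: kurtosis greater than 3, with a sharper peak and heavier tails.
  • Platykurtic: kurtosis less than 3, with a flatter peak and lighter tails.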


Application:

A person analyses the last 12 months of interest rates of an investment firm to understand the risk factor for a future investment.

The interest rates are:

12.05%, 13%, 11%, 18%, 10%, 11.5%, 15.08%, 21%, 6%, 8%, 13.2%, 7.5%.

Here is the table:

Month (One Year)    Interest Rate (%)
April               12.05
May                 13
June                11
July                18
August              10
September           11.5
October             15.08
November            21
December            6
January             8
February            13.2
March               7.5


Calculate skewness & Kurtosis in R:

Calculating the Skewness & Kurtosis of the interest rates in R, we get a positive skewness that is fairly close to 0. The skewness of the interest rate is 0.5585253.

The kurtosis of the interest rate is 2.690519.

Kurtosis is less than 3, so this is a Platykurtic distribution.

Calculate Skewness & Kurtosis in Python:

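Calculating the Skewness & Kurtosis of the interest rates in Python, we again get a positive skewness that is fairly close to 0. The skewness of the interest rate is 0.641697.

The kurtosis of the interest rate is 0.241602.

This figure is consistent with a bias-corrected excess kurtosis, which is measured relative to the normal distribution (a normal distribution scores 0 rather than 3). It should therefore be compared with 0, not with 3. A small positive value like this points to tails close to, or slightly heavier than, those of a normal distribution.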

Conclusion:

Firstly, according to the output, the data are positively skewed in both R and Python. Positive skewness indicates a distribution with an asymmetric tail extending toward more positive values.

The kurtosis reported by R is less than 3, which on its own points to a platykurtic distribution with light tails. The Python figure is small and positive, and if it is read as an excess kurtosis (normal = 0), it points to a shape close to normal. Either way, the shape is not far from that of a normal distribution.

Secondly, the values of the skewness and kurtosis differ between R and Python, even though they describe the same data. The results differ because skewness and kurtosis can be calculated with different formulae, such as Bowley’s measure, Pearson’s first and second measures, Fisher’s measure and the moment measure. Different software (e.g. R, Python, SAS, Excel) may use different formulae by default; for instance, some apply a sample-size correction or report kurtosis relative to the normal value of 3. For a given formula, the numerical value changes only when the data change.
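
The difference between the two sets of results can be reproduced with a short standard-library sketch. It assumes that the R figures come from the plain moment formulas and the Python figures from bias-corrected sample formulas with excess kurtosis; this is an assumption based on the reported numbers, and all function names below are hypothetical.

```python
from math import sqrt

# Interest rates (%) for April to March, taken from the table above.
rates = [12.05, 13, 11, 18, 10, 11.5, 15.08, 21, 6, 8, 13.2, 7.5]


def central_moment(xs, k):
    """k-th central moment, dividing by n (population form)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def moment_skewness(xs):
    """Moment measure of skewness: m3 / m2^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def moment_kurtosis(xs):
    """Moment measure of kurtosis: m4 / m2^2 (normal distribution = 3)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


def adjusted_skewness(xs):
    """Moment skewness with a sample-size correction."""
    n = len(xs)
    return moment_skewness(xs) * sqrt(n * (n - 1)) / (n - 2)


def adjusted_excess_kurtosis(xs):
    """Sample-size-corrected excess kurtosis (normal distribution = 0)."""
    n = len(xs)
    excess = moment_kurtosis(xs) - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * excess + 6)


print(f"moment skewness          = {moment_skewness(rates):.4f}")           # 0.5585
print(f"moment kurtosis          = {moment_kurtosis(rates):.4f}")           # 2.6905
print(f"adjusted skewness        = {adjusted_skewness(rates):.4f}")         # 0.6417
print(f"adjusted excess kurtosis = {adjusted_excess_kurtosis(rates):.4f}")  # 0.2416
```

The first pair matches the values reported for R and the second pair matches the values reported for Python, which suggests the gap comes from the choice of formula rather than from the data.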


There are numerous other blogs that you can follow with Dexlab Analytics. Also, if you want to explore computer vision course Python, neural network machine learning Python and more extensive courses on R & Python, then you can also join us and boost both your passion and career.

 

