Python Certification Training in Delhi Archives - Page 8 of 9 - DexLab Analytics | Big Data Hadoop SAS R Analytics Predictive Modeling & Excel VBA

Application of Median Using R And Python: Calculating Median On the Go


This blog is in continuation of STATISTICAL APPLICATION IN R & PYTHON: CHAPTER 1 – MEASURE OF CENTRAL TENDENCY and takes you through a comprehensive way to calculate the Median in R and Python.

The term ‘Median’ is derived from the Latin word ‘medius’, meaning the center of something. In mathematics, the median is the observation that divides your data set into two equal halves.

If you are still unclear about Mean and/or seeking easier ways to calculate Mean using R & Python, then check APPLICATION OF HARMONIC MEAN USING R AND PYTHON and CALCULATING GEOMETRIC MEAN USING R AND PYTHON.

Median is special because, unlike its rival the Mean, it is not distorted by extreme values. To illustrate the curse of extreme values, we bring you the following example:

Imagine I had the following data about the average annual salaries:

In Lacs

8.5
9
11
7
8
8.5
36

The mean of the above data set is: 88/7 = 12.57 lacs.

To get the median, on the other hand, we first arrange the data in ascending order and look for the midpoint, i.e., the (1/2 + n/2)th observation, where “n” is the number of observations. Here n = 7, so the median is the (1/2 + 7/2) = 4th observation.

Arranged in ascending order, the data become:

7
8
8.5
8.5
9
11
36

Median is the 4th observation, which is 8.5 lacs.

Looking at the mean and median, it would be fair to conclude that the median is the better choice for accurately summarizing the data set whenever extreme values are present. However, this is a crude generalization which should be taken with a pinch of salt. Despite its flaws, the mean still has statistical properties used in predictive analytics which the median lacks.
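The comparison above can be reproduced with a minimal sketch using Python's standard `statistics` module. The variable names are illustrative and not from the original post; the last step, dropping the 36 lac salary, is an added illustration of how little the median moves.

```python
from statistics import mean, median

# Annual salaries in lacs, as listed in the example above.
salaries = [8.5, 9, 11, 7, 8, 8.5, 36]

print(round(mean(salaries), 2))  # arithmetic mean, pulled up by the 36 lac salary
print(median(salaries))          # middle value of the sorted data

# Illustration only: drop the extreme value and compare again.
trimmed = [s for s in salaries if s != 36]
print(round(mean(trimmed), 2), median(trimmed))
```

The mean falls from about 12.57 to about 8.67 once the extreme salary is removed, while the median stays at 8.5 in both cases.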

Application:

A construction company pays weekly wages to its 10 laborers (named A to J). The wages are 2000, 2100, 1900, 2150, 2500, 2450, 1800, 2600, 2200 and 2300. Compute the Median wage of the construction company.

Sr. No   Labor   Wages (Weekly)
1        A       2000
2        B       2100
3        C       1900
4        D       2150
5        E       2500
6        F       2450
7        G       1800
8        H       2600
9        I       2200
10       J       2300

Calculate Median in R:

The Median wage is 2175, calculated in R.

Calculate Median in Python:

Create a data frame of the data in Python.


Now, calculate Median in Python.

The Median wage is 2175, calculated in Python.
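The original code screenshots are not reproduced here, so the following is only a sketch of the calculation. It assumes the standard `statistics` module instead of a data frame, and the names `wages`, `ordered` and `by_hand` are illustrative.

```python
from statistics import median

# Weekly wages of laborers A to J, from the table above.
wages = {"A": 2000, "B": 2100, "C": 1900, "D": 2150, "E": 2500,
         "F": 2450, "G": 1800, "H": 2600, "I": 2200, "J": 2300}

ordered = sorted(wages.values())
n = len(ordered)

# With an even number of observations, the median is the average
# of the two middle values (the 5th and 6th of the sorted wages).
by_hand = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(by_hand, median(ordered))
```

Both approaches average the two middle wages, 2150 and 2200, which gives 2175.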

This concludes the post. If you have any queries regarding this post, you can reach us at Dexlab Analytics. You can also look up our courses on R Programming Certification and Python Certification, or enroll in our combined courses such as Data Science with Python Certification and Deep Learning and AI using Python. So, hurry up and grab the best course!

 

Interested in a career in Data Analyst?

To learn more about Data Analyst with Advanced excel course – Enrol Now.
To learn more about Data Analyst with R Course – Enrol Now.
To learn more about Big Data Course – Enrol Now.

To learn more about Machine Learning Using Python and Spark – Enrol Now.
To learn more about Data Analyst with SAS Course – Enrol Now.
To learn more about Data Analyst with Apache Spark Course – Enrol Now.
To learn more about Data Analyst with Market Risk Analytics and Modelling Course – Enrol Now.

Application of Mode using R and Python

Application of Mode using R and Python

The mode of a given set of observations is the value of the variable that occurs with the highest frequency.

This blog is in continuation with STATISTICAL APPLICATION IN R & PYTHON: CHAPTER 1 – MEASURE OF CENTRAL TENDENCY. However, here we will elucidate the Mode and its application using Python and R.

Mode is the most typical or prevalent value, and at times, represents the true characteristics of the distribution as a measure of central tendency.

Application:

The numbers of the telephone calls received in 245 successive one minute intervals at an exchange are shown in the following frequency distribution table:

 

No of Calls   Frequency
0             14
1             21
2             25
3             43
4             51
5             40
6             51
7             51
8             39
9             12
Total         245

 

 [Note: Here we assume total=245 when we calculate Mean from the same data]

Evaluate the Mode from the data.


Calculate Mode in R:

Calculate the mode in R from the data: the highest frequency in the table is 51.

The frequency 51 occurs for 4, 6 and 7 phone calls (the 5th, 7th and 8th rows of the table), so these are the modal values.

Calculate Mode in Python:

First, make a data frame for the data.

Now, calculate the mode from the data frame.

Calculate the mode in Python from the data: the highest frequency in the table is again 51.

The frequency 51 occurs for 4, 6 and 7 phone calls, as in R.
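As a rough sketch of the step described above, the modal values can be read off the frequency table with plain Python. The dictionary `frequency` and the other names are illustrative; this reads the table by call count rather than by row position, and it does not use the data frame from the original screenshots.

```python
# Frequency table from above: number of calls -> number of one-minute intervals.
frequency = {0: 14, 1: 21, 2: 25, 3: 43, 4: 51,
             5: 40, 6: 51, 7: 51, 8: 39, 9: 12}

highest = max(frequency.values())

# Every call count whose frequency equals the highest frequency is a mode.
modal_calls = [calls for calls, freq in frequency.items() if freq == highest]

print(highest, modal_calls)
```

With the table as given, the highest frequency is 51 and it is shared by 4, 6 and 7 calls, so the distribution has more than one mode.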

Mode is used in business, because it is most likely to occur. Meteorological forecasts are, in fact, based on mode calculations.

The modal wage of a group of the workers is the wages which the largest numbers of workers receive, and as such, this wage may be considered as the representative wage of the group.

In this particular data set, we use the mode to find the number of phone calls per minute that occurs most often.

It will thus, help the Telephone Exchange to analyze their data flawlessly.


Note – As you have already gone through this post, now, if you are interested to know about the Harmonic Mean, you can check our post on the APPLICATION OF HARMONIC MEAN USING R AND PYTHON.

Dexlab Analytics is a formidable institute for Deep learning for computer vision with Python. Here, you would also find more information about courses in Python, Deep Learning, Machine Learning, and Neural Networks, which come with proper certification at the end.

We are also on Social Media, where you can follow us on both Facebook and Instagram.

 


Application of Harmonic Mean using R and Python


Harmonic mean, for a set of observations is the number of observations divided by the sum of the reciprocals of the values and it cannot be defined if some of the values are zero.

This blog is in continuation with STATISTICAL APPLICATION IN R & PYTHON: CHAPTER 1 – MEASURE OF CENTRAL TENDENCY. However, here we will discover Harmonic mean and its application using Python and R.


Application:

A milk company sold milk at the rates of 10, 16.5, 5, 13.07, 15.23, 14.56, 12.5, 12, 30, 32, 15.5 and 16 rupees per liter in twelve different months (January-December). If a family spends an equal amount of money on milk in each of the twelve months, calculate the average price in rupees per liter.

Table for the problem:

Month       Rates (Rupees/Liter)
January     10
February    16.5
March       5
April       13.07
May         15.23
June        14.56
July        12.5
August      12
September   30
October     32
November    15.5
December    16

Calculate Harmonic Mean in R:-

So, the average rate of the milk is 12.95349 rupees/liter, or approximately 13 Rs/liter.

We get this answer from the Harmonic Mean, calculated in R.

Calculate Harmonic Mean in Python:-

First, make a data frame of the available data in Python.

Now, calculate the Harmonic mean from the following data frame.

So, the average rate of the milk is 12.953491609077956 rupees/liter, or approximately 13 Rs/liter.

We get this answer from Harmonic mean, calculated in Python.
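The original screenshots are missing, so here is a minimal sketch of the same calculation. It assumes the standard `statistics` module (which provides `harmonic_mean`) rather than a data frame, and checks it against the definition given at the start of the post. The names `rates` and `by_definition` are illustrative.

```python
from statistics import harmonic_mean

# Monthly milk rates in rupees per liter, January to December.
rates = [10, 16.5, 5, 13.07, 15.23, 14.56, 12.5, 12, 30, 32, 15.5, 16]

# Definition: number of observations divided by the sum of the reciprocals.
by_definition = len(rates) / sum(1 / r for r in rates)

print(by_definition, harmonic_mean(rates))
```

Both values come out at roughly 12.9535, in line with the figure reported above.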

Summing it Up:

In this data, a few large values (30 and 32) inflate the average if we use the Arithmetic mean. Because an equal amount of money is spent each month, the Harmonic mean gives the appropriate average rate.

Use of Harmonic mean is very limited. Harmonic mean gives the largest weight to the smallest item and the smallest weight to the largest item.

Where there are a few extremely large or small values, Harmonic mean is preferable to Arithmetic mean as an average.

The Harmonic mean is mainly useful in averages involving time, rate & price.


Note – If you want to learn the calculation of Geometric Mean, you can check our post on CALCULATING GEOMETRIC MEAN USING R AND PYTHON.

Dexlab Analytics is a peerless institute for Python Certification Training in Delhi. Therefore, for tailor-made courses in Python, Deep Learning, Machine Learning, Neural Networks, reach us ASAP!

You can even follow us on Social Media. We are available both in Facebook and Instagram.

 


Python is the Leader in Data Science: Know Why


From being simple and effective to being updated and thereby, solving almost everything that the booming industry of Data Science of today can look up to, Python boasts of it all.

It’s not a shock that Python is finding its uses in an array of industries. It is, in fact, the language that the Data Scientists rely on. Thus, our tailored courses of Python Certification Training in Delhi would be helpful for all in this digital age.

Let’s see some more of the advantages for which Python stands distinguished among the other programming languages:

Handling Data without a Hassle

The field of Data Science is entrusted with the handling of incredibly large amounts of data which is found to be intricate to compute. However, with Python, it is now simpler than ever. Any of the other high-level programming languages would make it rather difficult and messy compared to the peerless Python, if we talk about analytical and quantitative computing.

Open Source Programming Language

Python is an open-source programming language. Wonder why this programming language is the most preferred still?

It truly opens a whole lot of opportunities that the language can build upon, being open-source in nature. Furthermore, there is not a single restriction regarding Python. Thus, you can be as creative as you wish on this programming language.

It is Powerful and Easy to Use

Python is easy to learn right from the start, which is why it has become so popular. Beginners with just rudimentary knowledge can get started with Python. Besides, once you are comfortable with this programming language, you can progress with it day by day at your own pace.

Writing and testing code tends to take longer in languages such as Java, C and C#, whereas Python code is quick to write and debug. The prompt results in coding give an added boost to your work.

In the Library of Python

Python is an all-absorbing language that even supports the cutting edge technologies of Machine Learning and Artificial Intelligence. And on top of it, Python also offers its users a colossal database of libraries. Therefore, you can simply check in the libraries, import them and then implement all of them in your day to day coding.

It is Highly Scalable

In terms of scalability, Python stands out. The programming languages R and Java fall short on this factor. Thus, with the ease of scalability and quicker turnaround times, data scientists and nearly all of the organisations exploring Data Science are choosing Python over other existing languages.


It is Peerless in Visualisation and Graphics

As the smooth rendering of quality graphics and visualisation is the demand of the age, Python fits in quite comfortably here. With an exhaustive range of options for visualisation, which are simple and efficient, the world of Data Science is rooting for Python.

With all the benefits that you can reap, Python for data analysis is a must, if you want to be absorbed in the industry of Data Science.



Calculating Geometric Mean Using R and Python


In this blog, we are going to discuss the Geometric Mean and its application using Python and R.

The Geometric Mean of a group of ‘n’ observations is the nth root of their product. It is defined only when all observations have the same sign and none of them is zero.

Application:

Calculate the Geometric Mean of the salary increment of 12 employees. From the following table, calculate the average salary increment of the year (2019-2020):-

 

Name       Salary Increment in Percentage (%)
Ritesh     10.09%
Heena      15.45%
Kritika    9%
Anuradha   13.06%
Gaurav     20%
Prakash    14%
Aarti      16%
Meena      6.25%
Utkarsh    12.85%
Chirag     10%
Neha       18%
Smrita     21.36%

 

Calculate the Geometric Mean in R:

So, from the data of the employees, we calculate the G.M. in R and find that the average salary increment in the year (2019-2020) = 13.17618 or 13.18% (approx).

Calculate the Geometric Mean in Python:

First, make a data frame in Python from the following table.

Now, calculate the Geometric Mean from the data-frame.

So, from the data of the employees, we calculate the G.M. in Python and find that the average salary increment in the year (2019-2020) = 13.176183416401196 or 13.18% (approx).
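Since the original code screenshots are missing, the calculation can be sketched as follows. It assumes Python 3.8 or later, where `math.prod` and `statistics.geometric_mean` are available, and uses the percentages exactly as transcribed in the table. The printed value may differ from the figure quoted above if the original screenshots used slightly different inputs. The names `increments` and `by_definition` are illustrative.

```python
import math
from statistics import geometric_mean, mean

# Salary increments in percent, as transcribed from the table above.
increments = [10.09, 15.45, 9, 13.06, 20, 14, 16, 6.25, 12.85, 10, 18, 21.36]

# Definition: the nth root of the product of the n observations.
by_definition = math.prod(increments) ** (1 / len(increments))

# Compare with the library function and with the arithmetic mean.
print(by_definition, geometric_mean(increments), mean(increments))
```

The two geometric-mean values agree with each other, and both are smaller than the arithmetic mean of the same data.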

We use the Geometric Mean for averaging ratios, rates and percentages. It is less affected by extreme values or outliers than the Arithmetic Mean. In this particular problem, we use the Geometric Mean because we want an average salary increment that is not dominated by the extreme highest or lowest value, which is why the increment rates of Meena and Smrita have less influence on the overall average rate.

For the same set of positive values, the Geometric Mean is never larger than the Arithmetic Mean.


Note: This is a continuation of the blog: Statistical Application in R & Python: Chapter 1 – Measure of Central Tendency. It would be better to go through the first installment and then read this one. More blogs are to be followed, so stay tuned.

DexLab Analytics is a premier Python training institute in Delhi. Our industry-relevant courses are carefully crafted by experts. Follow us on Facebook and Instagram.

 


Statistical Application In R & Python: Chapter 1 – Measure Of Central Tendency


Statistical analysis helps explore data relationship and develop high-end models to frame better decisions. It’s an intricate process of collecting and evaluating data to define the nature of data that has to be analyzed.

Below, we dig into the basics of statistical application in R and Python using the measure of central tendency.

  • Introduction:-

Statistics is a body of methods for the study of numerical data. If some rows or columns of data are too long, it becomes necessary to summarize the data in an easily manageable form. This purpose is served by classifying the data in the form of frequency distributions and various graphs. When the data relate to a variable, the process of summarization can be taken a step further by using certain descriptive measures. The aim is to focus on certain features of the data, such as central tendency and dispersion.


  • Central Tendency :

The observations in a set of data have a tendency, notwithstanding their variability, to cluster around a central value. This tendency of quantitative statistical observations is called central tendency.

The three commonly used measures of central tendency are:-

  • Mean
  • Median
  • Mode

The description of these 3 measures starts below:-

  • Mean:-

The mean is an average and is the most commonly used measure of central tendency.

The concept of mean is divided into three parts:-

  • Arithmetic mean.
  • Geometric mean.
  • Harmonic mean.

Mainly the mean refers to an arithmetic mean.

  • Arithmetic Mean (A.M.):-

The arithmetic mean of a set of observations is defined to be their sum, divided by the number of observations.

For n observations (x1, x2, …, xn), the arithmetic mean is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

  • Weighted A.M.

For a frequency distribution where the values $x_i$ have frequencies $f_i$ (i = 1, 2, 3, …), the weighted mean is $\bar{x} = \frac{\sum_i f_i x_i}{\sum_i f_i}$.

  • Application of A.M.:-

Let’s calculate the mean of Age, Height and Weight from the given data.

Name       Sex   Age   Height   Weight
Ritesh     M     24    6.9      112.5
Heena      F     23    5.65     84
Kritika    F     23    6.53     98
Anuradha   F     24    6.28     102.5
Gaurav     M     24    6.35     102.5
Prakash    M     22    5.73     83
Aarti      F     22    5.98     84.5
Meena      F     25    6.25     112.5
Utkarsh    M     23    6.25     84
Chirag     M     22    5.9      99.5
Neha       F     21    5.13     50.5
Smrita     F     24    6.43     90

Calculating Mean in Python:

Therefore,

Age (Mean) = 23.08333333, Height (Mean) = 6.12, weight(Mean) = 85.625
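The code screenshots are not reproduced here, so the following sketch shows how such means can be computed with the standard `statistics` module. It is restricted to the Age and Height columns, with the values read from the table above. The list names are illustrative.

```python
from statistics import mean

# Age and Height columns, read row by row from the table above.
ages = [24, 23, 23, 24, 24, 22, 22, 25, 23, 22, 21, 24]
heights = [6.9, 5.65, 6.53, 6.28, 6.35, 5.73, 5.98, 6.25, 6.25, 5.9, 5.13, 6.43]

# Arithmetic mean: the sum of the observations divided by their number.
print(round(mean(ages), 4))
print(round(mean(heights), 3))
```

The mean age is about 23.0833 and the mean height about 6.115, which the post reports as 6.12.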

Calculating Mean in R:

  • Application of Weighted A.M.:-

The weighted mean is the mean computed with the frequencies as weights.

Data to solve:

Calculate the average price per ton of coal purchased by the industry for the half-year.

Month      Price Per Ton   Tons Purchased
January    Rs. 52.49       26
February   Rs. 62.23       34
March      Rs. 87.26       40
April      Rs. 45.25       54
May        Rs. 78.56       13
June       Rs. 69.25       45

Data to solve:

Month      Price (Rs) Per Ton (x)   Tons Purchased (f)   fx = y (Main Data)
January    52.49                    26                   1364.74
February   62.23                    34                   2115.82
March      87.26                    40                   3490.4
April      45.25                    54                   2443.5
May        78.56                    13                   1021.28
June       69.25                    45                   3116.25
Total      395.04                   N = 212              13551.99

 

The prices are denoted by x (52.49, 62.23, 87.26, 45.25, 78.56, 69.25 [in Rs.]), and their sum is 395.04.

The amounts purchased (frequencies) are denoted by f (26, 34, 40, 54, 13, 45), and their sum is N = 212.

Multiplying each x by its f gives the amount paid each month, denoted by y = fx. The total of these amounts is 13551.99.

Calculate Weighted Mean in R:

Calculate Weighted Mean in Python:

Calculating the weighted mean in R and in Python gives the same result: 13551.99 / 212 = 63.9244811.
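As the original screenshots are missing, the weighted mean can be sketched in plain Python as the sum of fx divided by N. The names `prices`, `tons`, `total_cost` and `total_tons` are illustrative and not from the original post.

```python
# Price per ton (x) and tons purchased (f), January to June.
prices = [52.49, 62.23, 87.26, 45.25, 78.56, 69.25]
tons = [26, 34, 40, 54, 13, 45]

total_cost = sum(x * f for x, f in zip(prices, tons))  # sum of f * x
total_tons = sum(tons)                                 # N, the sum of f

# Weighted mean: total amount paid divided by total tons purchased.
weighted_mean = total_cost / total_tons

print(round(total_cost, 2), total_tons, round(weighted_mean, 7))
```

The totals match the working table (13551.99 and 212), and the weighted mean comes out at about 63.9244811.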

Want to know more about the nature of data? Keen to perform high-end statistical analysis using Python and R? Follow DexLab Analytics, an excellent Python training center in Gurgaon, India. Our team of consultants will help you learn the basics of R and Python in the easiest manner possible.

 


The Rising Popularity of Python in Data Science


Python is the preferred programming language for data scientists. They need an easy-to-use language that has decent library availability and great community participation. Projects that have inactive communities are usually less likely to maintain or update their platforms, which is not the case with Python.

What exactly makes Python so ideal for data science? We have examined why Python is so prevalent in the booming data science industry, and how you can use it in your big data and machine learning projects.


Why Python is Dominating?

Python has long been known as a simple programming language to pick up, from a syntax point of view, anyway. Python also has an active community with a vast selection of libraries and resources. The result? You have a programming platform that makes sense of how to use emerging technologies like machine learning and data science.

Professionals working with data science applications don’t want to be bogged down with complicated programming requirements. They want to use programming languages like Python and Ruby to perform tasks in a hassle-free way.

Ruby is excellent for performing tasks such as data cleaning and data wrangling, along with other data pre-processing tasks. However, it doesn’t feature as many machine learning libraries as Python. This gives Python the edge when it comes to data science and machine learning.

Python also enables developers to roll out programs and get prototypes running, making the development process much faster. Once a project is on its way to becoming an analytical tool or application, it can be ported to more sophisticated languages such as Java or C, if necessary.

Newer data scientists gravitate toward Python because of its ease of use, which makes it accessible.

Why Python is Ideal for Data Science?

Data science involves extracting useful information from massive stores of statistics, registers, and data. These data are usually unsorted and difficult to correlate with any meaningful accuracy. Machine learning can make connections between disparate datasets but requires serious computational sophistication and power.

Python fills this need by being a general-purpose programming language. It allows you to create CSV output for easy data reading in a spreadsheet. Alternatively, it can produce more complicated file outputs that machine learning clusters can ingest for computation.


Consider the Following Example:

Weather forecasts rely on past readings from a century’s worth of weather records. Machine learning can help make more accurate predictive models based on past weather events. Python can do this because it is lightweight and efficient at executing code, but it is also multi-functional. Also, Python can support object-orientated and functional styles, meaning it can find an application anywhere.

There are now over 70,000 libraries in the Python Package Index, and that number continues to grow. As previously mentioned, Python offers many libraries geared toward data science. A simple Google search reveals plenty of Top 10 Python libraries for data science lists. Arguably, the most popular data analysis library is an open-source library called pandas. It is a high-performance set of applications that make data analysis in Python a much simpler task.

No matter what data scientists are looking to do with Python, be it predictive causal analytics or prescriptive analytics, Python has the toolset to perform a variety of powerful functions. It’s no wonder why data scientists embrace Python.

If you are interested in Python Certification Training in Delhi, drop by DexLab Analytics. With a team of expert consultants, we provide state-of-the-art Machine Learning Using Python training courses for aspiring candidates. Check out our course itinerary for more information.

 


Demand for Data Analysts is Skyrocketing – Explained


The salary of analytics professionals exceeds that of software engineers by more than 26%. The wave of big data analytics is taking the world by storm. If you follow the latest studies, you will discover that there has been a prominent growth in median salary across several experience levels in the past three years (2016 to 2018). In 2019, the average analytics salary stands at 12.6 lakh per annum.

The key takeaway is that the salary structure of analytics professionals continues to beat other tech-related job roles. In fact, data analysts are found to out-earn their Java counterparts by nearly 50% in India alone. A recent survey provides an encompassing view of base and compensation salaries in data science, along with median salaries across diverse job categories, regions, education profiles, experience, tools and skills.

In this regard, a spokesperson of a prominent data analytics learning institute said, “The demand for AI skills is expected to increase rapidly, which is also reflected by the fact that AI engineers command a higher salary than peers.” She further added, “Many of our clients have realized that investing in data-driven skills at the leadership level is a determining factor for the success of digital and AI initiatives in the organization. With the increasing adoption of digital technologies, we expect an enduring growth of Data Science and AI initiatives to offer exciting and lucrative career options to new age professionals.”

Over time, we are witnessing how markets are evolving while the demand for skilled data scientists follows an upward trend. It is not only the technology firms that are posting job offers; the change is also evident across industries such as retail, medical and CPG, amongst others. These sectors are enhancing their analytical capabilities, implying an automatic increase in the number of data-centric jobs and in the recruitment of data scientists.

Points to Consider:

  • At the start of their careers, nearly 76% of data analysts earn a 6-lakh figure per annum.
  • The average analytics salary observed in 2018-19 is 12.6 lakh.
  • In terms of analytics careers, Mumbai offers the highest compensation at 13.7 lakh yearly, followed by Bangalore at 13 lakh.
  • Mid-level professionals proficient in data analytics are more in demand.
  • Knowing Python is an added advantage; Python Programming training will help you earn more. Expect a package of 15.1 lakh.
  • Nevertheless, we often see a pay disparity between female data scientists and their male counterparts. While a woman’s take-home salary is 9.2 lakh, a man in the same designation and profession earns 13.7 lakh per annum.


To conclude, the demand for data science skills is skyrocketing. If you want to enter this flourishing job market, this is the best time! Enroll in a good data analyst course in Delhi and mould your career in the shape of success! DexLab Analytics is a top-notch data analyst training institute that offers a plethora of in-demand skill training courses. Reach us for more.

 

This article has been sourced from www.tribuneindia.com/news/jobs-careers/data-analytics-professionals-ride-the-big-data-wave/759602.html

 


Bayesian Thinking & Its Underlying Principles


In the previous blog on Bayes’ Theorem, we left off at an interesting junction where we just touched upon the ideas on prior odds ratio, likelihood ratio and the resulting Posterior Odds Ratio. However, we didn’t go into much detail of what it means in real life scenarios and how should we use them.

In this blog, we will introduce the powerful concept of “Bayesian Thinking” and explain why it is so important. Bayesian Thinking is a practical application of the Bayes’ Theorem which can be used as a powerful decision-making tool too!

We’ll consider an example to understand how Bayesian Thinking is used to make sound decisions.

For the sake of simplicity, let’s imagine a management consultation firm hires only two types of employees. Let’s say, IT professionals and business consultants. You come across an employee of this firm, let’s call him Raj. You notice something about Raj instantly. Raj is shy. Now if you were asked to guess which type of employee Raj is what would be your guess?

If your guess is that Raj is an IT guy based on shyness as an attribute, then you have already fallen for one of the inherent cognitive biases. We’ll talk more about it later. But what if it can be proved Raj is actually twice as likely to be a Business Consultant?!

This is where Bayesian Thinking allows us to keep account of priors and likelihood information to predict a posterior probability.

The inherent cognitive bias you fell for is called Base Rate Neglect. Base Rate Neglect occurs when we do not take into account the underlying proportion of a group in the population. Put simply, what is the proportion of IT professionals to Business consultants in a business management firm? It would be fair to assume that for every 1 IT professional, the firm hires 10 business consultants.

Another assumption could be made about shyness as an attribute. It would be fair to assume shyness is more common in IT professionals than in business consultants. Let’s assume 75% of IT professionals are in fact shy, compared with about 15% of business consultants.

Think of the proportion of employees in the firm as the prior odds. Now, think of the shyness as an attribute as the Likelihood. The figure below demonstrates when we take a product of the two, we get posterior odds.

Plugging in the values shows us that Raj is actually twice as likely to be a Business consultant. This proves to us that by applying Bayesian Thinking we can eliminate bias and make a sound judgment.
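The arithmetic behind this conclusion can be sketched as follows, using only the assumptions stated in the example (1 IT professional per 10 business consultants, 75% versus 15% shyness). The variable names are illustrative, and `fractions.Fraction` is used simply to keep the odds exact.

```python
from fractions import Fraction

# Assumptions stated in the example above.
prior_odds = Fraction(1, 10)    # IT professionals : business consultants
p_shy_it = Fraction(75, 100)    # share of IT professionals who are shy
p_shy_bc = Fraction(15, 100)    # share of business consultants who are shy

# Likelihood ratio: how much more common shyness is among IT professionals.
likelihood_ratio = p_shy_it / p_shy_bc

# Posterior odds (IT : business consultant, given shyness) = prior odds x likelihood ratio.
posterior_odds = prior_odds * likelihood_ratio

# Convert the odds into the probability that a shy employee is a business consultant.
p_bc_given_shy = 1 / (1 + posterior_odds)

print(likelihood_ratio, posterior_odds, p_bc_given_shy)
```

The likelihood ratio is 5, the posterior odds are 1/2 (one IT professional for every two business consultants among shy employees), and the probability that Raj is a business consultant is 2/3.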

Now, it would be unrealistic for you to try drawing a diagram or quantifying assumptions in most cases. So, how do we learn to apply Bayesian Thinking without quantifying our assumptions? It turns out we can, if we understand the underlying principles of Bayesian Thinking.

Principles of Bayesian Thinking

Rule 1 – Remember your priors!

As we saw earlier, it is easy to fall into the base rate neglect trap. The underlying proportion in the population is often neglected, and we as human beings have a tendency to focus on just the attribute. Think of priors as the underlying or background knowledge, which is essentially an additional bit of information on top of the likelihood. The product of the priors and the likelihood determines the posterior odds/probability.

Rule 2 – Question your existing belief

This is somewhat tricky and counter-intuitive to grasp, but question your priors. Present yourself with a hypothesis: what if your priors were irrelevant or even wrong? How would that affect your posterior probability? Would the new posterior probability be any different from the existing one?

Rule 3 – Update incrementally

We live in a dynamic world where evidence and attributes are constantly shifting. While it is okay to believe in well-tested priors and likelihoods in the present moment, always ask: do my priors and likelihoods still hold true today? In other words, update your beliefs incrementally as new information or evidence surfaces. A good example of this would be the shifting sentiments of the financial markets. What holds true today may not hold tomorrow. Hence, the priors and likelihoods must also be incrementally updated.

Conclusion

In conclusion, Bayesian Thinking is a powerful tool to hone your judgment skills. Developing Bayesian Thinking essentially tells us what to believe in and how confident we are about that belief. It also allows us to shift our existing beliefs in light of new information or as the evidence unfolds. Hopefully, you now have a better understanding of Bayesian Thinking and why it is so important.

On that note, we would like to say DexLab Analytics is a premium data analytics training institute located in the heart of Delhi NCR. We provide intensive training on a plethora of data-centric subjects, including data science, Python and credit risk analytics. Stay tuned for more such interesting blogs and updates!

About the Author: Nish Lau Bakshi is a professional data scientist with an actuarial background and a passion to use the power of statistics to tackle various pressing, daily life problems.

 

