analytics training institute in Delhi Archives - Page 2 of 7 - DexLab Analytics | Big Data Hadoop SAS R Analytics Predictive Modeling & Excel VBA

## Statistical Application in R & Python: EXPONENTIAL DISTRIBUTION

In this blog, we will explore the Exponential distribution. We will begin by questioning the “why” behind the exponential distribution instead of just looking at its PDF formula to calculate probabilities. If we can understand the “why” behind every distribution, we will have a head start in figuring out its practical uses in our everyday business situations.

Much could be said about the Exponential distribution. It is an important distribution used quite frequently in data science and analytics. Besides, it is also a continuous distribution with one parameter “λ” (Lambda). Lambda as a parameter in the case of the exponential distribution represents the “rate of something”. Essentially, the exponential distribution is used to model the decay rate of something or “waiting times”.

For instance, you might be interested in predicting answers to the below-mentioned situations:

• The amount of time until the customer finishes browsing and actually purchases something in your store (success).
• The amount of time until the hardware on AWS EC2 fails (failure).
• The amount of time you need to wait until the bus arrives (arrival).

In all of the above cases if we can estimate a robust value for the parameter lambda, then we can make the predictions using the probability density function for the distribution given below:

#### Application:-

Assume that a telemarketer spends on “average” roughly 5 minutes on a call. Imagine they are on a call right now. You are asked to find out the probability that this particular call will last for 3 minutes or less.

Below we have illustrated how to calculate this probability using Python and R.

#### Calculate Exponential Distribution in R:

In R we calculate exponential distribution and get the probability of mean call time of the tele-caller will be less than 3 minutes instead of 5 minutes for one call is 45.11%.This is to say that there is a fairly good chance for the call to end before it hits the 3 minute mark.

#### Calculate Exponential Distribution in Python:

We get the same result using Python.

#### Conclusion:

We use exponential distribution to predict the amount of waiting time until the next event (i.e., success, failure, arrival, etc).

Here we try to predict that the probability of the mean call time of the telemarketer will be less than 3 minutes instead of 5 minutes for one call, with the help of Exponential Distribution. Similarly, the exponential distribution is of particular relevance when faced with business problems that involve the continuous rate of decay of something. For instance, when attempting to model the rate with which the batteries will run out.

Hopefully, this blog has enabled you to gather a better understanding of the exponential distribution. For more such interesting blogs and useful insights into the technologies of the age, check out the best Analytics Training institute Gurgaon, with extensive Data Science Courses in Gurgaon and Data analyst course in Delhi NCR.

Lastly, let us know your opinions about this blog through your comments below and we will meet you with another blog in our series on data science blogs soon.

#### Interested in a career in Data Analyst?

To learn more about Data Analyst with R Course – Enrol Now.
To learn more about Big Data Course – Enrol Now.

To learn more about Machine Learning Using Python and Spark – Enrol Now.
To learn more about Data Analyst with SAS Course – Enrol Now.
To learn more about Data Analyst with Apache Spark Course – Enrol Now.
To learn more about Data Analyst with Market Risk Analytics and Modelling Course – Enrol Now.

## Alteryx is Inclined to Make Things Easy

Alteryx Analytics is primarily looking to ease the usability of the platform in all of the updates that are yet to come. The esteemed data analytics platform is concentrating on reducing the complexities to attract more users and thus, widen their age-old user base beyond that of the data scientists and data analytics professionals.

Alteryx is headquartered in Irvine, California. It was founded as SRC LLC in 1997 and comes with a suite of four tools to help the world of data scientists and data analysts to manage and interpret data easily. Alteryx Connect, Alteryx Designer, Alteryx Promote and Alteryx Server are the main components of the analytics platform of Alteryx. Thus, it is worth mentioning that the Alteryx Certification Course is a must if you are looking to make a career out of data science/data analytics.

#### A Quick Glance at the Recent Updates

The reputed firm launched a recent version of Alteryx 2019.3, in October, and is likely to release the Alteryx 2019.4 as a successor to it. The latter is scheduled for a December release.

#### What’s in the Update?

Talking about the all-new version Alteryx 2019.3, Ashley Kramer, senior vice president of product management at Alteryx, said that the latest version promises 25 new and upgraded features, all of them focussing on the user-friendliness of the platform at large.

One of the prominent features of the new version is a significant decrease in the total number of clicks that a user will take to arrive at the option of visualizing data to make analytic decisions.

Data profiling helps the users to visualize the data while they are working with it. Here, Alteryx discovered a painless way to work with data by modeling the bottom of the screen in a format similar to that of MS Excel.

All of these changes and additions are done keeping in mind the features that the “customers had been asking for,” according to Kramer.

Now, with the December update, which will come with an enhanced mapping tool, the Alteryx analytics will strive to further lower the difficulties surrounding the platform.

If you are interested in knowing all the latest features, it is better to join one of the finest AlterYX Training institutes in Delhi NCR, with exhaustive Analytics Courses in Delhi NCRalong with other demanding courses like Python for Data Analysis, R programming courses in Gurgaonmatchless course of Big Data, Data Analytics and more.

The blog has been sourced fromsearchbusinessanalytics.techtarget.com/news/252474294/Alteryx-analytics-platform-focuses-on-ease-of-use

#### Interested in a career in Data Analyst?

To learn more about Data Analyst with R Course – Enrol Now.
To learn more about Big Data Course – Enrol Now.

To learn more about Machine Learning Using Python and Spark – Enrol Now.
To learn more about Data Analyst with SAS Course – Enrol Now.
To learn more about Data Analyst with Apache Spark Course – Enrol Now.
To learn more about Data Analyst with Market Risk Analytics and Modelling Course – Enrol Now.

## How to Start a Successful Data Science Career?

The most common question we come across in DexLab Analytics HQ is how to take a step into the world of analytics and data science. Of course, grabbing a data science job isn’t easy, especially when there is so much hype going around. This is why we have put together top 5 ways to bag the hottest job in town. Follow these points and swerve towards your dream career.

#### Enhance Your Skills

At present, LinkedIn in the US alone have 24,697 vacant data scientist positions. Python, SQL and R are the most common skills in demand followed by Tensorflow, Jupyter Notebooks and AWS. Gaining statistical literacy is the best way to grab these hot positions but for that, you need hands-on training from an expert institute.

If interested, you can check out analytics courses in Delhi NCR delivered by DexLab Analytics. They can help you stay ahead of the curve.

#### Create an Interesting Portfolio

A portfolio filled with machine learning projects is the best bet. Companies look for candidates who have prior work experience or are involved in data science projects. Your portfolio is the potential proof that you are capable enough to be hired. Thus, make it as attractive as possible.

Include projects that qualify you to be a successful data scientist. We would recommend including a programming language of your choice, your data visualization skill and your ability to employ SQL.

#### Get Yourself a Website

Want to standout from the rest? Build up your website, create a strong online presence and continuously add and update your Kaggle and GitHub profile to exhibit your skills and command over the language. Profile showcasing is of utmost importance to get recognized by the recruiters. A strong online presence will not only help you fetch the best jobs but also garner the attention of the leads of various freelance projects.

#### Be Confident and Apply for Jobs You Are Interested In

It doesn’t matter if you possess the skills or meet the job requirements mentioned on the post, don’t stop applying for the jobs that interest you. You might not know every skill given on a job description. Follow a general rule, if you qualify even half of the skills, you should apply.

However, while job hunting, make sure you contact recruiters, well-versed in data science and boost your networking skills. We would recommend you visit career fairs, approach family, friends or colleagues and scroll through company websites. These are the best ways to look for data science jobs.

#### Improve Your Communication Skills

One of the key skills of data scientists is to communicate insights to different users and stakeholders. Since data science projects run across numerous teams and insights are often shared across a large domain, hence superior communication skill is an absolute must-have.

Want more information on how to become a data scientist? Follow DexLab Analytics. We are a leading data analyst training institute in Delhi offering in-demand skill training courses at affordable prices.

The blog has been sourced fromwww.forbes.com/sites/louiscolumbus/2019/04/14/how-to-get-your-data-scientist-career-started/#67fdbc0e7e5c

#### Interested in a career in Data Analyst?

To learn more about Data Analyst with R Course – Enrol Now.
To learn more about Big Data Course – Enrol Now.

To learn more about Machine Learning Using Python and Spark – Enrol Now.
To learn more about Data Analyst with SAS Course – Enrol Now.
To learn more about Data Analyst with Apache Spark Course – Enrol Now.
To learn more about Data Analyst with Market Risk Analytics and Modelling Course – Enrol Now.

## 5 Full-Stack Data Science Projects You Need to Add to Your Resume Now

Small or big, most of the organizations seek aspiring data scientists. The reason being this new breed of data experts helps them stay ahead of the curve and churns out industry-relevant insights.

It hardly matters if you are a fresher or a college dropout, with the right skill-set and basic understanding of nuanced concepts of machine learning, you are good to go and pursue a lucrative career in data science with a decent pay scale.

However, whenever a company hires a new data scientist, the former expects that the candidate had some prior work experience or at least have been a part in a few data science-related projects. Projects are the gateway to hone your skills and expertise in any realm.  In such projects, a budding data scientist not only learns how to develop a successful machine learning model but also solves an array of critical tasks, which needs to be fulfilled single-handedly. The tasks include preparing a problem sheet, crafting a suitable solution to the problem, collect and clean data and finally evaluate the quality of the model.

Below, we have charted down top 5 full-stack data science projects that will boost your efforts of preparing an interesting resume.

#### Face Detection

In the last decade, face detection gained prominence and popularity across myriad industry domains. From smartphones to digitally unlocking your house door, this robust technology is being used at homes, offices and everywhere.

Project: Real-Time Face Recognition

Tools: OpenCV, Python

Algorithms: Convolution Neural Network and other facial detection algorithms

#### Spam Detection

Today, the internet plays a crucial role in our lives. Nevertheless, sharing information across the internet is no mean feat. Communication systems, such as emails, at times, contain spam, which results in decreased employee productivity and needs to be avoided.

Project: Spam Classification

Tools: Python, Matplotlib

Algorithm: NLTK

#### Sentiment Analysis

If you are from the Natural Language Processing and Machine Learning domain, sentiment analysis must have been the hot-trend topic. All kinds of organizations use this technology to understand customer behaviors and frame strategies. It works by combining NLP and suave machine learning technologies.

Project: Twitter Sentiment Analysis

Tools: NLTK, Python

Algorithms: Sentiment Analysis

#### Time Series Prediction

Making predictions regarding the future is known as extrapolation in the classical handling of time series data. Modern researchers, however, prefer to call it time series forecasting. It is a revolutionary phenomenon of taking models perfect on historical data and using them for future prediction of observations.

Project: Web Traffic Time Series Forecasting

Tools: GCP

Algorithms: Long short-term memory (LSTM), Recurrent Neural Networks (RNN) and ARIMA-based techniques

#### Recommender Systems

Bigwigs, such as Netflix, Pandora, Amazon and LinkedIn rely on recommender systems. The latter helps users find out new and relevant content and items. In simple terms, recommender systems are algorithms that suggest users meaningful items based on his preferences and requirements.

Project: Youtube Video Recommendation System

Tools: Python, sklearn

Algorithms: Deep Neural Networks, classification algorithms

If you are a budding data scientist, follow DexLab Analytics. We are a premier data science training platform specialized in a wide array of in-demand skill training courses. For more information on data science courses in Gurgaon, feel free to drop by our website today.

The blog has been sourced fromwww.analyticsindiamag.com/5-simple-full-stack-data-science-projects-to-put-on-your-resume

#### Interested in a career in Data Analyst?

To learn more about Data Analyst with R Course – Enrol Now.
To learn more about Big Data Course – Enrol Now.

To learn more about Machine Learning Using Python and Spark – Enrol Now.
To learn more about Data Analyst with SAS Course – Enrol Now.
To learn more about Data Analyst with Apache Spark Course – Enrol Now.
To learn more about Data Analyst with Market Risk Analytics and Modelling Course – Enrol Now.

## Top 6 Data Science Interview Red Flags

Excited to face your first data science interview? Probably, you must have double-checked your practical skills and theoretical knowledge. Technical interviews are tough yet interesting. Cracking them and bagging your dream job is no mean feat.

Thus, to lend you a helping hand, we’ve compiled a nifty list of some common red flags that plague data science interviews. Go through them and decide how to handle them well!

#### Boring Portfolio

Having a monotonous portfolio is not a crime. Nevertheless, it’s the most common allegation against data scientists by the recruiters. Given the scope, you should always exhibit your organizational and communication abilities in an interesting way to the hiring company. A well-crafted portfolio will give you instant recognition, so why not try it!

#### Sloppy Code

Of course, your analytical skills, including coding is going to be put to test during any data science interview. A quick algorithm coding test will bring out the technical value you would add to the company. In such circumstances, writing a clumsy code or a code with too many bugs would be the last thing you want to do. Improving the quality of coding will accelerate your hiring process for sure.

#### Confusion about Job Role

No wonder if you walk up to your interviewer having no idea about your job responsibilities, your expertise and competence will be questionable. The domain of data science includes a lot of closely related job profiles. But, they differ widely in terms of skills and duties. This is why it’s very important to know your field of expertise and the skills your hiring company is looking for.

#### Zero Hands-on Experience

A decent, if not rich, hands-on experience in Machine Learning or Data Science projects is a requisite. Organizations prefer candidates who have some experience. The latter may include data cleaning projects, data-storytelling projects or even end-to-end data projects. So, keep this in mind. It will help you score well in the upcoming data science interview.

#### Lack of Knowledge over Data Science Technicalities

Data analytics, data science, machine learning and AI – are all closely associated with one another. To excel in each of these fields you need to possess high technical expertise. Being technically sound is the key. An interview can go wrong if the recruiter feels you lack command over data science technicalities, even though you have presented an excellent portfolio of projects.

Therefore, you have to be excellent in coding and harbor a vast pool of technical knowledge. Also, be updated with the latest industry trends and robust set of algorithms.

#### Ignoring the Basics

It happens. At times, we fumble while answering some very fundamental questions regarding our particular domain of work. However, once we come out of the interview venue, we tend to know everything. Reason: lack of presence of mind. Therefore, the key is to be confident. Don’t lose your presence of mind in the stifling interview room.

Thus, beware of these drooping gaps; being a victim of these critical objections might keep you away from bagging that dream data analyst job. Instead, work on them and win a certain edge over others while cracking the toughest data science interview session.

#### Note:

If interested in Data Science Courses in Gurgaon, check out DexLab Analytics. We are a premier training platform specialized in in-demand skills, including machine learning using Python, Alteryx and customer analytics. All our courses are industry-relevant and crafted by experts.

The blog has been sourced from upxacademy.com/eleven-most-common-objections-in-data-science-interviews-and-how-to-handle-them

#### Interested in a career in Data Analyst?

To learn more about Data Analyst with R Course – Enrol Now.
To learn more about Big Data Course – Enrol Now.

To learn more about Machine Learning Using Python and Spark – Enrol Now.
To learn more about Data Analyst with SAS Course – Enrol Now.
To learn more about Data Analyst with Apache Spark Course – Enrol Now.
To learn more about Data Analyst with Market Risk Analytics and Modelling Course – Enrol Now.

## Basics of a Two-Variable Regression Model: Explained

In continuation of the previous Regression blog, here we are back again to discuss the basics of a two-variable regression model. To read the first blog from the Regression series, click here www.dexlabanalytics.com/blog/a-regression-line-is-the-best-fit-for-the-given-prf-if-the-parameters-are-ols-estimations-elucidate.

In Data Science, regression models are the major driver to interpret the model with necessary statistical methods, practically as well as theoretically. One, who works extensively with business data metrics, will be able to solve various tough problems with the help of a regression theory. The key insight of the regression models lies in interpreting the fitness of the models. But it differs from the standard machine learning techniques such that, for improvement in the performance of the model being predicted, the major interpretable coefficients are never sacrificed. Thus, a sense in regression models can be considered as the most important tool to be chosen for solving any practical problem.

Let’s consider a simple example to understand regression analysis from scratch. Say, we want to predict the sales of a Softlines eCommerce company for this year during the festivals of Diwali. There are a lot of factors to generate impacts on the sales value, as there are hundreds of factors persisting within the model. We can consider our own judgement to get the impacting factors. Now, here in our model, the value of sales that we want to predict is the dependent variable, whereas the impacting factors are considered as the independent variables. To analyse this model in terms of regression, we need to gather all the information about the independent variables from the past few years, and then act on it according to the regression theory.

Before getting into the core theory, there are some basic assumptions for such a two-variable regression model and they are as follows:

• Variables are linearly related: The variables in a 2-variable Regression Model are linearly related, the linearity being in parameters, though not always in variables, i.e. the power in which the parameters appear should be of 1 only and should not be multiplied or divided by any other parameters. These linearly related variables are basically of two types (i) independent or explanatory variables & (ii) dependent or response variables.
• Variables can be represented graphically: The idea behind this assumption guarantees that observations must be real numbers represented on graph papers.
• Residual term and the estimated value of the variables are uncorrelated.
• Residual terms and explanatory variables are uncorrelated.
• Error variables are uncorrelated with mean 0 & common variance σ2

Now, how can a PRF for expanding an economic relationship between 2 variables be specified?

Well, Population regression function, or more generally, the population regression curve, is defined as the locus of the conditional means of the dependent variables, for a fixed value of the explanatory variables. More simply, it is the curve connecting the means of the sub-populations of Y corresponding to the given values of the regressor X.

Formally, a PRF is the locus of all conditional means of the dependent variables for a given value of the explanatory variables. Thus; the PRF as economic theory would suggest would be:

Where 9(X) is expected to be an increasing function of X, if the conditional expectation is linear in X. then

Hence, for any ith observations:

However, the actual observation for the dependent variable is Yi. Therefore; Yi – E(Y/Xi) = ui, which is the disturbance term or the stochastic term of the Regression Model.

Thus,

…………………… (A)

• is the population regression function and this form of specifying the population regression function is called the stochastic specification of the PRF.

#### Stochastic Specification of the Model:

Yi = α + βXi + ui is referred to as the stochastic specification of the Population Regression Function, where ui is the stochastic or the random disturbance term. It explains everything’s net influence other than X variable on the ith observation. Thus, ui is a surrogate or proxy for all omitted or neglected variables which may affect Y but is not included in the model. The random disturbance term is incorporated into the model with the following assumptions:-

#### Proof:

Taking conditional expectation as both sides, we get:

Hence; E(ui) = 0

cov(ui,uj) = E(ui uj ) = 0 ∀ i ≠ j i.e. the disturbance terms are distributed independently of each other.

#### Proof:

Two variables are said to be independently distributed, or stochastically independent; if the conditional distributions are equal to the corresponding marginal distributions.

Hence; cov(ui,uj )= E(ui uj ) = 0 Thus, no auto correction is present among ui,s i.e. ui,s. s are identically and independently distributed Random Variables. Hence, ui,s are all Random Samples.

#### Proof:

The conditional variance between two error terms can be given as given independence &

All these assumptions can be embodied in the simple statement: ui~N(0,σ2) where ui,s are iid’s ∀ I, Which heads “the ui are independently distributed identically distributed with mean 0 & variance σ2”.

#### Last Notes

The benefits of regression analysis are immense. Today’s business houses literally thrive on such analysis. For more information, follow us at DexLab Analytics. We are a leading data science training institute headquartered in Delhi NCR and our team of experts take pride in crafting the most insight-rich blogs. Currently, we are working on Regression Analysis. More blogs are to be followed on this model. Keep watching!

#### Interested in a career in Data Analyst?

To learn more about Data Analyst with R Course – Enrol Now.
To learn more about Big Data Course – Enrol Now.

To learn more about Machine Learning Using Python and Spark – Enrol Now.
To learn more about Data Analyst with SAS Course – Enrol Now.
To learn more about Data Analyst with Apache Spark Course – Enrol Now.
To learn more about Data Analyst with Market Risk Analytics and Modelling Course – Enrol Now.

## A Regression Line Is the Best Fit for the Given PRF If the Parameters Are OLS Estimations – Elucidate

Regression analysis is extensively used in business applications. It’s one of the most integral statistical techniques that help in estimating the direction and strength between two or more (financial) variables – thus determining a company’s sales and profits over the past few years.

In this blog, we have explained how a regression line is the best fit for a given PRF if the parameters are all OLS estimations.

The OLS estimators for a given regression line has been obtained as: a = y ̅ – bx ̅ and b = (Cov(x,y))/(v(x)). The regression line on the basis of these OLS estimate has been given as: Y ̂_ i-Y ̅ = b(x_i-x ̅ )….. (1)

The regression line (1) constructed above is a function of the least square i.e. the parameters of the regression equation have been selected so that the residual sum of squares is minimized. Thus, the estimators ‘a’ & ‘b’ explains the population parameters, the best relative to any other parameters. Given, the linearity of the parameters, these estimators share the minimum variations with the population parameters, i.e. they explain the maximum variations in the model, in relation to the population parameters, as compared to any other estimators, in a class of unbiased estimators.

Thus, the regression line would be the ‘best fit’ for a given PRF. If ‘a’ & ‘b’ are best linear unbiased estimators for  respectively. Thus, to show ‘best fit’, we need to prove:

1. #### To ‘b’ is Best unbiased estimator for :-

From the OLS estimation; we have ‘b’ as:

i.e.b is a linear combination of w’is & y’is.

Hence; ‘b’ is a linear estimator for β. Therefore, the regression line would be linear in parameters as far as ‘b’ is concerned.

Now,

Let us test for the prevalence of this conditions:

For unbiasedness, we must have :- E(b)=β. To test this, we take expectation on both sides of (3) & get:

From (1) & (4) we can say that ‘b’ is a linear unbiased estimator for β.

To check whether ‘b’ is the best estimator or not we need to check whether it has the minimum variance in a class of linear unbiased estimator.

Thus, we need to calculate the variance for ‘b’ & show that it is the minimum in a class of unbiased estimations. But, first, we need to calculate the variance for ‘b’.

Now; we need to construct another linear unbiased estimator and find its variance.

Let another estimator be: b^*=∑ci yi….(6)  For unbiasedness ∑ci =0,∑cixi =1.

Now; from (6) we get:

∴b* is an unbiased estimator for  Now; the variance for  can be calculated as:-

Now;

Hence; from (9) we can say V(b) is the least among a class of unbiasedness estimators.

Therefore, ‘b’ is the best linear unbiased estimator for  in a class of linear unbiased estimators.

1. #### To prove ‘a’ is the best linear unbiased estimator for α:-

Form the OLS estimation we have ‘a’ as:-

Here; ‘b’ is a function of Y and Y is a linear function of X(orUi).

‘a’ is also a linear function of Y. i.e. has linearity.

There, ‘a’ is a linear estimator for   ……. (11)

Now, for ‘a’ to be an unbiased estimator; we must have From (10) we have:-

Taking expectations on both sides of the equation; we get:

Therefore, ‘a’ is an unbiased estimator for  ……… (12)

From (11) & (12) ‘a’ is a linear unbiased estimator for

Now, if ‘a’ is to be the best estimator for then is most have the minimum variance. Thus; we first need to calculate the variance of ‘a’.

Now,

Now; let us consider an estimator in the class of linear unbiased estimator.

Further we know,

Now;

Hence;

Now;

Therefore;

Hence; from (16) we can say that is the Min Variance Unbiased estimator in a class of unbiased estimator.

Hence; we can now safely conclude that a regression line composed of OLS estimators is the ‘best fit’ line for a given PRF, compared to any other estimator.

Thus, the best-fit regression line can be depicted as

Thus, a regression line is the best fit for a given PRF if the estimators are OLS.

#### End Notes

The beauty and efficiency of Regression method of forecasting never fail to amaze us. The way it crunches the data to help make better decisions and improve the current position of the business is incredible. If you are interested in the same, follow us at DexLab Analytics. A continues blog series on regression model and analysis is upcoming. Watch this space for more.

DexLab Analytics offers premium data science courses in Gurgaon crafted by the experts. After thorough research, each course is prepared keeping student’s needs and industry demands in mind. You can check out our course offerings here.

#### Interested in a career in Data Analyst?

To learn more about Data Analyst with R Course – Enrol Now.
To learn more about Big Data Course – Enrol Now.

To learn more about Machine Learning Using Python and Spark – Enrol Now.
To learn more about Data Analyst with SAS Course – Enrol Now.
To learn more about Data Analyst with Apache Spark Course – Enrol Now.
To learn more about Data Analyst with Market Risk Analytics and Modelling Course – Enrol Now.

## Bayesian Thinking & Its Underlying Principles

In the previous blog on Bayes’ Theorem, we left off at an interesting junction where we just touched upon the ideas on prior odds ratio, likelihood ratio and the resulting Posterior Odds Ratio. However, we didn’t go into much detail of what it means in real life scenarios and how should we use them.

In this blog, we will introduce the powerful concept of “Bayesian Thinking” and explain why it is so important. Bayesian Thinking is a practical application of the Bayes’ Theorem which can be used as a powerful decision-making tool too!

We’ll consider an example to understand how Bayesian Thinking is used to make sound decisions.

For the sake of simplicity, let’s imagine a management consultation firm hires only two types of employees. Let’s say, IT professionals and business consultants. You come across an employee of this firm, let’s call him Raj. You notice something about Raj instantly. Raj is shy. Now if you were asked to guess which type of employee Raj is what would be your guess?

If your guess is that Raj is an IT guy based on shyness as an attribute, then you have already fallen for one of the inherent cognitive biases. We’ll talk more about it later. But what if it can be proved Raj is actually twice as likely to be a Business Consultant?!

This is where Bayesian Thinking allows us to keep account of priors and likelihood information to predict a posterior probability.

The inherent cognitive bias you fell for is actually called – Base Rate Neglect. Base Rate Neglect occurs when we do not take into account the underlying proportion of a group in the population. Put it simply, what is the proportion of IT professionals to Business consultants in a business management firm? It would be fair to assume for every 1 IT professional, the firm hires 10 business consultants.

Another assumption could be made about shyness as an attribute. It would be fair to assume shyness is more common in IT professionals as compared to business consultants. Let’s assume, 75% of IT professionals are in fact shy corresponding to about 15% of business consultants.

Think of the proportion of employees in the firm as the prior odds. Now, think of the shyness as an attribute as the Likelihood. The figure below demonstrates when we take a product of the two, we get posterior odds.

Plugging in the values shows us that Raj is actually twice as likely to be a Business consultant. This proves to us that by applying Bayesian Thinking we can eliminate bias and make a sound judgment.

Now, it would be unrealistic for you to try drawing a diagram or quantifying assumptions in most of the cases. So, how do we learn to apply Bayesian Thinking without quantifying our assumptions? Turns out we could, if we understood what are the underlying principles of Bayesian Thinking are.

#### Rule 1 – Remember your priors!

As we saw earlier how easy it is to fall for the base rate neglect trap. The underlying proportion in the population is often times neglected and we as human beings have a tendency to just focus on just the attribute. Think of priors as the underlying or the background knowledge which is essentially an additional bit of information in addition to the likelihood. A product of the priors together with likelihood determines the posterior odds/probability.

#### Rule 2 – Question your existing belief

This is somewhat tricky and counter-intuitive to grasp but question your priors. Present yourself with a hypothesis what if your priors were irrelevant or even wrong? How will that affect your posterior probability? Would the new posterior probability be any different than the existing one if your priors are irrelevant or even wrong?

#### Rule 3 – Update incrementally

We live in a dynamic world where evidence and attributes are constantly shifting. While it is okay to believe in well-tested priors and likelihoods in the present moment. However, always question does my priors & likelihood still hold true today? In other words, update your beliefs incrementally as new information or evidence surfaces. A good example of this would be the shifting sentiments of the financial markets. What holds true today, may not tomorrow? Hence, the priors and likelihoods must also be incrementally updated.

#### Conclusion

In conclusion, Bayesian Thinking is a powerful tool to hone your judgment skills. Developing Bayesian Thinking essentially tells us what to believe in and how much confident you are about that belief. It also allows us to shift our existing beliefs in light of new information or as the evidence unfolds. Hopefully, you now have a better understanding of Bayesian Thinking and why is it so important.

On that note, we would like to say DexLab Analytics is a premium data analytics training institute located in the heart of Delhi NCR. We provide intensive training on a plethora of data-centric subjects, including data science, Python and credit risk analytics. Stay tuned for more such interesting blogs and updates!

About the Author: Nish Lau Bakshi is a professional data scientist with an actuarial background and a passion to use the power of statistics to tackle various pressing, daily life problems.

#### Interested in a career in Data Analyst?

To learn more about Data Analyst with R Course – Enrol Now.
To learn more about Big Data Course – Enrol Now.

To learn more about Machine Learning Using Python and Spark – Enrol Now.
To learn more about Data Analyst with SAS Course – Enrol Now.
To learn more about Data Analyst with Apache Spark Course – Enrol Now.
To learn more about Data Analyst with Market Risk Analytics and Modelling Course – Enrol Now.

## The Almighty Central Limit Theorem

The Central Limit Theorem (CLT) is perhaps one of the most important results in all of the statistics. In this blog, we will take a glance at why CLT is so special and how it works out in practice. Intuitive examples will be used to explain the underlying concepts of CLT.

First, let us take a look at why CLT is so significant. Firstly, CLT affords us the flexibility of not knowing the underlying distribution of any data set provided if the sample is large enough. Secondly, it enables us to make “Large sample inference” about the population parameters such as its mean and standard deviation.

The obvious question anybody would be asking themselves is why it is useful not to know the underlying distribution of a given data set?

To put it simply in real life, often times than not the population size of anything will be unknown. Population size here refers to the entire collection of something, like the exact number of cars in Gurgaon, NCR at any given day. It would be very cumbersome and expensive to get a true estimate of the population size. If the population size is unknown its underlying distribution will be known too and so will be its standard deviation. Here, CLT is used to approximate the underlying unknown distribution to a normal distribution. In a nutshell, we don’t have to worry about knowing the size of the population or its distribution. If the sample sizes are large enough, i.e. – we have a lot of observed data, it takes the shape of a symmetric bell-shaped curve.

Now let’s talk about what we mean by “Large sample inference”. Imagine slicing up the data into ‘n’ number of samples as below:

Now, each of these samples will have a mean of their own.

Therefore, effectively the mean of each sample is a random variable which follows the below distribution:

Imagine, plotting each of the sample mean on a line plot, and as “n”, i.e. the number of samples goes to infinity or a large number the distribution takes a perfect bell-shaped curve, i.e – it tends to a normal distribution.

Large sample inferences could be drawn about the population from the above distribution of x̅. Say, if you’d like to know the probability that any given sample mean will not exceed quantity or limit.

The Central Limit Theorem has vast application in statistics which makes analyzing very large quantities easy through a large enough sample. Some of these we will meet in the subsequent blogs.

Try this for yourself: Imagine the average number of cars transiting from Gurgaon in any given week is normally distributed with the following parameter . A study was conducted which observed weekly car transition through Gurgaon for 4 weeks. What is the probability that in the 5th week number of cars transiting through Gurgaon will not exceed 113,000?

If you liked this blog, then do please leave a comment or suggestions below.

About the Author: Nish Lau Bakshi is a professional data scientist with an actuarial background and a passion to use the power of statistics to tackle various pressing, daily life problems.

About the Institute: DexLab Analytics is a premier data analytics training institute headquartered in Gurgaon. The expert consultants working here craft the most industry-relevant courses for interested candidates. Our technology-driven classrooms enhance the learning experience.

#### Interested in a career in Data Analyst?

To learn more about Data Analyst with R Course – Enrol Now.
To learn more about Big Data Course – Enrol Now.

To learn more about Machine Learning Using Python and Spark – Enrol Now.
To learn more about Data Analyst with SAS Course – Enrol Now.
To learn more about Data Analyst with Apache Spark Course – Enrol Now.
To learn more about Data Analyst with Market Risk Analytics and Modelling Course – Enrol Now.

+91 931 572 5902