
## Time Series Analysis & Modelling with Python (Part II) – Data Smoothing

Data smoothing is done to better understand the hidden patterns in the data. In non-stationary processes it is very hard to forecast the data because the variance changes over time, so data smoothing techniques are used to smooth out the irregular roughness and reveal a clearer signal.

In this segment we will discuss two of the most important data smoothing techniques:

• Moving average smoothing
• Exponential smoothing

Moving average smoothing

Moving average is a technique in which subsets of the original data are created and the average of each subset is taken. Smoothing the data this way fills in values between observations and makes the trend over a period of time easier to see.

Let's take an example to better understand the problem.

Suppose we have price data observed over a period of time, and it is non-stationary, so the trend is hard to recognize.

| QTR (quarter) | Price |
|---------------|-------|
| 1 | 10 |
| 2 | 11 |
| 3 | 18 |
| 4 | 14 |
| 5 | 15 |
| 6 | ? |

In the above data we don’t know the value of the 6th quarter.

fig (1): Price plotted over the quarters.

The plot above shows no clear trend in the data, so to better understand the pattern we calculate the moving average over three quarters at a time. This gives us smoothed in-between values as well as the missing value for the 6th quarter.

To find the missing value for the 6th quarter we use the previous three quarters' data, i.e.

MAS = (18 + 14 + 15) / 3 = 15.7

| QTR (quarter) | Price |
|---------------|-------|
| 1 | 10 |
| 2 | 11 |
| 3 | 18 |
| 4 | 14 |
| 5 | 15 |
| 6 | 15.7 |

For the 4th quarter: MAS = (10 + 11 + 18) / 3 = 13

For the 5th quarter: MAS = (11 + 18 + 14) / 3 = 14.33

| QTR (quarter) | Price | MAS (Price) |
|---------------|-------|-------------|
| 1 | 10 | 10 |
| 2 | 11 | 11 |
| 3 | 18 | 18 |
| 4 | 14 | 13 |
| 5 | 15 | 14.33 |
| 6 | 15.7 | 15.7 |

fig (2): Price and its three-quarter moving average.

In the above graph we can see that after the 3rd quarter there is an upward-sloping trend in the data.
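The worked example above can be reproduced in plain Python (`ma_forecast` is a hypothetical helper name, not from the original post): the forecast for quarter t is the average of the previous three observations.

```python
prices = [10, 11, 18, 14, 15]  # prices for quarters 1-5

def ma_forecast(values, window=3):
    """One-step-ahead moving-average forecasts: the forecast for
    period t is the mean of the previous `window` observations."""
    return [sum(values[i - window:i]) / window
            for i in range(window, len(values) + 1)]

print([round(f, 2) for f in ma_forecast(prices)])
# forecasts for quarters 4, 5 and 6: [13.0, 14.33, 15.67]
```

Note that 15.67 rounds to the 15.7 shown in the table above.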

Exponential Data Smoothing

In this method a weight α, which lies between 0 and 1, is given to the most recent observation, and as observations grow more distant their weights decrease exponentially.

The weight is chosen on the basis of how the data behaves: if the data shows little movement we choose a value of α closer to 0, and if the data has a lot more randomness we choose a value of α closer to 1.

EMA: Ft = Ft-1 + α(At-1 – Ft-1)
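Unrolling the recursion makes the "exponential" in the name concrete: the weight on the observation k periods back is α(1 − α)^k, so with α = 0.5 each step back halves the weight. A quick illustration in Python:

```python
alpha = 0.5
# Unrolling F_t = alpha*A_{t-1} + (1 - alpha)*F_{t-1} shows that the
# observation k periods in the past carries weight alpha * (1 - alpha)**k.
weights = [alpha * (1 - alpha) ** k for k in range(5)]
print(weights)  # [0.5, 0.25, 0.125, 0.0625, 0.03125]
```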

Now let's see a practical example.

For this example we will take α = 0.5.

Taking the same data:

| QTR (quarter) | Price (At) | EMS Price (Ft) |
|---------------|------------|----------------|
| 1 | 10 | 10 |
| 2 | 11 | ? |
| 3 | 18 | ? |
| 4 | 14 | ? |
| 5 | 15 | ? |
| 6 | ? | ? |

To find the forecast for the 6th quarter (F6) we first need the values F2 through F5, and since we do not have an initial value for F1 we use the value of A1. Now let's do the calculation:

F2=10+0.5(10 – 10) = 10

F3=10+0.5(11 – 10) = 10.5

F4=10.5+0.5(18 – 10.5) = 14.25

F5=14.25+0.5(14 – 14.25) = 14.13

F6=14.13+0.5(15 – 14.13)= 14.56

| QTR (quarter) | Price (At) | EMS Price (Ft) |
|---------------|------------|----------------|
| 1 | 10 | 10 |
| 2 | 11 | 10 |
| 3 | 18 | 10.5 |
| 4 | 14 | 14.25 |
| 5 | 15 | 14.13 |
| 6 | 14.56 | 14.56 |

In the above graph we can see that there is now a trend: the data is moving in the upward direction.
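The recursion above is easy to reproduce in code. Below is a minimal sketch in plain Python (`exp_smooth` is a hypothetical helper name); it seeds F1 with A1 and then applies Ft = Ft-1 + α(At-1 − Ft-1), matching the table values 14.13 and 14.56 after rounding:

```python
def exp_smooth(values, alpha):
    """Simple exponential smoothing: F1 is seeded with A1, then
    F_t = F_{t-1} + alpha * (A_{t-1} - F_{t-1})."""
    forecasts = [values[0]]                  # F1 = A1
    for a in values:                         # each observation A_t yields F_{t+1}
        forecasts.append(forecasts[-1] + alpha * (a - forecasts[-1]))
    return forecasts                         # F1 .. F_{n+1}

prices = [10, 11, 18, 14, 15]                # quarters 1-5
print(exp_smooth(prices, 0.5))
# [10, 10.0, 10.5, 14.25, 14.125, 14.5625]
```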

So, with that we come to the end of the discussion on data smoothing methods. Hopefully it helped you understand the topic; for more information you can also watch the video tutorial attached to this blog. The blog is designed and prepared by Niharika Rai, Analytics Consultant, DexLab Analytics. DexLab Analytics offers machine learning courses in Gurgaon. To keep on learning more, follow the DexLab Analytics blog.


## Time Series Analysis Part I

A time series is a sequence of numerical data in which each item is associated with a particular instant in time. Many sets of data appear as time series: a monthly sequence of the quantity of goods shipped from a factory, a weekly series of the number of road accidents, daily rainfall amounts, hourly observations made on the yield of a chemical process, and so on. Examples of time series abound in such fields as economics, business, engineering, the natural sciences (especially geophysics and meteorology), and the social sciences.

• Univariate time series analysis – when we have a single sequence of data observed over time, the analysis is called univariate.
• Multivariate time series analysis – when we have several sets of data for the same sequence of time periods, the analysis is called multivariate.

The data used in time series analysis is a random variable Yt, where t denotes time; such a collection of random variables ordered in time is called a random or stochastic process.

Stationary: a time series is said to be stationary when all the moments of its probability distribution (mean, variance, covariance, etc.) are invariant over time. It becomes quite easy to forecast data in this situation, as the hidden patterns are recognizable, which makes predictions easy.

Non-stationary: A non-stationary time series will have a time varying mean or time varying variance or both, which makes it impossible to generalize the time series over other time periods.

Non-stationary processes can further be explained with the help of random walk models. This theory is usually applied to the stock market and assumes that stock prices are independent of each other over time. There are two types of random walks:
Random walk with drift: the observation to be predicted at time t is equal to last period's value plus a constant or drift (α) and a residual term (εt). It can be written as
Yt = α + Yt-1 + εt
The equation shows that Yt drifts upwards or downwards depending on whether α is positive or negative, and both the mean and the variance increase over time.
Random walk without drift: in this model the value to be predicted at time t is equal to last period's value plus a random shock.
Yt = Yt-1 + εt
Consider the effect of a one-unit shock in each period, with the process starting at some time 0 with a value Y0.
When t=1
Y1= Y0 + ε1
When t=2
Y2= Y1+ ε2= Y0 + ε1+ ε2
In general,
Yt = Y0 + ∑εt
In this case, as t increases the variance increases indefinitely, whereas the mean of Y stays equal to its initial or starting value. Therefore the random walk model without drift is a non-stationary process.
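The derivation above is easy to check with a quick simulation; here is a minimal sketch (the seed, shock distribution and starting value are arbitrary choices) that verifies the closed form Yt = Y0 + ∑εt against the recursion:

```python
import random

random.seed(42)  # arbitrary seed, for reproducibility

y0 = 5.0                                              # starting value Y0
shocks = [random.gauss(0, 1) for _ in range(100)]     # one epsilon_t per period

# Build the walk with the recursion Y_t = Y_{t-1} + e_t
y = [y0]
for e in shocks:
    y.append(y[-1] + e)

# The closed form Y_t = Y_0 + sum(e_1..e_t) agrees with the recursion
assert abs(y[-1] - (y0 + sum(shocks))) < 1e-9
```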

So, with that we come to the end of the discussion on time series. Hopefully it helped you understand time series; for more information you can also watch the video tutorial attached to this blog. DexLab Analytics offers machine learning courses in Delhi. To keep on learning more, follow the DexLab Analytics blog.


## Step-by-step guide to building a career in Data Science

With 2.5 quintillion bytes of data being created every day, companies are scrambling to build models and hire experts to extract the information hidden in massive unstructured datasets, and data scientists have become the most sought-after professionals in the world. Job portals are full of postings looking for data scientists whose resumes have the perfect combination of skill and experience. In a world driven by the data revolution, achieving your big data career dreams needs a little planning and strategizing. So, here is a step-by-step guide for you.

Grabbing a high-paying, skilled data job is not going to be easy; industries will only invest money in individuals with the right skillset. Your job responsibilities will involve wading through tons of unstructured data to find patterns and meaning, making forecasts regarding market trends and customer behavior, and delivering the insights in a presentable format on the basis of which the company will strategize.

So, before you even begin, make sure that you have the tenacity and enthusiasm required for the job. You would need to undergo Data Science using Python training in order to gain the necessary skills and knowledge, and since this is an evolving field you should be ready to constantly upskill yourself and stay updated about the latest developments.

Are you ready? If it’s a resounding yes, then, without wasting any more time let’s get straight to the point and explore the steps that will lead you to become a data scientist.

#### Step 1: Complete education

Before you pursue data science you must complete your bachelor's degree; coming from computer science, applied mathematics or economics could give you a head start. However, you will need to undergo Data Science training after that to acquire the required skillset.

#### Step 2: Gain knowledge of Mathematics and statistics

You do not need a PhD in either, but since both are at the core of data science you must have a good grasp of applied mathematics and statistics. Your tasks will require knowledge of linear algebra, probability and statistics. So, if you happen to hail from a non-science background, your first step is to familiarize yourself with these concepts so that you can sail through the rest of the journey.

#### Step 3: Get ready to do programming

Just like mathematics and statistics, having a grip on a programming language, preferably Python, is essential. Now, why do you need to learn coding? Coding is important because you will work with large datasets comprising mostly unstructured data, and coding will help you clean, organize, read and process that data. The stress is on Python because it is one of the most widely used languages in the data science community and is comparatively easy to pick up.

#### Step 4: Learn Machine Learning

Machine learning plays a crucial role in data science, as it helps find patterns in data and make predictions. Mastering machine learning techniques will enable you to develop algorithms and create automated systems that make predictions in real time. Consider undergoing Machine Learning training in Gurgaon.

#### Step 5: Learn Data Munging, Visualization, and Reporting

As mentioned before, you will mostly be handling unstructured data, which means that in order to process it you must transform it into a format that is easy to work with; data munging helps you achieve that. Data visualization is again a must-have skill for a data scientist, as it allows you to present your findings visually through easy-to-understand graphs and charts, while data reporting lets you prepare and present reports for businesses.

#### Step 6: Be certified

Now that the field has advanced so much, there is a demand for professionals who have undergone a Data Science course. A certification course will upskill you and arm you with industry knowledge. Reputed institutes like DexLab Analytics offer cutting-edge courses such as Python for data science training. The best part of getting trained is that you will be taught everything from scratch, so there is no need to fret if you do not know a programming language; your learning will be aided by hands-on training.

#### Step 7: Practice your skills

You need to test and hone the skills you have acquired. Explore Kaggle, which gives you access to the resources you need and also lets you take part in competitions that further sharpen your abilities. You should also keep practicing by doing projects in order to put the theory into action.

#### Step 8: Work on your soft skills

To be a professional data scientist you must acquire soft skills as well. Along with working on your communication skills, you need to develop problem-solving skills and learn how business organizations function, to understand what will be required of you when you assume the role of a data scientist.

#### Step 9: Get an internship

Now that you have the skills and certification, you need experience to get hired. Build a resume stressing the skills you have acquired and search the job portals to land an internship. It will not only enhance your resume but also give you exposure to real projects; the more projects you handle the better, and you will also learn from the experts there.

#### Step 10: Apply for a job

Once you have gathered enough experience, start applying for full-time positions, as you now have both skill and experience. But do not stop learning once you land a job: this field is growing and many changes will happen, so you have to mold yourself accordingly. Be a part of the community, network with people, keep exploring GitHub and find out what other skills you require.

So, those were the steps you need to follow to build a rewarding career in data science. Job opportunities are plenty, and to grab the right job you should do big data training in Gurgaon. These courses are designed to prepare individuals for the industry, so get ready for an exciting career!


## DexLab Analytics Rated One of The Best Institutes in India

Analytics India Magazine (AIM), one of the foremost journals on big data and AI in India, has rated Dexlab Analytics’ credit risk modelling course one of the best in India and recommended it be taken up to learn the subject in 2020. Dexlab Analytics is on AIM’s list of nine best online courses on the subject.

In an article, the AIM has rated DexLab Analytics as a premier institute offering a robust course in credit risk modelling. Credit risk modelling is “the analysis of the credit risk that helps in understanding the uncertainty that a lender runs before lending money to borrowers”.

The article describes the Dexlab Analytics course as offering learners “an opportunity to understand the measure of central tendency theorem, measures of dispersion, probability theory and probability distribution, sampling techniques, estimation theory, types of statistical tests, linear regression, logistic regression. Besides, you will learn the application of machine learning algorithms such as Decision tree, Random Forest, XGBoost, Support Vector Machine, banking products and processes, uses of the scorecard, scorecard model development, use of scorecard for designing business strategies of a bank, LGD, PD, EAD, and much more.”

The other bodies offering competent courses on the subject on AIM’s list are Udemy, SAS, Redcliffe Training, EDUCBA, Moneyweb CPD HUB, 365 DataScience and DataCamp.

Analytics India Magazine chronicles technological progress in the space of analytics, artificial intelligence, data science and big data by highlighting the innovations, players and challenges shaping the future of India, and by promoting and discussing the ideas of smart, ardent, action-oriented individuals who want to change the world.

Since 2012, Analytics India Magazine has been dedicated to passionately championing and promoting the analytics ecosystem in India. We have been a pre-eminent source of news, information and analysis for the Indian analytics ecosystem, covering opinions, analysis, and insights on key breakthroughs and future trends in data-driven technologies as well as highlighting how they’re being leveraged for future impact.

Dexlab Analytics has been thriving as one of the prominent institutes offering the best selection of courses on Big Data Hadoop, R Programming, Python, Business Analytics, Data Science, Machine Learning, Deep Learning, Data Visualization using Tableau and Excel. Moreover, it aims to achieve Corporate Training Excellence with each training it conducts.


## AI-Smart Assistants: A New Tech Revolution in the Make

2018 has begun. And this year is going to witness a mega revolution in the field of technology – the rise of AI-powered digital assistants. Striking improvements in key technologies, like natural language processing and voice recognition are making smart assistants more productive, helping us use electronic devices just by interacting with them.

Smart voice assistants are going mainstream. From Apple’s Siri to Google’s Assistant to Samsung’s Bixby, superior digital assistants are on a quest to make our lives easier, while taking us a step closer to a world where each one of us will have our own personal, 24/7, all-ears AI assistant to fulfill our every wish and command.

## Microsoft Excel is Revamping Itself and We Can’t be Happier

Good News: Your favorite Microsoft Excel is about to get a lot smarter, and tech-savvy.

How? Thanks to machine learning and improved connection with the outside world.

Recently, Microsoft’s general manager for Office, Jared Spataro, and the company’s director of Office 365 ecosystem marketing, Rob Howard, talked about how Excel will soon understand more about the inputs it is given and pull additional information from the internet as and when necessary.

Today, data lakes are springing up here and there. And with that, the composition structure of data lakes is changing. As more and more data are moving towards cloud, data lakes are shifting focus towards cutting edge sources, like NoSQL, while cloud data warehouses are emerging across hybrid deployments.

A humongous amount of data is churned out on digital platforms each day; IBM says as much as 2.5 quintillion bytes of data is created on a daily basis. This ever-expanding amount of data needs a proper storage system, and for that data lakes have been constructed to hold data in its raw form. In these vast storehouses data remains mostly in its unstructured state, until data scientists pull it out to remodel and transform it into versatile data sets for future use.

## Data Governance: How to Win Over Data and Rule the World

Data is the buzzword. It is conquering the world, but who conquers data: the companies that use them or the servers in which they are stored?

Let’s usher you into the fascinating world of data, and data governance. FYI: the latter is weaving magic around the Business Intelligence community, but to optimize results to the fullest it depends heavily on a single factor, i.e. efficient data management. For that, highly skilled data analysts are called for; to excel at business analytics, opt for the Business Analytics Online Certification by DexLab Analytics. It will feed you the latest trends and meaningful insights surrounding the daunting domain of data analytics.

A substantial part of the Apache project, Hadoop is an open source, Java-based programming software framework that is used for storing data and running applications on different clusters of commodity hardware. Be it any kind of data, Hadoop acts as a massive storage unit backed by gargantuan processing power and an ability to tackle virtually countless tasks and jobs, simultaneously.

In this blogpost, we are going to discuss top 10 Hadoop interview questions – cracking these questions may help you bag the sexiest job of this decade.

#### What are the components of Hadoop?

There are 3 layers in Hadoop and they are as follows:

• Storage layer (HDFS) – also known as the Hadoop Distributed File System, HDFS is responsible for storing various forms of data as blocks of information. It includes the NameNode and DataNodes.
• Batch processing engine (MapReduce) – for parallel processing of large data sets across a standard Hadoop cluster, MapReduce is the key.
• Resource management layer (YARN) – Yet Another Resource Negotiator is the processing framework in the Hadoop system that keeps a check on the resources.

The Hadoop distribution includes a generic application programming interface for writing MapReduce jobs in programming languages like Ruby, Python, Perl, etc., and this is known as Hadoop streaming.
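As a sketch of the kind of job Hadoop streaming runs, here is the canonical word-count mapper and reducer written as plain Python functions. In a real streaming job the mapper and reducer would be separate scripts reading stdin and writing tab-separated lines to stdout, wired up via the `hadoop jar ... -mapper ... -reducer` options; the `sorted()` call below only simulates the shuffle/sort phase that Hadoop itself provides between the two stages.

```python
from itertools import groupby

def map_words(lines):
    """Mapper: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_counts(pairs):
    """Reducer: pairs arrive grouped by key, as Hadoop's shuffle guarantees."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

# Simulate the shuffle/sort phase that Hadoop performs between the two stages
lines = ["big data big cluster", "data node"]
shuffled = sorted(map_words(lines))
print(dict(reduce_counts(shuffled)))
# {'big': 2, 'cluster': 1, 'data': 2, 'node': 1}
```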

#### What are the different modes to run Hadoop?

• Local (standalone) Mode
• Pseudo-Distributed Mode
• Fully-Distributed Mode

#### How to restart Namenode?

Begin by running stop-all.sh and then start-all.sh

OR

Type sudo hdfs (then press enter), su - hdfs (then press enter), /etc/init.d/ha (then press enter) and finally /etc/init.d/hadoop-0.20-namenode start (then press enter).

#### How can you copy files between HDFS clusters?

Use multiple nodes and the distcp command to ensure smooth copying of files between HDFS clusters.

#### What do you mean by speculative execution in Hadoop?

If a node executes a task slowly, the master node can redundantly start the same task on another node. The task that finishes first is accepted and the other is rejected. This entire procedure is known as “speculative execution”.

#### What is “WAL” in HBase?

Here, WAL stands for “Write-Ahead Log”, a file maintained by every Region Server across the distributed environment. It is mostly used to recover data sets in case of mishaps.

#### How to do a file system check in HDFS?

The fsck command is your go-to option for a file system check in HDFS. It is extensively used to list block locations and file names, and to check the overall health of files.

Follow

hdfs fsck /dir/hadoop-test -files -blocks -locations

#### What sets apart an InputSplit from a Block?

A block divides the data physically, without taking the logical boundaries of records into account. This means you can have a record that originates in one block and stretches over into another. An InputSplit, on the other hand, respects the logical boundaries of records, which are crucial for processing.

#### Why should you use Storm for Real-Time Processing?

• Easy to operate – a simple operating model makes it easy to use
• Fast processing – it can process around 100 messages per second per node
• Fault detection – it can easily detect faults and restart functional attributes
• High reliability – each data unit is executed at least once
• High scalability – it operates across clusters of machines

The article has been sourced from

Learn how Big Data Hadoop can help you manage your business data decisions from DexLab Analytics. We are a leading Big Data Hadoop training institute in the Delhi NCR region, offering industry-standard big data courses for aspiring candidates.