Dexlab, Author at DexLab Analytics | Big Data Hadoop SAS R Analytics Predictive Modeling & Excel VBA - Page 78 of 80

Delimiters And Delimited Data in SAS

In this blog post we will delve into the world of delimiters as found in the SAS system of data analysis tools. Delimiters are essential to SAS: without them, SAS cannot tell where one value ends and the next begins, whether the data is supplied in-stream or read from an external file.

 


What are Delimiters?

Delimiters are essentially symbols that tell SAS where one data value ends and the next begins. They separate one value or field from the others, which should give you an idea of how essential a part delimiters play in data analytics.

How are the Delimiters Symbolized?

The following symbols and keystrokes are commonly used as delimiters in SAS (the default delimiter is a blank):

  • ,
  • :
  • Tab
  • ~
  • &
  • -

How to Import Delimited Data?

The following DATA step reads comma-delimited data supplied in-stream with DATALINES:

data claim_data;
infile datalines dlm = ",";
input sex $ name $ claim_amount;
datalines;
Male,Mahesh,15000
Male,Naveen,10000
Female,Neeta,18000
Male,Amit,7500
Female,Geeta,12000
;
run;

To read the same kind of data from an external file, point INFILE at the file path instead:

data claim_data;
infile 'E:\Project FT\SAS\Course Material\Class 1\Claim Data comma.txt' dlm = ",";
input sex $ name $ claim_amount;
run;

What are the Functions of Delimiters?

DSD (delimiter-sensitive data) is an INFILE option that serves several functions:

  • It changes the default delimiter from a blank to a comma (DLM= can be used alongside DSD to specify a different one).
  • When a row contains two consecutive delimiters, SAS treats the value between them as missing.
  • It strips the quotation marks that enclose character values.

So the DATA step boils down to:

data claim_data;
infile 'E:\Project FT\SAS\Course Material\Class 1\Claim Data comma.txt' dsd;
input sex $ name $ claim_amount;
run;
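The DSD behaviours described above can also be illustrated outside SAS. The following is a minimal Python sketch, not SAS code: it uses the standard csv module, which handles consecutive delimiters and quoted values in a comparable way. The sample rows and the helper name read_delimited are invented for illustration.

```python
import csv
import io

# Hypothetical sample rows (not taken from the original data file):
# row 2 has two consecutive commas, row 3 has a quoted value containing a comma.
raw = 'Male,Mahesh,15000\nFemale,,18000\nMale,"Kumar, Amit",7500\n'

def read_delimited(text, delimiter=","):
    """Split each row on the delimiter, mapping empty fields to None (missing)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [[field if field != "" else None for field in row] for row in reader]

rows = read_delimited(raw)
for row in rows:
    print(row)
```

The second row comes back with a missing name, and the quoted name in the third row is kept as a single value with its quotation marks removed.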

How to Read Data from an External CSV file?

Our next task is to read data from an external CSV file. To do so, we can use PROC IMPORT:

proc import datafile = "E:\Project FT\SAS\Course Material\Class 1\exam_results.csv"
dbms = csv replace out = class_10_result;
getnames = yes;
run;

In a Nutshell

What are Format, Informat and DSD?

  • Informat : An informat tells SAS how to read the data.
  • Format : A format tells SAS how to display the data.
  • DSD : This INFILE option tells SAS how to treat delimited data, including consecutive delimiters and quoted values.
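The difference between reading and displaying a value can be sketched in a few lines of Python. This is only an analogy for the informat/format distinction, not SAS code; the function names read_amount and show_amount are hypothetical. It loosely mirrors what a comma-style informat and format do with a value such as 15,000.

```python
def read_amount(text):
    """Informat-like step: turn raw text such as '15,000' into a number."""
    return float(text.replace(",", ""))

def show_amount(value):
    """Format-like step: display a stored number with commas and two decimals."""
    return f"{value:,.2f}"

stored = read_amount("15,000")    # how the value is read and stored
displayed = show_amount(stored)   # how the stored value is shown
print(stored, displayed)
```

The stored value is a plain number; only its display changes when a different format is applied.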

The Role of Big Data in the Largest Database of Biometric Information


India's Aadhaar project is one of the most ambitious Big Data projects ever undertaken. The goal is to collect, store and use the biometric details of a population that crossed the billion mark years ago. Needless to say, a project of such epic proportions presents tremendous challenges, but it also gives rise to an incredible opportunity, according to MapR, the company providing the technology behind the project.


Aadhaar is in essence a 12-digit number assigned to an individual by the UIDAI, the Unique Identification Authority of India. The project was born in 2009, with former Infosys CEO and co-founder Nandan Nilekani as its first chairman and the architect of this grand project, which needed much input in terms of the technology involved.

The intention is to make it a unique identifier for all Indian citizens and to prevent the use of false identities and fraudulent activities. MapR, which is headquartered in California and is a developer and distributor of Apache Hadoop, has been putting to use its extensive experience in integrating web-scale enterprise storage and real-time database technology for this project.

According to John Schroeder, CEO and co-founder of MapR, the project presents multiple challenges, including analytics, storage and making sure that the data remains accurate and secure amid authentications that run into the millions every day. Individuals are given their number, and an iris scan or fingerprint is taken so that their identity can be verified by querying the database backbone and matching against a headshot photo of the person. Each day sees over a hundred million identity verifications, and each needs to be done in real time, in about 200 milliseconds.

India has a sizeable rural population, much of which is yet to be connected to the digital grid. As Schroeder explains, the solution therefore had to be economical, reliable even in low-bandwidth situations, and resilient enough to work in areas with low levels of connectivity.


For more information on big data and Big Data Hadoop courses, visit the official site of DexLab Analytics, a major Big Data Hadoop institute in Gurgaon.

 

Source: Forbes

 

Interested in a career in Data Analyst?

To learn more about Data Analyst with Advanced excel course – Enrol Now.
To learn more about Data Analyst with R Course – Enrol Now.
To learn more about Big Data Course – Enrol Now.

To learn more about Machine Learning Using Python and Spark – Enrol Now.
To learn more about Data Analyst with SAS Course – Enrol Now.
To learn more about Data Analyst with Apache Spark Course – Enrol Now.
To learn more about Data Analyst with Market Risk Analytics and Modelling Course – Enrol Now.

Hadoop adopted, but not for Analytics?

In this era of Big Data, we are dealing with a wide variety of data characterized by velocity, volume and veracity, coming in from different sources and in different formats such as XML, logs, videos and images, both structured and unstructured.

Everyone is talking about Hadoop’s capabilities in Big Data analytics, but the fact is that Hadoop is mostly being adopted for low-cost data storage and ETL.


Continue reading “Hadoop adopted, but not for Analytics?”

MS Excel VBA May Be Used To Predict Sales


MS Excel VBA has uses in various facets of the day-to-day activities of businesses. But it is a little-known fact that this tool may also be used to carry out advanced functions like predicting future sales. This blog post will try to illustrate how it does so, within the limitations imposed by the format of a text-based post. Suppose we have two sets of sales data for 24 periods, covering January 2013 to December 2014.

What we are trying to do is use the LINEST function to predict sales for the year 2015 by means of least squares regression. Even an elementary knowledge of this function is worth sharing. In the lines that follow we will examine the rudiments of this function and the formula that we will use in our calculations.

The LINEST function is used in regression analysis: it applies the least squares method to calculate the straight line that best fits the data you input, and it returns an array that describes that line.

 

The equation that represents the line is: y = mx + b

 

Here m stands for the slope, x represents the period number and b is the y-intercept. Note that you need a column that indicates the period number for the existing as well as the future sales. Note also that when you use the LINEST function to calculate the slope and the y-intercept, the two result cells need to sit side by side.

So, highlight the two cells in which you want the slope and the y-intercept to appear before typing in the LINEST function. All that is needed is the known y values; the other arguments are optional. Then press Ctrl+Shift+Enter and you get the slope in the first cell and the y-intercept in the second.

The magic that takes place in the background is basically this: using the slope and the y-intercept, Excel does its crystal ball gazing by taking the sales data of the previous 24 months and creating a straight line that forecasts the trend likely to persist over the coming 12 months.


What Excel actually does is fit a straight line to the actual values. This line serves as the trend line and extends out into the forecast periods, so the actual values and the forecast values follow a similar trend. These are the basics of crystal ball gazing with MS Excel.
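The least squares calculation described above can be sketched in Python for readers who want to see the arithmetic behind y = mx + b. This is a minimal illustration, not Excel's implementation: the function name fit_line and the sales figures are invented, and the period numbers 1 to 24 are assumed as the x values.

```python
def fit_line(ys):
    """Least-squares slope m and intercept b for y = m*x + b, with x = 1..n."""
    n = len(ys)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Hypothetical sales for 24 periods (Jan 2013 to Dec 2014), invented for illustration.
sales = [100 + 5 * period for period in range(1, 25)]

m, b = fit_line(sales)
# Extend the fitted line to periods 25..36, i.e. the 12 months of 2015.
forecast = [m * period + b for period in range(25, 37)]
print(m, b, forecast[0], forecast[-1])
```

Because the invented data lies exactly on a line, the fitted slope and intercept reproduce it; with real sales figures the line would only approximate the actual values.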

 

Want to shine bright in MS Excel? Visit DexLab Analytics – their Advanced Excel course is truly remarkable. Enrol for Advanced Excel training today.

 


10 Things You Might Not Have Known About Your Favorite SAS Authors


If you are part of the SAS ecosphere, you have most probably read their SAS books at some point. Now it is time to learn about a previously unknown side of your favorite SAS authors.

  • Tricia Aanderud

    Tricia has the distinction of having 100 jokes committed to memory including some that would fall into the category of dubious taste.

  • Bill Benjamin

    Bill contemplated photography, art and even journalism classes during high school. While giving computers a thought too, he hit upon the golden idea of flipping a coin to decide his destiny. The result benefited the world of SAS.

  • Patricia Berglund

    She is game for adventures involving sled dogs, and racing too! She has adopted a sled dog, now retired from the wild sport.

  • Chuck Boiler

    Chuck briefly taught grade school, where he fell in love with learning, which he pursues through his popular SAS books.

  • Iain Brown

    Iain digs anything that is remotely associated with football and uses all the SAS tools to pick and maintain his very own fantasy team.

  • Michele Burlew

    Her husband's comparison of SAS macro quoting functions to the cloaking devices of Star Trek found its way into the first edition of the book “SAS Macro Programming Made Easy”.


  • Art Carpenter

    A backpacker at heart, Art once hiked the John Muir Trail with his daughter.

  • Charlie Chase

    An avid jazz aficionado, he has had the opportunity to watch Wynton Marsalis, Dizzy Gillespie, David Benoit and his personal favorite, Grover Washington Jr, in action.

  • Ron Cody

    One of Ron's ruling passions is scuba diving, and he is the sort who will work his way through datasets while on vacation.

  • Lora D. Delwiche

    Lora spent a few months in Israel and Belgium while her husband took a long leave from his university job.

Endwords

Besides getting into the details of SAS authors, get yourself enrolled in state-of-the-art SAS courses. DexLab Analytics boasts of the best SAS certification and it will provide more of such trivia in the days to come. So stay tuned.

 


Sure shot Ways to Crack Big Data Interviews


If you are a Big Data analyst looking for an open position in the entry to mid-level range of experience, you should have the following resources in your arsenal before you storm an interview with all guns blazing.

  • Adequate Expertise of Analytical tools like SAS for the processing of data

Set aside most of your preparation time for brushing up on the analytics tools that are relevant in your context, and make sure you are proficient in the tool of your choice. For junior positions, the importance of expertise in a particular analytical tool like Hadoop, R or SAS cannot be overstressed. At this level the focus is on data preparation and processing. It is highly advisable to review concepts related to importing and manipulating data, including reading non-standard data, for example multiple input file types or mixed data formats. You should also be ready to show your skill at efficiently joining multiple datasets, conditionally selecting observations or rows of data, and heavy-duty data processing, for which SQL and macros are the most critical.

  • Make a Proper Review of End to End Business Process

This is most relevant to candidates who have prior experience of working in the Big Data and analytics industry. Prior experience inevitably leads interviewers to ask about the responsibilities you shouldered, your role in the business process and how you fitted into the broader picture. You should be able to convey to the interviewer that you understand the data source, along with how the data is processed and used.

  • A solid concept of the rudiments of statistics and algorithms

This tip, too, is for those with prior experience. Recruiters want to know whether you are aware of the issues you are likely to face when confronting data and business problems. Even freshers are expected to know the fundamental concepts of statistics, such as rejection criteria, the outcomes of hypothesis testing, measures of model validation and the statistical assumptions behind the algorithms they implement. To crack the interview, you must be prepared with adequate knowledge of statistical concepts.

  • Prepare Yourself with At Least 2 Case Studies related to Business

The person on the other side of the interview table will undoubtedly try to assess your knowledge of business analytics, and not solely your proficiency in your tool of choice. If you have prior experience, devote time to reviewing analytics projects you have already worked on. Be prepared to explain the business problem, the steps involved in processing the data, the algorithm used to create the models and the reasons for choosing it, and the way the results of the model were implemented. The interviewer might also ask about the challenges you faced at any stage of the process, so keep in mind the issues you faced in the past and how they were eventually resolved.


  • Make Sure that Your Communication Remains Effective

If you are unable to communicate effectively, then no matter how diligently you prepare, it will be of no use. You can try out mock interviews and practise answering questions that the recruiter might ask. Spare yourself the trouble of framing effective answers on the spot during the interview. Though you will perhaps be unable to anticipate each and every question, prior preparation will result in better and more coherent answers.

 


The Pros and Cons of HIVE Partitioning

Hive organizes data using partitions. With partitioning, the data of a table is organized into related parts based on the values of the partitioned columns, such as Country or Department. This makes it easier to query certain portions of the data.

Partitions are defined using the PARTITIONED BY clause at the time of table creation.

We can create partitions on more than one column of the table. For example, we can create partitions on Country and State.


Syntax:

CREATE [EXTERNAL] TABLE table_name (col_name_1 data_type_1, ...)
PARTITIONED BY (col_name_n data_type_n, ...);

The following are features of partitioning:

  • It is used for distributing execution load horizontally.
  • Query response is faster, as the query is processed on a small dataset instead of the entire dataset.
  • If we select records for the US, they are fetched only from the directory ‘Country=US’ rather than from all directories.
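The idea of reading only one partition directory can be sketched in Python. This is a toy model of the layout, not Hive itself: the records, column names and the helpers partition_by and select_partition are invented for illustration, and a dictionary keyed by ‘column=value’ stands in for the directories in HDFS.

```python
from collections import defaultdict

# Hypothetical records; the names and amounts are invented for illustration.
records = [
    {"name": "Alice", "Country": "US", "amount": 120},
    {"name": "Bala", "Country": "IN", "amount": 80},
    {"name": "Carol", "Country": "US", "amount": 60},
    {"name": "Dev", "Country": "IN", "amount": 200},
]

def partition_by(rows, column):
    """Group rows into 'directories' named column=value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"{column}={row[column]}"].append(row)
    return dict(partitions)

def select_partition(partitions, column, value):
    """Read only the matching directory instead of scanning every row."""
    return partitions.get(f"{column}={value}", [])

partitions = partition_by(records, "Country")
us_rows = select_partition(partitions, "Country", "US")
print(sorted(partitions), [row["name"] for row in us_rows])
```

A query filtered on Country touches one entry of the dictionary, whereas a query on an unpartitioned column would still have to look through every directory.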

Limitations:

  • A large number of partitions creates a large number of files and directories in HDFS, which creates overhead for the NameNode, as it maintains the metadata.
  • Partitioning may optimize certain queries based on the WHERE clause, but may cause slow responses for queries based on a grouping clause.

Partitioning can be used for log analysis: we can segregate the records by timestamp or date value to see the results day-wise or month-wise.

Another use case is sales records partitioned by product type, country and month.

 


The Professional Career Graph of a Data Scientist

It is not hard to find surveys of data scientists' salaries at senior and junior levels alike, broken down by place of work and by skill set. However, there are few readily available analyses of how a data scientist's salary has progressed over a career spanning more than 25 years. This post seeks to fill that gap by examining the career of Vincent Granville, a data scientist held in high esteem in the Big Data industry.

Continue reading “The Professional Career Graph of a Data Scientist”

Why “R in Action” Sizzles in Smoke among R Programming Aficionados

Data scientists are much in demand these days and almost everyone wants a piece of the pie. You might wonder whether the second edition of “R in Action: Data Analysis and Graphics with R” is up to the task. Read on to find out.

The first edition was released way back in 2011 and received a warm reception among those with an interest in R programming. To state the review of the second edition in a nutshell: it does not disappoint.

The book is geared towards statisticians coming to terms with R; programmers wanting to get acquainted with R, however, will be left without much of a clue.


That does not mean that the book lacks detail about the fundamentals and intricate concepts associated with R. It just means that those who actually have some statistical work to do will derive greater benefits from it. This book is pitched towards the people who are wondering about how to do a certain task that they are able to do in another statistical package.

Continue reading “Why “R in Action” Sizzles in Smoke among R Programming Aficionados”
