R programming certification Archives - Page 3 of 3 - DexLab Analytics | Big Data Hadoop SAS R Analytics Predictive Modeling & Excel VBA

Introducing The New R Tools For Visual Studio

Introducing The New R Tools For Visual Studio

It is a great new development that the new Visual Studio now speaks the R Language!

Here is how:

RTVS-1

 

Decidedly now R is the most popular statistical data analysis language which is in use these days. The R tools for Visual Studio brings together the power of R and Visual Studio in the same pod, for a convenient, and easy to use plug-in that is not only free and open source, but is very user friendly. When it is combined with the powers of Visual Studio Community Edition, then you will receive a multilingual IDE, which is perpetually free for all small teams.

 

In order to showcase and inspire testing and evaluation from the developer community, the R tools package for Visual Studios has been launched as a public preview version.

 

R Programming courses in Gurgaon with placement assistance is provided by DexLab Analytics.

 

Here are the new exciting features being introduced in this preview release version:

 

  • Editor – this is a complete package for fine editing experience finished with R scripts and functions, which also include detachable/ tabbed windows, syntax highlighting and a lot more.
  • IntelliSense – this is also known as auto-completion and is available in both the editor as well as the Interactive R window
  • R Interactive Window – with this you can work directly with R console from within the Visual Studio
  • History window – one can search, view, and select previous commands and then send it to the Interactive Window.
  • A variable explorer – now get the advantage to drill deep into your R data structures and examine their values
  • Plotting – now check all your R plots within a Visual Studio tool window
  • Debugging – stepping, breakpoints, watch windows, call stacks and much more
  • R markdown ­– get to use R Markdown/knitr support with export to Word and HTML
  • Git – get control over source code through Git and GitHub
  • Extensions – more than 6000 extensions covering a wide spectrum from Data to Productivity to Language
  • Help – view R documentation with the use of ? and ?? in Visual Studio itself
  • A polyglot IDE – VS supports, R, Python, C and C++, C#, Node.js, SQL, etc projects can be managed simultaneously.

 

Some other features that were requested by the R developer community are the Package Manager GUI, Visual Studio Code (cross-plat), and more, which will be a part of one of our future updates.

Now use Azure ML SDK:

Now you can use the R SDK with the RTVS to access all your datasets and also workspaces on the Azure ML. You can use the environment to build and test the models locally and easily operationalize them at scale on Azure.

RTVS3-final

This SDK is not tied to RTVS, but it can be used from any environment to publish models to Azure ML.

Conclusion:

This new element to the analytics offerings viz. a powerful R authoring environment post their previous announcements of Microsoft R Open and Microsoft R server announcements that took place last year is an exciting development.

Let’s Take Your Data Dreams to the Next Level

For more exciting news on RTVS stay tuned to our regular blogs, because the time has never been better to be a data analyst.

Get R language certification in Delhi from industrial experts with years’ worth of experience at DexLab Analytics.

 

Interested in a career in Data Analyst?

To learn more about Machine Learning Using Python and Spark – click here.
To learn more about Data Analyst with Advanced excel course – click here.
To learn more about Data Analyst with SAS Course – click here.
To learn more about Data Analyst with R Course – click here.
To learn more about Big Data Course – click here.

How is data science helping NFL players win Super bowl?!

Recently, a discussion was held, which invited data scientists and analysts all over the world, to take part in the Science of Super Bowl discussion panel, this discussion was held by Newswise.

Data Science in Super bowl

We found one notable discussion topic, which answered three very important questions related to data science that the sports industry could use:

Continue reading “How is data science helping NFL players win Super bowl?!”

Making a Histogram With Basic R Programming Skills

DexLab Analytics over the course of next few weeks will cover the basics of various data analysis techniques like creating your own histogram in R programming. We will explore three options for this: R commands, ggplot2 and ggvis. These posts are for users of R programming who are in the beginner or intermediate level and who require accessible and easy to understand resources.

 

Making a Histogram With Basic R Programming Skills

 

Seeking more information? Then take up our R language training course in Gurgaon from DexLab Analytics.

 

What is a histogram?

A histogram is a category of visual representation of a dataset distribution. As such the shape of a histogram is its most common feature for identification. With a histogram one will be able to see which factor has the relatively higher amount of data and which factors or segments have the least.

 

Or put in simpler terms, one can see where the middle or median is in a data distribution, and how close or farther away the data would lie around the middle and where would the possible outliers be found. And precisely because of all this histograms will be the best way to understand your data.

 

But what can a specific shape of a histogram tell us? In short a typical histogram consists of an x-axis and a y-axis and a few bars of varying heights. The y-axis will exhibit how frequently the values on the x-axis are occurring in the data. The y-axis showcases the frequency of the values on the x-axis where the data occurs, the bar group ranges of either values or continuous categories on the x-axis. And the latter explains why the histograms do not have any gaps between the bars.

 

Let’s Take Your Data Dreams to the Next Level

How can one make a histogram with basic R?

Step 1: Get your eyes on the data:

As histograms require some amount of data to be plotted initially, you can carry that out by importing a dataset or simply using one which is built into the system of R. In this tutorial we will make use of 2 datasets the built-in R dataset AirPassengers and another dataset called as chol, which is stored into a .txt file and is available for download.

Step 2: Acquaint yourself with The Hist () function:

One can make a histogram in R by opting the easy way where they use The Hist () function, which automatically computes a histogram of the given data values. One would put the name of their dataset in between parentheses to use this function.

Here is how to use the function:

hist(AirPassengers)

 

But if in case, you want to select a certain column of a data frame like for instance in chol, for making a histogram. The hist function should be used with the dataset name in combination with a $ symbol, which should be followed by the column name:

 

2

Here is a specimen showing the same:

hist(chol$AGE) #computes a histogram of the data values in the column AGE of the dataframe named “chol”

Step 3: Up the level of the hist () function:

You may find that the histograms created with the previous features seem a little dull. That is because the default visualizations do not contribute much to the understanding of the histograms. One may need to take one more step to reach a better and easier understanding of their histograms. Fortunately, this is not too difficult to accomplish, R has several allowances for easy and fast ways to optimize the visualizations of the diagrams while still making use of the hist () function.

To adapt your histogram you will only need to add more arguments to the hist () function, in this way:

hist(AirPassengers,
     main="Histogram for Air Passengers",
     xlab="Passengers",
     border="blue",
     col="green",
     xlim=c(100,700),
     las=1,
     breaks=5)

This code will help to compute a histogram of data values from the dataset AirPassengers, with the name “Histogram for Air Passengers” as the title. The x-axis would be labelled as ‘Passengers’ and will have a blue border with a green colour to the bins, while limiting the x-axis with a range of 100 to 700 and rotating the printed values on the y-axis by 1 while changing  the bin width by 5.

We know what you are thinking – this is a humungous string of code. But do not worry, let us break it down into smaller pieces to see what each component holds. 

Name/colours:

You can alter the title of the histogram by adding main as an argument to the hist () function.

This is how:

hist(AirPassengers, main=”Histogram for Air Passengers”) #Histogram of the AirPassengers dataset with title “Histogram for Air Passengers”

For adjusting the label of the x-axis you can add xlab as the feature. Similarly one can also use ylab to label the y-axis.

This code would work:

hist(AirPassengers, xlab=”Passengers”, ylab=”Frequency of Passengers”) #Histogram of the AirPassengers dataset with changed labels on the x-and y-axes hist(AirPassengers, xlab=”Passengers”, ylab=”Frequency of Passengers”) #Histogram of the AirPassengers dataset with changed labels on the x-and y-axes

If in case you would want to change the colours of the default histogram you can simply choose to add the arguments border or col. Adjusting would be easy, as the name itself kind of gives away the borders and the colours of the histogram.

hist(AirPassengers, border=”blue”, col=”green”) #Histogram of the AirPassengers dataset with blue-border bins with green filling

Note: you must not forget to put the names and the colours within “ ”.

For x and y axes:

To change the range of the x and y axes one can use the xlim and the ylim as arguments to the hist function ():

The code to be used is:

hist(AirPassengers, xlim=c(100,700), ylim=c(0,30)) #Histogram of the AirPassengers dataset with the x-axis limited to values 100 to 700 and the y-axis limited to values 0 to 30

Point to be noted in this case, is the c() function is used for delimiting the values on the axes when one is suing the xlim and ylim functions. It takes 2 values the first being the begin value and the second being the end value.

Make sure to rotate the labels on the y-axis by adding 1as=1 as the argument, the argument 1as can be 0, 1, 2 or 3.

The code to be used:

hist(AirPassengers, las=1) #Histogram of the AirPassengers dataset with the y-values projected horizontally

 

Depending on the option one chooses the placement of the label will vary: like for instance, if you choose 0 the label will always be parallel to the axis (the one that is the default). And if one chooses 1, The label will be horizontally put. If you want the label to be perpendicular to the axis then pick 2 and for placing it vertically select 3.

For bins:

One can alter the bin width by including breaks as an argument, in combination with the number of breakpoints which one wants to have.

This is the code to be used:

hist(AirPassengers, breaks=5) #Histogram of the AirPassengers dataset with 5 breakpoints

If one wants to have increased control over the breakpoints in between the bins, then they can enrich the breaks arguments by adding in it vector of breakpoints, one can also do this by making use of the c() function.

hist(AirPassengers, breaks=c(100, 300, 500, 700)) #Compute a histogram for the data values in AirPassengers, and set the bins such that they run from 100 to 300, 300 to 500 and 500 to 700.

But the c () function can help to make your code very messy at times, which is why we recommend using add = seq(x,y,z) instead. The values of x, y and z are determined by the user and represented in a specific order of appearance, the starting number of x-axis  and the last number of the same as well as the intervals in which these numbers are to appear.

A noteworthy point to be mentioned here is that one can combine both the functions:

hist(AirPassengers, breaks=c(100, seq(200,700, 150))) #Make a histogram for the AirPassengers dataset, start at 100 on the x-axis, and from values 200 to 700, make the bins 150 wide

Here is the histogram of AirPassengers:

Here is the histogram of AirPassengers:
How to Make a Histogram with Basic R – (Image Courtesy  r-bloggers)

Please note that this is the first blog tranche in a list of 3 posts on creating histograms using R programming.

For more information regarding R language training and other interesting news and articles follow our regular uploads at all our channels. 

 
This post originally appeared onwww.r-bloggers.com/how-to-make-a-histogram-with-basic-r
 

Interested in a career in Data Analyst?

To learn more about Machine Learning Using Python and Spark – click here.

To learn more about Data Analyst with Advanced excel course – click here.
To learn more about Data Analyst with SAS Course – click here.
To learn more about Data Analyst with R Course – click here.
To learn more about Big Data Course – click here.

What Makes A Data Analyst?

Recently a world’s top magazine crowned the job role of Data Analyst as the sexiest job of the 21st Century. But there is still much confusion as to why this title is so much in demand in the job market presently and what makes it such a desirable gig. So, we would like to take this opportunity to describe in detail what the job role of data analysts entail, why is it so much in demand, what can an aspiring data analyst expect in this job, what are the payment packages and what the future holds for this breed of pseudo-scientists. Continue reading “What Makes A Data Analyst?”

Is Data an Asset or Liability

While many firms are stating that they leverage their data to gain valuable insights and translate them into profit. But the basic question remains whether data is an asset or a liability. This is the mind-numbing question that haunts all IT managers and must be given ample concentration on what is collected in terms of data and how can it be managed efficiently.

 

Is data an asset or liability

 

There can be two approaches to answer this question, the first being that data could be an asset if used ethically and correctly. But when no actionable insight can be gathered from data, it is a liability in the same lines as an old non-performing loan. Optimum use of data is elemental to the operations of any data driven initiative. The main reason behind this data-drive remains to be to obtain faster and better decision making abilities with more accuracy. Nowadays organizations across the board leverage their data to achieve their goals. Currently sales organizations are the frontrunners who mine their data to get the best results and maximize their revenue from already customers. Also crediting companies use their data to evaluate the risks associated with different individual debtors and then act accordingly when setting rates and fees for their loans that seem to be fair based on this information. In these scenarios the companies use real information to make decisions.

Continue reading “Is Data an Asset or Liability”

New R Packages- 5 Reasons for Data Scientists to Rejoice

5-Reasons-for-Data-Scientists-to-Rejoice

One of the fundamental advantages of the ecosystem related to R and the primary reason that lie behind the phenomenal growth of R is the practice and facility to contribute new packages to R. When this is added to the highly stable CRAN which happens to be the primary repository of packages of R,gives it a great advantage. The effectiveness of CRAN is further enhanced by the ability of people with sufficient technical expertise and to contribute packages through a proper system of submission.

It is only with sufficient effort and time that one realizes the system of packages submitted through proper procedures can yield integrated software of high quality.Even those who are relatively new to R Programming the process of discovering the packages that serves as the bedrock of R language growth. Such packages add value to the language in a reliable way.

2

The following 5 new packages listed in the paragraphs that follow may trigger the curiosity of data scientists.

  •  AzureML V0.1.1

Cloud computing is and will continue to be of great interest to all data scientists. The AzureML provides Python and R Programmers a rich environment for machine learning. If you are yet to be initiated to Azure as a user this package will go long ways in helping you get started. It provides functions that let you push R code from your local system to the Azure cloud in addition to publishing models and functions as web services.

  •  Distcomp V0.25.1

Using distributed computing when dealing with large sets of data is invariable an irksome problem. This is truer in cases where sharing data amongst collaborators is difficult or simply not possible. The distcomp package implements a crafty partial likelihood algorithm which lets users build statistical models of complexity and sophistication on data sets that are not aggregated.

  • RotationForest V0.1

If there is any primary ensemble method that performs well on diverse sets of data on a constant basis is the forests algorithm. This particular variety performs principal analysis of components on subsets taken at random in the feature space and holds great promise.

  • Rpca V0.2.3

In case there is a matrix that forms a superposition of a component that is lowly ranked along with a sparse component, rcpa calls in a robust PCA method that recovers all of these components. The algorithm was publicized by the data scientists at Netflix.

  •  SwarmSVM V0.1

One of the primary machine learning algorithm happens to be the support vector machine. SwarmSVM has for its basis an approach that may be said to be as a clustering approach and makes provisions for 3 different ensemble methods that train support vector machines. A practical introduction to this particular method is also attached with the vignette that comes with the package.

For more such interesting technical blogs and insights, follow us at DexLab Analytics. We are a pioneering R programming training institute. Our industry experts impart the best possible R programming courses, so when are you contacting us!!

 

Interested in a career in Data Analyst?

To learn more about Data Analyst with Advanced excel course – Enrol Now.
To learn more about Data Analyst with R Course – Enrol Now.
To learn more about Big Data Course – Enrol Now.

To learn more about Machine Learning Using Python and Spark – Enrol Now.
To learn more about Data Analyst with SAS Course – Enrol Now.
To learn more about Data Analyst with Apache Spark Course – Enrol Now.
To learn more about Data Analyst with Market Risk Analytics and Modelling Course – Enrol Now.

Twelve Great Free R Programming E-books

To Big Data enthusiasts R is word or rather a letter that needs no introduction. R programming is a programming language that brings the complex world of statistics and datasets at your fingertips. It is mainly used for computing statistics and relevant graphics. The following twelve e-books are not only useful to bring you up to the task for R programming but best of all they are free.

 

Twelve Great Free R Programming E-books

 

  • Learning Statistics with R
    Author: Daniel Navarro

If you are looking for a guide that will take you through the intricacies of developing software with R be it the basic types and structures of data to more complex topics like recursion, closures as well as anonymous functions. Knowledge of statistics, although helpful, is not an essential pre-requisite .

Continue reading “Twelve Great Free R Programming E-books”

Call us to know more