
Let’s Make Visualizations Better In Python with Matplotlib

Learn the basics of effective graphic design and create good-looking plots with matplotlib. In fact, the principles here are not matplotlib-specific: they apply equally to R/ggplot2, MATLAB, Excel, or any other graphing tool you use, and they will help you grasp the concepts of graphic design better.

Simplicity is the ultimate sophistication

To begin with, remember: less is more when it comes to plotting. Neophyte graphic designers sometimes think that adding a visually appealing, semi-related picture to the background of a data visualization will make the presentation look better, but they are usually wrong. Failing that, they may fall prey to subtler design flaws, such as piling on chartjunk.

 

Data always looks better naked. Strip it down instead of dressing it up.


Antoine de Saint-Exupéry put it best: “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.”

Color rules the world

Matplotlib’s default color configuration, inherited from MATLAB, is quite awful. MATLAB/matplotlib stalwarts may not find the colors that ugly, but it’s undeniable that Tableau’s default color configuration is way better.

Get a Tableau certification in Pune today! DexLab Analytics offers Tableau BI training courses to aspiring candidates.

Make use of the established default color schemes from leading software famous for gorgeous plots. Tableau ships an incredible set of color schemes, from grayscale and colored through to colorblind-friendly.

Plenty of graphic designers forget to pay heed to color blindness, which affects over 5% of viewers. For example, if a person has red-green color blindness, it is practically impossible for them to tell apart two categories depicted by red and green lines. So how are they supposed to read the plot?

 

For them, it is better to rely on colorblind-friendly color configurations, like Tableau’s “Color Blind 10”.
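
To put this into practice in matplotlib, you can define the palette yourself. Below is a minimal sketch; the RGB values are the “Color Blind 10” values as commonly transcribed from Tableau’s palette, so treat them as an assumption rather than an official specification:

# Tableau's "Color Blind 10" palette as RGB (values assumed from common
# transcriptions of the palette; verify against Tableau before relying on them).
tableau_colorblind10 = [(0, 107, 164), (255, 128, 14), (171, 171, 171),
                        (89, 89, 89), (95, 158, 209), (200, 82, 0),
                        (137, 137, 137), (163, 200, 236), (255, 188, 121),
                        (207, 207, 207)]

# Scale the RGB values to the [0, 1] range that matplotlib accepts.
tableau_colorblind10 = [(r / 255., g / 255., b / 255.)
                        for r, g, b in tableau_colorblind10]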

 

To run the code, you need the following Python libraries installed (a quick import check follows the list):

 

  1. Matplotlib
  2. Pandas
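
As a quick sanity check (a minimal sketch, assuming you have installed the libraries, e.g. with pip), you can confirm that both import cleanly:

# Verify that the required libraries are importable and print their versions.
import matplotlib
import pandas

print("matplotlib:", matplotlib.__version__)
print("pandas:", pandas.__version__)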

 

Now that we are done with the fundamentals, let’s get started with the coding.

 

[Figure: percent-bachelors-degrees-women-usa.png – the finished plot produced by the code below]

 

import matplotlib.pyplot as plt
import pandas as pd

# Read the data into a pandas DataFrame.  
gender_degree_data = pd.read_csv("http://www.randalolson.com/wp-content/uploads/percent-bachelors-degrees-women-usa.csv")  

# These are the "Tableau 20" colors as RGB.  
tableau20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),  
             (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),  
             (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),  
             (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),  
             (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]  

# Scale the RGB values to the [0, 1] range, which is the format matplotlib accepts.  
for i in range(len(tableau20)):  
    r, g, b = tableau20[i]  
    tableau20[i] = (r / 255., g / 255., b / 255.)  

# You typically want your plot to be ~1.33x wider than tall. This plot is a rare  
# exception because of the number of lines being plotted on it.  
# Common sizes: (10, 7.5) and (12, 9)  
plt.figure(figsize=(12, 14))  

# Remove the plot frame lines. They are unnecessary chartjunk.  
ax = plt.subplot(111)  
ax.spines["top"].set_visible(False)  
ax.spines["bottom"].set_visible(False)  
ax.spines["right"].set_visible(False)  
ax.spines["left"].set_visible(False)  

# Ensure that the axis ticks only show up on the bottom and left of the plot.  
# Ticks on the right and top of the plot are generally unnecessary chartjunk.  
ax.get_xaxis().tick_bottom()  
ax.get_yaxis().tick_left()  

# Limit the range of the plot to only where the data is.  
# Avoid unnecessary whitespace.  
plt.ylim(0, 90)  
plt.xlim(1968, 2014)  

# Make sure your axis ticks are large enough to be easily read.  
# You don't want your viewers squinting to read your plot.  
plt.yticks(range(0, 91, 10), [str(x) + "%" for x in range(0, 91, 10)], fontsize=14)  
plt.xticks(fontsize=14)  

# Provide tick lines across the plot to help your viewers trace along  
# the axis ticks. Make sure that the lines are light and small so they  
# don't obscure the primary data lines.  
for y in range(10, 91, 10):  
    plt.plot(range(1968, 2012), [y] * len(range(1968, 2012)), "--", lw=0.5, color="black", alpha=0.3)  

# Remove the tick marks; they are unnecessary with the tick lines we just plotted.  
plt.tick_params(axis="both", which="both", bottom=False, top=False,  
                labelbottom=True, left=False, right=False, labelleft=True)  

# Now that the plot is prepared, it's time to actually plot the data!  
# Note that I plotted the majors in order of the highest % in the final year.  
majors = ['Health Professions', 'Public Administration', 'Education', 'Psychology',  
          'Foreign Languages', 'English', 'Communications\nand Journalism',  
          'Art and Performance', 'Biology', 'Agriculture',  
          'Social Sciences and History', 'Business', 'Math and Statistics',  
          'Architecture', 'Physical Sciences', 'Computer Science',  
          'Engineering']  

for rank, column in enumerate(majors):  
    # Plot each line separately with its own color, using the Tableau 20  
    # color set in order.  
    plt.plot(gender_degree_data.Year.values,  
            gender_degree_data[column.replace("\n", " ")].values,  
            lw=2.5, color=tableau20[rank])  

    # Add a text label to the right end of every line. Most of the code below  
    # is adding specific offsets y position because some labels overlapped.  
    y_pos = gender_degree_data[column.replace("\n", " ")].values[-1] - 0.5  
    if column == "Foreign Languages":  
        y_pos += 0.5  
    elif column == "English":  
        y_pos -= 0.5  
    elif column == "Communications\nand Journalism":  
        y_pos += 0.75  
    elif column == "Art and Performance":  
        y_pos -= 0.25  
    elif column == "Agriculture":  
        y_pos += 1.25  
    elif column == "Social Sciences and History":  
        y_pos += 0.25  
    elif column == "Business":  
        y_pos -= 0.75  
    elif column == "Math and Statistics":  
        y_pos += 0.75  
    elif column == "Architecture":  
        y_pos -= 0.75  
    elif column == "Computer Science":  
        y_pos += 0.75  
    elif column == "Engineering":  
        y_pos -= 0.25  

    # Again, make sure that all labels are large enough to be easily read  
    # by the viewer.  
    plt.text(2011.5, y_pos, column, fontsize=14, color=tableau20[rank])  

# matplotlib's title() call centers the title on the plot, but not the graph,  
# so I used the text() call to customize where the title goes.  

# Make the title big enough so it spans the entire plot, but don't make it  
# so big that it requires two lines to show.  

# Note that if the title is descriptive enough, it is unnecessary to include  
# axis labels; they are self-evident, in this plot's case.  
plt.text(1995, 93, "Percentage of Bachelor's degrees conferred to women in the U.S.A."  
       ", by major (1970-2012)", fontsize=17, ha="center")  

# Always include your data source(s) and copyright notice! And for your  
# data sources, tell your viewers exactly where the data came from,  
# preferably with a direct link to the data. Just telling your viewers  
# that you used data from the "U.S. Census Bureau" is completely useless:  
# the U.S. Census Bureau provides all kinds of data, so how are your  
# viewers supposed to know which data set you used?  
plt.text(1966, -8, "Data source: nces.ed.gov/programs/digest/2013menu_tables.asp"  
       "\nAuthor: Randy Olson (randalolson.com / @randal_olson)"  
       "\nNote: Some majors are missing because the historical data "  
       "is not available for them", fontsize=10)  

# Finally, save the figure as a PNG.  
# You can also save it as a PDF, JPEG, etc.  
# Just change the file extension in this call.  
# bbox_inches="tight" removes all the extra whitespace on the edges of your plot.  
plt.savefig("percent-bachelors-degrees-women-usa.png", bbox_inches="tight")

 

[Figure: chess-number-ply-over-time.png – the plot produced by the code below]
 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import sem

# This function takes an array of numbers and smooths them out.
# Smoothing is useful for making plots a little easier to read.
def sliding_mean(data_array, window=5):
    data_array = np.array(data_array)
    new_list = []
    for i in range(len(data_array)):
        indices = range(max(i - window + 1, 0),
                        min(i + window + 1, len(data_array)))
        avg = 0
        for j in indices:
            avg += data_array[j]
        avg /= float(len(indices))
        new_list.append(avg)

    return np.array(new_list)

# Due to an agreement with the ChessGames.com admin, I cannot make the data
# for this plot publicly available. This function reads in and parses the
# chess data set into a tabulated pandas DataFrame.
chess_data = read_chess_data()

# These variables are where we put the years (x-axis), means (y-axis), and error bar values.
# We could just as easily replace the means with medians,
# and standard errors (SEMs) with standard deviations (STDs).
years = chess_data.groupby("Year").PlyCount.mean().keys()
mean_PlyCount = sliding_mean(chess_data.groupby("Year").PlyCount.mean().values,
                             window=10)
sem_PlyCount = sliding_mean(chess_data.groupby("Year").PlyCount.apply(sem).mul(1.96).values,
                            window=10)

# You typically want your plot to be ~1.33x wider than tall.
# Common sizes: (10, 7.5) and (12, 9)
plt.figure(figsize=(12, 9))

# Remove the plot frame lines. They are unnecessary chartjunk.
ax = plt.subplot(111)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# Ensure that the axis ticks only show up on the bottom and left of the plot.
# Ticks on the right and top of the plot are generally unnecessary chartjunk.
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()

# Limit the range of the plot to only where the data is.
# Avoid unnecessary whitespace.
plt.ylim(63, 85)

# Make sure your axis ticks are large enough to be easily read.
# You don't want your viewers squinting to read your plot.
plt.xticks(range(1850, 2011, 20), fontsize=14)
plt.yticks(range(65, 86, 5), fontsize=14)

# Along the same vein, make sure your axis labels are large
# enough to be easily read as well. Make them slightly larger
# than your axis tick labels so they stand out.
plt.ylabel("Ply per Game", fontsize=16)

# Use matplotlib's fill_between() call to create error bars.
# Use the dark blue "#3F5D7D" as a nice fill color.
plt.fill_between(years, mean_PlyCount - sem_PlyCount,
                 mean_PlyCount + sem_PlyCount, color="#3F5D7D")

# Plot the means as a white line in between the error bars. 
# White stands out best against the dark blue.
plt.plot(years, mean_PlyCount, color="white", lw=2)

# Make the title big enough so it spans the entire plot, but don't make it
# so big that it requires two lines to show.
plt.title("Chess games are getting longer", fontsize=22)

# Always include your data source(s) and copyright notice! And for your
# data sources, tell your viewers exactly where the data came from,
# preferably with a direct link to the data. Just telling your viewers
# that you used data from the "U.S. Census Bureau" is completely useless:
# the U.S. Census Bureau provides all kinds of data, so how are your
# viewers supposed to know which data set you used?
plt.xlabel("\nData source: www.ChessGames.com | "
           "Author: Randy Olson (randalolson.com / @randal_olson)", fontsize=10)

# Finally, save the figure as a PNG.
# You can also save it as a PDF, JPEG, etc.
# Just change the file extension in this call.
# bbox_inches="tight" removes all the extra whitespace on the edges of your plot.
plt.savefig("chess-number-ply-over-time.png", bbox_inches="tight");

Histograms

 
[Figure: chess-elo-rating-distribution.png – the histogram produced by the code below]

 

import pandas as pd
import matplotlib.pyplot as plt

# Due to an agreement with the ChessGames.com admin, I cannot make the data
# for this plot publicly available. This function reads in and parses the
# chess data set into a tabulated pandas DataFrame.
chess_data = read_chess_data()

# You typically want your plot to be ~1.33x wider than tall.
# Common sizes: (10, 7.5) and (12, 9)
plt.figure(figsize=(12, 9))

# Remove the plot frame lines. They are unnecessary chartjunk.
ax = plt.subplot(111)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# Ensure that the axis ticks only show up on the bottom and left of the plot.
# Ticks on the right and top of the plot are generally unnecessary chartjunk.
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()

# Make sure your axis ticks are large enough to be easily read.
# You don't want your viewers squinting to read your plot.
plt.xticks(fontsize=14)
plt.yticks(range(5000, 30001, 5000), fontsize=14)

# Along the same vein, make sure your axis labels are large
# enough to be easily read as well. Make them slightly larger
# than your axis tick labels so they stand out.
plt.xlabel("Elo Rating", fontsize=16)
plt.ylabel("Count", fontsize=16)

# Plot the histogram. Note that all I'm passing here is a list of numbers.
# matplotlib automatically counts and bins the frequencies for us.
# "#3F5D7D" is the nice dark blue color.
# Make sure the data is sorted into enough bins so you can see the distribution.
plt.hist(list(chess_data.WhiteElo.values) + list(chess_data.BlackElo.values),
         color="#3F5D7D", bins=100)

# Always include your data source(s) and copyright notice! And for your
# data sources, tell your viewers exactly where the data came from,
# preferably with a direct link to the data. Just telling your viewers
# that you used data from the "U.S. Census Bureau" is completely useless:
# the U.S. Census Bureau provides all kinds of data, so how are your
# viewers supposed to know which data set you used?
plt.text(1300, -5000, "Data source: www.ChessGames.com | "
         "Author: Randy Olson (randalolson.com / @randal_olson)", fontsize=10)

# Finally, save the figure as a PNG.
# You can also save it as a PDF, JPEG, etc.
# Just change the file extension in this call.
# bbox_inches="tight" removes all the extra whitespace on the edges of your plot.
plt.savefig("chess-elo-rating-distribution.png", bbox_inches="tight");

Here Goes the Bonus

It takes one more line of code to transform your matplotlib plot into a phenomenal interactive visualization.
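
The interactive demo itself has not survived in this archive, but one library that does exactly this is mpld3, which renders a matplotlib figure as interactive D3.js in the browser. A minimal sketch, assuming mpld3 is installed (pip install mpld3):

import matplotlib.pyplot as plt
import mpld3

# Build any matplotlib figure as usual...
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(range(10), [x ** 2 for x in range(10)], lw=2.5, color="#3F5D7D")

# ...then a single extra call serves it as an interactive page in your browser.
mpld3.show()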

 

 

Learn more such tutorials at DexLab Analytics. We make data visualization easier by providing excellent Python courses in India. In just a few months, you will cover advanced topics that will help you build a career in data analytics.

 


Data Science: Is It the Right Answer?

There is ‘Big Data’, and then there is ‘Data Science’. These terms are found everywhere, yet a persistent question lingers about their effectiveness. How effective is data science? Is Big Data an overhyped concept stealing the thunder?

Summing this up, Tim Harford stated in a leading financial publication: “Big Data has arrived, but big insights have not.” To be precise, neither Data Science nor Big Data is to blame. The truth is that a lot of data exists, but in different places; aggregating it is difficult and time-consuming.

Look for a Data analyst course in Gurgaon at DexLab Analytics.

Statistically, Data Science may be the next big thing, but it is yet to become mainstream. Though some prognosticators predict that 50% of organizations will use Data Science in 2017, more practical visionaries put the number closer to 15%. Big Data is hard, but Data Science is even harder. Gartner reports that only 15% of organizations are able to take Data Science to production – the reason being the gap between Data Science expectations and reality.

Big Data is relied upon so extensively that companies have started to expect more than it can actually deliver. Additionally, analytics-generated insights are easy to replicate: recently, we studied a financial services company that had built a model on Big Data technology, only to learn that the developers had already built similar models for several other banks. Duplication, in other words, is largely to be expected.

However, Big Data is the key to Data Science success. For years, the market remained exhilarated about Big Data. Yet, years after big data flowed into Hadoop, Spark and the like, Data Science is nowhere near a 50% adoption rate. To get the best out of this revered technology, organizations need vast pools of data, not the latest algorithms. But the biggest reason Big Data fails is that most companies cannot properly marshal the information they already have. They don’t know how to manage it, evaluate it in the exact ways that amplify their understanding, and change in response to the new insights. Companies never develop these competencies automatically; they first need to learn to use the data already in their mainframe systems correctly, much the way statisticians master arithmetic before they start on algebra. So, unless and until a company learns to derive the best from its data and analysis, Data Science has no role to play.

Even if companies manage to get past the hurdles mentioned above, they often fail to find skillful data scientists, the right people for the job. Veritable data scientists are rare these days. Several universities offer Data Science programs, but Data Science is a practical discipline more than a theoretical one, and classroom-only training is not what you should be looking for. Seek out a premier Data analyst training institute and grab the fundamentals of Data Science. DexLab Analytics is here with its amazing analyst courses in Delhi. Get enrolled today to outshine your peers and leave a lasting imprint on the bigger Big Data community.

 


Top 4 Best Big Data Jobs to Look For in 2017

Data is now produced at an incredible rate – from online shopping to browsing social media platforms to navigating with GPS-enabled smartphones, data is being generated everywhere. Big Data professionals can now fathom enormous business opportunities by perusing petabytes of data that were previously impossible to grasp, and organizations are rushing to take the best advantage of these revelations.

 
 

Big Data courses are now available in India – DexLab Analytics provides advanced Big Data Hadoop certification in Gurgaon.

Continue reading “Top 4 Best Big Data Jobs to Look For in 2017”

Predictive Analytics: In conversation with Adam Bataran, Managing Director of GTM Global Salesforce Platforms at Bluewolf

To discuss Predictive Analytics, we have with us Adam Bataran, Managing Director of GTM Global Salesforce Platforms at Bluewolf.

 

Follow Mr. Bataran’s answers to understand the concept better.

 
 

The question: What does predictive analytics mean, and what value does it impart to businesses today?

 

The answer: Predictive Analytics works by applying data, machine learning techniques and statistical algorithms to predict future business outcomes and trends based on historical data. It spans a number of distinct but advanced analytics disciplines and technologies – from deep data mining and statistical analysis to predictive modeling and machine learning – to answer the most sought-after questions: “What will happen next?” and “How will customers react to this?”

Continue reading “Predictive Analytics: In conversation with Adam Bataran, Managing Director of GTM Global Salesforce Platforms at Bluewolf”

Analyze Smartphone Sensor Data with R and the BreakoutDetection Package


Quite interesting. Juggling with sensor data is starkly different from working with economics data, document processing or social networks, but very worthwhile. In this blog, we take a practical approach to analyzing smartphone sensor data with R. We will use the accelerometer smartphone data that Datarella presented in its Data Fiction competition. The dataset captures the acceleration along the three axes of the smartphone:

 

x – sideways acceleration

y – forward and backward acceleration

z – upward and downward acceleration

 

The trickier part lies in the interpretation: on one hand there are device-, manufacturer- and sensor-specific variations and artifacts; on the other, all acceleration is measured relative to the sensor orientation of the device. For example, taking the phone out of your pocket to read a tweet might register in the following way:

 

y acceleration – the phone was top-down in the pocket and is now being taken out

z and y acceleration – turning the phone so that it is horizontal

x acceleration – moving the phone from your left side to the middle of your body

z acceleration – lifting the phone up so that you can read the tweet clearly

And on top of all this, gravity influences every measurement.

 

Seeking R programming courses in Gurgaon? Feel free to reach us at DexLab Analytics.

Working out exactly what your smartphone was doing at any moment can be quite intimidating – so let us introduce an application of Twitter’s BreakoutDetection open-source library (see GitHub), which is used extensively for behavioral change point analysis.

First, I load the dataset; this is what it looks like:

setwd("~/Documents/Datarella")
accel <- read.csv("SensorAccelerometer.csv", stringsAsFactors=F)
head(accel)

  user_id           x          y        z                 updated_at                 type
1      88 -0.06703765 0.05746084 9.615114 2014-05-09 17:56:21.552521 Probe::Accelerometer
2      88 -0.05746084 0.10534488 9.576807 2014-05-09 17:56:22.139066 Probe::Accelerometer
3      88 -0.04788403 0.03830723 9.605537 2014-05-09 17:56:22.754616 Probe::Accelerometer
4      88 -0.01915361 0.04788403 9.567230 2014-05-09 17:56:23.372244 Probe::Accelerometer
5      88 -0.06703765 0.08619126 9.615114 2014-05-09 17:56:23.977817 Probe::Accelerometer
6      88 -0.04788403 0.07661445 9.595961  2014-05-09 17:56:24.53004 Probe::Accelerometer

The data contains sensor readings per user per day, so I subset one day for one user:

accel$day <- substr(accel$updated_at, 1, 10)
df <- accel[accel$day == '2014-05-12' & accel$user_id == 88,]
df$timestamp <- as.POSIXlt(df$updated_at) # Transform to POSIX datetime
library(ggplot2)
ggplot(df) + geom_line(aes(timestamp, x, color="x")) + 
             geom_line(aes(timestamp, y, color="y")) + 
             geom_line(aes(timestamp, z, color="z")) + 
             scale_x_datetime() + xlab("Time") + ylab("acceleration")

[Figure: sensor_all – x, y and z acceleration over the whole day]

Let’s focus on the period between 12:32 and 13:00:

ggplot(df[df$timestamp >= '2014-05-12 12:32:00' & df$timestamp < '2014-05-12 13:00:00',]) +
  geom_line(aes(timestamp, x, color="x")) + 
  geom_line(aes(timestamp, y, color="y")) + 
  geom_line(aes(timestamp, z, color="z")) + 
  scale_x_datetime() + xlab("Time") + ylab("acceleration")

[Figure: sensor_zoom – acceleration between 12:32 and 13:00]

Following all this, I install and load the BreakoutDetection library:

install.packages("devtools")
devtools::install_github("twitter/BreakoutDetection")
library(BreakoutDetection)
bo <- breakout(df$x[df$timestamp >= '2014-05-12 12:32:00' & df$timestamp < '2014-05-12 12:35:00'], 
               min.size=10, method='multi', beta=.001, degree=1, plot=TRUE)
bo$plot

[Figure: sensor_breakout – the breakout plot with detected change points]

This quick analysis of the acceleration in the x direction presents us with four change points, where the acceleration pattern suddenly shifts. At the start, the smartphone lies flat on a horizontal surface: gravity pulls only along the z axis, which reads a positive value of about 9.8, while the x and y axes stay near zero. After a couple of movements and changes of direction, the last segment shows an x acceleration of about 9.6, meaning the phone is now being held in landscape orientation, facing right.

Get the best R Analytics Certification in Gurgaon from our seasoned experts at DexLab Analytics.

 
This post originally appeared on www.r-bloggers.com/how-to-analyze-smartphone-sensor-data-with-r-and-the-breakoutdetection-package
 


Analyze Data Using These Easy but Effective MS Excel Tricks



Everyone adores MS Excel. The powerful Excel software not only excels at basic computation but is also used for formidable purposes like financial modeling and business strategy. For novices, MS Excel skills can open new vistas in the world of data analytics – it is even said that you should grab a working knowledge of Excel before moving on to R or Python. With its wide array of functions and visualizations, Excel empowers users to quickly generate insights from data that would otherwise take a heavy toll to produce.

The top 5 most commonly used functions are highlighted below:

  1. VLOOKUP(): It searches for a value in a table and returns the corresponding value. For a better understanding, take a look at the Policy Table below.

To map the city name from the customer table based on the common key “Customer id”, use the function VLOOKUP().

[Image: Vlookup_11 – the Policy and Customer tables used in this example]

Syntax: =VLOOKUP(key to look up, source_table, column of source table, are you OK with an approximate match?)

 

For this problem, type the formula =VLOOKUP(B4, $H$4:$L$15, 5, 0) in cell “F4”. This returns the city name for Customer id 1; after that, copy the formula down for the remaining Customer ids.

Tip: Do not forget to lock the range of the second table using the “$” sign – a common error when copying this formula down. This is known as absolute referencing.

  2. CONCATENATE(): When it comes to combining text from two or more cells into a single cell, use CONCATENATE().

Check out the following table:

[Image: Concatenate1 – the example table]

Here, we want to create a URL from the host name and request path given as input.

Use the formula =CONCATENATE(B3, C3) to solve the issue, and copy it down.

Tip: Prefer the “&” symbol, as it is shorter than typing the full “CONCATENATE” formula and does exactly the same thing. The formula can be written as =B3&C3.

  3. LEN(): It returns the length of a cell’s contents – the number of characters, including special characters and spaces.

 

Syntax: =Len(Text)

 

Example: =Len(B3) = 23

 

  4. TRIM(): This is a very useful function for wiping out leading and trailing whitespace from text. When you pull a large chunk of data from a database, the text is often padded with blanks. Use this handy function to deal with such problems.

 

Syntax: =Trim(Text)
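
Example (illustrative): =TRIM("  DexLab  ") returns "DexLab"; TRIM also collapses repeated internal spaces down to single spaces.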

 

  5. IF(): It lets you write conditional formulas that compute one value when a condition holds true and another when it does not. Look at the table below, where each sale is to be marked High or Low; a sketch of the formula follows.

[Image: Conditional – the sales table]
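
Syntax: =IF(logical_test, value_if_true, value_if_false)

Example: assuming the sales figures sit in column B and 5000 is an illustrative cutoff, =IF(B3>5000, "High", "Low") marks each sale; copy the formula down the column.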

Undoubtedly, MS Excel is one of the most robust programs ever created, and it remains the gold standard for business analysis worldwide. Whether you are a fresher or a veteran user, there is always something new to learn in Excel, which is what makes it an all-time favorite.

 

To follow more interesting blogs on Excel, SAS, Big Data and other branches of analytics, reach us at DexLab Analytics. We are the prime advanced Excel institute in Gurgaon – expect nothing but the best from us!

 
This post originally appeared on www.analyticsvidhya.com/blog/2015/11/excel-tips-tricks-data-analysis
 


A New India in the Make: GST’s Impact on Items and Services

India’s iconic tax reform is here. Rolled out from the midnight of 30th June 2017, GST is taking the entire nation by storm – the GST Council has pegged tax rates for 1211 items and 600 services, holding the majority of these within the 18% tax slab.

 
 

The GST Council has designed four tax slabs for these items and services: the lowest rate is 5%, the standard rates are 12% and 18%, and the highest is 28%. Some items carried higher effective tax rates before GST was implemented, so under the new policy consumers stand to benefit at large.

Continue reading “A New India in the Make: GST’s Impact on Items and Services”

Will GST Boost The Big Data Revolution? The Answer lies Within

It is July 1st, 2017 – the epic day when GST, aka the Goods and Services Tax, comes into effect, simplifying the nation’s entire tax collection procedure. From today, a single tax on the supply of goods and services replaces all other state and central levies. GST is pegged as one of the most impressive economic tax reforms implemented by PM Narendra Modi to take Bharat to the summit of transparent digitization.

 
 

Data is crucial. While GST ushers in greater transparency and simplified tracking through data, it also unleashes demand for Data Analytics and ERP solutions. Besides, GST involves billing software and payment gateway integration, triggering plenty of job opportunities in the IT sector. Reports say it will be a $1 billion opportunity for IT vendors over the next two years. Quite a lot to think about!

Continue reading “Will GST Boost The Big Data Revolution? The Answer lies Within”

Big Data is the Magic Wand to Cure Healthcare Industry Hiccups


Spurred by advanced analytics and Big Data technologies, the healthcare industry is heading toward a major transformation – for the good, of course! The catalyst is none other than our very own favorite, Big Data: it is robustly opening the doors of health and medical science, and the possibilities seem endless.

Electronic Health Records have been around for some time – numerous systems of variable reliability have been designed to make data more easily accessible and transferable between healthcare professionals and institutions for better patient care. With Big Data, scientists are coming up with more sophisticated methods of combining that information with data from countless other health-related sources. The main objective is to make the best use of the relevant information, in consultation with doctors and patients, to serve them in the best way possible.


Nowadays, plenty of veritable companies provide systems that not only give doctors a detailed view of a patient’s medical history but also supply data that can be used for finer treatment decisions. Highlighting previously inaccessible correlations between different medical conditions, and offering insights into how those conditions are influenced by other factors – like treatment methods or where in the world they occur – are some of the improvements now being witnessed.

An estimated 75% of healthcare data is generated from unstructured sources such as clinical notes, laboratory tests, emails, telematics, digital devices, imaging and third-party sources. Big Data has brought about this data revolution, and this is how you can derive the best of its benefits:

Reduce fraud, abuse and waste

We all know how fraud, abuse and waste have been spiking healthcare costs; thanks to data science, the tide is turning. To ascertain abuse and fraud, insurers need the expertise to analyze large unstructured datasets of historical claims using machine learning algorithms.

Improve outcomes, embrace Predictive Analysis

Predictive modeling is helping the health world detect early signs of life-threatening diseases like sepsis. With a vast pool of patient data available, Predictive Analytics can not only find patients with similar symptoms but also anticipate their likely response to a specific medication.

Healthcare Internet of Things

The Internet of Things (IoT) is the growing collection of smart, interconnected, technology-efficient devices and sensors that share data over the internet. In healthcare, IoT refers to devices that monitor almost every kind of patient signal, from blood pressure to ECG. Statistics suggest spending on healthcare IoT could cross the $120 billion mark within the coming four years, and the possibilities are considerable.

Lower costs, better patient recovery rates

Through data convergence, stream processing and application agility, full-scale digital transformation is now possible in the medical world. Improved patient diagnosis is a new milestone in the field of medicine, achieved thanks to advances in data science.

 

On National Doctor’s Day, celebrated nationwide on 1st July, take a big leap in your career by enrolling for a Big Data Hadoop course in Gurgaon. DexLab Analytics is the proud name behind such intensive big data training in Delhi – browse through our courses today.

 

