Data analyst course in noida Archives - Page 4 of 6 - DexLab Analytics | Big Data Hadoop SAS R Analytics Predictive Modeling & Excel VBA

Why Your Business Needs a Chief Productivity Officer: 5 Reasons Explained

Businesses have long looked to CIOs to keep their technology running smoothly, but that is becoming passé. The position is steadily losing relevance as business applications and storage migrate to the cloud and grow more capable there.

 

As a result, a new business position is emerging: the Chief Productivity Officer (CPO). This role oversees all of these services while ensuring that your organization meets every goal.


How Data Scientists are Merging Professional and Personal Resolutions for a Career Boost in 2018

The beginning of a year comes with a wide stream of promises! Some decide to work on their physique, while others look forward to visiting a new country, but budding data scientists have something else on their minds.


Here is a rundown of what goes on in the mind of a data scientist, who can stare at a computer screen for hours pondering which code or query to run…


Architecture Trade-offs Pays Well for Enterprise Analytics

Today, owing to an explosion of technology options, choosing an analytics stack means weighing a series of architectural trade-offs. Over the years, our experience has taught us that the most crucial aspects of building sound analytics systems, and of pleasing customers with improved digital solutions, are where the data is stored and processed and which types of databases are used, so that only the right people gain access to it.


Opt for a comprehensive data analyst course in Delhi NCR from DexLab Analytics.


How Careers in Tech is Getting Influenced Due to CryptoCurrency Revolution

Cryptocurrency is the new in-thing creating a lot of buzz in the tech world. And though your friends and family might be hearing all good things about Bitcoin, you may be surprised to know that it is also creating jobs. Yes, you heard it right: cryptocurrency is making the job market boom. From startups to blue-chip companies, everyone is talking about the perks of blockchain and the potential it holds for the future.

 

Job trends to follow

According to reports from the job search site Indeed, job postings for bitcoin, blockchain and cryptocurrency have increased by more than 620% since November 2015. Searches for such jobs have also increased by 1065%, suggesting that supply is expanding along with demand.


How Data Analytics Influences Holiday Retail Experience [Video]

Thanksgiving has just gone by! Half of the globe witnessed some crazy shopping that kicked off the entire holiday season, and retailers had a whale of a time, offering generous discounts and gifts at half price.

 

Before the Thanksgiving weekend sale, 69% of Americans, close to 164 million people across the US, were estimated to shop, and they had planned to spend up to 3.4% more than during last year’s Black Friday and Cyber Monday sales. The forecasts came from the National Retail Federation’s annual survey, conducted by Prosper Insights & Analytics.


Internet of Things: It’s Much More Than What It Appears to Be


What’s all the hype about “the next big thing”? Have you got it yet? No? That is not for lack of imagination; it is simply an observation.

Currently, the Internet of Things is the big buzz. It is all about enhancing machine-to-machine communication: built on cloud computing and networks of data-gathering sensors, the connection is entirely virtual, mobile and instantaneous.


What is IoT?

In simple terms, the concept of IoT is about connecting any device to the Internet, including cellphones, headphones, washing machines, lamps, coffee makers, wearable devices and almost anything else that comes to mind. The IoT is a colossal network of connected Things (inclusive of people); the analyst firm Gartner says that by 2020 there will be more than 26 billion connected devices in the world.


What makes it so popular?

As we now know, IoT is a network of things and people in which communication takes place through numerous wireless and wired technologies, and it comes with a wide set of advantages. Here are some of them:

A better, less-complicated life

Imagine a life where what you seek is delivered to you right away, before you even ask for it. It may feel as if you have been dropped into a scene from your favorite sci-fi movie or novel: the moment your morning alarm rings, your bathtub starts filling with hot water; when you leave home, the lights turn off and the doors lock on their own; your car takes you to the office by the least congested route; when you return, the lights switch on automatically; and finally, your air conditioner adjusts the room temperature once you are ready to hit the bed. Used properly, IoT makes your life easier and simpler.


Fewer accidents, better safety

Suppose, for example, that you have a heart attack while driving home, and your smartwatch detects it and switches your car to autopilot so that it takes you straight to the nearest hospital. On the way, your cellphone can call the hospital staff and inform them of your condition to help you get the best treatment possible.

Harnessing the power of data

Harnessing the power of data to simplify things is the next best thing in today’s world. Living a life straight out of a sci-fi movie sounds great, but in practice it will be some time before IoT becomes an everyday reality. Once IoT makes its way into our lives, a set of smart devices powered by sensors will take charge and make almost anything possible, whether it is switching on the AC automatically when a person enters the room or driving a car to its destination without a driver.

IoT helps businesses make better decisions

Beyond making your life easier, IoT has other capabilities: it is a robust technology that collects the most valuable resource of all, data. Data helps businesses make better, well-informed decisions.

Of all the recent technological developments, the Internet of Things is considered one of the biggest trends to watch. In the next 5 years, it is going to change lives forever!

To know more about the Internet of Things and other digital trends, why not opt for a good business analytics course in Delhi? DexLab Analytics is a premier Data Science training institute in Gurgaon that offers hands-on experience to students.

 

Interested in a career in Data Analyst?

To learn more about Data Analyst with Advanced excel course – Enrol Now.
To learn more about Data Analyst with R Course – Enrol Now.
To learn more about Big Data Course – Enrol Now.

To learn more about Machine Learning Using Python and Spark – Enrol Now.
To learn more about Data Analyst with SAS Course – Enrol Now.
To learn more about Data Analyst with Apache Spark Course – Enrol Now.
To learn more about Data Analyst with Market Risk Analytics and Modelling Course – Enrol Now.

Write ETL Jobs to Offload the Data Warehouse Using Apache Spark


The surge of Big Data is everywhere. Evolving trends in BI have taken the world by storm, and many organizations are now exploring how all of this fits together.

To use the data ecosystem to its full potential, invest in the right technology pieces. It is important to think ahead so as to reap the maximum benefit from IT in the long run.

“By 2020, information will be used to reinvent, digitalize or eliminate 80% of business processes and products from a decade earlier.” – Gartner’s prediction put it so right!

The following architecture diagram shows a conceptual design. It lets you use the computing power of the Hadoop ecosystem from your conventional BI and data warehousing tooling, coupled with real-time analytics and data science (data warehouses are now often called data lakes).

[Figure: modern data warehouse architecture diagram]

In this post, we will discuss how to write ETL jobs that offload the data warehouse using the PySpark API of Apache Spark. With its fast data processing, Spark complements Hadoop.

Since this blog focuses on the ETL job, let us introduce a parent table and a sub-dimension (type 2) table from a MySQL database, which we will merge into a single dimension table in Hive with dynamic partitions.

Avoid snowflaking when building a warehouse on Hive. This cuts out unnecessary joins, since each join task generates a map task.

Just to pique your curiosity: on a standalone Spark deployment, this example job processes 1M+ rows/min.

The Employees table (300,024 rows) and the Salaries table (2,844,047 rows) are the two sources. Each employee’s salary records are kept in type 2 fashion using the ‘from_date’ and ‘to_date’ columns. The main target is a functional Hive table partitioned on year (derived from ‘to_date’ in the Salaries table) and on load date (the current date). Partitioning the table this way keeps the data better organized and speeds up queries on current employees, provided the ‘to_date’ column holds the end date ‘9999-01-01’ for all current records.

The logic is simple: join the two tables, add the load_date and year columns, and then run a dynamic partition insert into a Hive table.
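
To make this logic concrete without a cluster, here is a minimal plain-Python sketch of the same steps on a handful of invented rows. It is not Spark code: the sample rows, the function name build_dimension_rows and the fixed load date are assumptions made purely for illustration.

```python
from datetime import date

# Invented sample rows that mimic the two MySQL sources (not real data).
employees = [
    {"emp_no": 1, "first_name": "Ann", "last_name": "Lee"},
    {"emp_no": 2, "first_name": "Raj", "last_name": "Rao"},
]
salaries = [
    {"emp_no": 1, "salary": 60000, "from_date": date(1999, 1, 1), "to_date": date(2000, 1, 1)},
    {"emp_no": 1, "salary": 65000, "from_date": date(2000, 1, 1), "to_date": date(9999, 1, 1)},
    {"emp_no": 2, "salary": 50000, "from_date": date(2001, 6, 1), "to_date": date(9999, 1, 1)},
]


def build_dimension_rows(employees, salaries, load_date):
    """Inner-join salaries to employees on emp_no, then add year and load_date."""
    by_emp_no = {emp["emp_no"]: emp for emp in employees}
    rows = []
    for sal in salaries:
        emp = by_emp_no.get(sal["emp_no"])
        if emp is None:
            continue  # inner join: skip salaries without a matching employee
        row = {**emp, **sal}
        row["year"] = sal["to_date"].year  # partition column derived from to_date
        row["load_date"] = load_date       # partition column: date of this load
        rows.append(row)
    return rows


# The load date is fixed here only to keep the example reproducible.
dimension_rows = build_dimension_rows(employees, salaries, date(2015, 9, 29))

# Current salary records carry to_date 9999-01-01, so they all share year 9999.
current_rows = [row for row in dimension_rows if row["year"] == 9999]
```

Because every current record ends up with year 9999, a query on current employees only needs to read that one partition.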

Here is how the DAG looks:

[Figure: Spark UI screenshot of the job's DAG]

Since version 1.4, the Spark UI visualizes the physical execution of a job as a Directed Acyclic Graph (the diagram above), similar to an ETL workflow. For this blog, we built Spark 1.5 with Hive and Hadoop 2.6.0.

The code below is largely self-explanatory. The runtime parameters are set inside the job here, although ideally they would be parameterized.

Code: MySQL to Hive ETL Job

__author__ = 'udaysharma'
# File Name: mysql_to_hive_etl.py
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext, HiveContext
# Aliased as F to match the F.year() and F.current_date() calls below
from pyspark.sql import functions as F

# Define database connection parameters
MYSQL_DRIVER_PATH = "/usr/local/spark/python/lib/mysql-connector-java-5.1.36-bin.jar"
MYSQL_USERNAME = '<USER_NAME >'
MYSQL_PASSWORD = '********'
MYSQL_CONNECTION_URL = "jdbc:mysql://localhost:3306/employees?user=" + MYSQL_USERNAME+"&password="+MYSQL_PASSWORD 

# Define Spark configuration
conf = SparkConf()
conf.setMaster("spark://Box.local:7077")
conf.setAppName("MySQL_import")
conf.set("spark.executor.memory", "1g")

# Initialize a SparkContext and SQLContext
sc = SparkContext(conf=conf)
sql_ctx = SQLContext(sc)

# Initialize hive context
hive_ctx = HiveContext(sc)

# Source 1 Type: MYSQL
# Schema Name  : EMPLOYEE
# Table Name   : EMPLOYEES
# + --------------------------------------- +
# | COLUMN NAME| DATA TYPE    | CONSTRAINTS |
# + --------------------------------------- +
# | EMP_NO     | INT          | PRIMARY KEY |
# | BIRTH_DATE | DATE         |             |
# | FIRST_NAME | VARCHAR(14)  |             |
# | LAST_NAME  | VARCHAR(16)  |             |
# | GENDER     | ENUM('M'/'F')|             |
# | HIRE_DATE  | DATE         |             |
# + --------------------------------------- +
df_employees = sql_ctx.load(
    source="jdbc",
    path=MYSQL_DRIVER_PATH,
    driver='com.mysql.jdbc.Driver',
    url=MYSQL_CONNECTION_URL,
    dbtable="employees")

# Source 2 Type : MYSQL
# Schema Name   : EMPLOYEE
# Table Name    : SALARIES
# + -------------------------------- +
# | COLUMN NAME | TYPE | CONSTRAINTS |
# + -------------------------------- +
# | EMP_NO      | INT  | PRIMARY KEY |
# | SALARY      | INT  |             |
# | FROM_DATE   | DATE | PRIMARY KEY |
# | TO_DATE     | DATE |             |
# + -------------------------------- +
df_salaries = sql_ctx.load(
    source="jdbc",
    path=MYSQL_DRIVER_PATH,
    driver='com.mysql.jdbc.Driver',
    url=MYSQL_CONNECTION_URL,
    dbtable="salaries")

# Perform INNER JOIN on  the two data frames on EMP_NO column
# As of Spark 1.4 you don't have to worry about duplicate column on join result
df_emp_sal_join = df_employees.join(df_salaries, "emp_no").select("emp_no", "birth_date", "first_name",
                                                             "last_name", "gender", "hire_date",
                                                             "salary", "from_date", "to_date")

# Adding a column 'year' to the data frame for partitioning the hive table
df_add_year = df_emp_sal_join.withColumn('year', F.year(df_emp_sal_join.to_date))

# Adding a load date column to the data frame
df_final = df_add_year.withColumn('Load_date', F.current_date())

# repartition() returns a new data frame, so keep the result
df_final = df_final.repartition(10)

# Registering data frame as a temp table for SparkSQL
hive_ctx.registerDataFrameAsTable(df_final, "EMP_TEMP")

# Target Type: APACHE HIVE
# Database   : EMPLOYEES
# Table Name : EMPLOYEE_DIM
# + ------------------------------- +
# | COLUMN NAME| TYPE   | PARTITION |
# + ------------------------------- +
# | EMP_NO     | INT    |           |
# | BIRTH_DATE | DATE   |           |
# | FIRST_NAME | STRING |           |
# | LAST_NAME  | STRING |           |
# | GENDER     | STRING |           |
# | HIRE_DATE  | DATE   |           |
# | SALARY     | INT    |           |
# | FROM_DATE  | DATE   |           |
# | TO_DATE    | DATE   |           |
# | YEAR       | INT    | PRIMARY   |
# | LOAD_DATE  | DATE   | SUB       |
# + ------------------------------- +
# Storage Format: ORC


# Inserting data into the Target table
hive_ctx.sql("INSERT OVERWRITE TABLE EMPLOYEES.EMPLOYEE_DIM PARTITION (year, Load_date) \
            SELECT EMP_NO, BIRTH_DATE, FIRST_NAME, LAST_NAME, GENDER, HIRE_DATE, \
            SALARY, FROM_DATE, TO_DATE, year, Load_date FROM EMP_TEMP")
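
As a rough illustration of what the dynamic partition insert does with the two trailing columns, the plain-Python sketch below groups rows by their (year, load_date) values and derives a Hive-style partition directory for each group. The function names, the employee_dim directory name and the sample rows are hypothetical. Note also that Hive usually needs dynamic partitioning enabled (hive.exec.dynamic.partition) and the mode set to nonstrict (hive.exec.dynamic.partition.mode) for an insert where every partition column is dynamic; the post does not say where, or whether, the original setup configures this.

```python
from collections import defaultdict
from datetime import date


def partition_path(table_dir, year, load_date):
    """Build a Hive-style partition directory: one col=value level per partition column."""
    return f"{table_dir}/year={year}/load_date={load_date.isoformat()}"


def route_rows(rows, table_dir):
    """Group rows by their (year, load_date) values, as a dynamic partition insert would."""
    partitions = defaultdict(list)
    for row in rows:
        path = partition_path(table_dir, row["year"], row["load_date"])
        partitions[path].append(row)
    return dict(partitions)


# Invented rows; only the partition columns matter for this sketch.
sample_rows = [
    {"emp_no": 1, "year": 2000, "load_date": date(2015, 9, 29)},
    {"emp_no": 1, "year": 9999, "load_date": date(2015, 9, 29)},
    {"emp_no": 2, "year": 9999, "load_date": date(2015, 9, 29)},
]

layout = route_rows(sample_rows, "employee_dim")
for path in sorted(layout):
    print(path, len(layout[path]))
```

The partition values come from the data itself rather than from the INSERT statement, which is why year and Load_date must be the last two columns of the SELECT.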

Since the necessary configuration is set in the code itself, we can run the job with a simple call:

spark-submit mysql_to_hive_etl.py
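
The job above hard-codes its connection settings. As a minimal sketch of the parameterized alternative, the JDBC URL could be assembled from command-line arguments with Python's standard argparse module. The option names (--mysql-host and so on), the helper names and the sample credentials are all hypothetical and not part of the original job.

```python
import argparse


def parse_args(argv=None):
    """Parse connection settings; argv=None falls back to the real command line."""
    parser = argparse.ArgumentParser(description="MySQL to Hive ETL job")
    parser.add_argument("--mysql-host", default="localhost")
    parser.add_argument("--mysql-port", type=int, default=3306)
    parser.add_argument("--mysql-db", default="employees")
    parser.add_argument("--mysql-user", required=True)
    parser.add_argument("--mysql-password", required=True)
    return parser.parse_args(argv)


def build_connection_url(args):
    """Assemble a JDBC URL of the same shape as MYSQL_CONNECTION_URL in the job."""
    return (
        f"jdbc:mysql://{args.mysql_host}:{args.mysql_port}/{args.mysql_db}"
        f"?user={args.mysql_user}&password={args.mysql_password}"
    )


# Example with made-up credentials, passed explicitly instead of via the shell.
example_args = parse_args(["--mysql-user", "etl_user", "--mysql-password", "secret"])
example_url = build_connection_url(example_args)
```

With spark-submit, arguments placed after the script name are handed to the script, so the options would simply follow mysql_to_hive_etl.py on the command line. Keep in mind that a password passed this way can show up in shell history and process listings.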

Once the job has run, the target table contains 2,844,047 rows, just as expected, and the partitions look like this:

[Figures: screenshots of the partitions in the target Hive table]

The best part is that the entire process finishes within 2-3 minutes.

For more such interesting blogs and updates, follow us at DexLab Analytics. We are a premium Big Data Hadoop institute in Gurgaon catering to the needs of aspiring candidates. Opt for our comprehensive Hadoop certification in Delhi and crack such codes in a jiffy!

 


Quantum Internet Is Now Turning Into a Reality


Scientists across the globe are working on new methods to realize the ‘quantum internet’, an unhackable internet that connects particles linked by the principle of quantum entanglement. In simple terms, a quantum internet would involve multiple particles exchanging information with each other in the form of quantum signals, but specialists have yet to figure out what it would actually do beyond that. The term ‘quantum internet’ is still quite loosely used; there is no real definition of it as of now.


Automation Doesn’t Necessarily Make Humans Obsolete, Here’s Why

Machines are going to eat our jobs.

 

AI is handling insurance claims and basic bookkeeping, maintaining investment portfolios, doing preliminary HR tasks, performing extensive legal research and a lot more. So, do humans stand a chance against the automation apocalypse, in which almost everything will be controlled by robots?

 

What do you think? You might be worried about your future job opportunities and about universal basic income, but I would ask you to take a closer look at this competing theory, because in the end the question might not even be a valid one. Let me tell you why.

