The Pros and Cons of HIVE Partitioning

Hive organizes table data into partitions. With partitioning, the data of a table is divided into related parts based on the values of the partition columns, such as Country or Department. This makes it easier to query a specific portion of the data.

Partitions are defined with the PARTITIONED BY clause at the time of table creation.

We can create partitions on more than one column of the table. For example, we can create partitions on Country and State.

Syntax:

CREATE [EXTERNAL] TABLE table_name (col_name_1 data_type_1, …)
PARTITIONED BY (col_name_n data_type_n, …);
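
As a concrete illustration of the syntax above, here is a minimal sketch of a table partitioned on Country and State. The table name employees and its columns are hypothetical, chosen only for this example. Note that partition columns appear only in PARTITIONED BY and are not repeated in the regular column list.

```sql
-- Hypothetical table for illustration; names and types are assumptions.
CREATE TABLE IF NOT EXISTS employees (
  emp_id     INT,
  emp_name   STRING,
  department STRING
)
-- Partition columns are declared here, not in the column list above.
PARTITIONED BY (country STRING, state STRING);
```

With this definition, each distinct (country, state) pair gets its own subdirectory under the table's location, for example country=US/state=CA.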

The following are features of partitioning:

It is used to distribute execution load horizontally.
Query response is faster because the query is processed on a small part of the dataset instead of the entire dataset.
If we select records for the US, they are fetched only from the directory ‘Country=US’ rather than from all directories.
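
The directory-based lookup described above can be sketched as follows. The table employees and its columns are hypothetical, and the INSERT ... VALUES form assumes a Hive version that supports it (0.14 or later).

```sql
-- Hypothetical table partitioned by country (names are assumptions).
CREATE TABLE IF NOT EXISTS employees (
  emp_id   INT,
  emp_name STRING
)
PARTITIONED BY (country STRING);

-- Static partition insert: the row is written under the
-- directory .../employees/country=US/
INSERT INTO TABLE employees PARTITION (country = 'US')
VALUES (1, 'Alice');

-- The filter is on the partition column, so Hive only needs to read
-- the country=US directory instead of every partition of the table.
SELECT emp_id, emp_name
FROM employees
WHERE country = 'US';
```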

Limitations:

Having a large number of partitions creates a large number of files and directories in HDFS, which adds overhead for the NameNode because it maintains their metadata.
Partitioning may optimize queries that filter on partition columns in the WHERE clause, but it may cause slow response for queries based on a grouping clause.
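
To make these limitations tangible, the sketch below uses a hypothetical page_views table partitioned by date and hour, a layout that can accumulate many small partitions over time. All table and column names are assumptions made for illustration.

```sql
-- Hypothetical table; two partition columns multiply the directory count
-- (one directory per date and hour combination).
CREATE TABLE IF NOT EXISTS page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (view_date STRING, view_hour STRING);

-- List the partitions registered for the table; a very long list is a
-- hint that the NameNode is tracking many files and directories.
SHOW PARTITIONS page_views;

-- No filter on a partition column: every partition has to be read
-- before the rows can be grouped.
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url;

-- Filter on a partition column: only the matching partitions are read.
SELECT url, COUNT(*) AS views
FROM page_views
WHERE view_date = '2024-01-15'
GROUP BY url;
```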

Partitioning can be used for log analysis: we can segregate the records based on a timestamp or date value to see the results day-wise or month-wise.
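
A minimal sketch of the log analysis use case is shown below. The table web_logs, its columns, the tab-delimited format and the HDFS paths are all assumptions made for this example.

```sql
-- Hypothetical external table over tab-delimited log files,
-- partitioned by the date of the log entry.
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  log_time STRING,
  level    STRING,
  message  STRING
)
PARTITIONED BY (log_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/web_logs';

-- Register one day's directory as a partition (path is an assumption).
ALTER TABLE web_logs ADD IF NOT EXISTS
PARTITION (log_date = '2024-01-15')
LOCATION '/data/web_logs/log_date=2024-01-15';

-- Day-wise view: count events per level for a single day.
SELECT level, COUNT(*) AS events
FROM web_logs
WHERE log_date = '2024-01-15'
GROUP BY level;

-- Month-wise view: restrict to one month's date range, then group by day.
SELECT log_date, COUNT(*) AS events
FROM web_logs
WHERE log_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY log_date;
```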

Another use case is sales records partitioned by product type, country and month.
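
The sales use case could look roughly like the sketch below. The tables sales and sales_staging and all column names are hypothetical. The example uses a dynamic partition insert, which is one possible way to load such a table, not the only one.

```sql
-- Hypothetical sales table partitioned by product type, country and month.
CREATE TABLE IF NOT EXISTS sales (
  order_id BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (product_type STRING, country STRING, sale_month STRING);

-- Allow Hive to derive the partition values from the data itself.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- sales_staging is an assumed unpartitioned table that holds the raw rows.
-- The partition columns must come last in the SELECT list, in the same
-- order as in the PARTITION clause.
INSERT INTO TABLE sales PARTITION (product_type, country, sale_month)
SELECT order_id, amount, product_type, country, sale_month
FROM sales_staging;
```

Three partition columns multiply the number of directories, so this layout is best suited to columns with a modest number of distinct values.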
