
HIVE – User Defined Functions

Although Hive ships with a list of built-in functions, some use cases need a user-defined function (UDF) written in Java.

Apache Hive provides two APIs for writing UDFs.

  • The simple API (org.apache.hadoop.hive.ql.exec.UDF) can be used as long as the function reads and returns primitive types, that is, basic Hadoop and Hive writable types such as Text, LongWritable, IntWritable and DoubleWritable.
  • If you plan to write a UDF that deals with nested data structures, such as lists, maps and structs, you need to use org.apache.hadoop.hive.ql.udf.generic.GenericUDF, which is a little more involved.
  • Simple API – org.apache.hadoop.hive.ql.exec.UDF
  • Complex API – org.apache.hadoop.hive.ql.udf.generic.GenericUDF

Steps to create Hive-UDF

Step 1:-

Open Eclipse and create a new Java class.

Step 2:-

Add the Hive and Hadoop JAR files that the UDF depends on to the project's build path.

Step 3:-

Extend the UDF class

Declare the class as: public class ClassName extends UDF. Its evaluate() method returns the value.

Step 4:-

Implement the evaluate() method. Hive calls this method once for every row of data being processed.

Step 5:-

Compile the class and package it into a JAR file.

Step 6:-

Add the JAR file to the Hive classpath.

In the Hive terminal: ADD JAR <jar file path>;

Step 7:-

Create a temporary function in the Hive terminal.

CREATE TEMPORARY FUNCTION Convert AS 'udf.Convert';

Here udf is the package name and Convert is the class name.

For example:

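package udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class Convert extends UDF {

    // Reused across calls so a new Text is not allocated for every row
    private Text result = new Text();

    // Called by Hive once per row: parses the integer and returns it in float form
    public Text evaluate(String str) {
        if (str == null) {
            return null; // pass NULL input through instead of failing in parseInt
        }
        int number = Integer.parseInt(str);
        float fno = (float) number;
        result.set(Float.toString(fno));
        return result;
    }
}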

Here, we have extended the UDF base class.

This code converts an integer value to its float representation.
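The conversion itself does not depend on Hive, so it can be tried out before the JAR is built. The following is a minimal sketch under stated assumptions: ConvertLogic and intToFloatString are hypothetical names that are not part of Hive, and returning null for missing or malformed input is an assumed policy rather than something the original UDF defines.

```java
// Hypothetical helper, not part of Hive: performs the same parse-and-cast
// conversion as the UDF's evaluate(), but on plain Strings, so it can be
// unit-tested without any Hadoop or Hive JAR on the classpath.
final class ConvertLogic {

    private ConvertLogic() {
        // static helper only, no instances
    }

    // Returns the float rendering of an integer string.
    // Assumption: null or non-integer input yields null (similar to SQL NULL).
    static String intToFloatString(String str) {
        if (str == null) {
            return null;
        }
        try {
            int number = Integer.parseInt(str.trim());
            return Float.toString((float) number);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

A Hive UDF could delegate to such a helper from evaluate() and only wrap the returned String in a Text object, which keeps the Hive-specific part very small.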

Assume a Hive table Demo contains a column ID with the following data:

1

2

3

5

SELECT Convert(ID) FROM Demo; gives the following output:

1.0

2.0

3.0

5.0
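To see why the query produces one output value per input row, the per-row calling pattern can be imitated in plain Java. This is only an illustrative sketch: DemoRows and selectConvert are hypothetical names, the list stands in for the ID column of the Demo table, and in a real query Hive itself runs the loop and calls evaluate() for each row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for what happens during SELECT Convert(ID) FROM Demo.
final class DemoRows {

    // Same parse-and-cast conversion as the Convert UDF, returning a String
    // instead of a Hadoop Text so that no Hive JARs are needed here.
    static String evaluate(String id) {
        return Float.toString((float) Integer.parseInt(id));
    }

    // Imitates Hive invoking evaluate() once for every row of the column.
    static List<String> selectConvert(List<String> idColumn) {
        List<String> output = new ArrayList<>();
        for (String id : idColumn) {
            output.add(evaluate(id));
        }
        return output;
    }

    public static void main(String[] args) {
        // The four ID values from the Demo table
        List<String> ids = Arrays.asList("1", "2", "3", "5");
        System.out.println(selectConvert(ids)); // prints [1.0, 2.0, 3.0, 5.0]
    }
}
```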
