Can I pass parameters to UDFs in Pig script?

I am relatively new to Pig scripts. I would like to know whether there is a way to pass parameters to Java UDFs in Pig.

Here is the scenario: I have a log file with several columns, each representing a primary key in another table. My task is to count the distinct primary key values in a selected column. I have written a Pig script that gets the distinct primary keys and counts them. However, I now have to write a new UDF for each column. Is there a better way to do this? If I could pass a column number as a parameter to the UDF, I would not need to write multiple UDFs.

Piracy asked 31/10, 2012 at 17:38

The way to do it is with DEFINE and the UDF's constructor. Here is an example of a custom "splitter":

REGISTER com.sample.MyUDFs.jar;
DEFINE CommaSplitter com.sample.MySplitter(',');

B = FOREACH A GENERATE f1, CommaSplitter(f2);

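Pig passes the arguments in the DEFINE statement to the UDF's constructor as strings, so MySplitter needs a public constructor that takes a String. Hopefully that conveys the idea.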
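Applied to the question's use case, a constructor-parameterized UDF might look like the minimal sketch below. It assumes the Pig jar is on the compile classpath; the class name DistinctColumnCount, the alias DistinctCol2 and the com.sample package are hypothetical. The column index arrives as a String from DEFINE and is parsed once in the constructor, so one class can serve every column; exec then counts the distinct non-null values of that column in a bag of rows, such as the bag produced by GROUP ... ALL.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

/**
 * Hypothetical UDF: counts the distinct non-null values of one column in a bag of rows.
 * The column index is fixed per DEFINE, so one class serves every column.
 * Assumed Pig usage, once the class is packaged as com.sample.DistinctColumnCount
 * in a registered jar:
 *
 *   DEFINE DistinctCol2 com.sample.DistinctColumnCount('2');
 *   G = GROUP A ALL;
 *   R = FOREACH G GENERATE DistinctCol2(A);
 */
public class DistinctColumnCount extends EvalFunc<Long> {
    private final int column;

    // Pig hands the DEFINE arguments to the constructor as Strings, so parse once here.
    public DistinctColumnCount(String column) {
        this.column = Integer.parseInt(column.trim());
    }

    @Override
    public Long exec(Tuple input) throws IOException {
        // Field 0 is expected to be the bag of rows (e.g. the bag from GROUP ... ALL).
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        DataBag rows = (DataBag) input.get(0);
        Set<Object> seen = new HashSet<>(); // every distinct key is held in memory
        for (Tuple row : rows) {
            if (row != null && row.size() > column) {
                Object value = row.get(column);
                if (value != null) {
                    seen.add(value);
                }
            }
        }
        return (long) seen.size();
    }
}
```

Parsing the index in the constructor means a malformed DEFINE argument should fail when Pig instantiates the UDF rather than on every row. For very high-cardinality columns, Pig's own DISTINCT followed by COUNT avoids holding all keys in one in-memory set.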

Nestling answered 6/11, 2012 at 14:39

To pass parameters, do the following in your Pig script:

UDF(document, '$param1', '$param2', '$param3')

Edit: I'm not sure whether those parameters need to be wrapped in ' ' or not.

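In your UDF, you then read them from the input tuple, for example:

import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.UDFContext;

public class UDF extends EvalFunc<Boolean> {

    @Override
    public Boolean exec(Tuple input) throws IOException {
        // Field 0 is the document; fields 1-3 are the three script parameters.
        if (input == null || input.size() < 4)
            return false;

        FileSystem fs = FileSystem.get(UDFContext.getUDFContext().getJobConf());

        // Each parameter is treated as a file path; try-with-resources closes the streams.
        try (InputStream var1In = fs.open(new Path(input.get(1).toString()));
             InputStream var2In = fs.open(new Path(input.get(2).toString()));
             InputStream var3In = fs.open(new Path(input.get(3).toString()))) {
            return doyourthing(input.get(0).toString());
        }
    }

    // Placeholder for your own logic (pass the streams in as well if it needs them).
    private Boolean doyourthing(String document) {
        return !document.isEmpty();
    }
}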

Polyhydroxy answered 6/8, 2014 at 7:21

Yes, you can pass any parameter through the input Tuple of your UDF's exec method:

exec(Tuple input)

and access it using

input.get(index)
Pulvinate answered 31/10, 2012 at 18:49
Yes, Fred. But how do I pass a parameter from the Pig script side? - Piracy
I don't know if that is exactly what you want to do, but you could create a new Tuple with the primary key as the first field and the data you actually want to pass to your UDF as the remaining fields: FOREACH tupleForUdf GENERATE primarykey, *; - Pulvinate
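To answer the follow-up about the Pig side with this approach, here is a minimal sketch in which the column index is passed as a constant first argument next to a grouped bag, so no DEFINE is needed. The class name DistinctCountAt and the Pig lines in the comment are hypothetical illustrations, not taken from the answer, and the Pig jar is assumed to be on the compile classpath. Pig builds the UDF's input tuple from the call's arguments, so field 0 carries the parameter and field 1 the data.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

/**
 * Hypothetical UDF: the column index travels in the input tuple instead of the constructor.
 * Assumed Pig usage, once packaged as com.sample.DistinctCountAt in a registered jar:
 *
 *   G = GROUP A ALL;
 *   R = FOREACH G GENERATE com.sample.DistinctCountAt(2, A);
 *
 * The input tuple is then (2, A): field 0 is the parameter, field 1 the bag of rows.
 */
public class DistinctCountAt extends EvalFunc<Long> {

    @Override
    public Long exec(Tuple input) throws IOException {
        if (input == null || input.size() < 2 || input.get(0) == null || input.get(1) == null) {
            return null;
        }
        int column = ((Number) input.get(0)).intValue(); // the parameter
        DataBag rows = (DataBag) input.get(1);           // the data
        Set<Object> seen = new HashSet<>();
        for (Tuple row : rows) {
            // Skip rows that are too short or have a null in the chosen column.
            if (row != null && column >= 0 && row.size() > column && row.get(column) != null) {
                seen.add(row.get(column));
            }
        }
        return (long) seen.size();
    }
}
```

Compared with the DEFINE approach, this keeps a single UDF name in the script, at the cost of unpacking the parameter from the tuple on every call.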
