How to call a Pig script within another Pig script
S

4

5

I have a file in HDFS with 100 columns, which I want to process using Pig. I want to load this file into a tuple with column names in a separate Pig script, and reuse this script from other Pig scripts. How do I do this?

Say this 100-column Pig script is 100col.pig. How do I call it from anotherone.pig?

Siren answered 26/9, 2011 at 15:33 Comment(0)
M
5

Look into the exec command (for batch processing) or the run command (for interactive scripts). Also, if you need to run Hadoop file system shell commands from within Pig, check the fs command. Here's a good reference:

http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
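As a rough sketch of how run could apply to the question, the two files might look like the following. The input path, delimiter, alias name (wide), and column names are all placeholders, and only three of the 100 columns are shown.

```pig
-- 100col.pig (hypothetical contents: path, delimiter, and columns are assumed)
wide = LOAD '/data/wide_file' USING PigStorage(',')
       AS (col1:chararray, col2:int, col3:double);

-- anotherone.pig (a separate file)
-- run executes 100col.pig in the current context, so its aliases stay visible
run 100col.pig
positive = FILTER wide BY col2 > 0;
DUMP positive;
```

Because run shares the calling script's context, an alias in 100col.pig that has the same name as one in the calling script would replace it, so distinctive alias names are safer.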

Markson answered 26/9, 2011 at 15:46 Comment(2)
The run command did it! exec doesn't work in my case because I want the variables defined within 100col.pig to be available from my second script. RUN did it. Thanks! - Siren
As you hint, RUN has additional side effects; see also this later SO answer that explains more about RUN versus EXEC. - Kettle
I
3

You should try using macros, which are available starting in Pig version 0.9.

http://pig.apache.org/docs/r0.9.1/cont.html#macros
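For illustration, a macro that wraps the load might look something like this. The macro name (load_wide), the path, the delimiter, and the columns are made up for the example, and it assumes Pig 0.9 or later.

```pig
-- Hypothetical macro: the name, path, delimiter, and columns are assumptions.
DEFINE load_wide(path) RETURNS wide {
    -- $path is substituted with the argument; $wide names the returned relation
    $wide = LOAD '$path' USING PigStorage(',')
            AS (col1:chararray, col2:int, col3:double);
};

rows = load_wide('/data/wide_file');
positive = FILTER rows BY col2 > 0;
DUMP positive;
```

Unlike run, a macro keeps its internal aliases to itself and exposes only the relation it returns, which avoids name clashes with the calling script.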

Inborn answered 26/10, 2011 at 23:8 Comment(0)
Z
3

It's a little late for this answer, but I was recently working on this requirement and found almost nothing helpful until I came across this. I hope it helps someone in need:

** This excerpt is taken from the book Programming Pig.

For a long time in Pig Latin, the entire script needed to be in one file. This produced some rather unpleasant multithousand-line Pig Latin scripts. Starting in 0.9, the preprocessor can be used to include one Pig Latin script in another. Taken together with the macros, it is now possible to write modular Pig Latin that is easier to debug and reuse: import is used to include one Pig Latin script in another:

--main.pig

import '../examples/ch6/dividend_analysis.pig';
daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray,
date:chararray, open:float, high:float, low:float, close:float,
volume:int, adj_close:float);
results = dividend_analysis(daily, '2009', 'symbol', 'open', 'close');

import writes the imported file directly into your Pig Latin script in place of the import statement. In the preceding example, the contents of dividend_analysis.pig will be placed immediately before the load statement. Note that a file cannot be imported twice. If you wish to use the same functionality multiple times, you should write it as a macro and import the file with that macro.
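The excerpt does not show the imported file. A sketch of what dividend_analysis.pig could contain, inferred only from the call signature above, is given here. The NYSE_dividends input, its schema, and the body of the macro are assumptions, and the book's actual file may differ.

```pig
-- dividend_analysis.pig (hypothetical contents inferred from the call in main.pig)
define dividend_analysis (daily, year, daily_symbol, daily_open, daily_close)
returns analyzed {
    -- assumed dividends input and schema
    divs          = load 'NYSE_dividends' as (exchange:chararray,
                        symbol:chararray, date:chararray, dividends:float);
    -- keep only rows from the requested year
    divsthisyear  = filter divs by date matches '$year-.*';
    dailythisyear = filter $daily by date matches '$year-.*';
    jnd           = join divsthisyear by symbol, dailythisyear by $daily_symbol;
    -- $analyzed is the relation handed back to the caller
    $analyzed     = foreach jnd generate dailythisyear::$daily_symbol,
                        $daily_close - $daily_open;
};
```

Keeping only macro definitions in the imported file means the import adds no statements of its own to main.pig; nothing runs until the macro is called.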

Zootoxin answered 19/10, 2014 at 19:12 Comment(0)
S
1

There are two options here, as mentioned above. Pig provides the run and exec commands for this requirement.

The exec command calls a Pig script that is independent and runs standalone. The run command runs a Pig script and preserves its variables and aliases.

I suppose you need to check out the run command to achieve your requirements. http://pig.apache.org/docs/r0.9.1/cmds.html#run
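For contrast, here is a sketch of how exec might be used when the called script is self-contained. The file name prepare.pig, the parameter name, the paths, and the columns are all hypothetical. Since the caller cannot see the aliases of a script started with exec, it reads the stored output instead.

```pig
-- prepare.pig (hypothetical standalone script; $input is supplied by the caller)
raw = LOAD '$input' USING PigStorage(',')
      AS (col1:chararray, col2:int, col3:double);
STORE raw INTO '/tmp/prepared' USING PigStorage(',');

-- anotherone.pig (a separate file)
-- exec runs prepare.pig in its own context; the alias raw is not visible here
exec -param input=/data/wide_file prepare.pig
prepared = LOAD '/tmp/prepared' USING PigStorage(',')
           AS (col1:chararray, col2:int, col3:double);
DUMP prepared;
```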

Simmie answered 3/6, 2014 at 2:1 Comment(0)
