Oozie shell action: exec and file tags

I'm a newbie in Oozie. I've read some Oozie shell action examples, but they left me confused about a few things.

Some examples I've seen have no <file> tag at all.

Others, like this one from Cloudera, repeat the shell script name in a <file> tag:

<shell xmlns="uri:oozie:shell-action:0.2">
    <exec>check-hour.sh</exec>
    <argument>${earthquakeMinThreshold}</argument>
    <file>check-hour.sh</file>
</shell>

The example on Oozie's website, on the other hand, writes the shell script reference twice, separated by # (${EXEC} comes from job.properties and points to the script.sh file):

<shell xmlns="uri:oozie:shell-action:0.1">
    ...
    <exec>${EXEC}</exec>
    <argument>A</argument>
    <argument>B</argument>
    <file>${EXEC}#${EXEC}</file>
</shell>

I've also seen examples where a path (HDFS or local?) is prepended to script.sh#script.sh inside the <file> tag:

<shell xmlns="uri:oozie:shell-action:0.1">
    ...
    <exec>script.sh</exec>
    <argument>A</argument>
    <argument>B</argument>
    <file>/path/script.sh#script.sh</file>
</shell>

As I understand it, any shell script file can be included in the workflow's HDFS path (the same directory where workflow.xml resides).

Can someone explain the differences between these examples, and how <exec>, <file>, script.sh#script.sh and /path/script.sh#script.sh are used?

Fy asked 27/1/2016 at 7:19

<file>hdfs:///apps/duh/mystuff/check-hour.sh</file> means "download that HDFS file into the Current Working Dir of the YARN container that runs the Oozie Launcher for the Shell action, using the same file name by default, so that I can reference it as ./check-hour.sh or simply check-hour.sh in the <exec> element".
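For context, here is a minimal sketch of a complete workflow using the absolute-path form. The workflow and action names, the kill message, and the ${jobTracker}/${nameNode}/${earthquakeMinThreshold} properties (assumed to be defined in job.properties, as in most Oozie examples) are illustrative assumptions; only the <exec>, <argument> and <file> lines come from this thread.

<workflow-app xmlns="uri:oozie:workflow:0.4" name="check-hour-wf">
    <start to="check-hour"/>
    <action name="check-hour">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <!-- assumed properties, defined in job.properties -->
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- bare name: the file is localized into the launcher container's working dir -->
            <exec>check-hour.sh</exec>
            <argument>${earthquakeMinThreshold}</argument>
            <!-- absolute HDFS path, no '#': keeps its own file name -->
            <file>hdfs:///apps/duh/mystuff/check-hour.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>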

<file>check-hour.sh</file> means "download that HDFS file -- from the workflow application directory, i.e. the HDFS dir that holds workflow.xml, since Oozie resolves relative <file> paths against it -- into etc. etc.".
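If you'd rather not depend on how Oozie resolves a relative <file> path, one option (a sketch, not the only way) is to spell out the location with the wf:appPath() EL function, which returns the workflow application path. This assumes check-hour.sh was uploaded next to workflow.xml and that oozie.wf.application.path points to that directory rather than to workflow.xml itself; ${jobTracker} and ${nameNode} are assumed job.properties entries.

<shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>check-hour.sh</exec>
    <argument>${earthquakeMinThreshold}</argument>
    <!-- explicit location; assumes the app path is a directory, not the workflow.xml file -->
    <file>${wf:appPath()}/check-hour.sh</file>
</shell>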

<file>hdfs:///apps/duh/mystuff/check-hour.sh#youpi</file> means "download that HDFS file etc. etc., renaming it as youpi, so that I can reference it as ./youpi or simply youpi in the <exec> element".
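A sketch of the rename case: the name after # is the one that appears in the container's working dir, so <exec> has to use youpi, not check-hour.sh. The HDFS path is the one from the answer; ${jobTracker} and ${nameNode} are assumed job.properties entries.

<shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- must match the name after '#' below, not the original file name -->
    <exec>youpi</exec>
    <argument>${earthquakeMinThreshold}</argument>
    <file>hdfs:///apps/duh/mystuff/check-hour.sh#youpi</file>
</shell>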

Note that the Hue UI often inserts a redundant #name suffix that doesn't actually rename anything (e.g. check-hour.sh#check-hour.sh). That's why you see it so often.

Periscope answered 27/1/2016 at 21:53
BTW, you may wonder what the # syntax is good for in a <file> element. In the examples above, it adds nothing. But think about <file>/apps/bling/bling-hardcore-1.2.3.4-6-unplugged.jar#bling.jar</file> or <archive>/apps/stuff/spark-1.6.0-with hive-1.2-dependencies.zip#spark</archive>, so that your Java or Shell action can rely on a predefined name to build a CLASSPATH... – Periscope
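To make that comment concrete, here is a hedged sketch of a Shell action that relies on fixed names. run-bling.sh, the EXTRA_CLASSPATH variable and /apps/stuff/spark-deps.zip are hypothetical; the jar path is the one from the comment, and ${jobTracker}/${nameNode} are assumed job.properties entries. An <archive> is unpacked in the container's working dir under the name after #, so spark/ is a directory there.

<shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- hypothetical launcher script -->
    <exec>run-bling.sh</exec>
    <!-- hypothetical variable that run-bling.sh is assumed to read when building its CLASSPATH -->
    <env-var>EXTRA_CLASSPATH=bling.jar:spark/*</env-var>
    <file>run-bling.sh</file>
    <!-- versioned jar exposed under a stable name -->
    <file>/apps/bling/bling-hardcore-1.2.3.4-6-unplugged.jar#bling.jar</file>
    <!-- hypothetical archive, unpacked into a directory named "spark" -->
    <archive>/apps/stuff/spark-deps.zip#spark</archive>
</shell>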
Hi @samson-scharfrichter, do you advise using a Shell action instead of a Spark action to submit Spark jobs? I'm still not sure which one to use. – Lumpen
