Executing Sqoops using Oozie
Asked Answered
I

2

6

I have 2 Sqoops that loads data from HDFS to MySQL. I want to execute them using Oozie. I have seen that Oozie is an XML file. How can I configure it so I can execute those Sqoop? Demonstration with steps will be appreciated?

Two Sqoops are:

1.

sqoop export --connect jdbc:mysql://localhost/hduser --table foo1 -m 1 --export-dir /user/cloudera/bar1

2.

sqoop export --connect jdbc:mysql://localhost/hduser --table foo2 -m 1 --export-dir /user/cloudera/bar2

Thanks.

Impracticable answered 1/4, 2014 at 21:19 Comment(0)
H
7

You don't have to execute it via a shell action. There is a separate sqoop action in oozie. Here is what you have to put in your workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.4" name="oozie-wf">
    <start to="sqoop-wf1"/>
    <action name="sqoop-wf1">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <command>export --connect jdbc:mysql://localhost/hduser --table foo1 -m 1 --export-dir /user/cloudera/bar1</command>
        </sqoop>
        <ok to="sqoop-wf2"/>
        <error to="fail"/>
    </action> 
    <action name="sqoop-wf2">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <command>export --connect jdbc:mysql://localhost/hduser --table foo1 -m 1 --export-dir /user/cloudera/bar2</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action> 
    <kill name="fail">
        <message>Failed, Error Message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Hope this helps..

Hungarian answered 2/4, 2014 at 9:24 Comment(7)
Hi, Thank you very much. How do I execute workflow.xml? Where do I put workflow.xml file? Where do I define parameters for "${jobTracker}"? I am new to Oozie and your help will be appreciated.Impracticable
jobTracker and nameNode is parametarized here meaning it should be defined in job.properties file. Seems like you have not gone through the examples. Check the apache oozie website. They have provided good examples. Here is the link. https://oozie.apache.org/docs/3.3.2/DG_Examples.htmlHungarian
Ok thanks. My Hadoop version is: Hadoop 2.0.0-cdh4.3.1. Can I use sqoop action in oozie for this version of Hadoop?Impracticable
thanks for your support. One last thing I would like to ask is, before the sqoop command gets executed (to load data from Hive/HDFS to MySQL), I have to load data to the HDFS, so what I do is: hive -hiveconf DATE='2014-04-01' -f bar.hql and it loads data into the HDFS e.g. /user/cloudera/bar1. Where should I put "hive -hiveconf DATE='2014-04-01' -f bar.hql" statement in Oozie? ThanksImpracticable
You can use Hive Action before Sqooping. Go through this example. http://oozie.apache.org/docs/3.3.2/DG_HiveActionExtension.htmlHungarian
Can I use Hive action and Sqoop action together? If yes, could you put it as new answer? Thank you!Impracticable
I have also other questions, if you could kindly answer: #22920931Impracticable
S
0

You can use an Oozie shell action for this. Basically you need to create a shell action & provide the commands that you posted in your question as the commands to be executed within the action

Sample Oozie action:

 <action name="SqoopAction">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>[JOB-TRACKER]</job-tracker>
        <name-node>[NAME-NODE]</name-node>
        <prepare>
           <delete path="[PATH]"/>
           ...
           <mkdir path="[PATH]"/>
           ...
        </prepare>
        <job-xml>[SHELL SETTINGS FILE]</job-xml>
        <configuration>
            <property>
                <name>[PROPERTY-NAME]</name>
                <value>[PROPERTY-VALUE]</value>
            </property>
            ...
        </configuration>
        <exec>[SHELL-COMMAND]</exec>
        <argument>[ARG-VALUE]</argument>
            ...
        <argument>[ARG-VALUE]</argument>
        <env-var>[VAR1=VALUE1]</env-var>
           ...
        <env-var>[VARN=VALUEN]</env-var>
        <file>[FILE-PATH]</file>
        ...
        <archive>[FILE-PATH]</archive>
        ...
        <capture-output/>
    </shell>

In your case, you would replace [SHELL-COMMAND] with whatever Sqoop command you want to run, such as:

<exec>sqoop export --connect jdbc:mysql://localhost/hduser --table foo1 -m 1 --export-dir /user/cloudera/bar1</exec>

Also, you could put all your sqoop commands in a shell script, and execute that script instead. This is better if you have a lot of commands to be executed.

Spearman answered 1/4, 2014 at 21:30 Comment(2)
Hi, Thanks for the reply. Where do I put this XML file (say SqoopAction.xml)? How do I execute this XML file? What do I put for [PATH], [SHELL SETTINGS FILE], [FILE-PATH], ...? I am using "Cloudera QuickStart VM"Impracticable
Please read through this guide: blogs.oracle.com/datawarehousing/entry/… to learn about how to execute oozie workflows through workflow.xml. Also, the example I posted is generic, so there might be parameters which you may not require, so feel free to exclude them from your workflowSpearman

© 2022 - 2024 — McMap. All rights reserved.