Passing parameter to sqoop job
Asked Answered
T

2

6

I'm crceating a sqoop job which will be scheduled in Oozie to load daily data into Hive.

I want to do incremental load into hive based on Date as a parameter, which will be passed to sqoop job

After researching lot I'm unable to find a way to pass a parameter to Sqoop job

Ticking answered 5/3, 2015 at 14:22 Comment(0)
J
6

You do this by passing the date down through two stages:

  1. Coordinator to workflow

In your coordinator you can pass the date to the workflow that it executes as a <property>, like this:

<coordinator-app name="schedule" frequency="${coord:days(1)}"
                 start="2015-01-01T00:00Z" end="2025-01-01T00:00Z"
                 timezone="Etc/UTC" xmlns="uri:oozie:coordinator:0.2">
    ...
    <action>
        <workflow>
            <app-path>${nameNode}/your/workflow.xml</app-path>
            <configuration>
                <property>
                    <name>workflow_date</name>
                    <value>${coord:formatTime(coord:nominalTime(), 'yyyyMMdd')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
    ...
</coordinator-app>
  1. Workflow to Sqoop

In your workflow you can reference that property in your Sqoop call using the ${workflow_date} variable, like this:

<sqoop xmlns="uri:oozie:sqoop-action:0.2">
    ...
    <command>import --connect jdbc:connect:string:here --table tablename --target-dir /your/import/dir/${workflow_date}/ -m 1</command>
    ...
</sqoop>
Johnathan answered 5/3, 2015 at 16:47 Comment(4)
hey Thank you so much!!..So i'll have to pass date to oozie coordinator XML on daily basis ? as I have daily loads from database to hive..Is it possible to pass this parameter from file?Ticking
No, an Oozie coordinator's purpose is to run itself on the schedule you set it and provide the date as shown in the answer.Johnathan
So before every run of coordinator I need to update workflow date in XML?Ticking
You should read up more on what coordinators are and then re-read my answer.Johnathan
C
3

Below solution is from Apache Sqoop Cookbook.

Preserving the Last Imported Value

Problem

Incremental import is a great feature that you're using a lot. Shouldering the responsibility for remembering the last imported value is getting to be a hassle.

Solution

You can take advantage of the built-in Sqoop metastore that allows you to save all parameters for later reuse. You can create a simple incremental import job with the following command:

sqoop job \
--create visits 3.3. Preserving the Last Imported Value | 27
-- import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table visits \
--incremental append \
--check-column id \
--last-value 0

And start it with the --exec parameter:

sqoop job --exec visits

Discussion

The Sqoop metastore is a powerful part of Sqoop that allows you to retain your job definitions and to easily run them anytime. Each saved job has a logical name that is used for referencing. You can list all retained jobs using the --list parameter:

sqoop job --list

You can remove the old job definitions that are no longer needed with the --delete parameter, for example:

sqoop job --delete visits

And finally, you can also view content of the saved job definitions using the --show parameter, for example:

sqoop job --show visits

Output of the --show command will be in the form of properties. Unfortunately, Sqoop currently can't rebuild the command line that you used to create the saved job.

The most important benefit of the built-in Sqoop metastore is in conjunction with incremental import. Sqoop will automatically serialize the last imported value back into the metastore after each successful incremental job. This way, users do not need to remember the last imported value after each execution; everything is handled automatically.

Cacus answered 12/5, 2016 at 16:35 Comment(2)
@.Raghu Gundu.. how you are handling password for your saved sqoop jobs?Arboriculture
Just an addition to Rahul answer..If you are running your sqoop job as saved job through oozie then you need to set the below property.. sqoop.metastore.client.record.password. sqoop metastore does not store password itself..it's important to set this property if running saved jobs via oozie as it won't prompt you to enter your password.. Oozie job might get failed in that case.Arboriculture

© 2022 - 2024 — McMap. All rights reserved.