Below solution is from Apache Sqoop Cookbook.
Preserving the Last Imported Value
Problem
Incremental import is a great feature that you're using a lot. Shouldering the responsibility for remembering the last imported value is getting to be a hassle.
Solution
You can take advantage of the built-in Sqoop metastore that allows you to save all parameters for later reuse. You can create a simple incremental import job with the following command:
sqoop job \
--create visits 3.3. Preserving the Last Imported Value | 27
-- import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table visits \
--incremental append \
--check-column id \
--last-value 0
And start it with the --exec parameter
:
sqoop job --exec visits
Discussion
The Sqoop metastore is a powerful part of Sqoop that allows you to retain your job definitions and to easily run them anytime. Each saved job has a logical name that is used for referencing. You can list all retained jobs using the --list
parameter:
sqoop job --list
You can remove the old job definitions that are no longer needed with the --delete
parameter, for example:
sqoop job --delete visits
And finally, you can also view content of the saved job definitions using the --show parameter
, for example:
sqoop job --show visits
Output of the --show
command will be in the form of properties. Unfortunately, Sqoop currently can't rebuild the command line that you used to create the saved job.
The most important benefit of the built-in Sqoop metastore is in conjunction with incremental import. Sqoop will automatically serialize the last imported value back into the metastore after each successful incremental job. This way, users do not need to remember the last imported value after each execution; everything is handled automatically.