I'm using Sqoop version 1.4.2 with an Oracle database.
When running a Sqoop command, for example:
./sqoop import \
--fs <name node> \
--jt <job tracker> \
--connect <JDBC string> \
--username <user> --password <password> \
--table <table> --split-by <cool column> \
--target-dir <where> \
--verbose --m 2
With --m we can specify how many parallel tasks Sqoop should run (these tasks may also be accessing the database at the same time). The same option is available for ./sqoop export <...>.
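As far as I understand from the Sqoop documentation, an import looks up the minimum and maximum of the --split-by column and divides that range evenly between the tasks, so each task queries its own slice of the table. Below is a rough sketch of that idea for an integer column. The split_ranges function, the column name id and the 0 to 1000 range are my own illustration, not Sqoop code or real data:

```shell
#!/bin/sh
# Hypothetical helper (not part of Sqoop): print the WHERE condition that
# each of N parallel tasks would get if the [min, max] range of an integer
# split column were divided evenly between them.
split_ranges() {
  col=$1; min=$2; max=$3; tasks=$4
  i=0
  while [ "$i" -lt "$tasks" ]; do
    lo=$(( min + (max - min) * i / tasks ))
    hi=$(( min + (max - min) * (i + 1) / tasks ))
    if [ "$i" -eq $(( tasks - 1 )) ]; then
      # The last task includes the upper bound so that no row is skipped.
      echo "$col >= $lo AND $col <= $hi"
    else
      echo "$col >= $lo AND $col < $hi"
    fi
    i=$(( i + 1 ))
  done
}

# Example: ids from 0 to 1000 split between 2 tasks (as with --m 2).
split_ranges id 0 1000 2
```

If this picture is right, the slices are equal in width but not necessarily in row count, so a skewed split column could leave some tasks with much more work than others, whatever number of tasks is chosen.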
Is there a heuristic (probably based on the size of the data) that would help to estimate the optimal number of tasks to use?
Thank you!