How to use Sqoop in Java Program?
Asked Answered
5

22

I know how to use Sqoop from the command line, but I don't know how to invoke a Sqoop command from a Java program. Can anyone share some sample code?

Winter answered 10/2, 2012 at 14:50 Comment(1)
Runtime.exec(command); — Is that what you are looking for? docs.oracle.com/javase/1.5.0/docs/api/java/lang/Runtime.html – Alcot
23

You can run Sqoop from inside your Java code by including the sqoop jar in your classpath and calling the Sqoop.runTool() method. You have to build the parameters for Sqoop programmatically, as if they were command-line arguments (e.g. --connect etc.).

Please pay attention to the following:

  • Make sure that the sqoop tool name (e.g. import/export etc.) is the first parameter.
  • Pay attention to classpath ordering - The execution might fail because sqoop requires version X of a library and you use a different version. Ensure that the libraries that sqoop requires are not overshadowed by your own dependencies. I've encountered such a problem with commons-io (sqoop requires v1.4) and had a NoSuchMethod exception since I was using commons-io v1.2.
  • Each argument needs to be on a separate array element. For example, "--connect jdbc:mysql:..." should be passed as two separate elements in the array, not one.
  • The sqoop parser knows how to accept double-quoted parameters, so use double quotes if you need to (I suggest always). The only exception is the --fields-terminated-by parameter, which expects a single character, so don't double-quote it.
  • I'd suggest splitting the command-line-arguments creation logic and the actual execution so your logic can be tested properly without actually running the tool.
  • It would be better to use the --hadoop-home parameter, in order to prevent dependency on the environment.
  • The advantage of Sqoop.runTool() over Sqoop.main() is that runTool() returns the exit code of the execution.

Hope that helps.

final int ret = Sqoop.runTool(new String[] { ... });
if (ret != 0) {
  throw new RuntimeException("Sqoop failed - return code " + ret);
}
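To illustrate the points above, here is a minimal sketch that puts the tool name first, keeps every flag and value as a separate array element, and splits argument construction from execution so it can be tested without running Sqoop. All connection details are placeholders, and the final Sqoop.runTool() call is shown only as a comment:

```java
public class SqoopArgsExample {

    // Build the argument array for a hypothetical MySQL import.
    // The tool name ("import") must be the first element, and each
    // flag and its value must be separate array elements.
    static String[] buildImportArgs(String connect, String user,
                                    String password, String table) {
        return new String[] {
            "import",
            "--connect", connect,
            "--username", user,
            "--password", password,
            "--table", table
        };
    }

    public static void main(String[] args) {
        String[] sqoopArgs = buildImportArgs(
            "jdbc:mysql://localhost:3306/mydb", "user", "secret", "my_table");
        // Keeping construction separate lets you unit-test it without
        // actually running the tool; the real call would then be:
        //   final int ret = Sqoop.runTool(sqoopArgs);
        System.out.println(sqoopArgs[0]);      // tool name comes first
        System.out.println(sqoopArgs.length);  // 9 separate elements
    }
}
```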

Ariose answered 11/2, 2012 at 19:10 Comment(5)
Thanks... I was searching the net for Sqoop docs with details of each method but could not find them. Can you point me to the Sqoop docs? – Winter
The docs I've used are at archive.cloudera.com/cdh/3/sqoop/…. However, they don't describe that method of executing Sqoop. I used the Sqoop source code for that, with some help from the Oozie project (incubator.apache.org/oozie) – Ariose
I'm able to run this Java client successfully, but it created the part-m-0000 file on the local file system, not HDFS. I checked the logs and found the job executed locally; the job ID itself shows it: job_local362447144_0001. How do I run this the same way as the sqoop CLI command? – Jameson
Resolved the above issue: to execute the bundled jar we need to use yarn jar jar_name, not java -jar jar_name – Jameson
When I run this, there is a warning SQOOP_CONF_DIR not set and later the error ERROR oracle.OracleConnectionFactory: Unable to load the jdbc driver class : oracle.jdbc.OracleDriver. When I run the same command from the command line, everything works. Any suggestions? – Flambeau
12

Find below sample code for using Sqoop in a Java program to import data from MySQL into HDFS/HBase. Make sure you have the sqoop jar in your classpath:

        // Required imports (from the sqoop jar):
        // import com.cloudera.sqoop.SqoopOptions;
        // import com.cloudera.sqoop.tool.ImportTool;
        SqoopOptions options = new SqoopOptions();
        options.setConnectString("jdbc:mysql://HOSTNAME:PORT/DATABASE_NAME");
        //options.setTableName("TABLE_NAME");
        //options.setWhereClause("id>10");     // this where clause works when importing whole table, ie when setTableName() is used
        options.setUsername("USERNAME");
        options.setPassword("PASSWORD");
        //options.setDirectMode(true);    // Make sure the direct mode is off when importing data to HBase
        options.setNumMappers(8);         // Default value is 4
        options.setSqlQuery("SELECT * FROM user_logs WHERE $CONDITIONS limit 10");
        options.setSplitByCol("log_id");

        // HBase options
        options.setHBaseTable("HBASE_TABLE_NAME");
        options.setHBaseColFamily("colFamily");
        options.setCreateHBaseTable(true);    // Create HBase table, if it does not exist
        options.setHBaseRowKeyColumn("log_id");

        int ret = new ImportTool().run(options);

As suggested in the answer above, we can use the return value of the run() method for error handling. Hope this helps.

Cynthla answered 29/4, 2013 at 16:8 Comment(0)
3

There is a trick that worked out pretty well for me: over SSH, you can execute the Sqoop command directly. All you have to use is an SSH Java library.

This is independent of the Sqoop setup on your Java side. You just need to include an SSH library, and Sqoop must be installed on the remote system where you want to perform the import. Connect to that system via SSH and execute the command, which will import the data from MySQL into Hive.

Follow these steps:

Download the sshxcute Java library from https://code.google.com/p/sshxcute/ and add it to the build path of your Java project, which contains the following Java code:

import net.neoremind.sshxcute.core.SSHExec;
import net.neoremind.sshxcute.core.ConnBean;
import net.neoremind.sshxcute.task.CustomTask;
import net.neoremind.sshxcute.task.impl.ExecCommand;

public class TestSSH {

    public static void main(String args[]) throws Exception {

        // Initialize a ConnBean object; the parameters are IP, username, password
        ConnBean cb = new ConnBean("192.168.56.102", "root", "hadoop");

        // Pass the ConnBean instance to the static SSHExec.getInstance(ConnBean)
        // method to retrieve a singleton SSHExec instance
        SSHExec ssh = SSHExec.getInstance(cb);
        // Connect to the server
        ssh.connect();
        // Print the client IP from which you connected to the SSH server on the Horton Sandbox
        CustomTask sampleTask1 = new ExecCommand("echo $SSH_CLIENT");
        System.out.println(ssh.exec(sampleTask1));
        CustomTask sampleTask2 = new ExecCommand("sqoop import --connect jdbc:mysql://192.168.56.101:3316/mysql_db_name --username=mysql_user --password=mysql_pwd --table mysql_table_name --hive-import -m 1 -- --schema default");
        ssh.exec(sampleTask2);
        ssh.disconnect();
    }
}
Joachim answered 27/6, 2014 at 16:7 Comment(1)
Working fine, but the log seems incorrect: the Sqoop job executes successfully, yet the console wrongly reports that the job failed: Check if exec success or not ... Execution failed while executing command: sqoop import --connect.... Error message: <sqoop job log is written here – Wild
0

If you know the location of the executable and the command-line arguments, you can use a ProcessBuilder. This can then be run as a separate Process that Java can monitor for completion and return code.
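A minimal sketch of this approach, using a harmless echo in place of the real sqoop executable (whose path depends on your installation):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ProcessRunner {
    public static void main(String[] args) throws Exception {
        // In a real setup the command would be something like
        // new ProcessBuilder("/usr/bin/sqoop", "import", "--connect", ...)
        ProcessBuilder pb = new ProcessBuilder("echo", "sqoop import done");
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line); // forward the tool's output
            }
        }
        int exitCode = p.waitFor(); // block until the process completes
        System.out.println("exit code: " + exitCode);
    }
}
```

waitFor() gives you the same kind of return code you would check after Sqoop.runTool(), so error handling looks the same either way.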

Papua answered 10/2, 2012 at 14:56 Comment(1)
Generally, if there is an API integration point, it's better to use that than to spawn child processes. – Carillonneur
0

Please follow the code given by vikas; it worked for me. Include these jar files in your classpath and import these packages:

import com.cloudera.sqoop.SqoopOptions;
import com.cloudera.sqoop.tool.ImportTool;

Ref Libraries

  1. Sqoop-1.4.4 jar /sqoop
  2. ojdbc6.jar /sqoop/lib (for oracle)
  3. commons-logging-1.1.1.jar hadoop/lib
  4. hadoop-core-1.2.1.jar /hadoop
  5. commons-cli-1.2.jar hadoop/lib
  6. commons-io-2.1.jar hadoop/lib
  7. commons-configuration-1.6.jar hadoop/lib
  8. commons-lang-2.4.jar hadoop/lib
  9. jackson-core-asl-1.8.8.jar hadoop/lib
  10. jackson-mapper-asl-1.8.8.jar hadoop/lib
  11. commons-httpclient-3.0.1.jar hadoop/lib

JRE system library

  1. resources.jar jdk/jre/lib
  2. rt.jar jdk/jre/lib
  3. jsse.jar jdk/jre/lib
  4. jce.jar jdk/jre/lib
  5. charsets.jar jdk/jre/lib
  6. jfr.jar jdk/jre/lib
  7. dnsns.jar jdk/jre/lib/ext
  8. sunec.jar jdk/jre/lib/ext
  9. zipfs.jar jdk/jre/lib/ext
  10. sunpkcs11.jar jdk/jre/lib/ext
  11. localedata.jar jdk/jre/lib/ext
  12. sunjce_provider.jar jdk/jre/lib/ext

Sometimes you get errors if your Eclipse project uses JDK 1.6 while the libraries you add were built with JDK 1.7; in that case, configure the JRE when creating the project in Eclipse.

Vikas, if I want to put the imported files into Hive, should I use options.parameter("--hive-import")?

Burrstone answered 12/11, 2013 at 4:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.