Oozie + Sqoop: JDBC Driver Jar Location
Asked Answered
U

4

6

I have a 6 node cloudera based hadoop cluster and I'm trying to connect to an oracle database from a sqoop action in oozie.

I have copied my ojdbc6.jar into the sqoop lib location (which for me happens to be at: /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/sqoop/lib/ ) on all the nodes and have verified that I can run a simple 'sqoop eval' from all the 6 nodes.

Now when I run the same command using Oozie's sqoop action, I get "Could not load db driver class: oracle.jdbc.OracleDriver"

I have read this article about using shared libs and it makes sense to me when we're talking about my task/action/workflow specific dependencies. But I see a JDBC driver installation as an extention to sqoop and so I think it belongs in the sqoop installation lib.

Now the question is, while sqoop sees this ojdbc6 jar I have put into it's lib folder, how come my Oozie workflow doesn't see it?

Is this something expected or am I missing something?

As an aside, what do you guy think about where is the appropriate location for a JDBC driver jar?

Thanks in advance!

Underplot answered 12/4, 2013 at 18:13 Comment(1)
Use <file>my-jdbc.jar</file> to pass your JDBC jar in Oozie workflow.xml. This can be done easily using Hue too.Simdars
D
7

The JDBC driver jar (and any jars it depends on) should go in your Oozie sharelib folder on HDFS. I'm running Hortonworks Data Platform 1.2 instead of Cloudera 4.2 so the details may vary, but my JDBC driver is located in /user/oozie/share/lib/sqoop. This should allow you to run Sqoop with the JDBC via Oozie.

It is not necessary to put to the JDBC driver jar in the sqoop lib on the data nodes. In my setupt I can't run a simple sqoop eval from the command line on my data nodes. I understand the logic for why you thought this would work. The reason the JDBC driver jar needs to be on HDFS is so that all the data nodes have access to it. Your solution should accomplish the same goal. I'm not familiar enough with the inner workings of Oozie to say why using the sharelib works but your solution does not.

Demand answered 12/4, 2013 at 18:49 Comment(6)
Hey Dan, I have actually put the ojdbc6.jar at /user/oozie/share/lib/sqoop and have gotten the job to successfully run.Underplot
Sorry, I thought you were still in the phase of trying to get it run at all. Wish I could help more. Out of curiosity, why are you looking for a non-sharelib solution?Demand
Sorry Dan, I was in the middle of my comment and I hit enter to insert a line break which apparently posts the comment on here. :) Here's the complete comment I intended to post: Hey Dan, Thanks for the quick reply. I have actually put the ojdbc6.jar at /user/oozie/share/lib/sqoop and have gotten the job to successfully run. I guess my question is more about understanding the design philosophy behind why oozie basically ignores a jar that is present in sqoops own lib folder. :)Underplot
To answer your second question, again, it's a design question... where does a jdbd driver belong? When it comes to dependencies for my workflow, I completely agree that they belong either in the shared lib or in the workflow's lib. But, when it comes to a JDBC driver, I see that as extending sqoop itself and so I think it belongs in sqoops lib folder. (Just like we used to drop xml parsing libraries in the jdk's ext directory) So, I'm looking to see whether my thought process is correct or flawed. :)Underplot
Gotcha. From what I've seen, Oozie likes to copy all the jars from sharelib into a temp folder (including the sqoop jar itself) on the datanode and run from there. This suggests to me that Oozie doesn't care if Sqoop is installed on the datanode at all (thought I haven't actually tried uninstalling Sqoop from the datanodes and seeing how Oozie handles it). This would fit in with a design philosophy of "I don't know if the datanode is set up correctly, so I'm going to assume it's not and work from there." This is idle guessing on my part though, hopefully someone can give the real answer.Demand
This: "Oozie likes to copy all the jars from sharelib into a temp folder (including the sqoop jar itself)" answers why this is happening. Thanks Dan! Yeah, your point about "I don't know if the datanode is set up correctly" makes sense.Underplot
A
5

In CDH5, you should put the jar to '/user/oozie/share/lib/lib_${timestamp}/sqoop', and after that, you must update the sharelib or restart oozie.

update sharelib:

oozie admin -oozie http://localhost:11000/oozie -sharelibupdate

Abdicate answered 17/11, 2016 at 7:47 Comment(0)
P
1

If you are using CDH-5 the JDBC driver jar (and any jars it depends on) should go in '/user/oozie/share/lib/lib_timestamp/sqoop' folder on HDFS.

Propriety answered 30/11, 2015 at 4:54 Comment(0)
N
0

I was facing the same issue it was not able to find the mysql jar. I am using cloudera 4.4 in this even oozie admin -oozie http://localhost:11000/oozie -sharelibupdate command will not work

To resolve the issue I had followed the below steps:

  1. create a user in Hue with hdfs and provide the admin privileges

  2. using Hue UI upload the jar into /user/oozie/share/lib/sqoop hdfs path or you can use below command:

hadoop put /var/lib/sqoop2/mysql-connector-java.jar /user/oozie/share/lib/sqoop

  1. Once the jar is placed run the oozie command.
Nock answered 22/11, 2017 at 19:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.