How does Oozie handle dependencies?

I have several questions about Oozie 2.3 shared libraries.

Currently, I define the shared libraries in our coordinator.properties:

oozie.use.system.libpath=true 
oozie.libpath=<hdfs_path>

Here are my questions:

  1. When are the shared libraries copied to other data nodes, and how many data nodes receive them?

  2. Are the shared libraries copied to other data nodes once per workflow in a coordinator job, or only once per coordinator job?

Caia answered 14/6, 2012 at 22:59 Comment(0)

Adding entries to the oozie.libpath property effectively means that Oozie will add those libraries to the mapred.cache.files configuration property (a DistributedCache property) when the actions in your workflow are executed.

Hadoop then takes care of copying those jars to each cluster node once per job, and the tasks are configured with the jars on the classpath through the mapred.job.classpath.files configuration property.
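As a rough illustration of the two properties mentioned above, here is a minimal Python sketch of how a list of library jars could end up in a job configuration. This is not Oozie's actual code: the function name `build_cache_config`, the example jar paths, and the use of a comma as the separator for both properties are assumptions made for the example.

```python
def build_cache_config(lib_jars):
    """Return a dict mimicking the two job properties discussed above.

    lib_jars: HDFS paths of jars found under oozie.libpath
              (hypothetical values, for illustration only).
    """
    joined = ",".join(lib_jars)  # separator is an assumption of this sketch
    return {
        # Files the DistributedCache should ship to the nodes running the job.
        "mapred.cache.files": joined,
        # Cached jars that should also be placed on the task classpath.
        "mapred.job.classpath.files": joined,
    }


# Hypothetical jars living under the configured libpath.
jars = ["/user/demo/lib/a.jar", "/user/demo/lib/b.jar"]
conf = build_cache_config(jars)
print(conf["mapred.cache.files"])
```

The point is only that the same set of jars is listed twice: once so that Hadoop ships the files, and once so that the tasks see them on their classpath.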

So in response to your second question, they will be copied over for each action in the workflow, not once per coordinator job. If you have a workflow job with 4 MapReduce actions, the libraries will be copied to each TaskTracker (only those TaskTrackers that participate in the MapReduce job) 4 times in the lifetime of that workflow.
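To make the counting concrete, here is a small Python sketch that tallies how often each node would receive the libraries, assuming one copy per action for every node that participates in that action. The node names and the helper `copies_per_node` are hypothetical.

```python
from collections import Counter


def copies_per_node(actions):
    """Count library copies per node across a workflow.

    actions: one collection of node names per MapReduce action, listing
             the TaskTrackers that run tasks for that action (hypothetical).
    """
    counts = Counter()
    for nodes in actions:
        # A node gets the libraries once per action it participates in.
        counts.update(set(nodes))
    return counts


# A workflow with 4 MapReduce actions; tt3 only participates in the last one.
workflow = [
    ["tt1", "tt2"],
    ["tt1", "tt2"],
    ["tt1", "tt2"],
    ["tt1", "tt2", "tt3"],
]
counts = copies_per_node(workflow)
print(counts["tt1"], counts["tt3"])
```

Here `tt1` and `tt2` each receive the libraries 4 times, while `tt3` receives them once, because it only takes part in one action.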

Keelson answered 15/6, 2012 at 11:47 Comment(4)
Is it possible for multiple actions or workflows to share the same distributed cache? - Caia
Not sure I understand what you're asking. - Keelson
As I understand it, this means that dependencies from the system libpath are always loaded from HDFS, which avoids using the distributed cache each time the workflow executes. Is that so? - Coitus
@TerminalUser That's the nature of Hadoop job submission; it has nothing to do with Oozie. One workaround is to submit one job that triggers your other jobs upon completion, but that's a very bad idea that breaks with standard methodology. - Edington
