I'm trying to build an Oozie workflow to execute everyday a python script which needs specific libraries to run.
At the moment I created a python virtual environment (using venv) on a node of my cluster (consisting of 11 nodes). Through Oozie I saw that it is possible to run the script using an SSH Action specifying the node containing the virtual environment. Alternatively it is possible to use a Shell Action to run the python script but this requires creating the virtual environment, with the same dependencies in terms of libraries, on the node where the shell will be executed (any of the cluster nodes).
I would like to avoid sharing keys or configuring all the cluster nodes to make this possible and looking in the docs I found this section talking about launching applications using Docker containers but in Hadoop version of my cluster this feature is experimental and not complete (Hadoop 3.0.0). I suppose that if you can launch Docker containers from shell you should be able to launch them from Oozie.
So my question is: has anyone tried to do it? Is it a trick to use docker this way?
I came across this question but to date 2019/09/30 there are no specific answers.
UPDATE: I tried to do it, and it works (you can find more info in my answer to this question). I'm still wondering if it's a correct way to do it.