My Apache beam pipeline implements custom Transforms and ParDo's python modules which further imports other modules written by me. On Local runner this works fine as all the available files are available in the same path. In case of Dataflow runner, pipeline fails with module import error.
How do I make custom modules available to all the dataflow workers? Please advise.
Below is an example:
ImportError: No module named DataAggregation
at find_class (/usr/lib/python2.7/pickle.py:1130)
at find_class (/usr/local/lib/python2.7/dist-packages/dill/dill.py:423)
at load_global (/usr/lib/python2.7/pickle.py:1096)
at load (/usr/lib/python2.7/pickle.py:864)
at load (/usr/local/lib/python2.7/dist-packages/dill/dill.py:266)
at loads (/usr/local/lib/python2.7/dist-packages/dill/dill.py:277)
at loads (/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py:232)
at apache_beam.runners.worker.operations.PGBKCVOperation.__init__ (operations.py:508)
at apache_beam.runners.worker.operations.create_pgbk_op (operations.py:452)
at apache_beam.runners.worker.operations.create_operation (operations.py:613)
at create_operation (/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py:104)
at execute (/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py:130)
at do_work (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:642)