Google DataFlow/Python: Import errors with save_main_session and custom modules in __main__

Could somebody please clarify the expected behavior when using save_main_session with custom modules imported in __main__? My Dataflow pipeline imports two non-standard modules: one installed via requirements.txt and the other via setup_file. Unless I move the imports into the functions where they are used, I keep getting import/pickling errors (a sample is below). From the documentation, I assumed that setting save_main_session would solve this problem, but it does not. Did I miss something, or is this behavior by design? The same import works fine when placed inside a function.

Error:

  File "/usr/lib/python2.7/pickle.py", line 1130, in find_class
    __import__(module)
ImportError: No module named jmespath
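
For reference, the workaround described above (moving the import into the function that uses it) can be sketched roughly as follows. This is only an illustration: the function name extract_field is hypothetical, and the standard-library json module stands in for jmespath so that the snippet runs anywhere. In a real pipeline the function would be handed to something like beam.Map.

```python
def extract_field(raw_record, field):
    """Parse one JSON record and return a single field from it."""
    # The import runs when the function is called, i.e. in whichever
    # process executes it, instead of when __main__ is loaded.
    # json is a stand-in here for the third-party module (jmespath).
    import json

    return json.loads(raw_record).get(field)


# Local usage with a made-up record.
result = extract_field('{"name": "example"}', "name")
print(result)
```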
Lobo answered 12/7, 2018 at 17:21 Comment(0)

https://cloud.google.com/dataflow/faq#how-do-i-handle-nameerrors
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/

When to use --save_main_session:

you can set the --save_main_session pipeline option to True. This will cause the state of the global namespace to be pickled and loaded on the Cloud Dataflow worker
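
The traceback in the question ends in pickle's find_class calling __import__, which suggests that the pickled state refers to the module by name instead of carrying its code. Here is a minimal standard-library sketch of that behavior. It assumes a made-up module name, fake_dependency, as a stand-in for jmespath, and it uses plain pickle, not Beam or the dill library that Beam uses for the main session, so details may differ.

```python
import pickle
import sys
import types

# Build a throwaway module to play the role of a third-party dependency.
# The name "fake_dependency" is hypothetical.
fake = types.ModuleType("fake_dependency")
exec("def search(expr, data):\n    return data[expr]\n", fake.__dict__)
sys.modules["fake_dependency"] = fake

# pickle records the function as a reference (module name + function name),
# not as a copy of its code.
payload = pickle.dumps(fake.search)

# Simulate a worker on which the dependency was never installed.
del sys.modules["fake_dependency"]

try:
    pickle.loads(payload)
    outcome = "loaded"
except ImportError as exc:
    # Unpickling tries to import the module by name and fails.
    outcome = "ImportError: %s" % exc

print(outcome)
```

If the same thing happens on the Dataflow worker, pickling the main session cannot replace installing the package there; that is the job of requirements.txt or setup.py.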

The setup that works best for me is a dataflow_launcher.py at the project root, next to your setup.py. The only thing the launcher does is import your pipeline file and launch it. Use setup.py to handle all your dependencies. This is the best example I've found so far:

https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/complete/juliaset
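
A setup.py for that layout might look roughly like the fragment below. It is a sketch under assumptions, not a copy of the linked example: the package name, the version, and the dependency list are hypothetical. The file would be passed to the pipeline through the setup_file option mentioned in the question.

```python
# setup.py (sketch; name, version, and dependencies are hypothetical)
import setuptools

setuptools.setup(
    name="my_pipeline",
    version="0.0.1",
    # Third-party packages that must be installed on the workers.
    install_requires=["jmespath"],
    # Picks up the local packages that hold the pipeline code.
    packages=setuptools.find_packages(),
)
```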

Spathose answered 12/7, 2018 at 18:28 Comment(1)
Thank you for sharing this method. I will try to adapt it to the multiple pipelines that I have. Still curious if --save_main_session is supposed to fix the problem with custom modules in __main__. In my case, it does not help when there are custom modules in __main__. - Lobo

In particular, --save_main_session fails if your DoFn has an __init__ using super. See https://issues.apache.org/jira/browse/BEAM-6158.

Castello answered 13/2, 2020 at 7:35 Comment(0)
