Run external Python dependencies with spark-submit?

I have a test.py file

import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.externals import joblib
import tqdm
import time

print("Successful import")

I have followed this method to create a self-contained zip of all dependencies:

pip install -t dependencies -r requirements.txt
cd dependencies
zip -r ../dependencies.zip .

which creates this tree structure inside dependencies.zip:

dependencies.zip
     ->pandas
     ->numpy
     ->........

and when I run

spark-submit --py-files /home/ion/Documents/dependencies.zip /home/ion/Documents/sentiment_analysis/test.py

I get the following error

2018-05-16 07:36:21 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "/home/ion/Documents/sentiment_analysis/test.py", line 2, in <module>
    from encoder import Model
  File "/home/ion/Documents/sentiment_analysis/encoder.py", line 2, in <module>
    import numpy as np
  File "/home/ion/Documents/dependencies.zip/numpy/__init__.py", line 142, in <module>
  File "/home/ion/Documents/dependencies.zip/numpy/add_newdocs.py", line 13, in <module>
  File "/home/ion/Documents/dependencies.zip/numpy/lib/__init__.py", line 8, in <module>
  File "/home/ion/Documents/dependencies.zip/numpy/lib/type_check.py", line 11, in <module>
  File "/home/ion/Documents/dependencies.zip/numpy/core/__init__.py", line 26, in <module>
ImportError: 
Importing the multiarray numpy extension module failed.  Most
likely you are trying to import a failed build of numpy.
If you're working with a numpy git repo, try `git clean -xdf` (removes all
files not under version control).  Otherwise reinstall numpy.

Original error was: cannot import name multiarray

2018-05-16 07:36:21 INFO  ShutdownHookManager:54 - Shutdown hook called
2018-05-16 07:36:21 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-a3c2ec75-6c12-4ac2-ae2c-b36412209889
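
For what it's worth, the same failure seems reproducible without Spark at all by pointing plain Python at the zip, which suggests the problem is loading numpy's compiled extension modules from inside a zip rather than spark-submit itself. A minimal local check (same paths as above):

import sys

# Put the dependencies zip on sys.path, roughly what --py-files does.
sys.path.insert(0, "/home/ion/Documents/dependencies.zip")

# zipimport can load pure-Python modules but not compiled .so extensions,
# so this is expected to fail with the same "cannot import name multiarray".
import numpy as np
print(np.__version__)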

Is there any way to run this Python script as a Spark job without rewriting the code in PySpark, or with at most minimal code changes?

Mcelrath answered 16/5, 2018 at 2:09

Comment from Le: Have you checked here: #47906046 ?
