ModuleNotFoundError in PySpark when running with spark-submit
I have ETL code written in PySpark and two bash scripts to run it. When I use this script, it runs without any problems:

    #!/bin/bash

    cd /root/Desktop/project
    rm etl.zip
    zip -r ./etl.zip * -x logs/\*
    /u01/spark/bin/spark-submit \
        --conf spark.driver.host=x.x.x.x \
        --conf spark.driver.memory=28g \
        --conf spark.executor.memory=24g \
        --conf spark.rpc.message.maxSize=1024 \
        --conf spark.rpc.askTimeout=600s \
        --conf spark.sql.broadcastTimeout=36000 \
        --driver-class-path ojdbc8.jar \
        --files config.json,basic_tables_config.json \
        --py-files etl.zip \
        project_main.py

But when I use this script, I receive a ModuleNotFoundError: one of the Python files cannot be found.

    #!/bin/bash

    cd /root/Desktop/project
    rm etl.zip
    zip -r ./etl.zip * -x logs/\*
    /u01/spark/bin/spark-submit \
        --conf spark.driver.host=x.x.x.x \
        --conf spark.driver.memory=28g \
        --conf spark.executor.memory=24g \
        --driver-class-path ojdbc8.jar \
        --files config.json,basic_tables_config.json \
        --py-files etl.zip \
        project_main.py
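A quick sanity check for a ModuleNotFoundError against a --py-files archive is to confirm that the module named in the traceback is actually inside the zip. A minimal sketch; missing_module is a placeholder, since the question does not show the traceback:

    # List the archive and look for the module the traceback complains about.
    # "missing_module" is a placeholder for the actual name from the error.
    unzip -l etl.zip | grep -i missing_module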

I compared the two scripts; the first one has these additional parameters:

    --conf spark.rpc.message.maxSize=1024
    --conf spark.rpc.askTimeout=600s
    --conf spark.sql.broadcastTimeout=36000
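One way to confirm whether these settings are really what makes the difference is to add them back to the failing script one at a time. A minimal sketch restoring only the first flag, with everything else copied from the scripts above; repeat with the other two flags to isolate the culprit:

    #!/bin/bash

    cd /root/Desktop/project
    rm etl.zip
    zip -r ./etl.zip * -x logs/\*
    # Second script plus only spark.rpc.message.maxSize restored.
    /u01/spark/bin/spark-submit \
        --conf spark.driver.host=x.x.x.x \
        --conf spark.driver.memory=28g \
        --conf spark.executor.memory=24g \
        --conf spark.rpc.message.maxSize=1024 \
        --driver-class-path ojdbc8.jar \
        --files config.json,basic_tables_config.json \
        --py-files etl.zip \
        project_main.py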

Could you please tell me what is wrong with the second script?

Any help is really appreciated.

Bluecoat asked 18/1 at 6:12

Comments (3):

Stephniestepladder: Add a comma between your .json files, as in the first script, and it should be fine. Right now it is missing: ...--files config.json basic_tables_config.json...

Bluecoat: @mazaneicha, I have added that; I just wrote it wrong here. Thank you for pointing that out.

Stephniestepladder: Now you have a space after the comma, which should be removed: --files config.json, basic_tables_config.json. Alternatively, you can enclose the whole argument in double quotes.
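To illustrate the comment thread: the file list must reach spark-submit as a single comma-separated argument, so either drop the space or quote the whole list. A minimal sketch of both forms:

    # No space after the comma, so the shell sees one argument:
    --files config.json,basic_tables_config.json
    # Or quote the whole list, as suggested in the comments:
    --files "config.json,basic_tables_config.json"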

Can you show me the logs?

I have encountered similar issues and needed to add the following configurations:

    --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/home/user/anaconda3/envs/env2/bin/python3.6
    --conf spark.pyspark.driver.python=/home/user/anaconda3/envs/env2/bin/python3.6
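In the context of the asker's script, these are passed as additional --conf flags on the same spark-submit command. A sketch only; the python3.6 path is the answerer's own conda environment and must point at an interpreter that actually exists on every node of your cluster:

    # Asker's second script with the answerer's interpreter settings added.
    /u01/spark/bin/spark-submit \
        --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/home/user/anaconda3/envs/env2/bin/python3.6 \
        --conf spark.pyspark.driver.python=/home/user/anaconda3/envs/env2/bin/python3.6 \
        --conf spark.driver.host=x.x.x.x \
        --conf spark.driver.memory=28g \
        --conf spark.executor.memory=24g \
        --driver-class-path ojdbc8.jar \
        --files config.json,basic_tables_config.json \
        --py-files etl.zip \
        project_main.py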
Detrimental answered 18/1 at 7:20
