ModuleNotFoundError in PySpark when running with spark-submit
I have ETL code written in PySpark and two bash scripts to run it. When I use this script, it runs without any problems:

    #!/bin/bash

    cd /root/Desktop/project
    rm etl.zip
    zip -r ./etl.zip * -x logs/\*
    /u01/spark/bin/spark-submit \
        --conf spark.driver.host=x.x.x.x \
        --conf spark.driver.memory=28g \
        --conf spark.executor.memory=24g \
        --conf spark.rpc.message.maxSize=1024 \
        --conf spark.rpc.askTimeout=600s \
        --conf spark.sql.broadcastTimeout=36000 \
        --driver-class-path ojdbc8.jar \
        --files config.json,basic_tables_config.json \
        --py-files etl.zip \
        project_main.py

But when I use this script, I receive a ModuleNotFoundError: one of the Python files cannot be found.

    #!/bin/bash

    cd /root/Desktop/project
    rm etl.zip
    zip -r ./etl.zip * -x logs/\*
    /u01/spark/bin/spark-submit \
        --conf spark.driver.host=x.x.x.x \
        --conf spark.driver.memory=28g \
        --conf spark.executor.memory=24g \
        --driver-class-path ojdbc8.jar \
        --files config.json,basic_tables_config.json \
        --py-files etl.zip \
        project_main.py
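A quick sanity check for a ModuleNotFoundError against a --py-files archive is to confirm that the module named in the traceback is actually inside the zip. A minimal sketch; missing_module is a placeholder, since the question does not show the traceback:

    # List the archive and look for the module the traceback complains about.
    # "missing_module" is a placeholder for the actual name from the error.
    unzip -l etl.zip | grep -i missing_module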

I compared the two scripts; the first one has these additional parameters:

    --conf spark.rpc.message.maxSize=1024
    --conf spark.rpc.askTimeout=600s
    --conf spark.sql.broadcastTimeout=36000
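One way to confirm whether these settings are really what makes the difference is to add them back to the failing script one at a time. A minimal sketch restoring only the first flag, with everything else copied from the scripts above; repeat with the other two flags to isolate the culprit:

    #!/bin/bash

    cd /root/Desktop/project
    rm etl.zip
    zip -r ./etl.zip * -x logs/\*
    # Second script plus only spark.rpc.message.maxSize restored.
    /u01/spark/bin/spark-submit \
        --conf spark.driver.host=x.x.x.x \
        --conf spark.driver.memory=28g \
        --conf spark.executor.memory=24g \
        --conf spark.rpc.message.maxSize=1024 \
        --driver-class-path ojdbc8.jar \
        --files config.json,basic_tables_config.json \
        --py-files etl.zip \
        project_main.py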

Could you please tell me what is wrong with the second script?

Any help is really appreciated.

Bluecoat asked 18/1 at 6:12

Comments (3):

Stephniestepladder: Add a comma between your .json files, as in the first script, and it should be fine. Right now it is missing: ...--files config.json basic_tables_config.json...

Bluecoat: @mazaneicha, I have added that; I just wrote it wrong here. Thank you for pointing that out.

Stephniestepladder: Now you have a space after the comma, which should be removed: --files config.json, basic_tables_config.json. Alternatively, you can enclose the whole argument in double quotes.
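To illustrate the comment thread: the file list must reach spark-submit as a single comma-separated argument, so either drop the space or quote the whole list. A minimal sketch of both forms:

    # No space after the comma, so the shell sees one argument:
    --files config.json,basic_tables_config.json
    # Or quote the whole list, as suggested in the comments:
    --files "config.json,basic_tables_config.json"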

Can you show me the logs?

I have encountered similar issues and needed to add the following configurations:

    --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/home/user/anaconda3/envs/env2/bin/python3.6
    --conf spark.pyspark.driver.python=/home/user/anaconda3/envs/env2/bin/python3.6
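In the context of the asker's script, these are passed as additional --conf flags on the same spark-submit command. A sketch only; the python3.6 path is the answerer's own conda environment and must point at an interpreter that actually exists on every node of your cluster:

    # Asker's second script with the answerer's interpreter settings added.
    /u01/spark/bin/spark-submit \
        --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/home/user/anaconda3/envs/env2/bin/python3.6 \
        --conf spark.pyspark.driver.python=/home/user/anaconda3/envs/env2/bin/python3.6 \
        --conf spark.driver.host=x.x.x.x \
        --conf spark.driver.memory=28g \
        --conf spark.executor.memory=24g \
        --driver-class-path ojdbc8.jar \
        --files config.json,basic_tables_config.json \
        --py-files etl.zip \
        project_main.py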
Detrimental answered 18/1 at 7:20
