I have ETL code written in PySpark, and two bash scripts to run it. When I use this script, it runs without any problems:
#!/bin/bash
cd /root/Desktop/project
rm etl.zip
zip -r ./etl.zip * -x logs/\*
/u01/spark/bin/spark-submit \
  --conf spark.driver.host=x.x.x.x \
  --conf spark.driver.memory=28g \
  --conf spark.executor.memory=24g \
  --conf spark.rpc.message.maxSize=1024 \
  --conf spark.rpc.askTimeout=600s \
  --conf spark.sql.broadcastTimeout=36000 \
  --driver-class-path ojdbc8.jar \
  --files config.json,basic_tables_config.json \
  --py-files etl.zip \
  project_main.py
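As a side note, one quick sanity check, independent of the confs, is to confirm that the module Spark reports as missing actually ends up inside the archive (the module name below is a placeholder for the real one):

# List the archive contents and look for the module named in the error.
# "my_missing_module" is a placeholder, not a file from this project.
unzip -l etl.zip | grep my_missing_module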
But when I use this script, I get a ModuleNotFoundError: one of the Python files is not found.
#!/bin/bash
cd /root/Desktop/project
rm etl.zip
zip -r ./etl.zip * -x logs/\*
/u01/spark/bin/spark-submit \
  --conf spark.driver.host=x.x.x.x \
  --conf spark.driver.memory=28g \
  --conf spark.executor.memory=24g \
  --driver-class-path ojdbc8.jar \
  --files config.json,basic_tables_config.json \
  --py-files etl.zip \
  project_main.py
I compared the two scripts; the only difference is that the first one has these extra parameters:
--conf spark.rpc.message.maxSize=1024 --conf spark.rpc.askTimeout=600s --conf spark.sql.broadcastTimeout=36000
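If it helps to isolate the cause, the three settings can be re-added one at a time. A sketch of the first step, trying spark.rpc.message.maxSize first (an assumption on my part: its default is 128 MB, and the working script raises it to 1024, so it may matter if etl.zip is large; the other two settings are only timeouts):

#!/bin/bash
# Sketch: re-add the confs one at a time to find which one the job needs.
# Assumption: spark.rpc.message.maxSize is the most likely candidate.
cd /root/Desktop/project
rm etl.zip
zip -r ./etl.zip * -x logs/\*
/u01/spark/bin/spark-submit \
  --conf spark.driver.host=x.x.x.x \
  --conf spark.driver.memory=28g \
  --conf spark.executor.memory=24g \
  --conf spark.rpc.message.maxSize=1024 \
  --driver-class-path ojdbc8.jar \
  --files config.json,basic_tables_config.json \
  --py-files etl.zip \
  project_main.py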
Would you please point out what is wrong with the second script?
Any help is really appreciated.
Comment (Stephniestepladder): Make sure there is no space in the --files list. It should be --files config.json,basic_tables_config.json, not --files config.json, basic_tables_config.json; a space after the comma makes the shell treat basic_tables_config.json as a separate argument. Alternatively, you can enclose the whole argument in double quotes.
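For clarity, the quoted form the comment suggests would look like this (other flags omitted here; they stay as in the scripts above):

# Quoting keeps the shell from splitting the list if a space ever
# sneaks in after the comma.
/u01/spark/bin/spark-submit \
  --files "config.json,basic_tables_config.json" \
  --py-files etl.zip \
  project_main.py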