You can pass arguments on the spark-submit command line and access them in your code via sys.argv: sys.argv[1] is the first argument, sys.argv[2] the second, and so on (sys.argv[0] is the script name itself). For example, the following script reads the arguments you pass to spark-submit:
import sys

# First argument: how many table names follow
n = int(sys.argv[1])

# Remaining n arguments: the table names
tables = []
for i in range(n):
    tables.append(sys.argv[2 + i])

print(tables)
Save the file as PysparkArg.py and run the following spark-submit command:
spark-submit PysparkArg.py 3 table1 table2 table3
Output:
['table1', 'table2', 'table3']
This pattern is useful in PySpark jobs that need to fetch multiple tables from a database, where the number of tables and their names are supplied by the user at spark-submit time.
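As a variant, the same parsing can be written more compactly with list slicing, and wrapping it in a function makes it easy to validate and test without Spark. This is just a sketch of the argument-handling step (the function name parse_tables is my own, not part of any API):

    import sys

    def parse_tables(argv):
        """Parse [script, n, table1, ..., tableN] into a list of table names."""
        n = int(argv[1])
        tables = argv[2:2 + n]  # slice out exactly n table names
        if len(tables) != n:
            raise ValueError(f"expected {n} table names, got {len(tables)}")
        return tables

    if __name__ == "__main__":
        print(parse_tables(sys.argv))

Running it with the same command, spark-submit PysparkArg.py 3 table1 table2 table3, prints the same list; the explicit length check also catches the case where the user supplies fewer names than promised.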