I am running an Apache Beam workload on Spark. I initialized the workers with 32 GB of memory each (each slave is started with -c 2 -m 32G). spark-submit sets driver memory to 30g and executor memory to 16g. However, the executors fail with java.lang.OutOfMemoryError: Java heap space.
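For context, that corresponds to a standalone worker start along the lines of (same master URL as the job server below):
$SPARK_HOME/sbin/start-slave.sh spark://$HOSTNAME:7077 -c 2 -m 32G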
The master GUI indicates that memory per executor is 1024M. In addition, I see that all Java processes are launched with -Xmx1024m. This means spark-submit doesn't propagate its executor settings to the executors.
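To see the executor JVMs I grep the process list on a worker node, something like:
ps -ef | grep CoarseGrainedExecutorBackend | grep -o 'Xmx[^ ]*'
and every executor comes back with Xmx1024m.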
Pipeline options are as follows:
--runner PortableRunner \
--job_endpoint=localhost:8099 \
--environment_type=PROCESS \
--environment_config='{"command": "$HOME/beam/sdks/python/container/build/target/launcher/linux_amd64/boot"}'
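These flags are what the Python pipeline entry point is launched with, roughly like this (the script name is a placeholder and the boot path is shortened); note that no Spark memory settings are passed here:
python my_pipeline.py \
  --runner=PortableRunner \
  --job_endpoint=localhost:8099 \
  --environment_type=PROCESS \
  --environment_config='{"command": "<boot path as above>"}'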
The job endpoint is set up in the default way:
docker run --rm --network=host --name spark-jobservice apache/beam_spark_job_server:latest --spark-master-url=spark://$HOSTNAME:7077
How do I make sure the settings propagate to the executors?
Update: I set conf/spark-defaults.conf to
spark.driver.memory 32g
spark.executor.memory 32g
and conf/spark-env.sh to
SPARK_EXECUTOR_MEMORY=32g
then restarted the cluster and relaunched everything, but executor memory is still limited to 1024M.
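For what it's worth, one way to isolate this is to submit a plain (non-Beam) job to the same master and check the per-executor memory for that application in the master UI, e.g. with the bundled SparkPi example (paths assume a stock Spark distribution):
$SPARK_HOME/bin/spark-submit \
  --master spark://$HOSTNAME:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 100
If that application also gets 1024M per executor, the limit comes from the standalone cluster configuration rather than from Beam or the job server.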