I'm trying to connect to a standalone Apache Spark cluster from a dockerized Apache Spark application using client mode.
The driver advertises its address to the Spark Master and the Workers. When run inside a Docker container, it advertises some_docker_container_ip. That address is not visible from outside the Docker network, so the application won't work.
Spark has a spark.driver.host
property, which is passed to the Master and the Workers. My first instinct was to set it to the host machine's address, so the cluster would contact the visible machine instead.
Unfortunately, spark.driver.host
is also the address the driver uses to set up its own server. Setting it to the host machine's address causes server startup errors, because the Docker container cannot bind ports on the host machine's address.
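For illustration, the failing setup looks roughly like this (the master URL, host address, and jar name are placeholders for my actual values):

```shell
# Submitted from inside the Docker container.
# With spark.driver.host set to the host machine's address, the Master and
# Workers can reach the driver, but the driver itself fails to bind its
# server because that address does not exist inside the container.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  --conf spark.driver.host=192.168.1.10 \
  app.jar
```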
It seems like a lose-lose situation: I can use neither the host machine address nor the Docker container address.
Ideally I would like to have two properties: a spark.driver.host-to-bind-to
used to set up the driver server, and a spark.driver.host-for-master
that the Master and Workers would use. Unfortunately, it seems I'm stuck with a single property.
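Concretely, something like this in spark-defaults.conf is what I'm after (both property names are made up by me and do not exist in Spark; the addresses are placeholders):

```
# hypothetical — these properties do not exist:
spark.driver.host-to-bind-to   172.17.0.2     # container address the driver server binds to
spark.driver.host-for-master   192.168.1.10   # host address advertised to the Master and Workers
```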
Another approach would be to run the Docker container with --net=host.
That approach has many disadvantages (e.g. other containers cannot be linked to a container running with --net=host
and must instead be exposed outside the Docker network), and I would like to avoid it.
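For completeness, this is the workaround I'd like to avoid (my-spark-app is a placeholder image name):

```shell
# Shares the host's network stack, so the driver binds and advertises
# the host address directly — but the container loses network isolation
# and cannot be linked to by other containers.
docker run --net=host my-spark-app
```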
Is there any way I could solve the driver-addressing problem without exposing the docker containers?