launching ipyparallel cluster across multiple nodes using MPI

I am trying to start an ipyparallel cluster using MPI.

The ipcluster_config.py has the following lines modified:

c.MPILauncher.mpi_cmd = ['mpiexec']
c.MPIControllerLauncher.controller_args = ['--ip=*']
c.MPILauncher.mpi_args = ["-machinefile", "~/mpi_hosts"]
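
One detail that may be worth checking in the lines above: if the launcher starts mpiexec directly rather than through a shell (an assumption, not something confirmed in this thread), the "~" in the machinefile path would not be expanded. A minimal sketch of expanding it in Python first; the variable names machinefile and mpi_args are only for illustration:

```python
import os

# Expand "~" explicitly, since it is normally the shell that does this.
machinefile = os.path.expanduser("~/mpi_hosts")
mpi_args = ["-machinefile", machinefile]
print(mpi_args)

# In ipcluster_config.py this would be assigned as:
# c.MPILauncher.mpi_args = mpi_args
```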

The ipcontroller_config.py is configured as follows:

c.HubFactory.engine_ip = '*'
c.HubFactory.ip = '*'
c.HubFactory.client_ip = '*'

However, when I launch the cluster with the command ipcluster start --profile mpi -n 2, it fails with the following message:

Engines shutdown early, they probably failed to connect.
You can set this by adding "--ip='*'" to your ControllerLauncher.controller_args

Not sure how to debug further.

Bolter answered 12/11, 2017 at 5:26

Comments (4)

Try running ipcluster start --profile mpi -n 2 --debug and post the logs from that run. (Quiroz)

Thanks Tarun. This helps. It seems ipcluster is not able to find mpiexec. I need to figure out how to configure ipcluster so it loads the modules. (Bolter)

Did you install the MPI package? (Quiroz)

I am on a PBS cluster environment. I have to run module load to see mpiexec in the path. I guess when ipcluster launches engines on remote nodes, it does not do "module load". I am looking into the configs to see if there is any place to specify that. (Bolter)
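
Following up on the last comment, a quick way to confirm the diagnosis is to check whether mpiexec is visible in the environment that runs ipcluster. The sketch below is only an illustration: find_launcher and its extra_dirs parameter are hypothetical helpers, not part of ipyparallel, and whether pointing mpi_cmd at a full path is enough on a given PBS site (without the rest of the module's environment) is an assumption.

```python
import os
import shutil


def find_launcher(name="mpiexec", extra_dirs=()):
    """Return the absolute path of `name`, or None if it cannot be found.

    Searches the current PATH first, then any directories in `extra_dirs`.
    """
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    dirs.extend(extra_dirs)
    found = shutil.which(name, path=os.pathsep.join(dirs))
    return os.path.abspath(found) if found else None


mpiexec_path = find_launcher("mpiexec")
if mpiexec_path is None:
    print("mpiexec not found on PATH; the MPI module is probably not loaded")
else:
    print("mpiexec found at", mpiexec_path)
    # In ipcluster_config.py the full path could then be used as:
    # c.MPILauncher.mpi_cmd = [mpiexec_path]
```

If the module's bin directory is known, it can be passed via extra_dirs so the lookup works even before module load has been run in that shell.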
