I have successfully set up passwordless SSH between the servers and my computer. I have a simple Open MPI program that runs well on a single computer. Unfortunately, when I try to run it on a cluster, I neither get a password prompt (which is expected, since I have set up SSH authorization) nor does the execution move forward.
The hostfile looks like this:
# The Hostfile for Open MPI
# The master node, 'slots=8' is used because it has 8 cores
localhost slots=8
# The following slave nodes are single processor machines:
[email protected] slots=8
gautam@srvgrm04 slots=160
I am running a hello world MPI program on the cluster:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int numprocs, rank, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    double t;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(processor_name, &namelen);
    printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
    MPI_Finalize();  /* shut down MPI before exiting */
    return 0;
}
I am running it like this:
mpirun -np 16 --hostfile hostfile ./hello
When using the -d option, the log looks like this:
[gautam@pcys33:~/LTE/check ]% mpirun -np 16 --hostfile hostfile -d ./hello
[pcys33.grm.polymtl.ca:02686] procdir: /tmp/[email protected]_0/60067/0/0
[pcys33.grm.polymtl.ca:02686] jobdir: /tmp/[email protected]_0/60067/0
[pcys33.grm.polymtl.ca:02686] top: [email protected]_0
[pcys33.grm.polymtl.ca:02686] tmp: /tmp
[srvgrm04:77812] procdir: /tmp/openmpi-sessions-gautam@srvgrm04_0/60067/0/1
[srvgrm04:77812] jobdir: /tmp/openmpi-sessions-gautam@srvgrm04_0/60067/0
[srvgrm04:77812] top: openmpi-sessions-gautam@srvgrm04_0
[srvgrm04:77812] tmp: /tmp
Can you infer anything from the logs?
to get some idea what's happening. – Beacon
Does hello exist on all nodes, and is it located at the same filesystem path? Apparently the ORTE daemon is launching successfully on the second node, although the absence of pcys13.grm.polymtl.ca in the log could indicate that there is a problem connecting to it (or is it an alias for srvgrm04?). BTW, you don't have to specify the usernames in the hostfile if they are the same as the one on the master host. – Waisted
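The check suggested in the comment above (whether hello exists at the same path on every node) could be sketched roughly as follows. This is only an illustration, not something from the original post: the function names hostfile_hosts and check_binary_on_hosts are hypothetical, and it assumes each first field of the hostfile can be passed directly to ssh and that the binary is given as an absolute path.

```shell
#!/bin/sh
# Print the first field (host, or user@host) of every non-comment,
# non-blank line of an Open MPI hostfile, one per line.
hostfile_hosts() {
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }'
}

# Hypothetical helper: for each host in the hostfile, report whether the
# binary exists and is executable at the given absolute path.
# BatchMode=yes makes ssh fail instead of waiting for a password prompt,
# so a broken key setup shows up as an error rather than a hang.
check_binary_on_hosts() {
    hostfile=$1
    binary=$2
    for host in $(hostfile_hosts "$hostfile"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" test -x "$binary" </dev/null; then
            echo "$host: ok"
        else
            echo "$host: $binary not found, or ssh failed"
        fi
    done
}
```

After sourcing these functions, it could be called as check_binary_on_hosts hostfile "$PWD/hello" from the directory that contains the binary. A host that reports a failure here is a likely candidate for where the launch stalls.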