Probe seems to consume the CPU

I've got an MPI program consisting of one master process that hands off commands to a bunch of slave processes. Upon receiving a command, a slave just calls system() to do it. While the slaves are waiting for a command, they are consuming 100% of their respective CPUs. It appears that Probe() is sitting in a tight loop, but that's only a guess. What do you think might be causing this, and what could I do to fix it?

Here's the code in the slave process that waits for a command. Watching the log and the top command at the same time suggests that when the slaves are consuming their CPUs, they are inside this function.

MpiMessage
Mpi::BlockingRecv() {
  LOG(8, "BlockingRecv");

  MpiMessage result;
  MPI::Status status;

  MPI::COMM_WORLD.Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, status);
  result.source = status.Get_source();
  result.tag = status.Get_tag();

  int num_elems = status.Get_count(MPI_CHAR);
  char buf[num_elems + 1];
  MPI::COMM_WORLD.Recv(
     buf, num_elems, MPI_CHAR, result.source, result.tag
  );
  buf[num_elems] = '\0';  // null-terminate before copying into result.data
  result.data = buf;
  LOG(7, "BlockingRecv about to return (%d, %d)", result.source, result.tag);
  return result;
}
Pfosi asked 28/1, 2013 at 11:9
Note that you should be aware of possible segmentation faults when calling fork() on systems with an OpenFabrics interconnect (InfiniBand or iWARP). – Hux

Yes; most MPI implementations busy-wait on blocking operations for the sake of performance. The assumption is that the MPI job is the only thing on the processor we care about, and if a task is blocked waiting for communication, the best thing to do is to poll continually for that communication to reduce latency, so that there is virtually no delay between when the message arrives and when it is handed off to the MPI task. This typically means the CPU is pegged at 100% even when nothing "real" is being done.

That's probably the best default behaviour for most MPI users, but it isn't always what you want. Most MPI implementations let you turn it off; with Open MPI, you can do so with an MCA parameter:

mpirun -np N --mca mpi_yield_when_idle 1 ./a.out
Pampuch answered 28/1, 2013 at 13:14
Thanks! Indeed, I was running more processes than processors. – Pfosi

It sounds like there are three ways to wait for an MPI message:

  1. Aggressive busy wait. This will get the message into your receiving code as fast as possible. Some processor is doing nothing but checking for the incoming message. If you put all of your processors in this state, the rest of your system is going to be very slow. MPI uses aggressive mode by default.
  2. Degraded busy wait. This will yield to other processes while doing its busy wait. If the number of processes you ask for is more than the number of processors you have, MPI switches to degraded mode. You can also force aggressive or degraded mode with an MCA parameter.
  3. Polling. Even the degraded busy wait is still a busy wait, and it will keep one processor pegged at 100% for each process that is waiting. If you have other tasks on your system that you don't want to compete with, you can call MPI_Iprobe() in a loop with a sleep call before calling a blocking receive (see the sketch after this list). I find a 100 ms sleep is responsive enough for my tasks, and it still keeps CPU usage minimal when a worker is idle.

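Here is a minimal sketch of option 3, assuming the C MPI API; the helper name WaitForMessage and the 100 ms interval are just illustrative and should be tuned to your workload:

#include <mpi.h>
#include <chrono>
#include <thread>

// Poll for an incoming message with MPI_Iprobe, sleeping between checks
// so an idle worker does not pin a core at 100%.
MPI_Status WaitForMessage() {
  MPI_Status status;
  int flag = 0;
  while (!flag) {
    // Non-blocking probe: sets flag when a matching message has arrived.
    MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
    if (!flag) {
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
  }
  return status;  // caller reads source/tag/count from this and then calls MPI_Recv
}

Once the probe has succeeded, the matching MPI_Recv returns immediately, so the only cost of this approach is up to one sleep interval of extra latency per message.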
I did some searching and found that a busy wait is what you want if you are not sharing your processors with other tasks.

Dov answered 28/1, 2015 at 0:43