How does a function named connect() prevent an MPI C program from running?

I was writing a project using MPI for a parallel programming course, and decided to name one of my functions connect(). But whenever I tried to mpirun the program (using recent versions of Open MPI on Linux and OS X), I would receive output from the connect() function, even if I had not called connect() from main(); also, some of the output from main() would not appear.

This is a simplified program with the issue:

#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

void connect(); //function name breaks mpi

int main(void) {

    int comm_sz, my_rank;
    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    printf("my_rank is %d\n", my_rank);
    fflush(stdout);
    MPI_Finalize();
    return EXIT_SUCCESS;
}

void connect() {

    printf("\nNot main! \n");
    return;
}

and the output:

[me@host ~]$ mpicc bad.c -Wall
[me@host ~]$ mpirun -n 1 a.out 

Not main! 
--------------------------------------------------------------------------
orterun noticed that process rank 0 with PID 17245 on node host exited on signal 13 (Broken pipe).
--------------------------------------------------------------------------

I was about to ask on Stack Overflow what was wrong in the first place, until I discovered that renaming the function fixes it. So what I'm curious about now is why naming the function connect() prevents the program from running correctly. Could it also be an issue with mpirun/Open RTE?

Possible leads:

  • There's a connect() function in <sys/socket.h>, but I haven't yet found it mentioned in the MPI header files.
  • There's also a Connect() function (with an uppercase C) in "ompi/mpi/cxx/intracomm.h" which is indirectly included by <mpi.h>, but I thought case mattered in C/C++, and it looks like a method of a C++ class.
  • If I try executing the program like a normal one, it works when run on OS X, but not on Linux:

mac:~ me$ ./a.out 
my_rank is 0

vs

[me@linux ~]$ ./a.out 

Not main! 
Naominaor asked 13/5, 2015 at 20:14

I would guess that one of the MPI functions you call is in turn calling the connect() system call. But since ELF uses a flat namespace for symbols, the library's reference to connect resolves to the definition exported by your executable, so your connect() is being called instead.

The problem doesn't happen on Mac OS because Mach-O libraries have a two-level namespace, so symbols in different libraries don't conflict with each other.

If you make your function static, that would probably avoid the problem as well.
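A minimal sketch of the static variant, assuming the function is only needed inside one source file. The MPI calls are left out so it builds with a plain C compiler; call_count and run_local_connect() are hypothetical names added for illustration.

```c
#include <stdio.h>

/* Hypothetical counter, only here so callers can observe that the
 * local function ran. */
static int call_count = 0;

/* static gives connect() internal linkage: the symbol is not exported
 * from the executable, so references to connect(2) inside shared
 * libraries cannot be resolved to this definition. */
static void connect(void)
{
    call_count++;
    printf("\nNot main! \n");
}

/* Hypothetical wrapper: code in the same file can still call the
 * function by name. Returns how many times it has run so far. */
int run_local_connect(void)
{
    connect();
    return call_count;
}
```

Note that this only compiles as long as the file does not also include <sys/socket.h>, whose non-static declaration of connect() would conflict with the static one. Renaming the function remains the simpler fix.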

Gynandry answered 13/5, 2015 at 21:08. Comments (3):
Hooray for namespaces! - Robinetta
I had assumed the main() function wasn't getting called, but after examining this suggestion it appears to always be called; I will add more printf() before and between the MPI functions to identify at least the first problematic one. Also, the static suggestion works. Is there a way to further investigate if the connect() system call is indeed called, or otherwise identify the namespace conflict? - Naominaor
As one of the authors of Open MPI, I'll confirm that the answer provided by +elbows is correct: Open MPI makes use of the connect(2) system call (probably as part of MPI_Init). When you have a public function named connect(), the linker will end up invoking your connect() function rather than connect(2). Bad Things happen from there. - Firstborn
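On the question of how to investigate further: one option, sketched here under the assumption of a glibc-based Linux system, is to print the call stack from inside your own connect(). backtrace() and backtrace_symbols_fd() come from <execinfo.h>, which is a glibc extension (macOS provides it too); dump_callers() and MAX_FRAMES are hypothetical names.

```c
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_FRAMES 32  /* hypothetical limit on captured frames */

/* Hypothetical helper: writes the current call stack to stderr and
 * returns the number of frames that were captured. */
static int dump_callers(const char *label)
{
    void *frames[MAX_FRAMES];
    int depth = backtrace(frames, MAX_FRAMES);

    fprintf(stderr, "%s reached; call stack:\n", label);
    fflush(stderr);
    /* Writes one line per frame straight to the file descriptor,
     * without allocating memory. */
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
    return depth;
}

/* Same name and external linkage as in the question, so on ELF
 * systems a library's call to connect(2) may land here. It still does
 * not behave like the real connect(2); it is for diagnosis only. */
void connect(void)
{
    dump_callers("user-defined connect()");
}
```

Dropped into the program from the question, each unexpected call should print the chain of frames that led to it. Frames that belong to shared libraries are normally shown with the library's path, which should indicate where the call came from. Linking with -rdynamic tends to make function names inside the executable itself readable.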
