sem_timedwait not supported properly on RedHat Enterprise Linux 5.3 onwards?

We're seeing odd behaviour on RedHat Enterprise Linux systems with pthreads sem_timedwait. It's only occurring with versions 5.3 onwards.

When we create the semaphore on a background thread with sem_init, no error is returned. When we do sem_timedwait, we get an immediate return with errno = 38 (ENOSYS) indicating it's not supported.

If we do the same thing on the main thread, it works as expected and we get no error from sem_timedwait.

We don't see it on RHEL 5.2 or before. We've tried compiling our code with gcc 3.2.3 and 4.1.2 and get the same result, so it seems to be a run-time issue.

So, my questions (finally ;)

1) Has anyone else seen this?
2) Is it a known issue with RHEL 5.3 onwards?
3) We're using sem_timedwait to sleep a single thread. What alternatives are there on Linux to do the same thing?

If this is a duplicate of another question, let me know. I've looked but can't find one with the same question, just similar ones for OSX which isn't what we're using.

thanks, pxb

Update: just done some more testing with the following results:

  • if I do a 64 bit build using gcc 4.1.2 on a RHEL5.4 box (with -L/usr/lib64 and -lstdc++ -lrt) and run it on a 64 bit install of RHEL5 it works fine
  • if I do a 32 bit build using gcc 4.1.2 on a RHEL5.1 box (with -L/usr/lib and -lstdc++ -lrt) and run it on exactly the same 64 bit RHEL5 box, we get ENOSYS errors from sem_timedwait

So, it appears to be a difference between the 64 and 32 bit runtime libs on RHEL5.4 (and seemingly RHEL5.3). The only other difference was that the 32 and 64 bit builds were done on RHEL5.1 and RHEL5.4 boxes respectively.

Sarcoma answered 2/12, 2009 at 11:38 Comment(1)
Managed to fix it, but I'm not sure why the fix works. The code that calls the semaphore functions was in a class inside a .so, and that class was in turn being called from the executable itself. If we move the code from the cpp file to the header (i.e. effectively inline the class into the executable) then the problem disappears. So it seems to be this: on RHEL5.4, if we call sem_timedwait from within the .so it fails, but if we make the same call from the executable it works. And I have no idea why...Sarcoma

Finally found out what the issue is. On RHEL 5.4 if we call sem_init then do sem_timedwait we get somewhat random behaviour of the timed wait, depending on where the code is located, whether the object that owns the sem_t is on the heap or stack, etc. Sometimes the timed wait returns immediately with errno = 38 (ENOSYS), sometimes it waits correctly before returning.

Running it via valgrind gives this error:

==32459== Thread 2:
==32459== Syscall param futex(op) contains uninitialised byte(s)
==32459==    at 0x406C78: sem_timedwait (in /lib/libpthread-2.5.so)
==32459==    by 0x8049F2E: TestThread::Run() (in /home/stsadm/semaphore_test/semaphore_test)
==32459==    by 0x44B2307: nxThread::_ThreadProc(void*) (in /home/stsadm/semaphore_test/libcore.so)
==32459==    by 0x4005AA: start_thread (in /lib/libpthread-2.5.so)
==32459==    by 0x355CFD: clone (in /lib/libc-2.5.so)

If I run exactly the same code on RHEL 5.2 the problem goes away and valgrind reports no errors.

If I do a memset on the sem_t variable before calling sem_init, the problem goes away on RHEL 5.4:

memset( &_semaphore, 0, sizeof( sem_t ) );

So, it looks like a bug has been introduced in RHEL5.4, either in semaphores or in something they use internally, and sem_init isn't correctly initialising the sem_t memory. Or sem_timedwait has changed to be sensitive to this in a way it wasn't before.

Interestingly, in no cases does sem_init return an error to indicate it didn't work though.

Alternatively, if the expected behaviour is that sem_init won't initialise the memory of sem_t and that's up to the caller, then the behaviour has certainly changed with RHEL 5.4.
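The memset workaround described above can be wrapped up as a pair of small helpers. This is only a sketch: the names init_semaphore_zeroed and wait_with_timeout_ms are hypothetical (they are not from our code base), and it assumes a Linux/glibc system where sem_timedwait is available. It hides the symptom rather than explaining why sem_t ends up partly uninitialised.

```cpp
#include <semaphore.h>
#include <string.h>
#include <errno.h>
#include <time.h>

// Hypothetical helper: zero the sem_t before sem_init so that no
// uninitialised bytes are left in the structure.
// Returns 0 on success, otherwise the errno value set by sem_init.
int init_semaphore_zeroed( sem_t* sem, unsigned int initial_count )
{
        memset( sem, 0, sizeof( sem_t ) );
        if ( sem_init( sem, 0, initial_count ) != 0 )
                return errno;
        return 0;
}

// Hypothetical helper: wait up to timeout_ms milliseconds.
// Returns 0 if the semaphore was acquired, otherwise the errno value
// (ETIMEDOUT on a normal timeout).
int wait_with_timeout_ms( sem_t* sem, long timeout_ms )
{
        timespec ts;
        if ( clock_gettime( CLOCK_REALTIME, &ts ) != 0 )
                return errno;

        // sem_timedwait takes an absolute deadline, so add the timeout to "now"
        // and keep tv_nsec below one second.
        ts.tv_sec += timeout_ms / 1000;
        ts.tv_nsec += ( timeout_ms % 1000 ) * 1000000L;
        if ( ts.tv_nsec >= 1000000000L )
        {
                ts.tv_sec += 1;
                ts.tv_nsec -= 1000000000L;
        }

        int res;
        do
        {
                res = sem_timedwait( sem, &ts );
        } while ( res != 0 && errno == EINTR );   // retry if a signal interrupted the wait

        return res == 0 ? 0 : errno;
}
```

With these, a thread that just wants to sleep for a while would call init_semaphore_zeroed( &sem, 0 ) once and then wait_with_timeout_ms( &sem, 1000 ), treating ETIMEDOUT as the normal result and anything else (such as ENOSYS) as a real failure.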

pxb

Update - here's the test case code in case anyone else wants to try it. Note the problem only occurs when sem_timedwait is called from a .so, and only on RHEL5.4 (maybe 5.3 too, I haven't tested it), and only when built as a 32 bit binary (linking against the 32 bit libs, of course).

1) in semtest.cpp

#include <semaphore.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <time.h>

void semtest( int semnum, bool initmem )
{
        sem_t sem;

        if ( initmem )
        {
                memset( &sem, 0, sizeof( sem_t ) );
                printf( "sem %d: memset size = %zu\n", semnum, sizeof( sem_t ) );
        }

        errno = 0;
        int res = sem_init( &sem, 0, 0 );

        printf( "sem %d: sem_init res = %d, errno = %d\n", semnum, res, errno );

        timespec ts;
        clock_gettime( CLOCK_REALTIME, &ts );
        ts.tv_sec += 1;

        errno = 0;
        res = sem_timedwait( &sem, &ts );

        printf( "sem %d: sem_timedwait res = %d, errno = %d\n\n", semnum, res, errno );
}

2) in main.cpp (note the duplicate test function so we can compare running from within the .so with running from within the exe)

#include <semaphore.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <time.h>

extern void semtest( int semnum, bool initmem );

void semtest_in_exe( int semnum, bool initmem )
{
        sem_t sem;

        if ( initmem )
        {
                memset( &sem, 0, sizeof( sem_t ) );
                printf( "sem %d: memset size = %zu\n", semnum, sizeof( sem_t ) );
        }

        errno = 0;
        int res = sem_init( &sem, 0, 0 );

        printf( "sem %d: sem_init res = %d, errno = %d\n", semnum, res, errno );

        timespec ts;
        clock_gettime( CLOCK_REALTIME, &ts );
        ts.tv_sec += 1;

        errno = 0;
        res = sem_timedwait( &sem, &ts );

        printf( "sem %d: sem_timedwait res = %d, errno = %d\n\n", semnum, res, errno );
}

int main(int argc, char* argv[], char** envp)
{
        semtest( 1, false );
        semtest( 2, true );
        semtest_in_exe( 3, false );
        semtest_in_exe( 4, true );
}

3) here's the Makefile

all: main

semtest.o: semtest.cpp
        gcc -c -fpic -m32 -I /usr/include/c++/4.1.2 -I /usr/include/c++/4.1.2/i386-redhat-linux semtest.cpp -o semtest.o

libsemtest.so: semtest.o
        gcc -shared -m32 -fpic -lstdc++ -lrt semtest.o -o libsemtest.so

main: libsemtest.so
        gcc -m32 -L . -lsemtest main.cpp -o semtest

The test cases are:

  1. run from within .so without doing memset
  2. run from within .so and do memset
  3. run from within exe without doing memset
  4. run from within exe and do memset

And here's the result running on RHEL5.4

sem 1: sem_init res = 0, errno = 0
sem 1: sem_timedwait res = -1, errno = 38

sem 2: memset size = 16
sem 2: sem_init res = 0, errno = 0
sem 2: sem_timedwait res = -1, errno = 110

sem 3: sem_init res = 0, errno = 0
sem 3: sem_timedwait res = -1, errno = 110

sem 4: memset size = 16
sem 4: sem_init res = 0, errno = 0
sem 4: sem_timedwait res = -1, errno = 110

You can see that case 1 returns immediately with errno = 38.
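For anyone reading the raw numbers: on Linux, errno 38 is ENOSYS ("Function not implemented") and errno 110 is ETIMEDOUT, which is the expected outcome here because nothing ever posts the semaphore. A small sketch of how the test output could be made easier to read follows; describe_sem_errno is a hypothetical helper, not part of the test case above.

```cpp
#include <errno.h>
#include <string.h>
#include <string>

// Hypothetical helper: turn the errno left by sem_timedwait into a readable label.
// Uses the symbolic constants, so it does not depend on the numeric values.
std::string describe_sem_errno( int err )
{
        switch ( err )
        {
        case 0:         return "OK";
        case ETIMEDOUT: return "ETIMEDOUT: the timeout expired before the semaphore was posted";
        case ENOSYS:    return "ENOSYS: function not implemented";
        case EINTR:     return "EINTR: the wait was interrupted by a signal";
        case EINVAL:    return "EINVAL: invalid semaphore or timespec";
        default:        return std::string( "other: " ) + strerror( err );
        }
}
```

In the test functions this could be printed alongside the number, for example by passing describe_sem_errno( errno ).c_str() to an extra %s in the sem_timedwait printf.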

If we run the exact same code on RHEL5.2 we get the following:

sem 1: sem_init res = 0, errno = 0
sem 1: sem_timedwait res = -1, errno = 110

sem 2: memset size = 16
sem 2: sem_init res = 0, errno = 0
sem 2: sem_timedwait res = -1, errno = 110

sem 3: sem_init res = 0, errno = 0
sem 3: sem_timedwait res = -1, errno = 110

sem 4: memset size = 16
sem 4: sem_init res = 0, errno = 0
sem 4: sem_timedwait res = -1, errno = 110

You can see that all cases now work as expected!

Sarcoma answered 7/12, 2009 at 16:49 Comment(1)
Thanks, helped me too. But I was on Ubuntu 9.10. Code was working before, just a little change broke it (started failing with errno 38). And memset solved the problem.Accordingly

It seems that semtest is calling sem_init@GLIBC_2.1, and libsemtest.so is calling sem_init@GLIBC_2.0.

sem_timedwait() seems to require version 2.1.

I got correct results for all four tests by adding -lpthread to the Makefile rule that creates libsemtest.so (the gcc -shared line).

I have tested this on RH 5.3.

Fineman answered 25/4, 2011 at 7:9 Comment(3)
This is actually the correct approach (the accepted answer isn't).Drinking
Why and how does this sort of discrepancy occur? Is linking each library against lpthread the only way to ensure they are consistent?Burks
sem_init/sem_timedwait indeed require libpthread (check the man page). I think POSIX threads got added/implemented later than semaphores, so glibc used to provide the semaphore functions at some point and still keeps doing that for the sake of backward compatibility. Newer code should link to libpthread directly. That said, seems like the bug discussed here has been fixed in glibc: lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151012/…Theroid
