How to avoid race conditions in a condition variable in VxWorks

We're programming on a proprietary embedded platform sitting atop VxWorks 5.5. In our toolbox, we have a condition variable that is implemented using a VxWorks binary semaphore.

Now, POSIX provides a wait function that also takes a mutex. It unlocks the mutex (so that some other task can write to the data) and waits for the other task to signal (that it is done writing the data). I believe this implements what's called a monitor, ICBWT.
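
For reference, the POSIX function is pthread_cond_wait(), which atomically releases the mutex and blocks, then re-acquires the mutex before returning. A minimal sketch of the canonical usage (the data_ready predicate and the consumer() wrapper are made-up examples; this is exactly the API our build lacks):

#include <pthread.h>

pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
bool data_ready = false;              // the predicate guarded by m

void consumer() {
    pthread_mutex_lock(&m);
    while (!data_ready)               // re-check the predicate on every wakeup
        pthread_cond_wait(&cv, &m);   // unlock, block, relock: one atomic step
    // ... read the data ...
    pthread_mutex_unlock(&m);
}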

We need such a wait function, but implementing it is tricky. A simple approach would do this:

bool condition::wait_for(mutex& mutex) const {
    unlocker ul(mutex);    // relinquish mutex
    return wait(event);
}                          // ul's dtor grabs mutex again

However, this sports a race condition: it allows another task to preempt this one after the unlocking and before the waiting. The other task can write to the data after the mutex was unlocked and signal the condition before this task starts to wait on the semaphore. (We have tested this; it does indeed happen and blocks the waiting task forever.)
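
Spelled out as an interleaving (the two-task split is illustrative):

// Waiting task                         Writing task
// ----------------------------------  ----------------------------------
// unlocker ul(mutex);  // mutex freed
//                                      grabs the mutex, writes the data
//                                      signals the condition  // no waiter yet!
//                                      releases the mutex
// wait(event);  // the signal is already gone: blocks forever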

Given that VxWorks 5.5 doesn't seem to provide an API to temporarily relinquish a semaphore while waiting for a signal, is there a way to implement this on top of the provided synchronization routines?

Note: This is a very old VxWorks version that has been compiled without POSIX support (by the vendor of the proprietary hardware, from what I understood).

Eulalie answered 30/9, 2013 at 13:58 Comment(16)
Hmm... never tried to implement a 'condvar' without native support. I've always been able to live with semaphores and mutexes alone.Lanfri
what does unlocker do? Because when I see this pattern, I fully expect it to use RAII style. It might unlock in the destructor. It might unlock during the lifetime. It's not obvious to me.Oberammergau
@Oberammergau - I'm a little confused too. Why not just protect whatever it is with just the binary semaphore? OK, semantics different, but WTH...Lanfri
@MartinJames likely because there's a need to signal all? Events have different semantics anyways.Oberammergau
@sehe, @Martin: unlocker unlocks a resource in its ctor, and locks it again in its dtor. (Sorry, I thought this was obvious.)Eulalie
OK, this is nasty. Trying hard not to suggest a class with a mutex-protected state-machine...Lanfri
Do you have a counting semaphore available as well?Lanfri
VxWorks does support POSIX. It's been a while, but as far as I recall you just need to configure the kernel to include the POSIX subsystem. Couldn't you then just use the POSIX routine that you're hinting does what you want? Or have I misunderstood the question...Fornof
I love how your simple approach doesn't use the timeout variable.Livvyy
@bazza: "We're programming on a proprietary embedded platform sitting atop VxWorks 5.5." That's from the mid-90s. There might be ways to do what you say, but since I started here, I have been told that we can't have POSIX. So while I'll investigate POSIX, I'd still like an answer for how to do this by hand.Eulalie
@Dead: That's a remnant of a less sketchy sketch of the simple approach that I forgot to erase. :-/Eulalie
@sbi, I'm pretty sure POSIX was a standard part of VxWorks 5.5 back then. If the proprietaryness of the platform is such that you cannot put POSIX into the kernel you could exploit the way VxWorks loads to add it in as part of your program (rather than something built in to the OS). You should be able to do that if you're using the standard tool chain.Fornof
I've just checked - POSIX (with condition variables) was part of 5.5, in pthreadLib. You should be able to load pthreadLib onto the system in the same way as you load your own application (it's just more load time linking). So long as you load pthreadLib first, the loader will be able to resolve your pthread function calls when you load your app. There may be some tasks to start first too, but I'm not sure about that. I also note from the docs that POSIX condition variables in VxWorks are implemented using mutexes and binary semaphores, so there is a way that doesn't require POSIX too.Fornof
Alternatively you could use one of Tony Hoare's other ideas - CSP, which is easily implemented on VxWorks message queues or pipes.Fornof
@bazza: I have just been told definitively that the VxWorks installed on this device is 5.5.1, compiled without POSIX support. So no luck there.Eulalie
I would implement something along the lines of a spin lock doing busy waiting until it enters wait(). I think I saw this pattern before, but don't remember where or what it is called...Bundy

Race conditions can be avoided if each waiting task waits on a separate binary semaphore. These semaphores must be registered in a container which the signaling task uses to unblock all waiting tasks. The container must be protected by a mutex.

The wait_for() method obtains a binary semaphore, waits on it and finally deletes it.

void condition::wait_for(mutex& mutex) {
    SEM_ID sem = semBCreate(SEM_Q_PRIORITY, SEM_EMPTY);
    {
        lock l(listeners_mutex);    // ensure exclusive access to listeners container
        listeners.push_back(sem);
    }                               // l's dtor unlocks listeners_mutex again

    unlocker ul(mutex);             // relinquish mutex
    semTake(sem, WAIT_FOREVER);

    {
        lock l(listeners_mutex);
        // remove sem from listeners
        listeners.erase(std::find(listeners.begin(), listeners.end(), sem));
        semDelete(sem);
    }
}                                   // ul's dtor grabs mutex again

The signal() method iterates over all registered semaphores and unlocks them.

void condition::signal() {
    lock l(listeners_mutex);
    std::for_each(listeners.begin(), listeners.end(), semGive);
}

This approach assures that wait_for() will never miss a signal. A disadvantage is the need for additional system resources. To avoid creating and destroying semaphores for every wait_for() call, a pool could be used.
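
A minimal sketch of such a pool, reusing the mutex/lock wrappers from above (the sem_pool class itself is hypothetical, not part of this answer):

#include <semLib.h>
#include <vector>

class sem_pool {
public:
    // Hand out a recycled semaphore, or create a fresh one if the pool is empty.
    SEM_ID acquire() {
        lock l(pool_mutex);
        if (free_list.empty())
            return semBCreate(SEM_Q_PRIORITY, SEM_EMPTY);
        SEM_ID sem = free_list.back();
        free_list.pop_back();
        return sem;
    }

    // Return a semaphore to the pool instead of semDelete()ing it.
    void release(SEM_ID sem) {
        semTake(sem, NO_WAIT);          // drain a stale semGive, if any
        lock l(pool_mutex);
        free_list.push_back(sem);
    }

private:
    mutex pool_mutex;                   // the toolbox wrapper used above
    std::vector<SEM_ID> free_list;
};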

Andel answered 9/10, 2013 at 14:12 Comment(2)
The issue here is that the dtor is invoked multiple times, once for each wait_for(); thus all but the first one to leave the wait_for will block on the mutex... Otherwise it is pretty close to my solution: if you consider a list of binary sems to be the rough equivalent of a counting sem, you also have the requirement that the number of listeners is known before the unlocker unlocks, except in this case it is not hard coded.Lema
@Chris Concerning the block in the dtor I agree. This was copied from the original code sample. Obviously, tasks not supposed to grab the mutex must pass a dummy mutex or call a wait_for without mutex param and unlocker. From what I understood, the number of listeners should be flexible implying that listeners not sharing a mutex with the signaler will miss old signals. Counting semaphores would allow scenarios like: task A increases n and unlocks the mutex, signaler task B preempts A calling n times semGive, task C increases n and calls semTake, finally task A resumes and blocks on semTake.Andel

This should be quite easy with native VxWorks: a message queue is what is required here. Your wait_for method can be used as is.

bool condition::wait_for(mutex& mutex) const 
{
    unlocker ul(mutex);    // relinquish mutex
    return wait(event);
}                          // ul's dtor grabs mutex again

but the wait(event) code would look like this:

bool wait(event)
{
    if (msgQRecv(event->q, sigMsgBuf, sigMsgSize, timeoutTime) == OK)
    {
        return true;    // got it...
    }
    else
    {
        return false;   // timeout, report error or something like that....
    }
}

and your signal code would look something like this:

signal(event)
{
    msgQSend(event->q, sigMsg, sigMsgSize, NO_WAIT, MSG_PRI_NORMAL);
}

So if the signal gets triggered before you start waiting, then msgQRecv will return immediately with the signal when it eventually gets invoked, and you can then take the mutex again in ul's dtor as stated above.

The event->q is a MSG_Q_ID that is created at event creation time with a call to msgQCreate, and the data in sigMsg is defined by you... but can be just a random byte of data, or you can come up with a more intelligent structure with information regarding who signaled or something else that may be nice to know.
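
For completeness, event creation might look like this. The event_t struct is an assumption (the answer only names the q field); msgQCreate() itself is the real VxWorks call, taking maxMsgs, maxMsgLen, and options:

#include <msgQLib.h>

int sigMsgSize = sizeof(char);   // the payload can be a single byte, as noted above

struct event_t {
    MSG_Q_ID q;
};

event_t* createEvent() {
    event_t* event = new event_t;
    // One slot suffices here: a single pending message latches one signal.
    event->q = msgQCreate(1 /* maxMsgs */, sigMsgSize, MSG_Q_FIFO);
    return event;
}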

Update for multiple waiters: this is a little tricky. There are a couple of assumptions I will make to simplify things:

  1. The number of tasks that will be pending is known at event creation time and is constant.
  2. There will be one task that is always responsible for indicating when it is ok to unlock the mutex, all other tasks just want notification when the event is signaled/complete.

This approach uses a counting semaphore, similar to the above with just a little extra logic:

bool wait(event)
{
    if (semTake(event->csm, timeoutTime) == OK)
    {
        return true;    // got it...
    }
    else
    {
        return false;   // timeout, report error or something like that....
    }
}

and your signal code would look something like this:

signal(event)
{
    for (int x = 0; x < event->numberOfWaiters; x++)
    {
        semGive(event->csm);
    }
}

The creation of the event is something like this; remember, in this example the number of waiters is constant and known at event creation time. You could make it dynamic, but the key is that every time the event is going to happen, the numberOfWaiters must be correct before the unlocker unlocks the mutex.

createEvent(numberOfWaiters)
{
    event->numberOfWaiters = numberOfWaiters;
    event->csm = semCCreate(SEM_Q_FIFO, 0);
    return event;
}

You cannot be wishy-washy about the numberOfWaiters :D I will say it again: The numberOfWaiters must be correct before the unlocker unlocks the mutex. To make it dynamic (if that is a requirement) you could add a setNumWaiters(numOfWaiters) function, and call that in the wait_for function before the unlocker unlocks the mutex, so long as it always sets the number correctly.
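
A sketch of that hypothetical setter (not in the answer's code; the field name follows the createEvent example above). The body is trivial; the calling rule is what matters:

// Hypothetical helper hinted at in the text. It must be called while the
// mutex is still held, i.e. before the unlocker's dtor releases it, so the
// count is correct at the moment the waiters can start arriving.
void setNumWaiters(event_t* event, int numOfWaiters)
{
    event->numberOfWaiters = numOfWaiters;
}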

Now for the last trick, as stated above the assumption is that one task is responsible for unlocking the mutex, the rest just wait for the signal, which means that one and only one task will call the wait_for() function above, and the rest of the tasks just call the wait(event) function.

With this in mind the numberOfWaiters is computed as follows:

  • The number of tasks that will call wait()
  • plus 1 for the task that calls wait_for()

Of course you can also make this more complex if you really need to, but chances are this will work: normally one task triggers an event but many tasks want to know it is complete, and that is what this provides.

But your basic flow is as follows:

init()
{
    event = createEvent(3);    // 1 wait_for() caller + 2 wait() callers
}

eventHandler()
{
    locker l(mutex);
    doEventProcessing();
    signal(event);
}

taskA()
{
    doOperationThatTriggersAnEvent();
    wait_for(mutex);
    eventComplete();
}

taskB()
{
    doWhateverIWant();
    // now I need to know if the event has occurred...
    wait(event);
    coolNowIKnowThatIsDone();
}

taskC()
{
    taskCIsFun();
    wait(event);
    printf("event done!\n");
}

When I write the above I feel like all OO concepts are dead, but hopefully you get the idea. In reality, wait and wait_for should take the same parameter, or no parameter at all and instead be members of the same class that also has all the data they need to know... but nonetheless, that is the overview of how it works.

Lema answered 5/10, 2013 at 12:34 Comment(5)
The code as is will not work with multiple waiters, but it would be trivial to update it to do so; probably the easiest thing is to switch out the msgQ for a counting sem. I will update the answer.Lema
I'd wonder if you found a way to make counting semaphores do this. We've tried and came up dry. However, we think we found a solution by combining your message queue idea with a semaphore. I just checked that message queues are available on this platform (they're another feature that has to be compiled in explicitly), and I will now go and try to implement that idea.Eulalie
Indeed I also considered a msgQ to a counting sem, but I don't like the chain of dependencies. This eliminates the middle man msgQ, and just gives direct notification to all waiting tasks.Lema
Neither assumption holds. (Imagine a classic producer-consumer scenario with a dynamic number of consumers. We have that.) Sadly, we found out that our idea of combining a semaphore with a message queue wouldn't work either.Eulalie
Indeed, I gave some clues on how to make it more dynamic, it can be done with this basic algorithm IFF the number of consumers is known before the unlocker unlocks the mutex.Lema

From the description, it looks like you may want to implement (or use) a semaphore - it's a standard CS algorithm with semantics similar to condvars, and there are tons of textbooks on how to implement them (https://www.google.com/search?q=semaphore+algorithm).

A random Google result which explains semaphores is at: http://www.cs.cornell.edu/courses/cs414/2007sp/lectures/08-bakery.ppt (see slide 32).

Orelu answered 5/10, 2013 at 11:5 Comment(3)
I suggest you go the extra mile and actually read the question. Hint: This is not about how to implement a condition variable, it's about how to implement a certain type of wait function.Eulalie
Hey, if every answer proposed was wrong, you may need to rethink the question. As others noticed, you are trying to implement a condvar, and if you need this specific behaviour without the platform supporting it, you may need to implement it yourself, and that means not using the mutex type your platform supports. Cheers.Orelu
The question clearly says "However, this sports a race condition because it allows another task to preempt this one after the unlocking and before the waiting." If you can't be bothered to even read the question, you can't get the cookie either. HAND.Eulalie
