Boost interprocess mutexes and checking for abandonment

I need interprocess synchronization around a piece of hardware. Because this code must work on Windows and Linux, I'm wrapping it with Boost Interprocess mutexes. Everything works well except my method for checking whether the mutex has been abandoned. Abandonment can happen, so I must prepare for it.

I've abandoned the mutex in my testing and, sure enough, when I use scoped_lock to lock the mutex, the process blocks indefinitely. I figured the way around this is to use the timeout mechanism on scoped_lock. A lot of time spent Googling for ways to handle abandonment turned up very little, and Boost doesn't do much here for portability reasons.

Without further ado, here's what I have:

#include <boost/interprocess/sync/named_recursive_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

typedef boost::interprocess::named_recursive_mutex MyMutex;
typedef boost::interprocess::scoped_lock<MyMutex> ScopedLock;

MyMutex* pGate = new MyMutex(boost::interprocess::open_or_create, "MutexName");

{
    // ScopedLock lock(*pGate); // this blocks indefinitely
    boost::posix_time::ptime timeout(boost::posix_time::microsec_clock::local_time() + boost::posix_time::seconds(10));
    ScopedLock lock(*pGate, timeout); // a 10 second timeout that returns immediately if the mutex is abandoned ?????
    if(!lock.owns()) {
        delete pGate;
        boost::interprocess::named_recursive_mutex::remove("MutexName");
        pGate = new MyMutex(boost::interprocess::open_or_create, "MutexName");
    }
}

That, at least, is the idea. Three interesting points:

  • When I don't use the timeout object, and the mutex is abandoned, the ScopedLock ctor blocks indefinitely. That's expected.
  • When I do use the timeout, and the mutex is abandoned, the ScopedLock ctor returns immediately and tells me that it doesn't own the mutex. Ok, perhaps that's normal, but why isn't it waiting for the 10 seconds I'm telling it to?
  • When the mutex isn't abandoned, and I use the timeout, the ScopedLock ctor still returns immediately, telling me that it couldn't lock, or take ownership, of the mutex and I go through the motions of removing the mutex and remaking it. This is not at all what I want.

So, what am I missing on using these objects? Perhaps it's staring me in the face, but I can't see it and so I'm asking for help.

I should also mention that, because of how this hardware works, if the process cannot gain ownership of the mutex within 10 seconds, the mutex can be treated as abandoned. In fact, I could probably wait as little as 50 or 60 milliseconds, but 10 seconds is a nice "round" and generous number.

I'm compiling on Windows 7 using Visual Studio 2010.

Thanks, Andy

Caw answered 2/4, 2013 at 19:33 Comment(6)
Unrelated to your question, but the reinterpret_cast<MyMutex*>s in your example are not needed (not sure why they are there).Respirator
@GaborMarton I think I am making correct use of the remove() function. Check the code again. I'm deleting the pointer, which I agree doesn't remove the mutex, but then I do call boost::interprocess::named_recursive_mutex::remove("MyMutex"). If I have incorrectly made use of it, please correct me. ThanksCaw
Yes, you are right, just removed my comment, and updated my answer. Hope that helps.Km
Check this out. It is an evolution of original robust mutex emulation.Erring
I may be wrong but in my experience when an interprocess lock gets abandoned your only hope is to shut everything down and exit. Hopefully you can recover any lost state on restart. In other words, don't bother trying to continue running. Your shared state is probably completely wrong.Respirable
Bit late, but for others going down this path you need to use universal_time() and not local_time(). Lock timeouts are absolute UTC timepoints.Helve
K
5

When I don't use the timeout object, and the mutex is abandoned, the ScopedLock ctor blocks indefinitely. That's expected

The best solution for your problem would be robust mutex support in Boost. However, Boost currently does not support robust mutexes. There is only a plan to emulate them, because only Linux supports them natively. The emulation is still just planned by Ion Gaztanaga, the library author. See this thread about a possible way of hacking robust mutexes into the Boost libraries: http://boost.2283326.n4.nabble.com/boost-interprocess-gt-1-45-robust-mutexes-td3416151.html

Meanwhile you might try to use atomic variables in a shared segment.

Also take a look at this stackoverflow entry: How do I take ownership of an abandoned boost::interprocess::interprocess_mutex?
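
To make the "atomic variables in a shared segment" idea a little more concrete, here is a minimal sketch, not a tested replacement for a real mutex. It assumes a POSIX system, and every name in it (OwnerWord, process_alive, try_acquire, release) is hypothetical. The lock word stores the owner's PID, and a waiter may take over if that PID no longer exists. In real use the OwnerWord would have to live in a shared memory segment mapped by all processes, and PID reuse can fool the liveness check.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <signal.h>     // kill
#include <sys/types.h>  // pid_t

// Hypothetical lock word. For cross-process use it must be placed in shared
// memory and std::atomic<int> must be lock-free on the target platform.
struct OwnerWord {
    std::atomic<int> owner{0};  // 0 means "free", otherwise the owner's PID
};

// Returns true if a process with this PID still exists (POSIX only).
// kill() with signal 0 performs the existence check without sending anything.
inline bool process_alive(int pid) {
    assert(pid > 0);
    return ::kill(static_cast<pid_t>(pid), 0) == 0 || errno != ESRCH;
}

// Try once to take the lock for `self`; take over from a dead owner if needed.
inline bool try_acquire(OwnerWord& w, int self) {
    assert(self > 0);
    int expected = 0;
    if (w.owner.compare_exchange_strong(expected, self)) {
        return true;  // the lock was free
    }
    // `expected` now holds the current owner's PID.
    if (expected != self && !process_alive(expected)) {
        // The owner died without releasing: attempt to replace its PID with ours.
        return w.owner.compare_exchange_strong(expected, self);
    }
    return false;  // held by a live process (or already by us: not recursive)
}

// Release the lock, but only if `self` is the recorded owner.
inline void release(OwnerWord& w, int self) {
    int expected = self;
    w.owner.compare_exchange_strong(expected, 0);
}
```

A caller would loop on try_acquire with a short sleep and its own overall deadline. As one of the comments on the question points out, taking over the lock says nothing about whether the state it protects is still valid.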

When I do use the timeout, and the mutex is abandoned, the ScopedLock ctor returns immediately and tells me that it doesn't own the mutex. Ok, perhaps that's normal, but why isn't it waiting for the 10 seconds I'm telling it to?

This is very strange; you should not get this behavior. However, the timed lock is possibly implemented in terms of the try lock. Check this documentation: http://www.boost.org/doc/libs/1_53_0/doc/html/boost/interprocess/scoped_lock.html#idp57421760-bb This means the implementation of the timed lock might throw an exception internally and then return false.

inline bool windows_mutex::timed_lock(const boost::posix_time::ptime &abs_time)
{
   sync_handles &handles =
      windows_intermodule_singleton<sync_handles>::get();
   //This can throw
   winapi_mutex_functions mut(handles.obtain_mutex(this->id_));
   return mut.timed_lock(abs_time);
}

Possibly, the handle cannot be obtained, because the mutex is abandoned.

When the mutex isn't abandoned, and I use the timeout, the ScopedLock ctor still returns immediately, telling me that it couldn't lock, or take ownership, of the mutex and I go through the motions of removing the mutex and remaking it. This is not at all what I want.

I am not sure about this one, but I think the named mutex is implemented using shared memory. If you are using Linux, check for the file /dev/shm/MutexName. On Linux, a file descriptor remains valid until it is closed, even if the file itself has been removed, e.g. by boost::interprocess::named_recursive_mutex::remove.

Km answered 2/4, 2013 at 20:7 Comment(1)
Sorry for the terrible delay in marking this as the answer. Basically, because of portability in Boost, we've decided to live with this "problem" for the time being.Caw
P
5

Check out the BOOST_INTERPROCESS_ENABLE_TIMEOUT_WHEN_LOCKING and BOOST_INTERPROCESS_TIMEOUT_WHEN_LOCKING_DURATION_MS compile flags. Define the first symbol in your code to force the interprocess mutexes to time out and the second symbol to define the timeout duration.
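
A minimal sketch of how the two symbols might be set. It assumes they have to be visible before the first Boost.Interprocess header is included and that every process sharing the mutex is built with the same values. The 10000 ms value is only an illustration taken from the question's 10 second budget.

```cpp
// Define both symbols before any Boost.Interprocess header is included
// (or pass them as -D / /D options on the compiler command line instead).
#define BOOST_INTERPROCESS_ENABLE_TIMEOUT_WHEN_LOCKING
#define BOOST_INTERPROCESS_TIMEOUT_WHEN_LOCKING_DURATION_MS 10000  // illustrative value

#include <boost/interprocess/sync/named_recursive_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
```

Setting them project-wide in the build system is probably less error-prone than per-file defines, because a translation unit that includes the headers without the symbols would keep the untimed behaviour.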

I helped to get them added to the library to solve the abandoned mutex issue. It was necessary to add it due to many interprocess constructs (like message_queue) that rely on the simple mutex rather than the timed mutex. There may be a more robust solution in the future, but this solution has worked just fine for my interprocess needs.

I'm sorry I can't help you with your code at the moment; something is not working correctly there.

Palisade answered 9/7, 2013 at 7:12 Comment(0)
S
1

BOOST_INTERPROCESS_ENABLE_TIMEOUT_WHEN_LOCKING is not so good: it throws an exception and does not help much. To work around the exceptional behaviour I wrote this macro. It works well enough for common purposes. In this sample named_mutex is used. The macro creates a scoped lock with a timeout, and if the lock cannot be acquired for EXCEPTIONAL reasons, it unlocks the mutex afterwards. This way the program can lock it again later and does not freeze or crash immediately.

#define TIMEOUT 1000
#define SAFELOCK(pMutex) \
    boost::posix_time::ptime wait_time \
        = boost::posix_time::microsec_clock::universal_time() \
        + boost::posix_time::milliseconds(TIMEOUT); \
    boost::interprocess::scoped_lock<boost::interprocess::named_mutex> lock(*pMutex, wait_time); \
    if(!lock.owns()) { \
        pMutex->unlock(); }

But even this is not optimal, because the code to be locked now runs unlocked once. This may cause problems. You can easily extend the macro however. E.g. run code only if lock.owns() is true.
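
The "run code only if lock.owns() is true" extension can be sketched as a small function template instead of a macro. This is only an illustration under stated assumptions: it uses the standard library's std::unique_lock and a relative timeout rather than Boost's scoped_lock with an absolute ptime, and run_if_locked is a hypothetical name. The same shape should carry over to boost::interprocess::scoped_lock.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical helper: run `critical` only if the lock is obtained within
// `timeout`. Returns true if the callable ran, false if the wait timed out.
template <class TimedMutex, class Rep, class Period, class Fn>
bool run_if_locked(TimedMutex& m,
                   std::chrono::duration<Rep, Period> timeout,
                   Fn&& critical) {
    assert(timeout.count() >= 0);
    // This constructor calls m.try_lock_for(timeout).
    std::unique_lock<TimedMutex> lock(m, timeout);
    if (!lock.owns_lock()) {
        return false;  // never touch the protected resource without the lock
    }
    critical();        // the lock is released when `lock` goes out of scope
    return true;
}
```

Unlike the macro, this never runs the protected code unlocked; what to do when it returns false (retry, report, or recreate the mutex) is left to the caller.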

Spondee answered 11/1, 2014 at 18:48 Comment(0)
F
1

boost::interprocess::named_mutex has 3 definitions. On Windows, you can use a macro to select the native Windows mutex instead of the Boost mutex; you can then try/catch the abandoned exception, and you should unlock the mutex!

On Linux, Boost uses pthread_mutex, but it does not set the robust attribute as of version 1_65_1.

So I implemented interprocess_mutex myself using the system APIs (Windows Mutex and Linux pthread_mutex in process-shared mode). Note that a Windows Mutex lives in the kernel rather than in a file.
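
For the Linux half, a process-shared pthread mutex with the robust attribute might look roughly like the sketch below. It is not the code from this answer: init_robust_shared_mutex, lock_robust and LockResult are hypothetical names, and in real use the pthread_mutex_t would be placed in a shared memory segment mapped by every process.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Initialise a mutex that can be shared between processes and that reports
// the death of its owner instead of blocking forever. Returns 0 on success.
inline int init_robust_shared_mutex(pthread_mutex_t* m) {
    assert(m != nullptr);
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0) rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

enum class LockResult { Locked, RecoveredFromDeadOwner, Failed };

// Lock the mutex and tell the caller whether the previous owner died.
inline LockResult lock_robust(pthread_mutex_t* m) {
    assert(m != nullptr);
    const int rc = pthread_mutex_lock(m);
    if (rc == 0) return LockResult::Locked;
    if (rc == EOWNERDEAD) {
        // We hold the lock now, but the protected state may be inconsistent.
        // Marking the mutex consistent keeps it usable after we unlock.
        if (pthread_mutex_consistent(m) == 0) {
            return LockResult::RecoveredFromDeadOwner;
        }
        pthread_mutex_unlock(m);
    }
    return LockResult::Failed;
}
```

On RecoveredFromDeadOwner the caller still owns the lock and should bring the hardware or shared state back to a known condition before carrying on.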

Frydman answered 5/12, 2018 at 8:28 Comment(1)
which macro specifically?Moorland
H
0

Craig Graham answered this in a reply already but I thought I'd elaborate because I found this, didn't read his message, and beat my head against it to figure it out.

On a POSIX system, timed lock calls:

timespec ts = ptime_to_timespec(abs_time);
pthread_mutex_timedlock(&m_mut, &ts)

Where abs_time is the ptime that the user passes into interprocess timed_lock.

The problem is that abs_time must be in UTC, not local time. Assume that you want to wait for 10 seconds: if your local time is behind UTC, a deadline built from local_time() is already in the past and timed_lock() returns immediately; if your local time is ahead of UTC, timed_lock() returns only after the UTC offset plus 10 seconds.

The following ptime times out an interprocess mutex in 10 seconds:

boost::posix_time::ptime abs_time = boost::posix_time::second_clock::universal_time() +
                                    boost::posix_time::seconds(10);

If I use ::local_time() instead of ::universal_time(), the deadline is shifted by my UTC offset and the call returns immediately. The documentation fails to mention this.

I haven't tried it, but digging into the code a bit, it looks like the same problem would occur on a non-POSIX system.

If BOOST_INTERPROCESS_POSIX_TIMEOUTS is not defined, the function ipcdetail::try_based_timed_lock(*this, abs_time) is called. It uses universal time as well, waiting on while(microsec_clock::universal_time() < abs_time).

This is only speculation, as I don't have quick access to a Windows system to test this on.

For full details, see https://www.boost.org/doc/libs/1_76_0/boost/interprocess/sync/detail/common_algorithms.hpp
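
The POSIX side of this can be tried without Boost. The sketch below is an illustration only, with hypothetical helper names (realtime_deadline_in, lock_until). It builds the absolute deadline from CLOCK_REALTIME, which counts from the UTC epoch and ignores the local time zone, and passes it to pthread_mutex_timedlock. A deadline that already lies in the past, which is what a local-time ptime amounts to when you are behind UTC, makes the call fail at once on a held mutex.

```cpp
#include <cassert>
#include <pthread.h>
#include <time.h>

// Build an absolute deadline `ms` milliseconds from now on CLOCK_REALTIME.
inline timespec realtime_deadline_in(long ms) {
    assert(ms >= 0);
    timespec ts{};
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec  += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {  // carry nanoseconds into seconds
        ts.tv_sec  += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

// Returns true if the mutex was locked before the absolute deadline passed.
inline bool lock_until(pthread_mutex_t& m, const timespec& deadline) {
    return pthread_mutex_timedlock(&m, &deadline) == 0;
}
```

Calling lock_until(m, realtime_deadline_in(10000)) waits up to 10 seconds for a held mutex, while subtracting an hour from tv_sec first reproduces the "returns immediately" symptom from the question.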

Hygrograph answered 25/5, 2021 at 23:14 Comment(0)
