Using strace fixes hung memory issue

I have a multithreaded process running on RHEL6.x (64bit).

Most of the time when I try to bring up the process, it hangs and some of its threads crash. Some threads wait for the shared memory between the threads to be created (I can see that not all of it gets created).

But when I run the process under strace, it does not hang and works just fine (all of the memory that is supposed to be created gets created). Even if I interrupt strace after the memory has been created, the process keeps running fine.

I have read this:

strace fixes hung process

which did give me an idea, but I am still unclear on this because the RHEL version used there is not mentioned.

Another point: after switching to a (compatible) Fedora kernel, the issue did not occur.

So I would like to know how exactly strace affects a process. (Or is it just the stack that moves back to the kernel, as pointed out in the link?)

The most likely cause is that strace, which intercepts system calls through the ptrace facility, changes the timing of your application: the traced process is stopped at every system call entry and exit, which adds overhead to each call.

Consider a scenario with two threads started in parallel where thread A initializes a global variable and thread B accesses that variable without any kind of thread synchronization. Thread B triggers a segmentation fault whenever thread A fails to initialize the variable before thread B accesses it.

If, however, thread B interacts with the OS (by means of system calls) before accessing the variable, strace could introduce a delay that might be just enough to give thread A time to initialize the variable. Conversely, if thread A interacts with the OS before initializing the variable, strace could make the segmentation fault more likely because it delays the initialization.
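To make the scenario concrete, here is a minimal sketch of how threads A and B could be synchronized so that the outcome no longer depends on timing. It is not your code: the names shared_value, storage, thread_a, thread_b and run_synchronized_demo are hypothetical, and a plain int stands in for the shared memory your threads wait for. Thread B blocks on a condition variable until thread A has finished the initialization, so it works even though B is deliberately started first.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state standing in for the shared memory in the question. */
static int storage;
static int *shared_value = NULL;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready = PTHREAD_COND_INITIALIZER;

/* Thread A: initializes the shared pointer, then wakes up any waiters. */
static void *thread_a(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    storage = 42;
    shared_value = &storage;
    pthread_cond_broadcast(&ready);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Thread B: waits until the pointer is initialized before dereferencing it.
 * Without the wait loop, this dereference is the one that could crash. */
static void *thread_b(void *arg)
{
    int *result = arg;
    pthread_mutex_lock(&lock);
    while (shared_value == NULL)
        pthread_cond_wait(&ready, &lock);
    *result = *shared_value;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts B before A on purpose and returns the value that B read. */
int run_synchronized_demo(void)
{
    pthread_t a, b;
    int result = 0;
    int rc;

    /* Reset the shared state so the demo can be run more than once. */
    pthread_mutex_lock(&lock);
    shared_value = NULL;
    pthread_mutex_unlock(&lock);

    rc = pthread_create(&b, NULL, thread_b, &result);
    assert(rc == 0);
    rc = pthread_create(&a, NULL, thread_a, NULL);
    assert(rc == 0);

    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return result;
}
```

Calling run_synchronized_demo() from a main function (built with something like gcc -pthread) returns 42 regardless of which thread the scheduler runs first. If you remove the while loop in thread_b, you are back to the unsynchronized case, where whether the process crashes depends on timing and therefore on whether strace is attached.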

Similarly, different kernel implementations can affect the timing perceived by user space processes and might cause an application with race conditions to fail even though the kernel is not the root cause of the failure, only the trigger.
