Using strace fixes hung memory issue

I have a multithreaded process running on RHEL 6.x (64-bit).

Most of the time when I try to bring up the process, it hangs and some of its threads crash. Some threads wait for memory shared between the threads to be created, and I can see that not all of it gets created.

But when I run it under strace, the process does not hang and works just fine (all of the memory that is supposed to be created gets created). Even if I interrupt strace after the memory has been created, the process keeps running fine from then on.

I have read this:

strace fixes hung process

That did give me an idea, but I am still unclear on this, because it does not mention which version of RHEL was used.

Another point: after switching to a (compatible) Fedora kernel, the issue did not occur.

So I would like to know how exactly strace affects a process. Or is it just that the stack moves back to the kernel, as pointed out in the linked question?

Senegambia, 29/1/2014 at 14:23

Comments:
I am experiencing the same issue. A process that is generally hung in a select() call resumes if I attach strace to it. If I don't attach strace, it just hangs there forever. (Backchat)
As far as I recall, strace makes the kernel stop the traced process on every system call entry and exit. It is quite likely that strace introduces a delay and an additional synchronization point that mask the underlying cause of the hang, be it a deadlock or a livelock. It would thus be best to investigate which parts of the design could cause it to lock itself up, instead of delving into the inner workings of strace. (Fellows)
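A minimal sketch of the kind of design-level fix the comment above points at, assuming (the question does not show its code) that the waiting threads check whether the shared memory exists and then block on a condition variable. If the check and the wait are not done under the same lock the creator holds when it notifies, the notification can arrive between the check and the wait, and the waiter then sleeps forever; extra per-syscall delay such as strace's can shift that window. The Python below is only a stand-in for the real code: `SharedRegions`, `mark_created`, `wait_all`, `demo` and the region names are hypothetical. The waiter re-checks its predicate under the lock via `threading.Condition.wait_for`, so the result does not depend on which thread runs first.

```python
import threading


class SharedRegions:
    """Hypothetical stand-in for the question's setup: one thread creates
    shared regions, other threads wait until all of them exist."""

    def __init__(self, expected):
        self._expected = set(expected)
        self._created = set()
        self._cond = threading.Condition()

    def mark_created(self, name):
        # Update shared state and notify while holding the lock, so a waiter
        # cannot miss the wakeup between its check and its wait.
        with self._cond:
            self._created.add(name)
            self._cond.notify_all()

    def wait_all(self, timeout=None):
        # wait_for evaluates the predicate under the lock before sleeping, so
        # it returns at once if everything was created before we got here.
        # Returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._created >= self._expected, timeout=timeout)


def demo():
    regions = SharedRegions({"queue", "stats", "config"})
    results = []

    def waiter():
        results.append(regions.wait_all(timeout=5.0))

    waiters = [threading.Thread(target=waiter) for _ in range(3)]
    for t in waiters:
        t.start()
    # The creator may run before, during, or after the waiters start waiting;
    # every ordering ends with all waiters released.
    for name in ("queue", "stats", "config"):
        regions.mark_created(name)
    for t in waiters:
        t.join()
    return results
```

With a timeout, a region that never gets created shows up as a `False` return instead of a silent hang, which is easier to diagnose than a process that never comes up.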

The most likely cause is that strace, which intercepts system calls using the ptrace facility, affects the timing of your application because of the overhead it adds to every system call.

Consider a scenario with two threads started in parallel where thread A initializes a global variable and thread B accesses that variable without any kind of thread synchronization. Thread B triggers a segmentation fault whenever thread A fails to initialize the variable before thread B accesses it.

If, however, thread B interacts with the OS (by means of system calls) before accessing the variable, strace could introduce a delay that might be just enough to give thread A time to initialize the variable. Conversely, if thread A interacts with the OS before initializing the variable, strace could make the segmentation fault more likely because it delays the initialization.

Similarly, different kernel implementations can affect the timing perceived by user space processes and might cause an application with race conditions to fail even though the kernel is not the root cause of the failure, only the trigger.
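A toy Python model of the thread A / thread B scenario, offered as a sketch rather than the asker's code (which is not shown). As an assumption, `time.sleep` stands in for the extra time strace adds to the system calls made before the write or the read, and reading `None` plays the role of the segmentation fault; `run` and its parameters are hypothetical. With both delays at zero the outcome depends on scheduling; delaying B hides the bug, delaying A exposes it, and having B wait on a `threading.Event` set by A makes the result independent of timing.

```python
import threading
import time

shared = None  # the unsynchronized "global variable" from the scenario


def run(delay_a=0.0, delay_b=0.0, synchronized=False):
    """Return what thread B observed. delay_a/delay_b model extra time
    (e.g. slowed-down syscalls) before A's write and B's read."""
    global shared
    shared = None
    ready = threading.Event()
    seen = []

    def thread_a():
        global shared
        time.sleep(delay_a)            # work A does before initializing
        shared = {"initialized": True}
        ready.set()                    # publish: initialization is done

    def thread_b():
        time.sleep(delay_b)            # work B does before reading
        if synchronized:
            ready.wait()               # proper fix: block until A is done
        seen.append(shared)            # None here = the "crash" case

    ta = threading.Thread(target=thread_a)
    tb = threading.Thread(target=thread_b)
    ta.start()
    tb.start()
    ta.join()
    tb.join()
    return seen[0]
```

For example, `run(delay_b=0.2)` normally returns the initialized dict, `run(delay_a=0.2)` returns `None`, and `run(delay_a=0.2, synchronized=True)` returns the dict regardless of the delays. Only the synchronized variant is correct by construction; the other two merely depend on timing.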

Pornocracy answered 22/6/2023 at 6:29
