How to stop 'uninterruptible' process on Linux?
F

6

58

I have a VirtualBox process hanging around which I tried to kill (KILL/ABORT) but without success. The parent pid is 1 (init).

top shows the process as D which is documented as "uninterruptible sleep".

strace shows nothing.

How can I get rid of this? It prevents me from unloading the VirtualBox kernel driver to load a newer one.

Filagree answered 20/4, 2009 at 9:29 Comment(1)
Which is not yet online - as far as I can see. Thanks for the tip anyway. - Entirety
M
50

Simple answer: you cannot.

Longer answer: uninterruptible sleep means the process will not be woken up by signals. It can only be woken up by whatever it is waiting for. When I get into such situations, e.g. with a CD-ROM, I usually reset the computer by suspending to disk and resuming.

Magically answered 20/4, 2009 at 9:37 Comment(5)
OK, I have a process in uninterruptible sleep. How can I find out what it is waiting for? Which process is really blocking the disk I/O? - Wholehearted
For example, this happens in a file manager (doublecmd) when it waits on an unresponsive sshfs mount. Killing sshfs completely is the only solution that releases the file manager process from the D state. - Whimsey
What is the technical reason why these processes can't be interrupted immediately? What if the kernel were patched to enable these processes to be terminated immediately by force? Is the situation that even the kernel can't possibly stop it, e.g. the CPU core has interrupts disabled? (Though even that could be solved if there's a way to trigger an NMI, e.g. via the APIC.) - Azalea
@Azalea It's not a hardware interrupt but a "software" interrupt, and it's related to how system calls work on Unix. Traditionally, all Unix I/O system calls are synchronous. When a process performs I/O by calling into the kernel, control is transferred from userspace to the kernel. During this period, the userspace part can do nothing until the underlying system call finishes. But if the underlying I/O becomes impossible (e.g. a mounted HDD dropped off), the kernel code itself is stuck, and recovering from this state is simply unsupported by Unix. Usually the culprit is an in-kernel device driver. - Gorget
@Azalea One solution is to allow the kernel to return early from an I/O system call, i.e. to make it interruptible. Unfortunately, that would break a ton of userspace applications that were never designed to recover from an I/O system call failing with EINTR. Since 2008, Linux has used a solution called TASK_KILLABLE, which allows SIGKILL to be a special exception to that rule: if we're killing the process anyway, application safety is a non-issue. But the relevant kernel code (especially drivers) must be converted to support this state. Otherwise, for legacy kernel code and non-Linux systems, the problem remains. - Gorget
C
30

Killing an uninterruptible process does succeed; it just doesn't happen immediately. The process won't disappear until it actually receives the signal. So sending a signal alone is not enough to get rid of the process: you also have to wake it up from uninterruptible sleep.

Tanel Poder has written a great guide to analysing D state processes. This state is very typically caused by I/O that never completes, e.g. because of a network failure. slm has posted some very useful pointers on Super User about how to unjam the network I/O, and also about the problem itself.
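
For a single stuck process, a rough first look can be had straight from /proc. This is only a sketch, not taken from the guide mentioned above: the function names `state_from_status` and `inspect_pid` are made up for illustration, and it assumes a kernel that exposes /proc/<pid>/wchan and /proc/<pid>/stack (the latter usually needs root).

```shell
# Print the one-letter state (R, S, D, ...) from a /proc/<pid>/status-style file.
state_from_status() {
    awk '$1 == "State:" { print $2; exit }' "$1"
}

# Show the state, kernel wait channel and kernel stack of one PID.
inspect_pid() {
    pid=$1
    echo "state: $(state_from_status "/proc/$pid/status")"
    echo "wchan: $(cat "/proc/$pid/wchan" 2>/dev/null)"
    # The kernel stack is typically readable only by root, and only if the
    # kernel was built with stack tracing support.
    cat "/proc/$pid/stack" 2>/dev/null || echo "stack: not readable"
}
```

Calling `inspect_pid` with the PID of the hung process should show which kernel function it is sleeping in, which often points at the driver or filesystem involved.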

Personally, when dealing with Windows on VirtualBox, and even with Wine, I often run into this problem because of CD-ROM I/O that never completes (I guess it's some sort of disc presence check). ATA devices can be reset, which will likely unjam the process. For instance, I use the following little script to reset both of my optical drives, unjamming the processes they are blocking:

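# Run as root. Some kernels expose the attribute as /sys/block/srX/device/delete instead.
echo 1 > /sys/block/sr0/delete
echo 1 > /sys/block/sr1/delete
# Rescan the SCSI host so the drives reappear; host7 is specific to this machine.
echo "- - -" > /sys/class/scsi_host/host7/scan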
Coma answered 22/11, 2015 at 14:14 Comment(1)
Had to use /sys/block/srX/device/delete instead of just /sys/block/srX/delete, but this worked a treat! - Argon
V
20

The D state basically means that the process is waiting for disk I/O, or other block I/O that can't be interrupted. Sometimes this means the kernel or device is feverishly trying to read a bad block (especially from an optical disk). Sometimes it means there's something else.

The process cannot be killed until it gets out of the D state. Find out what it is waiting for and fix that. The easy way is to reboot. Sometimes removing the disk in question helps, but that can be rather dangerous: it can cause unfixable, catastrophic hardware failure if you don't know what you're doing (read: smoke coming out).
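
One way to start finding out what it is waiting for is to list every process in the D state together with its kernel wait channel. A minimal sketch, assuming the procps version of `ps`; the function names `filter_d_state` and `list_d_state` are just illustrative:

```shell
# Keep the header line plus every row whose STAT column starts with D.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List PID, state, kernel wait channel and command name, D state only.
list_d_state() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}
```

Running `list_d_state` a few times in a row helps tell a process that is permanently stuck from one that merely passes through the D state during normal I/O.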

Vasta answered 20/4, 2009 at 9:35 Comment(3)
I have this problem because I used fusepy and accessed the mount point from inside a FUSE callback itself in single-threaded mode. It's now waiting for itself and I can kill neither the process itself nor anything trying to read from that mount point ... Do I really have to restart for this? - Sophistry
I mean, isn't this a security bug? I could brick any system with this. Simply make a FUSE mount point and put it into uninterruptible sleep as mentioned, and then start ls <mountpoint> in the background until you reach the process limit. Voila, no new processes can be started. I actually already experienced that process limit because I did something like this accidentally: while true; do sleep 1h & done - Sophistry
OK, I could close everything without a restart by using sudo umount -f <mount point>. Also, there is a FUSE control system, which might have worked as well. - Sophistry
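
For the FUSE case, the control filesystem mentioned above is normally mounted at /sys/fs/fuse/connections, with one numbered directory per connection. A sketch for spotting the stuck connection, assuming that mount point; the function name `fuse_waiting` and the connection id 42 are placeholders:

```shell
# List each FUSE connection id with its number of pending requests.
# The optional argument overrides the usual control filesystem mount point.
fuse_waiting() {
    root=${1:-/sys/fs/fuse/connections}
    for conn in "$root"/*/; do
        # Skip anything that is not a readable connection directory.
        [ -r "${conn}waiting" ] || continue
        printf '%s %s\n' "$(basename "$conn")" "$(cat "${conn}waiting")"
    done
}

# To abort a stuck connection (as root), where 42 is a placeholder id:
#   echo 1 > /sys/fs/fuse/connections/42/abort
```

A connection whose waiting count stays above zero is a likely candidate. Aborting it makes pending requests fail, so the blocked processes should get an error back instead of hanging.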
V
5

I recently encountered a process in D state on a remote server and would like to clarify that a hard reboot or power cycle is needed to remove the process.

Don't try a soft reboot until you have exhausted all other options. For example, you can try freeing up whatever resource the process is hanging on. A soft reboot might give you a system that is partially shut down and will no longer respond to ssh, but won't reboot because it is hung trying to terminate the uninterruptible process.

Vanpelt answered 29/8, 2013 at 18:35 Comment(0)
H
4

As others have said, an uninterruptible process is a process that is stuck in a kernel function which cannot be interrupted (usually it is waiting for some I/O operation). See this answer for a detailed description.

Apart from restarting the computer, I have had success bringing some processes out of the D state by flushing the Linux VM caches:

kill -9 {process_id}
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches

This did not seem to affect system stability, but I'm not a systems programmer and not sure what unintended consequences this might have.


Edit:

According to the kernel docs, drop_caches appears to be reasonably safe in a development environment.

drop_caches

Writing to this will cause the kernel to drop clean caches, as well as reclaimable slab objects like dentries and inodes. Once dropped, their memory becomes free.

To free pagecache:

echo 1 > /proc/sys/vm/drop_caches

To free reclaimable slab objects (includes dentries and inodes):

echo 2 > /proc/sys/vm/drop_caches

To free slab objects and pagecache:

echo 3 > /proc/sys/vm/drop_caches

This is a non-destructive operation and will not free any dirty objects. To increase the number of objects freed by this operation, the user may run `sync' prior to writing to /proc/sys/vm/drop_caches. This will minimize the number of dirty objects on the system and create more candidates to be dropped.

This file is not a means to control the growth of the various kernel caches (inodes, dentries, pagecache, etc...) These objects are automatically reclaimed by the kernel when memory is needed elsewhere on the system.

Use of this file can cause performance problems. Since it discards cached objects, it may cost a significant amount of I/O and CPU to recreate the dropped objects, especially if they were under heavy use. Because of this, use outside of a testing or debugging environment is not recommended.

You may see informational messages in your kernel log when this file is used:

cat (1234): drop_caches: 3

These are informational only. They do not mean that anything is wrong with your system. To disable them, echo 4 (bit 3) into drop_caches.

Hayman answered 16/8, 2017 at 16:19 Comment(0)
A
-3

I'm new here and not that experienced, but I had the same issue: I could see my processes going into uninterruptible sleep (D state) when I checked their status using htop. For some reason,

kill -9 <pid>

worked for me. Maybe you can try the same.

Edit: the detailed answer is up there by ostrokach (which I didn't see).

Affinity answered 6/2, 2020 at 6:7 Comment(1)
You just got lucky. - Hays
