Selecting a Linux I/O Scheduler

I read that it's supposedly possible to change the I/O scheduler for a particular device on a running kernel by writing to /sys/block/[disk]/queue/scheduler. For example, on my system I can see:

anon@anon:~$ cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]

which shows that the default is the Completely Fair Queuing (cfq) scheduler. What I'm wondering is whether there is any use in including all four schedulers in my custom kernel. There seems to be little point in compiling in more than one scheduler unless the kernel is smart enough to select the right scheduler for the hardware: specifically, the 'noop' scheduler for flash-based drives and one of the others for a traditional hard drive.

Is this the case?

Indicative asked 17/6, 2009 at 21:22 Comment(0)

As documented in /usr/src/linux/Documentation/block/switching-sched.txt, the I/O scheduler on any particular block device can be changed at runtime. There may be some latency as the previous scheduler's requests are all flushed before bringing the new scheduler into use, but it can be changed without problems even while the device is under heavy use.

# cat /sys/block/hda/queue/scheduler
noop deadline [cfq]
# echo deadline > /sys/block/hda/queue/scheduler
# cat /sys/block/hda/queue/scheduler
noop [deadline] cfq
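In that listing the bracketed name is the active scheduler. As a minimal sketch, assuming bash and the sysfs layout shown above (active_sched is a hypothetical helper, not a kernel or distribution tool), the active scheduler of every block device can be reported like this:

```shell
#!/usr/bin/env bash
# active_sched (hypothetical helper): print the bracketed, i.e. active,
# scheduler from a line such as "noop deadline [cfq]".
active_sched() {
    local re='\[([^]]+)]'
    [[ $1 =~ $re ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"
}

# Report the active scheduler of every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue          # glob matched nothing, or unreadable
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(active_sched "$(cat "$f")")"
done
```

It only reads sysfs, so it doesn't need root.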

Ideally, there would be a single scheduler to satisfy all needs. It doesn't seem to exist yet. The kernel often doesn't have enough knowledge to choose the best scheduler for your workload:

  • noop is often the best choice for memory-backed block devices (e.g. ramdisks) and other non-rotational media (flash) where trying to reschedule I/O is a waste of resources
  • deadline is a lightweight scheduler which tries to put a hard limit on latency
  • cfq tries to maintain system-wide fairness of I/O bandwidth

The default was anticipatory for a long time, and it received a lot of tuning, but it was removed in 2.6.33 (early 2010). cfq became the default in 2.6.18, as its performance is reasonable and fairness is a good goal for multi-user systems (and even single-user desktops). For some scenarios, anticipatory has a long history of being tunable for the best performance, and deadline very quickly passes all requests through to the underlying device. Databases are often used as examples: they tend to already have their own peculiar scheduling and access patterns, and they are often the most important service on the machine (so who cares about fairness?).

Ojeda answered 18/6, 2009 at 2:59 Comment(7)
Great info, thanks! But my basic question is still unanswered: if I plug in a flash drive, or my netbook runs off a flash disk as its main drive, is the kernel smart enough to pick noop instead of the default cfq? Or is it completely up to me to do it manually? - Indicative
You can configure the kernel to use a different scheduler by default. It would be clever to automatically use noop on non-rotational media, but the kernel doesn't have that functionality. It does have a form of non-rotational media detection, but that isn't reliable, as some disks misreport themselves, and it isn't wired up to the I/O scheduler code yet anyhow. - Ojeda
You can add udev rules to define the scheduler based on device characteristics, as in the Debian wiki (wiki.debian.org/SSDOptimization#Low-Latency_IO-Scheduler):
# set deadline scheduler for non-rotating disks
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="deadline"
- Michaud
@Michaud You should expand that and add it as an answer. - Indicative
Is there a way to change it for all drives at once at runtime, in the same way the "elevator" kernel command-line parameter sets the default scheduler? Thanks. - Blare
Is it possible to change the I/O scheduler per partition? Let's say that sda1 has ext4 but sda2 has XFS. The XFS FAQ says that cfq is not good for that filesystem. - Godderd
The git.kernel.org/?p=linux/kernel/git/torvalds/… link is broken. - Anaphrodisiac
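For the question above about switching all drives at once at runtime, a minimal bash sketch, assuming the same sysfs interface (sched_available and set_sched_all are hypothetical helper names), is to loop over every scheduler file and write the new name only where the device actually lists it. Checking the list first simply skips devices that don't offer that scheduler instead of failing on them.

```shell
#!/usr/bin/env bash
# sched_available LIST NAME (hypothetical helper): succeed if NAME appears
# in a scheduler list such as "noop deadline [cfq]".
sched_available() {
    local w words
    read -r -a words <<< "$1"        # split without glob-expanding "[cfq]"
    for w in "${words[@]}"; do
        w=${w#\[}                    # strip the brackets around the active one
        w=${w%]}
        [ "$w" = "$2" ] && return 0
    done
    return 1
}

# set_sched_all NAME (hypothetical helper): write NAME to the scheduler file
# of every block device that offers it. Needs root; others are skipped.
set_sched_all() {
    local f dev
    for f in /sys/block/*/queue/scheduler; do
        [ -w "$f" ] || continue
        dev=${f#/sys/block/}
        dev=${dev%%/*}
        if sched_available "$(cat "$f")" "$1"; then
            echo "$1" > "$f" && echo "$dev: now using $1"
        else
            echo "$dev: $1 not offered, skipped" >&2
        fi
    done
}
```

Usage, as root, assuming the functions were saved to a hypothetical file set-sched-all.sh: source ./set-sched-all.sh; set_sched_all deadline. Like any write to sysfs, the change is lost on reboot.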

It's possible to use a udev rule to let the system decide on the scheduler based on some characteristics of the hardware.
An example udev rule for SSDs and other non-rotational drives might look like this:

# set noop scheduler for non-rotating disks
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="noop"

inside a new udev rules file (e.g., /etc/udev/rules.d/60-ssd-scheduler.rules). This answer is based on the Debian wiki.

To check in advance whether your SSDs would match the rule, you can inspect the attribute it triggers on:

for f in /sys/block/sd?/queue/rotational; do printf '%s ' "$f"; cat "$f"; done
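The same check can be extended to preview the rule's decision for every block device, not only sd?. A minimal sketch, assuming bash (would_match is a hypothetical helper that mirrors the two match keys of the rule above); note that the glob sd[a-z] matches only single-letter names, so devices such as sdaa or nvme0n1 are not covered by that rule:

```shell
#!/usr/bin/env bash
# would_match DEVNAME ROTATIONAL (hypothetical helper): mirror the udev
# match KERNEL=="sd[a-z]" plus ATTR{queue/rotational}=="0".
would_match() {
    case $1 in
        sd[a-z]) [ "$2" = 0 ] ;;
        *) return 1 ;;
    esac
}

for f in /sys/block/*/queue/rotational; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    rot=$(cat "$f")
    if would_match "$dev" "$rot"; then
        printf '%s rotational=%s -> rule applies\n' "$dev" "$rot"
    else
        printf '%s rotational=%s -> rule does not apply\n' "$dev" "$rot"
    fi
done
```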
Michaud answered 3/2, 2016 at 10:20 Comment(1)
Great answer on automating the detection of non-rotational media and applying the I/O scheduler only to those devices. Deadline is recommended not only for non-spinning media: Oracle recommends the deadline I/O scheduler for database workloads. This recommendation probably comes from the fact that deadline may handle synchronous writes better than other I/O schedulers. See, for example, the deadline scheduler's /sys/block/sdX/queue/iosched/writes_starved tunable (there is no such tunable for reads). Databases may perform badly if their synchronous redo writes don't come through quickly. - Shortridge

The aim of having the kernel support different schedulers is that you can try them out without a reboot: you can run test workloads through the system, measure performance, and then make the best performer the standard one for your app.

On modern server-grade hardware, only noop appears to be at all useful; the others seemed slower in my tests.
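The try, measure, choose loop described above can be scripted. A rough sketch, assuming bash, root privileges, and a workload command of your choice (sched_names and bench_scheds are hypothetical helpers; a single timed run per scheduler is noisy, so repeat it before drawing conclusions):

```shell
#!/usr/bin/env bash
# sched_names LIST (hypothetical helper): print each scheduler name from a
# line such as "noop deadline [cfq]", brackets removed, one per line.
sched_names() {
    local w words
    read -r -a words <<< "$1"
    for w in "${words[@]}"; do
        w=${w#\[}
        printf '%s\n' "${w%]}"
    done
}

# bench_scheds DEV COMMAND [ARGS...] (hypothetical helper): run COMMAND once
# under each scheduler DEV offers and print bash's wall-clock time for it.
# Must run as root; restores the originally active scheduler afterwards.
bench_scheds() {
    local dev=$1; shift
    local f=/sys/block/$dev/queue/scheduler
    local line orig s TIMEFORMAT re='\[([^]]+)]'
    local -a names
    line=$(cat "$f") || return 1
    [[ $line =~ $re ]] && orig=${BASH_REMATCH[1]}
    mapfile -t names < <(sched_names "$line")
    for s in "${names[@]}"; do
        echo "$s" > "$f" || continue
        sync; echo 3 > /proc/sys/vm/drop_caches   # start each run with a cold page cache
        TIMEFORMAT="$dev $s: %R s"
        time "$@" > /dev/null                     # timing goes to stderr
    done
    [ -n "$orig" ] && echo "$orig" > "$f"
}
```

For example, as root: bench_scheds sdb dd if=/dev/sdb of=/dev/null bs=1M count=1024 (the device name and workload are placeholders). The page cache is dropped before each run so that later schedulers aren't timed against data cached by earlier runs.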

Dyer answered 17/6, 2009 at 21:24 Comment(6)
How do you actually change it at runtime? - Indicative
noop's performance relative to the other schedulers depends very much on the hardware and the particular load. Out of curiosity, what disks, controllers, and tests were you running? - Ojeda
Yeah, noop is good when you have smart RAID controllers and other hardware that knows more than the kernel about the best access patterns. Deadline isn't bad either. - Copestone
This is purely a learning exercise for me: I'm trying to configure the smallest, fastest-booting kernel possible that still provides all the functionality I need on my laptop. I've looked in both "Linux Kernel Development" and "Essential Linux Device Drivers" and haven't found a satisfactory answer to this question: how smart is the kernel at picking a scheduler at runtime, or does it always use the default unless you manually set it to something else? - Indicative
@ephemient: that was on Dell PERC controllers, and also on a Dell PowerVault MD3000. It seemed better than the default (CFQ) on both. - Dyer
Ah, so real server-class hardware. Yeah, I can imagine noop performing better than cfq, but deadline ought to be pretty good as well... - Ojeda

You can set this at boot by adding the "elevator" parameter to the kernel command line (for example, in grub.cfg).

Example:

elevator=deadline

This will make "deadline" the default I/O scheduler for all block devices.
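After rebooting, you can confirm that the parameter actually reached the kernel by looking at /proc/cmdline; whether a particular kernel version honors it is a separate question, so checking /sys/block/*/queue/scheduler afterwards is still worthwhile. A minimal bash sketch (cmdline_elevator is a hypothetical helper; it prints the first elevator= value it finds):

```shell
#!/usr/bin/env bash
# cmdline_elevator CMDLINE (hypothetical helper): print the value of the
# first elevator= argument in a kernel command line; return 1 if absent.
cmdline_elevator() {
    local arg args
    read -r -a args <<< "$1"
    for arg in "${args[@]}"; do
        case $arg in
            elevator=*) printf '%s\n' "${arg#elevator=}"; return 0 ;;
        esac
    done
    return 1
}

if [ -r /proc/cmdline ]; then
    cmdline_elevator "$(cat /proc/cmdline)" || echo "elevator= not set"
fi
```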

If you'd like to query or change the scheduler after the system has booted, or would like to use a different scheduler for a specific block device, I recommend installing and using the tool ioschedset, which makes this easy.

https://github.com/kata198/ioschedset

If you're on Arch Linux, it's available in the AUR:

https://aur.archlinux.org/packages/ioschedset

Some example usage:

# Get i/o scheduler for all block devices
[username@hostname ~]$ io-get-sched
sda:    bfq
sr0:    bfq

# Query available I/O schedulers
[username@hostname ~]$ io-set-sched --list
mq-deadline kyber bfq none

# Set sda to use "kyber"
[username@hostname ~]$ io-set-sched kyber /dev/sda
Must be root to set IO Scheduler. Rerunning under sudo...

[sudo] password for username:
+ Successfully set sda to 'kyber'!

# Get i/o scheduler for all block devices to assert change
[username@hostname ~]$ io-get-sched
sda:    kyber
sr0:    bfq

# Set all block devices to use 'deadline' i/o scheduler
[username@hostname ~]$ io-set-sched deadline
Must be root to set IO Scheduler. Rerunning under sudo...

+ Successfully set sda to 'deadline'!
+ Successfully set sr0 to 'deadline'!

# Get the current block scheduler just for sda
[username@hostname ~]$ io-get-sched sda
sda:    mq-deadline

Usage should be self-explanatory. The tools are standalone and only require bash.

Hope this helps!

EDIT: Disclaimer: these are scripts I wrote.

Embitter answered 11/12, 2018 at 21:27 Comment(0)

The Linux kernel does not automatically change the I/O scheduler at runtime. That is, as of today, the kernel cannot automatically choose an "optimal" scheduler based on the type of secondary storage device. It is possible to change the I/O scheduler manually, either at start-up or at runtime.

The default scheduler is selected when the kernel is built, through the options defined in block/Kconfig.iosched in the kernel source tree (linux-2.6/block/Kconfig.iosched). However, it is possible to change the I/O scheduler at runtime by echoing a valid scheduler name into /sys/block/[DEV]/queue/scheduler. For example:

echo deadline > /sys/block/hda/queue/scheduler
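On kernels that still have the legacy schedulers, the compiled-in default shows up as CONFIG_DEFAULT_IOSCHED in the kernel's build configuration; newer multi-queue kernels don't have that option, so nothing is printed there. A minimal sketch, assuming the distribution ships the configuration as /boot/config-$(uname -r) (many do, not all); default_iosched is a hypothetical helper:

```shell
#!/usr/bin/env bash
# default_iosched CONFIG_FILE (hypothetical helper): print the value of
# CONFIG_DEFAULT_IOSCHED from a kernel .config-style file, if present.
default_iosched() {
    sed -n 's/^CONFIG_DEFAULT_IOSCHED="\(.*\)"$/\1/p' "$1"
}

cfg=/boot/config-$(uname -r)
if [ -r "$cfg" ]; then
    default_iosched "$cfg"
else
    echo "no readable $cfg" >&2
fi
```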

Marya answered 13/11, 2012 at 21:44 Comment(1)
I don't get why this answer deserves so many downvotes. It isn't actually incorrect. - Felucca
