Linux currently provides four different I/O schedulers:

CFQ

CFQ, also known as "Completely Fair Queuing", is an I/O scheduler for the Linux kernel written by Jens Axboe.

CFQ works by placing synchronous requests submitted by processes into a number of per-process queues and then allocating time slices for each of the queues to access the disk. The length of the time slice and the number of requests a queue is allowed to submit depend on the I/O priority of the given process. Asynchronous requests for all processes are batched together in fewer queues, one per priority. While CFQ does not do explicit anticipatory I/O scheduling, it achieves the same effect of good aggregate throughput for the system as a whole by allowing a process queue to idle at the end of synchronous I/O, thereby "anticipating" further close I/O from that process. This can be considered a natural extension of granting I/O time slices to a process. [23]
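The per-process queues and priority-dependent time slices described above can be illustrated with a toy model. This is only a sketch: the `slice_ms` formula, `BASE_SLICE_MS`, and the fixed `cost_ms` per request are assumptions made for illustration, not the kernel's actual values, and asynchronous requests and idling are left out.

```python
from collections import deque

BASE_SLICE_MS = 100  # assumed base slice length, not a real kernel tunable


def slice_ms(ioprio):
    """Hypothetical rule: a lower ioprio number (0..7) gets a longer slice."""
    return BASE_SLICE_MS * (8 - ioprio) // 4


def cfq_dispatch(queues, ioprio, cost_ms=10):
    """Serve per-process queues round-robin, one time slice per turn.

    queues: {pid: deque of request labels}, ioprio: {pid: 0..7}.
    cost_ms is an assumed fixed service time per request.
    Returns the order in which requests are dispatched.
    """
    order = []
    while any(queues.values()):
        for pid, q in queues.items():
            budget = slice_ms(ioprio[pid])
            # Always serve at least one request so every queue makes progress.
            while q:
                order.append((pid, q.popleft()))
                budget -= cost_ms
                if budget < cost_ms:
                    break
    return order
```

With `cost_ms=50`, a process at priority 0 gets four requests per turn in this model while a process at priority 4 gets two, so the higher-priority process receives a larger share of disk time without starving the other.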

NOOP

The NOOP scheduler works by placing all requests into a simple, unordered FIFO queue and implements only request merging. It assumes that I/O performance has been or will be optimized at the block device, by an intelligent HBA, or by an externally attached controller. The NOOP scheduler is best used on devices with no seek penalty, such as flash memory.[24]
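A minimal sketch of a FIFO queue with request merging follows. The class name `NoopQueue` and the `(start_sector, sector_count)` request layout are hypothetical, and only the simplest case is modelled: a new request is merged when it starts exactly where the most recently queued request ends.

```python
from collections import deque


class NoopQueue:
    """FIFO queue whose only optimisation is merging contiguous requests."""

    def __init__(self):
        self.fifo = deque()  # entries are [start_sector, sector_count]

    def add(self, start, count):
        # Back merge: the new request begins where the newest queued one ends.
        if self.fifo and self.fifo[-1][0] + self.fifo[-1][1] == start:
            self.fifo[-1][1] += count
        else:
            self.fifo.append([start, count])

    def dispatch(self):
        # Requests leave strictly in arrival order; nothing is sorted by sector.
        return tuple(self.fifo.popleft()) if self.fifo else None
```

Adding requests for sectors 100 to 107 and then 108 to 115 yields one merged request of 16 sectors, while a later request at sector 50 is still dispatched after it, because the queue never reorders.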

Anticipatory

Anticipatory scheduling is an algorithm for scheduling hard disk input/output. It seeks to increase the efficiency of disk utilization by "anticipating" synchronous read operations.

"Deceptive idleness" is a situation where a process appears to be finished reading from the disk when it is actually processing data in preparation for the next read operation. This causes a normal work-conserving I/O scheduler to switch to servicing I/O from an unrelated process. This situation is detrimental to the throughput of synchronous reads, as it degenerates into a seeking workload. [1] Anticipatory scheduling overcomes deceptive idleness by pausing for a short time (a few milliseconds) after a read operation in anticipation of another close-by read request.[2]

Anticipatory scheduling yields significant improvements in disk utilization for some workloads.[3] In some situations the Apache web server may achieve up to 71% more throughput from using anticipatory scheduling.[4]
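The cost of deceptive idleness can be made concrete with a toy model of two synchronous readers working in distant regions of the disk. The sector numbers and function names are made up for illustration, and the anticipatory case assumes that each reader's next request always arrives within the waiting window.

```python
def head_travel(order, start=0):
    """Total distance the head moves when serving sectors in the given order."""
    travel, pos = 0, start
    for sector in order:
        travel += abs(sector - pos)
        pos = sector
    return travel


def work_conserving(a, b):
    """Each reader has at most one request pending, so the scheduler alternates."""
    order = []
    for x, y in zip(a, b):
        order += [x, y]
    return order


def anticipatory(a, b):
    """Idle briefly after each read so the same reader can submit its next one."""
    return list(a) + list(b)


reader_a = [0, 1, 2, 3]
reader_b = [1000, 1001, 1002, 1003]
```

In this model `head_travel(work_conserving(reader_a, reader_b))` is 6997 sectors, while `head_travel(anticipatory(reader_a, reader_b))` is 1003, which is the seeking workload the short pause is meant to avoid.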

The Linux anticipatory scheduler may reduce performance on disks using TCQ, high-performance disks, and hardware RAID arrays.[5] The anticipatory scheduler (AS) was the default Linux kernel scheduler between 2.6.0 and 2.6.18, at which point it was replaced by the CFQ scheduler.[25]

Deadline

The goal of the Deadline scheduler is to attempt to guarantee a start service time for a request. It does that by imposing a deadline on all I/O operations to prevent resource starvation. It maintains two deadline queues (read and write) in addition to the two sorted queues (also read and write). The deadline queues are sorted by deadline (the expiration time), while the sorted queues are sorted by sector number.

Before serving the next request, the Deadline scheduler decides which queue to use. Read queues are given a higher priority, because processes usually block on read operations. Next, the Deadline scheduler checks whether the first request in the deadline queue has expired. If it has, that request is served first; otherwise, the scheduler serves a batch of requests from the sorted queue. In both cases, the scheduler also serves a batch of requests following the chosen request in the sorted queue.

By default, read requests expire after 500 ms and write requests after 5 seconds.
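The interplay of the sorted and deadline queues can be sketched as follows. This is a simplified model, not the kernel implementation: the class name `DeadlineSketch` and the batch size are assumptions, a non-expired batch always starts at the lowest queued sector, sector numbers are assumed to be unique, and the protection of writes against being passed over indefinitely is omitted.

```python
import bisect

READ_EXPIRE_MS = 500    # default read expiry from the text
WRITE_EXPIRE_MS = 5000  # default write expiry from the text


class DeadlineSketch:
    def __init__(self, batch=2):
        self.batch = batch  # assumed batch size, for illustration only
        self.sorted = {"r": [], "w": []}  # sectors in ascending order
        self.fifo = {"r": [], "w": []}    # (deadline_ms, sector) in arrival order

    def add(self, kind, sector, now_ms):
        expire = READ_EXPIRE_MS if kind == "r" else WRITE_EXPIRE_MS
        bisect.insort(self.sorted[kind], sector)
        self.fifo[kind].append((now_ms + expire, sector))

    def dispatch(self, now_ms):
        """Return the next batch of (kind, sector) requests, or [] if idle."""
        # Reads are preferred whenever any are queued.
        kind = "r" if self.fifo["r"] else "w"
        if not self.fifo[kind]:
            return []
        deadline, oldest = self.fifo[kind][0]
        sectors = self.sorted[kind]
        # An expired head of the deadline queue decides where the batch starts;
        # otherwise the batch starts at the lowest queued sector.
        start = sectors.index(oldest) if deadline <= now_ms else 0
        chosen = sectors[start:start + self.batch]
        for s in chosen:
            sectors.remove(s)
            self.fifo[kind] = [e for e in self.fifo[kind] if e[1] != s]
        return [(kind, s) for s in chosen]
```

In this model a read at sector 300 that has waited longer than 500 ms is served before a newer read at sector 10, even though sector order alone would put sector 10 first.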

The kernel docs suggest this is the preferred scheduler for database systems, especially if you have TCQ-aware disks, or for any system with high disk performance. [26]
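On kernels that offer these schedulers, the choice for a block device is exposed in `/sys/block/<device>/queue/scheduler`, where the active scheduler is shown in square brackets. The sketch below parses that format; the helper names and the default device name `sdb` are only examples.

```python
def parse_scheduler_line(line):
    """Split a sysfs scheduler line into (available, active)."""
    available = []
    active = None
    for name in line.split():
        # The active scheduler is wrapped in square brackets.
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return available, active


def read_scheduler(device="sdb"):
    """Read the scheduler list for one block device (example device name)."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return parse_scheduler_line(f.read())
```

For a line such as `noop anticipatory deadline [cfq]`, the parser returns all four names with `cfq` as the active one. Writing a scheduler name to the same file as root switches the scheduler for that device.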

Bonnie results for Storage Server

This test was performed on only 15 disks in RAID6 mode. There are two logical partitions involved, the system partition (sda) and the data partition (sdb). The tests were run only on the data partition.

Single thread

These tests used a single instance of bonnie with a 32 GB setting (twice the system's RAM).

Sequential output

All entries are of the form kB/s and CPU usage in percent.

anticipatory  
char (block) rewrite char (block) rewrite char (block) rewrite
77106 (99) 518653 (58) 208325 (31) 75989 (99) 504647 (56) 155807 (22)
77225 (99) 348817 (43) 144939 (27) 77229 (99) 517279 (58) 155153 (22)

deadline  
char (block) rewrite char (block) rewrite char (block) rewrite
77171 (99) 500043 (57) 215317 (34) 76833 (99) 505958 (57) 154743 (22)
77253 (99) 353386 (44) 144172 (28) 77189 (99) 495193 (56) 158175 (22)

cfq  
char (block) rewrite char (block) rewrite char (block) rewrite
76667 (99) 506670 (57) 212186 (33) 77117 (99) 480250 (54) 158195 (22)
75483 (99) 306848 (38) 146007 (27) 77124 (99) 491633 (54) 155101 (22)

noop  
char (block) rewrite char (block) rewrite char (block) rewrite
77241 (99) 504024 (57) 193054 (30) 67369 (99) 494487 (56) 157277 (22)
76447 (99) 356155 (44) 151145 (29) 76760 (99) 499945 (55) 156119 (22)

Sequential input

All entries are of the form kB/s and CPU usage in percent.

char (block) char (block) char (block) char (block)
anticipatory
69708 (78) 460841 (34) 69450 (95) 464317 (33)
74010 (98) 654764 (51) 68195 (90) 463778 (32)
deadline
68825 (90) 463041 (34) 69260 (90) 463529 (34)
75664 (98) 634247 (48) 69157 (90) 464012 (34)
cfq
69331 (90) 470267 (34) 69394 (89) 461813 (34)
74825 (98) 652817 (51) 68557 (89) 463762 (34)
noop
69553 (90) 460988 (33) 68154 (90) 462037 (33)
75676 (98) 645228 (51) 69071 (90) 462026 (33)

Random seeks

All entries are of the form seeks/s. CPU usage was zero.

anticipatory
309.1 465.8 429.7 398.5
deadline
286.6 464.6 460.2 461.4
cfq
293.4 395.1 397.9 421.2
noop
320.1 474.4 463.1 438.5
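To compare the schedulers at a glance, the figures can be averaged. The sketch below assumes that the four values listed for each scheduler are separate runs of the same test, which the table does not state explicitly.

```python
from statistics import mean

# Seeks/s for each scheduler, copied from the table above.
seeks = {
    "anticipatory": [309.1, 465.8, 429.7, 398.5],
    "deadline": [286.6, 464.6, 460.2, 461.4],
    "cfq": [293.4, 395.1, 397.9, 421.2],
    "noop": [320.1, 474.4, 463.1, 438.5],
}

# Mean seeks/s per scheduler, rounded to one decimal place.
averages = {name: round(mean(runs), 1) for name, runs in seeks.items()}

# Scheduler names ordered from highest to lowest mean.
ranking = sorted(averages, key=averages.get, reverse=True)
```

Under that assumption the means come out at roughly 424 (noop), 418 (deadline), 401 (anticipatory) and 377 (cfq) seeks/s. With only four samples each, and a noticeably lower first value for every scheduler, these differences should be read with caution.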

Sequential Create

All entries are rates per second, with CPU usage in parentheses. Reading is not mentioned here because it was too fast (see bonnie manual).

Create (Delete) Create (Delete) Create (Delete) Create (Delete)
anticipatory
24122 (98) 20258 (87) 24150 (97) 19847 (86)
23936 (96) 19665 (75) 24136 (98) 20146 (79)
deadline
24156 (98) 20131 (76) 24095 (97) 19823 (75)
24082 (97) 20001 (80) 24174 (98) 20118 (78)
cfq
24069 (99) 20109 (78) 24257 (99) 20143 (78)
24123 (95) 19729 (77) 24119 (100) 20186 (80)
noop
23760 (100) 19712 (85) 23986 (96) 19727 (75)
24011 (98) 19400 (69) 23792 (97) 19932 (85)

Random Create

All entries are rates per second, with CPU usage in parentheses. Reading is not mentioned here because it was too fast (see bonnie manual).

Create Delete
anticipatory
24035 (98) 18786 (80)
24222 (99) 18192 (75)
24032 (99) 18079 (79)
24217 (100) 18504 (86)
deadline
24204 (97) 18500 (75)
24056 (99) 18235 (77)
23999 (99) 18526 (74)
24109 (94) 18566 (79)
cfq
24111 (97) 18551 (76)
24192 (95) 18539 (82)
24093 (99) 18181 (84)
24219 (99) 18540 (84)
noop
24118 (98) 18152 (72)
24183 (97) 18546 (72)
24117 (98) 18095 (87)
24106 (98) 18354 (79)
Topic revision: r2 - 23 Jun 2009, JenniferSchenke