--thursday #4: blockdev
Disk IO is slow. You just won’t believe how vastly, hugely, mind-bogglingly slow it is. I mean, you may think your network is slow, but that’s just peanuts to disk IO.
The image below helps visualize how slow (post continues below).
(Originally found on Hacker News and inspired by Gustavo Duarte’s blog.)
The kernel knows how slow the disk is and tries to be smart about accessing it. It not only reads the data you requested, it also returns a bit more. This way, if you’re reading through a file or watching a movie (sequential access), your system doesn’t have to go to disk as frequently because you’re pulling more data back than you strictly requested each time.
You can see how far the kernel reads ahead using the blockdev tool:
$ sudo blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   256   512  4096          0     80026361856   /dev/sda
rw   256   512  4096       2048     80025223168   /dev/sda1
rw   256   512  4096          0   2000398934016   /dev/sdb
rw   256   512  1024       2048        98566144   /dev/sdb1
rw   256   512  4096     194560      7999586304   /dev/sdb2
rw   256   512  4096   15818752     19999490048   /dev/sdb3
rw   256   512  4096   54880256   1972300152832   /dev/sdb4
Readahead is listed in the “RA” column. As you can see, I have two disks (sda and sdb) with readahead set to 256 on each. But what unit is that 256? Bytes? Kilobytes? Dolphins? If we look at the man page for blockdev, it says:
$ man blockdev
...
--setra N
    Set readahead to N 512-byte sectors.
...
This means that my readahead is 512 bytes × 256 = 131072 bytes, or 128KB. That means that, whenever I read from disk, the disk may actually be reading as much as 128KB of data, even if I only requested a few bytes.
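If you want to double-check the sector arithmetic, here is a minimal shell sketch. The function names are made up for illustration; the only thing taken from the man page is that the RA value counts 512-byte sectors.

```shell
#!/bin/sh
# blockdev reports readahead (RA) in 512-byte sectors.
SECTOR_BYTES=512

# Convert a readahead value in sectors to bytes.
# (Hypothetical helper name, not part of blockdev.)
ra_sectors_to_bytes() {
    echo $(( $1 * SECTOR_BYTES ))
}

# Convert a readahead value in sectors to KiB.
ra_sectors_to_kib() {
    echo $(( $1 * SECTOR_BYTES / 1024 ))
}

ra_sectors_to_bytes 256   # prints 131072
ra_sectors_to_kib 256     # prints 128
```

On a live system, `sudo blockdev --getra /dev/sda` should print just the RA number for that one device, which you could feed into a helper like this.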
So what value should you set your readahead to? Please don’t set it to a number you find online without understanding the consequences. If you Google for “blockdev setra”, the first result uses blockdev --setra 65536, which translates to 32MB of readahead. That means that, whenever you read from disk, the disk can be doing up to 32MB worth of work. Please do not set your readahead this high if you’re doing a lot of random-access reads and writes, as all of the extra IO can slow things down a lot (and if you’re low on memory, you’ll be forcing the kernel to fill up your RAM with data you won’t need).
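Before pasting a --setra number from somewhere, it can help to go the other direction: start from the readahead size you actually want and work out the sector count. A rough sketch follows (the helper names are hypothetical, and the second function only prints the command instead of running it, so nothing on your system changes):

```shell
#!/bin/sh
# Convert a desired readahead size in KiB to the 512-byte sector
# count that --setra expects. (Hypothetical helper name.)
kib_to_ra_sectors() {
    echo $(( $1 * 1024 / 512 ))
}

# Print, but do not run, the blockdev command for a device.
# $1 = device, $2 = desired readahead in KiB
print_setra_command() {
    echo "blockdev --setra $(kib_to_ra_sectors "$2") $1"
}

kib_to_ra_sectors 32768            # 32MB -> prints 65536
print_setra_command /dev/sda 128   # prints: blockdev --setra 256 /dev/sda
```

Printing the command first gives you a chance to sanity-check the number before running it with sudo.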
Getting a good readahead value can help disk IO issues to some extent, but if you are using MongoDB (in particular), please consider your typical document size and access patterns before changing your blockdev settings. I’m not recommending any particular value because what’s perfect for one application/machine can be death for another.
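One crude way to think about document size versus readahead is a worst-case "read amplification" estimate: how many times more data gets pulled off disk than a single document needs. The sketch below assumes every random read triggers a full readahead window and none of the extra data is ever reused, which is a deliberate oversimplification. The 1024-byte document size is a made-up example, not a recommendation.

```shell
#!/bin/sh
# Worst-case estimate of how much more data is read than one
# document needs. (Hypothetical helper, simplified model.)
# $1 = readahead in 512-byte sectors
# $2 = typical document size in bytes
read_amplification() {
    ra_bytes=$(( $1 * 512 ))
    amp=$(( ra_bytes / $2 ))
    # A document larger than the readahead window is not amplified.
    if [ "$amp" -lt 1 ]; then
        amp=1
    fi
    echo "$amp"
}

read_amplification 256 1024     # 128KB readahead, 1KB docs -> prints 128
read_amplification 65536 1024   # 32MB readahead, 1KB docs  -> prints 32768
```

Real workloads will land somewhere below these numbers, since sequential access and caching mean some of the extra data does get used.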
I’m really enjoying these --thursday posts because every week people have commented with different/better/interesting ways of doing what I talked about (or ways of telling the difference between stalagmites and stalactites), which is really cool. So I’m throwing this out there: how would you figure out what a good readahead setting is? Next week I’m planning to do iostat for --thursday which should cover this a bit, but please leave a comment if you have any ideas.
