C Disk I/O - write after read at the same offset of a file will make read throughput very low

Background:

I'm developing a database-related program, and I need to flush dirty metadata from memory to disk sequentially. /dev/sda1 is a raw volume, so data on /dev/sda1 is accessed block by block, and the blocks are physically adjacent when accessed sequentially. I use direct I/O, so the I/O bypasses the file system's caching mechanism and accesses the blocks on the disk directly.

Problems:

After opening /dev/sda1, I repeatedly read one block, update it, and write it back to the same offset from the beginning of /dev/sda1.

The code looks like this:

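// block_size = 256 KB; with O_DIRECT, buffer must be suitably aligned (e.g. allocated with posix_memalign)
int file = open("/dev/sda1", O_RDWR|O_LARGEFILE|O_DIRECT);
for (int i = 0; i < N; i++) {
    off_t offset = (off_t)i * block_size;  // widen before multiplying so large offsets don't overflow int
    pread(file, buffer, block_size, offset);
    // Update the buffer
    pwrite(file, buffer, block_size, offset);
}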

I found that if I don't do pwrite, read throughput is 125 MB/s.

If I do pwrite, read throughput will be 21 MB/s, and write throughput is 169 MB/s.

If I do pread after pwrite, write throughput is 115 MB/s, and read throughput is 208 MB/s.

I also tried read()/write() and aio_read()/aio_write(), but the problem remains. I don't know why write after read at the same position of a file will make the read throughput so low.

If I access more blocks at a time, like this:

pread(file, buffer, num_blocks * block_size, i*block_size);

the problem is mitigated; please see the chart.

Pentagram answered 23/9, 2015 at 9:30 Comment(2)
What's your block size? There's a good chance you're seeing the effects of hardware caching and read-ahead on the disk(s) you're accessing. The pwrite() fills the cache, and if the next pread() is for different data, none of it is cached. Doing the pread() after the pwrite() allows data to be read directly from the disk's hardware cache. - Woe
I don't know the physical block size; I set it to 256KB in the program. Thanks for your comment. I now think it's very likely caused by the disk's buffer. - Pentagram

And I use direct I/O, so the I/O will bypass the caching mechanism of the file system and access directly the blocks on the disk.

If you don't have a file system on the device and are reading and writing the device directly, then no file system cache comes into the picture.

The behavior you observed is typical of disk access and IO behavior.

I found that if I don't do pwrite, read throughput is 125 MB/s

Reason: The disk only reads data. It doesn't have to go back to the same offset and write, so there is one less operation per block.

If I do pwrite, read throughput will be 21 MB/s, and write throughput is 169 MB/s.

Reason: Your disk might have a better write speed; the disk buffer is probably caching the writes rather than hitting the media directly.

If I do pread after pwrite, write throughput is 115 MB/s, and read throughput is 208 MB/s.

Reason: Most likely the data just written is cached at the disk level, so the read gets its data from the cache instead of the media.

To get optimal performance, you should use asynchronous I/O and transfer several blocks per request. However, the number of blocks per request has to be reasonable; it can't be very large. You should find the optimal value by trial and error.
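As a rough illustration of the "several blocks at a time" part of this advice, here is a minimal synchronous sketch. The names rmw_batched and bump_bytes, and the 4096-byte buffer alignment, are assumptions made for the example and do not come from the question; it also treats a short read or write as an error, which real code may want to handle differently.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical example update: add 1 to every byte in the buffer. */
void bump_bytes(unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++)
        p[i]++;
}

/* Hypothetical helper: read-modify-write total_blocks blocks of block_size
 * bytes, moving up to `batch` blocks per pread()/pwrite() pair.
 * Returns 0 on success, -1 on any error or short transfer. */
int rmw_batched(int fd, size_t block_size, size_t total_blocks, size_t batch,
                void (*update)(unsigned char *, size_t))
{
    assert(block_size > 0 && batch > 0 && update != NULL);

    /* O_DIRECT requires an aligned buffer; 4096 is an assumed alignment. */
    void *mem = NULL;
    if (posix_memalign(&mem, 4096, batch * block_size) != 0)
        return -1;
    unsigned char *buf = mem;

    int rc = 0;
    for (size_t done = 0; done < total_blocks; ) {
        size_t left = total_blocks - done;
        size_t n = left < batch ? left : batch;   /* last batch may be short */
        size_t len = n * block_size;
        off_t off = (off_t)done * (off_t)block_size;

        if (pread(fd, buf, len, off) != (ssize_t)len) { rc = -1; break; }
        update(buf, len);
        if (pwrite(fd, buf, len, off) != (ssize_t)len) { rc = -1; break; }
        done += n;
    }
    free(buf);
    return rc;
}
```

With a descriptor opened as in the question, a call such as rmw_batched(file, 256 * 1024, N, 8, bump_bytes) would move eight blocks per request. The batch size of 8 is only a starting point to tune by measurement.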

Gourmet answered 23/9, 2015 at 11:1 Comment(2)
Thanks for your answer. I now think it's very likely caused by the disk's buffer. But I still can't imagine that just seeking to the previous position makes read throughput drop from 125 MB/s to 21 MB/s... - Pentagram
@leo, Yes, seeks are expensive. Look at the I/O wait times, which increase when throughput decreases. - Gourmet
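
One way to see where the time goes from inside the program, in addition to watching system-level I/O wait, is to time the pread() and pwrite() calls separately. The sketch below is an assumption-laden illustration: io_times, timed_rmw and mb_per_s are hypothetical names, and 1 MB is taken as 10^6 bytes. If the drive acknowledges writes from its buffer, some of the real write cost may show up as latency on the following read.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Accumulated wall-clock time spent in reads and in writes, in seconds. */
struct io_times {
    double read_s;
    double write_s;
};

/* Monotonic clock reading in seconds. */
static double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Throughput in MB/s, with 1 MB assumed to be 10^6 bytes. */
double mb_per_s(size_t bytes, double secs)
{
    return secs > 0.0 ? (double)bytes / secs / 1e6 : 0.0;
}

/* Hypothetical helper: one read-then-write at the same offset, adding the
 * time of each call to *acc. Returns 0 on success, -1 on error or short
 * transfer (in which case *acc is left unchanged). */
int timed_rmw(int fd, void *buf, size_t len, off_t off, struct io_times *acc)
{
    assert(acc != NULL);

    double t0 = now_s();
    if (pread(fd, buf, len, off) != (ssize_t)len)
        return -1;
    double t1 = now_s();
    /* ... update the buffer here; this time is counted as write time ... */
    if (pwrite(fd, buf, len, off) != (ssize_t)len)
        return -1;
    double t2 = now_s();

    acc->read_s += t1 - t0;
    acc->write_s += t2 - t1;
    return 0;
}
```

After the loop, mb_per_s(N * block_size, acc.read_s) and mb_per_s(N * block_size, acc.write_s) give per-direction figures comparable to the ones quoted in the question.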
