Disk I/O issue with circular file writes

In my software I have 4x 500GB files which I write to sequentially, in a circular fashion, using Boost's memory-mapped file API.

I allocate regions in 32MB blocks. When allocating a block that straddles the end of the file, I create two memory-mapped regions: the first covers the tail of the file, and the second covers the start of the file and is mapped at the address immediately following the first region.

Now this works just fine with smaller files. However, with big files the disk performance drops through the floor once I reach the region at the end, and I'm not sure how to avoid it.

What I'm guessing is happening is that the disk tries to write to both ends of the file, so the spindle has to jump back and forth. That is a rather silly thing to do, especially for sequential writes, and I would have hoped the OS would be a bit smarter.

Does anyone have any ideas on how to avoid this issue?

I was thinking of upgrading to Windows 10 and hope it does a better job. But it is a rather risky change that I would like to avoid right now.

I should also note that the files live on a software RAID 1 with 2x 3TB Seagate Constellation Enterprise drives. These drives have a minimum sequential write speed of 60MB/s and an average of 120MB/s, and across all files I am writing at a total of 30MB/s.

The code can be found here.

EDIT:

So it turns out that after writing through the entire file and starting over from the beginning, the OS actually starts reading back what's already on disk, even though it is not needed, which is what I believe is causing the issue.

Minesweeper answered 22/7, 2015 at 9:16 Comment(9)
How do you measure the disk performance, and at what file size does the problem start to appear?Tome
I am currently testing with smaller and smaller files. Though it takes about a day before it reaches the end. I will update as I get more results.Minesweeper
I measure it by the write buffer. I have 4x hot sources that each send data at 7.5MB/s, and every input packet is buffered. If the buffer starts growing, it means the file is not being written fast enough, and when it reaches 4GB it starts dropping packets, which is what is currently happening once it reaches the region in question.Minesweeper
If this is a sequential write and you're not accessing the data after it has been written, why bother with mapping at all?Tome
@AlexanderBalabin: Because I'm doing cross process communication where I need to perform atomic writes/reads to sections of the files.Minesweeper
How about a set of rolling small files instead of one circular one? You can still map them individually which will involve more pointer math but you'll never have to map existing data back in just to overwrite it.Tome
@AlexanderBalabin: That might work. Though it is a rather big re-write.Minesweeper
I reckon that might be contained behind the api you already have, and really it looks like the only feasible option.Tome
@AlexanderBalabin: Yea, it can be hidden behind the api, basically I would replace each region with an actual file.Minesweeper

"These drives have a minimum sequential write speed of 60MB/s" - which is irrelevant, because you're not doing sequential writes.

Use SSD caching, or rethink the design (find a way to prevent access across the buffer wraparound).


Not related to the speed: you could just use a circular buffer directly mapped to the file, so you don't have to use (proprietary?) tricks to map "consecutive" address regions. The rough idea: a boost::circular_buffer equivalent for files?

Salsala answered 22/7, 2015 at 13:10 Comment(7)
Well, the way I'm writing in 32MB blocks is pretty much the same as sequential; the seek time on the drives is ~50ms, which means there isn't even a 0.01% theoretical overhead.Minesweeper
I can't use boost::circular_buffer for several reasons; among others, I cannot map the entire file in one go. I've tried that, and the machine runs out of memory and crashes.Minesweeper
Why not? Is the buffer not fixed in size? Are you in a 16-bit address space?Salsala
SSDs have way too unreliable write performance, and I'm not sure how "prevent access across the buffer wraparound" is relevant? Whether it wraps around or not, I would still have blocks at the beginning and the end causing the head to jump back and forth.Minesweeper
I'm in a 64-bit address space. I'm not sure why the OS does what it does, but if I map the entire thing, after a little while I get std::bad_alloc. Either way, even if that did work, it would not work for what I'm using it for.Minesweeper
Well, you comprehensively beat me down there. I'm not even going to ask for arguments anymore. I don't know how I can try to help further. Good luck.Salsala
The problem seems to be that once everything has been written to the file the first time around, the OS starts reading back the old pages from the file before overwriting them with new data. I'm not sure whether it is possible to create a "write-only" view where nothing is read back.Minesweeper
