Fastest way to read many 300 bytes chunks randomly by file offset from a 2TB file?

I have some 2TB read only (no writing once created) files on a RAID 5 (4 x 7.2k @ 3TB) system.

Now I have some threads that want to read portions of that file. Every thread has an array of chunks it needs. Every chunk is addressed by a file offset (position) and a size (mostly about 300 bytes) to read.

What is the fastest way to read this data? I don't care about CPU cycles; (disk) latency is what counts. So if possible I want to take advantage of the hard disks' NCQ.

As the files are highly compressed, will be accessed randomly, and I know exactly the positions, I have no other way to optimize it.

  • Should I pool the file reading to one thread?
  • Should I keep the file open?
  • Should every thread (maybe about 30) keep every file open simultaneously? What about new threads that come in (from the web server)?
  • Will it help if I wait 100ms and sort my readings by file offsets (lowest first)?

What is the best way to read the data? Do you have experiences, tips, hints?

Aorangi answered 17/1, 2012 at 16:4 Comment(0)
M
4

The optimum number of parallel requests depends highly on factors outside your app (e.g. disk count = 4, NCQ depth = ?, driver queue depth = ? ...), so you might want to use a system that can adapt or be adapted. My recommendation is:

  • Write all your read requests into a queue, together with some metadata that allows notifying the requesting thread
  • Have N threads dequeue from that queue, synchronously read the chunk, and notify the requesting thread
  • Make N runtime-changeable
  • Since CPU is not your concern, your worker threads can calculate a floating latency average (and/or maximum, depending on your needs)
  • Slide N up and down until you hit the sweet spot
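The queue-plus-N-sync-readers pattern above can be sketched roughly as follows. This is an illustrative Python sketch, not the answerer's code: the original question is .NET, where a `FileStream` opened with `FileOptions.RandomAccess` would play the role that `os.pread` (POSIX-only) plays here, and the class and parameter names are assumptions.

```python
import os
import queue
import threading
from concurrent.futures import Future

class ChunkReader:
    """Hypothetical worker pool: N threads service a shared read queue."""

    def __init__(self, path, num_workers=4):
        self._path = path
        self._queue = queue.Queue()
        # N worker threads; in a real system N would be runtime-changeable,
        # tuned against a floating latency average as the answer suggests.
        for _ in range(num_workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def read_chunk(self, offset, size):
        """Enqueue a read request; the Future notifies the requesting thread."""
        fut = Future()
        self._queue.put((offset, size, fut))
        return fut

    def _worker(self):
        # Each worker keeps its own file descriptor open for its lifetime.
        fd = os.open(self._path, os.O_RDONLY)
        try:
            while True:
                offset, size, fut = self._queue.get()
                # Synchronous positioned read: lowest per-request latency;
                # the overall queue depth seen by the disk is controlled
                # simply by the number of worker threads.
                fut.set_result(os.pread(fd, size, offset))
        finally:
            os.close(fd)
```

A requesting thread would call `read_chunk(offset, 300)` and block on `.result()` only when it actually needs the bytes.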

Why sync reads? They have lower latency than async reads. Why waste latency on a queue? A good lockless queue implementation starts at less than 10 ns latency, much less than two thread switches.

Update: Some Q/A

Should the read threads keep the files open? Yes, definitely so.

Would you use a FileStream with FileOptions.RandomAccess? Yes.

You write "synchronously read the chunk". Does this mean every single read thread should start reading a chunk from disk as soon as it dequeues an order to read a chunk? Yes, that's what I meant. The queue depth of read requests is managed by the thread count.

Maurizio answered 17/1, 2012 at 16:26 Comment(6)
Should the read threads keep the files open? I think yes. Would you use a FileStream with FileOptions.RandomAccess? You write "synchronously read the chunk". Does this mean every single read thread should start reading a chunk from disk as soon as it dequeues an order to read a chunk?Aorangi
Edited my answer, trying to answer your comment.Maurizio
I forgot, you wrote: "good lockless queue implementation starts at less than 10ns latency". Do you have a special class/project in mind?Aorangi
I do: I wrote one, because the usual suspects out there didn't scale (not even on a simple 8-core system). Guess it's time to open-source it. Would the LGPL be acceptable to you? It means that with the LocklessQueue in a separate DLL you can use it in commercial projects.Maurizio
Cool, thank you. Yes, LGPL is OK. I will mess around with it a bit and read your comments in the source code. But a very small code sample showing how to use it would be very good for your project. OK, by reading the source I think I can see it on my own.Aorangi
Look at LocklessBase.cs - there you see the interface with the methods, basically just Enqueue() and two flavours of Dequeue(); the boolean one returns true/false as success (queue was not empty), the direct one returns default(T) on an empty queue.Maurizio

Disks are "single threaded" because there is only one head. They won't go faster no matter how many threads you use... in fact, more threads will probably just slow things down. Just get yourself the list and arrange (sort) it in the app.

You can of course use many threads, which would probably make use of NCQ more efficiently, but arranging the reads in the app and using one thread should work better.

If the file is fragmented, use NCQ and a couple of threads, because then you can't know the exact position on disk, so only NCQ can optimize the reads. If it's contiguous, use sorting.
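The sorting idea (also raised in the question: collect requests for a moment, then read in ascending offset order so the head sweeps in one direction) could look roughly like this. Illustrative Python sketch with assumed names; `os.pread` is POSIX-only and stands in for whatever positioned-read call the platform offers.

```python
import os

def read_sorted(fd, requests):
    """Read (offset, size) chunks in ascending offset order, so the disk
    head moves in one direction instead of seeking back and forth."""
    results = {}
    for offset, size in sorted(requests):
        # Positioned read; does not move the file object's own cursor.
        results[(offset, size)] = os.pread(fd, size, offset)
    return results
```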

You may also try direct I/O to bypass OS caching and read the whole file sequentially... it can sometimes be faster, especially if you have no other load on this array.

Theomorphic answered 17/1, 2012 at 16:12 Comment(3)
-1 partially. If you have multiple threads, multiple requests hit the discs. Good discs (SAS, SATA) allow the disc to reorder them to be more efficient ("Native Command Queueing") and deliver results in another order. This gives you a significant boost compared to normal sync single-threaded IO.Lisp
Additionally, I have RAID 5, so every disk could read from another position.Aorangi
Additionally, the requested chunks are quite small, so with RAID 5 and most controllers reading a complete stripe at once, chances are another chunk is already in RAM.Maurizio

Will ReadFileScatter do what you want?

Ankerite answered 17/1, 2012 at 16:15 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.