I have some 2TB read-only files (no writing once created) on a RAID 5 system (4 x 3TB disks @ 7.2k RPM).
Now I have several threads that want to read portions of those files. Every thread has an array of chunks it needs, and every chunk is addressed by file offset (position) and size (mostly around 300 bytes).
What is the fastest way to read this data? I don't care about CPU cycles; (disk) latency is what counts. So, if possible, I want to take advantage of the NCQ of the hard disks.
As the files are highly compressed, are accessed randomly, and I know the exact positions in advance, I have no other way to optimize the access pattern.
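To make the setup concrete, here is a minimal sketch of the kind of read I mean (the `Chunk` and `ChunkReader` names are just placeholders I made up for this post). In Java, `FileChannel` positional reads are thread-safe, so a single open channel can be shared by all reader threads:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical chunk descriptor: absolute file offset plus number of bytes to read.
record Chunk(long offset, int size) {}

class ChunkReader {
    // One shared, long-lived channel; positional reads never move its position,
    // so concurrent callers don't interfere with each other.
    private final FileChannel channel;

    ChunkReader(Path file) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
    }

    // pread-style read of one chunk; assumes the chunk lies fully inside the file.
    byte[] read(Chunk chunk) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(chunk.size());
        int n = channel.read(buf, chunk.offset());
        buf.flip();
        byte[] out = new byte[Math.max(n, 0)];
        buf.get(out);
        return out;
    }
}
```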
- Should I pool the file reads into one thread?
- Should I keep the file open?
- Should every thread (maybe about 30) keep every file open simultaneously? And what about new threads coming in (from the web server)?
- Will it help if I wait 100 ms and sort my reads by file offset (lowest first)? (See the sketch after this list.)
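To illustrate what I mean by pooling and sorting, here is a rough sketch (names like `PooledReader` and `ReadRequest` are made up, and the timings are just the ones from my question): a single reader thread collects requests for roughly 100 ms, sorts them by offset, and then issues the reads one after another while the calling threads block on a future.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical request: one chunk plus a future the requesting thread waits on.
record ReadRequest(long offset, int size, CompletableFuture<byte[]> result) {}

class PooledReader implements Runnable {
    private final FileChannel channel;
    private final BlockingQueue<ReadRequest> queue = new LinkedBlockingQueue<>();

    PooledReader(FileChannel channel) { this.channel = channel; }

    // Called from the worker / web threads; they block on the returned future.
    CompletableFuture<byte[]> submit(long offset, int size) {
        ReadRequest req = new ReadRequest(offset, size, new CompletableFuture<>());
        queue.add(req);
        return req.result();
    }

    @Override
    public void run() {
        List<ReadRequest> batch = new ArrayList<>();
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // Block for the first request, then gather whatever else
                // arrives within the ~100 ms window.
                ReadRequest first = queue.poll(1, TimeUnit.SECONDS);
                if (first == null) continue;
                batch.add(first);
                Thread.sleep(100);
                queue.drainTo(batch);

                // Issue the reads in ascending offset order to reduce seeking.
                batch.sort(Comparator.comparingLong(ReadRequest::offset));
                for (ReadRequest req : batch) {
                    ByteBuffer buf = ByteBuffer.allocate(req.size());
                    channel.read(buf, req.offset());
                    req.result().complete(buf.array());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (IOException e) {
                batch.forEach(r -> r.result().completeExceptionally(e));
            } finally {
                batch.clear();
            }
        }
    }
}
```

I am not sure whether something like this actually beats simply issuing many concurrent positional reads and letting the OS and NCQ reorder them, which is exactly what I am asking.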
What is the best way to read this data? Do you have any experience, tips, or hints?