I have large files, each containing a small number of large datasets. Each dataset contains its name and its size in bytes, which allows skipping it and moving on to the next dataset.
I want to build an index of the dataset names as quickly as possible. An example file is about 21 MB and contains 88 datasets. Reading the 88 names with a std::ifstream, using seekg() to skip between datasets, takes about 1300 ms, which I would like to reduce.
So in effect, I'm reading 88 chunks of about 30 bytes each, at known positions in a 21 MB file, and it takes 1300 ms.
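For reference, here's a minimal sketch of the indexing loop (the header layout below, a name-length field, the name, then a 64-bit payload size, is made up for illustration; the real format differs, but the read-then-seekg() pattern is the same):

    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    int main()
    {
        std::ifstream file("example.dat", std::ios::binary);
        std::vector<std::string> names;

        while (file)
        {
            // Read the name length (illustrative field, not the real format).
            std::uint32_t nameLen = 0;
            if (!file.read(reinterpret_cast<char*>(&nameLen), sizeof nameLen))
                break; // end of file

            // Read the dataset name.
            std::string name(nameLen, '\0');
            file.read(&name[0], nameLen);

            // Read the payload size (illustrative field).
            std::uint64_t payloadSize = 0;
            file.read(reinterpret_cast<char*>(&payloadSize), sizeof payloadSize);

            names.push_back(name);

            // Skip the dataset payload to land on the next header.
            file.seekg(static_cast<std::streamoff>(payloadSize), std::ios::cur);
        }

        std::cout << "indexed " << names.size() << " datasets\n";
    }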
Is there a way to improve this, or is it an OS and filesystem limitation? I'm running the test under Windows 7 64-bit.
I know that having a complete index at the beginning of the file would be better, but the file format doesn't have one, and we can't change the format.