Optimum file buffer read size?

I am writing an application which needs to read fairly large files. I have always wondered what the optimum size is for the read buffer on a modern Windows XP computer. I googled and found many examples that used 1024 as the optimum size.

Here is a snippet of what I mean:

long pointer = 0;
byte[] buffer = new byte[1024]; // What's a good size here?
while (pointer < input.Length)  // input is the stream being read
{
    pointer += input.Read(buffer, 0, buffer.Length);
}

My application is fairly simple, so I am not looking to write any benchmarking code, but I would like to know what sizes are common.

Harsho answered 11/10, 2009 at 23:44 Comment(1)
It may be helpful: #19558935 – Hush

A 1K buffer size seems a bit small. Generally, there is no "one size fits all" buffer size; you need to pick a buffer size that fits the behavior of your algorithm. A really huge buffer is generally not a good idea, but having one that is too small, or out of line with how you process each chunk, is not great either.

If you are simply reading data one chunk after another entirely into memory before processing it, I would use a larger buffer. I would probably use 8k or 16k, but probably not larger.
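For example, here is a minimal sketch of that pattern (the file name and the 16K size are just illustrative placeholders, not a recommendation):

using System;
using System.IO;

class ReadWholeFile
{
    // Reads the entire file into memory, one 16K chunk at a time.
    static byte[] ReadAll(string path)
    {
        using (FileStream input = File.OpenRead(path))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[16 * 1024]; // larger buffer for bulk reads
            int bytesRead;
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, bytesRead);
            }
            return output.ToArray();
        }
    }

    static void Main()
    {
        byte[] data = ReadAll("data.bin"); // placeholder file name
        Console.WriteLine(data.Length);
    }
}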

On the other hand, if you are processing the data in streaming fashion, reading a chunk and then processing it before reading the next, smaller buffers might be more useful. Even better, if you are streaming data that has structure, I would change the amount of data read to specifically match the type of data you are reading. For example, if you are reading binary data that contains a 4-character code, a float, and a string, I would read the 4-character code into a 4-byte array, and likewise the float. I would read the length of the string, then read the whole chunk of string data at once into a buffer of that size.

If you are doing streaming data processing, I would look into the BinaryReader and BinaryWriter classes. These allow you to work with binary data very easily, without having to worry much about the raw bytes themselves. They also let you decouple your buffer size from the actual data you are working with: you could set a 16K buffer on the underlying stream and read individual data values with the BinaryReader with ease.
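A rough sketch of that approach (the record layout here, a 4-character ASCII code, a float, then an Int32 length followed by the string bytes, is only a hypothetical example, and the file name is a placeholder):

using System;
using System.IO;
using System.Text;

class StreamingReadExample
{
    static void Main()
    {
        // 16K buffer on the underlying stream; BinaryReader pulls individual values off it.
        using (FileStream fs = new FileStream("records.bin", FileMode.Open,
                                              FileAccess.Read, FileShare.Read, 16 * 1024))
        using (BinaryReader reader = new BinaryReader(fs))
        {
            string code = Encoding.ASCII.GetString(reader.ReadBytes(4)); // 4-character code
            float value = reader.ReadSingle();                           // 4-byte float
            int length = reader.ReadInt32();                             // explicit string length
            string text = Encoding.ASCII.GetString(reader.ReadBytes(length));

            Console.WriteLine("{0} {1} {2}", code, value, text);
        }
    }
}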

Tripody answered 12/10, 2009 at 0:3 Comment(2)
Thanks for the suggestion of using a BinaryReader. Using the BinaryReader helps when reading strings, since I don't need to write plumbing code for the string length. I will test out 8K and 16K reads to see if performance improves. Personally, I don't care what the size is, but some of the QA guys want to see if we can improve performance by utilizing the hardware and operating system better. – Harsho
You might try a larger buffer if you are simply streaming a large amount of data into memory. As long as you keep the buffer size a multiple of the disk cluster size, you should be close to optimal. To be honest, I think a lot of my old late-'90s and early-2000s practices are still deeply ingrained. If the systems you are running this program on are modern and high performance, buffers of 32K, 64K, or even larger could be helpful. If you go too large (say 1 MB), you might see diminishing returns as other factors kick in (e.g. swap thrashing). The key is matching reads to the low-level behavior. – Tripody

It depends on where you draw the line between access time and memory usage. The larger the buffer, the faster the read, but the more expensive in terms of memory. Reading in multiples of your file system's cluster size is probably the most efficient; on a Windows XP system using NTFS, 4K is the default cluster size.

See this link: Default cluster size for NTFS, FAT, and exFAT
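A rough sketch of what that looks like in practice (the 64K size is just one arbitrary multiple of the 4K cluster size, and the file name is a placeholder):

using System;
using System.IO;

class ClusterAlignedRead
{
    static void Main()
    {
        const int BufferSize = 16 * 4096; // 64K, a multiple of the default 4K NTFS cluster size

        using (FileStream input = new FileStream("large-file.dat", FileMode.Open,
                                                 FileAccess.Read, FileShare.Read, BufferSize))
        {
            byte[] buffer = new byte[BufferSize];
            long total = 0;
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                total += read; // process each chunk here
            }
            Console.WriteLine("Read {0} bytes", total);
        }
    }
}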

Bye.

Limpkin answered 11/10, 2009 at 23:58 Comment(3)
I will try 8K and 16K reads, as suggested by @jrista. It's interesting that the article says Windows uses 8K clusters for 16 TB disk partitions. I haven't seen a partition that large before. – Harsho
Andrew, 8K and 16K are multiples of 4K. – Limpkin
Older hard drives read and write entire 512-byte sectors at a time. Modern hard drives read and write entire 4096-byte sectors at a time. Windows NTFS has a default cluster size of 4096 bytes. Using Event Tracing for Windows you can see that Windows most commonly does actual hard-drive I/O in 16,384-byte chunks, along with 4,096 bytes (and to a lesser degree 8,192 and 49,152 bytes). Ideally keep to a multiple of 4K, or use 16,384 bytes. – Swagman
