File I/O with streams - best memory buffer size
A

4

60

I am writing a small I/O library to assist with a larger (hobby) project. Part of this library performs various functions on a file, which is read and written via the FileStream object. On each StreamReader.Read(...) pass, I fire off an event which will be used in the main app to display progress information. The processing that goes on in the loop is varied, but is not too time consuming (it could just be a simple file copy, for example, or may involve encryption...).

My main question is: What is the best memory buffer size to use? Thinking about physical disk layouts, I could pick 2 KB, which would cover a CD sector and is a nice multiple of a 512-byte hard disk sector. Higher up the abstraction tree, you could go for a larger buffer which could read an entire FAT cluster at a time. I realise that with today's PCs I could go for a more memory-hungry option (a couple of MiB, for example), but then I increase the time between UI updates and the user perceives a less responsive application.

As an aside, I'm eventually hoping to provide a similar interface to files hosted on FTP / HTTP servers (over a local network / fastish DSL). What would be the best memory buffer size for those (again, a "best-case" tradeoff between perceived responsiveness vs. performance)?

Anytime answered 13/6, 2010 at 20:26 Comment(3)
It may be helpful: #19558935 - Hatti
I'd have thought that the OS or Windows would maintain its own profile of hardware capabilities and speeds and provide a service that recommends the best buffer-size for a given storage volume and activity (e.g. random read/writes vs sequential read/write) - that would take out the guesswork. - Quiteria
Possible duplicate of C# FileStream : Optimal buffer size for writing large files? - Blue
P
90

Files are already buffered by the file system cache. You just need to pick a buffer size that doesn't force FileStream to make the native Windows ReadFile() API call to fill the buffer too often. Don't go below a kilobyte; more than 16 KB is a waste of memory and unfriendly to the CPU's L1 cache (typically 16 or 32 KB of data).

4 KB is a traditional choice, even though that will exactly span a virtual memory page only ever by accident. It is difficult to profile: you'll end up measuring how long it takes to read a cached file, which runs at RAM speeds, 5 gigabytes/sec and up if the data is available in the cache. It will be in the cache the second time you run your test, and that won't happen too often in a production environment. File I/O is completely dominated by the disk drive or the NIC and is glacially slow; copying the data is peanuts by comparison. 4 KB will work fine.
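As a rough illustration of the loop described in the question, here is a minimal sketch of a file copy that uses a 4 KB buffer and reports progress after every chunk. The class name `ChunkedCopy`, the `Copy` method and the `onProgress` callback are hypothetical names made up for this example (the callback stands in for the progress event mentioned in the question); only the `FileStream` calls are real .NET API.

```csharp
using System;
using System.IO;

public static class ChunkedCopy
{
    // 4 KB, the size suggested in this answer.
    public const int ChunkSize = 4096;

    // Copies sourcePath to destinationPath in ChunkSize pieces and passes the
    // running byte total to onProgress after each chunk. Returns the total
    // number of bytes copied. Names here are hypothetical, for illustration.
    public static long Copy(string sourcePath, string destinationPath, Action<long> onProgress)
    {
        byte[] chunk = new byte[ChunkSize];
        long total = 0;

        using (var input = new FileStream(sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read, ChunkSize))
        using (var output = new FileStream(destinationPath, FileMode.Create, FileAccess.Write, FileShare.None, ChunkSize))
        {
            int read;
            // Read returns 0 only at end of stream; it may return fewer
            // bytes than requested, so always use the returned count.
            while ((read = input.Read(chunk, 0, chunk.Length)) > 0)
            {
                output.Write(chunk, 0, read);
                total += read;
                if (onProgress != null)
                {
                    onProgress(total);
                }
            }
        }

        return total;
    }
}
```

With a chunk this small the callback fires very often, so a UI would probably want to throttle how frequently it redraws rather than grow the buffer just to slow the updates down.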

Perfectionist answered 13/6, 2010 at 22:31 Comment(6)
Low buffer sizes like 4-8 KB are also preferable because the CPU cache can hold such amounts. If you go too small you can accumulate significant overhead from kernel transitions though. - Strict
@HansPassant: My application deals with lots of small files together as well as large ones separately. Will a 4 KB size adversely affect performance for files smaller than 4 KB? - Shovelboard
4 KB is the default value used by the .NET Framework: msdn.microsoft.com/en-us/library/dd783870.aspx - Bendigo
If the documentation is correct, in 4.5 they increased the default value to 81920. - Plagiary
The documentation is correct; .NET Reflector shows that _DefaultCopyBufferSize has a value of 0x14000 (81920, or 80K). However, this is for copying from stream to stream, not buffering data. The BufferedStream class has a _DefaultBufferSize of 0x1000 (4096, or 4K); this would be a better class to look at for understanding what buffer size the .NET Framework uses for streams. - Firebrat
I understand that when using FileStream with useAsync: true, for best async performance the buffer size should be at least 1 megabyte in order for the cost of the async overhead to be worthwhile while waiting for overlapped disk I/O to complete. I don't remember where I got this detail from, but would you agree with it? - Quiteria
H
4

When I deal with files directly through a stream object, I typically use 4096 bytes. It seems to be reasonably effective across multiple I/O areas (local file system, LAN/SMB, network stream, etc.), but I haven't profiled it or anything. Way back when, I saw several examples use that size, and it stuck in my memory. That doesn't mean it's the best though.

Heavy answered 13/6, 2010 at 20:32 Comment(1)
Right. I wouldn't ever use anything less than 4k, since it's the smallest block managed by the virtual memory system (on which the disk cache is based). - Mutation
D
3

"It depends".

You would have to test your application with different buffer sizes to determine which is best. You can't guess ahead of time.
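A minimal sketch of such a test, assuming a plain sequential read is representative of the workload. The class `BufferSizeTiming`, its methods and the list of candidate sizes are hypothetical choices for this example. As another answer here points out, a file that is already in the file system cache is read at RAM speed, so repeated runs over the same file mostly measure the cache rather than the disk.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class BufferSizeTiming
{
    // Reads the whole file sequentially with the given buffer size and
    // returns the elapsed wall-clock time. Hypothetical helper for illustration.
    public static TimeSpan TimeRead(string path, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        Stopwatch watch = Stopwatch.StartNew();

        using (var input = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize))
        {
            // Discard the data; only the time spent reading is of interest.
            while (input.Read(buffer, 0, buffer.Length) > 0)
            {
            }
        }

        watch.Stop();
        return watch.Elapsed;
    }

    // Prints one timing per candidate size. The sizes are arbitrary examples.
    public static void Report(string path)
    {
        foreach (int size in new[] { 1024, 4096, 16384, 81920 })
        {
            Console.WriteLine("{0,6} bytes: {1:F1} ms", size, TimeRead(path, size).TotalMilliseconds);
        }
    }
}
```

Calling `BufferSizeTiming.Report(path)` on a large file that has not been read recently gives a first impression; the numbers are only meaningful when compared on the same machine and the same storage.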

Dermato answered 13/6, 2010 at 20:32 Comment(0)
R
0

I suppose that the default value is usually the best, so I use 4096 bytes, based on the internal const int DefaultBufferSize in the FileStream class.

Ronnaronnholm answered 7/8, 2018 at 10:34 Comment(1)
The default is not always the best. It's just a good compromise for the more common cases, not optimal for all loads. - Cosimo
