TL;DR
How much memory does keeping a file open take up on a modern Windows system? Some application workloads will need to open "a lot" of files. Windows is very capable of opening "a lot" of files, but what is the cost of keeping a single file open, so that one can decide when "a lot" becomes "too much"?
Background
For sequential processing of large-ish datasets (100s of MB to a few GB) inside a 32-bit process, we need to come up with a buffer that stores its contents on disk instead of in memory.
We have fleshed out a little class without too much problem (using CreateFile with FILE_ATTRIBUTE_TEMPORARY and FILE_FLAG_DELETE_ON_CLOSE).
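For reference, here is a minimal sketch (untested, and the helper name OpenTempBufferFile is made up for illustration) of how such a buffer might acquire its backing file: a uniquely named temp file opened with FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE, so Windows prefers keeping it in the cache and removes it automatically when the last handle is closed.

    #include <windows.h>

    // Create a uniquely named temporary file that Windows deletes automatically
    // when the last handle to it is closed. Returns INVALID_HANDLE_VALUE on failure.
    HANDLE OpenTempBufferFile()
    {
        wchar_t dir[MAX_PATH];
        wchar_t path[MAX_PATH];
        if (!GetTempPathW(MAX_PATH, dir) ||
            !GetTempFileNameW(dir, L"buf", 0, path))    // also creates the (empty) file
            return INVALID_HANDLE_VALUE;

        return CreateFileW(
            path,
            GENERIC_READ | GENERIC_WRITE,
            0,                                          // no sharing
            nullptr,
            CREATE_ALWAYS,                              // reopen/truncate the file just created
            FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE,
            nullptr);
    }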
The problem is, the way these buffers will be used is such that each buffer (each temporary file) can potentially store from a few bytes up to a few GB of data, and we would like to keep the buffer class itself as minimal and as general as possible.
The use case ranges from 100 buffers with ~100 MB each to 100,000s of buffers with just a few bytes each. (And yes, it is important that each buffer in this sense has its own file.)
It would seem natural to include a threshold in the buffer class so that it only starts creating and using a temporary on-disk file once it is actually storing more bytes than the memory overhead of creating and referencing that file costs, both in the process and in physical machine memory.
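To make the threshold idea concrete, here is a minimal sketch under assumed names (SpillBuffer is hypothetical; OpenTempBufferFile is from the sketch above). The right value for spillThreshold is exactly what this question asks about.

    #include <windows.h>
    #include <cstddef>
    #include <vector>

    HANDLE OpenTempBufferFile();   // from the sketch above

    // Keeps bytes in memory until they exceed the spill threshold, then moves
    // everything into a delete-on-close temporary file.
    class SpillBuffer {
    public:
        explicit SpillBuffer(size_t spillThreshold) : threshold_(spillThreshold) {}
        ~SpillBuffer() { if (file_ != INVALID_HANDLE_VALUE) CloseHandle(file_); }

        void Append(const char* data, size_t len)
        {
            if (file_ == INVALID_HANDLE_VALUE && mem_.size() + len <= threshold_) {
                mem_.insert(mem_.end(), data, data + len);  // small enough: stay in memory
                return;
            }
            if (file_ == INVALID_HANDLE_VALUE) {
                file_ = OpenTempBufferFile();               // first spill: create the temp file
                if (!mem_.empty())
                    Write(mem_.data(), mem_.size());        // move what was buffered so far
                mem_.clear();
                mem_.shrink_to_fit();
            }
            Write(data, len);
        }

    private:
        void Write(const char* data, size_t len)
        {
            DWORD written = 0;
            WriteFile(file_, data, static_cast<DWORD>(len), &written, nullptr);
        }

        size_t threshold_;
        std::vector<char> mem_;
        HANDLE file_ = INVALID_HANDLE_VALUE;
    };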
Question
How much memory, in bytes, does opening a (temporary) file take up on a modern Windows system?
- Using CreateFile with FILE_ATTRIBUTE_TEMPORARY and FILE_FLAG_DELETE_ON_CLOSE
- Bytes of virtual address space in the (32-bit) process opening the file
- Bytes of physical memory on the machine (including any kernel data structures)
That is, what is the threshold, in bytes, at which you start seeing a net main-memory gain (both in-process and physically) from storing data in a file instead of in memory?
Notes:
The open-file limit mentioned in a comment is not applicable to CreateFile, only to the MS CRT file API. (Opening 10,000s of files via CreateFile is no problem at all on my system -- whether it's a good idea is an entirely different matter and not part of this question.)
Memory-mapped files: totally unsuitable for processing GBs of data in a 32-bit process, because you cannot reliably map such large datasets into the normal 2 GB address range of a 32-bit process. They are useless for my problem and do not relate to the actual question in any way; plain files are just fine for the background problem.
Looked at http://blogs.technet.com/b/markrussinovich/archive/2009/09/29/3283844.aspx - which tells me that a HANDLE itself takes up 16 bytes on a 64-bit system, but that's just the handle.
Looked at STXXL and its docs, but this library is not appropriate for my task, nor did I find any mention of a useful threshold before actually starting to use files.
Useful comments summary:
Raymond writes: "The answer will vary depending on what antivirus software is installed, so the only way to know is to test it on the production configuration."
qwm writes: "I would care more about cpu overhead. Anyway, the best way to answer your question is to test it. All I can say is that size of _FILE_OBJECT alone (including _OBJECT_HEADER) is ~300b, and some of its fields are pointers to other related structures."
Damon writes: "One correct answer is: 10 bytes (on my Windows 7 machine). Since nobody else seemed it worthwhile to actually try, I did (measured difference in MEMORYSTATUSEX::ullAvailVirtual over 100k calls, nothing else running). Don't ask me why it isn't 8 or 16 bytes, I wouldn't know. Took around 17 seconds of kernel time, process had 100,030 handles open upon exiting. Private working set goes up by 412k during run whereas global available VM goes down by 1M, so roughly 60% of the memory overhead is inside the kernel. (...)"
"What's more stunning is the huge amount of kernel time (which is busy CPU time, not something like waiting on disk!) that CreateFile
obviously consumes. 17 seconds for 100k calls boils down to around 450,000 cycles for opening one handle on this machine. Compared to that, the mere 10 bytes of virtual memory going away are kind of negligible."
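For completeness, a rough reconstruction of that kind of measurement (my own sketch, not Damon's actual code): snapshot MEMORYSTATUSEX::ullAvailVirtual, open 100k temporary delete-on-close files, snapshot again, and divide the difference by the number of files actually opened.

    #include <windows.h>
    #include <cstdio>
    #include <vector>

    int main()
    {
        const int kCount = 100000;
        std::vector<HANDLE> handles;
        handles.reserve(kCount);

        MEMORYSTATUSEX before = { sizeof(before) };
        GlobalMemoryStatusEx(&before);

        wchar_t dir[MAX_PATH], path[MAX_PATH];
        GetTempPathW(MAX_PATH, dir);
        for (int i = 0; i < kCount; ++i) {
            GetTempFileNameW(dir, L"m", 0, path);
            HANDLE h = CreateFileW(path, GENERIC_READ | GENERIC_WRITE, 0, nullptr,
                                   CREATE_ALWAYS,
                                   FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE,
                                   nullptr);
            if (h != INVALID_HANDLE_VALUE)
                handles.push_back(h);
        }

        MEMORYSTATUSEX after = { sizeof(after) };
        GlobalMemoryStatusEx(&after);

        // Assumes available virtual memory only shrank during the run.
        if (!handles.empty())
            printf("avg virtual bytes per open file: %llu\n",
                   (before.ullAvailVirtual - after.ullAvailVirtual) / handles.size());

        for (HANDLE h : handles)
            CloseHandle(h);
        return 0;
    }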