Update: answer COMPLETELY rewritten. The original answer contained methods to find the largest possible addressable array on any system by divide and conquer; see the history of this answer if you're interested. The new answer attempts to explain the 56-byte gap.
In his own answer, AZ explained that the maximum array size is limited to less than the 2GB cap, and with some trial and error (or another method?) found the following (summary):
- If the size of the type is 1, 2, 4 or 8 bytes, the maximum occupiable size is 2GB - 56 bytes;
- If the size of the type is 16 bytes, the max is 2GB - 48 bytes;
- If the size of the type is 32 bytes, the max is 2GB - 32 bytes.
I'm not entirely sure about the 16-byte and 32-byte situations. The total available size for the array might differ depending on whether it's an array of structs or of a built-in type. I'll focus on type sizes of 1-8 bytes (of which I'm not that sure either; see the conclusion).
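For reference, this kind of trial-and-error probe can be automated. Below is a minimal sketch (my own reconstruction, not AZ's code) that binary-searches for the largest byte[] the CLR will hand out:

using System;

class MaxArrayProbe
{
    // True if the CLR will hand out a byte array of this length.
    static bool CanAllocate(int length)
    {
        try
        {
            var probe = new byte[length];
            probe[length - 1] = 1;   // touch the array so it cannot be elided
            return true;
        }
        catch (OutOfMemoryException)
        {
            return false;
        }
    }

    static void Main()
    {
        // Binary search: on 64-bit this should converge on the CLR's hard cap;
        // on 32-bit, address space fragmentation makes results unstable.
        int low = 0, high = int.MaxValue;
        while (low < high)
        {
            int mid = low + (high - low + 1) / 2;
            if (CanAllocate(mid)) low = mid; else high = mid - 1;
        }
        Console.WriteLine("Largest byte[]: {0} elements", low);
        Console.WriteLine("Gap below int.MaxValue: {0} bytes", int.MaxValue - low);
    }
}

If AZ's numbers are right, the probe should land 56 bytes short for byte elements, matching the summary above.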
Data layout of an array
To understand why the CLR does not allow exactly 2GB / IntPtr.Size elements, we need to know how an array is structured. A good starting point is this SO article, but unfortunately some of the information seems false, or at least incomplete. This in-depth article on how the .NET CLR creates runtime objects proved invaluable, as did this Arrays Undocumented article on CodeProject.
Taking all the information in these articles together, it comes down to the following layout for an array on 32-bit systems:
Single dimension, built-in type
SSSSTTTTLLLL[...data...]0000
^ sync block
    ^ type handle
        ^ length array
                        ^ NULL
Each part is one system DWORD in size. On 64-bit Windows, this looks as follows:
Single dimension, built-in type
SSSSSSSSTTTTTTTTLLLLLLLL[...data...]00000000
^ sync block
        ^ type handle
                ^ length array
                                    ^ NULL
The layout looks slightly different when it's an array of objects (i.e., strings, class instances). As you can see, a type handle for the element type of the array is added.
Single dimension, object type
SSSSSSSSTTTTTTTTLLLLLLLLtttttttt[...data...]00000000
^ sync block
        ^ type handle
                ^ length array
                        ^ type handle array element type
                                            ^ NULL
Looking further, we find that a built-in type, or actually, any struct type, gets its own specific type handle (all uint arrays share the same one, but an int array has a different type handle than a uint or byte array). All arrays of objects share the same type handle, but have an extra field that points to the type handle of the element type.
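A small managed-level illustration (it doesn't prove the internal layout, but it is consistent with it): every array type exposes its own RuntimeTypeHandle, so int[], uint[] and byte[] report three different values:

using System;

class TypeHandleDemo
{
    static void Main()
    {
        // Three distinct handle values, even though int and uint
        // elements have identical sizes.
        Console.WriteLine("int[]  handle: 0x{0:X}", typeof(int[]).TypeHandle.Value.ToInt64());
        Console.WriteLine("uint[] handle: 0x{0:X}", typeof(uint[]).TypeHandle.Value.ToInt64());
        Console.WriteLine("byte[] handle: 0x{0:X}", typeof(byte[]).TypeHandle.Value.ToInt64());
    }
}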
A note on struct types: padding may not always be applied, which can make it hard to predict the actual size of a struct.
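To illustrate, here is a minimal sketch of how layout attributes change a struct's size; note that Marshal.SizeOf reports the marshaled (unmanaged) size, which only approximates the managed in-memory layout:

using System;
using System.Runtime.InteropServices;

// Default (sequential) layout: the byte is padded so the long stays 8-byte aligned.
struct Padded { public byte B; public long L; }

// Pack = 1 suppresses that padding.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Packed { public byte B; public long L; }

class PaddingDemo
{
    static void Main()
    {
        Console.WriteLine(Marshal.SizeOf(typeof(Padded))); // typically 16
        Console.WriteLine(Marshal.SizeOf(typeof(Packed))); // 9
    }
}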
Still not 56 bytes...
To get to the 56 bytes of AZ's answer, I have to make a few assumptions. I assume that:
- the syncblock and type handle count towards the size of an object;
- the variable holding the array reference (object pointer) counts towards the size of an object;
- the array's null terminator counts towards the size of an object.
A syncblock is placed before the address the variable points at, which makes it look like it's not part of the object. But in fact, I believe it is, and it counts towards the internal 2GB limit. Adding all these up, we get, for 64-bit systems:
ObjectRef    +
Syncblock    +
Typehandle   +
Length       +
Null pointer
--------------
40 (5 * 8 bytes)
Not 56 yet. Perhaps someone can take a look with Memory View while debugging to check what the layout of an array looks like under 64-bit Windows.
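Short of a debugger, a crude alternative is to measure the GC heap before and after allocating many minimal arrays. This only yields an averaged footprint (allocation granularity and GC bookkeeping blur the exact numbers), but it gives a ballpark for the per-array overhead:

using System;

class OverheadProbe
{
    static void Main()
    {
        const int Count = 100000;
        var keep = new object[Count];   // allocated before the baseline

        long before = GC.GetTotalMemory(true);
        for (int i = 0; i < Count; i++)
            keep[i] = new byte[1];      // smallest possible array
        long after = GC.GetTotalMemory(true);

        // Expect a value in the tens of bytes: syncblock + type handle +
        // length + 1 data byte, rounded up to the allocation granularity.
        Console.WriteLine("~{0} bytes per array", (after - before) / (double)Count);
        GC.KeepAlive(keep);
    }
}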
My guess is something along these lines (take your pick, mix and match):
- 2GB will never be possible, as that is one byte into the next segment. The largest block should therefore be 2GB - sizeof(int). But this is silly, as memory indexes should start at zero, not one;
- Any object larger than 85,000 bytes will be put on the LOH (large object heap). This may include an extra pointer, or even a 16-byte struct holding LOH information. Perhaps this counts towards the limit;
- Aligning: assuming the objectref does not count (it is in another mem segment anyway), the total gap is 32 bytes. It's quite possible that the system prefers 32-byte boundaries. Take a new look at the memory layout: if the starting point needs to be on a 32-byte boundary, and it needs room for the syncblock before it, the syncblock will end up at the end of the first 32-byte block. Something like this:
XXXXXXXXXXXXXXXXXXXXXXXXSSSSSSSSTTTTTTTTLLLLLLLLtttttttt[...data...]00000000
where XXX.. stands for skipped bytes;
- Multi-dimensional arrays: if you create your arrays dynamically with Array.CreateInstance with 1 or more dimensions, a single-dim array will be created with two extra DWORDs containing the size and the lower bound of the dimension (even if you have only one dimension, but only if the lower bound is specified as non-zero; see the sketch after this list). I find this highly unlikely, as you would probably have mentioned this if it were the case in your code. But it would bring the total to 56 bytes of overhead ;).
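For what it's worth, you can see the distinction the last point describes from managed code: with a non-zero lower bound, Array.CreateInstance hands back the non-vector array variant:

using System;

class LowerBoundDemo
{
    static void Main()
    {
        // Lower bound 1: the CLR creates the "non-SZ" array variant,
        // which carries the extra bounds bookkeeping described above.
        Array odd = Array.CreateInstance(typeof(int), new[] { 10 }, new[] { 1 });
        Console.WriteLine(odd.GetType());        // System.Int32[*]
        Console.WriteLine(odd.GetLowerBound(0)); // 1

        // Lower bound 0: typically a plain zero-based vector instead.
        Array plain = Array.CreateInstance(typeof(int), new[] { 10 }, new[] { 0 });
        Console.WriteLine(plain.GetType());      // System.Int32[]
    }
}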
Conclusion
From all I gathered during this little research, I think that Overhead + Aligning - ObjectRef is the most likely and most fitting conclusion. However, a "real" CLR guru might be able to shed some extra light on this peculiar subject.
None of these conclusions explains why 16-byte and 32-byte datatypes have a 48-byte and 32-byte gap, respectively.
Thanks for a challenging subject; I learned something along the way. Perhaps some people can take the downvote off when they find this new answer more relevant to the question (which I originally misunderstood; apologies for the clutter this may have caused).