Difference between Marshal.SizeOf and sizeof, I just don't get it

Until now I have just taken for granted that Marshal.SizeOf is the right way to compute the memory size of a blittable struct on the unmanaged heap (which seems to be the consensus here on SO and almost everywhere else on the web).

But after having read some cautions against Marshal.SizeOf (this article after "But there's a problem...") I tried it out and now I am completely confused:

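// Requires compiling with unsafe code enabled (/unsafe, AllowUnsafeBlocks).
using System;
using System.Runtime.InteropServices;

public struct TestStruct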
{
    public char x;
    public char y;
}

class Program
{
    public static unsafe void Main(string[] args)
    {
        TestStruct s;
        s.x = (char)0xABCD;
        s.y = (char)0x1234;

        // this results in size 4 (two Unicode characters)
        Console.WriteLine(sizeof(TestStruct));

        TestStruct* ps = &s;

        // shows how the struct is seen from the managed side... okay!      
        Console.WriteLine((int)s.x);
        Console.WriteLine((int)s.y);

        // shows the same as before (meaning that -> is based on 
        // the same memory layout as in the managed case?)... okay!
        Console.WriteLine((int)ps->x);
        Console.WriteLine((int)ps->y);

        // let's try the same on the unmanaged heap
        int marshalSize = Marshal.SizeOf(typeof(TestStruct));
        // this results in size 2 (two single byte characters)
        Console.WriteLine(marshalSize);

        TestStruct* ps2 = (TestStruct*)Marshal.AllocHGlobal(marshalSize);

        // hmmm, putting two 16-bit numbers into only 2 allocated
        // bytes, this must surely fail...
        ps2->x = (char)0xABCD;
        ps2->y = (char)0x1234;

        // huh??? same result as before, storing two 16-bit values in
        // only two bytes??? next will be a perpetuum mobile...
        // at least I'd expect an access violation
        Console.WriteLine((int)ps2->x);
        Console.WriteLine((int)ps2->y);

        Console.Write("Press any key to continue . . . ");
        Console.ReadKey(true);
    }
}

What's going wrong here? What memory layout does the field dereferencing operator '->' assume? Is '->' even the right operator for addressing unmanaged structs? Or is Marshal.SizeOf the wrong size operator for unmanaged structs?

I have found nothing that explains this in a language I understand, apart from wishy-washy statements such as "...struct layout is undiscoverable..." and "...in most cases...".

Foretop answered 23/3, 2018 at 23:58 Comment(4)
> "Marshal.SizeOf is the right way to compute the memory size of a blittable struct on the unmanaged heap" - I'd default to Unsafe.SizeOf<T>() if the size was measured in .NET terms; sorry if that throws a spanner at you - Tieback
I think this article explains your concern: codeproject.com/Articles/97711/sizeof-vs-Marshal-SizeOf - Pettigrew
The struct does not have a [StructLayout] attribute that specifies the CharSet property. The default is CharSet.Ansi, a crime that native C and C++ code often commits. Using 1 byte to store a character was feasible before the non-English world started using personal computers; it dragged on for a while with character sets. The takeaway is that Marshal.SizeOf tells you what happens when you interop with native code; sizeof doesn't. And your code corrupts memory, the kind of bug that is not guaranteed to crash your program instantly. That is what unsafe means. - Pacheco
Anyhoo, apply [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)] and now the struct declaration gets a very desirable property: it becomes "blittable". The unmanaged layout is identical to the managed layout, and that makes pinvoke very efficient. And Marshal.SizeOf() does what you expected it to do. Another place where this comes up is in code that tries to make the System.IO.MemoryMappedFiles namespace efficient. - Pacheco
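The attribute suggested in the comment above can be checked with a small sketch. UnicodeTestStruct and UnicodeLayoutDemo are hypothetical names (the fields are the question's), and the project is assumed to allow unsafe code, since sizeof on a user-defined struct needs an unsafe context.

```csharp
using System;
using System.Runtime.InteropServices;

// Same fields as the question's TestStruct, but with layout and character set
// spelled out, so each char is marshaled as a 2-byte UTF-16 code unit.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct UnicodeTestStruct
{
    public char x;
    public char y;
}

public static class UnicodeLayoutDemo
{
    // Managed size, as the C# sizeof operator sees it (needs an unsafe context).
    public static unsafe int ManagedSize() => sizeof(UnicodeTestStruct);

    // Unmanaged size, as the interop marshaler sees it.
    public static int MarshaledSize() => Marshal.SizeOf<UnicodeTestStruct>();

    public static void Main()
    {
        // With CharSet.Unicode both views agree: 2 chars x 2 bytes.
        Console.WriteLine($"sizeof: {ManagedSize()}, Marshal.SizeOf: {MarshaledSize()}");
    }
}
```

Both calls should report 4 here, which is the "does what you expected" behavior the comment describes.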

I think the one question you still don't have answered is what's going on in your particular situation:

&ps2->x
0x02ca4370  <------
    *&ps2->x: 0xabcd 'ꯍ'
&ps2->y
0x02ca4372  <-------
    *&ps2->y: 0x1234 'ሴ'

You are writing to and reading from (possibly) unallocated memory. Because that memory area is still mapped and writable, the overrun is not detected.

This will reproduce the expected behavior (at least on my system, YMMV):

  TestStruct* ps2 = (TestStruct*)Marshal.AllocHGlobal(marshalSize*10000);

  // each element gets only 2 allocated bytes, but we store two
  // 16-bit values (4 bytes) into it, so this must surely fail...
  for (int i = 0; i < 10000; i++)
  {
    ps2->x = (char)0xABCD;
    ps2->y = (char)0x1234;
    ps2++;
  }
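For contrast, here is a minimal sketch of the non-overrunning version, assuming the question's TestStruct and a project compiled with unsafe code enabled: the allocation is sized with sizeof(TestStruct), which describes the same managed layout that -> uses, and the block is freed afterwards. SafeAllocDemo and RoundTrip are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

public struct TestStruct
{
    public char x;
    public char y;
}

public static class SafeAllocDemo
{
    // Allocates exactly one TestStruct on the unmanaged heap, writes both
    // fields through the pointer and reads them back.
    public static unsafe (int X, int Y) RoundTrip()
    {
        // 4 bytes (the managed size), not the 2 that Marshal.SizeOf reports.
        IntPtr buffer = Marshal.AllocHGlobal(sizeof(TestStruct));
        try
        {
            TestStruct* p = (TestStruct*)buffer.ToPointer();
            p->x = (char)0xABCD;
            p->y = (char)0x1234;
            return ((int)p->x, (int)p->y);
        }
        finally
        {
            // Unmanaged memory is never garbage-collected; release it explicitly.
            Marshal.FreeHGlobal(buffer);
        }
    }

    public static void Main()
    {
        var (x, y) = RoundTrip();
        Console.WriteLine($"{x:X4} {y:X4}"); // ABCD 1234
    }
}
```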
Ingrain answered 24/3, 2018 at 0:21 Comment(6)
Exactly, every time I exceeded the boundaries of allocated memory in the past, I got an access violation. But then again, there might have been cases where I didn't even notice... ;-) So it's all the more understandable that I want to clarify what's going on in the unsafe world as much as possible. - Foretop
@Foretop -- see my edit. This 1) prevents a round-up for memory alignment reasons on the part of AllocHGlobal from mattering, and 2) does more to ensure your access violation will be detected. - Ingrain
Funny, this throws either a StackOverflowException or an OutOfMemoryException, depending on whether I start the app from the IDE or from Explorer. I take this as a sign of something serious going on in the memory-wrecking department. ;-) - Foretop
Interesting. I would not expect that. I got: Unhandled exception at 0x7760A879 (ntdll.dll) in ConsoleApplication3.exe: 0xC0000374: A heap has been corrupted (parameters: 0x77645910). Anyway... I guess that's why it's "undefined" :) - Ingrain
Strange... my Windows 10, maybe? Or am I already a victim of a Spectre/Meltdown exploit that is interfering with my memory corruption? Just kidding... - Foretop
@Foretop - Learn something new every day. My program stack is quite close to the allocated heap pointer. So... yeah, stack corruption... not surprising after all. - Ingrain

The difference is: the sizeof operator takes a type name and tells you how many bytes of managed memory need to be allocated for an instance of that struct. This is not necessarily stack memory; structs are allocated off the heap when they are array elements, fields of a class, and so on. By contrast, Marshal.SizeOf takes either a type object or an instance of the type, and tells you how many bytes of unmanaged memory need to be allocated. These can be different for a variety of reasons. The name of the type gives you a clue: Marshal.SizeOf is intended to be used when marshaling a structure to unmanaged memory.
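A quick way to see both numbers side by side for the question's struct, as a sketch (SizeComparison is a hypothetical name; unsafe code must be enabled). Without a [StructLayout] attribute the marshaler falls back to CharSet.Ansi, so the managed size is 4 while the marshaled size is 2, matching what the asker observed.

```csharp
using System;
using System.Runtime.InteropServices;

// The question's struct: no [StructLayout], so the C# defaults apply
// (sequential layout, CharSet.Ansi for marshaling).
public struct TestStruct
{
    public char x;
    public char y;
}

public static class SizeComparison
{
    // Bytes of managed memory one instance occupies (two UTF-16 chars).
    public static unsafe int ManagedSize() => sizeof(TestStruct);

    // Bytes the marshaler uses when copying an instance to unmanaged memory;
    // with CharSet.Ansi each char becomes a single byte.
    public static int MarshaledSize() => Marshal.SizeOf(typeof(TestStruct));

    public static void Main()
    {
        Console.WriteLine($"sizeof = {ManagedSize()}, Marshal.SizeOf = {MarshaledSize()}");
    }
}
```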

Another difference between the two is that the sizeof operator can only take the name of an unmanaged type; that is, a struct type whose fields are only integral types, Booleans, pointers and so on. (See the specification for an exact definition.) Marshal.SizeOf, by contrast, can also take class types (provided they have sequential or explicit layout) and struct types whose fields, such as strings, sizeof would reject.
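To illustrate the class case, here is a sketch with two hypothetical classes, PointClass and AutoLayoutClass. As far as I know, Marshal.SizeOf only accepts a class when it has sequential or explicit layout; with the C# default for classes (auto layout) it throws an ArgumentException, because no meaningful unmanaged size exists.

```csharp
using System;
using System.Runtime.InteropServices;

// A class with sequential layout: Marshal.SizeOf can measure it, while
// C# sizeof cannot, because a class is not an unmanaged type.
[StructLayout(LayoutKind.Sequential)]
public class PointClass
{
    public int X;
    public int Y;
}

// A class with the default (auto) layout has no defined unmanaged size.
public class AutoLayoutClass
{
    public int X;
}

public static class MarshalSizeOfDemo
{
    // Two 4-byte ints, no padding needed.
    public static int PointClassSize() => Marshal.SizeOf(typeof(PointClass));

    // Returns true if Marshal.SizeOf rejects the auto-layout class.
    public static bool AutoLayoutThrows()
    {
        try
        {
            Marshal.SizeOf(typeof(AutoLayoutClass));
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(PointClassSize());   // 8
        Console.WriteLine(AutoLayoutThrows()); // True
    }
}
```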

Jugum answered 24/3, 2018 at 0:5 Comment(7)
Well, yes, this is what I have read everywhere already. But that doesn't explain why I can store (a total of) 32 bits of values in 16 bits of allocated memory. Sounds too good to be true, but maybe I should explore that as a business model... ;-) - Foretop
@Foretop now try Unsafe.SizeOf<TestStruct>() or sizeof(TestStruct); they return 4 - Tieback
So sizeof / Unsafe.SizeOf is the "right" way to compute the size? I'd be satisfied with this, but why don't I get an access violation when I access my (insufficient) memory through '->'? - Foretop
@Foretop the "right" way is contextual; if you're talking about how your type maps to managed code, then sure; if you're talking about how your type maps to P/Invoke, then ask Marshal - Tieback
@MarcGravell: okay, I think I understand. I was aware that any C/C++ code I pass the struct to might assume a different memory layout. But my problem was only about understanding the C# part of it. - Foretop
@Foretop access violations are... tricky; I'm not sure there is a hard guarantee to catch every scenario. Unmanaged memory is inherently troublesome like that :) - Tieback
@Foretop if you're only talking about C#, then forget Marshal; it has no opinion - Tieback

> What memory layout does the field dereferencing operator '->' assume?

Whatever the CLI decides.

> Is '->' even the right operator for addressing unmanaged structs?

That is an ambiguous concept. There are structs in unmanaged memory accessed via the CLI: these follow CLI rules. And there are structs that are merely notional monikers for unmanaged code (perhaps C/C++) accessing the same memory: these follow the rules of that framework. Marshalling usually refers to P/Invoke, but that isn't necessarily applicable here.

> Or is Marshal.SizeOf the wrong size operator for unmanaged structs?

I'd default to Unsafe.SizeOf<T>(), which is essentially sizeof(T): perfectly well defined for the CLI/IL (including padding rules, etc.), but not something C# lets you write for a generic T.
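A sketch of what defaulting to Unsafe.SizeOf<T>() can look like for unmanaged allocations. Alloc, Free and FillAndCheck are hypothetical helper names. It assumes Unsafe from System.Runtime.CompilerServices is available (in-box on .NET Core 3.0 and later, a NuGet package on .NET Framework) and C# 7.3 or later for the `where T : unmanaged` constraint that T* requires.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public struct TestStruct
{
    public char x;
    public char y;
}

public static class UnmanagedBuffer
{
    // Allocates room for `count` values of T, sized by the managed layout
    // (Unsafe.SizeOf<T>), which is the layout pointer access through T* uses.
    public static unsafe T* Alloc<T>(int count) where T : unmanaged
    {
        int bytes = checked(Unsafe.SizeOf<T>() * count);
        return (T*)Marshal.AllocHGlobal(bytes).ToPointer();
    }

    public static unsafe void Free<T>(T* p) where T : unmanaged =>
        Marshal.FreeHGlobal(new IntPtr(p));

    // Fills a correctly sized buffer (the loop from the answer above)
    // and checks the last element; assumes count > 0.
    public static unsafe bool FillAndCheck(int count)
    {
        TestStruct* items = Alloc<TestStruct>(count);
        try
        {
            for (int i = 0; i < count; i++)
            {
                items[i].x = (char)0xABCD;
                items[i].y = (char)0x1234;
            }
            return items[count - 1].x == (char)0xABCD && items[count - 1].y == (char)0x1234;
        }
        finally
        {
            Free<TestStruct>(items);
        }
    }

    public static void Main()
    {
        Console.WriteLine(Unsafe.SizeOf<TestStruct>()); // 4, same as sizeof(TestStruct)
        Console.WriteLine(FillAndCheck(10000));         // True
    }
}
```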

Tieback answered 24/3, 2018 at 0:7 Comment(6)
I don't understand how the "->" operator can be ambiguous when applied to an unsafe pointer. Isn't every unsafe pointer the same, regardless of how the memory it points to was allocated? And where do my 32 bits of data go when I have only allocated 16 bits of memory? - Foretop
@Foretop who says you allocated 16 bits? The CLI reports 4 bytes for that. Ignore Marshal: that is just talking about P/Invoke, and you aren't doing P/Invoke. When I said "ambiguous", I meant that "unmanaged structs" was ambiguous; the -> operator follows CLI rules and is perfectly well defined under the same rules that say it is 4 bytes - Tieback
I had assumed that passing marshalSize (== 2) to Marshal.AllocHGlobal would allocate 2 bytes of memory... Isn't that true? - Foretop
@Foretop if you passed 2 to Marshal.AllocHGlobal, then a) yes, you allocated 2 bytes, b) damn, don't do that: there's no need to bother the unmanaged allocator for 2 bytes, c) it was your fault that you didn't ask for enough memory, and d) it is your fault if the memory doesn't flag as invalid. Ultimately, the moment you touch unsafe in any way, you are explicitly taking responsibility for any mistakes - Tieback
No question, it's my mistake. I don't blame C# for it. But in order to take responsibility I need to understand what's going on, and that is where I'm missing some easily accessible information. I mean, memory layout isn't rocket science, I suppose. - Foretop
@Foretop fair enough; so if you're talking C#, trust sizeof(Foo) or Unsafe.SizeOf<Foo>() (they are the same thing in IL terms; sizeof just doesn't work with generics) - Tieback

A char marshals, by default, to a single ANSI byte (a C# struct without a [StructLayout] attribute uses CharSet.Ansi). This allows interoperability with most C libraries, whose char type is one byte.

I believe the correct solution is to change TestStruct to:

using System.Runtime.InteropServices;

public struct TestStruct
{
    [MarshalAs(UnmanagedType.U2)]
    public char x;
    [MarshalAs(UnmanagedType.U2)]
    public char y;
}

UnmanagedType.U2 means a 2-byte unsigned integer, which makes it equivalent to the wchar_t type in a C header on Windows (where wchar_t is 16 bits; on Linux and macOS it is typically 32 bits).
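To check the effect of the attribute, here is a sketch comparing both sizes for the struct with MarshalAs(UnmanagedType.U2) on each char field (U2Demo is a hypothetical name; unsafe code must be enabled). With each char marshaled as a 2-byte unsigned integer, the marshaled size should match the managed size of 4 bytes.

```csharp
using System;
using System.Runtime.InteropServices;

public struct TestStruct
{
    [MarshalAs(UnmanagedType.U2)]
    public char x;
    [MarshalAs(UnmanagedType.U2)]
    public char y;
}

public static class U2Demo
{
    // Managed size: two UTF-16 chars.
    public static unsafe int ManagedSize() => sizeof(TestStruct);

    // Marshaled size: each field is now a 2-byte unsigned integer instead of an ANSI byte.
    public static int MarshaledSize() => Marshal.SizeOf<TestStruct>();

    public static void Main()
    {
        Console.WriteLine($"sizeof = {ManagedSize()}, Marshal.SizeOf = {MarshaledSize()}");
    }
}
```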

Seamless porting of C structures to .NET is possible with attention to detail and opens many doors for interop with native libraries.

Luther answered 26/9, 2021 at 13:35 Comment(0)
