Why would finding a type's initializer throw a NullReferenceException?
Asked Answered
J

2

198

This has got me stumped. I was trying to optimize some tests for Noda Time, where we have some type initializer checking. I thought I'd find out whether a type has a type initializer (static constructor or static variables with initializers) before loading everything into a new AppDomain. To my surprise, a small test of this threw NullReferenceException - despite there being no null values in my code. It only throws the exception when compiled with no debug information.

Here's a short but complete program to demonstrate the problem:

using System;

class Test
{
    static Test() {}

    static void Main()
    {
        var cctor = typeof(Test).TypeInitializer;
        Console.WriteLine("Got initializer? {0}", cctor != null);
    }    
}

And a transcript of compilation and output:

c:\Users\Jon\Test>csc Test.cs
Microsoft (R) Visual C# Compiler version 4.0.30319.17626
for Microsoft (R) .NET Framework 4.5
Copyright (C) Microsoft Corporation. All rights reserved.


c:\Users\Jon\Test>test

Unhandled Exception: System.NullReferenceException: Object reference not set to
an instance of an object.
   at System.RuntimeType.GetConstructorImpl(BindingFlags bindingAttr, Binder bin
der, CallingConventions callConvention, Type[] types, ParameterModifier[] modifi
ers)
   at Test.Main()

c:\Users\Jon\Test>csc /debug+ Test.cs
Microsoft (R) Visual C# Compiler version 4.0.30319.17626
for Microsoft (R) .NET Framework 4.5
Copyright (C) Microsoft Corporation. All rights reserved.


c:\Users\Jon\Test>test
Got initializer? True

Now you'll notice I'm using .NET 4.5 (the release candidate) - which may be relevant here. It's somewhat tricky for me to test it with the various other original frameworks (in particular "vanilla" .NET 4) but if anyone else has easy access to machines with other frameworks, I'd be interested in the results.

Other details:

  • I'm on an x64 machine, but this problem occurs with both x86 and x64 assemblies
  • It's the "debug-ness" of the calling code which makes a difference - even though in the test case above it's testing it on its own assembly, when I tried this against Noda Time I didn't have to recompile NodaTime.dll to see the differences - just Test.cs which referred to it.
  • Running the "broken" assembly on Mono 2.10.8 doesn't throw

Any ideas? Framework bug?

EDIT: Curiouser and curiouser. If you take out the Console.WriteLine call:

using System;

class Test
{
    static Test() {}

    static void Main()
    {
        var cctor = typeof(Test).TypeInitializer;
    }    
}

It now only fails when compiled with csc /o- /debug-. If you turn on optimizations, (/o+) it works. But if you include the Console.WriteLine call as per the original, both versions will fail.

Jelena answered 21/7, 2012 at 15:51 Comment(19)
Re last bullet; was that the same assembly on mono? Or the same c#, compiled-and-run on mono? Just wondering if this is compiler vs runtime. Presumably you've glanced at the IL and nothing looks crazy?Pacifica
Heh - "despite there being no null values in my code" this might actually be the first time ever in recorded SO history that the "the bug isn't in my code" card has been played successfully.Pacifica
@RemusRusanu: Ooh, that's interesting. More interesting than .NET 3.5 or 2.0, in fact.Jelena
Returns True just fine with no Debug doing first test from cmdline with .NET 4 Framework, Visual C# compiler 4.0.30319.1Carlenacarlene
@MarcGravell: Yeah, while I'm normally very skeptical of saying "There's no bug in my code" in this case, when there's a single expression at stake, and the exception is a NullReferenceException (which should always indicate a bug) it really does look dodgy. I strongly suspect if this is a .NET 4.5 bug, I've missed the window for getting it fixed...Jelena
@JonSkeet: We all know MS's SP1 is the real RTM ;pCogon
Interesting: csc /o+ test.cs works. Must be some kind of (lack of) optimizer bug... Still clueless on the effects though.Cogon
@JonSkeet Just confirmed I received the exact results you did when using 4.0.30319.17626 for Microsoft (R) .NET Framework 4.5 on the build tablet with VS 2012 RCCarlenacarlene
@JonSkeet: Some further results. It only crashes when csc /o- /debug- test.cs. Any other combination works fine. I suspect a JIT bug.Cogon
@minitech: what flags were passed to ilasm?Cogon
@leppie: None, but adding /DEBUG works. Sigh. Well, it's not a compiler bug!Encyclopedia
@minitech: If it was a compiler bug, PEVerify would fail. It does not.Cogon
@leppie: Nope, csc /o+ /debug- Test.cs fails for me too, which is odd.Jelena
Tried csc /o+ /debug- Test.cs, fails for me as well.Taveras
When you set the JIT tracking to on, everything works fine. [assembly: Debuggable(true, true)]Lavaliere
Yes, the bug only reproduces if there's no debugging information in the PE file. I've opened a bug against the CLR team and we'll see what they respond.Oralle
Using csc.exe to compile and run on .Net 4.5 RC / X64 I get the same exception. But when compiled and run in visual studio 2012 it runs successfully.Skyjack
@Jon, what was the outcome on this in the end?Distributary
@IanRingrose: It was acknowleged as a bug and fixed before release, I believe - or possibly in a patch. (I can't reproduce it now.)Jelena
L
285

with csc test.cs:

(196c.1874): Access violation - code c0000005 (first chance)
mscorlib_ni!System.RuntimeType.GetConstructorImpl(System.Reflection.BindingFlags, System.Reflection.Binder, System.Reflection.CallingConventions, System.Type[], System.Reflection.ParameterModifier[])+0xa3:
000007fe`e5735403 488b4608        mov     rax,qword ptr [rsi+8] ds:00000000`00000008=????????????????

Trying to load from [rsi+8] when @rsi is NULL. Lets inspect the function:

0:000> ln 000007fe`e5735403
(000007fe`e5735360)   mscorlib_ni!System.RuntimeType.GetConstructorImpl(System.Reflection.BindingFlags, System.Reflection.Binder, System.Reflection.CallingConventions, System.Type[], System.Reflection.ParameterModifier[])+0xa3
0:000> uf 000007fe`e5735360
Flow analysis was incomplete, some code may be missing
mscorlib_ni!System.RuntimeType.GetConstructorImpl(System.Reflection.BindingFlags, System.Reflection.Binder, System.Reflection.CallingConventions, System.Type[], System.Reflection.ParameterModifier[]):
000007fe`e5735360 53              push    rbx
000007fe`e5735361 55              push    rbp
000007fe`e5735362 56              push    rsi
000007fe`e5735363 57              push    rdi
000007fe`e5735364 4154            push    r12
000007fe`e5735366 4883ec30        sub     rsp,30h
000007fe`e573536a 498bf8          mov     rdi,r8
000007fe`e573536d 8bea            mov     ebp,edx
000007fe`e573536f 48c744242800000000 mov   qword ptr [rsp+28h],0
000007fe`e5735378 488bb42480000000 mov     rsi,qword ptr [rsp+80h]
000007fe`e5735380 4889742420      mov     qword ptr [rsp+20h],rsi
000007fe`e5735385 41b903000000    mov     r9d,3
...    
mscorlib_ni!System.RuntimeType.GetConstructorImpl(System.Reflection.BindingFlags, System.Reflection.Binder, System.Reflection.CallingConventions, System.Type[], System.Reflection.ParameterModifier[])+0x97:
000007fe`e57353f7 488b4b08        mov     rcx,qword ptr [rbx+8]
000007fe`e57353fb 85c9            test    ecx,ecx
000007fe`e57353fd 0f848e000000    je      mscorlib_ni!System.RuntimeType.GetConstructorImpl(System.Reflection.BindingFlags, System.Reflection.Binder, System.Reflection.CallingConventions, System.Type[], System.Reflection.ParameterModifier[])+0x131 (000007fe`e5735491)

mscorlib_ni!System.RuntimeType.GetConstructorImpl(System.Reflection.BindingFlags, System.Reflection.Binder, System.Reflection.CallingConventions, System.Type[], System.Reflection.ParameterModifier[])+0xa3:
000007fe`e5735403 488b4608        mov     rax,qword ptr [rsi+8]
000007fe`e5735407 85c0            test    eax,eax
000007fe`e5735409 7545            jne     mscorlib_ni!System.RuntimeType.GetConstructorImpl(System.Reflection.BindingFlags, System.Reflection.Binder, System.Reflection.CallingConventions, System.Type[], System.Reflection.ParameterModifier[])+0xf0 (000007fe`e5735450)
...

@rsi is loaded in the beginning from [rsp+20h] so it must be passed by caller. Lets look at the caller:

0:000> k3
Child-SP          RetAddr           Call Site
00000000`001fec70 000007fe`8d450110 mscorlib_ni!System.RuntimeType.GetConstructorImpl(System.Reflection.BindingFlags, System.Reflection.Binder, System.Reflection.CallingConventions, System.Type[], System.Reflection.ParameterModifier[])+0xa3
00000000`001fecd0 000007fe`ecb6e073 image00000000_01120000!Test.Main()+0x60
00000000`001fed20 000007fe`ecb6dcb2 clr!CoUninitializeEE+0x7ae1f
0:000> ln 000007fe`8d450110
(000007fe`8d4500b0)   image00000000_01120000!Test.Main()+0x60
0:000> uf 000007fe`8d4500b0
image00000000_01120000!Test.Main():
000007fe`8d4500b0 53              push    rbx
000007fe`8d4500b1 4883ec40        sub     rsp,40h
000007fe`8d4500b5 e8a69ba658      call    mscorlib_ni!System.Console.get_In() (000007fe`e5eb9c60)
000007fe`8d4500ba 4c8bd8          mov     r11,rax
000007fe`8d4500bd 498b03          mov     rax,qword ptr [r11]
000007fe`8d4500c0 488b5048        mov     rdx,qword ptr [rax+48h]
000007fe`8d4500c4 498bcb          mov     rcx,r11
000007fe`8d4500c7 ff5238          call    qword ptr [rdx+38h]
000007fe`8d4500ca 488d0d7737eeff  lea     rcx,[000007fe`8d333848]
000007fe`8d4500d1 e88acb715f      call    clr!CoUninitializeEE+0x79a0c (000007fe`ecb6cc60)
000007fe`8d4500d6 4c8bd8          mov     r11,rax
000007fe`8d4500d9 48b92012531200000000 mov rcx,12531220h
000007fe`8d4500e3 488b09          mov     rcx,qword ptr [rcx]
000007fe`8d4500e6 498b03          mov     rax,qword ptr [r11]
000007fe`8d4500e9 4c8b5068        mov     r10,qword ptr [rax+68h]
000007fe`8d4500ed 48c744242800000000 mov   qword ptr [rsp+28h],0
000007fe`8d4500f6 48894c2420      mov     qword ptr [rsp+20h],rcx
000007fe`8d4500fb 41b903000000    mov     r9d,3
000007fe`8d450101 4533c0          xor     r8d,r8d
000007fe`8d450104 ba38000000      mov     edx,38h
000007fe`8d450109 498bcb          mov     rcx,r11
000007fe`8d45010c 41ff5228        call    qword ptr [r10+28h]
000007fe`8d450110 48bb1032531200000000 mov rbx,12533210h
000007fe`8d45011a 488b1b          mov     rbx,qword ptr [rbx]
000007fe`8d45011d 33d2            xor     edx,edx
000007fe`8d45011f 488bc8          mov     rcx,rax
000007fe`8d450122 e829452e58      call    mscorlib_ni!System.Reflection.ConstructorInfo.op_Equality(System.Reflection.ConstructorInfo, System.Reflection.ConstructorInfo) (000007fe`e5734650)
000007fe`8d450127 0fb6c8          movzx   ecx,al
000007fe`8d45012a 33c0            xor     eax,eax
000007fe`8d45012c 85c9            test    ecx,ecx
000007fe`8d45012e 0f94c0          sete    al
000007fe`8d450131 0fb6c8          movzx   ecx,al
000007fe`8d450134 894c2430        mov     dword ptr [rsp+30h],ecx
000007fe`8d450138 488d542430      lea     rdx,[rsp+30h]
000007fe`8d45013d 488d0d24224958  lea     rcx,[mscorlib_ni+0x682368 (000007fe`e58e2368)]
000007fe`8d450144 e807246a5f      call    clr+0x2550 (000007fe`ecaf2550)
000007fe`8d450149 488bd0          mov     rdx,rax
000007fe`8d45014c 488bcb          mov     rcx,rbx
000007fe`8d45014f e81cab2758      call    mscorlib_ni!System.Console.WriteLine(System.String, System.Object) (000007fe`e56cac70)
000007fe`8d450154 90              nop
000007fe`8d450155 4883c440        add     rsp,40h
000007fe`8d450159 5b              pop     rbx
000007fe`8d45015a c3              ret

(My disassemble shows System.Console.get_In because I added a Console.GetLine() in test.cs to have an opportunity to break in debugger. I validated it doesn’t change the behavior).

We're in this call: 000007fe8d45010c 41ff5228 call qword ptr [r10+28h] (our AV frame ret address is the instruction right after this call).

Lets compare this with what happens when we compile csc /debug test.cs. We can set up a bp 000007fee5735360, luckily the module loads at the same address. On the instruction that loads @rsi:

0:000> r
rax=000007fee58e2f30 rbx=00000000027c6258 rcx=00000000027c6258
rdx=0000000000000038 rsi=00000000002debd8 rdi=0000000000000000
rip=000007fee5735378 rsp=00000000002de990 rbp=0000000000000038
 r8=0000000000000000  r9=0000000000000003 r10=000007fee58831c8
r11=00000000002de9c0 r12=0000000000000000 r13=00000000002dedc0
r14=00000000002dec58 r15=0000000000000004
iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
mscorlib_ni!System.RuntimeType.GetConstructorImpl(System.Reflection.BindingFlags, System.Reflection.Binder, System.Reflection.CallingConventions, System.Type[], System.Reflection.ParameterModifier[])+0x18:
000007fe`e5735378 488bb42480000000 mov     rsi,qword ptr [rsp+80h] ss:00000000`002dea10=a0627c0200000000

Note that @rsi is 00000000002debd8. Stepping through the function shows that this the address that will be dereferenced later at the place when the bad exe bombs (ie. @rsi does not change). The stack is very interesting because it shows an extra frame:

0:000> k3
Child-SP          RetAddr           Call Site
00000000`002de990 000007fe`e5eddf68 mscorlib_ni!System.RuntimeType.GetConstructorImpl(System.Reflection.BindingFlags, System.Reflection.Binder, System.Reflection.CallingConventions, System.Type[], System.Reflection.ParameterModifier[])+0x18
00000000`002de9f0 000007fe`8d460119 mscorlib_ni!System.Type.get_TypeInitializer()+0x48
00000000`002dea30 000007fe`ecb6e073 good!Test.Main()+0x49*** WARNING: Unable to verify checksum for good.exe

0:000> ln 000007fe`e5eddf68
(000007fe`e5eddf20)   mscorlib_ni!System.Type.get_TypeInitializer()+0x48
0:000> uf 000007fe`e5eddf20
mscorlib_ni!System.Type.get_TypeInitializer():
000007fe`e5eddf20 53              push    rbx
000007fe`e5eddf21 4883ec30        sub     rsp,30h
000007fe`e5eddf25 488bd9          mov     rbx,rcx
000007fe`e5eddf28 ba22010000      mov     edx,122h
000007fe`e5eddf2d b901000000      mov     ecx,1
000007fe`e5eddf32 e8d1a075ff      call    CORINFO_HELP_GETSHARED_GCSTATIC_BASE (000007fe`e5638008)
000007fe`e5eddf37 488b88f0010000  mov     rcx,qword ptr [rax+1F0h]
000007fe`e5eddf3e 488b03          mov     rax,qword ptr [rbx]
000007fe`e5eddf41 4c8b5068        mov     r10,qword ptr [rax+68h]
000007fe`e5eddf45 48c744242800000000 mov   qword ptr [rsp+28h],0
000007fe`e5eddf4e 48894c2420      mov     qword ptr [rsp+20h],rcx
000007fe`e5eddf53 41b903000000    mov     r9d,3
000007fe`e5eddf59 4533c0          xor     r8d,r8d
000007fe`e5eddf5c ba38000000      mov     edx,38h
000007fe`e5eddf61 488bcb          mov     rcx,rbx
000007fe`e5eddf64 41ff5228        call    qword ptr [r10+28h]
000007fe`e5eddf68 90              nop
000007fe`e5eddf69 4883c430        add     rsp,30h
000007fe`e5eddf6d 5b              pop     rbx
000007fe`e5eddf6e c3              ret
0:000> ln 000007fe`8d460119

The call is the same call qword ptr [r10+28h] that we've seen before, so in the bad case this function was probably inlined in the Main(), so the fact that there is an extra frame is a red herring. If we look at the preparation of this call qword ptr [r10+28h] we notice this instruction: mov qword ptr [rsp+20h],rcx. This is what loads the address which gets eventually dereferenced as @rsi. In the good case, this is how @rcx is loaded:

000007fe`e5eddf32 e8d1a075ff      call    CORINFO_HELP_GETSHARED_GCSTATIC_BASE (000007fe`e5638008)
000007fe`e5eddf37 488b88f0010000  mov     rcx,qword ptr [rax+1F0h]

In the bad case it looks very different:

000007fe`8d4600d9 48b92012721200000000 mov rcx,12721220h
000007fe`8d4600e3 488b09          mov     rcx,qword ptr [rcx]

This is very different. Unlike the good case that calls CORINFO_HELP_GETSHARED_GCSTATIC_BASE and reads what ends up as the critical pointer that causes the AV from some member at offset 1F0 in a return structure, the optimized code loads it from a static address. And of course 12721220h contains NULL:

0:000> dp 12721220h L8
00000000`12721220  00000000`00000000 00000000`00000000
00000000`12721230  00000000`00000000 00000000`02722198
00000000`12721240  00000000`027221c8 00000000`027221f8
00000000`12721250  00000000`02722228 00000000`02722258

Unfortunately is too late for me to dig deeper right now, the dissasembly of CORINFO_HELP_GETSHARED_GCSTATIC_BASE is far from trivial. I'm posting this in hope someone more knowledgeable in CLR internals can make sense (as you can see, I really considered the issue just from the native instructions POV and completely ignored IL).

Latinist answered 21/7, 2012 at 21:4 Comment(9)
You deserve many more reps than this for your debugging skills.Pithecanthropus
This is an optimizer bug. CORINFO* is a function pointer, it calls JIT_GetSharedGCStaticBase. My guess is that it tripped up by the new 4.5 background jit feature and accesses a field before it is initialized, forgetting to jit the class. Report this at connect.microsoft.comAnsel
No need. We're already looking. You're absolutely right, what happens is that because we directly allocate an instance of RuntimeType the cctor of Type never gets called, so Type.EmptyTypes remains null and that's what is passed to GetConstructor.Oralle
Heh, I think I may be experiencing this bug as well, except I get an access violation throw by my program which is completely managed. My crude attempts at analyzing the native JIT code have gotten me nowhere, and the project is big and trying to capture the bug in a smaller test case is proving to be impossible :(Vtol
Actually, this program appears to cause an Access Violation as well. And I'm using the official release version of .Net 4.5Vtol
@Earlz: post a separate question and describe the issue there rather than in comments. FYI a managed NullReferenceException is just how a native AV gets translated into a managed exception. The hardware runs native.Latinist
Is there a book I can read to get these debugging skills? (Preferably beginning with "Idiots Guide to" or ending with "for Dummies")Axiology
@IgbyLargeman: Advanced Windows Debugging is pretty good.Latinist
@RemusRusanu: thanks. Do you have any experience with its counterpart, Advanced .Net Debugging?Axiology
C
10

As I believe I've found some new interesting findings about the problem, I decided to add them as an answer, acknowledging at the same time that they are not addressing the "why it happens" in the original question. Maybe someone who knows more about the internal workings of the involved types might post an edifying answer based also on the observations I'm posting.

I've also managed to reproduce the issue on my machine and I've tracked a connection with the System.Runtime.InteropServices._Type Interface, which is implemented by the System.Type class.

Initially, I've found at least 3 workaround approaches for fixing the problem:

  1. Simply by casting the Type to _Type inside the Main method:

    var cctor = ((_Type)typeof(Test)).TypeInitializer;
    
  2. Or making sure that approach 1 was used previously inside the method:

    var warmUp = ((_Type)typeof(Test)).TypeInitializer; 
    var cctor = ((Type)typeof(Test)).TypeInitializer;
    
  3. Or by adding a static field to the Test class and initializing it (with casting it to _Type):

    static ConstructorInfo _dummy1 = (typeof(object) as _Type).TypeInitializer;
    

Later on, I discovered that if we don't want to involve the System.Runtime.InteropServices._Type interface in the workarounds, the problem doesn't occur either by:

  1. Adding a static field to the Test class and initializing it (without casting it to _Type):

    static ConstructorInfo _dummy2 = typeof(object).TypeInitializer;
    
  2. Or by initializing the cctor variable itself as a static field of the class:

    static ConstructorInfo cctor = typeof(Test).TypeInitializer;
    

I'm looking forward to your feedback.

Collocation answered 16/5, 2013 at 13:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.