How do I trace an intermittent crash that occurs only under the debugger, but is not caught by it?
I have an odd intermittent crash that only occurs under some circumstances that I am having trouble solving, and I'm seeking SO's advice for how to tackle it.

The bug

At apparently random points, Windows shows the "[App] has stopped working" dialog. It is an APPCRASH in ntdll.dll, exception code 4000001f, exception offset 000a2562. Here's where it gets tricky: this only occurs when running the application under the debugger. However, the debugger does not catch this exception, and at the point where Windows shows this dialog, the IDE is not responding. This bug does not occur when running normally, i.e. not within the IDE debugger.

Screenshot of the Windows crash dialog

I can't reproduce it outside the debugger, so I can't run the program and attach when it's already crashed. I can't pause execution when Windows shows this dialog, since the IDE isn't responding. I can manually trace through lines of code to see where it occurs. There are several, and where it occurs is apparently random. For a while it occurred when showing a window (or new form), for a while when creating a thread.

Edit: I have tracked it down to the IDE: if I pause on a breakpoint and click the Thread Status tab, the program will crash immediately with the above dialog even though it is, theoretically, paused. In this situation, the IDE remains responsive. This is really weird.

More information

I have just moved my development environment to VMWare Fusion. The bug also occurs running a build from my old (native Windows) computer on my new computer; it did not occur with the same EXE file on that old computer. This makes me wonder if it is related to Fusion or something in my new setup.

I am running:

  • Windows 7 Pro x64 on VMware Fusion 3.1.3 on OS X Lion 10.7.1, all fully updated. Fusion is running in "Full screen" mode on one of my screens.
  • A colleague running Windows 7 natively (not in a VM) does not encounter this issue. Nor did I on my old Vista computer.
  • Embarcadero RAD Studio 2010, fully updated (I hope; there are about five updates and getting them all in order is tricky.) I have DDevExtensions 2.4.1 installed, and the latest IDE Fix Pack too: uninstalling both these has no effect.
  • The application is written mostly in C++, with snippets of Delphi. It is 32-bit.
  • We use EurekaLog, but the exception is not caught by it either. (Normally, an exception would be caught first by the debugger, then by EurekaLog.)
  • Running a debug build (no EurekaLog, extra debug info etc., debug DCUs set to true) also reproduces it. However, the "Debug DCUs" option on the Delphi Linking page of the C++Builder project settings dialog seems to have no effect: I can't step into the VCL code and find the line that actually triggers the error.
  • Codeguard (which detects memory access errors, double frees, access in freed memory, buffer overruns, etc) reports nothing.
Skidway answered 22/8, 2011 at 6:36 Comment(6)
ISTM that somehow, either VMWare Fusion or Delphi 2010 could be the culprit. Does QC report something similar? If not, it could be VMWare Fusion or Win64. I had some problems with Parallels (Win7 x64) when Lion came out, but that was solved by an update.Spaulding
@Rudy: that's my guess too (Fusion, not D2010; I've been using RAD Studio 2010 on Vista for ages, as has a colleague on Win7 running natively.) But I've had no trouble debugging with earlier versions - I last tried D2007 on XP on Fusion. The thing is, it's almost unusable as-is - I can't run it for more than a few seconds! What were your Parallels Lion problems, and should I try that?Skidway
The Parallels problems were minor (e.g. VM HDDs not showing up on the Mac desktop). I can recommend Parallels wholeheartedly.Spaulding
Try clearing unused entries in your watch list. I have found they can interfere with the debugger.Lurk
An appcrash with 4000001f is reproducible on my Delphi 2009 system (Windows 7 64-bit) when I close an application which uses ADOConnection to open an xls / xlsx fileEnculturation
Update: only reproducible on one of two (very similar) development systemsEnculturation
M
8

This has all the hallmarks of memory corruption. It only appears when you run under one particular environment, and it occurs at a different location each time. Both are classic symptoms.

The best way I know to debug this is to download the full FastMM and run with full debugging options enabled.
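
To give a feel for what a debugging memory manager buys you, here is a toy C++ sketch of the general guard-band idea: each block is surrounded by known byte patterns, and a check on release reports whether anything wrote outside the block. This is only an illustration of the technique, not FastMM's actual implementation; the `guarded` namespace, the function names, the guard size and the fill byte are all made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

namespace guarded {  // hypothetical names, for illustration only

const unsigned char kGuardByte  = 0xFD;  // arbitrary fill pattern (assumption)
const std::size_t   kGuardSize  = 16;    // bytes of padding on each side
const std::size_t   kHeaderSize = 16;    // room for the stored block size

static_assert(sizeof(std::size_t) <= kHeaderSize, "header must hold a size_t");

// Memory layout: [header: size][front guard][user bytes][back guard]
void* allocate(std::size_t size) {
    const std::size_t total = kHeaderSize + 2 * kGuardSize + size;
    unsigned char* raw = static_cast<unsigned char*>(std::malloc(total));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &size, sizeof size);                       // remember the size
    std::memset(raw + kHeaderSize, kGuardByte, kGuardSize);     // front guard
    std::memset(raw + kHeaderSize + kGuardSize + size,
                kGuardByte, kGuardSize);                        // back guard
    return raw + kHeaderSize + kGuardSize;
}

// Returns true if both guard regions still hold the fill pattern.
bool guardsIntact(const void* user) {
    const unsigned char* raw =
        static_cast<const unsigned char*>(user) - kGuardSize - kHeaderSize;
    std::size_t size = 0;
    std::memcpy(&size, raw, sizeof size);
    const unsigned char* front = raw + kHeaderSize;
    const unsigned char* back  = front + kGuardSize + size;
    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (front[i] != kGuardByte || back[i] != kGuardByte) return false;
    }
    return true;
}

// Frees the block and reports whether it was still undamaged.
bool release(void* user) {
    if (user == nullptr) return true;
    const bool ok = guardsIntact(user);
    std::free(static_cast<unsigned char*>(user) - kGuardSize - kHeaderSize);
    return ok;
}

}  // namespace guarded
```

With this, an 8-byte block whose ninth byte gets written still "works" as far as the program can tell, but `guarded::release` returns false for it. A real debugging allocator does the same kind of check and also tells you which block was damaged and where it was allocated, which is the information you need here.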

If that doesn't help then you are reduced to removing parts of code, one by one, until you can isolate the problem.

Another problem I have seen in D2010 arises when mixing local class definitions (i.e. a class inside a class) with generics. The generated code is fine, but the debug DCUs are wrong, and when stepping through the code the debugger jumps to the wrong file and dies shortly after. You don't seem to have quite the same problem, but there are similarities in the IDE deaths.

Finally I would advise you to suspect your own code rather than VMware. It's always tempting to blame something else but in my experience, whenever I have done so, it was always my code in the end!

Mcfarland answered 22/8, 2011 at 7:16 Comment(4)
That's very possible. I've seen bugs like this before caused by several things. But: why now, in Fusion only, for a product we've had released for a month now with no similar user bug reports? And why only under the debugger, and why does clicking the Thread Status tab immediately crash the (paused!) program?Skidway
Memory access after free can work for years until something, e.g. a different system, disturbs it. I had a similar experience with code that failed for the client but not for us. It took ages to finally work it out.Mcfarland
You're right, it is tempting, given the juxtaposition. I'll restrain myself from blaming Fusion quite yet :) Btw, this one took me aaaages to work out - see the edit at the bottom for the actual cause.Skidway
Yup, every time you suspect VMWare, think again. Up till now we have only had one problem where VMWare turned out to be a factor (not the cause!). And it wasn't related to VMWare specifically, but virtualisation in general.Dora
H
3

I hit a very similar problem. I was also developing a DLL, and whenever I set a breakpoint anywhere in my code, Delphi stopped at the source line and the host application crashed immediately.

Closing the "Thread Status" window in the debug layout "fixed" the problem. I'm working on Windows 7 64-bit and Delphi XE3.

Hekking answered 26/6, 2013 at 14:47 Comment(0)
G
2

4000001F is STATUS_WX86_BREAKPOINT

In other words, it is an INT 3 that was not handled by the IDE.
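
If you want to pick such a code apart yourself, an NTSTATUS value is a 32-bit number with a documented layout: the top two bits are the severity, then a customer flag and a reserved bit, then a 12-bit facility and a 16-bit code. A minimal C++ sketch (the struct and function names are my own, not part of any Windows header):

```cpp
#include <cassert>
#include <cstdint>

// Fields of a 32-bit NTSTATUS value:
//   bits 31-30 severity, bit 29 customer flag, bit 28 reserved,
//   bits 27-16 facility, bits 15-0 code.
struct NtStatusFields {   // hypothetical helper type
    unsigned severity;    // 0 success, 1 informational, 2 warning, 3 error
    bool     customer;    // set for non-Microsoft codes
    unsigned facility;
    unsigned code;
};

NtStatusFields decodeNtStatus(std::uint32_t status) {
    NtStatusFields f;
    f.severity = (status >> 30) & 0x3u;
    f.customer = ((status >> 29) & 0x1u) != 0;
    f.facility = (status >> 16) & 0xFFFu;
    f.code     = status & 0xFFFFu;
    return f;
}

const char* severityName(unsigned severity) {
    switch (severity) {
        case 0:  return "success";
        case 1:  return "informational";
        case 2:  return "warning";
        default: return "error";
    }
}
```

For 0x4000001F this gives severity 1 (informational), facility 0 and code 0x1F. The ordinary STATUS_BREAKPOINT, 0x80000003, decodes to severity 2 (warning) with code 3, so the two breakpoint codes differ in both severity and code, and a handler that only looks for 0x80000003 will not recognise this one.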

Since it is raised in NTDLL, I would guess that this indicates memory corruption in the system heap. Remember, some Windows code switches to a debug version when running under a debugger. That's why you cannot reproduce this when the application runs standalone, outside the debugger: the breakpoint is never generated.

You may try FastMM in full debug mode, but I do not think it will help you. The corruption does not happen in your memory; it happens in system memory. Yes, perhaps the memory allocation scheme will change and the corruption will reveal itself in your code/memory... maybe. Try using top-down allocations, or try SafeMM...

Another possible approach would be using Application Verifier.

Gebhart answered 21/5, 2015 at 22:24 Comment(0)
T
1

Check the project's .dsk file and make sure it does not contain a reference pointing to the wrong unit. The fix is to open the .dsk file in an editor and change the file location to the correct one.

Trustful answered 11/12, 2013 at 19:27 Comment(0)
