Hello and good day to you.
Need a bit of assitance here:
Situation:
I have an obscure DirectX 9 application (name and application details are irrelevant to the question) that causes blue screen of death on all nvidia cards (GeForce 8400GS and up) since certain driver version. I believe that the problem is indirectly caused by DirectX 9 call or a flag that triggers driver bug.
Goal:
I'd like to track down offending flag/function call (for fun, this isn't my job/homework) and bypass error condition by writing proxy dll. I already have a finished proxy dll that provides wrappers for IDirect3D9, IDirect3DDevice9, IDirect3DVertexBuffer9 and IDirect3DIndexBuffer9 and provides basic logging/tracing of Direct3D calls. However, I can't pinpoint function which causes crash.
Problems:
- No source code or technical support is available. There will be no assitance, and nobody else will fix the problem.
- Memory dump produced by kernel wasn't helpful - apparently an access violation happens within nv4_disp.dll, but I can't use stacktrace to go to IDirect3DDevice9 method call, plus there's a chance that bug happens asynchronously.
- (Main problem) Because of large number of Direct3D9Device method calls, I can't reliably log them into file or over network:
- Logging into file causes significant slowdown even without flushing, and because of that all last contents of the log are lost when system BSODs.
- Logging over network (using UDP and WINSOck's
sendto
)also causes significant slowdown and must not be done asynchronously (asynchronous packets are lost on BSOD), plus packets (the ones around the crash) are sometimes lost even when sent synchronously. - When application is "slowed" down by logging routines, BSOD is less likely to happen, which makes tracking it down harder.
Question:
I normally don't write drivers, and don't do this level of debugging, so I have impression that I'm missing something important there's a more trivial way to track down the problem than writing IDirect3DDevice9 proxy dll with custom logging mechanism. What is it? What is the standard way of diagnosing/handling/fixing problem like this (no source code, COM interface method triggers BSOD)?
Minidump analysis(WinDBG):
Loading User Symbols Loading unloaded module list ........... Unable to load image nv4_disp.dll, Win32 error 0n2 *** WARNING: Unable to verify timestamp for nv4_disp.dll *** ERROR: Module load completed but symbols could not be loaded for nv4_disp.dll ******************************************************************************* * * * Bugcheck Analysis * * * ******************************************************************************* Use !analyze -v to get detailed debugging information. BugCheck 1000008E, {c0000005, bd0a2fd0, b0562b40, 0} Probably caused by : nv4_disp.dll ( nv4_disp+90fd0 ) Followup: MachineOwner --------- 0: kd> !analyze -v ******************************************************************************* * * * Bugcheck Analysis * * * ******************************************************************************* KERNEL_MODE_EXCEPTION_NOT_HANDLED_M (1000008e) This is a very common bugcheck. Usually the exception address pinpoints the driver/function that caused the problem. Always note this address as well as the link date of the driver/image that contains this address. Some common problems are exception code 0x80000003. This means a hard coded breakpoint or assertion was hit, but this system was booted /NODEBUG. This is not supposed to happen as developers should never have hardcoded breakpoints in retail code, but ... If this happens, make sure a debugger gets connected, and the system is booted /DEBUG. This will let us see why this breakpoint is happening. Arguments: Arg1: c0000005, The exception code that was not handled Arg2: bd0a2fd0, The address that the exception occurred at Arg3: b0562b40, Trap Frame Arg4: 00000000 Debugging Details: ------------------ EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s". FAULTING_IP: nv4_disp+90fd0 bd0a2fd0 39b8f8000000 cmp dword ptr [eax+0F8h],edi TRAP_FRAME: b0562b40 -- (.trap 0xffffffffb0562b40) ErrCode = 00000000 eax=00000808 ebx=e37f8200 ecx=e4ae1c68 edx=e37f8328 esi=e37f8400 edi=00000000 eip=bd0a2fd0 esp=b0562bb4 ebp=e37e09c0 iopl=0 nv up ei pl nz na po nc cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010202 nv4_disp+0x90fd0: bd0a2fd0 39b8f8000000 cmp dword ptr [eax+0F8h],edi ds:0023:00000900=???????? Resetting default scope CUSTOMER_CRASH_COUNT: 3 DEFAULT_BUCKET_ID: DRIVER_FAULT BUGCHECK_STR: 0x8E LAST_CONTROL_TRANSFER: from bd0a2e33 to bd0a2fd0 STACK_TEXT: WARNING: Stack unwind information not available. Following frames may be wrong. b0562bc4 bd0a2e33 e37f8200 e37f8200 e4ae1c68 nv4_disp+0x90fd0 b0562c3c bf8edd6b b0562cfc e2601714 e4ae1c58 nv4_disp+0x90e33 b0562c74 bd009530 b0562cfc bf8ede06 e2601714 win32k!WatchdogDdDestroySurface+0x38 b0562d30 bd00b3a4 e2601008 e4ae1c58 b0562d50 dxg!vDdDisableSurfaceObject+0x294 b0562d54 8054161c e2601008 00000001 0012c518 dxg!DxDdDestroySurface+0x42 b0562d54 7c90e4f4 e2601008 00000001 0012c518 nt!KiFastCallEntry+0xfc 0012c518 00000000 00000000 00000000 00000000 0x7c90e4f4 STACK_COMMAND: kb FOLLOWUP_IP: nv4_disp+90fd0 bd0a2fd0 39b8f8000000 cmp dword ptr [eax+0F8h],edi SYMBOL_STACK_INDEX: 0 SYMBOL_NAME: nv4_disp+90fd0 FOLLOWUP_NAME: MachineOwner MODULE_NAME: nv4_disp IMAGE_NAME: nv4_disp.dll DEBUG_FLR_IMAGE_TIMESTAMP: 4e390d56 FAILURE_BUCKET_ID: 0x8E_nv4_disp+90fd0 BUCKET_ID: 0x8E_nv4_disp+90fd0 Followup: MachineOwner