How can I guarantee catching an EXCEPTION_STACK_OVERFLOW structured exception in C++ under Visual Studio 2005?

Background

  • I have an application with a Poof-Crash[1]. I'm fairly certain it is due to a blown stack.
  • The application is Multi-Threaded.
  • I am compiling with "Enable C++ Exceptions: Yes With SEH Exceptions (/EHa)".
  • I have written an SE Translator function and called _set_se_translator() with it.
  • I have written handler functions and installed them with set_terminate() and set_unexpected().
  • To get the Stack Overflow, I must run in release mode, under heavy load, for several days. Running under a debugger is not an option as the application can't perform fast enough to achieve the runtime necessary to see the issue.
  • I can simulate the issue by adding infinite recursion on execution of one of the functions, and thus test the catching of the EXCEPTION_STACK_OVERFLOW exception.
  • I have WinDBG set up as the crash dump program, and I get good information for all other crash issues, but not this one. The crash dump contains only one thread, which is 'Sleep()'ing. All other threads have exited.

The Question

None of the things I've tried has resulted in picking up the EXCEPTION_STACK_OVERFLOW exception.

Does anyone know how to guarantee getting a chance at this exception during runtime in release mode?

Definitions

  1. Poof-Crash: The application crashes by going "poof" and disappearing without a trace.

(Considering the name of this site, I'm kind of surprised this question isn't on here already!)

Notes

  1. An answer was posted briefly about adjusting the stack size to potentially force the issue sooner and allow catching it with a debugger. That is a clever thought, but unfortunately, I don't believe it would help. The issue is likely caused by a corner case leading to infinite recursion. Shortening the stack would not expose the issue any sooner and would likely cause an unrelated crash in validly deep code. Nice idea though, and thanks for posting it, even if you did remove it.
Edinburgh answered 12/1, 2009 at 19:28 Comment(0)
U
5

Before Windows XP, it was generally hard (or impossible) to trap stack overflows. Starting with XP, you can install a vectored exception handler, which gets a chance at the stack overflow before any stack-based (structured exception) handlers do. That ordering is the whole point: structured exception handlers live on the stack, which is exactly the resource that has just run out.

But there's really not much you can do even if you're able to trap such an exception.

In his blog, cbrumme (sorry, I don't have his/her real name) discusses a stack page neighboring the guard page (the one that generates the stack overflow) that can potentially be used for backout. If you can squeeze your backout code into just one stack page, you can free as much as your logic allows. Otherwise, the application is pretty much dead upon encountering a stack overflow. The only other reasonable thing to do, having trapped it, is to write a dump file for later debugging.
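As a rough illustration of the "trap it and write a dump" idea, here is a minimal sketch, not a tested production handler. It assumes you link against dbghelp.lib; the names InstallStackOverflowDumper, WriteDumpThread and DumpRequest and the dump file name are made up for this example. Because the faulting thread has almost no stack left, the sketch hands the actual dump writing to a freshly created helper thread.

```cpp
#ifdef _WIN32   // Windows-only sketch
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0501   // AddVectoredExceptionHandler needs XP or later
#endif
#include <windows.h>
#include <dbghelp.h>          // MiniDumpWriteDump
#pragma comment(lib, "dbghelp.lib")

namespace {

// Hypothetical hand-off structure between the faulting thread and the helper.
struct DumpRequest {
    DWORD threadId;                 // thread that overflowed
    PEXCEPTION_POINTERS pointers;   // exception context from the handler
};

// Runs on its own (fresh) stack and writes the minidump.
DWORD WINAPI WriteDumpThread(LPVOID param)
{
    DumpRequest* req = static_cast<DumpRequest*>(param);
    // Assumed file name; pick whatever location suits your deployment.
    HANDLE file = CreateFileW(L"stack_overflow.dmp", GENERIC_WRITE, 0, NULL,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) return 1;

    MINIDUMP_EXCEPTION_INFORMATION info;
    info.ThreadId = req->threadId;
    info.ExceptionPointers = req->pointers;
    info.ClientPointers = FALSE;

    BOOL ok = MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(),
                                file, MiniDumpNormal, &info, NULL, NULL);
    CloseHandle(file);
    return ok ? 0 : 1;
}

// Called before any frame-based (__try/__except) handler.
LONG CALLBACK OnVectoredException(PEXCEPTION_POINTERS pointers)
{
    if (pointers->ExceptionRecord->ExceptionCode == EXCEPTION_STACK_OVERFLOW) {
        // static: keeps the request off the nearly exhausted stack.
        // Not safe if two threads overflow at the same moment.
        static DumpRequest req;
        req.threadId = GetCurrentThreadId();
        req.pointers = pointers;

        HANDLE worker = CreateThread(NULL, 0, WriteDumpThread, &req, 0, NULL);
        if (worker != NULL) {
            WaitForSingleObject(worker, INFINITE);
            CloseHandle(worker);
        }
    }
    return EXCEPTION_CONTINUE_SEARCH;   // let normal handling continue
}

} // namespace

// Returns the handler handle, or NULL on failure.
PVOID InstallStackOverflowDumper()
{
    return AddVectoredExceptionHandler(1 /* call first */, OnVectoredException);
}
#endif
```

Call InstallStackOverflowDumper() once early in main(). Vectored handlers are process-wide, so unlike the SE translator there is nothing to repeat per thread.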

Hope it helps.

Unplaced answered 13/1, 2009 at 12:57 Comment(2)
I'm not so worried about the application crashing, so long as I can figure out where it is going off the rails. I'll look into the Vectored exception handler. Good detail. Thanks! =DEdinburgh
My problem turns out to not be a Stack Overflow, but this answer answers the question I asked, about catching a Stack Overflow. Thanks for the great info!Edinburgh
A
4

I'm not convinced that you're on the right track in diagnosing this as a stack overflow.

But in any case, the fact that you're getting a poof!, plus what you're seeing in WinDbg

The crash dump will only contain one thread, which is 'Sleep()'ing. All other threads have exited.

suggests to me that somebody has called the C RTL exit() function, or possibly called the Windows API TerminateProcess() directly. That could have something to do with your interrupt handlers or not. Maybe something in the exception handling logic has a re-entrance check and arbitrarily decides to exit() if it's reentered.

My suggestion is to patch your executables to put an INT 3 breakpoint at the entry point of exit() if it's statically linked. If it's dynamically linked, patch the import instead, and also patch any imports of kernel32::TerminateProcess so that they call DebugBreak().

Of course, exit() and/or TerminateProcess() may be called on a normal shutdown, too, so you'll have to filter out the false alarms, but if you can get the call stack for the case where it's just about to go poof, you should have what you need.

EDIT ADD: Just simply writing your own version of exit() and linking it in instead of the CRTL version might do the trick.

Afresh answered 14/1, 2009 at 0:3 Comment(2)
Thank you, that's very insightful. We are statically linked, so I will attempt the patched executable or linked in custom exit() you are suggesting. :)Edinburgh
You are correct. It turns out to not be an issue of a Stack Overflow. I'm upvoting your answer because it answers the conditions and addresses my mis-analysis of the issue. Thanks for the really useful info and the keen observation of the true issue.Edinburgh
A
1

I remember code from a previous workplace that sounded similar having explicit bounds checks on the stack pointer and throwing an exception manually.

It's been a while since I've touched C++ though, and even when I did touch it I didn't know what I was doing, so caveat implementor about portability/reliability of said advice.
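For what it's worth, an explicit check like that could look roughly like the sketch below. This is my own guess at the idea, not the code from that workplace: StackBudget and Descend are hypothetical names, and it assumes a pointer fits in std::size_t and that the stack is one contiguous region, which holds on mainstream Windows targets. It measures how far the current frame is from a base address recorded near the top of the thread and throws an ordinary C++ exception long before the real guard page is hit.

```cpp
#include <cstddef>
#include <stdexcept>

// Tracks approximate stack usage relative to a base address taken near the
// thread's entry point. Assumption: pointers convert losslessly to std::size_t.
class StackBudget {
public:
    StackBudget(const void* base, std::size_t limitBytes)
        : base_(reinterpret_cast<std::size_t>(base)), limit_(limitBytes) {}

    // Distance in bytes between the base and a local in the current frame.
    // Works whichever direction the stack grows.
    std::size_t Used() const {
        char marker = 0;
        const std::size_t here = reinterpret_cast<std::size_t>(&marker);
        return here > base_ ? here - base_ : base_ - here;
    }

    // Throws while there is still plenty of real stack left to unwind.
    void Check() const {
        if (Used() > limit_) throw std::runtime_error("stack budget exceeded");
    }

private:
    std::size_t base_;
    std::size_t limit_;
};

// Hypothetical recursive worker standing in for the suspect code path.
// Returns the number of levels it descended.
int Descend(const StackBudget& budget, int levels)
{
    budget.Check();
    volatile char pad[256] = {0};   // stand-in for real locals
    if (levels == 0) return pad[0];
    // Reading pad after the call keeps this from becoming a loop.
    return 1 + Descend(budget, levels - 1) + pad[255];
}
```

In use, you would declare a local at the top of the thread function, build a StackBudget from its address with a limit comfortably below the thread's stack size, and call Check() at the entry of any function suspected of runaway recursion. The catch site then has a normal C++ exception and an intact stack to log from.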

Aachen answered 12/1, 2009 at 20:27 Comment(0)
H
1

Have you considered ADPlus from Debugging Tools for Windows?

ADPlus attaches the CDB debugger to a process in "crash" mode and will generate crash dumps for most exceptions the process generates. Basically, you run "ADPlus -crash -p yourPIDhere", it performs an invasive attach and begins logging.

Given your comment above about running under a debugger, I just wanted to add that CDB adds virtually zero overhead in -crash mode on a decent (dual-core, 2GB RAM) machine, so don't let that hold you back from trying it.

Hellgrammite answered 12/1, 2009 at 23:30 Comment(7)
Can you make your answer a bit more specific with regard to how it will solve the issue stated?Edinburgh
ADPlus attaches the CDB debugger to a process in "crash" mode and will generate crash dumps for most exceptions the process generates. Basically, you run "ADPlus -crash -p yourPIDhere", it performs an invasive attach and begins logging.Hellgrammite
Quick sample from the log file: Stack_buffer_overflow [sbo] return: GN GN 1st chance: Log;Time;Stack;MiniDump 2nd chance: Log;Time;Stack;FullDump;EventLog Starting to attach the debugger to each process Attaching to 3248 - RECURSIONTEST.EXEHellgrammite
Good luck. Given your comment above about running under a debugger, I just wanted to add that CDB adds virtually zero overhead in -crash mode on a decent (dual-core, 2GB RAM) machine, so don't let that hold you back from trying it.Hellgrammite
Good to know. These are Xeon 5150 Quad Core boxes with 8 Gig of RAM, so I should be alright. =DEdinburgh
Can you edit your answer to include what you've written in the comments so it's easier to see?Vernettaverneuil
@Kirkus, I wanted to let you know that I ended up using ADPlus to track down this issue, but not in exactly the way you mentioned. I ended up writing a custom config which logged the stack at each thread exit (our thread count is fairly static, so we can get away with that). This allowed us to pinpoint precisely where the problem was occurring and fix it within 24 hours of detection. Previous to the script, we had spent at least a month trying to figure it out. =DEdinburgh
F
0

You can generate debugging symbols without disabling optimizations. In fact, you should be doing that anyway. Optimized code is just harder to debug.

And the documentation for _set_se_translator says that each thread has its own SE translator. Are you setting one for each thread?

set_unexpected is probably a no-op, at least according to the VS 2005 documentation. And each thread also has its own terminate handler, so you should install that per thread as well.
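To make the per-thread point concrete, here is a minimal sketch assuming MSVC with /EHa. SehException, TranslateSeh, OnTerminate, InstallPerThreadHandlers and WorkerEntry are names invented for this example. The idea is simply that every thread entry point installs both handlers before doing any real work.

```cpp
#ifdef _MSC_VER   // MSVC-only sketch; compile with /EHa
#include <windows.h>
#include <eh.h>        // _set_se_translator
#include <process.h>   // _beginthreadex
#include <exception>   // std::set_terminate
#include <cstdlib>     // std::abort

// Hypothetical C++ wrapper for a structured exception code.
class SehException {
public:
    explicit SehException(unsigned int code) : code_(code) {}
    unsigned int code() const { return code_; }
private:
    unsigned int code_;
};

// Translator: turns the structured exception into a C++ exception.
void __cdecl TranslateSeh(unsigned int code, EXCEPTION_POINTERS*)
{
    throw SehException(code);
}

// Terminate handler: leave a trace before the process dies.
void __cdecl OnTerminate()
{
    OutputDebugStringA("terminate() called\n");
    std::abort();
}

// Both settings are per thread, so call this at the top of EVERY thread.
void InstallPerThreadHandlers()
{
    _set_se_translator(TranslateSeh);
    std::set_terminate(OnTerminate);
}

// Example thread entry point for _beginthreadex.
unsigned __stdcall WorkerEntry(void*)
{
    InstallPerThreadHandlers();
    try {
        // ... the thread's real work goes here ...
    } catch (const SehException& e) {
        // Log the code, then let the thread end; returns the SEH code.
        return e.code();
    }
    return 0;
}
#endif
```

Threads you do not create yourself (thread pool callbacks, third-party libraries) will not go through an entry point like this, which is one reason a process-wide hook can be easier to get right.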

I would also strongly recommend NOT using SE translation. It takes hardware exceptions that you shouldn't ignore (i.e., you should really log an error and terminate) and turns them into something you can ignore (C++ exceptions). If you want to catch this kind of error, use a __try/__except handler.
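A bare-bones version of the __try/__except approach might look like the following sketch (MSVC only). RunGuarded, WorkFn and NoOp are names I made up. The filter only accepts EXCEPTION_STACK_OVERFLOW and lets everything else keep searching, and the handler calls _resetstkoflw() to restore the guard page so that logging and an orderly shutdown have a chance to run.

```cpp
#ifdef _MSC_VER   // MSVC-only sketch
#include <windows.h>
#include <malloc.h>   // _resetstkoflw

typedef void (*WorkFn)();

// Runs work() and reports whether it finished without overflowing the stack.
// Keep C++ objects with destructors out of this function: MSVC can reject
// __try in functions that need object unwinding.
bool RunGuarded(WorkFn work)
{
    __try {
        work();
        return true;
    }
    __except (GetExceptionCode() == EXCEPTION_STACK_OVERFLOW
                  ? EXCEPTION_EXECUTE_HANDLER
                  : EXCEPTION_CONTINUE_SEARCH) {
        // By now the stack has unwound to this frame, so there is room to log.
        OutputDebugStringA("stack overflow caught; shutting down\n");
        _resetstkoflw();   // re-arm the guard page
        return false;      // caller should log and terminate, not carry on
    }
}

// Trivial work item, only here so the sketch can be exercised.
void NoOp() {}
#endif
```

A false return should be treated as fatal: write whatever diagnostics you can and exit, rather than continuing with state left behind by an aborted recursion.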

Fluky answered 12/1, 2009 at 20:34 Comment(5)
I do build symbols in release mode, it is not a symbol issue. I'd be happy with an address only callstack I have to decode by hand. I'm getting nothing though. I realize that set_unexpected is a no-op, but wanted to avoid shallow answers that might include it. [continued]Edinburgh
__try/__except is not really an issue, due to the magnitude of legacy code. I don't plan to ignore the issue, simply forward them. However, the info on per thread settings is good. I'll explore that avenue and upvote/check if it turns out to be the issue. Thanks for the good info! =DEdinburgh
You could also use SetUnhandledExceptionFilter or AddVectoredExceptionHandler to catch all unhandled or any exceptions in the process, respectively. MSNFluky
I do have a SetUnhandledExceptionFilter setup, which is where I will be sending the unhandled exceptions. While Threading Might be an issue for my configuration, in my testing at least, it does not seem to make a difference. (I'm causing an exception before creating additional Threads).Edinburgh
You might also be triggering a pure virtual function call, so you might want to set the purecall handler via _set_purecall_handler. And to be extra pedantic, add an exit handler via atexit. It's still possible that someone is calling _exit(...) directly. MSNFluky
