I had this problem a while back. For richly connected classes, even if you are able to complete the serialization without a stack overflow, the serialization is dog slow. When we solved this problem, we had a handful of classes, so we just created our own serialization format that packed the data into a set of integer object ids, with integer fields ids for each field and described their connections through a series of object id, field id, other object id mappings. This custom approach was very fast and extremely light on memory, but really only works if you have a small set of classes you want to serialize.
The general case is much harder and the demand for serialization of richly connected classes is not that strong, so I would guess that's why no ones solved it.
You're basically onto the problem already though, you will always need a stack depth equal to the maximum height of a Depth First Search tree and so any time your graph is deeper than that, you'll get a stack overflow. It is fundamentally a recursive problem, so you're going to either need to use recursion or fake recursion by moving the stack allocations to a Stack object you put on the heap. I would take a look at the OpenJDK implementation:
http://hg.openjdk.java.net/jdk6/jdk6-gate/jdk/file/tip/src/share/classes/java/io/ObjectOutputStream.java
You already have a DebugTraceInfoStack, I would create a second Stack field for the current object you are writing and change the writeObject0 method to push the Object onto the stack, something like this:
stack.push(obj);
while(!stack.empty()) {
obj = stack.pop();
...
Then you just change all calls to writeObject0(x); to stack.push(x);. Simple, standard conversion between recursion and iteration, except the class is almost 2500 lines and there are probably tons of gotchas.
If you do end up building it though, I would advocate submitting at as a patch to the next version of java as it would be useful, something like IterativeObjectOutputStream for use on deep object graphs.