Deep graph results in stack overflow: non-recursive serialization options?
Asked Answered
D

4

6

We're getting StackOverflowErrors from Java's serialization library. The problem is that the default serialization implementation is recursive, the depth of which is bounded only by the longest path through a network of references.

We realize we could override the default methods but we have hundreds of richly-connected classes in our project so we're not enthusiastic about the override approach. We're more interested if there is a generalized solution that is non-recursive (or at least moves the recursion from the stack to the heap).

I googled this topic and found only lots of people bitterly complaining about the same thing but most of these complaints were from many years ago. Has the situation improved? If not and we write a generalized implementation, do you have any advice? We're presuming there is some reason (not yet obvious to us) why no one has cracked this nut. In theory, doing it 'right' sounds like it should be feasible.

Davao answered 16/9, 2011 at 1:42 Comment(5)
I wonder what the serializer will do with circular references? My guess is infinite loop leading to stack overflow.Wirer
It sounds like you have lots and lots of tightly coupled code. Bad juju.Dimmick
Is this really a problem with a path that is "too long" (but of a fixed length) or does it occur because a path is never terminating? The stack can contain quit a number of frames so I suspect it is the latter. If it is the latter, a serializer that knows how to handle cycles must be used (either by pruning or by adding "links" in the serialized data).Hannahhannan
Also, the stack size can be increased (again, only useful if the object graph actually terminates): See How to increase Java stack size? (On a another note, using the standard serializer? That should support cycles ...)Hannahhannan
Java's default serializer supports circular references. Indeed, we successfully serialize moderate-size objects that have circular references. But after a certain size, we do blow out the stack.Davao
P
2

I had this problem a while back. For richly connected classes, even if you are able to complete the serialization without a stack overflow, the serialization is dog slow. When we solved this problem, we had a handful of classes, so we just created our own serialization format that packed the data into a set of integer object ids, with integer fields ids for each field and described their connections through a series of object id, field id, other object id mappings. This custom approach was very fast and extremely light on memory, but really only works if you have a small set of classes you want to serialize.

The general case is much harder and the demand for serialization of richly connected classes is not that strong, so I would guess that's why no ones solved it.

You're basically onto the problem already though, you will always need a stack depth equal to the maximum height of a Depth First Search tree and so any time your graph is deeper than that, you'll get a stack overflow. It is fundamentally a recursive problem, so you're going to either need to use recursion or fake recursion by moving the stack allocations to a Stack object you put on the heap. I would take a look at the OpenJDK implementation:

http://hg.openjdk.java.net/jdk6/jdk6-gate/jdk/file/tip/src/share/classes/java/io/ObjectOutputStream.java

You already have a DebugTraceInfoStack, I would create a second Stack field for the current object you are writing and change the writeObject0 method to push the Object onto the stack, something like this:

stack.push(obj);
while(!stack.empty()) {
    obj = stack.pop();
    ...

Then you just change all calls to writeObject0(x); to stack.push(x);. Simple, standard conversion between recursion and iteration, except the class is almost 2500 lines and there are probably tons of gotchas.

If you do end up building it though, I would advocate submitting at as a patch to the next version of java as it would be useful, something like IterativeObjectOutputStream for use on deep object graphs.

Prospero answered 16/9, 2011 at 5:57 Comment(2)
What you say makes sense and your suggestion on how to fix it sounds like the right approach. Alas, you're surely correct about the "tons of gotchas" too. Thanks for the tip about openjdk - I wasn't aware of that. Curiously, it looks to be identical to Sun's library - at least for the parts I have studied, the implementation is the same. The variables are even named identically!Davao
Yeah, Sun opened sourced the JDK under the OpenJDK project, I think they kept their fork separate for a while (although many of the classes would see no changes), but for Java 7, the official reference implementation is the OpenJDK one.Prospero
L
1

Proof that JDK 6 serialization can handle recursive object graphs:

public static void main(String[] args) throws Exception {
    Foo foo = new Foo("bob");
    foo.setBar(new Bar("fred", foo));
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(baos);
    out.writeObject(foo);
    out.close();
    ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(baos.toByteArray()));
    Object o = in.readObject();
    System.out.println(o);
}

static class Foo implements Serializable {
    String name;
    Bar bar;

    Foo(String name) {
        this.name = name;
    }

    void setBar(Bar bar) {
        this.bar = bar;
    }

    @Override
    public String toString() {
        return "Foo{" +
                "name='" + name + '\'' +
                ", bar=" + bar +
                '}';
    }
}

static class Bar implements Serializable {
    String name;
    Foo foo;

    Bar(String name, Foo foo) {
        this.name = name;
        this.foo = foo;
    }

    @Override
    public String toString() {
        return "Bar{" +
                "name='" + name + '\'' +
                '}';
    }
}

Output:

Foo{name='bob', bar=Bar{name='fred'}}
Laster answered 16/9, 2011 at 2:53 Comment(0)
A
0

It appears I did not read the question very well. You seem to be interested in serializing properties that may contain circular references. If this assumption is incorrect, and you would be fine with NOT serializing these objects which contain circular references, please refer to my original answer below.

New Answer

I think you would need to keep track of which objects have been serialized, and I cannot see this happening, unless you do it yourself. It shouldn't be too difficult though.

On these objects that contain circular references you can keep a transient boolean representing whether or not the object has been serialized already. Then, you must override the default serializing behavior, but this can be done with just a few lines.

public void writeExternal(ObjectOutput out) {
    if(!out.serialized) {
        out.serializeMethod();
    }
    out.serialized = true;
}

Original Answer

Take a look at the transient keyword

I would imagine that most serialization libraries would honor the transient keyword. If a member is transient it is meant to be excluded from serialization.

class Something {
    private Dog dog; // I will be serialized upon serialization.
    private transient SomethingElse somethingElse; // I will not be serialized upon serialization.
}

class SomethingElse {
    private Cat cat; // I will be serialized upon serialization.
    private transient Something something; // I will not be serialized upon serialization.
}

If you have recursive members similar to the above scenario, you would want to mark one or the other (or both) as transient so that this overflow does not happen.

Advertising answered 16/9, 2011 at 1:58 Comment(3)
Q: "How can we serialize our object graphs without recursion?" A: "This way you won't serialize your objects at all!".. well not really a solution that is.Aalesund
@Voo: Nice find. I have updated my answer with a potential solution for the actual problem! :PAdvertising
Java serialization supports circular references. That's not the cause of our stack overflow. (So the transient keyword doesn't help.)Davao
H
0

GWT RPC serialization is basically equivalent to JVM serialization, and both use a stack/recursive technique. Unfortunately this doesn't work well with slicing work into chunks (which you need to do if you're working in the browser, i.e. with GWT), so here's a non-recursive approach: https://github.com/nevella/alcina/blob/d3e37df57709620f7ad54d3d59b997e9c4c7d883/extras/rpc/client/src/com/google/gwt/user/client/rpc/impl/ClientSerializationStreamReader.java

Essentially, convert the serialization into three passes: * instantiate objects * set properties (via linking) * populate collections

Two tricks: some objects need properties on instantiation (e.g. Date) and you need to populate collections last since they may need the hashes of their members.

This allows non-recursive de-serialization - but in fact the non-recursive serialization is even simpler (as long as there's no custom writeReplace/readResolve), just within writeObject maintain two queues of objects-not-serialized, properties-not-serialized-for-current-object and have serialize-object-property push to the stack with a marker rather than make a recursive call.

There's a very basic example of that here: http://www.w3.org/2006/02/Sierra10022006.pdf

Hairworm answered 20/5, 2015 at 6:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.