Serializable, cloneable and memory use in Java

I am using a nested class that is a subclass of HashMap. I have a String as the key and double[] as the values, and I store about 200 doubles per double[]. I should be using around 700 MB to store the keys, the pointers and the doubles. However, memory analysis reveals that I need a lot more than that (a little over 2 GB).
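
Back-of-envelope (my rough arithmetic, using the file's ~229,000 lines and ~200 doubles per line):

    229,000 arrays × 200 doubles × 8 bytes ≈ 366 MB of raw double data
    + per-array object headers, the String keys, and HashMap entry overhead

which is how I arrive at roughly 700 MB, nowhere near the 2 GB I observe.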

Using TIJmp (a profiling tool), I saw that there was a char[] using almost half of the total memory. TIJmp said that char[] came from Serializable and Cloneable. The values in it ranged from a list of fonts and default paths to messages and single characters.

What is the exact behavior of Serializable in the JVM? Is it keeping a "persistent" copy at all times, thus doubling the size of my memory footprint? How can I write binary copies of an object at runtime without turning the JVM into a memory hog?

PS: The method where the memory consumption increases the most is the one below. The file has around 229,000 lines and 202 fields per line.

public void readThetas(String filename) throws Exception
{
    long t1 = System.currentTimeMillis();
    documents = new HashMapX<String,double[]>(); // Document names to theta vectors.
    Scanner s = new Scanner(new File(filename));
    int docIndex = 0;
    if (s.hasNextLine())
        System.out.println(s.nextLine()); // Consume the useless header line.
    while (s.hasNextLine())
    {
        String[] fields = s.nextLine().split("\\s+");
        String docName = fields[1];
        numTopics = fields.length/2 - 1;
        double[] thetas = new double[numTopics];
        // Fields come in (topicIndex, value) pairs starting at index 2, so
        // iterate over the whole line, not just the first numTopics fields.
        for (int i = 2; i < fields.length - 1; i += 2)
            thetas[Integer.parseInt(fields[i].trim())] = Double.parseDouble(fields[i+1].trim());
        documents.put(docName, thetas);
        docIndex++;
        if (docIndex % 10000 == 0)
            System.out.print("*"); // Progress bar.
    }
    s.close();
    long t2 = System.currentTimeMillis();
    System.out.println("\nRead file in " + (t2-t1) + " ms");
}

Oh, and HashMapX is a static nested class declared like this:

public static class HashMapX<K, V> extends HashMap<K, V> {
    // get() with a default: returns altVal when the key is absent.
    public V get(Object key, V altVal) {
        if (this.containsKey(key))
            return this.get(key);
        else
            return altVal;
    }
}
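
For illustration, the two-argument get just supplies a fallback when a key is missing (snippet; assumes HashMapX above is in scope, and the keys are made up):

    HashMapX<String,double[]> docs = new HashMapX<String,double[]>();
    docs.put("doc42", new double[]{0.1, 0.9});
    double[] present = docs.get("doc42", new double[0]);  // stored value
    double[] absent  = docs.get("missing", new double[0]); // falls back to altVal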
Gothard answered 19/4, 2011 at 17:37 Comment(4)
Can you show some code samples? – Arsenate
Please post the tests that show that Serializable increases memory footprint. If you could post the code that shows how much RAM your Map<String,double[]> is using, that would help too. – Lifer
Let me see if I understand your statement up there. You are saying that by declaring a class Serializable, the size occupied by instances of it is bigger than if it were transient? – Asyndeton
Ok. So, I couldn't reproduce my earlier results about implementing the Serializable interface, so I will cross out that part of the post. It may have been a JVM mix-up with TIJmp. However, it still stands that for some reason TIJmp reports char[] coming from Serializable and Cloneable that grow without control. This happens even after I got rid of the HashMap altogether and used double[][] and String[] objects. My class is not implementing Serializable per se, but TIJmp reports that something is, and it is using a lot of memory. Any help is appreciated. – Gothard

So, I found the answer. It was a memory leak in my code; it had nothing to do with Serializable or Cloneable.

This code is trying to parse a file. Each line contains a set of values that I am trying to extract, and I keep some of those values in a HashMapX or some other structure.

The core of the problem is here:

        String[] fields = s.nextLine().split("\\s+");
        String docName = fields[1];

and I propagate it here:

        documents.put(docName,thetas);

What happens is that docName is a String produced by split(). On the JVMs of that era (before Java 7u6), split() was built on substring(), which returned a String sharing the backing char[] of the entire original line. By storing docName in the global HashMap documents, I kept that reference alive for the life of the program, and as long as it was alive the whole line's backing char[] could not be garbage collected. The solution:

        String docName = new String(fields[1]); // A copy with its own compact char[], not a shared view.

This copies just the characters that are needed and releases the reference to the shared backing array, so the garbage collector can free each line's memory once its fields have been processed.
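
To make this concrete, here is a minimal, self-contained sketch (the long line is fabricated for illustration):

    import java.util.HashMap;
    import java.util.Map;

    public class SubstringLeakSketch {
        public static void main(String[] args) {
            Map<String, String> kept = new HashMap<String, String>();
            // Simulate one long input line (hypothetical data):
            String line = "0 doc1 " + new String(new char[10000]).replace('\0', 'x');
            String docName = line.split("\\s+")[1];
            // On pre-7u6 JDKs, docName shares line's ~10,000-char backing array,
            // so keeping it in the map pins the whole array:
            kept.put(docName, "");
            // The fix copies just the four characters of "doc1" into a fresh array:
            kept.put(new String(docName), "");
        }
    }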

I hope this will be useful to all of those who parse large text files using split and store some of the fields in global variables.

Thanks everybody for their comments. They guided me in the right direction.

Gothard answered 21/4, 2011 at 15:12 Comment(0)

This may not address all of your questions, but here is one way in which serialization can significantly increase memory usage: http://java.sun.com/javase/technologies/core/basic/serializationFAQ.jsp#OutOfMemoryError.

In short, if you keep an ObjectOutputStream open then none of the objects that have been written to it can be garbage-collected unless you explicitly call its reset() method.
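
A minimal sketch of the pattern (the filename and batch size are made up):

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;

    public class ResetSketch {
        public static void main(String[] args) throws IOException {
            ObjectOutputStream out =
                    new ObjectOutputStream(new FileOutputStream("thetas.bin"));
            try {
                for (int i = 0; i < 1000000; i++) {
                    // Every object written is remembered in the stream's handle
                    // table (to preserve reference identity), so it cannot be
                    // garbage-collected...
                    out.writeObject(new double[200]);
                    if (i % 10000 == 0)
                        out.reset(); // ...until reset() clears that table.
                }
            } finally {
                out.close();
            }
        }
    }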

Blouin answered 19/4, 2011 at 17:54 Comment(1)
This is a good lead, provided that the objects in question are actually being serialized. The original post only suggests that memory increases by making the classes serializable, and the developer's tests on non-serializable dummy classes showed a smaller footprint (though we don't yet know how that evaluation was carried out); if that is the case, the root cause should be something else. Honestly, I am inclined to believe that your explanation is the most logical one so far. – Asyndeton