Very strange OutOfMemoryError
Asked Answered
C

6

12

As always, a lengthy problem description.

We are currently stress testing our product - and we now we face a strange problem. After one to two hours, heap space begins to grow, the application dies sometime later.

Profiling the application shows a very large amount of Finalizer objects, filling the heap. Well, we thought "might be the weird finalizer thread to slow" issue and reviewed for reducing the amount of objects that need to be finalized (JNA native handles in this case). Good idea anyway and reduced thousands of new objects...

The next tests showed the same pattern, only one hour later and not so steep. This time the Finalizers originated from the FileInput- and FileOutput streams that are heavily used in the testbed. All resources are closed, but the Finalizers not cleaned up anymore.

I have no idea why after 1 or 2 hours (no exceptions) the FinalizerThread seems suddenly to stop working. If we force System.runFinalization() by hand in some of our threads, the profiler shows that the finalizers are cleaned up. Resuming the test immediately causes new heap allocation for Finalizers.

The FinalizerThread is still there, asking jConsole he's WAITING.

EDIT

First, inspecting the heap with HeapAnalyzer revealed nothing new/strange. HeapAnalyzer has some nice features, but i had my difficulties at first. Im using jProfiler, which comes along with nice heap inspection tools and will stay with it.

Maybe i'm missing some killer features in HeapAnalyzer?

Second, today we set up the tests with a debug connection instead of the profiler - the system is stable for nearly 5 hours now. This seems to be a very strange combination of too much Finalizers (that have been reduced in the first review), the profiler and the VM GC strategies. As everything runs fine at the moment, no real insights...

Thanks for the input so far - maybe you stay tuned and interested (now that you may have more reason to believe that we do not talk over a simple programming fault).

Cristalcristate answered 3/5, 2012 at 17:8 Comment(10)
Might want to give java.sun.com/developer/technicalArticles/javase/finalization a read.Fleta
Some ideas: #8355564Mainland
Finalizers are really not dependable. Avoid them.Graehl
@Fleta I did so, thanks. I think (thought) i'm fluent enough with GC details (until now). And here it's not even that i am implementing "finalize" - we talk about file streams...Cristalcristate
@Louis I feared someone would say this :-) I will forward it to oracle...Cristalcristate
Effective Java item 7 discusses some techniques for avoiding finalizers, and why you should avoid finalizers.Graehl
Have to ask - what platform you're running on and what Java version?Cephalothorax
@Louis - again, i do not get how this could be of any help. read the question.Cristalcristate
@matt not yet. we are monitoring the vm with a profiler and jconsole, so i think we have a good view on the activities. What could i learn from this log?Cristalcristate
@Cephalothorax The test is running on Windows 7 Professional, both 32 and 64 bit fail the same. Java version is 1.6.21Cristalcristate
C
3

I want to close this question with a summary of the current state.

The last test is now up over 60 hours without any problems. That leads us to the following summary/conclusions:

  • We have a high throughput server using lots of objects that in the end implement "finalize". These objects are mostly JNA memory handles and file streams. Building the Finalizers faster than GC and finalizer thread are able to clean up, this process fails after ~3 hours. This is a well known phenomenon (-> google).
  • We did some optimizations so the server got rid of nearly all the JNA Finalizers. This version was tested with jProfiler attached.
  • The server died some hours later than our initial attempt...
  • The profiler showed a huge amount of finalizers, this time caused mostly only by file streams. This queue was not cleaned up, even after pausing the server for some time.
  • Only after manually triggering "System.runFinalization()", the queue was emptied. Resuming the server started to refill...
  • This is still inexplicable. We now guess this is some profiler interaction with GC/finalization.
  • To debug what could be the reason for the inactive finalizer thread we detached the profiler and attached the debugger this time.
  • The system was running without noticeable defect... FinalizerThread and GC all "green".
  • We resumed the test (now for the first time again without any agents besides jConsole attached) and its up and fine now for over 60 hours. So apparently the initial JNA refactoring solved the issue, only the profiling session added some indeterminism (greetings from Heisenberg).

Other strategies for managing the finalizers are for example discussed in http://cleversoft.wordpress.com/2011/05/14/out-of-memory-exception-from-finalizer-object-overflow/ (besides the not overly clever "don't use finalizers"..).

Thank's for all your input.

Cristalcristate answered 7/5, 2012 at 10:57 Comment(0)
H
1

Difficult to give a specific answer to your dilemma but take a heap dump and run it through IBM's HeapAnalyzer. Search for "heap analyzer at: http://www.ibm.com/developerworks (direct link keeps changing). Seems highly unlikely the finalizer thread "suddenly stops working" if you are not overriding finalize.

Hellen answered 3/5, 2012 at 17:23 Comment(5)
"If you eliminate the impossible, whatever remains, however improbable, must be the truth"... what conclusion should i find?Cristalcristate
Are you saying "I don't know how to use HeapAnalyzer." ?Hellen
No - i'm just desperate.. And when you say "it's unlikely.." that reminded me of the great Spock citation :-) I will give it a try tomorrow..Cristalcristate
Agree - I think in the end you will discover there is a bug in your product. I'm just suggesting that you look at objects on the heap to help you identify the code that is allocating the objects that are not being cleaned up. HeapAnalyzer is a great tool to identify suspected leaks.Hellen
I really don't think so, but i will give it a try. Remember, the queue is emptied when triggered by hand. There's no leak in the sense that some object is referenced by some obscure path. Neither code inspection nor the profiler shows other GC roots but the Finalizer...Cristalcristate
S
1

It is possible for the Finalizer to be blocked, but I don't know how it could simply die.

If you have lots of FileInputStream and FileOutputStream finalize() methods, this indicates you are not closing your files correctly. Make sure that these stream are always closed in a finally block or use Java 7's ARM. (Automatic Resource Management)

jConsole he's WAITING.

To be WAITING it has to be waiting on an object.

Stravinsky answered 3/5, 2012 at 17:25 Comment(5)
This seems obvious, but i can assure you that it is not the case. The testbed uses mature tool methods, all of them closing in finally blocks. The test runs absolutely without leak for hours before degenerating. In addition, i don't get the statement "if you have ... finalize() methods" right. I do not have finalize methods - the streams do have.. and the Finalizers created for them are not cleaned up.. and when in the profiler, the descriptor references from the stream are all reset to -1 (so they are closed)Cristalcristate
I mean if these classes finalize methods are showing up as significant. From what you suggest, if they are showing up its because something else is delaying the queue or the finalizer thread is no longer the problem.Stravinsky
If its waiting for the queue, that suggests its empty.Stravinsky
This is what one should assume. But, when - There is nothing in the queue - There are a lot of Finalizers - They reference otherwise unreferenced objects who is to blame?Cristalcristate
Objects referenced by objects which need to be finalized first, can only be cleaned up on the next GC cycle.Stravinsky
C
1

Both FileInputStream and FileOutputStream have same comment in their finalize() methods:

. . .
/*
 * Finalizer should not release the FileDescriptor if another
 * stream is still using it. If the user directly invokes
 * close() then the FileDescriptor is also released.
 */
     runningFinalize.set(Boolean.TRUE); 
. . . 

which means your Finalizer may be waiting for stream to be released. Which means that, as Joop Eggen mentioned above, your app may be doing something bad when closing one of the streams.

Cephalothorax answered 3/5, 2012 at 18:24 Comment(2)
this is not the case when i look in my JDK 1.6.21. What version do you use?Cristalcristate
I will revisit this once again. But i don't see how this may explain the starvation of the finalization. Remember, call the runFinalization by hand cleans up the queue.Cristalcristate
H
0

My guess: it is an overriden close in your own stream (wrapper) classes. As the stream classes often are wrappers and do delegate to others, I could imagine that such a nested new A(new B(new C())) might cause some wrong logic on close. You should look for twice closing, delegate closing. And maybe still some forgotten close (close on the wrong object?).

Halliehallman answered 3/5, 2012 at 17:32 Comment(1)
Weird, maybe something totally different, like multiple accesses/deletes/creates to the same file name? That might cause some instability too, especially under Windows.Halliehallman
O
0

With a slow growing heap the Java garbage collector can run out of memory when it tries to belatedly garbage collect in a low memory situation. Try turning on the concurrent mark and sweep garbage collection with -XX:+UseConcMarkSweepGC and see if your problem goes away.

Obsolescent answered 3/5, 2012 at 17:35 Comment(2)
The heap is very active - not slow growing. There are a lot of garbage collects while running. Only after an hour the Finalizers are not collected anymore.Cristalcristate
Even in an active heap Java will often defer the garbage collection of some objects, which may account for the problems with your finalizer code.Obsolescent

© 2022 - 2024 — McMap. All rights reserved.