How are SoftReferences collected by JVMs in practice?

I have two separate caches running in a JVM (one controlled by a third party library) each using soft references. I would prefer for the JVM to clear out my controlled cache before the one controlled by the library. The SoftReference javadoc states:

All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError. Otherwise no constraints are placed upon the time at which a soft reference will be cleared or the order in which a set of such references to different objects will be cleared. Virtual machine implementations are, however, encouraged to bias against clearing recently-created or recently-used soft references.

Direct instances of this class may be used to implement simple caches; this class or derived subclasses may also be used in larger data structures to implement more sophisticated caches. As long as the referent of a soft reference is strongly reachable, that is, is actually in use, the soft reference will not be cleared. Thus a sophisticated cache can, for example, prevent its most recently used entries from being discarded by keeping strong referents to those entries, leaving the remaining entries to be discarded at the discretion of the garbage collector.

How do common JVM implementations, especially HotSpot, handle SoftReferences in practice? Do they "bias against clearing recently-created or recently-used soft references", as the spec encourages?
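For context, the "sophisticated cache" idea from the javadoc quoted above could look roughly like the sketch below. This is only an illustration, not code from either cache in question: the class name `MruSoftCache` and its methods are made up, and it assumes an access-ordered `LinkedHashMap` is an acceptable way to keep the most recently used entries strongly reachable.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical cache: every value is softly referenced, the N most recently used are also pinned strongly. */
final class MruSoftCache<K, V> {
    private final Map<K, SoftReference<V>> soft = new HashMap<>();
    // Access-ordered map that drops its eldest entry once it holds more than maxStrong values.
    private final LinkedHashMap<K, V> recent;

    MruSoftCache(final int maxStrong) {
        this.recent = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxStrong;
            }
        };
    }

    synchronized void put(K key, V value) {
        soft.put(key, new SoftReference<>(value));
        recent.put(key, value);
    }

    /** Returns the cached value, or null if it was never cached or the collector cleared it. */
    synchronized V get(K key) {
        SoftReference<V> ref = soft.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            soft.remove(key);   // referent was collected; drop the stale entry
            return null;
        }
        recent.put(key, value); // pin again as most recently used
        return value;
    }

    synchronized int stronglyHeldCount() {
        return recent.size();
    }
}
```

Entries in `recent` cannot be cleared because they stay strongly reachable; everything else is left to whatever policy the JVM applies, which is exactly the part I am asking about.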

Mobster answered 29/3, 2012 at 19:47 Comment(5)
relevant? How Hotspot Decides to Clear SoftReferencesPamper
@MiserableVariable Yes, quite relevant, though your link is missing a letter: jeremymanson.blogspot.com/2009/07/…Mobster
It wasn't missing it earlier, but I had to act smart and make a link out of it. Since I refuse to read documentation and want to do it by trial and error, some real content gets deleted sometimes. Who cares, as long as it looks nice, right?Pamper
If you want to move your comment to answer I'll give you an upvote and some bounty - that's been the most helpful source so far.Mobster
Previously I did not write it in an answer because I was only quoting the blog. But I did learn something from the question and the blog, so after your suggestion I have added an answer.Pamper
D
8

Looks like it could be tuneable, but it isn't. The concurrent mark-sweep collector hangs on the default heap's implementation of must_clear_all_soft_refs() which apparently is only true when performing a _last_ditch_collection.

bool GenCollectedHeap::must_clear_all_soft_refs() {
  return _gc_cause == GCCause::_last_ditch_collection;
}

Normal handling of a failed allocation, meanwhile, makes three successive calls to the heap's do_collection method, in CollectorPolicy.cpp:

HeapWord* GenCollectorPolicy::satisfy_failed_allocation(size_t size,
                                                    bool   is_tlab) {

It tries to collect, then to reallocate, then to expand the heap if that fails, and finally, as a last-ditch effort, to collect while clearing soft references.

The comment on the last collection is quite telling (and the only one that triggers clearing soft refs)

  // If we reach this point, we're really out of memory. Try every trick
  // we can to reclaim memory. Force collection of soft references. Force
  // a complete compaction of the heap. Any additional methods for finding
  // free memory should be here, especially if they are expensive. If this
  // attempt fails, an OOM exception will be thrown.
  {
    IntFlagSetting flag_change(MarkSweepAlwaysCompactCount, 1); // Make sure the heap is fully compacted

    gch->do_collection(true             /* full */,
                       true             /* clear_all_soft_refs */,
                       size             /* size */,
                       is_tlab          /* is_tlab */,
                       number_of_generations() - 1 /* max_level */);
  }

--- Edited in response to the obvious, I was describing weak references, not soft ones ---

In practice, I would imagine that SoftReferences are only "not" followed when the JVM is called for garbage collection in an attempt to avoid an OutOfMemoryError.

For SoftReferences to be compatible with all four Java 1.4 garbage collectors, and with the new G1 collector, the decision must lie only with the reachability determination. By the time that reaping and compacting occur, it is far too late to decide if an object is reachable. This suggests (but does not require) that a collection "context" exists which determines reachability based on free memory availability in the heap. Such a context would have to indicate not following SoftReferences prior to attempting to follow them.

Since garbage collection to avoid an OutOfMemoryError is specially scheduled as a full, stop-the-world collection, it is not hard to imagine a scenario where the heap manager sets a "don't follow SoftReference" flag before the collection occurs.

--- Ok, so I decided that a "must work this way" answer just wasn't good enough ---

From the source code src/share/vm/gc_implementation/concurrentMarkSweep/vmCMSOperations.cpp (highlights are mine)

The operation to actually "do" garbage collection:

void VM_GenCollectFullConcurrent::doit() {

We better be a VM thread, otherwise a "program" thread is garbage collecting!

  assert(Thread::current()->is_VM_thread(), "Should be VM thread");

We are a concurrent collector, so we better be scheduled concurrently!

  assert(GCLockerInvokesConcurrent || ExplicitGCInvokesConcurrent, "Unexpected");

Grab the heap (which has the GCCause object in it).

  GenCollectedHeap* gch = GenCollectedHeap::heap();

Check to see if we need a foreground "young" collection

  if (_gc_count_before == gch->total_collections()) {
    // The "full" of do_full_collection call below "forces"
    // a collection; the second arg, 0, below ensures that
    // only the young gen is collected. XXX In the future,
    // we'll probably need to have something in this interface
    // to say do this only if we are sure we will not bail
    // out to a full collection in this attempt, but that's
    // for the future.

Are the program threads not meddling with the heap?

    assert(SafepointSynchronize::is_at_safepoint(),
      "We can only be executing this arm of if at a safepoint");

Record the garbage collection cause (the reason for this collection) on the heap.

    GCCauseSetter gccs(gch, _gc_cause);

Do a full collection of the young space

Note that this passes in the value of the heap's must_clear_all_soft_refs flag, which in an OutOfMemory scenario must have been set to true, and which in that case directs "do_full_collection" not to follow the soft references.

    gch->do_full_collection(gch->must_clear_all_soft_refs(),
                            0 /* collect only youngest gen */);

The _gc_cause is an enum, which is (guesswork here) set to _allocation_failure in the first attempt at avoiding OutOfMemoryError and to _last_ditch_collection after that fails (to attempt to collect transient garbage).

A quick look in the memory "heap" module shows that in do_full_collection, which calls do_collection, soft references are cleared explicitly (under the "right" conditions) with the line

  ClearedAllSoftRefs casr(do_clear_all_soft_refs, collector_policy());

--- Original post follows for those who want to learn about weak references ---

In the Mark and Sweep algorithm, soft references are not followed from the main thread (and thus their targets are not marked unless a different branch could reach them through non-soft references).

In the copy algorithm, objects that soft references point to are not copied (again, unless they are reached by a different non-soft reference).

Basically, when following the web of references from the "main" thread of execution, soft references are not followed. This allows their objects to be garbage collected just as if they didn't have references pointing to them.

It is important to mention that soft references are almost never used in isolation. They are typically used in objects where the design is to have multiple references to the object, but only one reference need be cleared to trigger garbage collection (for ease of maintaining the container, or run time performance of not needing to look up expensive references).
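Since this part of the post really describes weak references, a small experiment can show the difference on a given JVM. The sketch below is only illustrative: the class and helper names are made up, `System.gc()` is merely a hint, and whether either reference gets cleared depends on the JVM and its settings, so no particular output is promised.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

/** Hypothetical demo comparing how quickly weak and soft references are cleared. */
final class WeakVsSoftDemo {
    /** Hints at GC up to 'attempts' times; returns true if the reference was observed cleared. */
    static boolean clearedAfterGcHints(Reference<?> ref, int attempts) {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc(); // only a request; the JVM is free to ignore it
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        Object weakTarget = new Object();
        Object softTarget = new Object();
        WeakReference<Object> weak = new WeakReference<>(weakTarget);
        SoftReference<Object> soft = new SoftReference<>(softTarget);

        // Drop the strong references so only the weak/soft ones remain.
        weakTarget = null;
        softTarget = null;

        System.out.println("weak cleared: " + clearedAfterGcHints(weak, 10));
        System.out.println("soft cleared: " + clearedAfterGcHints(soft, 10));
    }
}
```

While a strong reference to the target still exists, neither kind of reference can be cleared, which is the one part that is guaranteed by the javadoc.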

Discrimination answered 29/3, 2012 at 19:54 Comment(5)
It sounds like you are describing weak references rather than soft references.Mobster
@DaveL. That's enough for me tonight. If you need more specific answers, they are probably rather close by in the JVM code reference above.Discrimination
Edwin, thanks for the research. If I understand correctly, this indicates that it will clear all soft references in an OutOfMemory scenario, which certainly makes sense. The question then is whether it will still clear some even if that flag is not set. The HotSpot FAQ and Jeremy Manson's blog indicate an algorithm for that happening too.Mobster
@DaveL. I think the algorithm was planned, but it wasn't in this HotSpot branch (at least it wasn't in it nine months ago). The collectors tend to rely on the heap's must_clear_all_soft_refs() method, which only returns true on a _last_ditch_collection, which is typically only performed along the path to an OutOfMemoryError. There is no reason why this has to be the only way it is done, and a lot of the framework indicates that it could be tuned. Perhaps they just couldn't find a fast, clean way to fire the trigger. It would require double-tracking of memory stats (soft and not).Discrimination
While this might help to understand what happens, please note that this is just an implementation - it may change at will. If you want to play it safe, stick to the specification in the Javadoc for SoftReferences.Teacup
M
4

Found one piece of information in a HotSpot FAQ, that may be outdated: http://www.oracle.com/technetwork/java/hotspotfaq-138619.html#gc_softrefs

What determines when softly referenced objects are flushed?

Starting with 1.3.1, softly reachable objects will remain alive for some amount of time after the last time they were referenced. The default value is one second of lifetime per free megabyte in the heap. This value can be adjusted using the -XX:SoftRefLRUPolicyMSPerMB flag, which accepts integer values representing milliseconds. For example, to change the value from one second to 2.5 seconds, use this flag:

-XX:SoftRefLRUPolicyMSPerMB=2500

The Java HotSpot Server VM uses the maximum possible heap size (as set with the -Xmx option) to calculate free space remaining.

The Java Hotspot Client VM uses the current heap size to calculate the free space.

This means that the general tendency is for the Server VM to grow the heap rather than flush soft references, and -Xmx therefore has a significant effect on when soft references are garbage collected.

On the other hand, the Client VM will have a greater tendency to flush soft references rather than grow the heap.

The behavior described above is true for 1.3.1 through Java SE 6 versions of the Java HotSpot VMs. This behavior is not part of the VM specification, however, and is subject to change in future releases. Likewise the -XX:SoftRefLRUPolicyMSPerMB flag is not guaranteed to be present in any given release.

Prior to version 1.3.1, the Java HotSpot VMs cleared soft references whenever it found them.

Even more detail is available at: http://jeremymanson.blogspot.com/2009/07/how-hotspot-decides-to-clear_07.html (courtesy of MiserableVariable's comment)
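To make the FAQ's arithmetic concrete, here is a rough model of the rule as I read it: the allowed idle time is the free heap in megabytes multiplied by the flag value. This is not HotSpot code. The class `SoftRefLruModel` and its method names are invented for illustration, and it assumes the threshold is a simple product and that "free space" means heap size minus used bytes.

```java
/** Rough model of the FAQ's rule for when a soft reference becomes eligible for clearing. */
final class SoftRefLruModel {
    private static final long MB = 1024L * 1024L;

    /** Milliseconds a soft reference may go unused before it becomes eligible for clearing. */
    static long maxIdleMillis(long freeBytes, long msPerMb) {
        return (freeBytes / MB) * msPerMb;
    }

    /** Server-style estimate: free space measured against the maximum heap (-Xmx). */
    static long serverStyle(long maxHeapBytes, long usedBytes, long msPerMb) {
        return maxIdleMillis(maxHeapBytes - usedBytes, msPerMb);
    }

    /** Client-style estimate: free space measured against the current heap size. */
    static long clientStyle(long currentHeapBytes, long usedBytes, long msPerMb) {
        return maxIdleMillis(currentHeapBytes - usedBytes, msPerMb);
    }

    /** True once the reference has been idle for longer than the allowed interval. */
    static boolean eligibleForClearing(long idleMillis, long maxIdleMillis) {
        return idleMillis > maxIdleMillis;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long msPerMb = 1000; // the default quoted in the FAQ
        System.out.println("server-style ms: " + serverStyle(rt.maxMemory(), used, msPerMb));
        System.out.println("client-style ms: " + clientStyle(rt.totalMemory(), used, msPerMb));
    }
}
```

With 100 MB free and the default of 1000 ms per MB, the model gives 100,000 ms of idle time. The gap between the server-style and client-style numbers on a JVM whose heap has not yet grown to -Xmx shows why the Server VM tends to grow the heap before it flushes soft references.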

Mobster answered 3/4, 2012 at 21:59 Comment(0)
T
3

Whatever the answer is, relying on a particular strategy would make your software unreliable because every JVM implementation may be different. Even for a given JVM, configuring it differently may alter the exact strategy and break your software. In short, it is an error to rely on a particular strategy.

What type of resource is your cache managing? If it's a purely heap-allocated object, then the strategy should not matter. Using a ReferenceQueue might help you get notified when a SoftReference gets cleared, though.
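A minimal sketch of the ReferenceQueue idea, assuming a map-backed cache: the names `NotifyingSoftCache`, `KeyedRef`, `drainCleared` and `simulateClear` are hypothetical, and `simulateClear` exists only so the behaviour can be demonstrated without waiting for the collector.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache that learns about cleared soft references through a ReferenceQueue. */
final class NotifyingSoftCache<K, V> {
    /** Soft reference that remembers its key so the stale map entry can be located. */
    private static final class KeyedRef<K, V> extends SoftReference<V> {
        final K key;

        KeyedRef(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, KeyedRef<K, V>> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    void put(K key, V value) {
        drainCleared();
        map.put(key, new KeyedRef<>(key, value, queue));
    }

    V get(K key) {
        drainCleared();
        KeyedRef<K, V> ref = map.get(key);
        return ref == null ? null : ref.get();
    }

    /** Removes entries whose references were enqueued; returns how many were removed. */
    int drainCleared() {
        int removed = 0;
        Reference<? extends V> ref;
        while ((ref = queue.poll()) != null) {
            @SuppressWarnings("unchecked")
            KeyedRef<K, V> keyed = (KeyedRef<K, V>) ref;
            // remove(key, value) avoids deleting a newer mapping for the same key
            if (map.remove(keyed.key, keyed)) {
                removed++;
            }
        }
        return removed;
    }

    /** Demo-only hook: enqueues the reference as if the collector had cleared it. */
    boolean simulateClear(K key) {
        KeyedRef<K, V> ref = map.get(key);
        return ref != null && ref.enqueue();
    }

    int size() {
        return map.size();
    }
}
```

Polling the queue on each access keeps the sketch single-threaded; a dedicated thread blocking on `ReferenceQueue.remove()` would be another option if prompt notification matters.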

If the resource type is not only a heap-allocated object, then you must require your users to call an explicit release method, e.g. Closeable.close(). In order to protect against "forgotten" calls to this release method, you may consider implementing a finalize() method, but beware of its side effects. For more information about this, I recommend reading "Item 7: Avoid finalizers" from Joshua Bloch's "Effective Java (2nd Edition)".
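As a sketch of the explicit-release approach, assuming a made-up resource type called `NativeHandle` that stands in for whatever non-heap resource the cache holds:

```java
import java.io.Closeable;

/** Hypothetical non-heap resource with an explicit, idempotent release method. */
final class NativeHandle implements Closeable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        // Idempotent: closing an already closed handle is harmless.
        open = false;
    }
}
```

Callers can then rely on try-with-resources, for example `try (NativeHandle h = new NativeHandle()) { ... }`, so the release does not depend on when, or whether, the collector gets around to the object.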

Teacup answered 29/3, 2012 at 20:19 Comment(1)
Still, sometimes it is helpful to understand how your JVM is implemented and how it behaves.Mobster
P
2

Not that this is authoritative, but having used SoftReference in anger, I have never seen the VM flush them instead of increasing the VM size. Actually, I somehow assumed that to be the case, and the design very much depended on it. I did have the same -ms and -mx, but that should not matter.

But I cannot find any spec that actually says this is required. This blog seems to go into great detail of how SoftReferences are flushed. From a quick read it indeed seems like they can be cleared even if other memory is available.

Pamper answered 4/4, 2012 at 16:18 Comment(0)
P
0

Just brainstorming. If you want your cache to get cleared before the other cache, maybe you can link the two? Perhaps by keeping a strong reference to the entries in the second cache and only releasing those references when members of your own cache are cleared?

Seems convoluted. I would probably lean toward simply accepting that both caches are exactly that, a cache. Cache misses might be painful to performance, but at least your software won't have a convoluted cache management strategy.

Pitts answered 3/4, 2012 at 22:19 Comment(0)
