Gracefully finalizing the SoftReference referent
G

4

10

I am using a search library that advises keeping the search handle object open, because this benefits its query cache. Over time I have observed that the cache tends to bloat (a few hundred megabytes and still growing), and OOMs have started to kick in. There is no way to limit this cache or to plan how much memory it can use. I have increased the -Xmx limit, but that is only a temporary fix.

So I am thinking of making this object the referent of a java.lang.ref.SoftReference. If the system runs low on free memory, it would let the object go and a new one would be created on demand. Queries would be slower right after a fresh handle is created, but that is a much better alternative than hitting an OOM.

The only problem I see with SoftReferences is that there is no clean way to get their referents finalized. In my case, I need to close the search handle before it is destroyed, otherwise the system might run out of file descriptors. Obviously, I can wrap this handle in another object, write a finalizer on it (or hook onto a ReferenceQueue/PhantomReference) and let go. But hey, every single article on this planet advises against using finalizers, and especially against finalizers for freeing file handles (e.g. Effective Java, 2nd ed., page 27).

So I am somewhat puzzled. Should I just ignore all this advice and go ahead? If not, are there any other viable alternatives? Thanks in advance.

EDIT #1: The text below was added after testing some code as suggested by Tom Hawtin. It appears to me that either the suggestion isn't working or I am missing something. Here's the code:

class Bloat {  // just a heap filler really
   private double a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z;

   private final int ii;

   public Bloat(final int ii) {
      this.ii = ii;
   }
}

// as recommended by Tom Hawtin
class MyReference<T> extends SoftReference<T> {
   private final T hardRef;

   MyReference(T referent, ReferenceQueue<? super T> q) {
      super(referent, q);
      this.hardRef = referent;
   }
}

//...meanwhile, somewhere in the neighbouring galaxy...
{
   ReferenceQueue<Bloat> rq = new ReferenceQueue<Bloat>();
   Set<SoftReference<Bloat>> set = new HashSet<SoftReference<Bloat>>();
   int i=0;

   while(i<50000) {
//      set.add(new MyReference<Bloat>(new Bloat(i), rq));
      set.add(new SoftReference<Bloat>(new Bloat(i), rq));

//      MyReference<Bloat> polled = (MyReference<Bloat>) rq.poll();
      SoftReference<Bloat> polled = (SoftReference<Bloat>) rq.poll();

      if (polled != null) {
         Bloat polledBloat = polled.get();
         if (polledBloat == null) {
           System.out.println("is null :(");
         } else {
           System.out.println("is not null!");
         }
      }
      i++;
   }
}

If I run the snippet above with -Xmx10m and SoftReferences (as in the code above), I get tons of "is null :(" printed. But if I switch to MyReference (uncommenting the two lines with MyReference and commenting out the ones with SoftReference), I always get an OOM.

As I understood from the advice, having a hard reference inside MyReference should not prevent the object from reaching the ReferenceQueue, right?

Glaucoma answered 28/10, 2009 at 17:37 Comment(0)
E
6

Tom's answer is the correct one; however, the code that was added to the question is not the same as what Tom proposed. What Tom was proposing looks more like this:

class Bloat {  // just a heap filler really
    public Reader res;
    private double a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z;

    private final int ii;

    public Bloat(final int ii, Reader res) {
       this.ii = ii;
       this.res = res;
    }
 }

 // as recommended by Tom Hawtin
 class MySoftBloatReference extends SoftReference<Bloat> {
    public final Reader hardRef;

    MySoftBloatReference(Bloat referent, ReferenceQueue<Bloat> q) {
       super(referent, q);
       this.hardRef = referent.res;
    }
 }

 //...meanwhile, somewhere in the neighbouring galaxy...
 {
    ReferenceQueue<Bloat> rq = new ReferenceQueue<Bloat>();
    Set<SoftReference<Bloat>> set = new HashSet<SoftReference<Bloat>>();
    int i=0;

    while(i<50000) {
        set.add(new MySoftBloatReference(new Bloat(i, new StringReader("test")), rq));

        MySoftBloatReference polled = (MySoftBloatReference) rq.poll();

        if (polled != null) {
            // close the reference that we are holding on to
            try {
                polled.hardRef.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        i++;
    }
}

Note that the big difference is that the hard reference is to the object that needs to be closed, not to the referent itself. The surrounding object can, and will, be garbage collected, so you won't hit the OOM, yet you still get a chance to close the resource. Once you leave the loop, that will also be garbage collected. Of course, in the real world you probably wouldn't make res a public instance member.

That said, if you are holding open file references, you run a very real risk of running out of those before you run out of memory. You probably also want an LRU cache to ensure that you keep no more than (sticks finger in the air) 500 open files. The entries can also be of type MySoftBloatReference so that they can be garbage collected if need be.

To clarify a little how MySoftBloatReference works: the base class, SoftReference, still holds the reference to the object that is hogging all the memory. This is the object that needs to be freed to prevent the OOM. However, once that object is freed, you still need to free the resources that the Bloat was using. Bloat uses two types of resource, memory and a file handle, and both need to be freed, or you will run out of one or the other. The SoftReference handles the pressure on memory by freeing the object, but you also need to release the other resource, the file handle. Because the Bloat has already been freed, we can't use it to free the related resource, so MySoftBloatReference keeps a hard reference to the internal resource that needs to be closed. Once it has been informed that the Bloat has been freed, i.e. once the reference turns up in the ReferenceQueue, the related resource can be closed through the hard reference that MySoftBloatReference holds.

EDIT: Updated the code so that it compiles when thrown into a class. It uses a StringReader to illustrate how to close the Reader, which stands in for the external resource that needs to be freed. In this particular case closing the stream is effectively a no-op, and so is not needed, but it shows how to do so when it is needed.
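The mechanism described above can be observed without waiting for the collector by enqueueing the reference by hand. What follows is only a minimal sketch under stated assumptions, not the code from this answer: TrackedResource, CloseOnClearReference and drain are hypothetical names, and the call to Reference.enqueue() merely stands in for what the garbage collector does when it clears a soft reference under memory pressure.

```java
import java.io.Closeable;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;

final class SoftCloseDemo {
    // Hypothetical stand-in for the file handle that must be closed.
    static final class TrackedResource implements Closeable {
        private boolean closed;

        @Override
        public void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Softly references the large object, strongly references only the small resource.
    static final class CloseOnClearReference<T> extends SoftReference<T> {
        final TrackedResource resource;

        CloseOnClearReference(T referent, TrackedResource resource, ReferenceQueue<? super T> queue) {
            super(referent, queue);
            this.resource = resource;
        }
    }

    // Closes the resource of every reference currently waiting in the queue
    // and returns how many resources were closed.
    static int drain(ReferenceQueue<?> queue) {
        int closedCount = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            if (ref instanceof CloseOnClearReference) {
                ((CloseOnClearReference<?>) ref).resource.close();
                closedCount++;
            }
        }
        return closedCount;
    }

    public static void main(String[] args) {
        ReferenceQueue<byte[]> queue = new ReferenceQueue<byte[]>();
        TrackedResource resource = new TrackedResource();
        CloseOnClearReference<byte[]> ref =
                new CloseOnClearReference<byte[]>(new byte[1024], resource, queue);

        // Simulate the collector: put the reference on its queue by hand.
        ref.enqueue();

        System.out.println("closed by drain: " + drain(queue));
        System.out.println("resource closed: " + resource.isClosed());
    }
}
```

The point of the design is that the reference object never holds the large referent strongly, so the referent stays collectable while the small resource remains reachable for cleanup.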

Electroform answered 3/12, 2009 at 22:11 Comment(3)
Would it be possible to fix your code so it compiles? E.g. MyReference constructor takes Bloat referent argument and is supposed to assign it to hardRef, but hardRef is of a totally different type (ResourceThatMustBeClosed). Also, can you elaborate on why Bloat is still necessary once we've got ResourceThatMustBeClosed? P.S. I wouldn't be so needy if this question had no bonus points attached :P - Glaucoma
I have updated the code, and (hopefully) added a clear explanation of how it works? If not, let me know... - Electroform
Code is fixed so that it compiles. Just throw it into an empty class, and add the appropriate imports. - Electroform
C
8

For a finite number of resources: subclass SoftReference. The soft reference should point to the enclosing object. A strong reference in the subclass should reference the resource, so that it is always strongly reachable. When the reference is read from the ReferenceQueue via poll(), the resource can be closed and the entry removed from the cache. The cache needs to be released correctly (if a SoftReference itself is garbage collected, it can't be enqueued onto a ReferenceQueue).

Be careful that you only have a finite number of unreleased resources in the cache: evict old entries (indeed, you can discard the soft references with a finite cache, if that suits your situation). It is usually the non-memory resource that is more important, in which case an LRU-eviction cache with no exotic reference objects should be sufficient.
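The plain LRU-eviction cache mentioned above could look roughly like the following. This is a sketch under assumptions rather than a recommended implementation: ClosingLruCache and FakeHandle are hypothetical names, and it relies on the access-order mode of java.util.LinkedHashMap together with its removeEldestEntry hook to close a handle at the moment it is evicted.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class LruHandleCacheDemo {
    // Hypothetical stand-in for an open file or search handle.
    static final class FakeHandle implements Closeable {
        private boolean closed;

        @Override
        public void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Keeps at most maxEntries handles and closes the least recently used one on overflow.
    static final class ClosingLruCache<K, V extends Closeable> extends LinkedHashMap<K, V> {
        private static final long serialVersionUID = 1L;
        private final int maxEntries;

        ClosingLruCache(int maxEntries) {
            super(16, 0.75f, true); // true = iterate in access order, eldest first
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            if (size() <= maxEntries) {
                return false;
            }
            try {
                eldest.getValue().close(); // release the non-memory resource
            } catch (IOException e) {
                e.printStackTrace();
            }
            return true; // tell the map to drop the entry
        }
    }

    public static void main(String[] args) {
        ClosingLruCache<String, FakeHandle> cache = new ClosingLruCache<String, FakeHandle>(2);
        FakeHandle a = new FakeHandle();
        FakeHandle b = new FakeHandle();
        cache.put("a", a);
        cache.put("b", b);
        cache.get("a");                   // "a" is now the most recently used entry
        cache.put("c", new FakeHandle()); // evicts and closes "b"

        System.out.println("a closed: " + a.isClosed());
        System.out.println("b closed: " + b.isClosed());
    }
}
```

Note that this sketch only closes handles on eviction; entries dropped through remove() or replaced through put() would need the same treatment in real code.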

(My answer #1000. Posted from London DevDay.)

Calamity answered 28/10, 2009 at 17:53 Comment(4)
I'm surprised it was remotely coherent (is it?) after an hour or so of sleep, a day in a darkened room (with poor coffee service though working wifi) and trying to listen to a speaker. But it had to be done. - Calamity
Tom, could you please post (or edit this one) a more detailed answer, possibly accompanied by some (pseudo)code? I also had a hard day, maybe tomorrow I'll understand it better, but right now, unfortunately, I don't seem to be able to. - Nansen
@marcob I think a similar implementation can be found here: javaspecialists.eu/archive/Issue015.html You might want to add generics on top of it, as it seems to have been coded back in 2001. I originally wanted to use soft values with Google Collections' MapMaker, but couldn't find a way to hook custom finalization logic there. I've posted a message to the google-collections mailing list and will see if I can get anything out of it. - Glaucoma
After reading this, I try to avoid that site: "On very careful inspection, I discovered the big difference between the phantom and weak references. Both are released rather quickly, but the phantom reference is enqueued in the reference queue before it's referent is cleared, whereas the weak reference is enqueued after the referent is cleared." That proves a clear lack of understanding of what those references are and how they work. So, thanks, but no thanks. - Nansen
S
2

Ahm.
(As far as I know) You can't hold the stick from both ends. Either you hold on to your information, or you let it go.
However... you can hold on to some key information that would enable you to finalize. Of course, the key information must be significantly smaller than the "real information" and must not have the real information in its reachable object graph (weak references might help you there).
Building on the existing example (pay attention to the key information field):

public class Test1 {
    static class Bloat {  // just a heap filler really
        private double a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z;

        private final int ii;

        public Bloat(final int ii) {
            this.ii = ii;
        }
    }

    // as recommended by Tom Hawtin
    static class MyReference<T, K> extends SoftReference<T> {
        private final K keyInformation;

        MyReference(T referent, K keyInformation, ReferenceQueue<? super T> q) {
            super(referent, q);
            this.keyInformation = keyInformation;
        }

        public K getKeyInformation() {
            return keyInformation;
        }
    }

    //...meanwhile, somewhere in the neighbouring galaxy...
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Bloat> rq = new ReferenceQueue<Bloat>();
        Set<SoftReference<Bloat>> set = new HashSet<SoftReference<Bloat>>();
        int i = 0;

        while (i < 50000) {
            set.add(new MyReference<Bloat, Integer>(new Bloat(i), i, rq));

            final Reference<? extends Bloat> polled = rq.poll();

            if (polled != null) {
                if (polled instanceof MyReference) {
                    final Object keyInfo = ((MyReference) polled).getKeyInformation();
                    System.out.println("not null, got key info: " + keyInfo + ", finalizing...");
                } else {
                    System.out.println("null, can't finalize.");
                }
                rq.remove();
                System.out.println("removed reference");
            }
        }
    }
}

Edit:
I want to elaborate on the "either hold on to your information or let it go" point. Assume you had some way of holding on to your information. That would force the GC to unmark your data, so the data would actually be cleaned up only after you're done with it, in a second GC cycle. This is possible, and it's exactly what finalize() is for. Since you stated that you don't want the second cycle to occur, you can't hold on to your information (if a-->b then !b-->!a), which means you must let it go.

Edit2:
Actually, a second cycle would occur - but for your "key data", not your "major bloat data". The actual data would be cleared on the first cycle.

Edit3:
Obviously, the real solution would use a separate thread for removing from the reference queue (don't poll(); call remove(), which blocks, on the dedicated thread).
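A dedicated draining thread of the kind mentioned in Edit3 might be sketched as follows. The names KeyedReference and startDrainer are hypothetical, the "key information" is represented here by a Runnable that performs the cleanup, and Reference.enqueue() is called by hand only to simulate the collector clearing the reference.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class QueueDrainerDemo {
    // Soft reference that carries only the small cleanup action, never the referent itself.
    static final class KeyedReference<T> extends SoftReference<T> {
        final Runnable cleanup;

        KeyedReference(T referent, Runnable cleanup, ReferenceQueue<? super T> queue) {
            super(referent, queue);
            this.cleanup = cleanup;
        }
    }

    // Starts a daemon thread that blocks on the queue and runs each cleanup as it arrives.
    static Thread startDrainer(final ReferenceQueue<?> queue) {
        Thread drainer = new Thread(() -> {
            try {
                while (true) {
                    Reference<?> ref = queue.remove(); // blocks until a reference is enqueued
                    if (ref instanceof KeyedReference) {
                        ((KeyedReference<?>) ref).cleanup.run();
                    }
                }
            } catch (InterruptedException e) {
                // Interrupted: treat as a shutdown request and let the thread end.
            }
        }, "reference-queue-drainer");
        drainer.setDaemon(true);
        drainer.start();
        return drainer;
    }

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        CountDownLatch cleaned = new CountDownLatch(1);
        Thread drainer = startDrainer(queue);

        KeyedReference<Object> ref =
                new KeyedReference<>(new Object(), cleaned::countDown, queue);
        ref.enqueue(); // simulate the collector enqueueing the cleared reference

        boolean done = cleaned.await(5, TimeUnit.SECONDS);
        System.out.println("cleanup ran: " + done);
        drainer.interrupt();
    }
}
```

Because the thread blocks in remove(), cleanup happens as soon as a reference is enqueued and does not depend on how often the main loop happens to poll.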

Stink answered 6/12, 2009 at 15:19 Comment(1)
Forgot to mention: running this example with -Xmx10m doesn't yield an OOM and does print all kinds of numbers (presumably the "key information"). - Stink
G
0

@Paul - thanks a lot for the answer and clarification.

@Ran - I think i++ is missing at the end of the loop in your current code. Also, you don't need to call rq.remove() in the loop, as rq.poll() already removes the top reference, doesn't it?

Few points:

1) I had to add a Thread.sleep(1) statement after i++ in the loop (for both Paul's and Ran's solutions) to avoid the OOM, but that's irrelevant to the big picture and is also platform-dependent. My machine has a quad-core CPU and runs Sun JDK 1.6.0_16 on Linux.

2) After looking at these solutions I think I'll stick with finalizers. Bloch's book gives the following reasons against them:

  • there is no guarantee finalizers will be executed promptly, therefore never do anything time-critical in a finalizer -- nor are there any guarantees for SoftReferences!
  • never depend on a finalizer to update critical persistent state -- I am not
  • there is a severe performance penalty for using finalizers -- in my worst case I'd be finalizing about a single object per minute or so. I think I can live with that.
  • use try/finally -- oh yes, I definitely will!

Having to create an enormous amount of scaffolding for what seems a simple task doesn't look reasonable to me. I mean, literally, the WTF-per-minute rate would be quite high for anybody else looking at such code.

3) Sadly, there is no way to split points between Paul, Tom and Ran :( I hope Tom won't mind as he already has lots of them :) Judging between Paul and Ran was much harder - I think both answers work and are correct. I am only setting the accept flag on Paul's answer because it was rated higher (and has a more detailed explanation), but Ran's solution isn't bad at all and would probably be my choice if I chose to implement this using SoftReferences. Thanks guys!
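For illustration, the finalizer-based approach chosen in point 2 might look roughly like the sketch below. Everything in it is an assumption made for the example: SearchHandle stands in for the library's handle, while GuardedHandle and HandleCache are hypothetical wrapper names. Note also that finalize() has been deprecated since Java 9, so this reflects the JDK 6 setting of this thread rather than current practice.

```java
import java.lang.ref.SoftReference;

final class FinalizerGuardDemo {
    // Hypothetical stand-in for the search library's handle.
    static final class SearchHandle {
        private boolean open = true;

        void close() {
            open = false;
        }

        boolean isOpen() {
            return open;
        }
    }

    // Wrapper whose finalizer closes the handle if nobody closed it explicitly.
    static final class GuardedHandle {
        private final SearchHandle handle = new SearchHandle();
        private boolean closed;

        // Idempotent: safe to call explicitly and again from the finalizer.
        synchronized void close() {
            if (!closed) {
                closed = true;
                handle.close();
            }
        }

        synchronized boolean isClosed() {
            return closed;
        }

        @SuppressWarnings({"deprecation", "removal"})
        @Override
        protected void finalize() throws Throwable {
            try {
                close();
            } finally {
                super.finalize(); // the try/finally from the list above
            }
        }
    }

    // Keeps the wrapper softly reachable and recreates it on demand.
    static final class HandleCache {
        private SoftReference<GuardedHandle> ref;

        synchronized GuardedHandle get() {
            GuardedHandle current = (ref == null) ? null : ref.get();
            if (current == null) {
                current = new GuardedHandle();
                ref = new SoftReference<GuardedHandle>(current);
            }
            return current;
        }
    }

    public static void main(String[] args) {
        HandleCache cache = new HandleCache();
        GuardedHandle first = cache.get();
        System.out.println("same instance reused: " + (first == cache.get()));

        first.close(); // explicit close, e.g. at shutdown
        System.out.println("closed: " + first.isClosed());
    }
}
```

If the soft reference is cleared under memory pressure, the next get() creates a fresh wrapper, and the old handle is closed whenever its finalizer eventually runs, which is not guaranteed to be prompt.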

Glaucoma answered 7/12, 2009 at 11:46 Comment(1)
i++ - yes, probably didn't make it through the copy/paste. No need for remove() - correct. I'm missing about half of the references. - Stink
