Is there a SoftHashMap in Scala?
Asked Answered
B

4

7

I'm aware of this question for java, but none of those implementations seem to play well with scala.collection.JavaConversions.

I'm looking for something simple (e.g. single file, not a whole library) that implements SoftHashMap such that it plays well with Scala Map (i.e. supports getOrElseUpdate, unzip, and the remaining Scala Map methods).

Beneficial answered 25/2, 2011 at 0:22 Comment(2)
Can you explain what's the problem with JavaConversions? It will convert any Java Map (implementing the right interface) into a mutable.Map; at least Apache Commons and Google Guava's implementation satisfy that requirement.Revivalist
In my answer, I even show how to get a Scala concurrent map.Revivalist
B
4

Implementation inspired by this java WeakHashMap:

import scala.collection.mutable.{Map, HashMap}
import scala.ref._


class SoftMap[K, V <: AnyRef] extends Map[K, V]
{
  class SoftValue[K, +V <: AnyRef](val key:K, value:V, queue:ReferenceQueue[V]) extends SoftReference(value, queue)

  private val map = new HashMap[K, SoftValue[K, V]]

  private val queue = new ReferenceQueue[V]

  override def += (kv: (K, V)): this.type =
  {
    processQueue
    val sv = new SoftValue(kv._1, kv._2, queue)
    map(kv._1) = sv
    this
  }

  private def processQueue
  {
    while (true)
    {
      queue.poll match
      {   
        case Some(sv:SoftValue[K, _]) => map.remove(sv.key)
        case _ => return
      }
    }
  }


  override def get(key: K): Option[V] = map.get(key) match
  {
    case Some(sv) => sv.get match
      { case v:Some[_] => v
        case None => {map.remove(key); None} }
    case None => None
  }



  override def -=(key: K):this.type =
  {
    processQueue
    map.remove(key)
    this
  }

  override def iterator: Iterator[(K, V)] =
  {
    processQueue
    map.iterator.collect{ case (key, sv) if sv.get.isDefined => (key, sv.get.get) }
  }

  override def empty:SoftMap[K, V] = new SoftMap[K, V]

  override def size = {processQueue; map.size}
}
Beneficial answered 25/2, 2011 at 0:22 Comment(0)
H
2

I found one in liftweb.

I do not use it yet, so check it yourself please.

http://scala-tools.org/mvnsites/liftweb-2.4-M5/net/liftweb/util/SoftReferenceCache.html

Herrin answered 13/12, 2011 at 17:49 Comment(0)
S
2

Occasionally you get asked a question like, "what's the best way to poke your eye out with a stick" that you can go to great lengths answering how you should harden and sterilise the stick after carving a 1 inch hook into the end, and follow up with where the stick should be inserted and so on. Really though, the best answer is probably not exactly what was asked – but the question as to why on earth you'd want to do this in the first place!

This is one of those questions.

SoftReferences are something that initially sound like something you might want. A reference that does not get garbage collected until there is GC pressure. Presumably, you'd use this to cache something that was worth caching, usually because it was expensive to create in the first place.

The problem is, SoftRefs clear almost exactly when you don't want them to, when the GC is under pressure! This means that they will need to be recreated (expensive op) right when the VM is already busy and under GC pressure.

Furthermore, there is no way to hint the VM as to the priority of objects that are soft-referenced. The particular algorithm used for selecting which objects to clear is unspecified and VM dependent.

Essentially, SoftReferences are a misguided attempt to off-load an application level concern (caching) to the garbage-collector. You really should never* actually use them.

*never, modulo some very small and very specialised use-cases

Sixfold answered 14/12, 2011 at 0:58 Comment(4)
How do you know that the GC is going to collect an entry which you're actually going to need? They could well use LRU. "Virtual machine implementations are, however, encouraged to bias against clearing recently-created or recently-used soft references." (https://mcmap.net/q/705964/-how-are-softreferences-collected-by-jvms-in-practice/53974, quoting SoftReference's JavaDocs). According to Jeremy Manson, there are other problems, but they are solved by using timed expiration, implemented by Google's MapMaker: jeremymanson.blogspot.de/2009/07/…. So, what the OP should do is to wrap MapMaker.Revivalist
You don't know, that is the point – your code is at the mercy of the particular strategy the VM happens to use. Relying on the current implementation in Oracle's VM is only relevant if you truly understand the access patterns in your code (recency may not be the best metric for instance) and you can guarantee that you will only ever run in that VM and won't upgrade.Sixfold
Indeed - but Google's MapMaker doesn't rely (necessarily) on soft references. I updated the solution that I show in my answer to use no soft references but other expiration techniques. (Note: MapMaker is now CacheBuilder).Revivalist
right, my objections are only to the use of SoftReference for most usages. We use the MapMaker/Cache stuff extensively.Sixfold
R
2

As other people have observed, SoftReferences are usually not The Right Thing to build a cache. However, some libraries provide better replacements. While the OP requires not using a library, I still think this is the best answer possible. Plus, with SBT downloading the library is quite simple.

In build.sbt, assuming you're building your project with SBT >= 0.10 (tested with 0.12), add:

libraryDependencies += "com.google.guava" % "guava" % "13.0"

libraryDependencies += "com.google.code.findbugs" % "jsr305" % "1.3.9" //Needed by guava, but marked as optional; at least Scalac 2.10 however does require it to parse annotations.

In client code, you can build your map as follows (look into CacheBuilder's option for the meaning of the various parameters; the ones below are the ones I chose for my use case):

For Scala 2.10:

import com.google.common.cache.CacheBuilder
import collection.JavaConversions._

def buildCache[K <: AnyRef, V <: AnyRef]: collection.concurrent.Map[K, V] =
  CacheBuilder.newBuilder()
    .maximumSize(10000).expireAfterAccess(10, TimeUnit.MINUTES)
    .concurrencyLevel(2)
    .build[K, V]().asMap()

For Scala 2.9 (deprecated/not compiling in 2.10):

import com.google.common.cache.CacheBuilder
import collection.{JavaConversions, mutable}
import JavaConversions._

def buildCache2_9[K <: AnyRef, V <: AnyRef]: mutable.ConcurrentMap[K, V] =
  CacheBuilder.newBuilder()
    .maximumSize(10000).expireAfterAccess(10, TimeUnit.MINUTES)
    .concurrencyLevel(2)
    .build[K, V]().asMap()

To make it work for both versions, call the implicit conversion explicitly - it's JavaConversions.asScalaConcurrentMap. I've reported the problem on the scala-language mailing list (and will also report it on the bug tracker), so I hope that the 2.9 code will at least compile in 2.10 (while still causing deprecation warning): https://groups.google.com/d/topic/scala-language/uXKRiGXb-44/discussion.

Revivalist answered 4/8, 2012 at 23:47 Comment(2)
Update: I changed the example to not rely on soft references at all.Revivalist
Update 2: the 2.9 code now compiles in 2.10 without workarounds, even though it (correctly) causes deprecation warnings.Revivalist

© 2022 - 2024 — McMap. All rights reserved.