Scala Memoization: How does this Scala memo work?
Asked Answered
S

1

26

The following code is from Pathikrit's Dynamic Programming repository. I'm mystified by both its beauty and peculiarity.

def subsetSum(s: List[Int], t: Int) = {
  type DP = Memo[(List[Int], Int), (Int, Int), Seq[Seq[Int]]]
  implicit def encode(key: (List[Int], Int)) = (key._1.length, key._2)

  lazy val f: DP = Memo {
    case (Nil, 0) => Seq(Nil)
    case (Nil, _) => Nil
    case (a :: as, x) => (f(as, x - a) map {_ :+ a}) ++ f(as, x)
  }

  f(s, t)
}

The type Memo is implemented in another file:

case class Memo[I <% K, K, O](f: I => O) extends (I => O) {
  import collection.mutable.{Map => Dict}
  val cache = Dict.empty[K, O]
  override def apply(x: I) = cache getOrElseUpdate (x, f(x))
}

My questions are:

  1. Why is type K declared as (Int, Int) in subsetSum?

  2. What does the int in (Int, Int) stand for respectively?

3. How does (List[Int], Int) implicitly convert to (Int, Int)?
I see no implicit def foo(x:(List[Int],Int)) = (x._1.toInt,x._2). ( not even in the Implicits.scala file it imports.

*Edit: Well, I miss this:

implicit def encode(key: (List[Int], Int)) = (key._1.length, key._2)

I enjoy Pathikrit's library scalgos very much. There are a lot of Scala pearls in it. Please help me with this so I can appreciate Pathikrit's wit. Thank you. (:

Sal answered 5/8, 2014 at 0:52 Comment(7)
Regarding #3, you forgot to read the line with implicit def encodeVoracity
I've seen the question here, but I still wonder does value f really memoize? It seems that every time subsetSum is called, a new instance of Memo is created and it is not the last memo we want. Is it true?Sal
Yes, for a given s, t, it returns a Memo object. further calls on that Memo object would be O(1) if it was computed before. You can convince yourself by writing a simple fibonacci memoized function e.g. lazy val factorial: Int ==> BigInt = Memo { case 0 => 1 case n => n * factorial(n - 1) }Voracity
Another way to convince yourself is to put this line in Memo.scala: cache getOrElseUpdate (x, (y) => println(s"Cache miss: $y"); f(y))Voracity
@wrick See here This is the testing result. Memo doesn't seem to work....Sal
There is a difference between returning the computed value using dynamic programming i.e. f(s, t) versus return the dynamic program (aka memoized closure over s and t) i.e. returning f itself. See my reply to you hereVoracity
Thank you for your thorough explanation. Now I totally understand all the mechanism. You're kind (:Sal
V
59

I am the author of the above code.

/**
 * Generic way to create memoized functions (even recursive and multiple-arg ones)
 *
 * @param f the function to memoize
 * @tparam I input to f
 * @tparam K the keys we should use in cache instead of I
 * @tparam O output of f
 */
case class Memo[I <% K, K, O](f: I => O) extends (I => O) {
  import collection.mutable.{Map => Dict}
  type Input = I
  type Key = K
  type Output = O
  val cache = Dict.empty[K, O]
  override def apply(x: I) = cache getOrElseUpdate (x, f(x))
}

object Memo {
  /**
   * Type of a simple memoized function e.g. when I = K
   */
  type ==>[I, O] = Memo[I, I, O]
}

In Memo[I <% K, K, O]:

I: input
K: key to lookup in cache
O: output

The line I <% K means the K can be viewable (i.e. implicitly converted) from I.

In most cases, I should be K e.g. if you are writing fibonacci which is a function of type Int => Int, it is okay to cache by Int itself.

But, sometimes when you are writing memoization, you do not want to always memoize or cache by the input itself (I) but rather a function of the input (K) e.g when you are writing the subsetSum algorithm which has input of type (List[Int], Int), you do not want to use List[Int] as the key in your cache but rather you want use List[Int].size as the part of the key in your cache.

So, here's a concrete case:

/**
 * Subset sum algorithm - can we achieve sum t using elements from s?
 * O(s.map(abs).sum * s.length)
 *
 * @param s set of integers
 * @param t target
 * @return true iff there exists a subset of s that sums to t
 */
 def isSubsetSumAchievable(s: List[Int], t: Int): Boolean = {
    type I = (List[Int], Int)     // input type
    type K = (Int, Int)           // cache key i.e. (list.size, int)
    type O = Boolean              // output type      

    type DP = Memo[I, K, O]

    // encode the input as a key in the cache i.e. make K implicitly convertible from I
    implicit def encode(input: DP#Input): DP#Key = (input._1.length, input._2)   

    lazy val f: DP = Memo {
      case (Nil, x) => x == 0      // an empty sequence can only achieve a sum of zero
      case (a :: as, x) => f(as, x - a) || f(as, x)      // try with/without a.head
    }

    f(s, t)
 }

You can ofcourse shorten all these into a single line: type DP = Memo[(List[Int], Int), (Int, Int), Boolean]

For the common case (when I = K), you can simply do this: type ==>[I, O] = Memo[I, I, O] and use it like this to calculate the binomial coeff with recursive memoization:

  /**
   * http://mathworld.wolfram.com/Combination.html
   * @return memoized function to calculate C(n,r)
   */
  val c: (Int, Int) ==> BigInt = Memo {
    case (_, 0) => 1
    case (n, r) if r > n/2 => c(n, n - r)
    case (n, r) => c(n - 1, r - 1) + c(n - 1, r)
  }

To see details how above syntax works, please refer to this question.

Here is a full example which calculates editDistance by encoding both the parameters of the input (Seq, Seq) to (Seq.length, Seq.length):

 /**
   * Calculate edit distance between 2 sequences
   * O(s1.length * s2.length)
   *
   * @return Minimum cost to convert s1 into s2 using delete, insert and replace operations
   */
  def editDistance[A](s1: Seq[A], s2: Seq[A]) = {

    type DP = Memo[(Seq[A], Seq[A]), (Int, Int), Int]
    implicit def encode(key: DP#Input): DP#Key = (key._1.length, key._2.length)

    lazy val f: DP = Memo {
      case (a, Nil) => a.length
      case (Nil, b) => b.length
      case (a :: as, b :: bs) if a == b => f(as, bs)
      case (a, b) => 1 + (f(a, b.tail) min f(a.tail, b) min f(a.tail, b.tail))
    }

    f(s1, s2)
  }

And lastly, the canonical fibonacci example:

lazy val fib: Int ==> BigInt = Memo {
  case 0 => 0
  case 1 => 1
  case n if n > 1 => fib(n-1) + fib(n-2)
}

println(fib(100))
Voracity answered 5/8, 2014 at 1:15 Comment(9)
Very nice work. Just one thing: case class should not be used here given Memo contains a member with moving hash.Tepper
Where can I find something about ==> notation?Florencia
@Krever: Look at the line: type ==>[I, O] = Memo[I, I, O] In general these things are called infix types. See: stackoverflow.com/questions/3200380Voracity
why do you use lazy, e.g. in lazy val f: DP = Memo {...} ?Fundy
@ipburbank: Because f references itself in its own definition.Voracity
Beautiful implementation, so simple, with such a nice syntax. I learnt a few things on Scala just reading these few lines. Thanks :pCatawba
Note that view bounds are deprecated now issues.scala-lang.org/browse/SI-7629Catawba
@pathikrit, why not use scala.collection.concurrent.TrieMap?Larimer
Or have Memo parameterized by type of Map to be used?Larimer

© 2022 - 2024 — McMap. All rights reserved.