Scope & memory issues in Scala

I have a very large List of numbers, which undergo lots of math manipulation. I only care about the final result. To simulate this behavior, see my example code below:

object X {
  def main(args: Array[String]): Unit = {
    val N = 10000000
    val x = List(1 to N).flatten
    println(x.slice(0, 10))
    Thread.sleep(5000)
    val y = x.map(_ * 5)
    println(y.slice(0, 10))
    Thread.sleep(5000)
    val z = y.map(_ + 4)
    println(z.slice(0, 10))
    Thread.sleep(5000)
  }
}

So x is a very large list. I care only about the result z. To obtain z, I first have to mathematically manipulate x to get y, then manipulate y to get z. (I cannot go from x to z in one step, because the real manipulations are quite complicated; this is just an example.)

So when I run this example, I run out of memory presumably because x, y and z are all in scope and they all occupy memory.

So I try the following:

def main(args: Array[String]): Unit = {
  val N = 10000000
  val z = {
    val y = {
      val x = List(1 to N).flatten
      println(x.slice(0, 10))
      Thread.sleep(5000)
      x
    }.map(_ * 5)

    println(y.slice(0, 10))
    Thread.sleep(5000)
    y
  }.map(_ + 4)
  println(z.slice(0, 10))
  Thread.sleep(5000)
}

So now only z is in scope. So presumably x and y are created and then garbage collected when they go out of scope. But this isn't what happens. Instead, I again run out of memory!

(Note: I am using java -Xincgc, but it doesn't help.)

Question: When I have adequate memory for only one large list, can I somehow manipulate it using only vals (i.e. no mutable vars or ListBuffers), maybe using scoping to force GC? If so, how? Thanks.

Yet answered 25/11, 2011 at 19:30 Comment(2)
You will always need memory for two lists. Out of curiosity, have you set your Java heap? Considered Arrays? - Hanny
True, I will always need memory for two lists, which I have. But I should not need memory for three lists, which I don't have. Do you agree? In any case, since x and y go out of scope, why are they not garbage collected once the VM realizes it's short on memory and the variables aren't in scope? - Yet

Have you tried something like this?

val N = 10000000
val x = List(1 to N).flatten.view // get a view
val y = x.map(_ * 5)
val z = y.map(_ + 4)
println(z.force.slice(0, 10))

It should help avoid creating the full intermediate structures for y and z.
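
To make the difference concrete, here is a minimal sketch comparing the eager pipeline with the view-based one. The object name ViewPipeline and the helper names eager and viaView are made up for illustration, and toList is used in place of force only to state the target collection explicitly. Assuming a reasonably recent Scala standard library, both should produce the same list, but only the eager version builds a full List after every map.

```scala
object ViewPipeline {
  // Eager version: each map allocates a complete intermediate List.
  def eager(n: Int): List[Int] =
    (1 to n).toList.map(_ * 5).map(_ + 4)

  // View version: the two maps are only recorded on the view and are
  // applied element by element when toList finally builds the result.
  def viaView(n: Int): List[Int] =
    (1 to n).view.map(_ * 5).map(_ + 4).toList
}
```

With this shape, only the final list is materialized, so any number of intermediate map steps can be chained on the view before the last call.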

Teresiateresina answered 25/11, 2011 at 19:47 Comment(1)
Hey thanks! That actually fixes the problem very nicely!! No out of memory errors. I have to include a "force" on the last operation, but it looks like I can do any number of intermediate operations on the view without allocation of any more memory. Exactly what I wanted. - Yet

Look at using view. It wraps a collection lazily and computes each value only when it is required, so it doesn't build an intermediate collection:

scala> (1 to 5000000).map(i => {i*i}).map(i=> {i*2}) .toList
java.lang.OutOfMemoryError: Java heap space
        at java.lang.Integer.valueOf(Integer.java:625)
        at scala.runtime.BoxesRunTime.boxToInteger(Unknown Source)
        at scala.collection.immutable.Range.foreach(Range.scala:75)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:194)
        at scala.collection.immutable.Range.map(Range.scala:43)
        at .<init>(<console>:8)
        at .<clinit>(<console>)
        at .<init>(<console>:11)
        at .<clinit>(<console>)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:704)
        at scala.tools.nsc.interpreter.IMain$Request$$anonfun$14.apply(IMain.scala:920)
        at scala.tools.nsc.interpreter.Line$$anonfun$1.apply$mcV$sp(Line.scala:43)
        at scala.tools.nsc.io.package$$anon$2.run(package.scala:25)
        at java.lang.Thread.run(Thread.java:662)
scala> (1 to 5000000).view.map(i => {i*i}).view.map(i=> {i*2}) .toList
res10: List[Int] = List(2, 8, 18, 32, 50, 72, ...
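
The "only when required" part can be observed directly. The sketch below is an illustration rather than part of the session above: the object LazyViewDemo and the method firstTen are hypothetical names, and the mutable counter is there purely for instrumentation. It counts how often the first mapping function runs when only the first ten results are requested from a view.

```scala
object LazyViewDemo {
  // Returns the first ten results together with the number of times
  // the first mapping function was actually invoked.
  def firstTen(n: Int): (List[Int], Int) = {
    var evaluated = 0 // instrumentation only: counts calls to the first map
    val result = (1 to n).view
      .map { i => evaluated += 1; i * 5 }
      .map(_ + 4)
      .take(10)
      .toList
    (result, evaluated)
  }
}
```

Calling LazyViewDemo.firstTen(5000000) should touch only the elements that are needed for the result, instead of mapping all five million up front.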
Leahleahey answered 25/11, 2011 at 19:55 Comment(0)

It's a cheap answer, but did you try starting the JVM with more memory?

e.g.

$ java -X ... -Xmx set maximum Java heap size

Also, GC probably won't help, because it sounds like you're getting caught with two lists in memory at the same time during the transition, and they're both referenced.
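
To check which heap limit the JVM actually picked up, a small sketch like the following may help. The object name HeapInfo is made up for illustration; Runtime.getRuntime.maxMemory is the standard Java call that reports the maximum amount of memory the JVM will attempt to use.

```scala
object HeapInfo {
  // Maximum heap the JVM will attempt to use, in megabytes.
  def maxHeapMb: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  def main(args: Array[String]): Unit =
    println(s"Max heap: $maxHeapMb MB")
}
```

Running it once with the default settings and once with a larger -Xmx value (for example -Xmx2g) should show whether the setting is reaching the process that runs the list code.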

Cardiganshire answered 25/11, 2011 at 19:40 Comment(0)
