Creating an O(1)-memory Iterable from an initial object and a function which generates the next object, in Scala
Asked Answered
E

4

6

I want a convenient way to generate an Iterable, given a initial object and a function to produce the next object from the current one, that consumes O(1) memory (i.e., it doesn't cache old results; if you want to iterate a second time, the function has to be applied again).

It doesn't seem like there's library support for this. In Scala 2.8, the method scala.collection.Iterable.iterate has signature

def iterate [A] (start: A, len: Int)(f: (A) ⇒ A) : Iterable[A]

so it requires that you specify how many iterated function applications you're interested in ahead of time, and my understanding of the documentation is that Iterable.iterate actually computes all these values immediately. On the other hand, the method scala.collection.Iterator.iterate has signature

def iterate [T] (start: T)(f: (T) ⇒ T) : Iterator[T]

which looks great, but we only get an Iterator which doesn't offer all the convenience of map, filter and friends.

Is there a convenient library method to produce what I want?

and if not,

Can someone suggest the 'colloquial' Scala code for doing this?

To summarise, given an initial object a: A, and a function f: A => A, I'd like a TraversableLike (e.g., probably an Iterable) that generates a, f(a), f(f(a)), ..., and uses O(1) memory, with map, filter etc. functions that also return something that is O(1) in memory.

Editor answered 24/9, 2010 at 3:44 Comment(3)
A "clue": reading the API some more, I'm beginning to suspect that a good answer will mention TraversableViewLike, but I'm also increasingly stumped.Editor
Iterator has map, filter and friends... Are you certain they use more than constant memory?Frisco
It's true, map and filter and so on are available on Iterator, and don't try anything silly like forcing the Iterator. But an Iterable would be more convenient; why shouldn't I expect to be able to use tail (which, whenever iterator is called, should remove the first element via a call to next before handing back the Iterator), etc? (In fact, when I tried to switch my code from expecting Iterables to Iterators, this was something I had to work around.)Editor
F
3

Iterator.iterate demo with filter:

object I {
  def main(args:Array[String]) {
    val mb = 1024 * 1024
    val gen = Iterator.iterate(new Array[Int](10 * mb)){arr => 
      val res = new Array[Int](10 * mb)
      arr.copyToArray(res)
      println("allocated 10mb")
      res(0) = arr(0) + 1 // store iteration count in first elem of new array
      res
    }
    // take 1 out of 100
    val gen2 = gen filter (arr => arr(0) % 100 == 0) 
    // print first 10 filtered
    gen2.take(10).foreach { arr => println("filtered " + arr(0)) } 
  }
}

(this may not work in the REPL as the PRINT step may mess up with memory management)

JAVA_OPTS="-Xmx128m" scala -cp classes I will show that the filtering works and is lazy. If it wasn't done in constant memory that would cause a heap error (since it's allocating something like 900*10mb).

Use JAVA_OPTS="-Xmx128m -verbose:gc" scala -cp classes I to see the garbage collection events.

Frisco answered 24/9, 2010 at 6:40 Comment(1)
Thanks for the details to convince me everything is really O(1). I'll go try this out.Editor
B
10

Stream will do what you want, just don't hold on to the cells; only iterate over the values.

It is a sadly common misconception floating around that Streams inherently cache every value they compute.

If you write this:

val s1: Stream[Thing] = initialValue #:: «expression computing next value»

then indeed every value produced by the stream is retained, but this is not necessary. If you write:

def s2: Stream[Thing] = initialValue #:: «expression computing next value»

and if the caller just iterates over the stream's values but does not remember the Stream value itself (specifically any of its cons cells), then no unwanted retention will occur. Of course, in this formulation, every call creates a new Stream starting from a fixed initial value. That's not necessary:

def s3(start: Thing): Stream[Thing] = start #:: «expression computing next value»

The one thing you have to watch out for is passing a Stream to a method. Doing so will capture the head of the stream passed in the method parameter. One way around this is to process the stream with tail-recursive code.

Bering answered 24/9, 2010 at 5:43 Comment(5)
I don't understand -- I need to be able to pass this object around to other consumers; that is, unknown other code will actually do the iterating. I don't see how I can do this without passing a reference to the head of the Stream.Editor
It is a limitation. As I said, you'll have to structure the code to pass the Stream through tail-call-optimized chains. But that "unknown" code knows it's getting a Stream so it knows it cannot retain the references to its (stream-) cons cells.Bering
No, this really won't do. Why would the "unknown" code know anything? If someone else calls into my code, why might they not just treat the return value as say an Iterable?Editor
@Scott Morrison: Is Daniel's answer (the newest one in this question) not suitable?Bering
No: see my comment on his answer. Perhaps I didn't sufficiently clearly explain what I'm after, and should try again with a new question.Editor
F
3

Iterator.iterate demo with filter:

object I {
  def main(args:Array[String]) {
    val mb = 1024 * 1024
    val gen = Iterator.iterate(new Array[Int](10 * mb)){arr => 
      val res = new Array[Int](10 * mb)
      arr.copyToArray(res)
      println("allocated 10mb")
      res(0) = arr(0) + 1 // store iteration count in first elem of new array
      res
    }
    // take 1 out of 100
    val gen2 = gen filter (arr => arr(0) % 100 == 0) 
    // print first 10 filtered
    gen2.take(10).foreach { arr => println("filtered " + arr(0)) } 
  }
}

(this may not work in the REPL as the PRINT step may mess up with memory management)

JAVA_OPTS="-Xmx128m" scala -cp classes I will show that the filtering works and is lazy. If it wasn't done in constant memory that would cause a heap error (since it's allocating something like 900*10mb).

Use JAVA_OPTS="-Xmx128m -verbose:gc" scala -cp classes I to see the garbage collection events.

Frisco answered 24/9, 2010 at 6:40 Comment(1)
Thanks for the details to convince me everything is really O(1). I'll go try this out.Editor
I
2

Iterator is the very thing what you want. And iterator do has map,filter,takeWhile and many other methods which is O(1) in memory.I don't think there is another collection type with O(1) in memory.

Iodic answered 24/9, 2010 at 6:48 Comment(0)
P
1
val it = new Iterable[Int] {
  def iterator = Iterator.iterate(0)(_+1)
  override
  def toString: String = "Infinite iterable"
}

Don't try it out on REPL (except by embedding it inside an object or class), as REPL will try to print it, and it doesn't use toString.

Photoneutron answered 24/9, 2010 at 17:13 Comment(3)
That prints "Infinite iterable" in trunk.Adiel
At least as far as I understand, it map { _ + 1 } take 5 will not terminate, however, since map will try to force the Iterable.Editor
@Scott Iterable is not necessarily lazy. Unless you take the time to make all methods lazy, that's what is provided. However, it.view map { _ + 1 } take 5 will work, so I don't see why worry about it.Photoneutron

© 2022 - 2024 — McMap. All rights reserved.