Kotlin's Iterable and Sequence look exactly same. Why are two types required?
Asked Answered
E

4

108

Both of these interfaces define only one method

public operator fun iterator(): Iterator<T>

Documentation says Sequence is meant to be lazy. But isn't Iterable lazy too (unless backed by a Collection)?

Emirate answered 25/2, 2016 at 13:48 Comment(0)
B
163

The key difference lies in the semantics and the implementation of the stdlib extension functions for Iterable<T> and Sequence<T>.

  • For Sequence<T>, the extension functions perform lazily where possible, similarly to Java Streams intermediate operations. For example, Sequence<T>.map { ... } returns another Sequence<R> and does not actually process the items until a terminal operation like toList or fold is called.

    Consider this code:

    val seq = sequenceOf(1, 2)
    val seqMapped: Sequence<Int> = seq.map { print("$it "); it * it } // intermediate
    print("before sum ")
    val sum = seqMapped.sum() // terminal
    

    It prints:

    before sum 1 2
    

    Sequence<T> is intended for lazy usage and efficient pipelining when you want to reduce the work done in terminal operations as much as possible, same to Java Streams. However, laziness introduces some overhead, which is undesirable for common simple transformations of smaller collections and makes them less performant.

    In general, there is no good way to determine when it is needed, so in Kotlin stdlib laziness is made explicit and extracted to the Sequence<T> interface to avoid using it on all the Iterables by default.

  • For Iterable<T>, on contrary, the extension functions with intermediate operation semantics work eagerly, process the items right away and return another Iterable. For example, Iterable<T>.map { ... } returns a List<R> with the mapping results in it.

    The equivalent code for Iterable:

    val lst = listOf(1, 2)
    val lstMapped: List<Int> = lst.map { print("$it "); it * it }
    print("before sum ")
    val sum = lstMapped.sum()
    

    This prints out:

    1 2 before sum
    

    As said above, Iterable<T> is non-lazy by default, and this solution shows itself well: in most cases it has good locality of reference thus taking advantage of CPU cache, prediction, prefetching etc. so that even multiple copying of a collection still works good enough and performs better in simple cases with small collections.

    If you need more control over the evaluation pipeline, there is an explicit conversion to a lazy sequence with Iterable<T>.asSequence() function.

Bevbevan answered 25/2, 2016 at 14:52 Comment(12)
Probably a big surprise for Java (mostly Guava) fansEmirate
@VenkataRaju for functional people they might be surprised at the alternative of lazy by default.Ezara
Lazy by default is usually less performant for smaller and more commonly used collections. A copy can be faster than a lazy eval if taking advantage of CPU cache and so on. So for common use cases is better to not be lazy. And unfortunately the common contracts for functions like map, filter and others don't carry enough information to decide other than from the source collection type, and since most collections are also Iterable, that isn't a good marker for "be lazy" because it is commonly EVERYWHERE. lazy must be explicit to be safe.Ezara
@JaysonMinard, thanks for your remark, updated the answer.Bevbevan
@JaysonMinard Excellent answers. The performance inversion comments are an unexpected insight. I recently noticed the default collection operations were making copies, was outraged, and seeked out an alternative: sequences. Is there a good rule of thumb or a very very brief summary writeup on how each approach performs on the average computers of today? If the choice isn't so simple, as you say, a simple guideline would help some of us who are prone to overthinking about those kinds of things.Spermatic
if the choice isn't so obvious— I meant.Spermatic
@Spermatic this comes under the heading of "cache aware computing and data structures" and it depends on the structure you are using (i.e. array list vs. linked list) and what you are doing with it. So unless you really become aware of it and care about all of the little details, I would follow this rule of thumb: don't worry about it unless you see a performance problem, memory problem or you are doing something that is processing millions of things per minute. Or just test things with a micro-benchmark if you can. Otherwise more details would never fit into a comment here in SO.Ezara
@Spermatic One example from a recent Apache Spark announcement, they are worrying about this obviously, see "Cache-aware Computation" section at databricks.com/blog/2015/04/28/… ... but they are worried about billions of things iterating so they need to go to the full extreme.Ezara
@JaysonMinard Mmkay. I think I will spend some time playing with some benchmarking soon. It'll be a fun and helpful experience to rediscover just what impact the different steps involved in common processing operations have, versus what I imagine. Also, coming from C++, it will be helpful for me to spend some time learning how things actually work on the JVM. A recent forced rush towards productivity has fortunately or unfortunately bypassed my usual desire to master and understand those details.Spermatic
@JaysonMinard Unfortunately no "if there are more the 3 operations being chained together, a sequence will always be better"-style hard rules, but I appreciate the "don't worry about it" advice :] Thanks.Spermatic
Those rules can't be applied generically @naki, 3 things chained together on a small list rock when in CPU cache, so do 100 working off the same small list!Ezara
Additionally, a common pitfall with lazy evaluation is capturing the context and storing resulting lazy computation in a field along with all the captured locals and whatever they hold. Hence, hard to debug memory leaks.Tonneau
R
70

Completing hotkey's answer:

It is important to notice how Sequence and Iterable iterates throughout your elements:

Sequence example:

list.asSequence().filter { field ->
    Log.d("Filter", "filter")
    field.value > 0
}.map {
    Log.d("Map", "Map")
}.forEach {
    Log.d("Each", "Each")
}

Log result:

filter - Map - Each; filter - Map - Each

Iterable example:

list.filter { field ->
    Log.d("Filter", "filter")
    field.value > 0
}.map {
    Log.d("Map", "Map")
}.forEach {
    Log.d("Each", "Each")
}

filter - filter - Map - Map - Each - Each

Rencontre answered 12/1, 2018 at 16:53 Comment(2)
That's an excellent example of the difference between the two.Sparkman
This is a great example.Leastwise
G
2

Iterable is mapped to the java.lang.Iterable interface on the JVM, and is implemented by commonly used collections, like List or Set. The collection extension functions on these are evaluated eagerly, which means they all immediately process all elements in their input and return a new collection containing the result.

Here’s a simple example of using the collection functions to get the names of the first five people in a list whose age is at least 21:

val people: List<Person> = getPeople()
val allowedEntrance = people
    .filter { it.age >= 21 }
    .map { it.name }
    .take(5)

Target platform: JVMRunning on kotlin v. 1.3.61 First, the age check is done for every single Person in the list, with the result put in a brand new list. Then, the mapping to their names is done for every Person who remained after the filter operator, ending up in yet another new list (this is now a List<String>). Finally, there’s one last new list created to contain the first five elements of the previous list.

In contrast, Sequence is a new concept in Kotlin to represent a lazily evaluated collection of values. The same collection extensions are available for the Sequence interface, but these immediately return Sequence instances that represent a processed state of the date, but without actually processing any elements. To start processing, the Sequence has to be terminated with a terminal operator, these are basically a request to the Sequence to materialize the data it represents in some concrete form. Examples include toList, toSet, and sum, to mention just a few. When these are called, only the minimum required number of elements will be processed to produce the demanded result.

Transforming an existing collection to a Sequence is pretty straightfoward, you just need to use the asSequence extension. As mentioned above, you also need to add a terminal operator, otherwise the Sequence will never do any processing (again, lazy!).

val people: List<Person> = getPeople()
val allowedEntrance = people.asSequence()
    .filter { it.age >= 21 }
    .map { it.name }
    .take(5)
    .toList()

Target platform: JVMRunning on kotlin v. 1.3.61 In this case, the Person instances in the Sequence are each checked for their age, if they pass, they have their name extracted, and then added to the result list. This is repeated for each person in the original list until there are five people found. At this point, the toList function returns a list, and the rest of the people in the Sequence are not processed.

There’s also something extra a Sequence is capable of: it can contain an infinite number of items. With this in perspective, it makes sense that operators work the way they do - an operator on an infinite sequence could never return if it did its work eagerly.

As an example, here’s a sequence that will generate as many powers of 2 as required by its terminal operator (ignoring the fact that this would quickly overflow):

generateSequence(1) { n -> n * 2 }
    .take(20)
    .forEach(::println)

You can find more here.

Greece answered 14/12, 2019 at 19:9 Comment(0)
M
2

Iterable is good enough for most use cases, the way iteration is performed on them it works very well with caches because of the spatial locality. But the issue with them is that whole collection must pass through first intermediate operation before it moves to second and so on.

In sequence each item passes through the full pipeline before the next is handled.

This property can be determental to the performance of your code especially when iterating over large data set. so, if your terminal operation is very likely to terminate early then sequence should be preferred choice because you save by not performing unnecessary operations. for example

sequence.filter { getFilterPredicate() }   
        .map    { getTransformation()  } 
        .first  { getSelector() }

In above case if first item satisfies the filter predicate and after map transformation meets the selection criteria then filter, map and first are invoked only once.

In case of iterable whole collection must first be filtered then mapped and then first selection starts

Megagamete answered 19/7, 2021 at 18:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.