Scala: sliding(N,N) vs grouped(N)
Asked Answered
M

2

33

I found myself lately using sliding(n,n) when I need to iterate collections in groups of n elements without re-processing any of them. I was wondering if it would be more correct to iterate those collections by using grouped(n). My question is if there is an special reason to use one or another for this specific case in terms of performance.

val listToGroup = List(1,2,3,4,5,6,7,8)
listToGroup: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8)

listToGroup.sliding(3,3).toList
res0: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8))

listToGroup.grouped(3).toList
res1: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6), List(7, 8))
Meadors answered 30/9, 2015 at 19:47 Comment(0)
O
36

The reason to use sliding instead of grouped is really only applicable when you want to have the 'windows' be of a length different than what you 'slide' by (that is to say, using sliding(m, n) where m != n):

listToGroup.sliding(2,3).toList
//returns List(List(1, 2), List(4, 5), List(7, 8))

listToGroup.sliding(4,3).toList
//returns List(List(1, 2, 3, 4), List(4, 5, 6, 7), List(7, 8))

As som-snytt points out in a comment, there's not going to be any performance difference, as both of them are implemented within Iterator as returning a new GroupedIterator. However, it's simpler to write grouped(n) than sliding(n, n), and your code will be cleaner and more obvious in its intended behavior, so I would recommend grouped(n).

As an example for where to use sliding, consider this problem where grouped simply doesn't suffice:

Given a list of numbers, find the sublist of length 4 with the greatest sum.

Now, putting aside the fact that a dynamic programming approach can produce a more efficient result, this can be solved as:

def maxLengthFourSublist(list: List[Int]): List[Int] = {
    list.sliding(4,1).maxBy(_.sum)
}

If you were to use grouped here, you wouldn't get all the sublists, so sliding is more appropriate.

Onomastic answered 30/9, 2015 at 20:1 Comment(1)
The implementation is DRY, so it doesn't matter which you call, for performance. github.com/scala/scala/blob/v2.11.7/src/library/scala/…Immutable
C
0
import scala.util.Random

val measurements: Vector[Long] = {
   val random = new Random(2021)
   (1 to 100_000_000).map(_ => 10 + random.nextLong(90)).toVector
}

 
def time(block: => Unit): Long = {
   val start = System.nanoTime()
   block
   val end = System.nanoTime()
   (end - start) / 1000 / 1000
}


scala> time {measurements.grouped(2).collect { case Seq(a, b) if a < b => b }.size}
val res38: Long = 2367

scala> time {measurements.sliding(2,1).collect { case Seq(a, b) if a < b => b }.size} 
val res39: Long = 5174

sliding is more than 2x slower.

Scala3, macbook, Apple M1 Pro, 16 GB

Circumscribe answered 16/12, 2023 at 4:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.