Scala Partition/Collect Usage
Asked Answered
S

8

39

Is it possible to use one call to collect to make 2 new lists? If not, how can I do this using partition?

Sw answered 24/1, 2011 at 15:58 Comment(0)
P
83

collect (defined on TraversableLike and available in all subclasses) works with a collection and a PartialFunction. It also just so happens that a bunch of case clauses defined inside braces are a partial function (See section 8.5 of the Scala Language Specification [warning - PDF])

As in exception handling:

try {
  ... do something risky ...
} catch {
  //The contents of this catch block are a partial function
  case e: IOException => ...
  case e: OtherException => ...
}

It's a handy way to define a function that will only accept some values of a given type.

Consider using it on a list of mixed values:

val mixedList = List("a", 1, 2, "b", 19, 42.0) //this is a List[Any]
val results = mixedList collect {
  case s: String => "String:" + s
  case i: Int => "Int:" + i.toString
}

The argument to to collect method is a PartialFunction[Any,String]. PartialFunction because it's not defined for all possible inputs of type Any (that being the type of the List) and String because that's what all the clauses return.

If you tried to use map instead of collect, the the double value at the end of mixedList would cause a MatchError. Using collect just discards this, as well as any other value for which the PartialFunction is not defined.

One possible use would be to apply different logic to elements of the list:

var strings = List.empty[String]
var ints = List.empty[Int]
mixedList collect {
  case s: String => strings :+= s
  case i: Int => ints :+= i
}

Although this is just an example, using mutable variables like this is considered by many to be a war crime - So please don't do it!

A much better solution is to use collect twice:

val strings = mixedList collect { case s: String => s }
val ints = mixedList collect { case i: Int => i }

Or if you know for certain that the list only contains two types of values, you can use partition, which splits a collections into values depending on whether or not they match some predicate:

//if the list only contains Strings and Ints:
val (strings, ints) = mixedList partition { case s: String => true; case _ => false }

The catch here is that both strings and ints are of type List[Any], though you can easily coerce them back to something more typesafe (perhaps by using collect...)

If you already have a type-safe collection and want to split on some other property of the elements, then things are a bit easier for you:

val intList = List(2,7,9,1,6,5,8,2,4,6,2,9,8)
val (big,small) = intList partition (_ > 5)
//big and small are both now List[Int]s

Hope that sums up how the two methods can help you out here!

Piliform answered 24/1, 2011 at 16:44 Comment(6)
Very nice explanation, but what I think OP wants is a combination of collect and partition that returns a tuple of a list of the collected values and a list of all the rest. def collectAndPartition[A, B](pf: PartialFunction[A, B]): (List[B], List[A]). This would probably be most elegantly achieved with a native library function, i.e. in the source of collect in TraversableLike we have for (x <- this) if (pf.isDefinedAt(x)) b += pf(x), one could simply tack an else a += x at the end of that, where a would be a builder for the list of all the rest.Cleora
I know what the OP needs, and I'm also aware that this is a homework question (it's been mentioned on stack overflow a lot recently), so I'll gladly give plenty of theory without actually solving it. As for collectAndPartition, I already wrote that, though I named the method collate. If anyone is teaching scala to the level where students are expected to work with CanBuildFrom then I'll be very surprised, it's beyond most people currently using scala in production.Piliform
That was very helpful. But I'm still thinking... is it possible to separate for example positive and negative values, without making a "war crime" as you wrote erlier? I'm just courious becouse I've allready done homework with using partition. Ohhh... And by the way, Thanks for help!Sw
collect, like map, takes one collection and converts it to another collection. It can certainly help in processing data and making it easier to separate in a following operation, but there's no way to generate two collections using just collect unless you rely on side effects (i.e. mutable variables). This sort of hack leads to code that's harder to read, and is best left for times when there's no other alternative or it gives a vital performance boost, neither of which applies here.Piliform
there is a nice example in this question that shows how to use collect to split a stream.Iny
One could also use groupBy to define some criteria for splitting to N number of sublists, and then take required groups by their keys. For example in this case key could be surrogate enum or java class instance, or boolean and then you're back with quite same result as partition providesAuxiliary
V
7

Not sure how to do it with collect without using mutable lists, but partition can use pattern matching as well (just a little more verbose)

List("a", 1, 2, "b", 19).partition { 
  case s:String => true
  case _ => false 
}
Velamen answered 24/1, 2011 at 16:12 Comment(2)
@coubeatczech - Because partition returns a (List[A],List[A]). That's all it can do since the input is a List[A] and an indicator function A => Boolean. It has no way of knowing that the indicator function might be type-specific.Dupuy
@Rex I defined my own collate method to pimp onto collections that solves just this problem. On a List[A] the use-case signature is collate[B](fn: PartialFunction[A,B]): (List(B),List(A)), obviously the actual signature is a bit hairier than that as I'm also using CanBuildFromPiliform
D
6

The signature of the normally-used collect on, say, Seq, is

collect[B](pf: PartialFunction[A,B]): Seq[B]

which is really a particular case of

collect[B, That](pf: PartialFunction[A,B])(
  implicit bf: CanBuildFrom[Seq[A], B, That]
): That

So if you use it in default mode, the answer is no, assuredly not: you get exactly one sequence out from it. If you follow CanBuildFrom through Builder, you see that it would be possible to make That actually be two sequences, but it would have no way of being told which sequence an item should go into, since the partial function can only say "yes, I belong" or "no, I do not belong".

So what do you do if you want to have multiple conditions that result in your list being split into a bunch of different pieces? One way is to create an indicator function A => Int, where your A is mapped into a numbered class, and then use groupBy. For example:

def optionClass(a: Any) = a match {
  case None => 0
  case Some(x) => 1
  case _ => 2
}
scala> List(None,3,Some(2),5,None).groupBy(optionClass)
res11: scala.collection.immutable.Map[Int,List[Any]] = 
  Map((2,List(3, 5)), (1,List(Some(2))), (0,List(None, None)))

Now you can look up your sub-lists by class (0, 1, and 2 in this case). Unfortunately, if you want to ignore some inputs, you still have to put them in a class (e.g. you probably don't care about the multiple copies of None in this case).

Dupuy answered 24/1, 2011 at 17:3 Comment(0)
C
5

I use this. One nice thing about it is it combines partitioning and mapping in one iteration. One drawback is that it does allocate a bunch of temporary objects (the Either.Left and Either.Right instances)

/**
 * Splits the input list into a list of B's and a list of C's, depending on which type of value the mapper function returns.
 */
def mapSplit[A,B,C](in: List[A])(mapper: (A) => Either[B,C]): (List[B], List[C]) = {
  @tailrec
  def mapSplit0(in: List[A], bs: List[B], cs: List[C]): (List[B], List[C]) = {
    in match {
      case a :: as =>
        mapper(a) match {
          case Left(b)  => mapSplit0(as, b :: bs, cs     )
          case Right(c) => mapSplit0(as, bs,      c :: cs)
        }
      case Nil =>
        (bs.reverse, cs.reverse)
    }
  }

  mapSplit0(in, Nil, Nil)
}

val got = mapSplit(List(1,2,3,4,5)) {
  case x if x % 2 == 0 => Left(x)
  case y               => Right(y.toString * y)
}

assertEquals((List(2,4),List("1","333","55555")), got)
Clintonclintonia answered 25/1, 2011 at 1:24 Comment(0)
D
5

Starting in Scala 2.13, most collections are now provided with a partitionMap method which partitions elements based on a function which returns either Right or Left.

That allows us to pattern match based on the type (which as a collect enables having specific types in the partitioned lists) or any other pattern:

val (strings, ints) =
  List("a", 1, 2, "b", 19).partitionMap {
    case s: String => Left(s)
    case x: Int    => Right(x)
  }
// strings: List[String] = List("a", "b")
// ints: List[Int] = List(1, 2, 19)
Daltondaltonism answered 7/10, 2018 at 9:37 Comment(0)
F
1

I could not find a satisfying solution to this basic problem here. I don't need a lecture on collect and don't care if this is someone's homework. Also, I don't want something that works only for List.

So here is my stab at it. Efficient and compatible with any TraversableOnce, even strings:

implicit class TraversableOnceHelper[A,Repr](private val repr: Repr)(implicit isTrav: Repr => TraversableOnce[A]) {

  def collectPartition[B,Left](pf: PartialFunction[A, B])
  (implicit bfLeft: CanBuildFrom[Repr, B, Left], bfRight: CanBuildFrom[Repr, A, Repr]): (Left, Repr) = {
    val left = bfLeft(repr)
    val right = bfRight(repr)
    val it = repr.toIterator
    while (it.hasNext) {
      val next = it.next
      if (!pf.runWith(left += _)(next)) right += next
    }
    left.result -> right.result
  }

  def mapSplit[B,C,Left,Right](f: A => Either[B,C])
  (implicit bfLeft: CanBuildFrom[Repr, B, Left], bfRight: CanBuildFrom[Repr, C, Right]): (Left, Right) = {
    val left = bfLeft(repr)
    val right = bfRight(repr)
    val it = repr.toIterator
    while (it.hasNext) {
      f(it.next) match {
        case Left(next) => left += next
        case Right(next) => right += next
      }
    }
    left.result -> right.result
  }
}

Example usages:

val (syms, ints) =
  Seq(Left('ok), Right(42), Right(666), Left('ko), Right(-1)) mapSplit identity

val ctx = Map('a -> 1, 'b -> 2) map {case(n,v) => n->(n,v)}
val (bound, unbound) = Vector('a, 'a, 'c, 'b) collectPartition ctx
println(bound: Vector[(Symbol, Int)], unbound: Vector[Symbol])
Felten answered 26/5, 2016 at 15:31 Comment(0)
M
0

Something like this could help

def partitionMap[IN, A, B](seq: Seq[IN])(function: IN => Either[A, B]): (Seq[A], Seq[B]) = {
  val (eitherLeft, eitherRight) = seq.map(function).partition(_.isLeft)
  eitherLeft.map(_.left.get) -> eitherRight.map(_.right.get)
}

To call it

val seq: Seq[Any] = Seq(1, "A", 2, "B")
val (ints, strings) = CollectionUtils.partitionMap(seq) {
  case int: Int    => Left(int)
  case str: String => Right(str)
}
ints shouldBe Seq(1, 2)
strings shouldBe Seq("A", "B")

Advantage is a simple API, similar with the one from Scala 2.12

Disadvantage; collection is ran twice and missing support for CanBuildFrom

Mete answered 1/8, 2019 at 10:1 Comment(0)
O
0

I would personally use a foldLeft or foldRight for this. It has a couple advantages over the some of the other answers here. No use of var, so this is a pure function (if you care about that type of thing). Only one traversal through the list. Does not create any extraneous Either objects.

The idea of a fold is to convert a list into a single type. However, nothing is stopping us from having this single type be a Tuple of any number of lists.

This example converts a list into three different lists:

  val list: List[Any] = List(1,"two", 3, "four", 5.5)

  // Start with 3 empty lists and prepend to them each time we find a new value
  list.foldRight( (List.empty[Int]), List.empty[String], List.empty[Double]) {
    (nextItem, newCollection) => {
      nextItem match {
        case i: Int => newCollection.copy(_1 = i :: newCollection._1)
        case s: String => newCollection.copy(_2 = s :: newCollection._2)
        case f: Double => newCollection.copy(_3 = f :: newCollection._3)
        case _ => newCollection
      }
    }
  }
Overstrain answered 7/7, 2020 at 0:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.