F# Split list into sublists based on comparison of adjacent elements

Asked 17/2, 2010 at 7:52 Answered 24/3, 2012 at 21:5

I've found this question on hubFS, but that handles a splitting criteria based on individual elements. I'd like to split based on a comparison of adjacent elements, so the type would look like this:

val split = ('T -> 'T -> bool) -> 'T list -> 'T list list

Currently, I am trying to start from Don's imperative solution, but I can't work out how to initialize and use a 'prev' value for comparison. Is fold a better way to go?

//Don's solution for single criteria, copied from hubFS
let SequencesStartingWith n (s:seq<_>) =
    seq { use ie = s.GetEnumerator()
          let acc = new ResizeArray<_>()
          while ie.MoveNext() do
             let x = ie.Current
             if x = n && acc.Count > 0 then
                 yield ResizeArray.to_list acc
                 acc.Clear()
             acc.Add x
          if acc.Count > 0 then
              yield  ResizeArray.to_list acc }

Pucka answered 17/2, 2010 at 7:52 Comment(2)

Note: this related question asks how to split and only keep pred = true elements – Pucka 5/2, 2013 at 5:48

Cross reference: here is the same question, but for sequences. – Pucka 5/2, 2013 at 7:27

This is an interesting problem! I needed to implement exactly this in C# just recently for my article about grouping (because the type signature of the function is pretty similar to groupBy, so it can be used in LINQ query as the group by clause). The C# implementation was quite ugly though.

Anyway, there must be a way to express this function using some simple primitives. It just seems that the F# library doesn't provide any functions that fit for this purpose. I was able to come up with two functions that seem to be generally useful and can be combined together to solve this problem, so here they are:

// Splits a list into two lists using the specified function
// The list is split between two elements for which 'f' returns 'true'
let splitAt f list =
  let rec splitAtAux acc list = 
    match list with
    | x::y::ys when f x y -> List.rev (x::acc), y::ys
    | x::xs -> splitAtAux (x::acc) xs
    | [] -> (List.rev acc), []
  splitAtAux [] list

val splitAt : ('a -> 'a -> bool) -> 'a list -> 'a list * 'a list

This is similar to what we want to achieve, but it splits the list only in two pieces (which is a simpler case than splitting the list multiple times). Then we'll need to repeat this operation, which can be done using this function:

// Repeatedly uses 'f' to take several elements of the input list and
// aggregate them into value of type 'b until the remaining list 
// (second value returned by 'f') is empty
let foldUntilEmpty f list = 
  let rec foldUntilEmptyAux acc list =
    match f list with
    | l, [] -> l::acc |> List.rev
    | l, rest -> foldUntilEmptyAux (l::acc) rest
  foldUntilEmptyAux [] list

val foldUntilEmpty : ('a list -> 'b * 'a list) -> 'a list -> 'b list

Now we can repeatedly apply splitAt (with some predicate specified as the first argument) on the input list using foldUntilEmpty, which gives us the function we wanted:

let splitAtEvery f list = foldUntilEmpty (splitAt f) list

splitAtEvery (<>) [ 1; 1; 1; 2; 2; 3; 3; 3; 3 ];;
val it : int list list = [[1; 1; 1]; [2; 2]; [3; 3; 3; 3]]

I think that the last step is really nice :-). The first two functions are quite straightforward and may be useful for other things, although they are not as general as functions from the F# core library.

Bevin answered 17/2, 2010 at 15:26 Comment(2)

@Tomas, nice. Empty lists give a value restriction error (for my solution also). I'm never sure what to do with it -> do I have to force the type of my empty list? Also, do you have any thoughts on the other half of my original question - which is how to initialize a 'prev' value for comparison when doing it imperatively. – Pucka 18/2, 2010 at 6:9

To initialize prev value, you'll need to call MoveNext at the end of the loop (see how this is done in my C# article). F# doesn't support do .. while loop, so you'll need to implement this using explicit recursion. – Bevin 18/2, 2010 at 18:55

How about:

let splitOn test lst =
    List.foldBack (fun el lst ->
            match lst with
            | [] -> [[el]]
            | (x::xs)::ys when not (test el x) -> (el::(x::xs))::ys
            | _ -> [el]::lst
         )  lst []

the foldBack removes the need to reverse the list.

Py answered 23/3, 2012 at 11:54 Comment(1)

(Welcome to Stack Overflow) It works, now just got to get my head round that...! – Pucka 23/3, 2012 at 12:35

Having thought about this a bit further, I've come up with this solution. I'm not sure that it's very readable (except for me who wrote it).

UPDATE Building on the better matching example in Tomas's answer, here's an improved version which removes the 'code smell' (see edits for previous version), and is slightly more readable (says me).

It still breaks on this (splitOn (<>) []), because of the dreaded value restriction error, but I think that might be inevitable.

(EDIT: Corrected bug spotted by Johan Kullbom, now works correctly for [1;1;2;3]. The problem was eating two elements directly in the first match, this meant I missed a comparison/check.)

//Function for splitting list into list of lists based on comparison of adjacent elements
let splitOn test lst = 
    let rec loop lst inner outer = //inner=current sublist, outer=list of sublists
        match lst with 
        | x::y::ys when test x y -> loop (y::ys) [] (List.rev (x::inner) :: outer)
        | x::xs ->                  loop xs (x::inner) outer
        | _ ->                      List.rev ((List.rev inner) :: outer)
    loop lst [] []

splitOn (fun a b -> b - a > 1) [1]
> val it : [[1]]

splitOn (fun a b -> b - a > 1) [1;3]
> val it : [[1]; [3]]

splitOn (fun a b -> b - a > 1) [1;2;3;4;6;7;8;9;11;12;13;14;15;16;18;19;21]
> val it : [[1; 2; 3; 4]; [6; 7; 8; 9]; [11; 12; 13; 14; 15; 16]; [18; 19]; [21]]

Any thoughts on this, or the partial solution in my question?

Pucka answered 17/2, 2010 at 8:41 Comment(5)

It's more readable than the function I came up with, but I would replace loop (List.head ...etc with match lst with | [] -> [[]] | hd::tl -> loop hd tl [] [], so it won't go poof on an empty input. – Modeling 17/2, 2010 at 9:21

@cfern, breaks on empty list anyway (as does Tomas's solution) because of the #£$£~"! value restriction error. – Pucka 18/2, 2010 at 6:6

@Benjol: In what way does your solution break on empty lists? splitOn (fun a b -> b - a > 1) [] gives me [[]]... – Crocein 18/2, 2010 at 21:12

@Benjol: Your current solution has a problem with a "corner case" - it does not give correct result for input like [1; 3; 5;] (gives [[1]; [3; 5]] when it should give [[1]; [3]; [5]]) – Crocein 18/2, 2010 at 21:21

@Johan Kullbom, well spotted! I've fixed it now. Also for your first comment, I've updated my answer. – Pucka 19/2, 2010 at 6:17

"adjacent" immediately makes me think of Seq.pairwise.

let splitAt pred xs =
    if Seq.isEmpty xs then
        []
    else
        xs
        |> Seq.pairwise
        |> Seq.fold (fun (curr :: rest as lists) (i, j) -> if pred i j then [j] :: lists else (j :: curr) :: rest) [[Seq.head xs]]
        |> List.rev
        |> List.map List.rev

Example:

[1;1;2;3;3;3;2;1;2;2]
|> splitAt (>)

Gives:

[[1; 1; 2; 3; 3; 3]; [2]; [1; 2; 2]]

Cerium answered 23/3, 2012 at 16:3 Comment(0)

I would prefer using List.fold over explicit recursion.

let splitOn pred = function
    | []       -> []
    | hd :: tl -> 
        let (outer, inner, _) =
            List.fold (fun (outer, inner, prev) curr ->
                            if pred prev curr 
                            then (List.rev inner) :: outer, [curr], curr
                            else outer, curr :: inner, curr)
                      ([], [hd], hd)
                      tl
        List.rev ((List.rev inner) :: outer)

Crocein answered 17/2, 2010 at 9:58 Comment(1)

Your solution is less readable than mine. Because I didn't write it! :) – Pucka 17/2, 2010 at 21:21

I like answers provided by @Joh and @Johan as these solutions seem to be most idiomatic and straightforward. I also like an idea suggested by @Shooton. However, each solution had their own drawbacks.
I was trying to avoid:

Reversing lists
Unsplitting and joining back the temporary results
Complex match instructions
Even Seq.pairwise appeared to be redundant
Checking list for emptiness can be removed in cost of using Unchecked.defaultof<_> below

Here's my version:

let splitWhen f src =
    if List.isEmpty src then [] else
    src
    |> List.foldBack
        (fun el (prev, current, rest) ->
            if f el prev
            then el , [el]          , current :: rest
            else el , el :: current , rest
        )
        <| (List.head src, [], [])               // Initial value does not matter, dislike using Unchecked.defaultof<_>
    |> fun (_, current, rest) -> current :: rest // Merge temporary lists
    |> List.filter (not << List.isEmpty)         // Drop tail element

Terpineol answered 24/3, 2012 at 21:5 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags