Splitting a list into list of lists based on predicate
Asked Answered
M

5

6

(I am aware of this question, but it relates to sequences, which is not my problem here)

Given this input (for example):

let testlist = 
    [  
       "*text1";
       "*text2";
       "text3";
       "text4";
       "*text5";
       "*text6";
       "*text7"
    ]

let pred (s:string) = s.StartsWith("*")

I would like to be able to call MyFunc pred testlist and get this output:

[
    ["*text1";"*text2"];
    ["*text5";"*text6";"*text7"]
]

This is my current solution, but I don't really like the nested List.revs (ignore the fact that it takes Seq as input)

let shunt pred sq =
    let shunter (prevpick, acc) (pick, a) = 
        match pick, prevpick with
        | (true, true)  -> (true, (a :: (List.hd acc)) :: (List.tl acc))
        | (false, _)    -> (false, acc)
        | (true, _)     -> (true, [a] :: acc)

    sq 
        |> Seq.map (fun a -> (pred a, a))
        |> Seq.fold shunter (false, []) 
        |> snd
        |> List.map List.rev
        |> List.rev
Maupassant answered 15/1, 2010 at 11:22 Comment(1)
Belated postscript: the title of this question was badly worded, so a few of the answers are more appropriate to this question.Maupassant
A
5

Edit: rev-less version using foldBack added below.

Here's some code that uses lists and tail-recursion:

//divides a list L into chunks for which all elements match pred
let divide pred L =
    let rec aux buf acc L =
        match L,buf with
        //no more input and an empty buffer -> return acc
        | [],[] -> List.rev acc 
        //no more input and a non-empty buffer -> return acc + rest of buffer
        | [],buf -> List.rev (List.rev buf :: acc) 
        //found something that matches pred: put it in the buffer and go to next in list
        | h::t,buf when pred h -> aux (h::buf) acc t
        //found something that doesn't match pred. Continue but don't add an empty buffer to acc
        | h::t,[] -> aux [] acc t
        //found input that doesn't match pred. Add buffer to acc and continue with an empty buffer
        | h::t,buf -> aux [] (List.rev buf :: acc) t
    aux [] [] L

usage:

> divide pred testlist;;
val it : string list list =
  [["*text1"; "*text2"]; ["*text5"; "*text6"; "*text7"]]

Using a list as data structure for a buffer means that it always needs to be reversed when outputting the contents. This may not be a problem if individual chunks are modestly sized. If speed/efficiency becomes an issue, you could use a Queue<'a> or a `List<'a>' for the buffers, for which appending is fast. But using these data structures instead of lists also means that you lose the powerful list pattern matching. In my opinion, being able to pattern match lists outweighs the presence of a few List.rev calls.

Here's a streaming version that outputs the result one block at a time. This avoids the List.rev on the accumulator in the previous example:

let dividestream pred L =
    let rec aux buf L =
        seq { match L, buf with
              | [],[] -> ()
              | [],buf -> yield List.rev buf
              | h::t,buf when pred h -> yield! aux (h::buf) t
              | h::t,[] -> yield! aux [] t
              | h::t,buf -> yield List.rev buf
                            yield! aux [] t }
    aux [] L

This streaming version avoids the List.rev on the accumulator. Using List.foldBack can be used to avoid reversing the accumulated chunks as well.

update: here's a version using foldBack

//divides a list L into chunks for which all elements match pred
let divide2 pred L =
    let f x (acc,buf) =
        match pred x,buf with
        | true,buf -> (acc,x::buf)
        | false,[] -> (acc,[])
        | false,buf -> (buf::acc,[])

    let rest,remainingBuffer = List.foldBack f L ([],[])
    match remainingBuffer with
    | [] -> rest
    | buf -> buf :: rest
Atilt answered 15/1, 2010 at 13:7 Comment(4)
Sorry, my error, it should effectively just be a list of strings, does that simplify your answer?Maupassant
It eliminates the extra flatten step I took. The divide(stream) algorithms remain the same. I'll edit my answer.Atilt
Nice, two answers in one, both of them 'faster' (based on my minimal side-by-sides), functional, and 'straight through'. Think I prefer divide2, short, sweet, and no risk of stack overflow (I believe).Maupassant
I can't resist to praise divide2. I am on a train and can hardly hold back laughing it's so beautiful. Thanks for the joy cfern.Emlin
P
8

there is a List.partition function in the F# core library (in case you wanted to implement this just to have it working and not to learn how to write recursive functions yourself). Using this function, you can write this:

> testlist |> List.partition (fun s -> s.StartsWith("*"))
val it : string list * string list =
  (["*text1"; "*text2"; "*text5"; "*text6"; "*text7"], ["text3"; "text4"])

Note that this function returns a tuple instead of returning a list of lists. This is a bit different to what you wanted, but if the predicate returns just true or false, then this makes more sense.

The implementation of partition function that returns tuples is also a bit simpler, so it may be useful for learning purposes:

let partition pred list = 
  // Helper function, which keeps results collected so
  // far in 'accumulator' arguments outTrue and outFalse
  let rec partitionAux list outTrue outFalse =
    match list with 
    | [] -> 
        // We need to reverse the results (as we collected
        // them in the opposite order!)
        List.rev outTrue, List.rev outFalse
    // Append element to one of the lists, depending on 'pred'
    | x::xs when pred x -> partitionAux xs (x::outTrue) outFalse
    | x::xs -> partitionAux xs outTrue (x::outFalse)

  // Run the helper function
  partitionAux list [] []   
Piacular answered 15/1, 2010 at 14:22 Comment(1)
No, sorry, I want to separate each group of lines which correspond to pred, and discard the others.Maupassant
A
5

Edit: rev-less version using foldBack added below.

Here's some code that uses lists and tail-recursion:

//divides a list L into chunks for which all elements match pred
let divide pred L =
    let rec aux buf acc L =
        match L,buf with
        //no more input and an empty buffer -> return acc
        | [],[] -> List.rev acc 
        //no more input and a non-empty buffer -> return acc + rest of buffer
        | [],buf -> List.rev (List.rev buf :: acc) 
        //found something that matches pred: put it in the buffer and go to next in list
        | h::t,buf when pred h -> aux (h::buf) acc t
        //found something that doesn't match pred. Continue but don't add an empty buffer to acc
        | h::t,[] -> aux [] acc t
        //found input that doesn't match pred. Add buffer to acc and continue with an empty buffer
        | h::t,buf -> aux [] (List.rev buf :: acc) t
    aux [] [] L

usage:

> divide pred testlist;;
val it : string list list =
  [["*text1"; "*text2"]; ["*text5"; "*text6"; "*text7"]]

Using a list as data structure for a buffer means that it always needs to be reversed when outputting the contents. This may not be a problem if individual chunks are modestly sized. If speed/efficiency becomes an issue, you could use a Queue<'a> or a `List<'a>' for the buffers, for which appending is fast. But using these data structures instead of lists also means that you lose the powerful list pattern matching. In my opinion, being able to pattern match lists outweighs the presence of a few List.rev calls.

Here's a streaming version that outputs the result one block at a time. This avoids the List.rev on the accumulator in the previous example:

let dividestream pred L =
    let rec aux buf L =
        seq { match L, buf with
              | [],[] -> ()
              | [],buf -> yield List.rev buf
              | h::t,buf when pred h -> yield! aux (h::buf) t
              | h::t,[] -> yield! aux [] t
              | h::t,buf -> yield List.rev buf
                            yield! aux [] t }
    aux [] L

This streaming version avoids the List.rev on the accumulator. Using List.foldBack can be used to avoid reversing the accumulated chunks as well.

update: here's a version using foldBack

//divides a list L into chunks for which all elements match pred
let divide2 pred L =
    let f x (acc,buf) =
        match pred x,buf with
        | true,buf -> (acc,x::buf)
        | false,[] -> (acc,[])
        | false,buf -> (buf::acc,[])

    let rest,remainingBuffer = List.foldBack f L ([],[])
    match remainingBuffer with
    | [] -> rest
    | buf -> buf :: rest
Atilt answered 15/1, 2010 at 13:7 Comment(4)
Sorry, my error, it should effectively just be a list of strings, does that simplify your answer?Maupassant
It eliminates the extra flatten step I took. The divide(stream) algorithms remain the same. I'll edit my answer.Atilt
Nice, two answers in one, both of them 'faster' (based on my minimal side-by-sides), functional, and 'straight through'. Think I prefer divide2, short, sweet, and no risk of stack overflow (I believe).Maupassant
I can't resist to praise divide2. I am on a train and can hardly hold back laughing it's so beautiful. Thanks for the joy cfern.Emlin
C
3

Just reverse the list once up front, and then build the structure in order easily:

let Shunt p l =
    let mutable r = List.rev l
    let mutable result = []
    while not r.IsEmpty do
        let mutable thisBatch = []
        while not r.IsEmpty && not(p r.Head) do
            r <- r.Tail 
        while not r.IsEmpty && p r.Head do
            thisBatch <- r.Head :: thisBatch
            r <- r.Tail
        if not thisBatch.IsEmpty then
            result <- thisBatch :: result
    result        

The outer while deals with each 'batch', and the first inner while skips over any that don't match the predicate, followed by another while that grabs all those that do and stores them in the current batch. If there was anything in this batch (the final one may be empty), prepend it to the final result.

This is an example where I think locally imperative code is simply superior to a purely functional counterpart. The code above is so easy to write and to reason about.

Cranford answered 15/1, 2010 at 16:59 Comment(4)
I'm new to functional code, so I'd ask how is this imperative code superior to Petricek's functional version?Barracoon
Well, my solution does what you wanted/asked, whereas his does not.Cranford
I think it must be the way I asked my question, I had several similar 'group by' answers which didn't answer my question. Surprised by your imperative solution, not because you're wrong, just because I'm so used to trying to do it "the F# way" that it didn't even occur to me to go looking in that direction.Maupassant
By the way, given that comments are disabled there, I'll use this space to give a thumbs up to the monadic parser combinators series on your blog (lorgonblog.spaces.live.com/blog/cns!701679AD17B6D310!123.entry), I just 'finished' the C# part. You say LukeH's version was better, but your's is WAY more understandable. You got the baud rate just right, once again :)Maupassant
B
0

Another version of shunt:

let shunt pred lst =
    let rec tWhile pred lst = 
        match lst with
        | []                    -> [], []
        | hd :: tl when pred hd -> let taken, rest = tWhile pred tl
                                   (hd :: taken), rest
        | lst                   -> [], lst
    let rec collect = function
        | []  -> []
        | lst -> let taken, rest = tWhile pred lst
                 taken :: (collect (snd (tWhile (fun x -> not (pred x)) rest)))
    collect lst

This one avoids List.rev but it's not tail recursive - so only suitable for small lists.

Beaumont answered 15/1, 2010 at 14:59 Comment(0)
L
0

yet another one...

let partition pred lst = 
    let rec trec xs cont =
        match xs with
        | []               -> ([],[]) |> cont
        | h::t when pred h -> (fun (y,n) -> h::y,n) >> cont |> trec t
        | h::t             -> (fun (y,n) -> y,h::n) >> cont |> trec t
    trec lst id

then we can define shunt:

let shunt pred lst = lst |> partition pred |> (fun (x,y) -> [x;y])
Liva answered 16/1, 2010 at 0:43 Comment(1)
No, sorry, I want to separate each group of lines which correspond to pred, and discard the others.Maupassant

© 2022 - 2024 — McMap. All rights reserved.