How can I group consecutive integers in a vector in Rust?
Asked Answered
L

5

9

I have a Vec<i64> and I want to know all the groups of integers that are consecutive. As an example:

let v = vec![1, 2, 3, 5, 6, 7, 9, 10];

I'm expecting something like this or similar:

[[1, 2, 3], [5, 6, 7], [9, 10]];

The view (vector of vectors or maybe tuples or something else) really doesn't matter, but I should get several grouped lists with continuous numbers.

At the first look, it seems like I'll need to use itertools and the group_by function, but I have no idea how...

Leandra answered 16/5, 2018 at 21:44 Comment(3)
Why is iterating over the vector and just checking if the next i64 is +1 to the previous not a option?Alinealinna
It's a good decision and I haven't any thoughts against it! But it will have much more code than a functional approach. And the main reason - I'm trying to rewrite my pet project from Python to Rust and I made it like this: group_normal_data = [list(map(itemgetter(1), g)) for (k, g) in groupby(enumerate(source_normal), lambda ix: ix[0] - ix[1])]. It's hard to understand of course, but it works just in only one line and I tried to make the same in RustLeandra
The final snippet in my answer is pretty similar in meaning to the Python version. In fact, it started out even more similar, using enumerate. But just like with list comprehensions in Python, it's easy to go crazy with iterator adapters until you have an unreadable mess.Blanc
B
12

You can indeed use group_by for this, but you might not really want to. Here's what I would probably write instead:

fn consecutive_slices(data: &[i64]) -> Vec<&[i64]> {
    let mut slice_start = 0;
    let mut result = Vec::new();
    for i in 1..data.len() {
        if data[i - 1] + 1 != data[i] {
            result.push(&data[slice_start..i]);
            slice_start = i;
        }
    }
    if data.len() > 0 {
        result.push(&data[slice_start..]);
    }
    result
}

This is similar in principle to eXodiquas' answer, but instead of accumulating a Vec<Vec<i64>>, I use the indices to accumulate a Vec of slice references that refer to the original data. (This question explains why I made consecutive_slices take &[T].)

It's also possible to do the same thing without allocating a Vec, by returning an iterator; however, I like the above version better. Here's the zero-allocation version I came up with:

fn consecutive_slices(data: &[i64]) -> impl Iterator<Item = &[i64]> {
    let mut slice_start = 0;
    (1..=data.len()).flat_map(move |i| {
        if i == data.len() || data[i - 1] + 1 != data[i] {
            let begin = slice_start;
            slice_start = i;
            Some(&data[begin..i])
        } else {
            None
        }
    })
}

It's not as readable as a for loop, but it doesn't need to allocate a Vec for the return value, so this version is more flexible.


Here's a "more functional" version using group_by:

use itertools::Itertools;

fn consecutive_slices(data: &[i64]) -> Vec<Vec<i64>> {
    (&(0..data.len()).group_by(|&i| data[i] as usize - i))
        .into_iter()
        .map(|(_, group)| group.map(|i| data[i]).collect())
        .collect()
}

The idea is to make a key function for group_by that takes the difference between each element and its index in the slice. Consecutive elements will have the same key because indices increase by 1 each time. One reason I don't like this version is that it's quite difficult to get slices of the original data structure; you almost have to create a Vec<Vec<i64>> (hence the two collects). The other reason is that I find it harder to read.

However, when I first wrote my preferred version (the first one, with the for loop), it had a bug (now fixed), while the other two versions were correct from the start. So there may be merit to writing denser code with functional abstractions, even if there is some hit to readability and/or performance.

Blanc answered 17/5, 2018 at 13:14 Comment(4)
Thank you! The first version is quite simple for understanding and it's really good! The last version is really one that I wanted. Now I can play with these functions and understand how to use it in my project. Big Thanks!Leandra
The first version at least is buggy, try a vector with only consecutive elements.Mcalpin
@Mcalpin Thanks! I was trying to make sure it wouldn't return [[]] when passed an empty slice, but I overlooked that possibility. Nice catch. I hope this bug hasn't made it into anyone's production code over the last 2 years...Blanc
To get rid of if data.len() > 0 in the first snippet, you can modify the condition above: if i == data.len() || data[i - 1] + 1 != data[i]. Also, don't forget to modify the range: 1..=data.len().Picaresque
A
6
let v = vec![1, 2, 3, 5, 6, 7, 9, 10];
let mut res = Vec::new();
let mut prev = v[0];
let mut sub_v = Vec::new();

sub_v.push(prev);

for i in 1..v.len() {
    if v[i] == prev + 1 {
        sub_v.push(v[i]);
        prev = v[i];
    } else {
        res.push(sub_v.clone());
        sub_v.clear();
        sub_v.push(v[i]);
        prev = v[i];
    }
}

res.push(sub_v);

This should solve your problem.

Iterating over the given vector, checking if the current i64 (in my case i32) is +1 to the previous i64, if so push it into a vector (sub_v). After the series breaks, push the sub_v into the result vector. Repeat.

But I guess you wanted something functional?

Alinealinna answered 16/5, 2018 at 22:39 Comment(2)
This will panic if the starting vector is empty. You can use mem::replace instead of cloning & clearing the vector.Compute
Thank you for your decision! It completely resolves my problem and I think that I will use exactly this approach. But you are right: I wanted some functionals due to less code. Other reason: I haven't been successful with group_byfunction in Rust (But I resolve the same task in Python exactly with a very similar itertools library without any problem) and I want to understand how I can use it.Leandra
C
3

Another possible solution, that uses std only, could be:

fn consecutive_slices(v: &[i64]) -> Vec<Vec<i64>> {
    let t: Vec<Vec<i64>> = v
        .into_iter()
        .chain([*v.last().unwrap_or(&-1)].iter())
        .scan(Vec::new(), |s, &e| {
            match s.last() {
                None => { s.push(e); Some((false, Vec::new())) },
                Some(&p) if p == e - 1 => { s.push(e); Some((false, Vec::new()))},
                Some(&p) if p != e - 1 => {let o = s.clone(); *s = vec![e]; Some((true, o))},
                _ => None,
            }
         })
         .filter_map(|(n, v)| {
             match n {
                 true => Some(v.clone()),
                 false => None,
             }
         })
         .collect();

    t
}

The chain is used to get the last vector.

Canuck answered 17/5, 2018 at 21:46 Comment(1)
I've never seen anyone actually use scan before, but it works! A few suggestions: Option implements IntoIterator, so you can .chain(v.last().or(Some(&-1))) instead of making a 1-element array; if you return an Option<Option<...>> you can simplify the flat_map call; and mem::replace can avoid the overhead of cloning s. Here's a playground link.Blanc
O
1

I like the answers above but you could also use peekable() to tell if the next value is different.

https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html

Oust answered 30/9, 2021 at 12:19 Comment(0)
A
0

I would probably use a fold for this? That's because I'm very much a functional programmer. Obviously mutating the accumulator is weird :P but this works too and represents another way of thinking about it.

This is basically a recursive solution and can be modified easily to use immutable datastructures.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=43b9e3613c16cb988da58f08724471a4


fn main() {

    let v = vec![1, 2, 3, 5, 6, 7, 9, 10];

    let mut res: Vec<Vec<i32>> = vec![];

    let (last_group, _): (Vec<i32>, Option<i32>) = v
        .iter()
        .fold((vec![], None), |(mut cur_group, last), x| {
            match last {
                None => {
                    cur_group.push(*x);
                    (cur_group, Some(*x))
                }
                Some(last) => {
                    if x - last == 1 {
                        cur_group.push(*x);
                        (cur_group, Some(*x))
                    } else {
                        res.push(cur_group);
                        (vec![*x], Some(*x))
                    }
                }
            }
        });

    res.push(last_group);
    println!("{:?}", res);
}
Amandine answered 18/2, 2022 at 22:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.