Using generic iterators instead of specific list types
Asked Answered
S

4

28

I'm very new to Rust, coming from C# / Java / similar.

In C# we have IEnumerable<T> that can be used to iterate almost any kind of array or list. C# also has a yield keyword that you can use to return a lazy list. Here's an example...

// Lazily returns the even numbers out of an enumerable
IEnumerable<int> Evens(IEnumerable<int> input)
{
    foreach (var x in input)
    {
        if (x % 2 == 0)
        {
            yield return x;
        }
    }
}

This is a silly example of course. I know I could do this with Rust's map function, but I would like to know how to create my own methods that accept and return generic iterators.

From what I can gather, Rust has generic iterators that can be use similarly, but they are above my understanding. I see Iter, IntoIterator, Iterator types, and probably more in documentation, but no good way to understand them.

Can anyone provide clear examples of how to create something like above? Thank you!

P.S. The lazy aspect is optional. I am more concerned with abstraction away from specific list and array types.

Stanhope answered 3/6, 2015 at 21:10 Comment(2)
As I understand it, you are also asking about generators - specifically revolving around the yield keyword. Rust doesn't quite have those, but you should be able to do all the same things with an Iterator. It may be a bit more complicated to type out when implementing the iterator, though.Flume
@Flume Yes, generators! That's the computer science word I was looking for. This is secondary, but I understand how the Iterator would help cover that.Stanhope
K
49

First, forget about IntoIterator and other traits or types. The core iteration trait in Rust is Iterator. Its trimmed down definition is as follows:

trait Iterator {
    type Item;  // type of elements returned by the iterator
    fn next(&mut self) -> Option<Self::Item>;
}

As you probably know, you can think of an iterator as a cursor inside of some structure. next() method advances this cursor forward, returning an element it pointed at previously. Naturally, if the collection is exhausted, there is nothing to return, and so next() returns Option<Self::Item>, not just Self::Item.

Iterator is a trait, and so it can be implemented by specific types. Note that Iterator itself is not a proper type which you can use as a return value or a function argument - you have to use concrete types which implement this trait.

The above statement may sound too restrictive - how to use arbitrary iterator types then? - but because of generics this is not so. If you want a function to accept arbitrary iterators, just make it generic in the corresponding argument, adding an Iterator bound over the corresponding type parameter:

fn iterate_bytes<I>(iter: I) where I: Iterator<Item=u8> { ... }

Returning iterators from functions may be difficult, but see below.

For example, there is a method on &[T], called iter(), which returns an iterator which yields references into the slice. This iterator is an instance of this structure. You can see on that page how Iterator is implemented for Iter:

impl<'a, T> Iterator for Iter<'a, T> {
    type Item = &'a T;
    fn next(&mut self) -> Option<&'a T> { ... }
    ...
}

This structure holds a reference to the original slice and some iteration state inside it. Its next() method updates this state and returns the next value, if there is any.

Any value whose type implements Iterator can be used in a for loop (for loop in fact works with IntoIterator, but see below):

let s: &[u8] = b"hello";
for b in s.iter() {
    println!("{}", b);   // prints numerical value of each byte
}

Now, Iterator trait is actually more complex than the above one. It also defines a lot of transformation methods which consume the iterator they are called on and return a new iterator which somehow transforms or filters values from the original iterator. For example, enumerate() method returns an iterator which yields values from the original iterator together with the positional number of the element:

let s: &[u8] = b"hello";
for (i, b) in s.iter().enumerate() {
    println!("{} at {}", b, i);   // prints "x at 0", "y at 1", etc.
}

enumerate() is defined like this:

trait Iterator {
    type Item;
    ...
    fn enumerate(self) -> Enumerate<Self> {
        Enumerate {
            iter: self,
            count: 0
        }
    }
    ...
}

Enumerate is just a struct which contains an iterator and a counter inside it and which implements Iterator<Item=(usize, I::Item)>:

struct Enumerate<I> {
    iter: I,
    count: usize
}

impl<I> Iterator for Enumerate<I> where I: Iterator {
    type Item = (usize, I::Item);

    #[inline]
    fn next(&mut self) -> Option<(usize, I::Item)> {
        self.iter.next().map(|a| {
            let ret = (self.count, a);
            self.count += 1;
            ret
        })
    }
}

And this is how most iterator transformations are implemented: each transformation is a wrapping struct which wraps the original iterator and implements Iterator trait by delegating to the original iterator and transforming the resulting value somehow. For example, s.iter().enumerate() from the example above returns a value of type Enumerate<Iter<'static, u8>>.

Note that while enumerate() is defined in Iterator trait directly, it can be a standalone function as well:

fn enumerate<I>(iter: I) -> Enumerate<I> where I: Iterator {
    Enumerate {
        iter: iter,
        count: 0
    }
}

The method works very similarly - it just uses implicit Self type parameter instead of an explicitly named one.


You may wonder what IntoIterator trait is. Well, it is just a convenience conversion trait which can be implemented by any type which can be converted to an iterator:

pub trait IntoIterator where Self::IntoIter::Item == Self::Item {
    type Item;
    type IntoIter: Iterator;

    fn into_iter(self) -> Self::IntoIter;
}

For example, &'a [T] can be converted into Iter<'a, T>, and so it has the following implementation:

impl<'a, T> IntoIterator for &'a [T] {
    type Item = &'a T;
    type IntoIter = Iter<'a, T>;

    fn into_iter(self) -> Iter<'a, T> {
        self.iter()  // just delegate to the existing method
    }
}

This trait is implemented for most container types and references to these types. It is in fact used by for loops - a value of any type which implements IntoIterator can be used in in clause:

let s: &[u8] = b"hello";
for b in s { ... }

This is very nice from learning and reading perspective because it has less noise (in form of iter()-like methods). It even allows things like these:

let v: Vec<u8> = ...;

for i in &v { /* i is &u8 here, v is borrowed immutably */ }
for i in &mut v { /* i is &mut u8 here, v is borrowed mutably */ }
for i in v { /* i is just u8 here, v is consumed */ }

This is possible because IntoIterator is implemented differently for &Vec<T>, &mut Vec<T> and just Vec<T>.

Every Iterator implements IntoIterator which performs an identity conversion (into_iter() just returns the iterator it is called on), so you can use Iterator instances in for loops as well.

Consequently, it makes sense to use IntoIterator in generic functions because it will make the API more convenient for the user. For example, enumerate() function from above could be rewritten as such:

fn enumerate<I>(source: I) -> Enumerate<I::IntoIter> where I: IntoIterator {
    Enumerate {
        iter: source.into_iter(),
        count: 0
    }
}

Now you can see how generics can be used to implement transformations with static typing easily. Rust does not have anything like C#/Python yield (but it is one of the most desired features, so one day it may appear in the language!), thus you need to wrap source iterators explicitly. For example, you can write something analogous to the above Enumerate structure which does the task you want.

However, the most idiomatic way would be to use existing combinators to do the work for you. For example, your code may be written as follows:

let iter = ...;  // iter implements Iterator<Item=i32>
let r = iter.filter(|&x| x % 2 == 0);  // r implements Iterator<Item=i32>
for i in r {
    println!("{}", i);  // prints only even items from the iterator
}

However, using combinators may turn ugly when you want to write custom combinator functions because a lot of existing combinator functions accept closures (e.g. the filter() one above), but closures in Rust are implemented as values of anonymous types, so there is just no way to write the signature of the function returning the iterator out:

fn filter_even<I>(source: I) -> ??? where I: IntoIter<Item=i32> {
    source.into_iter().filter(|&x| x % 2 == 0)
}

There are several ways around this, one of them is using trait objects:

fn filter_even<'a, I>(source: I) -> Box<Iterator<Item=i32>+'a>
    where I: IntoIterator<Item=i32>, I::IntoIter: 'a
{
    Box::new(source.into_iter().filter(|&x| x % 2 == 0))
}

Here we hide the actual iterator type returned by filter() behind a trait object. Note that in order to make the function fully generic I had to add a lifetime parameter and a corresponding bound to Box trait object and I::IntoIter associated type. This is necessary because I::IntoIter may contain arbitrary lifetimes inside it (just like Iter<'a, T> type above), and we have to specify them in the trait object type (otherwise the lifetime information would be lost).

Trait objects created from Iterator trait implement Iterator themselves, so you can continue using these iterators as usual:

let source = vec![1_i32, 2, 3, 4];
for i in filter_even(source) {
    println!("{}", i);  // prints 2 and 4
}
Karole answered 4/6, 2015 at 10:55 Comment(8)
This is really great information and examples! Thanks so much! I have a problem with the last example, though: is.gd/MKImuQ IntoIter doesn't seem to be accessible or used like this. Can you show how to fix it? The examples consuming iterators are perfect!Stanhope
@jocull, oh, sorry, that should be IntoIterator, of course. I've updated the example and also fixed a lifetimes issue there. It works now: is.gd/7AZVstKarole
Thank you! I see that the example also changed to include lifetimes (I was running into that issue). Could you explain what the lifetime is doing here? It feels like it has something to do with moving the memory into the Box, but the whole memory model is a really new concept for me.Stanhope
@jocull, you're right about boxes, I've already added an explanation)Karole
Thanks! There's still so much to learn (lifetimes and boxes are fuzzy). For example, I'm not sure how to borrow a boxed iterator iterate over it multiple times. Perhaps it's just best practice to always push new collections into a vector? is.gd/WNZiyAStanhope
@jocull, boxing iterators do not have anything to do with multiple iterations. Any iterator can only be iterated through once. Remember, iterators are one-way cursors, once they reach the end, they become useless. If you want to iterate over something multiple times, you have to store it in some "stable" form, like a collection.Karole
Perfect, that's exactly the explanation I needed. That would likely explain why @Veedrac's commented example ( is.gd/lBZCVO ) shows the iterators being cloned.Stanhope
Well, some iterators can be cloned, but the example you linked to does not have "iterators being cloned". cloned() is just another iterator transformation method which is described here. It is useful to obtain Iterator<Item=T> from Iterator<Item=&T> if T is cloneable.Karole
C
6

Here is the full version of Map, and here is the function that builds it.

A minimal implementation would look something like

fn map<I, E, B, F>(i: I, f: F) -> Map<I, F> where
    F: FnMut(E) -> B,
    I: Iterator<Item=E>
{
    Map {iter: i, f: f}
}

pub struct Map<I, F> {
    iter: I,
    f: F,
}

impl<B, I: Iterator, F> Iterator for Map<I, F> where F: FnMut(I::Item) -> B {
    type Item = B;

    fn next(&mut self) -> Option<B> {
        self.iter.next().map(|a| (self.f)(a))
    }
}

Playpen link. Note that the map used inside the iterator is the method on Option; this isn't recursively defined!

It's not too convenient to write, but boy is it fast!


Now, to write this for an arbitrary "enumerable" type one would change map to

fn map<I, E, B, F>(i: I, f: F) -> Map<I::IntoIter, F> where
    F: FnMut(E) -> B,
    I: IntoIterator<Item=E>
{
    Map {iter: i.into_iter(), f: f}
}

IntoIterator is basically IEnumerable, only instead of GetEnumerator there's into_iter.

Crystallite answered 3/6, 2015 at 21:59 Comment(3)
I'm failing to wrap my brain around this I think. I don't understand how the Iterator and IntoIter traits can exist, but not be a valid input or return type -- I would expect at least a Box or Borrow of them to work (because the size is not known). I would really love an example of this where code isn't used or modified from the std lib. Could you perhaps show an example of returning a my_vec.map(...) operation without first collecting it into a Vec? Is that possible?Stanhope
I tried setting up something to use &Iterator<Item=i32> as an argument and got close, but still borrow errors. is.gd/00LPZ6Stanhope
@jocull: next() takes &mut self, so the iterator need be mutable; why don't you take it by value like in the example provided by Veedrac?Chlorohydrin
S
0

Implement the Iterator trait for the struct that should serve as iterator. You only need to implement the next method. The other methods have default implementations.

It is not possible to create an iterator that works with any container. The type system machinery needed for this doesn't exist yet.

Stoicism answered 3/6, 2015 at 21:15 Comment(3)
I was mostly aiming for iterating things like Vec or LinkedList generically, not iterating a custom struct.Stanhope
"It is not possible to create an iterator that works with any container." → Just implement it for IntoIterator.Crystallite
@Crystallite Can you explain IntoIterator at all? There's so many traits!Stanhope
C
0

I am too learning Rust coming from a C# background. Here is how I would implement it:

fn evens<'a>(input: impl Iterator<Item = &'a i32>) -> impl Iterator<Item = &'a i32> {
    input.filter(|x| (*x % 2) == 0)
}

and here is the test

#[test]
fn test_evens() {
    let input = vec![1, 2, 3];

    let mut iter = evens(input.iter());
    assert_eq!(iter.next(), Some(&2));
    assert_eq!(iter.next(), None);
}
Camisole answered 23/10, 2022 at 16:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.