Get next N elements from enumerable
Asked Answered
E

10

13

Context: C# 3.0, .Net 3.5
Suppose I have a method that generates random numbers (forever):

private static IEnumerable<int> RandomNumberGenerator() {
    while (true) yield return GenerateRandomNumber(0, 100);
}

I need to group those numbers in groups of 10, so I would like something like:

foreach (IEnumerable<int> group in RandomNumberGenerator().Slice(10)) {
    Assert.That(group.Count() == 10);
}

I have defined Slice method, but I feel there should be one already defined. Here is my Slice method, just for reference:

    private static IEnumerable<T[]> Slice<T>(IEnumerable<T> enumerable, int size) {
        var result = new List<T>(size);
        foreach (var item in enumerable) {
            result.Add(item);
            if (result.Count == size) {
                yield return result.ToArray();
                result.Clear();
            }
        }
    }

Question: is there an easier way to accomplish what I'm trying to do? Perhaps Linq?

Note: above example is a simplification, in my program I have an Iterator that scans given matrix in a non-linear fashion.

EDIT: Why Skip+Take is no good.

Effectively what I want is:

var group1 = RandomNumberGenerator().Skip(0).Take(10);
var group2 = RandomNumberGenerator().Skip(10).Take(10);
var group3 = RandomNumberGenerator().Skip(20).Take(10);
var group4 = RandomNumberGenerator().Skip(30).Take(10);

without the overhead of regenerating number (10+20+30+40) times. I need a solution that will generate exactly 40 numbers and break those in 4 groups by 10.

Equalizer answered 19/8, 2010 at 17:22 Comment(3)
Why do you want to enumerate it than? Just pass number of random numbers you want to generate and than return a collection of random numbers.Deviate
Am I correct in saying that the RandomNumberGenerator code you posted is not actual code that you are using? Because it looks to me like it should only return a single value, rather than enumerating forever (for that you would need to use a looping construct).Whiney
@user93422, wouldn't it be easier to create a generator context class and provide a GetNext() and GetNext(int count) rather than trying to slice up an infinite enumeration? This is the design that the Random class uses and it's a clean way to accomplish what you're after.Ulna
P
8

I have done something similar. But I would like it to be simpler:

//Remove "this" if you don't want it to be a extension method
public static IEnumerable<IList<T>> Chunks<T>(this IEnumerable<T> xs, int size)
{
    var curr = new List<T>(size);

    foreach (var x in xs)
    {
        curr.Add(x);

        if (curr.Count == size)
        {
            yield return curr;
            curr = new List<T>(size);
        }
    }
}

I think yours are flawed. You return the same array for all your chunks/slices so only the last chunk/slice you take would have the correct data.

Addition: Array version:

public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
    var curr = new T[size];

    int i = 0;

    foreach (var x in xs)
    {
        curr[i % size] = x;

        if (++i % size == 0)
        {
            yield return curr;
            curr = new T[size];
        }
    }
}

Addition: Linq version (not C# 2.0). As pointed out, it will not work on infinite sequences and will be a great deal slower than the alternatives:

public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
    return xs.Select((x, i) => new { x, i })
             .GroupBy(xi => xi.i / size, xi => xi.x)
             .Select(g => g.ToArray());
}
Pargeting answered 19/8, 2010 at 17:33 Comment(13)
right, I suppose me using array instead of list is an example of premature optimization. BTW, why do you use (++i % size == 0) instead of curr.Length == size? also your curr = new List<T>() should be curr = new List<T>(size)Equalizer
Yes, I was reusing same array (worked for my case, but that might be a bug in most cases).Equalizer
I do use (size) both places, I guess you got a early version :) I don't think List will be much slower than arrays. Because then I don't have to reset index each time. It would have been a bug if you did a .ToList() or .ToArray() but if you process the data immediately then it will be fine. See #3447855Pargeting
Your right, I could have been using .Count instead. Thanks for improving :)Pargeting
well if I'd use ToList or ToArray - that would actually fix the bug, since that would create a new instance on each yield return. no?Equalizer
Yes, but I would guess that my method is faster because it ain't copying the list to an array - but I have not tested it.Pargeting
The difference in the return, I return T[] chunks, when you do IList<T> - that's what makes a difference, too bad there is no ((T[]) list)Equalizer
The array internally in a list can be larger than the actual list, so either it could return a larger array than you want or do the same as ToArray() - unless of cause the array and list size is equal.Pargeting
See my linq version. I think it looks nice. You may be able to convert it to C# 2.0Pargeting
my bad. I am using C# 3.0, not 2.0Equalizer
one caveat, linq version will not work with infinite sequence. For my real case it does what it supposed to do.Equalizer
The LINQ version is nice, but has the disadvantage of consuming all of xs before returning the first slice. GroupBy can't know that there isn't another item with a key of 0 until it has read everything. For a typical enumerable that's just different to the other implementations rather than really wrong, but for the infinite random number generator it means an infinite loop.Conway
@user93422,stevemegson Yeah, it must also be a lot slower than the alternatives. I have pointed it out in my answer.Pargeting
M
12

Are Skip and Take of any use to you?

Use a combination of the two in a loop to get what you want.

So,

list.Skip(10).Take(10);

Skips the first 10 records and then takes the next 10.

Midriff answered 19/8, 2010 at 17:26 Comment(2)
I don't want overhead of Skip/Take, I have edited the question.Equalizer
I don't think this will work how you think it will with a generator.Yamauchi
P
8

I have done something similar. But I would like it to be simpler:

//Remove "this" if you don't want it to be a extension method
public static IEnumerable<IList<T>> Chunks<T>(this IEnumerable<T> xs, int size)
{
    var curr = new List<T>(size);

    foreach (var x in xs)
    {
        curr.Add(x);

        if (curr.Count == size)
        {
            yield return curr;
            curr = new List<T>(size);
        }
    }
}

I think yours are flawed. You return the same array for all your chunks/slices so only the last chunk/slice you take would have the correct data.

Addition: Array version:

public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
    var curr = new T[size];

    int i = 0;

    foreach (var x in xs)
    {
        curr[i % size] = x;

        if (++i % size == 0)
        {
            yield return curr;
            curr = new T[size];
        }
    }
}

Addition: Linq version (not C# 2.0). As pointed out, it will not work on infinite sequences and will be a great deal slower than the alternatives:

public static IEnumerable<T[]> Chunks<T>(this IEnumerable<T> xs, int size)
{
    return xs.Select((x, i) => new { x, i })
             .GroupBy(xi => xi.i / size, xi => xi.x)
             .Select(g => g.ToArray());
}
Pargeting answered 19/8, 2010 at 17:33 Comment(13)
right, I suppose me using array instead of list is an example of premature optimization. BTW, why do you use (++i % size == 0) instead of curr.Length == size? also your curr = new List<T>() should be curr = new List<T>(size)Equalizer
Yes, I was reusing same array (worked for my case, but that might be a bug in most cases).Equalizer
I do use (size) both places, I guess you got a early version :) I don't think List will be much slower than arrays. Because then I don't have to reset index each time. It would have been a bug if you did a .ToList() or .ToArray() but if you process the data immediately then it will be fine. See #3447855Pargeting
Your right, I could have been using .Count instead. Thanks for improving :)Pargeting
well if I'd use ToList or ToArray - that would actually fix the bug, since that would create a new instance on each yield return. no?Equalizer
Yes, but I would guess that my method is faster because it ain't copying the list to an array - but I have not tested it.Pargeting
The difference in the return, I return T[] chunks, when you do IList<T> - that's what makes a difference, too bad there is no ((T[]) list)Equalizer
The array internally in a list can be larger than the actual list, so either it could return a larger array than you want or do the same as ToArray() - unless of cause the array and list size is equal.Pargeting
See my linq version. I think it looks nice. You may be able to convert it to C# 2.0Pargeting
my bad. I am using C# 3.0, not 2.0Equalizer
one caveat, linq version will not work with infinite sequence. For my real case it does what it supposed to do.Equalizer
The LINQ version is nice, but has the disadvantage of consuming all of xs before returning the first slice. GroupBy can't know that there isn't another item with a key of 0 until it has read everything. For a typical enumerable that's just different to the other implementations rather than really wrong, but for the infinite random number generator it means an infinite loop.Conway
@user93422,stevemegson Yeah, it must also be a lot slower than the alternatives. I have pointed it out in my answer.Pargeting
W
8

Using Skip and Take would be a very bad idea. Calling Skip on an indexed collection may be fine, but calling it on any arbitrary IEnumerable<T> is liable to result in enumeration over the number of elements skipped, which means that if you're calling it repeatedly you're enumerating over the sequence an order of magnitude more times than you need to be.

Complain of "premature optimization" all you want; but that is just ridiculous.

I think your Slice method is about as good as it gets. I was going to suggest a different approach that would provide deferred execution and obviate the intermediate array allocation, but that is a dangerous game to play (i.e., if you try something like ToList on such a resulting IEnumerable<T> implementation, without enumerating over the inner collections, you'll end up in an endless loop).

(I've removed what was originally here, as the OP's improvements since posting the question have since rendered my suggestions here redundant.)

Whiney answered 19/8, 2010 at 17:45 Comment(5)
I have changed the implementation to use foreach instead, that should take care of disposing. good point though +1Equalizer
yield return result.ToArray(); ensures that every returned chunk is a unique array-object.Equalizer
@user93422: Yes, I see that we came up with the same solution to that issue. Your original code did return the same array, though, as you yourself conceded.Whiney
Why not use a foreach when you are doing just what the foreach-loop does?Pargeting
@lasseespeholt: Good question; really my code was just a modified version of what the OP had originally. I can't really give much of a reason, other than that it didn't occur to me. foreach would be more readable.Whiney
B
2

Let's see if you even need the complexity of Slice. If your random number generates is stateless, I would assume each call to it would generate unique random numbers, so perhaps this would be sufficient:

var group1 = RandomNumberGenerator().Take(10);  
var group2 = RandomNumberGenerator().Take(10);  
var group3 = RandomNumberGenerator().Take(10);  
var group4 = RandomNumberGenerator().Take(10);

Each call to Take returns a new group of 10 numbers.

Now, if your random number generator re-seeds itself with a specific value each time it's iterated, this won't work. You'll simply get the same 10 values for each group. So instead, you would use:

var generator  = RandomNumberGenerator();
var group1     = generator.Take(10);  
var group2     = generator.Take(10);  
var group3     = generator.Take(10);  
var group4     = generator.Take(10);

This maintains an instance of the generator so that you can continue retrieving values without re-seeding the generator.

Barlow answered 19/8, 2010 at 17:42 Comment(2)
yield result.ToArray() clones data into a new array, so calling Clear() on the list will not affect the returned instance of the array.Equalizer
@user93422: Yes I noticed that shorty after updating my post. I've corrected my mistake.Barlow
B
1

You could use the Skip and Take methods with any Enumerable object.

For your edit :

How about a function that takes a slice number and a slice size as a parameter?

private static IEnumerable<T> Slice<T>(IEnumerable<T> enumerable, int sliceSize, int sliceNumber) {
    return enumerable.Skip(sliceSize * sliceNumber).Take(sliceSize);
}
Bronze answered 19/8, 2010 at 17:26 Comment(1)
it will still dry-run enumerable for slices to be skipped. I'd like to avoid dry runs (performance is an official excuse, but actually it just feels wrong). Also your example enumerates T's not T[]'s as it should have (but I get the point).Equalizer
C
1

It seems like we'd prefer for an IEnumerable<T> to have a fixed position counter so that we can do

var group1 = items.Take(10);
var group2 = items.Take(10);
var group3 = items.Take(10);
var group4 = items.Take(10);

and get successive slices rather than getting the first 10 items each time. We can do that with a new implementation of IEnumerable<T> which keeps one instance of its Enumerator and returns it on every call of GetEnumerator:

public class StickyEnumerable<T> : IEnumerable<T>, IDisposable
{
    private IEnumerator<T> innerEnumerator;

    public StickyEnumerable( IEnumerable<T> items )
    {
        innerEnumerator = items.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        return innerEnumerator;
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return innerEnumerator;
    }

    public void Dispose()
    {
        if (innerEnumerator != null)
        {
            innerEnumerator.Dispose();
        }
    }
}

Given that class, we could implement Slice with

public static IEnumerable<IEnumerable<T>> Slices<T>(this IEnumerable<T> items, int size)
{
    using (StickyEnumerable<T> sticky = new StickyEnumerable<T>(items))
    {
        IEnumerable<T> slice;
        do
        {
            slice = sticky.Take(size).ToList();
            yield return slice;
        } while (slice.Count() == size);
    }
    yield break;
}

That works in this case, but StickyEnumerable<T> is generally a dangerous class to have around if the consuming code isn't expecting it. For example,

using (var sticky = new StickyEnumerable<int>(Enumerable.Range(1, 10)))
{
    var first = sticky.Take(2);
    var second = sticky.Take(2);
    foreach (int i in second)
    {
        Console.WriteLine(i);
    }
    foreach (int i in first)
    {
        Console.WriteLine(i);
    }
}

prints

1
2
3
4

rather than

3
4
1
2
Conway answered 19/8, 2010 at 18:47 Comment(0)
D
0

Take a look at Take(), TakeWhile() and Skip()

Deviate answered 19/8, 2010 at 17:26 Comment(0)
Y
0

I think the use of Slice() would be a bit misleading. I think of that as a means to give me a chuck of an array into a new array and not causing side effects. In this scenario you would actually move the enumerable forward 10.

A possible better approach is to just use the Linq extension Take(). I don't think you would need to use Skip() with a generator.

Edit: Dang, I have been trying to test this behavior with the following code

Note: this is wasn't really correct, I leave it here so others don't fall into the same mistake.

var numbers = RandomNumberGenerator();
var slice = numbers.Take(10);

public static IEnumerable<int> RandomNumberGenerator()
{
    yield return random.Next();
}

but the Count() for slice is alway 1. I also tried running it through a foreach loop since I know that the Linq extensions are generally lazily evaluated and it only looped once. I eventually did the code below instead of the Take() and it works:

public static IEnumerable<int> Slice(this IEnumerable<int> enumerable, int size)
{
    var list = new List<int>();
    foreach (var count in Enumerable.Range(0, size)) list.Add(enumerable.First());
    return list;
}

If you notice I am adding the First() to the list each time, but since the enumerable that is being passed in is the generator from RandomNumberGenerator() the result is different every time.

So again with a generator using Skip() is not needed since the result will be different. Looping over an IEnumerable is not always side effect free.

Edit: I'll leave the last edit just so no one falls into the same mistake, but it worked fine for me just doing this:

var numbers = RandomNumberGenerator();

var slice1 = numbers.Take(10);
var slice2 = numbers.Take(10);

The two slices were different.

Yamauchi answered 19/8, 2010 at 17:37 Comment(5)
That's what I thought, but I failed to construct an actual method using that idea. Care to produce a snippet?Equalizer
I hope that helps at least some what. I probably should brush up on generators myself.Yamauchi
@Sean: Your RandomNumberGenerator function returns a single value. You would need to put it in a loop: while (true) yield return random.Next();Whiney
how do you change Slice method so it returns IEnumerable<int[]> without calling enmerable.Count()? And does yield return instead of return, what you have right now is exactly: return enumerable.Take(size);...I think.Equalizer
I'll add my adaptation to a new answer.Yamauchi
Y
0

I had made some mistakes in my original answer but some of the points still stand. Skip() and Take() are not going to work the same with a generator as it would a list. Looping over an IEnumerable is not always side effect free. Anyway here is my take on getting a list of slices.

    public static IEnumerable<int> RandomNumberGenerator()
    {
        while(true) yield return random.Next();
    }

    public static IEnumerable<IEnumerable<int>> Slice(this IEnumerable<int> enumerable, int size, int count)
    {
        var slices = new List<List<int>>();
        foreach (var iteration in Enumerable.Range(0, count)){
            var list = new List<int>();
            list.AddRange(enumerable.Take(size));
            slices.Add(list);
        }
        return slices;
    }
Yamauchi answered 19/8, 2010 at 18:23 Comment(0)
P
0

I got this solution for the same problem:

int[] ints = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
IEnumerable<IEnumerable<int>> chunks = Chunk(ints, 2, t => t.Dump());
//won't enumerate, so won't do anything unless you force it:
chunks.ToList();

IEnumerable<T> Chunk<T, R>(IEnumerable<R> src, int n, Func<IEnumerable<R>, T> action){
  IEnumerable<R> head;
  IEnumerable<R> tail = src;
  while (tail.Any())
  {
    head = tail.Take(n);
    tail = tail.Skip(n);
    yield return action(head);
  }
}

if you just want the chunks returned, not do anything with them, use chunks = Chunk(ints, 2, t => t). What I would really like is to have to have t=>t as default action, but I haven't found out how to do that yet.

Pairoar answered 21/6, 2012 at 14:19 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.