LINQ Partition List into Lists of 8 members [duplicate]
Asked Answered
O

7

46

How would one take a List (using LINQ) and break it into a List of Lists partitioning the original list on every 8th entry?

I imagine something like this would involve Skip and/or Take, but I'm still pretty new to LINQ.

Edit: Using C# / .Net 3.5

Edit2: This question is phrased differently than the other "duplicate" question. Although the problems are similar, the answers in this question are superior: Both the "accepted" answer is very solid (with the yield statement) as well as Jon Skeet's suggestion to use MoreLinq (which is not recommended in the "other" question.) Sometimes duplicates are good in that they force a re-examination of a problem.

Obviate answered 22/9, 2010 at 20:34 Comment(2)
Are you using VB or C#? The presence of iterators makes a big difference.Cherri
This is not a duplicate. The other question wanted a to break the list into sublists of every n-th element, so a list with elements 0, 8, 16, 24, etc and a list with elements 1, 9, 17, 25, etc. and a list with elements 2, 10, 18, etc. This user wants to break into a list with 0..7 and a list with 8..15 and a list with 16..24, similar to pagingSpada
V
57

Use the following extension method to break the input into subsets

public static class IEnumerableExtensions
{
    public static IEnumerable<List<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
    {
        List<T> toReturn = new List<T>(max);
        foreach(var item in source)
        {
                toReturn.Add(item);
                if (toReturn.Count == max)
                {
                        yield return toReturn;
                        toReturn = new List<T>(max);
                }
        }
        if (toReturn.Any())
        {
                yield return toReturn;
        }
    }
}
Viipuri answered 22/9, 2010 at 20:40 Comment(2)
I'm going to try this now as this seems quite clever... The thought of "yield return" popped into my head while mulling this over, but I couldn't see a clear way to do it... I'll let you know how this works for me.Obviate
Wow! That's really frickin' cool. I'm going with this! Thanks for the help! :-)Obviate
L
46

We have just such a method in MoreLINQ as the Batch method:

// As IEnumerable<IEnumerable<T>>
var items = list.Batch(8);

or

// As IEnumerable<List<T>>
var items = list.Batch(8, seq => seq.ToList());
Loden answered 22/9, 2010 at 20:39 Comment(5)
Cool, this is implemented very nicely, including a resultSelector (important for manipulating/ordering the inner list).Vmail
Phew. I thought perhaps I was a bit daft in not being able to figure this out. Good to see that there are some things that are "missing" from regular LINQ-to-Objects. :)Obviate
@Pretzel: It's not that this is impossible using plain old LINQ ... it's just that it's neither terribly efficient or easy to understand. See my answer for a "plain LINQ" example.Fingerbreadth
+1, Thanks for the link to this library. I'll see if I can use it in future projects.Obviate
Minor question - I happened to copy the Batch code, and Resharper warns me that in return Batch(source, size, x => x); the x is a possible multiple enumeration issue. Correct/Ignore?Ferry
F
15

You're better off using a library like MoreLinq, but if you really had to do this using "plain LINQ", you can use GroupBy:

var sequence = new[] {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};

var result = sequence.Select((x, i) => new {Group = i/8, Value = x})
                     .GroupBy(item => item.Group, g => g.Value)
                     .Select(g => g.Where(x => true));

// result is: { {1,2,3,4,5,6,7,8}, {9,10,11,12,13,14,15,16} }

Basically, we use the version of Select() that provides an index for the value being consumed, we divide the index by 8 to identify which group each value belongs to. Then we group the sequence by this grouping key. The last Select just reduces the IGrouping<> down to an IEnumerable<IEnumerable<T>> (and isn't strictly necessary since IGrouping is an IEnumerable).

It's easy enough to turn this into a reusable method by factoring our the constant 8 in the example, and replacing it with a specified parameter. It's not necessarily the most elegant solution, and it is not longer a lazy, streaming solution ... but it does work.

You could also write your own extension method using iterator blocks (yield return) which could give you better performance and use less memory than GroupBy. This is what the Batch() method of MoreLinq does IIRC.

Fingerbreadth answered 22/9, 2010 at 20:47 Comment(2)
Thanks for your input. Yeah, it doesn't seem efficient and as you can imagine, I was struggling to understand how I could do it with regular LINQ. (I'm staring at your answer right now and I really don't understand it very well.) I'll have to fiddle with it more later. (Thanks again!)Obviate
The approach using GroupBy() breaks down if the sequence you're planning on batching is going to be extremely large (or infinite). As far as how it works - it creates an anonymous object which associates each item with it's index, and then groups this into a series of sequences based on divisibility by 8 (or any other non-zero constant).Fingerbreadth
W
5

It's not at all what the original Linq designers had in mind, but check out this misuse of GroupBy:

public static IEnumerable<IEnumerable<T>> BatchBy<T>(this IEnumerable<T> items, int batchSize)
{
    var count = 0;
    return items.GroupBy(x => (count++ / batchSize)).ToList();
}

[TestMethod]
public void BatchBy_breaks_a_list_into_chunks()
{
    var values = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    var batches = values.BatchBy(3);
    batches.Count().ShouldEqual(4);
    batches.First().Count().ShouldEqual(3);
    batches.Last().Count().ShouldEqual(1);
}

I think it wins the "golf" prize for this question. The ToList is very important since you want to make sure the grouping has actually been performed before you try doing anything with the output. If you remove the ToList, you will get some weird side effects.

Weigel answered 13/10, 2010 at 21:20 Comment(5)
For the record, Handcraftsman's "yield return"-based version performs much better, but I still like the "Hey, you're not supposed to be doing that" aspect of this code.Weigel
-1 This answer is wrong. If you replace division by modulo operator you get batchSize number of partitions which is more suited for this thread #438688Swig
I disagree. If you use modulo, then you're assigning items to the groups by the remainder. You'd get item 0 in group 0, item 1 in group 1, etc. This would totally scramble their order. Using the integer division, as I have done here, means that items 0-2 go in group 0, 3-5 go in group 1, etc. I believe this is what was intended. If you agree, please remove the -1 from my answer.Weigel
you're right. I missed the ToList part which actually executes the query before leaving. Why not use a lighter ToArray? I have to edit your answer to remove my -1. +1ed too! :)Swig
ToList is just what I used at the time, but ToArray ought to work just fine.Weigel
W
0

Take won't be very efficient, because it doesn't remove the entries taken.

why not use a simple loop:

public IEnumerable<IList<T>> Partition<T>(this/* <-- see extension methods*/ IEnumerable<T> src,int num)  
{  
    IEnumerator<T> enu=src.getEnumerator();  
    while(true)  
    {  
        List<T> result=new List<T>(num);  
        for(int i=0;i<num;i++)  
        {  
            if(!enu.MoveNext())  
            {  
                if(i>0)yield return result;  
                yield break;  
            }  
            result.Add(enu.Current);  
        }  
        yield return result;  
    }  
}
Womanhater answered 22/9, 2010 at 20:52 Comment(0)
U
0
from b in Enumerable.Range(0,8) select items.Where((x,i) => (i % 8) == b);
Uranyl answered 14/10, 2010 at 2:42 Comment(3)
+1 But note this groups every 8th item, i.e. you get {0,8,16}, {1,9,17}, ...Vasoconstrictor
This is more about splitting than partitioning - more suitable hereSwig
Furthermore this might give empty inner IEnumerables if number 8 is more than the number of items in items which may or may not be desirable. I incorporated your answer in the other thread anyway..Swig
S
0

The simplest solution is given by Mel:

public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, 
                                                       int partitionSize)
{
    int i = 0;
    return items.GroupBy(x => i++ / partitionSize).ToArray();
}

Concise but slower. The above method splits an IEnumerable into chunks of desired fixed size with total number of chunks being unimportant. To split an IEnumerable into N number of chunks of equal sizes or close to equal sizes, you could do:

public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items, 
                                                   int numOfParts)
{
    int i = 0;
    return items.GroupBy(x => i++ % numOfParts);
}

To speed up things, a straightforward approach would do:

public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, 
                                                       int partitionSize)
{
    if (partitionSize <= 0)
        throw new ArgumentOutOfRangeException("partitionSize");

    int innerListCounter = 0;
    int numberOfPackets = 0;
    foreach (var item in items)
    {
        innerListCounter++;
        if (innerListCounter == partitionSize)
        {
            yield return items.Skip(numberOfPackets * partitionSize).Take(partitionSize);
            innerListCounter = 0;
            numberOfPackets++;
        }
    }

    if (innerListCounter > 0)
        yield return items.Skip(numberOfPackets * partitionSize);
}

This is faster than anything currently on planet now :) The equivalent methods for a Split operation here

Swig answered 6/12, 2012 at 13:53 Comment(6)
whew, that is much faster! although https://mcmap.net/q/174633/-split-a-collection-into-n-parts-with-linq-duplicate started edging you out after a couple runs in linqpad... ;)Ferry
@Ferry keep in mind that that answer is a dangerous construct with side effects. Since the inner and outer runs on the same enumerator, you get results you do not expect. If you enumerate just the outer sequence, the inner sequences are no longer valid; or you can not iterate the inner sequences twice. Or simply try var x = returnedResult.ElementAt(n).ToList();, you get unexpected results.Swig
So I recently tried running some perf comparisons again (tests+results here) between just calling this method and doing something with the results, and it wasn't as fast vs some of the other suggestions. Am I missing something?Ferry
@Ferry it was wonderful you did that benchmarking. I will see to your link in a week. I need some time now. Thanks :) will let you know.Swig
Recently realized the 'straightforward' solution results in duplicate enumeration/evaluation -- see explanation gist.github.com/zaus/4cae1a5b42475a083287 which is probably also pointed out on the "already answered" questionFerry
I will see all that. Thanks. Gimme some time.Swig

© 2022 - 2024 — McMap. All rights reserved.