Can someone suggest a way to create batches of a certain size in LINQ?
Ideally I want to be able to perform operations in chunks of some configurable amount.
An Enumerable.Chunk()
extension method was added to .NET 6.0.
Example:
var list = new List<int> { 1, 2, 3, 4, 5, 6, 7 };
var chunks = list.Chunk(3);
// returns { { 1, 2, 3 }, { 4, 5, 6 }, { 7 } }
For those who cannot upgrade, the source is available on GitHub.
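If you just need the behavior and not the exact BCL source, a minimal stand-in with roughly the same shape as Chunk can be sketched like this (the class name ChunkCompat is made up for this example):

```csharp
using System;
using System.Collections.Generic;

public static class ChunkCompat
{
    // A minimal stand-in for Enumerable.Chunk on older frameworks.
    // Buffers up to `size` items, then yields them as an array.
    public static IEnumerable<TSource[]> Chunk<TSource>(
        this IEnumerable<TSource> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));

        var bucket = new List<TSource>(size);
        foreach (var item in source)
        {
            bucket.Add(item);
            if (bucket.Count == size)
            {
                yield return bucket.ToArray();
                bucket.Clear();
            }
        }

        // yield the final partial chunk, if any
        if (bucket.Count > 0)
            yield return bucket.ToArray();
    }
}
```

On the input above, `ChunkCompat.Chunk(new[] { 1, 2, 3, 4, 5, 6, 7 }, 3)` produces { 1, 2, 3 }, { 4, 5, 6 }, { 7 }, matching the built-in behavior for these cases, though it is a sketch rather than the real implementation.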
You don't need to write any code. Use MoreLINQ's Batch method, which batches the source sequence into sized buckets (MoreLINQ is available as a NuGet package you can install):
int size = 10;
var batches = sequence.Batch(size);
Which is implemented as:
public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
    this IEnumerable<TSource> source, int size)
{
    TSource[] bucket = null;
    var count = 0;

    foreach (var item in source)
    {
        if (bucket == null)
            bucket = new TSource[size];

        bucket[count++] = item;

        if (count != size)
            continue;

        yield return bucket;

        bucket = null;
        count = 0;
    }

    if (bucket != null && count > 0)
        yield return bucket.Take(count).ToArray();
}
Batch(new int[] { 1, 2 }, 1000000) – Aleutian

There is a ToArray() there: yield return bucket.Take(count).ToArray(); – Nalepka

ToList is preferable to ToArray in most situations. – Slough

ToList will grab extra space for the underlying array. It could use up to twice as much memory, depending on batch size. ToArray, on the other hand, returns an array with exactly the required count of items. The cost is one use of Array.Copy. – Tigges

ToArray is effectively implemented as ToList().ToArray(), because an extra allocation is required to return an array of exactly the right size when the count is unknown. Since you are doing TSource[].Take().ToArray(), the count will pass through the Take and the extra allocation will not occur in .NET Core. If you were using .NET Framework, you would incur the extra allocation. I prefer to assume ToList is always more efficient. – Slough
public static class MyExtensions
{
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items,
        int maxItems)
    {
        return items.Select((item, inx) => new { item, inx })
                    .GroupBy(x => x.inx / maxItems)
                    .Select(g => g.Select(x => x.item));
    }
}
and the usage would be:
List<int> list = new List<int>() { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
foreach(var batch in list.Batch(3))
{
Console.WriteLine(String.Join(",",batch));
}
OUTPUT:
0,1,2
3,4,5
6,7,8
9
GroupBy starts enumeration; doesn't it have to fully enumerate its source? This loses lazy evaluation of the source and thus, in some cases, all of the benefit of batching! – Tartu

If you start with sequence defined as an IEnumerable<T>, and you know that it can safely be enumerated multiple times (e.g. because it is an array or a list), you can just use this simple pattern to process the elements in batches:
while (sequence.Any())
{
    var batch = sequence.Take(10);
    sequence = sequence.Skip(10);

    // do whatever you need to do with each batch here
}
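To see why this pattern is only safe for in-memory collections, here is a self-contained sketch (the Source method is made up for the demonstration) that counts how often a deferred sequence gets restarted by the Any/Take/Skip chain:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipTakeDemo
{
    public static int Enumerations;

    // A deferred sequence that counts how many times it is (re)started.
    static IEnumerable<int> Source()
    {
        Enumerations++;
        for (int i = 0; i < 10; i++) yield return i;
    }

    public static int CountBatches()
    {
        Enumerations = 0;
        IEnumerable<int> sequence = Source();
        int batches = 0;

        while (sequence.Any())                      // restarts the chain
        {
            var batch = sequence.Take(3).ToList(); // ...and again here
            sequence = sequence.Skip(3);
            batches++;
        }

        // 4 batches (3,3,3,1), but the source iterator was started 9 times:
        // 5 calls to Any() plus 4 calls to Take(...).ToList().
        return batches;
    }
}
```

With a List the repeated passes are merely wasted CPU; against a database query or a web request, each restart would re-execute the request.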
This is a fully lazy, low-overhead, one-function implementation of Batch that doesn't do any accumulation and instead forwards iteration steps directly to the source IEnumerable, similar to Python's itertools.groupby.
This design eliminates copying and buffering, which is nice, but has the following consequences: the batches must be enumerated strictly in order, each at most once, and without holding on to iterators from earlier batches. If these conditions are violated (that is, by trying to access elements out of order, a second time, or via a saved iterator), .NET will throw an exception such as InvalidOperationException: Enumeration already finished. Note that the design and type, IEnumerable<IEnumerable<T>>, naturally imply these conditions; make up your own mind whether that counts as "exception unsafe" or "misuse".
You can test a complete sample at .NET Fiddle.
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
{
    if (source == null)
        throw new ArgumentNullException("source");
    if (size <= 0)
        throw new ArgumentOutOfRangeException("size", "Must be greater than zero.");

    using (var enumerator = source.GetEnumerator())
        while (enumerator.MoveNext())
        {
            int i = 0;

            // Batch is a local iterator function closing over `i` and
            // `enumerator` that executes the inner batch enumeration
            IEnumerable<T> Batch()
            {
                do yield return enumerator.Current;
                while (++i < size && enumerator.MoveNext());
            }

            yield return Batch();

            // discard skipped items
            while (++i < size && enumerator.MoveNext());
        }
}
// Buffer() explicitly buffers the contents of intermediate IEnumerables, lifting the conditions of enumerating in order etc.
// but loses the memory and overhead benefits of not copying/buffering.
public static IEnumerable<IReadOnlyList<T>> Buffer<T>(this IEnumerable<IEnumerable<T>> batched) => batched.Select(e => e.ToList());
This solution is based on (and fixes issues in) Nick Whaley's solution with help from EricRoller.

done by just always calling e.Count() after yield return e. You would need to rearrange the loop in BatchInner to not invoke the undefined behavior source.Current if i >= size. This will eliminate the need to allocate a new BatchInner for each batch. – Kenleigh

i so this isn't necessarily more efficient than defining a separate class, but it is a little cleaner I think. – Kenleigh

i isn't really bound to the loop body (as long as you set i=0 each iteration), so you can hoist it out with the Batch() definition to avoid the extra allocations. Compared to a modified version of yours for benchmarking (10k elements, batch size 3, counting only), it appears to reduce memory use from 640 -> 430 kb according to the fiddle stats (a weak metric, admittedly). – Flyblown

i could get into trouble if you keep accessing previous iterators; it'll let previous iterators access items in later batches. It's probably best to just keep them separate like you said (I feel like this is the fourth time I've 'discovered' this haha). – Flyblown

var last = a.Batch(3).Last().ToList(); – Rupp

The "discard skipped items" line seems redundant. Can you tell me a case when it is not? – Multiple

enumerator, the captured variable, is disposed in the outer scope, and the inner i, the captured variable, is "modified in the outer scope". I know R# warnings are not to be sneezed at, though I ignored these until eventually one day I got a null ref using this code, and I somehow doubt the code I'm using to call it could cause it: null ref at System.Linq.Enumerable.<SelectManyIterator>d__23`3.MoveNext() at Core.Extensions.LinqExtensions.<>c__DisplayClass0_21.<<Batch>g__Batch|0>d.MoveNext() – Dashing

source? Was that it? – Flyblown

All of the above perform terribly with large batches or low memory space. Had to write my own that will pipeline (notice no item accumulation anywhere):
public static class BatchLinq
{
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException("size", "Must be greater than zero.");

        using (IEnumerator<T> enumerator = source.GetEnumerator())
            while (enumerator.MoveNext())
                yield return TakeIEnumerator(enumerator, size);
    }

    private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size)
    {
        int i = 0;
        do
            yield return source.Current;
        while (++i < size && source.MoveNext());
    }
}
Edit: A known issue with this approach is that each batch must be enumerated, and enumerated fully, before moving to the next batch. For example, this doesn't work:
//Select first item of every 100 items
Batch(list, 100).Select(b => b.First())
The ArgumentOutOfRangeException won't throw until the enumerable is enumerated (it should throw right away at creation time). Also, name the inner method BatchImpl or BatchInner or something like that so it at least has the same name as the method calling it. – Tartu

I wonder why nobody has ever posted an old-school for-loop solution. Here is one:
List<int> source = Enumerable.Range(1, 23).ToList();
int batchsize = 10;

for (int i = 0; i < source.Count; i += batchsize)
{
    var batch = source.Skip(i).Take(batchsize);
}
This simplicity is possible because the Take method:

"...enumerates source and yields elements until count elements have been yielded or source contains no more elements. If count exceeds the number of elements in source, all elements of source are returned."
Disclaimer:
Using Skip and Take inside the loop means that the enumerable will be enumerated multiple times. This is dangerous if the enumerable is deferred. It may result in multiple executions of a database query, or a web request, or a file read. This example is explicitly for the usage of a List which is not deferred, so it is less of a problem. It is still a slow solution since skip will enumerate the collection each time it is called.
This can also be solved using the GetRange
method, but it requires an extra calculation to extract a possible rest batch:
for (int i = 0; i < source.Count; i += batchsize)
{
    int remaining = source.Count - i;
    var batch = remaining > batchsize ? source.GetRange(i, batchsize) : source.GetRange(i, remaining);
}
Here is a third way to handle this, which works with 2 loops. This ensures that the collection is enumerated only 1 time!:

int batchsize = 10;
List<int> batch = new List<int>(batchsize);

for (int i = 0; i < source.Count; i += batchsize)
{
    // calculate the remaining items to avoid an OutOfRangeException
    batchsize = source.Count - i > batchsize ? batchsize : source.Count - i;
    for (int j = i; j < i + batchsize; j++)
    {
        batch.Add(source[j]);
    }

    // do whatever you need to do with each batch here
    batch.Clear();
}
Using Skip and Take inside the loop means that the enumerable will be enumerated multiple times. This is dangerous if the enumerable is deferred. It may result in multiple executions of a database query, or a web request, or a file read. In your example you have a List which is not deferred, so it is less of a problem. – Pyszka

Here is an attempted improvement of Nick Whaley's (link) and infogulch's (link) lazy Batch implementations. This one is strict. You either enumerate the batches in the correct order, or you get an exception.
public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
    this IEnumerable<TSource> source, int size)
{
    if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

    using (var enumerator = source.GetEnumerator())
    {
        int i = 0;
        while (enumerator.MoveNext())
        {
            if (i % size != 0) throw new InvalidOperationException(
                "The enumeration is out of order.");
            i++;
            yield return GetBatch();
        }

        IEnumerable<TSource> GetBatch()
        {
            while (true)
            {
                yield return enumerator.Current;
                if (i % size == 0 || !enumerator.MoveNext()) break;
                i++;
            }
        }
    }
}
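To illustrate the strictness, here is a small self-contained check (the Batch method body is copied from this answer so the sketch compiles on its own):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StrictBatchDemo
{
    // Copied from the answer above: throws if batches are enumerated out of order.
    public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
        this IEnumerable<TSource> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        using (var enumerator = source.GetEnumerator())
        {
            int i = 0;
            while (enumerator.MoveNext())
            {
                if (i % size != 0) throw new InvalidOperationException(
                    "The enumeration is out of order.");
                i++;
                yield return GetBatch();
            }

            IEnumerable<TSource> GetBatch()
            {
                while (true)
                {
                    yield return enumerator.Current;
                    if (i % size == 0 || !enumerator.MoveNext()) break;
                    i++;
                }
            }
        }
    }

    public static void Demo()
    {
        // In-order, fully-consumed enumeration works:
        foreach (var batch in Enumerable.Range(0, 7).Batch(3))
            Console.WriteLine(string.Join(",", batch)); // 0,1,2 / 3,4,5 / 6

        // Materializing the outer sequence without consuming each batch throws,
        // because the second outer MoveNext arrives while i % size != 0:
        try
        {
            Enumerable.Range(0, 7).Batch(3).ToList();
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message); // The enumeration is out of order.
        }
    }
}
```

Note that even a plain .ToList() on the outer sequence counts as out-of-order consumption here, since it advances to the next batch before the current one has been read.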
And here is a lazy Batch
implementation for sources of type IList<T>
. This one imposes no restrictions on the enumeration. The batches can be enumerated partially, in any order, and more than once. The restriction of not modifying the collection during the enumeration is still in place though. This is achieved by making a dummy call to enumerator.MoveNext()
before yielding any chunk or element. The downside is that the enumerator is left undisposed, since it is unknown when the enumeration is going to finish.
public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
    this IList<TSource> source, int size)
{
    if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

    var enumerator = source.GetEnumerator();
    for (int i = 0; i < source.Count; i += size)
    {
        enumerator.MoveNext();
        yield return GetChunk(i, Math.Min(i + size, source.Count));
    }

    IEnumerable<TSource> GetChunk(int from, int toExclusive)
    {
        for (int j = from; j < toExclusive; j++)
        {
            enumerator.MoveNext();
            yield return source[j];
        }
    }
}
Same approach as MoreLINQ, but using List instead of Array. I haven't done benchmarking, but readability matters more to some people:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
{
    List<T> batch = new List<T>();

    foreach (var item in source)
    {
        batch.Add(item);

        if (batch.Count >= size)
        {
            yield return batch;
            batch.Clear();
        }
    }

    if (batch.Count > 0)
    {
        yield return batch;
    }
}
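One caveat worth demonstrating: because the same List instance is yielded every time, this version is only safe to consume streamingly. Materializing the batches keeps several references to one list that has been cleared by the time enumeration finishes. A self-contained sketch of the pitfall (the Batch body is copied from this answer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListBatchDemo
{
    // Copied from the answer above: reuses one List for every batch.
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        List<T> batch = new List<T>();
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count >= size)
            {
                yield return batch;
                batch.Clear();
            }
        }
        if (batch.Count > 0)
            yield return batch;
    }

    public static void Demo()
    {
        // Streaming consumption works, because each batch is read
        // before the list is cleared and refilled:
        foreach (var b in Enumerable.Range(0, 6).Batch(2))
            Console.WriteLine(string.Join(",", b)); // 0,1 / 2,3 / 4,5

        // Materializing does not: every element of `all` is the same
        // list instance, which is empty once enumeration has finished.
        var all = Enumerable.Range(0, 6).Batch(2).ToList();
        Console.WriteLine(all.Count);      // 3
        Console.WriteLine(all[0].Count()); // 0
    }
}
```

Yielding a fresh list (or an array copy) per batch avoids this, at the cost of the extra allocations this answer set out to avoid.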
You can pass the size parameter to your new List to optimize its size. – Tartu

Replace batch.Clear(); with batch = new List<T>(); – Slough

Here's the cleanest version of Batch that I can come up with:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int count)
{
    if (source == null) throw new System.ArgumentNullException("source");
    if (count <= 0) throw new System.ArgumentOutOfRangeException("count");

    using (var enumerator = source.GetEnumerator())
    {
        IEnumerable<T> BatchInner()
        {
            int counter = 0;
            do
                yield return enumerator.Current;
            while (++counter < count && enumerator.MoveNext());
        }

        while (enumerator.MoveNext())
            yield return BatchInner().ToArray();
    }
}
Using this code:
Console.WriteLine(String.Join(Environment.NewLine,
Enumerable.Range(0, 20).Batch(8).Select(xs => String.Join(",", xs))));
I get:
0,1,2,3,4,5,6,7
8,9,10,11,12,13,14,15
16,17,18,19
It's important to note that with the answers from "" & "", this code fails:
var e = Enumerable.Range(0, 20).Batch(8).ToArray();
Console.WriteLine(String.Join(Environment.NewLine, e.Select(xs => String.Join(",", xs))));
Console.WriteLine();
Console.WriteLine(String.Join(Environment.NewLine, e.Select(xs => String.Join(",", xs))));
On their answers it gives:
19
19
19
19
19
19
Due to the inner enumerable not being computed as an array.
You call ToArray, but it also seems to work without. Are there cases that require this ToArray call? – Torsi

Without .ToArray() the code only runs once for var e = Enumerable.Range(0, 20).Batch(8).ToArray();. Try iterating over the inner enumerable with that. It effectively captures the enumerator in BatchInner, so on subsequent executions the enumerator is done. – Multiple

So with a functional hat on, this appears trivial... but in C#, there are some significant downsides.
You'd probably view this as an unfold of IEnumerable (google it and you'll probably end up in some Haskell docs, but there may be some F# stuff using unfold; if you know F#, squint at the Haskell docs and it will make sense).

Unfold is related to fold ("aggregate"), except rather than iterating through the input IEnumerable, it iterates through the output data structures (it's a similar relationship to the one between IEnumerable and IObservable; in fact I think IObservable does implement an "unfold" called Generate...).

Anyway, first you need an unfold method. I think this works (unfortunately it will eventually blow the stack for large "lists"... you can write this safely in F# using yield! rather than Concat):
static IEnumerable<T> Unfold<T, U>(Func<U, IEnumerable<Tuple<U, T>>> f, U seed)
{
    var maybeNewSeedAndElement = f(seed);

    return maybeNewSeedAndElement.SelectMany(x => new[] { x.Item2 }.Concat(Unfold(f, x.Item1)));
}
This is a bit obtuse because C# doesn't implement some of the things functional languages take for granted... but it basically takes a seed and then generates a "Maybe" answer of the next element in the IEnumerable and the next seed (Maybe doesn't exist in C#, so we've used IEnumerable to fake it), and concatenates the rest of the answer (I can't vouch for the "O(n?)" complexity of this).

Once you've done that, then:
static IEnumerable<IEnumerable<T>> Batch<T>(IEnumerable<T> xs, int n)
{
    return Unfold(ys =>
        {
            var head = ys.Take(n);
            var tail = ys.Skip(n);
            return head.Take(1).Select(_ => Tuple.Create(tail, head));
        },
        xs);
}
It all looks quite clean... you take the "n" elements as the "next" element in the IEnumerable, and the "tail" is the rest of the unprocessed list.

If there is nothing in the head... you're done... you return "Nothing" (but faked as an empty IEnumerable)... else you return the head element and the tail to process.

You can probably do this using IObservable; there's probably a "Batch"-like method already there, and you can probably use that.

If the risk of stack overflows worries you (it probably should), then you should implement this in F# (and there's probably some F# library (FSharpX?) already with this).

(I have only done some rudimentary tests of this, so there may be the odd bug in there.)
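For a quick sanity check, the two methods above can be exercised like this (both are copied here so the sketch is self-contained):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnfoldDemo
{
    // Copied from the answer above.
    static IEnumerable<T> Unfold<T, U>(Func<U, IEnumerable<Tuple<U, T>>> f, U seed)
    {
        var maybeNewSeedAndElement = f(seed);
        return maybeNewSeedAndElement.SelectMany(x => new[] { x.Item2 }.Concat(Unfold(f, x.Item1)));
    }

    // Copied from the answer above.
    public static IEnumerable<IEnumerable<T>> Batch<T>(IEnumerable<T> xs, int n)
    {
        return Unfold(ys =>
            {
                var head = ys.Take(n);
                var tail = ys.Skip(n);
                return head.Take(1).Select(_ => Tuple.Create(tail, head));
            },
            xs);
    }
}
```

On Enumerable.Range(1, 7) with n = 3 this yields [1,2,3], [4,5,6], [7]. Note that each recursive step stacks another Skip over the source, so the source is re-walked per batch; fine for in-memory sequences, costly for deferred ones.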
Maybe can exist in C#; see for example Option in Language Ext. – Endostosis

I'm joining this very late, but I found something more interesting.
So we can use Skip and Take here for better performance.
public static class MyExtensions
{
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items, int maxItems)
    {
        return items.Select((item, index) => new { item, index })
                    .GroupBy(x => x.index / maxItems)
                    .Select(g => g.Select(x => x.item));
    }

    public static IEnumerable<T> Batch2<T>(this IEnumerable<T> items, int skip, int take)
    {
        return items.Skip(skip).Take(take);
    }
}
Next I checked with 100,000 records. Only the looping takes more time in the case of Batch.

Code of the console application:
static void Main(string[] args)
{
    List<string> Ids = GetData("First");
    List<string> Ids2 = GetData("tsriF");

    Stopwatch FirstWatch = new Stopwatch();
    FirstWatch.Start();
    foreach (var batch in Ids2.Batch(5000))
    {
        // Console.WriteLine("Batch Ouput:= " + string.Join(",", batch));
    }
    FirstWatch.Stop();
    Console.WriteLine("Done Processing time taken:= " + FirstWatch.Elapsed.ToString());

    Stopwatch Second = new Stopwatch();
    Second.Start();
    int Length = Ids2.Count;
    int StartIndex = 0;
    int BatchSize = 5000;
    while (Length > 0)
    {
        var SecBatch = Ids2.Batch2(StartIndex, BatchSize);
        // Console.WriteLine("Second Batch Ouput:= " + string.Join(",", SecBatch));
        Length = Length - BatchSize;
        StartIndex += BatchSize;
    }
    Second.Stop();
    Console.WriteLine("Done Processing time taken Second:= " + Second.Elapsed.ToString());
    Console.ReadKey();
}

static List<string> GetData(string name)
{
    List<string> Data = new List<string>();
    for (int i = 0; i < 100000; i++)
    {
        Data.Add(string.Format("{0} {1}", name, i.ToString()));
    }
    return Data;
}
The time taken is like this:

First - 00:00:00.0708, 00:00:00.0660
Second (Take and Skip One) - 00:00:00.0008, 00:00:00.0008
GroupBy fully enumerates before it produces a single row. This is not a good way to do batching. – Tartu

Change foreach (var batch in Ids2.Batch(5000)) to var groupBatch = Ids2.Batch(5000) and check the timed results. Or add ToList to var SecBatch = Ids2.Batch2(StartIndex, BatchSize); I would be interested if your results for timing change. – Halfblood

I wrote a custom IEnumerable implementation that works without LINQ and guarantees a single enumeration over the data. It also accomplishes all this without requiring backing lists or arrays that cause memory explosions over large data sets.
Here are some basic tests:
[Fact]
public void ShouldPartition()
{
    var ints = new List<int> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    var data = ints.PartitionByMaxGroupSize(3);
    data.Count().Should().Be(4);

    data.Skip(0).First().Count().Should().Be(3);
    data.Skip(0).First().ToList()[0].Should().Be(0);
    data.Skip(0).First().ToList()[1].Should().Be(1);
    data.Skip(0).First().ToList()[2].Should().Be(2);

    data.Skip(1).First().Count().Should().Be(3);
    data.Skip(1).First().ToList()[0].Should().Be(3);
    data.Skip(1).First().ToList()[1].Should().Be(4);
    data.Skip(1).First().ToList()[2].Should().Be(5);

    data.Skip(2).First().Count().Should().Be(3);
    data.Skip(2).First().ToList()[0].Should().Be(6);
    data.Skip(2).First().ToList()[1].Should().Be(7);
    data.Skip(2).First().ToList()[2].Should().Be(8);

    data.Skip(3).First().Count().Should().Be(1);
    data.Skip(3).First().ToList()[0].Should().Be(9);
}
The Extension Method to partition the data.
/// <summary>
/// A set of extension methods for <see cref="IEnumerable{T}"/>.
/// </summary>
public static class EnumerableExtender
{
    /// <summary>
    /// Splits an enumerable into chunks, by a maximum group size.
    /// </summary>
    /// <param name="source">The source to split</param>
    /// <param name="maxSize">The maximum number of items per group.</param>
    /// <typeparam name="T">The type of item to split</typeparam>
    /// <returns>A list of lists of the original items.</returns>
    public static IEnumerable<IEnumerable<T>> PartitionByMaxGroupSize<T>(this IEnumerable<T> source, int maxSize)
    {
        return new SplittingEnumerable<T>(source, maxSize);
    }
}
This is the implementing class:

using System.Collections;
using System.Collections.Generic;

internal class SplittingEnumerable<T> : IEnumerable<IEnumerable<T>>
{
    private readonly IEnumerable<T> backing;
    private readonly int maxSize;
    private bool hasCurrent;
    private T lastItem;

    public SplittingEnumerable(IEnumerable<T> backing, int maxSize)
    {
        this.backing = backing;
        this.maxSize = maxSize;
    }

    public IEnumerator<IEnumerable<T>> GetEnumerator()
    {
        return new Enumerator(this, this.backing.GetEnumerator());
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }

    private class Enumerator : IEnumerator<IEnumerable<T>>
    {
        private readonly SplittingEnumerable<T> parent;
        private readonly IEnumerator<T> backingEnumerator;
        private NextEnumerable current;

        public Enumerator(SplittingEnumerable<T> parent, IEnumerator<T> backingEnumerator)
        {
            this.parent = parent;
            this.backingEnumerator = backingEnumerator;
            this.parent.hasCurrent = this.backingEnumerator.MoveNext();
            if (this.parent.hasCurrent)
            {
                this.parent.lastItem = this.backingEnumerator.Current;
            }
        }

        public bool MoveNext()
        {
            if (this.current == null)
            {
                this.current = new NextEnumerable(this.parent, this.backingEnumerator);
                return true;
            }
            else
            {
                if (!this.current.IsComplete)
                {
                    using (var enumerator = this.current.GetEnumerator())
                    {
                        while (enumerator.MoveNext())
                        {
                        }
                    }
                }
            }

            if (!this.parent.hasCurrent)
            {
                return false;
            }

            this.current = new NextEnumerable(this.parent, this.backingEnumerator);
            return true;
        }

        public void Reset()
        {
            throw new System.NotImplementedException();
        }

        public IEnumerable<T> Current
        {
            get { return this.current; }
        }

        object IEnumerator.Current
        {
            get { return this.Current; }
        }

        public void Dispose()
        {
        }
    }

    private class NextEnumerable : IEnumerable<T>
    {
        private readonly SplittingEnumerable<T> splitter;
        private readonly IEnumerator<T> backingEnumerator;
        private int currentSize;

        public NextEnumerable(SplittingEnumerable<T> splitter, IEnumerator<T> backingEnumerator)
        {
            this.splitter = splitter;
            this.backingEnumerator = backingEnumerator;
        }

        public bool IsComplete { get; private set; }

        public IEnumerator<T> GetEnumerator()
        {
            return new NextEnumerator(this.splitter, this, this.backingEnumerator);
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }

        private class NextEnumerator : IEnumerator<T>
        {
            private readonly SplittingEnumerable<T> splitter;
            private readonly NextEnumerable parent;
            private readonly IEnumerator<T> enumerator;
            private T currentItem;

            public NextEnumerator(SplittingEnumerable<T> splitter, NextEnumerable parent, IEnumerator<T> enumerator)
            {
                this.splitter = splitter;
                this.parent = parent;
                this.enumerator = enumerator;
            }

            public bool MoveNext()
            {
                this.parent.currentSize += 1;
                this.currentItem = this.splitter.lastItem;
                var hasCurrent = this.splitter.hasCurrent;

                this.parent.IsComplete = this.parent.currentSize > this.splitter.maxSize;
                if (this.parent.IsComplete)
                {
                    return false;
                }

                if (hasCurrent)
                {
                    var result = this.enumerator.MoveNext();
                    this.splitter.lastItem = this.enumerator.Current;
                    this.splitter.hasCurrent = result;
                }

                return hasCurrent;
            }

            public void Reset()
            {
                throw new System.NotImplementedException();
            }

            public T Current
            {
                get { return this.currentItem; }
            }

            object IEnumerator.Current
            {
                get { return this.Current; }
            }

            public void Dispose()
            {
            }
        }
    }
}
Another way is to use the Rx Buffer operator:
//using System.Linq;
//using System.Reactive.Linq;
//using System.Reactive.Threading.Tasks;
var observableBatches = anAnumerable.ToObservable().Buffer(size);
var batches = aList.ToObservable().Buffer(size).ToList().ToTask().GetAwaiter().GetResult();
Avoid GetAwaiter().GetResult(). This is a code smell for synchronous code forcefully calling async code. – Audiovisual

Just another one-line implementation. It works even with an empty list; in that case you get a zero-size batches collection.
var aList = Enumerable.Range(1, 100).ToList(); //a given list
var size = 9; //the wanted batch size
//number of batches are: (aList.Count() + size - 1) / size;
var batches = Enumerable.Range(0, (aList.Count() + size - 1) / size).Select(i => aList.GetRange( i * size, Math.Min(size, aList.Count() - i * size)));
Assert.True(batches.Count() == 12);
Assert.AreEqual(batches.ToList().ElementAt(0), new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9 });
Assert.AreEqual(batches.ToList().ElementAt(1), new List<int>() { 10, 11, 12, 13, 14, 15, 16, 17, 18 });
Assert.AreEqual(batches.ToList().ElementAt(11), new List<int>() { 100 });
An easy version to use and understand.
public static List<List<T>> chunkList<T>(List<T> listToChunk, int batchSize)
{
    List<List<T>> batches = new List<List<T>>();

    if (listToChunk.Count == 0) return batches;

    bool moreRecords = true;
    int fromRecord = 0;
    int countRange = 0;
    if (listToChunk.Count >= batchSize)
    {
        countRange = batchSize;
    }
    else
    {
        countRange = listToChunk.Count;
    }

    while (moreRecords)
    {
        List<T> batch = listToChunk.GetRange(fromRecord, countRange);
        batches.Add(batch);

        if ((fromRecord + batchSize) >= listToChunk.Count)
        {
            moreRecords = false;
        }

        fromRecord = fromRecord + batch.Count;

        if ((fromRecord + batchSize) > listToChunk.Count)
        {
            countRange = listToChunk.Count - fromRecord;
        }
        else
        {
            countRange = batchSize;
        }
    }

    return batches;
}
As of .NET 6, you can use the new Chunk helper method for LINQ to chunk any IEnumerable into batches:
int chunkNumber = 1;
foreach (int[] chunk in Enumerable.Range(0, 9).Chunk(3))
{
    Console.WriteLine($"Chunk {chunkNumber++}");
    foreach (var item in chunk)
    {
        Console.WriteLine(item);
    }
}
Here is an implementation that uses async iteration in C# via IAsyncEnumerable: https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/tutorials/generate-consume-asynchronous-stream
public static class EnumerableExtensions
{
    /// <summary>
    /// Chunks a sequence into sub-sequences each containing maxItemsPerChunk, except for the last,
    /// which will contain any items left over.
    ///
    /// NOTE: this implements a streaming implementation via <seealso cref="IAsyncEnumerable{T}"/>.
    /// </summary>
    public static async IAsyncEnumerable<IEnumerable<T>> ChunkAsync<T>(this IAsyncEnumerable<T> sequence, int maxItemsPerChunk)
    {
        if (sequence == null) throw new ArgumentNullException(nameof(sequence));
        if (maxItemsPerChunk <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxItemsPerChunk), $"{nameof(maxItemsPerChunk)} must be greater than 0");
        }

        var chunk = new List<T>(maxItemsPerChunk);
        await foreach (var item in sequence)
        {
            chunk.Add(item);
            if (chunk.Count == maxItemsPerChunk)
            {
                yield return chunk.ToArray();
                chunk.Clear();
            }
        }

        // return the "crumbs" that
        // didn't make it into a full chunk
        if (chunk.Count > 0)
        {
            yield return chunk.ToArray();
        }
    }

    /// <summary>
    /// Chunks a sequence into sub-sequences each containing maxItemsPerChunk, except for the last,
    /// which will contain any items left over.
    /// </summary>
    public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> sequence, int maxItemsPerChunk)
    {
        if (sequence == null) throw new ArgumentNullException(nameof(sequence));
        if (maxItemsPerChunk <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxItemsPerChunk), $"{nameof(maxItemsPerChunk)} must be greater than 0");
        }

        var chunk = new List<T>(maxItemsPerChunk);
        foreach (var item in sequence)
        {
            chunk.Add(item);
            if (chunk.Count == maxItemsPerChunk)
            {
                yield return chunk.ToArray();
                chunk.Clear();
            }
        }

        // return the "crumbs" that
        // didn't make it into a full chunk
        if (chunk.Count > 0)
        {
            yield return chunk.ToArray();
        }
    }
}
I know everybody used complex systems to do this work, and I really don't get why. Take and Skip will allow all those operations using the common Select with the Func<TSource, Int32, TResult> transform function. Like:

public IEnumerable<IEnumerable<T>> Buffer<T>(IEnumerable<T> source, int size) =>
    source.Select((item, index) => source.Skip(size * index).Take(size)).TakeWhile(bucket => bucket.Any());
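A self-contained check of this one-liner (copied from above, with a counting source added here to show the re-iteration cost):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectSkipTakeDemo
{
    public static int Iterations;

    // A deferred source that counts how many times it is (re)started.
    static IEnumerable<int> Source(int n)
    {
        Iterations++;
        for (int i = 1; i <= n; i++) yield return i;
    }

    // Copied from the answer above.
    public static IEnumerable<IEnumerable<T>> Buffer<T>(IEnumerable<T> source, int size) =>
        source.Select((item, index) => source.Skip(size * index).Take(size)).TakeWhile(bucket => bucket.Any());

    public static List<List<int>> Demo(int n, int size)
    {
        Iterations = 0;
        return Buffer(Source(n), size).Select(b => b.ToList()).ToList();
    }
}
```

Demo(7, 3) returns [1,2,3], [4,5,6], [7], but the source is restarted 8 times: once for the Select backbone, twice per produced bucket (the Any check plus the consumer's ToList), and once for the terminating empty bucket. Fine for a list, very expensive for a deferred query.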
source will be iterated very often. – Acyl

Enumerable.Range(0, 1).SelectMany(_ => Enumerable.Range(0, new Random().Next())). – Pyszka

Another way to perform batching:
public static class Extensions
{
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
yield return func(v0, v1);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
yield return func(v0, v1, v2);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
yield return func(v0, v1, v2, v3);
}
}
}
public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, TOut> func)
{
using (var enumerator = source.GetEnumerator())
{
while (true)
{
bool state;
state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                yield return func(v0, v1, v2, v3, v4);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;
                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                yield return func(v0, v1, v2, v3, v4, v5);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;
                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                yield return func(v0, v1, v2, v3, v4, v5, v6);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;
                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                yield return func(v0, v1, v2, v3, v4, v5, v6, v7);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;
                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;
                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;
                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;
                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;
                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;
                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v12 = enumerator.Current;
                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;
                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v12 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v13 = enumerator.Current;
                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;
                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v12 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v13 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v14 = enumerator.Current;
                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14);
            }
        }
    }

    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                bool state;
                state = enumerator.MoveNext(); if (!state) break; var v0 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v1 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v2 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v3 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v4 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v5 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v6 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v7 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v8 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v9 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v10 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v11 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v12 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v13 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v14 = enumerator.Current;
                state = enumerator.MoveNext(); if (!state) break; var v15 = enumerator.Current;
                yield return func(v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15);
            }
        }
    }
}
Here's an example usage:
using System;
using System.Linq;

namespace TestProgram
{
    class Program
    {
        static void Main(string[] args)
        {
            foreach (var item in Enumerable.Range(0, 12).ToArray().Batch((R, X1, Y1, X2, Y2) => (R, X1, Y1, X2, Y2)))
            {
                Console.WriteLine($"{item.R}, {item.X1}, {item.Y1}, {item.X2}, {item.Y2}");
            }
        }
    }
}
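One caveat worth knowing: these overloads silently discard a trailing partial batch, because a failed MoveNext breaks out of the loop before func is ever called. In the example above, 12 items grouped five at a time yield two tuples; items 10 and 11 are dropped. A minimal sketch illustrating this, using a 2-arity overload written in the same pattern so it is self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BatchDropDemo
{
    // Same pattern as the generated overloads above, at arity 2.
    public static IEnumerable<TOut> Batch<T, TOut>(this IEnumerable<T> source, Func<T, T, TOut> func)
    {
        using (var enumerator = source.GetEnumerator())
        {
            while (true)
            {
                // If the sequence ends mid-batch, we break without yielding,
                // so any leftover items are discarded.
                if (!enumerator.MoveNext()) break; var v0 = enumerator.Current;
                if (!enumerator.MoveNext()) break; var v1 = enumerator.Current;
                yield return func(v0, v1);
            }
        }
    }

    static void Main()
    {
        // Five items paired two at a time: the trailing item 4 never appears.
        foreach (var pair in Enumerable.Range(0, 5).Batch((a, b) => (a, b)))
            Console.WriteLine(pair); // (0, 1) then (2, 3)
    }
}
```

If dropping leftovers is not acceptable, the MoreLINQ-style bucket implementation earlier in this thread is the safer choice, since it yields the final partial bucket.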
static IEnumerable<IEnumerable<T>> TakeBatch<T>(IEnumerable<T> ts, int batchSize)
{
    return from @group in ts.Select((x, i) => new { x, i }).ToLookup(xi => xi.i / batchSize)
           select @group.Select(xi => xi.x);
}
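Note that ToLookup materializes the whole input before the first batch is available, so unlike the enumerator-based approaches above this one is eager rather than streaming, and it does yield the final partial batch. A usage sketch (the wrapping class and Main are assumptions for a runnable example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TakeBatchDemo
{
    static IEnumerable<IEnumerable<T>> TakeBatch<T>(IEnumerable<T> ts, int batchSize)
    {
        // Pair each element with its index, then group by index / batchSize.
        // ToLookup buffers the entire sequence before returning.
        return from @group in ts.Select((x, i) => new { x, i }).ToLookup(xi => xi.i / batchSize)
               select @group.Select(xi => xi.x);
    }

    static void Main()
    {
        // Seven items in batches of three: the last batch holds the leftover item.
        foreach (var batch in TakeBatch(Enumerable.Range(1, 7), 3))
            Console.WriteLine(string.Join(", ", batch)); // "1, 2, 3" / "4, 5, 6" / "7"
    }
}
```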