Array.Copy vs Skip and Take in c#
Asked Answered
T

1

5

I was browsing this question and some similar ones:

Getting a sub-array from an existing array

Many places I read answers like this:

Getting a sub-array from an existing array

What I am wondering is why Skip and Take are not constant time operations for arrays?

In turn, if they were constant time operations, won't the Skip and Take method (without calling ToArray() at the end) have the same running time without the overhead of doing an Array.Copy, but also more space efficient?

Trev answered 9/9, 2011 at 9:17 Comment(2)
Considering you are researching this stuff here is a useful tidbit: Buffer.BlockCopy (DMA) is really fast compared to Array.Copy (O(n)) - it only works for primitives (int, float, etc.) though.Chairborne
That isn't going to help me with exactly what I'm looking at since I am using arrays of objects, but it is definitely good to know, thanks.Trev
C
5

You have to differentiate between the work that the Skip and Take methods do, and the work of consuming the data that the methods return.

The Skip and Take methods themselves are O(1) operations, as the work they do does not scale with the input size. They just set up an enumerator that is capable of returning items from the array.

It's when you use the enumerator that the work is done. That is an O(n) operation, where n is the number of items that the enumerator produces. As the enumerators read from the array, they don't contain a copy of the data, and you have to keep the data in the array intact as long as you are using the enumerator.

(If you use Skip on a collection that is not accessible by index like an array, gettting the first item is an O(n) operation, where n is the number of items skipped.)

Cheston answered 9/9, 2011 at 9:27 Comment(7)
Enumerating the array with Skip and Take is no less efficient than enumerating a copy of the array though, right? The evaluation of Skip and Take are constant time. Just FYI, in my use case the referenced array remains unchanged.Trev
@Rob Hinchliff: There is overhead in both methods. If you copy the array you have the overhead in the copying, but then you have the full benefit of looping an array, i.e. the range checks when accessing the array can be optimised away. When you use an enumerator the overhead is in the code of the enumerator, i.e. it takes slightly longer to access each item. The real benefit of the enumerator is that you don't have to keep a copy of the array in memory, which is mostly only a problem for large arrays.Cheston
Where does the additional cost of enumerating the collection returned by Take() lie? Is it something like each iteration it needs to check if the current index is higher than the count passed as a parameter to Take()? If so, why isn't that optimised away in the same way as the copied array since the range checks could be done up front here as well?Trev
@Rob Hinchliff: The overhead is mainly that it's not a collection that is returned, but an enumerator. The code calls the MoveNext method and the Current property of the enumerator, and that can't be optimised the same way as accessing an array in a loop can. So, an array access corresponds to two method calls, and while this is not very much work, it's a lot more than just accessing the array directly.Cheston
Is this because the compiler can't see that the underlying collection is in fact an array behind the IEnumerable interface returned by Take()? I guess I was hoping that the object returned by Take() would perhaps have a bit of meta data or something that persuades the runtime to use an array loop.Trev
@Rob Hinchliff: Yes, it gets too complicated for the compiler to optimise. It would be possible to build the compiler more complex so that it would be able to catch things like that, but it's not that likely that it will develop in that direction. It's good enough as it for most uses, and if you really need that extra performance you can loop the array yourself instead of using extension methods.Cheston
And it seems, with the current JITters, it doesn't matter whether the "collection ... is not accessible by index".Turnpike

© 2022 - 2024 — McMap. All rights reserved.