Why does reusing arrays increase performance so significantly in C#?
In my code, I perform a large number of tasks, each requiring a large array to temporarily store data. I have about 500 tasks. At the beginning of each task, I allocate memory for an array:

double[] tempDoubleArray = new double[M];

M is a large number depending on the precise task, typically around 2000000. Now, I do some complex calculations to fill the array, and in the end I use the array to determine the result of this task. After that, the tempDoubleArray goes out of scope.

Profiling reveals that the calls to construct the arrays are time-consuming. So I decide to try reusing the array by making it static. It requires some additional juggling to figure out the minimum size the array needs, requiring an extra pass through all tasks, but it works. Now the program is much faster (from 80 sec to 22 sec for execution of all tasks).

double[] tempDoubleArray = staticDoubleArray;
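
In sketch form, the reuse pattern looks roughly like this (TaskCount, GetRequiredSize and RunTask are stand-ins for my actual task logic):

// TaskCount, GetRequiredSize and RunTask are placeholders for the real task code.
static double[] staticDoubleArray;

static void RunAllTasks()
{
    // Extra pass to find the largest size any task will need.
    int maxSize = 0;
    for (int i = 0; i < TaskCount; i++)
        maxSize = Math.Max(maxSize, GetRequiredSize(i));

    // Allocate once, then hand the same buffer to every task.
    staticDoubleArray = new double[maxSize];

    for (int i = 0; i < TaskCount; i++)
    {
        double[] tempDoubleArray = staticDoubleArray;
        RunTask(i, tempDoubleArray);   // fills the buffer and computes the task's result
    }
}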

However, I'm a bit in the dark about why precisely this works so well. I'd say that in the original code, when tempDoubleArray goes out of scope, it can be collected, so allocating a new array should not be that hard, right?

I ask this because understanding why it works might help me figure out other ways to achieve the same effect, and because I would like to know in which cases allocation causes performance issues.

Christenson answered 15/6, 2010 at 14:59 Comment(0)

Just because something can be collected doesn't mean that it will. In fact, were the garbage collector as aggressive as that in its collection, your performance would be significantly worse.

Bear in mind that creating an array is not just creating one variable; it's creating N variables (N being the number of elements in the array). Reusing arrays is a good bang-for-your-buck way of increasing performance, though you have to do so carefully.

To clarify, what I mean by "creating variables" specifically is allocating the space for them and performing whatever steps the runtime has to in order to make them usable (i.e. initializing the values to zero/null).

Because arrays are reference types, they are stored on the heap, which makes life a little more complicated when it comes to memory allocation. Depending on the size of the array (whether or not it's over 85 KB in total storage space), it will either be stored in the normal heap or the Large Object Heap. An array stored on the ordinary heap, as with all other heap objects, can trigger garbage collection and compaction of the heap (which involves shuffling around currently in-use memory in order to maximize contiguous available space). An array stored on the Large Object Heap would not trigger compaction (as the LOH is never compacted), but it could trigger premature collection by taking up another large contiguous block of memory.
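
A rough way to see the allocation cost in isolation is a timing sketch along these lines (a hand-rolled comparison, not from the original answer; the Array.Clear call is only needed if a task requires zeroed memory):

using System;
using System.Diagnostics;

class AllocationDemo
{
    const int M = 2000000;    // ~16 MB of doubles, far above the 85 KB LOH threshold
    const int Tasks = 500;

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < Tasks; i++)
        {
            double[] temp = new double[M];   // fresh allocation, zeroed by the runtime each time
            temp[0] = i;                     // touch it so the work isn't optimized away
        }
        Console.WriteLine("new array per task: " + sw.ElapsedMilliseconds + " ms");

        double[] reused = new double[M];     // allocate once up front
        sw.Restart();
        for (int i = 0; i < Tasks; i++)
        {
            Array.Clear(reused, 0, M);       // only needed if the task reads before it writes
            reused[0] = i;
        }
        Console.WriteLine("reused array:       " + sw.ElapsedMilliseconds + " ms");
    }
}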

Cryan answered 15/6, 2010 at 15:3 Comment(8)
It's not creating N variables - it's just allocating a chunk of memory and zero-ing out the bits. No constructors are ever called during array creation, for instance. Based on a really rough estimate of core i7 memory bandwidth, you'd expect zero-ing out those 16MB to take on the order of 1ms.Tidy
@EamonNerbonne: To be clear, I said it was creating variables (which it is), not instances.Cryan
But you've got to agree that's a weird thing to say. What's it mean to be creating a variable? A variable is an abstract, compile-time notion that doesn't do anything by itself at runtime.Tidy
@EamonNerbonne: Sure it does. As you say, "creating a variable" involves allocating space for it. The particulars of whether or not the bits are zeroed out are a runtime-specific implementation detail. Also, because arrays are stored on the heap (as they're reference types), repeatedly declaring and discarding large arrays can have significant performance consequences, as they may be placed on the LOH (depending on size) and can cause earlier GC operations in order to free up generational space.Cryan
Zero-ing out isn't an implementation details; it's the spec. All fields (and array elements) are specified to be zero initialized, whether that "makes sense" for the type or not - it's very hard to imagine an implementation that doesn't clear the memory but that still satisfies the spec. But what really bothers me is trying to understand the performance characteristics of low-level details like this (where precision matters), but then calling fields and array elements variables as if there's no difference.Tidy
@EamonNerbonne: Zeroing the memory is an implementation detail; if, for example, a particular runtime chose to use a high-order bit as a flag to indicate whether or not it's been initialized and alter its behavior based upon that flag, it's free to do so. Whether or not that's wise or exists isn't the issue. I'm sorry that you don't like my particular choice of words ("creating variables" vs "allocating memory"). I chose them in order to demonstrate the distinction between the amount of memory required for one "variable" of the array's element type vs. that of the array (cont.)Cryan
But obviously heap vs. stack storage semantics make that an imprecise analogy. Given the fact that the question is being asked in the first place, I did not feel that strict precision was necessary in this answer, and it would seem that a number of people would agree with that. You are, however, of course free to post your own answer to this two-year-old question. For what it's worth, you'll note that Eric Lippert frequently refers to arrays as a bunch of variables, as semantically that's exactly what they are.Cryan
Fair enough: an array does act just like a bunch of variables. I guess part of my uncomfortableness stems from the way that creating a variable isn't an intrinsically costly action; it's the zero-ing bit that makes it expensive (disregarding the whole heap/stack/member distinction). (as to the implementation detail, your alternative would likely still run in linear time, so the initialization requirement is still relevant).Tidy

One answer could be the large object heap - objects greater than 85 KB are allocated on a separate heap, the Large Object Heap (LOH), which is collected less frequently and never compacted.

See the section on performance implications:

  • there is an allocation cost (primarily clearing out the allocated memory)
  • the collection cost (the LOH and Gen 2 are collected together, so large allocations can trigger full Gen 2 collections, including compaction of the regular Gen 2 heap)
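
As a quick illustrative sketch (assuming the usual .NET behavior where large-object-heap objects report as generation 2), the threshold can be made visible like this:

using System;

class LohThresholdDemo
{
    static void Main()
    {
        // A double takes 8 bytes, so roughly 10,625 doubles is where
        // an array crosses the 85,000-byte LOH threshold.
        double[] small = new double[1000];      // ~8 KB   -> small object heap
        double[] large = new double[2000000];   // ~16 MB  -> large object heap

        // Objects on the LOH are logically part of generation 2, so the
        // large array reports generation 2 immediately after allocation.
        Console.WriteLine("small: gen " + GC.GetGeneration(small));   // typically 0
        Console.WriteLine("large: gen " + GC.GetGeneration(large));   // typically 2
    }
}
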
Khiva answered 15/6, 2010 at 15:4 Comment(0)

It's not always easy to allocate large blocks of memory in the presence of fragmentation. I can't say for sure, but my guess is that the runtime has to do some rearranging to get enough contiguous memory for such a big block. As for why allocating subsequent arrays isn't faster, my guess is either that the big block gets fragmented between GC time and the next allocation, or that the original block was never GC'd to start with.
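
As an aside that postdates this answer: from .NET Framework 4.5.1 onward the runtime can be asked to compact the LOH explicitly, which is one way to deal with the kind of fragmentation described above (a sketch, not something from the original discussion):

using System;
using System.Runtime;

class LohCompactionDemo
{
    static void Main()
    {
        // By default the LOH is swept but never compacted; this requests a
        // one-time compaction on the next blocking full collection.
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();

        // The mode automatically resets to Default after the compaction runs.
        Console.WriteLine("LOH compaction requested and performed.");
    }
}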

Hungarian answered 15/6, 2010 at 15:5 Comment(4)
Actually, due to the use of virtual memory, allocating contiguous blocks should not be a problem at all.Copperplate
I'm not at all clear how virtual memory plays in. Doesn't the CLR maintain its own heap? If so, then OS-level memory paging will probably be nothing more than a source of unpredictable slowdowns.Hungarian
They would need to be contiguous in virtual address space, so fragmentation issues can still apply.Downcomer
However, with an object size of just 16MB and generally just one such object alive at a time (by the sound of it), fragmentation won't be an issue - unless some other part of the process is causing problems.Tidy
