Benefits of contiguous memory allocation
In terms of performance, what are the benefits of allocating a contiguous memory block versus separate memory blocks for a matrix? I.e., instead of writing code like this:

char **matrix = malloc(sizeof(char *) * 50);
for(i = 0; i < 50; i++)
    matrix[i] = malloc(50);

giving me 50 disparate blocks of 50 bytes each and one block of 50 pointers, if I were to instead write:

char **matrix = malloc(sizeof(char *) * 50 + 50 * 50);
char *data = (char *)(matrix + 50);  /* skip past the 50 row pointers; cast so arithmetic is in bytes */
for(i = 0; i < 50; i++) {
    matrix[i] = data;
    data += 50;
}

giving me one contiguous block of data, what would the benefits be? Avoiding cache misses is the only thing I can think of, and even that's only for small amounts of data (small enough to fit on the cache), right? I've tested this on a small application and have noticed a small speed-up and was wondering why.

Dorran answered 10/6, 2014 at 22:36 Comment(9)
Try both and measure? – Scad
Cache lines are (typically) 64 bytes, so beyond that the cache behaviour is largely unaffected by this technique. (Although in your case, each matrix "row" is only 50 bytes, so reading one row will pull part of the next row into cache as well.) – Bartholomew
@KerrekSB: Measure what exactly? I can measure the run time, and I do see an improvement with the contiguous memory; I just don't know why. My first guess was avoiding cache misses, but I want to know whether that really is the only thing making this code faster. – Dorran
You gain some cache performance (depending on your hardware) and you also reduce heap fragmentation. Not a big win on its own, but done throughout a large app it might amount to something. – Suffragan
Of course, it also depends on your access pattern to this matrix. If it's random, I'd expect to see almost no steady-state difference in performance. – Bartholomew
The question you posted was "what are the benefits". Isn't being a lot faster a benefit? And that's an answer you could find yourself by measuring. If you want to know why it's faster, your question should say that (and include something that says you did indeed observe it to be faster). – Scad
@KerrekSB: I'd assumed the comment about avoiding cache misses made it rather obvious that I wanted to know why it was faster. If you wish to be pedantic, however, I will edit the question. – Dorran
You're working in C++ and you're complaining about being pedantic?!?! :-) – Scad
@KerrekSB: C, mate; not C++. Though I do post in the C++ tag as well. – Dorran

It's complicated - you need to measure.

Using an intermediate pointer (a table of row pointers) instead of computing addresses directly from a two-dimensional index is most likely a loss on current processors, and both of your examples do that.

Next, everything fitting into L1 cache is a big win. malloc() most likely rounds allocations up to multiples of 64 bytes. A 180 × 180 matrix allocated as one block is 32,400 bytes and might fit into L1 cache, while 180 individual mallocs of 180 bytes, each rounded up to 192 bytes, take 180 × 192 = 34,560 bytes and might not, especially once you add another 180 pointers.

One contiguous array means you know how the data fits into cache lines, and you know you'll have the minimum number of page table lookups in the hardware. With hundreds of mallocs, no guarantee.

Roentgenotherapy answered 10/6, 2014 at 23:3 Comment(2)
"Using an intermediate pointer instead of calculating addresses in a two-dimensional array is most likely a loss on current processors, and both of your examples do that." Not sure what this means. Are you saying I should use a 1-D vector and access the 2-D matrix as matrix[i * rows + j]? – Dorran
@wolfPack88, that's exactly what he means. You could also cast the pointer to the contiguous memory to a pointer to a 2D array (char (*)[50][50]). With C99 VLA it even works when the actual array dimensions are only known at run-time. – Chui

Watch Scott Meyers' "CPU Caches and Why You Care" presentation on YouTube. The performance gains can be entire orders of magnitude.

https://www.youtube.com/watch?v=WDIkqP4JbkE

As for the discussion above, the intermediate-pointer argument died a long time ago. Compilers optimize it away. An N-dimensional array is allocated as a flat 1-D vector, ALWAYS. If you use std::vector<std::vector<T>>, THEN you might get the equivalent of a list of separately allocated vectors, but raw arrays are always allocated as one long, contiguous strip in a flat manner, and multi-dimensional access reduces to pointer arithmetic the same way 1-dimensional access does.

To access array[i][j][k] (assume width, height, depth of {A, B, C}), you add i*(B*C) + j*C + k to the address at the front of the array. You'd have to do this math manually in a 1-D representation anyway.

Lisk answered 14/10, 2016 at 0:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.