Benefits of contiguous memory allocation
In terms of performance, what are the benefits of allocating a contiguous memory block versus separate memory blocks for a matrix? I.e., instead of writing code like this:

char **matrix = malloc(sizeof(char *) * 50);
for(i = 0; i < 50; i++)
    matrix[i] = malloc(50);

giving me 50 disparate blocks of 50 bytes each and one block of 50 pointers, if I were to instead write:

char **matrix = malloc(sizeof(char *) * 50 + 50 * 50);
char *data = (char *)(matrix + 50);  /* skip past the 50 row pointers; cast so arithmetic is in bytes */
for(i = 0; i < 50; i++) {
    matrix[i] = data;
    data += 50;
}

giving me one contiguous block of data, what would the benefits be? Avoiding cache misses is the only thing I can think of, and even that's only for small amounts of data (small enough to fit on the cache), right? I've tested this on a small application and have noticed a small speed-up and was wondering why.

Dorran answered 10/6, 2014 at 22:36 Comment(9)
Try both and measure? – Scad
Cache lines are (typically) 64 bytes, so beyond that the cache behaviour is largely unaffected by this technique. (Although in your case, each matrix "row" is only 50 bytes, so reading one row will pull part of the next row into cache as well.) – Bartholomew
@KerrekSB: Measure what exactly? I can measure the run time, and I do see an improvement with the contiguous memory; I just don't know why. My first guess was avoiding cache misses, but I want to know whether that really is the only thing making this code faster. – Dorran
You gain some cache performance (depending on your hardware) and you also reduce heap fragmentation. Not a big win on its own, but done throughout a large app it might amount to something. – Suffragan
Of course, it also depends on your access pattern to this matrix. If it's random, I'd expect to see almost no steady-state difference in performance. – Bartholomew
The question you posted was "what are the benefits". Isn't being a lot faster a benefit? And that's an answer you could find yourself by measuring. If you want to know why it's faster, your question should say that (and include something that says you did indeed observe it to be faster). – Scad
@KerrekSB: I'd assumed the comment about avoiding cache misses made it rather obvious that I wanted to know why it was faster. If you wish to be pedantic, however, I will edit the question. – Dorran
You're working in C++ and you're complaining about being pedantic?!?! :-) – Scad
@KerrekSB: C, mate; not C++. Though I do post in the C++ tag as well. – Dorran

It's complicated - you need to measure.

Using an intermediate pointer (a table of row pointers) instead of computing addresses directly from a two-dimensional index is most likely a loss on current processors, and both of your examples do that.

Next, everything fitting into L1 cache is a big win. malloc() most likely rounds allocations up to multiples of 64 bytes. A 180 × 180 matrix allocated as one block is 32,400 bytes and might fit into L1 cache, while 180 individual mallocs of 180 bytes, each rounded up to 192 bytes, take 180 × 192 = 34,560 bytes and might not, especially once you add another 180 pointers.

One contiguous array means you know how the data fits into cache lines, and you know you'll have the minimum number of page table lookups in the hardware. With hundreds of mallocs, no guarantee.

Roentgenotherapy answered 10/6, 2014 at 23:3 Comment(2)
"Using an intermediate pointer instead of calculating addresses in a two-dimensional array is most likely a loss on current processors, and both of your examples do that." Not sure what this means. Are you saying I should use a 1-D vector and access the 2-D matrix as matrix[i * rows + j]? – Dorran
@wolfPack88, that's exactly what he means. You could also cast the pointer to the contiguous memory to a pointer to a 2D array (char (*)[50][50]). With C99 VLA it even works when the actual array dimensions are only known at run-time. – Chui

Watch Scott Meyers' "CPU Caches and Why You Care" presentation on YouTube. The performance gains can be entire orders of magnitude.

https://www.youtube.com/watch?v=WDIkqP4JbkE

As for the discussion above, the intermediate-pointer argument died a long time ago. Compilers optimize it away. An N-dimensional array is allocated as a flat 1-D vector, ALWAYS. If you use std::vector<std::vector<T>>, THEN you might get the equivalent of a list of separately allocated vectors, but raw arrays are always allocated as one long, contiguous strip in a flat manner, and multi-dimensional access reduces to pointer arithmetic the same way 1-dimensional access does.

To access array[i][j][k] (assume width, height, depth of {A, B, C}), you add i*(B*C) + j*C + k to the address at the front of the array. You'd have to do this math manually in a 1-D representation anyway.

Lisk answered 14/10, 2016 at 0:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.