realloc but only first few bytes is meaningful

F

3

8

Assume I have used ptr = malloc(old_size); to allocate a memory block with old_size bytes. Only the first header_size bytes is meaningful. I'm going to increase the size to new_size.

new_size is greater than old_size and old_size is greater than header_size.

before:

/- - - - - - - old_size - - - - - - - \
+===============+---------------------+
 \-header_size-/

after:

/- - - - - - - - - - - - - - - new_size - - - - - - - - - - - - - - - - - - -\
+===============+------------------------------------------------------------+
\- header_size-/

I don't care what is stored after ptr + header_size because I'll read some data to there.

method 1: go straight to `new_size`

ptr = realloc(ptr, new_size);

method 2: shrink to `header_size` and grow to `new_size`

ptr = realloc(ptr, header_size);
ptr = realloc(ptr, new_size);

method 3: allocate a new memory block and copy the first `header_size` bytes

void *newptr = malloc(new_size);
memcpy(newptr, ptr, header_size);
free(ptr);
ptr = newptr;

Which is faster?

Fever answered 6/11, 2012 at 8:56 Comment(1)

@nos I'll write some data beyond header_size after reallocation so I don't care what is in there. – Fever 6/11, 2012 at 9:14

C

2

It almost certainly depends on the values of old_size, new_size and header_size, and also it depends on the implementation. You'd have to pick some values and measure.

1) is probably best in the case where header_size == old_size-1 && old_size == new_size-1, since it gives you the best chance of the single realloc being basically a no-op. (2) should be only very slightly slower in that case (2 almost-no-ops being marginally slower than 1).

3) is probably best in the case where header_size == 1 && old_size == 1024*1024 && new_size == 2048*1024, because the realloc would have to move the allocation, but you avoid copying 1MB of data you don't care about. (2) should be only very slightly slower in that case.

2) is probably best when header_size is much smaller than old_size, and new_size is in a range where it's reasonably likely that the realloc will relocate, but also reasonably likely that it won't. Then you can't predict which of (1) and (3) it is that will be very slightly faster than (2).

In analyzing (2), I have assumed that realloc downwards is approximately free and returns the same pointer. This is not guaranteed. I can think of two things that can mess you up:

realloc downwards copies to a new allocation
realloc downwards splits the buffer to create a new chunk of free memory, but then when you realloc back up again the allocator doesn't merge that new free chunk straight back onto your buffer again in order to return without copying.

Either of those could make (2) significantly more expensive than (1). So it's an implementation detail whether or not (2) is a good way of hedging your bets between the advantages of (1) (sometimes avoids copying anything) and the advantages of (3) (sometimes avoids copying too much).

Btw, this kind of idle speculation about performance is more effective in order to tentatively explain your observations, than it is to tentatively predict what observations we would make in the unlikely event that we actually cared enough about performance to test it.

Furthermore, I suspect that for large allocations, the implementation might be able to do even a relocating realloc without copying anything, by re-mapping the memory to a new address. In which case they would all be fast. I haven't looked into whether implementations actually do that, though.

Canonicity answered 6/11, 2012 at 9:33 Comment(0)

H

3

Neither malloc (for the whole block) nor realloc (for the space beyond the size of the old block when increasing the size) guarantee what the memory you receive will contain so, if you want those excess bytes set to zero (for example), you'll have to do it yourself with something like:

// ptr contains current block.
void *saveptr = ptr;
ptr = realloc (ptr, new_size);
if (ptr == NULL) {
    // do something intelligent like recover saveptr and exit.
}
memset (ptr + header_size, 0, new_size - header_size);

However, since you've stated that you don't care about the content beyond the header, the fastest is almost certainly a single realloc since that's likely to be optimised under the covers.

Calling it twice for contraction and expansion, or calling malloc-new/memcpy/free-old is very unlikely to be as efficient though, as with all optimisations, you should measure, don't guess!

Keep in mind that realloc doesn't necessarily have to copy your memory at all. If the expansion can be done in place, then an intelligent heap manager will just increase the size of the block without copying anything, such as:

+-----------+   ^        +-----------+ <- At same address,
| Old block |   | Need   | New block |      no copying
|           |   | this   |           |      involved.
+-----------+   | much   |           |
| Free      |   | now.   |           |
|           |   v        +-----------+
|           |            | Free      |
|           |            |           |
+-----------+            +-----------+

Hamburger answered 6/11, 2012 at 9:0 Comment(3)

Huh? Of course realloc() guarantees that the new memory will contain as much as will fit from the old block. – Topple 6/11, 2012 at 9:4

@unwind: I meant beyond the old boundary, not the memory in the old block. I'll clarify. – Hamburger 6/11, 2012 at 9:5

@lqs, since you've stated in a comment that you don't care what the non-header contains, I've simplified my answer to suit. – Hamburger 6/11, 2012 at 9:21

C

2

It almost certainly depends on the values of old_size, new_size and header_size, and also it depends on the implementation. You'd have to pick some values and measure.

1) is probably best in the case where header_size == old_size-1 && old_size == new_size-1, since it gives you the best chance of the single realloc being basically a no-op. (2) should be only very slightly slower in that case (2 almost-no-ops being marginally slower than 1).

3) is probably best in the case where header_size == 1 && old_size == 1024*1024 && new_size == 2048*1024, because the realloc would have to move the allocation, but you avoid copying 1MB of data you don't care about. (2) should be only very slightly slower in that case.

2) is probably best when header_size is much smaller than old_size, and new_size is in a range where it's reasonably likely that the realloc will relocate, but also reasonably likely that it won't. Then you can't predict which of (1) and (3) it is that will be very slightly faster than (2).

In analyzing (2), I have assumed that realloc downwards is approximately free and returns the same pointer. This is not guaranteed. I can think of two things that can mess you up:

realloc downwards copies to a new allocation
realloc downwards splits the buffer to create a new chunk of free memory, but then when you realloc back up again the allocator doesn't merge that new free chunk straight back onto your buffer again in order to return without copying.

Either of those could make (2) significantly more expensive than (1). So it's an implementation detail whether or not (2) is a good way of hedging your bets between the advantages of (1) (sometimes avoids copying anything) and the advantages of (3) (sometimes avoids copying too much).

Btw, this kind of idle speculation about performance is more effective in order to tentatively explain your observations, than it is to tentatively predict what observations we would make in the unlikely event that we actually cared enough about performance to test it.

Furthermore, I suspect that for large allocations, the implementation might be able to do even a relocating realloc without copying anything, by re-mapping the memory to a new address. In which case they would all be fast. I haven't looked into whether implementations actually do that, though.

Canonicity answered 6/11, 2012 at 9:33 Comment(0)

R

1

That probably depends on what the sizes are and if copying is needed.

Method 1 will copy everything contained in the old block - but if you don't do that too often, you won't notice.

Method 2 will only copy what you need to keep, as you discard everything else beforehand.

Method 3 will copy unconditionally, while the others only copy if the memory block cannot be resized where it is.

Personally, I would prefer method 2 if you do this quite often, or method 1 if you do it more seldom. Respectively, I would profile which of these will be faster.

Roll answered 6/11, 2012 at 9:4 Comment(5)

Method 2 is the only one that both avoids a copy entirely, if there is free space after the block, and otherwise copies only the minimum memory. Method 1 is the most efficient in the best case, but possibly also the the least efficient in the worst case. It does depend now much data we're talking about whether the extra function overheads are worthwhile. – Yonyona 6/11, 2012 at 9:36

@Yonyona Are there actually realloc implementations that consider free blocks after the block about to be realloc'ed? It seems unlikely to me. – Bonnybonnyclabber 6/11, 2012 at 9:42

@pmr: even if they don't try to merge blocks, in the case where new_size - old_size is small there might be free space at the end of the block that can be used to satisfy the new size, due to old_size having been rounded up. – Canonicity 6/11, 2012 at 9:53

@Bonnybonnyclabber Why wouldn't the implementation extend an allocation into following free space, if there is some? Sure, somebody has to implement it that way, but there's a lot of smart people out there! In reality though, you can't even assume that reducing the size of the allocation won't result in a copy. – Yonyona 6/11, 2012 at 11:26

@Bonnybonnyclabber At least some implementations of malloc use mmap for allocations greater than one page (typically 4KiB), so extending the size of the allocation is almost a no-op. It depends how crowded the address space is, but on a 64-bit machine that's probably not a problem. – Yonyona 6/11, 2012 at 11:45

method 1: go straight to `new_size`

method 2: shrink to `header_size` and grow to `new_size`

method 3: allocate a new memory block and copy the first `header_size` bytes

Recommended topics

Hot tags

method 1: go straight to new_size

method 2: shrink to header_size and grow to new_size

method 3: allocate a new memory block and copy the first header_size bytes

Recommended topics

Hot tags

method 1: go straight to `new_size`

method 2: shrink to `header_size` and grow to `new_size`

method 3: allocate a new memory block and copy the first `header_size` bytes