One large malloc versus multiple smaller reallocs
Sorry if this has been asked before, I haven't been able to find just what I am looking for.

I am reading fields from a list and writing them to a block of memory. I could

  • Walk the whole list, find the total needed size, do one malloc and THEN walk the list again and copy each field;
  • Walk the whole list and realloc the block of memory as I write the values;

Right now the first seems the more efficient to me (fewer calls). What are the pros and cons of each approach?

Thank you for your time.

Fleshpots answered 24/10, 2010 at 12:7 Comment(0)
You're probably better off allocating a decent amount of space initially, based on what you think is the most likely maximum.

Then, if you find you need more space, don't just allocate enough for the extra, allocate a big chunk extra.

This will minimise the number of re-allocations while still only processing the list once.

By way of example, initially allocate 100K. If you then find you need more, re-allocate to 200K, even if you only need 101K.

Tainataint answered 24/10, 2010 at 12:13 Comment(6)
I would recommend looking at Alexandrescu's small object allocator for a similar idea and a very good implementation.Chimp
@pax: ah, the "what factor to exponentially re-allocate by" argument. Happy days. I've seen 2, I've seen 3/2, I've seen 577/408...Sherlock
I've seen the suggestion to double it each time but I generally just add the initial amount (so that total alloc is 100K, 200K, 300K, ...). This allows for a fairly efficient expansion but you're never wasting more than that initial 100K amount.Tainataint
@paxdiablo: if you don't know anything about the final size, I'd go with exponential growth and a final realloc() to the correct size; linear growth only makes sense if you know beforehand that you'll only need a few expansions for most inputsRoquelaure
@Christoph: you shouldn't expect that final realloc to release any memory.Sherlock
@Steve: sure, but if the amount of memory which was unnecessarily allocated is large enough, reasonable realloc() implementations will (e.g. on Linux via mremap()); see blog.kowalczyk.info/article/realloc-on-Windows-vs-Linux-1.html for anecdotal evidence on linear vs. exponential growth performanceRoquelaure

The first approach is almost always better. A realloc() typically works by copying the entire contents of a memory block into the freshly allocated, larger block. So n reallocs can mean n copies, each one larger than the last. (If you are adding m bytes to your allocation each time, then the first realloc has to copy m bytes, the next one 2m, the next 3m, ...).

The pedantic answer is that the internal performance of realloc() is implementation-specific and not defined by the standard; in some implementation it could work by magic fairies that move the bytes around instantly, and so on. But in any realistic implementation, realloc() means a copy.

Shirleeshirleen answered 24/10, 2010 at 12:14 Comment(5)
"magic fairies that move the bytes around instantly" - "fairies" a.k.a. "the VMM". It's absurdly heavy to allocate at least a page for every allocation, just to optimize realloc, but I suppose you could do it for large allocations without a huge amount of magic required. I think you sometimes see a trick a bit like it in I/O stacks, where buffers are "handed off" from kernel to user space without a copy by re-mapping it.Sherlock
Unless your malloc code is written by a mental deficient :-), you'll generally only get a copy if you can't expand the current block. That's not usually the case if the block you're trying to re-alloc is at the end of the arena. Again, you're right that it's an implementation issue but I've never seen an implementation that doesn't take advantage of the properties of the underlying data structures. If the re-alloc is the only thing you're doing, you most likely won't get a lot of copy operations.Tainataint
Alarmingly, this means the realloc() provided by a certain major manufacturer's standard library must have been written by a mental deficient. That's a disturbing thought.Shirleeshirleen
Which one? Please, let it be Microsoft, I do love a bit of MS bashing occasionally :-) Seriously though, there may be mitigating reasons for a particular implementation to prefer 'copy' to 'coalesce' so it would depend on the circumstances. But, if there's a big enough free block immediately following the one you're trying to realloc, and your arena implementation supports it, it's incredibly unlikely that a copy would be faster than a simple coalesce of the two blocks.Tainataint
You can also look at the code of realloc on Solaris; you will see that it copies the block and does not use VMM tricks, which is logical since most implementations of malloc et al. run in user space, and VMM tricks would imply switching to kernel mode. The copying is often avoided by over-allocating blocks in the first place: for small blocks it allocates 100% more, for big blocks 50%.Motherless

Don't reinvent the wheel: use CCAN's darray, which implements an approach similar to what paxdiablo described. See darray on GitHub.

Deposal answered 13/6, 2012 at 8:14 Comment(0)
