Using realloc in C++

std::realloc is dangerous in C++ if the malloc'd memory contains non-POD types. The only problem, it seems, is that when std::realloc cannot grow the memory in situ, it moves the bytes to a new block without calling the objects' copy/move constructors or destructors.

A trivial workaround would be a try_realloc function. Instead of malloc'ing new memory when it cannot grow the block in situ, it would simply return false. In that case new memory could be allocated, the objects copied (or moved) to the new memory, and finally the old memory freed.
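
A minimal sketch of how a growth routine might use such a function (try_realloc is a hypothetical stand-in here, stubbed to always refuse; a real one would ask the allocator):

    #include <cstddef>
    #include <cstdlib>
    #include <new>
    #include <utility>

    // Hypothetical stand-in: a real version would ask the allocator whether
    // the block at p can be extended to new_bytes without moving it.
    bool try_realloc(void* /*p*/, std::size_t /*old_bytes*/, std::size_t /*new_bytes*/) {
        return false;
    }

    template <typename T>
    T* grow(T* data, std::size_t count, std::size_t new_count) {
        if (try_realloc(data, count * sizeof(T), new_count * sizeof(T)))
            return data;                              // grown in situ, objects untouched

        // Fall back to allocate / move / destroy / free.
        T* fresh = static_cast<T*>(std::malloc(new_count * sizeof(T)));
        if (!fresh) throw std::bad_alloc();
        for (std::size_t i = 0; i < count; ++i) {
            ::new (static_cast<void*>(fresh + i)) T(std::move(data[i]));
            data[i].~T();                             // the step realloc would skip
        }
        std::free(data);
        return fresh;
    }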

This seems supremely useful. std::vector could make great use of this, possibly avoiding all copies/reallocations.
Preemptive flame retardant: technically that is the same Big-O performance, but if vector growth is a bottleneck in your application, a 2x speed-up is nice even if the Big-O remains unchanged.

BUT, I cannot find any C API that works like try_realloc.

Am I missing something? Is try_realloc not as useful as I imagine? Is there some hidden bug that makes try_realloc unusable?

Better yet, is there some less-documented API that performs like try_realloc?

NOTE: I'm obviously in library/platform-specific code here. I'm not worried, as try_realloc is inherently an optimization.


Update: Following Steve Jessop's comments on whether vector would be more efficient using realloc, I wrote up a proof of concept to test. The realloc-vector simulates a vector's growth pattern but has the option to realloc instead. I ran the program up to a million elements in the vector.

For comparison a vector must allocate 19 times while growing to a million elements.

If the realloc-vector is the only thing using the heap, the results are awesome: 3-4 allocations while growing to a million elements.

If the realloc-vector is used alongside a vector that grows at 66% of the speed of the realloc-vector, the results are less promising: it allocates 8-10 times during growth.

Finally, if the realloc-vector is used alongside a vector that grows at the same rate, the realloc-vector allocates 17-18 times, barely saving one allocation over the standard vector behavior.

I don't doubt that a hacker could game allocation sizes to improve the savings, but I agree with Steve that the tremendous effort to write and maintain such an allocator isn't worth the gain.

Maribeth answered 3/11, 2010 at 16:22 Comment(7)
It's hard to provide platform-specific suggestions with no idea of the platform you want to target.Horsepower
My target is gcc + Linux. However, I'm generally curious so a solution on any platform will be considered.Maribeth
I can't help but think: if you want the best performance, use vector.reserve() so you don't have to grow the vector at all.Purnell
@kotlinski: but you cannot always do that. Otherwise the dynamic growth property of the vector class would be redundant anyway.Briones
If the copy performance of the objects your vector is holding is terrible, and you cannot use a deque for whatever reason, then maybe you should change your vector over to hold shared_ptr instances that point to the objects. That way the copy operations will become a lot cheaper. I'm not sure if unique_ptr objects can be used in standard containers but that would reduce copying overhead even further.Headstock
To be able to use realloc (or similar) with non-POD (plain old data) types in C++, you would not only need to call destructors in the case of failure, but also in the case of shrinking an array. It would also have to call the default constructor on new members of the array when the array is grown. Another thing to consider is whether moving an object might cause problems; classes might then need to implement a move method, a sort of destructor-reconstructor with references to both the old and the new data, and the order of moves might matter.Bernoulli
+1 for update with concrete results!Apposite

vector generally grows in large increments. You can't do that repeatedly without relocating, unless you carefully arrange things so that there's a large extent of free addresses just above the internal buffer of the vector (which in effect requires assigning whole pages, because obviously you can't have other allocations later on the same page).

So I think that in order to get a really good optimization here, you need more than a "trivial workaround" that does a cheap reallocation if possible - you have to somehow do some preparation to make it possible, and that preparation costs you address space. If you only do it for certain vectors, ones that indicate they're going to become big, then it's fairly pointless, because they can indicate with reserve() that they're going to become big. You can only do it automatically for all vectors if you have a vast address space, so that you can "waste" a big chunk of it on every vector.
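
For illustration, that preparation could look something like this Linux-specific sketch (function names are my own): reserve a large extent of address space with PROT_NONE, then commit pages as the buffer grows, so the buffer never moves until the reservation runs out:

    #include <sys/mman.h>
    #include <cstddef>

    // Reserve address space only; no physical memory is committed yet.
    void* reserve_region(std::size_t reserve_bytes) {
        void* p = mmap(nullptr, reserve_bytes, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? nullptr : p;
    }

    // Commit the first used_bytes of the reservation for read/write access.
    // The buffer's address never changes while used_bytes <= reserve_bytes.
    bool commit(void* base, std::size_t used_bytes) {
        return mprotect(base, used_bytes, PROT_READ | PROT_WRITE) == 0;
    }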

As I understand it, the reason that the Allocator concept has no reallocation function is to keep it simple. If std::allocator had a try_realloc function, then either every Allocator would have to have one (which in most cases couldn't be implemented, and would just have to return false always), or else every standard container would have to be specialized for std::allocator to take advantage of it. Neither option is a great Allocator interface, although I suppose it wouldn't be a huge effort for implementers of almost all Allocator classes just to add a do-nothing try_realloc function.
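
To make that concrete, the do-nothing version (hypothetical; the standard Allocator requirements have no such member) would be as trivial as:

    #include <cstddef>
    #include <memory>

    // Most allocators have no way to grow a block in place, so the
    // conforming default would simply refuse every request.
    template <typename T>
    struct extended_allocator : std::allocator<T> {
        bool try_realloc(T* /*p*/, std::size_t /*old_n*/, std::size_t /*new_n*/) {
            return false;
        }
    };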

If vector is slow due to re-allocation, deque might be a good replacement.

Sanguinaria answered 3/11, 2010 at 16:38 Comment(5)
I'm working under the assumption that try_realloc is trivially cheap. When a vector's capacity is exceeded, instead of the default exponential growth it would use try_realloc to increase the capacity by one. When try_realloc can no longer grow without reallocation, then allocate twice the memory needed, do the standard copy/move stuff, doubling the capacity.Maribeth
Generally I don't see why a try_realloc should be much faster than a malloc.Purnell
@caspin: How is that any more efficient than what vector actually does, though, which is (for the sake of argument) still to double its capacity each time it relocates? All you gain, I think, is that in your version the extra memory isn't "really allocated" yet, so it's available to be allocated for some other use. If it isn't so used, then there's no difference, and if it is so used then you've actually slowed down the vector, since your try_realloc will fail earlier than my extra capacity is used up. Even in my version, with overcommit you can avoid "really using" physical memory.Sanguinaria
@caspin: I think your way gains in the case where, purely by luck, there's a large extent of free memory just above the buffer, at the time the vector wants to extend into it. Which is fine, but as I say I think it's too rare to be worth building into the interface, given that deque exists to support situations where the growth is unpredictable and copy performance is awful. Maybe the authors of the standard think likewise, or maybe they have some other reason - I don't know.Sanguinaria
@caspin: Fair point. If vector is willing to make platform-specific assumptions/guesses, then it could in theory ensure that its sequence of increased capacities fits neatly into the block sizes actually provided by the memory allocator, so that this bad case of vector performance never happens. That sequence might be 32, 64, 128, or might perhaps be 24, 56, 120. If you want to make this part of the portable Allocator interface, I think you're going to a lot of effort just to avoid using deque ;-)Sanguinaria

You could implement something like the try_realloc you proposed, using mmap with MAP_ANONYMOUS and MAP_FIXED and mremap with MREMAP_FIXED.

Edit: just noticed that the man page for mremap even says:

mremap() uses the Linux page table scheme. mremap() changes the mapping between virtual addresses and memory pages. This can be used to implement a very efficient realloc(3).
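
A minimal sketch of that approach (Linux-specific, glibc; names are mine): allocate with mmap, and get try_realloc semantics by calling mremap without MREMAP_MAYMOVE, so the kernel grows the mapping in place or fails rather than move it:

    #include <sys/mman.h>   // mremap is a GNU extension (_GNU_SOURCE)
    #include <cstddef>

    // Back the block with an anonymous mapping (page-granular).
    void* page_alloc(std::size_t bytes) {
        void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? nullptr : p;
    }

    // Without MREMAP_MAYMOVE, mremap only grows in place; if the pages
    // above the mapping are taken it fails (ENOMEM) instead of moving.
    bool try_realloc(void* p, std::size_t old_bytes, std::size_t new_bytes) {
        return mremap(p, old_bytes, new_bytes, 0) != MAP_FAILED;
    }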

Melanous answered 3/11, 2010 at 16:29 Comment(2)
Problem with that is that the minimum allocation with mmap is a page, and that's massive overkill for most vectors.Deledda
Agreed, but I don't think there are any sub-page sized allocators that can offer this and if it's significantly smaller than the size of a page or so the performance benefits would probably be negligible anyway.Melanous

realloc in C is hardly more than a convenience function; it has very little benefit for performance/reducing copies. The main exception I can think of is code that allocates a big array then reduces the size once the size needed is known - but even this might require moving data on some malloc implementations (ones which segregate blocks strictly by size) so I consider this usage of realloc really bad practice.

As long as you don't constantly reallocate your array every time you add an element, but instead grow the array exponentially (e.g. by 25%, 50%, or 100%) whenever you run out of space, just manually allocating new memory, copying, and freeing the old will yield roughly the same performance as using realloc (and identical performance in the case of memory fragmentation). This is surely the approach that C++ STL implementations use, so I think your whole concern is unfounded.
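
For illustration, a sketch of that manual strategy for a raw byte buffer (names are illustrative; a 50% growth factor is chosen for the sake of the example):

    #include <cstddef>
    #include <cstdlib>
    #include <cstring>

    // Exponential growth with explicit allocate/copy/free; the amortized
    // cost is the same as a realloc-based loop.
    char* grow_buffer(char* data, std::size_t size, std::size_t& cap) {
        std::size_t new_cap = cap + cap / 2 + 1;   // grow by ~50%
        char* fresh = static_cast<char*>(std::malloc(new_cap));
        if (!fresh) return nullptr;
        std::memcpy(fresh, data, size);            // the copy realloc might avoid
        std::free(data);
        cap = new_cap;
        return fresh;
    }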

Edit: The one (rare but not unheard-of) case where realloc is actually useful is for giant blocks on systems with virtual memory, where the C library interacts with the kernel to relocate whole pages to new addresses. The reason I say this is rare is because you need to be dealing with very big blocks (at least several hundred kB) before most implementations will even enter the realm of dealing with page-granularity allocation, and probably much larger (several MB maybe) before entering and exiting kernelspace to rearrange virtual memory is cheaper than simply doing the copy. Of course try_realloc would not be useful here, since the whole benefit comes from actually doing the move inexpensively.

Paid answered 3/11, 2010 at 18:39 Comment(1)
I'd think that once an allocation grows to megabytes, the absolute overhead of the exponential allocation strategy may start to become an issue, so having the optimization only in that size range would still be good. Another case where realloc is good is when there's a loop which reallocates just one memory block, which should eventually end up in a heap location where it can be grown without copying. And the absolute cost of a copy also increases with block size, so again an optimization applying only to big blocks is still good.Erasme
