Compelling examples of custom C++ allocators?

17

215

What are some really good reasons to ditch std::allocator in favor of a custom solution? Have you run across any situations where it was absolutely necessary for correctness, performance, scalability, etc? Any really clever examples?

Custom allocators have always been a feature of the Standard Library that I haven't had much need for. I was just wondering if anyone here on SO could provide some compelling examples to justify their existence.

Kerns answered 5/5, 2009 at 19:37 Comment(2)
boost::pool and boost::interprocess.Ionia
If you're super crafty, you could theoretically use RAM on a remote machine via allocators.Ionia
140

As I mention here, I've seen Intel TBB's custom STL allocator significantly improve performance of a multithreaded app simply by changing a single

std::vector<T>

to

std::vector<T,tbb::scalable_allocator<T> >

(this is a quick and convenient way of switching the allocator to use TBB's nifty thread-private heaps; see page 59 in this document)

Duero answered 5/5, 2009 at 19:37 Comment(4)
Thanks for that second link. The use of allocators to implement thread-private heaps is clever. I like that this is a good example of where custom allocators have a clear advantage in a scenario that is not resource-limited (embedded or console).Kerns
The original link is now defunct, but CiteSeer has the PDF: citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.71.8289Murillo
I have to ask: Can you reliably move such a vector into another thread? (I'm guessing no)Darton
@sellibitze: Since the vectors were being manipulated from within TBB tasks and reused across multiple parallel operations, and there is no guarantee of which TBB worker thread will pick up a task, I conclude it works just fine. Note, though, that there have been some historic issues with TBB freeing memory created on one thread from another thread (apparently a classic problem with thread-private heaps and producer-consumer patterns of allocation and deallocation). TBB claims its allocator avoids these issues, but I have seen otherwise. Maybe it is fixed in newer versions.Duero
95

One area where custom allocators can be useful is game development, especially on game consoles, as they have only a small amount of memory and no swap. On such systems you want to make sure that you have tight control over each subsystem, so that one uncritical system can't steal the memory from a critical one. Other things like pool allocators can help to reduce memory fragmentation. You can find a long, detailed paper on the topic at:

EASTL -- Electronic Arts Standard Template Library
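The pool idea mentioned above can be sketched as a minimal STL-compatible fixed-block pool allocator. This is only a sketch (slot count and fallback policy are illustrative, and real game allocators are far more elaborate): every node a container allocates comes from one preallocated array of equal-sized slots, so allocation churn cannot fragment the general heap.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>

// Sketch: fixed-block pool. Single-object allocations are served from a
// preallocated slot array via a free list; array allocations and pool
// overflow fall back to the general heap.
template <typename T, std::size_t Slots = 256>
class pool_allocator {
    union Slot { Slot* next; alignas(T) unsigned char storage[sizeof(T)]; };
    struct Pool {
        Slot slots[Slots];
        Slot* head;
        Pool() {  // thread every slot onto the free list
            for (std::size_t i = 0; i + 1 < Slots; ++i) slots[i].next = &slots[i + 1];
            slots[Slots - 1].next = nullptr;
            head = &slots[0];
        }
    };
    static Pool& pool() { static Pool p; return p; }  // one pool per (T, Slots)
public:
    using value_type = T;
    template <class U> struct rebind { using other = pool_allocator<U, Slots>; };

    pool_allocator() noexcept = default;
    template <class U> pool_allocator(const pool_allocator<U, Slots>&) noexcept {}

    T* allocate(std::size_t n) {
        Pool& p = pool();
        if (n == 1 && p.head) {            // fast path: pop a free slot
            Slot* s = p.head;
            p.head = s->next;
            return reinterpret_cast<T*>(s->storage);
        }
        return static_cast<T*>(::operator new(n * sizeof(T)));  // fallback
    }
    void deallocate(T* q, std::size_t) noexcept {
        Pool& p = pool();
        Slot* s = reinterpret_cast<Slot*>(q);
        if (s >= p.slots && s < p.slots + Slots) {  // came from the pool
            s->next = p.head;
            p.head = s;
        } else {
            ::operator delete(q);
        }
    }
};
template <class T, class U, std::size_t N>
bool operator==(const pool_allocator<T, N>&, const pool_allocator<U, N>&) { return true; }
template <class T, class U, std::size_t N>
bool operator!=(const pool_allocator<T, N>&, const pool_allocator<U, N>&) { return false; }
```

A node-based container such as `std::list<int, pool_allocator<int>>` then does all of its node allocation inside the pool. The explicit `rebind` member is needed here because the non-type `Slots` parameter defeats `allocator_traits`' default rebinding.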

Supper answered 5/5, 2009 at 19:37 Comment(1)
+1 for EASTL link: "Among game developers the most fundamental weakness [of the STL] is the std allocator design, and it is this weakness that was the largest contributing factor to the creation of EASTL."Kerns
82

I am working on an mmap-allocator that allows vectors to use memory from a memory-mapped file. The goal is to have vectors whose storage lives directly in the virtual memory mapped by mmap. The problem we are solving is reading really large files (>10GB) into memory with no copy overhead, which is why I need this custom allocator.

So far I have the skeleton of a custom allocator (which derives from std::allocator); I think it is a good starting point for writing your own allocators. Feel free to use this piece of code in whatever way you want:

#include <memory>
#include <stdio.h>

namespace mmap_allocator_namespace
{
        // See the comments on this answer for important caveats about
        // inheriting from std::allocator before replicating this code.
        template <typename T>
        class mmap_allocator: public std::allocator<T>
        {
        public:
                typedef size_t size_type;
                typedef T* pointer;
                typedef const T* const_pointer;

                template<typename _Tp1>
                struct rebind
                {
                        typedef mmap_allocator<_Tp1> other;
                };

                pointer allocate(size_type n, const void *hint=0)
                {
                        fprintf(stderr, "Alloc %zu bytes.\n", n*sizeof(T));
                        return std::allocator<T>::allocate(n, hint);
                }

                void deallocate(pointer p, size_type n)
                {
                        fprintf(stderr, "Dealloc %zu bytes (%p).\n", n*sizeof(T), (void *)p);
                        return std::allocator<T>::deallocate(p, n);
                }

                mmap_allocator() throw(): std::allocator<T>() { fprintf(stderr, "Hello allocator!\n"); }
                mmap_allocator(const mmap_allocator &a) throw(): std::allocator<T>(a) { }
                template <class U>
                mmap_allocator(const mmap_allocator<U> &a) throw(): std::allocator<T>(a) { }
                ~mmap_allocator() throw() { }
        };
}

To use this, declare an STL container as follows:

using namespace std;
using namespace mmap_allocator_namespace;

vector<int, mmap_allocator<int> > int_vec(1024, 0, mmap_allocator<int>());

It can be used, for example, to log whenever memory is allocated. The rebind struct is necessary; without it the vector container uses the superclass's allocate/deallocate methods.

Update: The memory mapping allocator is now available at https://github.com/johannesthoma/mmap_allocator and is LGPL. Feel free to use it for your projects.

Moersch answered 5/5, 2009 at 19:37 Comment(2)
Just a heads up: deriving from std::allocator is not really the idiomatic way to write allocators. You should instead look at allocator_traits, which allows you to supply the bare minimum of functionality, and the traits class will provide the rest. Note that the STL always uses your allocator through allocator_traits, not directly, so you don't need to refer to allocator_traits yourself. There's not much incentive to derive from std::allocator (although this code may be a helpful starting point regardless).Discretional
@Nir good link on the topic: learn.microsoft.com/en-us/cpp/standard-library/… ... note: "Warning! At compile time, the C++ Standard Library uses the allocator_traits class to detect which members you have explicitly provided and provides a default implementation for any members that are not present. Do not interfere with this mechanism by providing a specialization of allocator_traits for your allocator!"Grover
31

I'm working with a MySQL storage engine that uses C++ for its code. We're using a custom allocator to use the MySQL memory system rather than competing with MySQL for memory. It lets us make sure we're using memory the way the user configured MySQL to use it, and not "extra".

Stannwood answered 5/5, 2009 at 19:37 Comment(0)
27

It can be useful to use custom allocators to use a memory pool instead of the heap. That's one example among many others.

For most cases, this is certainly a premature optimization. But it can be very useful in certain contexts (embedded devices, games, etc).

Etna answered 5/5, 2009 at 19:37 Comment(1)
Or, when that memory pool is shared.Sophistry
16

When working with GPUs or other co-processors it is sometimes beneficial to allocate data structures in main memory in a special way. This special way of allocating memory can be implemented in a custom allocator in a convenient fashion.

The reason custom allocation through the accelerator runtime can be beneficial when using accelerators is the following:

  1. through custom allocation, the accelerator runtime or driver is notified of the memory block
  2. in addition, the operating system can make sure that the allocated block of memory is page-locked (some call this pinned memory), that is, the virtual memory subsystem of the operating system may not move or remove the page within or from memory
  3. if 1. and 2. hold, and a data transfer between a page-locked memory block and an accelerator is requested, the runtime can directly access the data in main memory since it knows where it is and it can be sure the operating system did not move or remove it
  4. this saves one memory copy that would occur with memory allocated in a non-page-locked way: the data would first have to be copied in main memory to a page-locked staging area, from which the accelerator can initiate the data transfer (through DMA)
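
In a real accelerator stack the pinning goes through the runtime's own API (e.g. CUDA's cudaHostAlloc). As a portable, hedged stand-in, the shape of such an allocator can be sketched with POSIX mlock(), which asks the OS to page-lock the region; failure (e.g. due to RLIMIT_MEMLOCK) is treated as non-fatal here:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>
#include <sys/mman.h>   // mlock / munlock (POSIX)

// Sketch: allocator that asks the OS to page-lock ("pin") its memory so the
// virtual memory subsystem won't move or swap it out. An accelerator runtime
// would use its own allocation API instead; mlock() is a portable stand-in,
// used best-effort (its failure is ignored).
template <typename T>
struct pinned_allocator {
    using value_type = T;

    pinned_allocator() noexcept = default;
    template <class U> pinned_allocator(const pinned_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = ::operator new(n * sizeof(T));
        (void)mlock(p, n * sizeof(T));      // pin the pages, best effort
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        (void)munlock(p, n * sizeof(T));
        ::operator delete(p);
    }
};
template <class T, class U>
bool operator==(const pinned_allocator<T>&, const pinned_allocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const pinned_allocator<T>&, const pinned_allocator<U>&) { return false; }
```

A container declared as `std::vector<double, pinned_allocator<double>>` then keeps its backing storage in page-locked memory for as long as it lives.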
Persecute answered 5/5, 2009 at 19:37 Comment(1)
...not to forget page aligned memory blocks. This is especially useful if you're talking to a driver (i.e. with FPGAs via DMA) and don't want the hassle and overhead of calculating in-page offsets for your DMA scatterlists.Canebrake
11

I haven't written C++ code with a custom STL allocator, but I can imagine a webserver written in C++ that uses a custom allocator for automatic deletion of temporary data needed for responding to an HTTP request. The custom allocator can free all temporary data in one go once the response has been generated.
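
The core of that idea, stripped of the STL allocator interface, is a request-scoped "arena": allocation is just a pointer bump into a fixed buffer, and everything is released at once when the arena is destroyed or reset. (C++17 ships this exact pattern as std::pmr::monotonic_buffer_resource.) A minimal sketch, with illustrative names and sizes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a request-scoped arena: allocation bumps a pointer into a fixed
// buffer; there is no per-object free. The whole arena is reclaimed when it
// is destroyed, or explicitly reset between requests.
class request_arena {
    std::vector<char> buf_;
    std::size_t used_ = 0;
public:
    explicit request_arena(std::size_t bytes) : buf_(bytes) {}

    void* allocate(std::size_t n, std::size_t align = alignof(std::max_align_t)) {
        std::size_t p = (used_ + align - 1) / align * align;  // align up
        if (p + n > buf_.size()) return nullptr;              // arena exhausted
        used_ = p + n;
        return buf_.data() + p;
    }
    void reset() { used_ = 0; }   // "free" every allocation at once
};
```

Wrapping this in an STL allocator gives containers whose memory all disappears together when the request finishes, with allocation reduced to an addition and a comparison.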

Another possible use case for a custom allocator (which I have used) is writing a unit test to prove that a function's behavior doesn't depend on some part of its input. The custom allocator can fill the memory region with any pattern.
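
A sketch of that second idea: an allocator that fills every fresh block with a fixed byte pattern, so code that reads memory it never wrote sees predictable (and assertable) garbage instead of whatever the heap left behind. The 0xCD value here is an arbitrary choice (it happens to be what MSVC's debug heap uses for fresh allocations):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Test-support allocator: every block starts life filled with `Pattern`,
// making reads of uninitialized memory deterministic and detectable.
template <typename T, unsigned char Pattern = 0xCD>
struct fill_allocator {
    using value_type = T;
    template <class U> struct rebind { using other = fill_allocator<U, Pattern>; };

    fill_allocator() noexcept = default;
    template <class U> fill_allocator(const fill_allocator<U, Pattern>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = ::operator new(n * sizeof(T));
        std::memset(p, Pattern, n * sizeof(T));   // stamp the pattern
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};
template <class T, class U, unsigned char P>
bool operator==(const fill_allocator<T, P>&, const fill_allocator<U, P>&) { return true; }
template <class T, class U, unsigned char P>
bool operator!=(const fill_allocator<T, P>&, const fill_allocator<U, P>&) { return false; }
```

Running the same test twice with two different patterns and comparing results is a cheap way to demonstrate that the function under test never reads the unwritten part of its input.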

Luxury answered 5/5, 2009 at 19:37 Comment(3)
It seems like the first example is the job of the destructor, not the allocator.Extravehicular
If you're worried about your program depending on the initial contents of memory from the heap, a quick (i.e. overnight!) run in valgrind will let you know one way or the other.Collimate
@anthropomorphic: The destructor and the custom allocator would work together, the destructor would run first, then the delete of the custom allocator, which won't call free(...) yet, but free(...) would be called later, when serving the request has finished. This can be faster than the default allocator and reduce address space fragmentation.Luxury
9

I'm using custom allocators here; you might even say it was to work around other custom dynamic memory management.

Background: we have overloads for malloc, calloc, free, and the various variants of operator new and delete, and the linker happily makes STL use these for us. This lets us do things like automatic small object pooling, leak detection, alloc fill, free fill, padding allocation with sentries, cache-line alignment for certain allocs, and delayed free.

The problem is, we're running in an embedded environment -- there isn't enough memory around to actually do leak detection accounting properly over an extended period. At least, not in the standard RAM -- there's another heap of RAM available elsewhere, through custom allocation functions.

Solution: write a custom allocator that uses the extended heap, and use it only in the internals of the memory leak tracking architecture... Everything else defaults to the normal new/delete overloads that do leak tracking. This avoids the tracker tracking itself (and provides a bit of extra packing functionality too, since we know the size of tracker nodes).

We also use this to keep function cost profiling data, for the same reason; writing an entry for each function call and return, as well as thread switches, can get expensive fast. Custom allocator again gives us smaller allocs in a larger debug memory area.
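
The overload mechanism this answer relies on can be sketched as a global operator new/delete replacement: once defined in the program, every `new` expression routes through your bookkeeping, which is the hook for leak detection, fills, sentries, and the rest. A minimal, hedged sketch (a single counter; a real version would use atomics and record call sites):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Global replacement: all `new`/`delete` in the program pass through here.
// Not thread-safe as written; illustrative only.
static std::size_t g_live_allocs = 0;

void* operator new(std::size_t size) {
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    ++g_live_allocs;                       // bookkeeping hook
    return p;
}
void operator delete(void* p) noexcept {
    if (p) { --g_live_allocs; std::free(p); }
}
void operator delete(void* p, std::size_t) noexcept {  // sized form
    operator delete(p);
}
```

A nonzero `g_live_allocs` at shutdown is the crudest possible leak report; the answer's approach layers the real tracking data structures on top, allocated from a separate heap via the custom allocator so the tracker never counts itself.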

Supersensual answered 5/5, 2009 at 19:37 Comment(0)
7

A custom allocator is a reasonable way to securely erase memory before it is deallocated.

template <class T>
class allocator
{
public:
    using value_type    = T;

    allocator() noexcept {}
    template <class U> allocator(allocator<U> const&) noexcept {}

    value_type*  // Use pointer if pointer is not a value_type*
    allocate(std::size_t n)
    {
        return static_cast<value_type*>(::operator new (n*sizeof(value_type)));
    }

    void
    deallocate(value_type* p, std::size_t n) noexcept  // Use pointer if pointer is not a value_type*
    {
        OPENSSL_cleanse(p, n*sizeof(value_type));  // scrub before releasing
        ::operator delete(p);
    }
};
template <class T, class U>
bool
operator==(allocator<T> const&, allocator<U> const&) noexcept
{
    return true;
}
template <class T, class U>
bool
operator!=(allocator<T> const& x, allocator<U> const& y) noexcept
{
    return !(x == y);
}

I recommend starting from Howard Hinnant's allocator boilerplate: https://howardhinnant.github.io/allocator_boilerplate.html

Lynnell answered 5/5, 2009 at 19:37 Comment(0)
7

I am using a custom allocator for counting the number of allocations/deallocations in one part of my program and measuring how long it takes. There are other ways this could be achieved but this method is very convenient for me. It is especially useful that I can use the custom allocator for only a subset of my containers.
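
A minimal shape such an instrumentation allocator might take (a sketch with hypothetical names): it forwards the real work to operator new/delete and just counts calls and bytes into caller-owned counters, so only the containers you opt in are measured.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Caller-owned counters shared by every rebound copy of the allocator.
struct alloc_stats {
    std::size_t allocations = 0;
    std::size_t deallocations = 0;
    std::size_t bytes = 0;
};

// Sketch of an instrumentation allocator: forwards to operator new/delete,
// recording each call. Timing could be added the same way.
template <typename T>
struct counting_allocator {
    using value_type = T;
    alloc_stats* stats;

    explicit counting_allocator(alloc_stats* s) noexcept : stats(s) {}
    template <class U>
    counting_allocator(const counting_allocator<U>& o) noexcept : stats(o.stats) {}

    T* allocate(std::size_t n) {
        ++stats->allocations;
        stats->bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ++stats->deallocations;
        ::operator delete(p);
    }
};
template <class T, class U>
bool operator==(const counting_allocator<T>& a, const counting_allocator<U>& b)
{ return a.stats == b.stats; }
template <class T, class U>
bool operator!=(const counting_allocator<T>& a, const counting_allocator<U>& b)
{ return !(a == b); }
```

Usage is just `std::vector<int, counting_allocator<int>> v{counting_allocator<int>(&stats)};`; after the container is destroyed, `stats.allocations == stats.deallocations` is also a cheap leak check for that one container.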

Phosphorism answered 5/5, 2009 at 19:37 Comment(0)
6

One essential situation: When writing code that must work across module (EXE/DLL) boundaries, it is essential to keep your allocations and deletions happening in only one module.

Where I ran into this was a plugin architecture on Windows. It is essential that, for example, if you pass a std::string across the DLL boundary, any reallocations of the string occur from the heap where it originated, NOT the heap in the DLL, which may be different*.

*It's more complicated than this actually, as if you are dynamically linking to the CRT this might work anyways. But if each DLL has a static link to the CRT you are heading to a world of pain, where phantom allocation errors continually occur.

Egyptology answered 5/5, 2009 at 19:37 Comment(2)
If you pass objects across DLL boundaries you should use Multi-threaded (Debug) DLL (/MD(d)) setting for both sides. C++ wasn't designed with module support in mind. Alternatively you could shield everything behind COM interfaces and use CoTaskMemAlloc. This is the best way to use plugin interfaces which are not bound to a specific compiler, STL or vendor.Kopeck
The old guys rule for that is: Don't do it. Don't use STL types in DLL API. And don't pass dynamic memory free responsibility across DLL API boundaries. There is no C++ ABI - so if you treat every DLL as a C API, you avoid a whole class of potential problems. At the expense of "c++ beauty", of course. Or as the other comment suggests: Use COM. Just plain C++ is a bad idea.Transducer
5

Obligatory link to Andrei Alexandrescu's CppCon 2015 talk on allocators:

https://www.youtube.com/watch?v=LIb3L4vKZ7U

The nice thing is that just devising them makes you think of ideas of how you would use them :-)

Muire answered 5/5, 2009 at 19:37 Comment(1)
Very good presentation from him. I'm hoping that his ideas will be implemented in the C++ standard libraries some day. I'm relatively new to writing allocators, but it seems like he has a lot of very good points about scalable architecture and efficiency, which is not only relevant to game engine programmers.Treacherous
5

One example of a time I have used these was working with very resource-constrained embedded systems. Let's say you have 2k of RAM free and your program has to use some of that memory. You need to store, say, 4-5 sequences somewhere that's not on the stack, and additionally you need very precise control over where these things get stored. This is a situation where you might want to write your own allocator. The default implementations can fragment the memory; this can be unacceptable if you don't have enough memory and cannot restart your program.

One project I was working on was using AVR-GCC on some low-powered chips. We had to store 8 sequences of variable length but with a known maximum. The standard library implementation of memory management is a thin wrapper around malloc/free which keeps track of where to place items by prepending every allocated block of memory with a pointer to just past the end of that allocated piece of memory. When allocating a new piece of memory, the standard allocator has to walk over each of the pieces of memory to find the next available block where the requested size will fit. On a desktop platform this would be very fast for so few items, but you have to keep in mind that some of these microcontrollers are very slow and primitive in comparison. Additionally, the memory fragmentation issue was a massive problem that meant we really had no choice but to take a different approach.

So what we did was implement our own memory pool. Each block of memory was big enough to fit the largest sequence we would need. This allocated fixed-sized blocks of memory ahead of time and marked which blocks were currently in use. We did this by keeping one 8-bit integer where each bit represented whether a certain block was used. We traded off memory usage here to make the whole process faster, which in our case was justified as we were pushing this microcontroller chip close to its maximum processing capacity.

There are a number of other situations where I can see writing your own custom allocator making sense in the context of embedded systems, for example if the memory for the sequence isn't in main RAM, as might frequently be the case on these platforms.
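
The scheme described above can be sketched in a few lines: 8 fixed-size blocks, one 8-bit bitmap with a bit per block, and allocation is just finding a clear bit. Block size and count here are illustrative, not the project's actual values:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of the bitmap pool described above. Because all blocks are the
// same size, fragmentation is impossible: any free block fits any request.
constexpr std::size_t BLOCK_SIZE = 32;   // max sequence length (illustrative)
constexpr std::size_t NUM_BLOCKS = 8;

static std::uint8_t storage[NUM_BLOCKS][BLOCK_SIZE];
static std::uint8_t used_bitmap = 0;     // bit i set => block i in use

void* pool_alloc() {
    for (std::size_t i = 0; i < NUM_BLOCKS; ++i) {
        if (!(used_bitmap & (1u << i))) {   // found a free block
            used_bitmap |= (1u << i);
            return storage[i];
        }
    }
    return nullptr;                          // pool exhausted
}

void pool_free(void* p) {
    std::size_t i =
        (static_cast<std::uint8_t*>(p) - &storage[0][0]) / BLOCK_SIZE;
    used_bitmap &= static_cast<std::uint8_t>(~(1u << i));
}
```

Both operations are a handful of cycles with no pointer-chasing, which is what matters on a slow microcontroller, and the bookkeeping overhead is exactly one byte.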

Presbyterial answered 5/5, 2009 at 19:37 Comment(0)
3

Some time ago I found this solution very useful to me: Fast C++11 allocator for STL containers. It noticeably speeds up STL containers on VS2017 (~5x) as well as on GCC (~7x). It is a special-purpose allocator based on a memory pool. It can be used with STL containers only thanks to the mechanism you are asking about.

Millstream answered 5/5, 2009 at 19:37 Comment(0)
3

In a graphics simulation, I've seen custom allocators used for

  1. Alignment constraints that std::allocator didn't directly support.
  2. Minimizing fragmentation by using separate pools for short-lived (just this frame) and long-lived allocations.
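
For the alignment case, an allocator that enforces, say, 64-byte (cache-line/SIMD-friendly) alignment is short. This sketch uses POSIX posix_memalign so it works pre-C++17; with C++17 you could instead call `::operator new(n, std::align_val_t(Align))`. The 64-byte default is an illustrative choice:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Sketch: alignment-enforcing allocator. Align must be a power of two and
// a multiple of sizeof(void*) (posix_memalign's requirement).
template <typename T, std::size_t Align = 64>
struct aligned_allocator {
    static_assert(Align >= alignof(T), "Align must satisfy T's alignment");
    using value_type = T;
    template <class U> struct rebind { using other = aligned_allocator<U, Align>; };

    aligned_allocator() noexcept = default;
    template <class U> aligned_allocator(const aligned_allocator<U, Align>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = nullptr;
        if (posix_memalign(&p, Align, n * sizeof(T)) != 0) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};
template <class T, class U, std::size_t A>
bool operator==(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) { return true; }
template <class T, class U, std::size_t A>
bool operator!=(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) { return false; }
```

`std::vector<float, aligned_allocator<float>>` then guarantees its data pointer is 64-byte aligned, which SIMD loads and cache-line-conscious code can rely on.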
Corina answered 5/5, 2009 at 19:37 Comment(0)
3

For shared memory it is vital that not only the container head, but also the data it contains are stored in shared memory.

The allocator of Boost::Interprocess is a good example. However, as you can read here, this alone does not suffice to make all STL containers shared-memory compatible (due to different mapping offsets in different processes, pointers might "break").
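
The "pointers break" problem arises because each process may map the shared segment at a different base address, so a raw pointer stored inside the segment is only valid in the process that wrote it. Boost.Interprocess solves this with offset_ptr, which stores a self-relative distance instead of an absolute address. A minimal illustration of the idea (a sketch, not Boost's actual implementation; this version reserves offset 0 for null):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Sketch of a self-relative pointer: it stores the distance from the
// pointer object itself to the pointee. That distance is the same in every
// mapping of the segment, no matter the base address.
template <typename T>
class offset_ptr {
    std::ptrdiff_t offset_ = 0;   // 0 is reserved for null in this sketch
public:
    void set(T* p) {
        offset_ = p ? reinterpret_cast<const char*>(p) -
                      reinterpret_cast<const char*>(this)
                    : 0;
    }
    T* get() const {
        return offset_ == 0
            ? nullptr
            : reinterpret_cast<T*>(
                  const_cast<char*>(reinterpret_cast<const char*>(this)) + offset_);
    }
    T& operator*() const { return *get(); }
};

// A record as it might live inside a shared segment (illustrative):
struct Record {
    int value;
    offset_ptr<int> ptr;
};
```

Copying the raw bytes of a Record to a different address (standing in for the same segment mapped at a different base in another process) leaves the offset_ptr valid, which is exactly what a raw pointer cannot offer.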

Climate answered 5/5, 2009 at 19:37 Comment(0)
3

I personally use Loki::Allocator / SmallObject to optimize memory usage for small objects. It shows good efficiency and satisfying performance if you have to work with moderate amounts of really small objects (1 to 256 bytes), and can be up to ~30 times more efficient than standard C++ new/delete allocation when allocating moderate amounts of small objects of many different sizes. There is also a VC-specific solution called "QuickHeap" that brings the best possible performance: allocate and deallocate operations just read and write the address of the block being allocated or returned to the heap, respectively, in up to 99.(9)% of cases, depending on settings and initialization. This comes at the cost of notable overhead, though: it needs two pointers per extent and one extra pointer for each new memory block. It is the fastest possible solution for working with huge (10,000++) amounts of objects being created and deleted, provided you don't need a big variety of object sizes. It creates an individual pool for each object size, from 1 to 1023 bytes in the current implementation, so initialization costs may diminish the overall performance boost, but one can go ahead and allocate/deallocate some dummy objects before the application enters its performance-critical phase(s).

The issue with the standard C++ new/delete implementation is that it is usually just a wrapper for C malloc/free allocation, which works well for larger blocks of memory, like 1024+ bytes. It has notable overhead in terms of performance and, sometimes, extra memory used for mapping too. So, in most cases custom allocators are implemented in a way that maximizes performance and/or minimizes the amount of extra memory needed for allocating small (≤1024 bytes) objects.

Synonym answered 5/5, 2009 at 19:37 Comment(0)
