Custom heap pre-allocation for entity-component system

Asked 22/7, 2013 at 14:12 Answered 9/8, 2013 at 19:24

Solved c++c++11 memory memory-management entity-component-system

I have an OOP entity-component system that currently works like this:

// In the component system
struct Component { virtual void update() = 0; }
struct Entity
{
    bool alive{true};
    vector<unique_ptr<Component>> components;
    void update() { for(const auto& c : components) c->update(); }
}

// In the user application
struct MyComp : Component
{
    void update() override { ... }
}

To create new entities and components, I use C++'s usual new and delete:

// In the component system
struct Manager
{
    vector<unique_ptr<Entity>> entities;
    Entity& createEntity() 
    { 
        auto result(new Entity);
        entities.emplace_back(result);
        return *result;
    }
    template<typename TComp, typename... TArgs>
        TComp& createComponent(Entity& mEntity, TArgs... mArgs)
    {
        auto result(new TComp(forward<TArgs>(mArgs)...));
        mEntity.components.emplace_back(result);
        return result;
    }
    void removeDead() { /* remove all entities with 'alive == false' - 'delete' is called here by the 'unique_ptr' */ }
}

// In the user application
{
    Manager m;
    auto& myEntity(m.createEntity());
    auto& myComp(m.createComponent<MyComp>(myEntity));
    // Do stuff with myEntity and myComp
    m.removeDead();
}

The system works fine, and I like the syntax and flexibility. However, when continuously adding and removing entities and components to the manager, memory allocation/deallocation slows down the application. (I've profiled and determined that the slow down is caused by new and delete).

I've recently read that it's possible to pre-allocate heap memory in C++ - how can that be applied to my situation?

Desired result:

// In the user application
{
    Manager m{1000}; 
    // This manager can hold about 1000 entities with components 
    // (may not be 1000 because of dynamic component size, 
    // since the user can define it's on components, but it's ok for me)

    auto& myEntity(m.createEntity());
    auto& myComp(m.createComponent<MyComp>(myEntity));
    // Do stuff with myEntity and myComp

    m.removeDead(); 
    // No 'delete' is called here! Memory of the 'dead' entities can
    // be reused for new entity creation
}
// Manager goes out of scope: 'delete' is called here

Phillida answered 22/7, 2013 at 14:12 Comment(6)

I think you can get pre-made object pools and the like, perhaps there is one you can use in Boost library. I suggest not implementing your own, as it is tricky not to make bugs. – Foreignism 22/7, 2013 at 14:42

@NeilKirk, I'd like to not introduce a Boost dependency in my project, if possible – Phillida 22/7, 2013 at 14:52

Maybe try something silly: Just before you run your hot code, allocate a large amount of dynamic memory, make sure to write to every page at least once, and free it. Then run the profiled part of your code. That isn't generally a sensible thing to do, but I'd be curious if it makes a difference. – Streeter 25/7, 2013 at 8:18

createEntity is not exception safe by the way, and can leak memory – Dagney 29/7, 2013 at 22:21

You may like to post your complete benchmark here, so that people can take it and see how drastically their versions speed things up. – Impiety 31/7, 2013 at 14:54

Implementing a global operator new() that calls through to malloc() may already give you significant speedups. (You'll obviously also need to implement operator delete() to call through to free() for obvious reasons.) The reason is, that the implementation of operator new() is generally not the same as that for malloc(), even though they both serve the same purpose. And I for one have made the experience that the implementation of operator new() was a lot slower than that of malloc() on my system. – Impropriate 11/4, 2018 at 14:49

Using most of answers and Google as references, I implemented some pre-allocation utilities in my SSVUtils library.

Prealloc.h

Example:

using MemUnit = char;
using MemUnitPtr = MemUnit*;
using MemSize = decltype(sizeof(MemUnit)); // Should always be 1 byte

class MemBuffer
{
    Uptr<MemUnit[]> buffer;
    MemRange range;

    MemBuffer(MemSize mSize) : ... 
    { 
        // initialize buffer from mSize
    }
};

class PreAllocatorChunk
{
    protected:
        MemSize chunkSize;
        MemBuffer buffer;
        std::stack<MemRange> available;

    public:
        PreAllocatorChunk(MemSize mChunkSize, unsigned int mChunks) : ...
        {
            // Add "chunks" to to available...
        }

        template<typename T, typename... TArgs> T* create(TArgs&&... mArgs)
        {
            // create on first "chunk" using placement new
            auto toUse(available.top().begin); available.pop();
            return new (toUse) T{std::forward<TArgs>(mArgs)...};
        }
};

More pre-allocation utilities are available:

PreAllocatorDynamic: pre-allocates a big buffer, then, when creating an object, splits the buffer in two parts:
- [buffer start, buffer start + obj size)
- [buffer start + obj size, buffer end)
When an object is destroyed, its occupied memory range is set as "available". If during creation of a new object no big enough "chunk" is found, the pre-allocator tries to unify contiguous memory chunks before throwing a runtime exception. This pre-allocator is sometimes faster than new/delete, but it greatly depends on the size of pre-allocated buffer.
PreAllocatorStatic<T>: inherited from PreAllocatorChunk. Size of a chunk is equal to sizeof(T). Fastest pre-allocator, less flexible. Almost always faster than new/delete.

Phillida answered 9/8, 2013 at 19:24 Comment(0)

There are a few things you can do to make the implementation of your design scale better.

In your current implementation there are two memory allocations per Entity and Component. The first one allocates an object and the second one when the object is put into the vector. The second one happens when the vector runs out of space and allocates a bigger array and moves the old elements into the new array.

In this case the best you can do is to use intrusive lists. That is, each of Entity and Component become also list nodes. Then, after these have been allocated no extra memory allocations are necessary to put the object into a list. Use a single or double-linked list from Boost.Intrusive, or write your own. This is how Linux kernel keeps track of many different objects.

The next step is to preallocate Entity and Component elements. Preallocating could be something as simple as a global array of these, or something more sophisticated, such as Boost.Pool. There are quite a few ways to build a memory pool of objects.

Once Entity and Component are preallocated and intrusive lists are used you are done.

An example which uses boost components:

#include <boost/intrusive/list.hpp>
#include <boost/pool/pool_alloc.hpp>
#include <new>

namespace bi = boost::intrusive;

// api.h

//
// Object pooling support begin.
//
template<class T>
struct Pool
{
    static boost::pool_allocator<T> pool;
};

// Singleton. Although it is defined in the header, the linkers
// make sure there is only one instance of it in the application.
// It is instantiated on demand when Pool<T> is used.
template<class T>
boost::pool_allocator<T> Pool<T>::pool;

template<class Derived>
struct Pooled // use it on the most derived class only, not on intermediate base classes
{
    // Automatically use the object pool for plain new/delete.
    static void* operator new(size_t) { return Pool<Derived>::pool.allocate(1); }
    static void operator delete(void* p) { return Pool<Derived>::pool.deallocate(static_cast<Derived*>(p), 1); }
};
//
// Object pooling support end.
//

// Using bi::list_base_hook<bi::link_mode<bi::auto_unlink> > because it automatically
// unlinks from the list when the object is destroyed. No need to manually
// remove the object from the list when an object is about to be destroyed.

struct Component
    : bi::list_base_hook<bi::link_mode<bi::auto_unlink> > // make it an intrusive list node
{
    virtual void update() = 0;
    virtual ~Component() {}
};

struct Entity
    : bi::list_base_hook<bi::link_mode<bi::auto_unlink> > // make it an intrusive list node
    , Pooled<Entity> // optional, make it allocated from the pool
{
    bool active = false;

    bi::list<Component, bi::constant_time_size<false> > components;

    ~Entity() {
        for(auto i = components.begin(), j = components.end(); i != j;)
            delete &*i++; // i++ to make sure i stays valid after the object is destroyed
    }

    void update() {
        for(auto& c : components)
            c.update();
    }
};

struct Manager
{
    bi::list<Entity, bi::constant_time_size<false> > entities;

    ~Manager() {
        for(auto i = entities.begin(), j = entities.end(); i != j;)
            delete &*i++; // i++ to make sure i stays valid after the object is destroyed
    }

    Entity& createEntity() {
        auto result = new Entity;
        entities.push_back(*result);
        return *result;
    }

    template<typename TComp, typename... TArgs>
    TComp& createComponent(Entity& mEntity, TArgs... mArgs)
    {
        auto result = new TComp(std::forward<TArgs>(mArgs)...);
        mEntity.components.push_back(*result);
        return *result;
    }

    void removeDead() {
        for(auto i = entities.begin(), j = entities.end(); i != j;) {
            auto& entity = *i++;
            if(!entity.active)
                delete &entity;
        }
    }
};

// user.cc
struct MyComp
    : Component
    , Pooled<MyComp> // optional, make it allocated from the pool
{
    void update() override {}
};

int main() {
    Manager m;
    auto& myEntity(m.createEntity());
    auto& myComp(m.createComponent<MyComp>(myEntity));
    m.removeDead();
}

In the above example boost::pool_allocator<T> actually uses new to allocate objects and then it keeps reusing destroyed objects rather than invoking delete on them. You can do better by preallocating all objects, but there are many ways to do so depending on your requirements, so that I use boost::pool_allocator<T> for simplicity to avoid hair splitting here. You can change the implementation of Pooled<T> to something like Pooled<T, N> where N stands for the maximum number of objects, the rest of the code stays the same because it uses plain new/delete which happen to be overridden for objects allocated from a pool.

Impiety answered 24/7, 2013 at 15:21 Comment(3)

Does this solution work for components? Component is a class that gets derived by the user and has variable size. How am I supposed to pre-allocate space for objects whose size is unknown? – Phillida 31/7, 2013 at 13:46

@VittorioRomeo In this case the user has to preallocate its objects of derived classes. For example, using a memory pool per class or per object size. – Impiety 31/7, 2013 at 14:34

@VittorioRomeo added an example for you, let me know if you need more comments. – Impiety 31/7, 2013 at 20:21

C++ supports class-specific memory pools for this kind of thing. The general-purpose new/delete pair inevitably trades off among

Time spent searching for a free block of the right size to meet each request
Time spent coalescing free blocks
Time spent maintaining and perhaps reorganizing internal data structures to make the above two operations faster.

The primary way to gain speed is to avoid these tradeoffs entirely with custom allocators that - as you say - pre-allocate a big chunk of memory viewed as a simple array of free objects all of the same size. Initially these are all linked on a free list, where the link pointers occupy the first bytes of each block "overlaid" where the data will eventually go. Allocation is just unchaining a block from the head of the free list - a "pop" operation needing about 2 instructions. Deallocation is a "push:" two more instructions. In many cases, memory hardware can be set to generate a trap when the the pool is empty so there is no per-allocation overhead for detecting this error condition. (In GC systems the same trick is used to initiate collection with no overhead.)

In your case you'd need two pools: one for Entities and one for Components.

Defining your own pool allocator is not so hard, especially if your application is single threaded. See this document for a tutorial treatment.

Mouthwash answered 25/7, 2013 at 19:44 Comment(0)

Using most of answers and Google as references, I implemented some pre-allocation utilities in my SSVUtils library.

Prealloc.h

Example:

using MemUnit = char;
using MemUnitPtr = MemUnit*;
using MemSize = decltype(sizeof(MemUnit)); // Should always be 1 byte

class MemBuffer
{
    Uptr<MemUnit[]> buffer;
    MemRange range;

    MemBuffer(MemSize mSize) : ... 
    { 
        // initialize buffer from mSize
    }
};

class PreAllocatorChunk
{
    protected:
        MemSize chunkSize;
        MemBuffer buffer;
        std::stack<MemRange> available;

    public:
        PreAllocatorChunk(MemSize mChunkSize, unsigned int mChunks) : ...
        {
            // Add "chunks" to to available...
        }

        template<typename T, typename... TArgs> T* create(TArgs&&... mArgs)
        {
            // create on first "chunk" using placement new
            auto toUse(available.top().begin); available.pop();
            return new (toUse) T{std::forward<TArgs>(mArgs)...};
        }
};

More pre-allocation utilities are available:

PreAllocatorDynamic: pre-allocates a big buffer, then, when creating an object, splits the buffer in two parts:
- [buffer start, buffer start + obj size)
- [buffer start + obj size, buffer end)
When an object is destroyed, its occupied memory range is set as "available". If during creation of a new object no big enough "chunk" is found, the pre-allocator tries to unify contiguous memory chunks before throwing a runtime exception. This pre-allocator is sometimes faster than new/delete, but it greatly depends on the size of pre-allocated buffer.
PreAllocatorStatic<T>: inherited from PreAllocatorChunk. Size of a chunk is equal to sizeof(T). Fastest pre-allocator, less flexible. Almost always faster than new/delete.

Phillida answered 9/8, 2013 at 19:24 Comment(0)

One of your issues can be solved by allocating enough space in the vectors on their creation

For

vector<unique_ptr<Entity>> entities;

provide enough space in the constructor

Manager::Manager() : entities(10000)
{
    //...
}

Thus, you avoid reallocations and copying during later stages.

The second issue is the creation of your unique_ptr<Entity> pointers. Here, as you will always use default constructed objects, you could also use a pre-allocated pool of objects from which you create the pointers. Instead of calling new you would call an own class

class EntityPool
{
public:
     EntityPool(unsigned int size = 10000) : pool(size), nextEntity(0)
     {
     }
     Entity* getNext(void)
     {
         if (nextEntity != pool.size()) // if pool is exhausted create new
         {
             pool.emplace_back(Entity());
         }                 
         return pool[nextEntity++];
     }
private:
     vector<Entity> pool;
     unsigned int nextEntity; // index into the vector to the next Entity
};

struct Manager
{
    vector<unique_ptr<Entity>> entities;
    Entity& createEntity() 
    { 
        entities.emplace_back(entityPoolInstance.getNext());
        return *result;
    }
 //...

Old answered 29/7, 2013 at 10:22 Comment(6)

The author mentioned removing elements. Removing from vectors may be costly. – Impiety 29/7, 2013 at 10:27

If the number of adds and removes to the Manager are the same, then the vector is not the adequate data structure for the entities member in the manager, as removing from a vector is O(n). Then a map could be more adequate having both operations (find and remove) O(log n) complexity. I would measure it with a profiler anyway. – Old 29/7, 2013 at 11:30

Does this solution work for components? Component is a class that gets derived by the user and has variable size. How am I supposed to pre-allocate space for objects whose size is unknown? – Phillida 31/7, 2013 at 12:42

Yes it will, but per derived class you need to create a pool on its own and manage it in the Manager, entityPoolInstance and the ComponentPoolInstances can be members of the Manager. Alternative, e.g. placement new requires you to manage the heap manually which is painfull, if you want to remove objects. I also considered the removal of elements again. You really should measure, if a copy_if() on a vector wouldn't be cheaper than using maps. – Old 1/8, 2013 at 8:8

if you use vector and copy_if you will need to use shared_ptr, but on the other hand, the overhead is still smaller than using lists or maps and memory locality is much better. – Old 1/8, 2013 at 8:55

See ideone.com/h8gnk3 for an working implementation. You need to replace the PerformanceMeasurement with your own. – Old 1/8, 2013 at 12:50

Or you could wire in the standard 'placement new'. This allows you to allocate a big block of memory to construct (place) objects into as you choose. This will keep the block on the heap for as long as you need it, and allow you to allocate multiple short lived objects into this block instead of doing the costly allocations and de-allocations that just end up fragmenting the heap. There's a couple of gotcha's involved, but all in all its a REALLY simple solution without having to go down the custom memory manager route. Here's an excellent treatment of removing some of the pitfalls and describing placement new in detail. I've used data structures as simple as a stack to keep track of the next free block to allocate into: push the address of a block that's about to be deleted onto the stack. When allocating just pop the next free block off of the stack and use that as the arg to placement new. Super easy and super fast!

Pratte answered 29/7, 2013 at 22:7 Comment(1)

I tried this, as it looked a fantastic solution, but I had a lot of trouble managing object lifetime and getting it to work since my "pooled" objects have to be derived. I have a Component base class on which the user creates its own derived components, that may have very different size (in memory). If you could provide a working example of this solution with derived classes you would get the bounty :) – Phillida 29/7, 2013 at 22:18

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags