Will a custom allocator improve performance if many vector<T>s get constructed and destroyed?

In the code below, many vectors of 10 ints each get constructed (60% chance), or an existing vector gets deleted (40% chance). Thus, there will be many calls to new/malloc and delete. Since all these vectors are of type vector<int>, can a custom allocator help here to reduce the calls to new and delete and thus increase performance? The idea is that the space of a deleted vector can be reused by a newly constructed one. How would such an allocator look?

Note: This question is about allocators that reduce calls to new and delete.

#include <iostream>
#include <vector>
#include <random>

using namespace std;

int main() 
{
    // Random generator and distribution
    mt19937 gen(123456);
    uniform_real_distribution<> dis01(0., 1.);

    // Construct or delete ~10,000,000 vectors.
    vector< vector<int> > v; //the inner vectors will make many calls to new and delete

    v.reserve(1000000); //assume some size.

    for(int i=0; i<10000000; ++i)
    {
        if(dis01(gen)<0.6) // if true: make new sub-vector
        {
            v.emplace_back(); //new sub-vector
            v.back().reserve(10);

            for(int k=0; k<10; ++k)
                v.back().emplace_back(k); //fill it with some numbers
        }
        else // else, delete the last entry if there is one.
            if(!v.empty())
                v.pop_back();
    }

    cout<<"v.size()= "<<v.size()<<'\n';
    return 0;
}
Cauley answered 21/4, 2016 at 8:37 Comment(5)
If you have known space bounds, you can also use a simple static array and just keep a bunch of counters around. Then you have no allocations at all.Workingman
@KerrekSB let's assume that the bounds are not clear. The question is specifically about allocators.Cauley
Well, it sounds like a job for a stack allocator (i.e. one that assumes LIFO operation).Workingman
You could consider something like Howard Hinnant's stack_alloc: it works using a fixed size buffer and falling back to the heap if too much space is requested.Arletha
You could allocate for a 'vector pointer of vectors', and delete random indexes.Holbrooke

This is for C++11. Older standards require the allocator to implement additional members (see [1]).

This is proof-of-concept code. It runs and solves the example problem but suffers from several limitations. It still demonstrates how a custom allocator can be used to improve performance in a scenario where a lot of std::vectors are created and destroyed.

PoolAlloc.hh:

#include <cstddef>     // std::size_t
#include <type_traits> // std::is_same (used by operator==)
#include <vector>      // std::vector

template<typename T>
struct MemChunk
{
    std::size_t buf_size=0;
    T* buf=nullptr;
    T* top=nullptr;
    std::size_t used=0;
};

template<typename T>
class PoolAllocator
{
    public:
    using value_type=T;

    PoolAllocator();
    explicit PoolAllocator(std::size_t);
    PoolAllocator(PoolAllocator const&) noexcept;
    template<typename U>
        PoolAllocator(PoolAllocator<U> const&) noexcept;
    PoolAllocator(PoolAllocator&&) noexcept;
    PoolAllocator& operator=(PoolAllocator const&)=delete;
    PoolAllocator& operator=(PoolAllocator&&)=delete;
    ~PoolAllocator();

    template <typename U> 
    struct rebind 
    {
        using other=PoolAllocator<U>;
    };

    T* allocate(std::size_t);
    void deallocate(T*, std::size_t) noexcept;

    template<typename U1, typename U2>
        friend bool operator==(PoolAllocator<U1> const&, PoolAllocator<U2> const&) noexcept;
    template<typename U> friend class PoolAllocator; // give rebound instantiations access to memory_

    private:
    std::vector<MemChunk<T>>* memory_=nullptr;
    int* ref_count_=nullptr;
    std::size_t default_buf_size_=0;
};

template<typename T>
PoolAllocator<T>::PoolAllocator():
    PoolAllocator{100000} {}

template<typename T>
PoolAllocator<T>::PoolAllocator(std::size_t buf_size):
    memory_{new std::vector<MemChunk<T>>},
    ref_count_{new int(0)},
    default_buf_size_{buf_size}
{
    memory_->emplace_back();
    memory_->back().buf_size=buf_size;
    memory_->back().buf=new T[buf_size]; //note: requires default-constructible T
    memory_->back().top=memory_->back().buf;
    ++(*ref_count_);
}

template<typename T>
PoolAllocator<T>::PoolAllocator(PoolAllocator const& src) noexcept:
    memory_{src.memory_},
    ref_count_{src.ref_count_},
    default_buf_size_{src.default_buf_size_}
{
    ++(*ref_count_);
}

template<typename T>
PoolAllocator<T>::PoolAllocator(PoolAllocator&& src) noexcept:
    memory_{src.memory_},
    ref_count_{src.ref_count_},
    default_buf_size_{src.default_buf_size_}
{
    src.memory_=nullptr;
    src.ref_count_=nullptr;
}

template<typename T>
template<typename U>
PoolAllocator<T>::PoolAllocator(PoolAllocator<U> const& src) noexcept:
    // Proof-of-concept shortcut: the source allocator's chunk list is
    // reinterpreted. This only holds together because the example never mixes
    // different T/U in one pool (vector<int> rebinds to the same type).
    memory_{reinterpret_cast<std::vector<MemChunk<T>>*>(src.memory_)},
    ref_count_{src.ref_count_},
    default_buf_size_{src.default_buf_size_}
{
    ++(*ref_count_);
}

template<typename T>
PoolAllocator<T>::~PoolAllocator()
{
    if (ref_count_!=nullptr)
    {
        --(*ref_count_);
        if (*ref_count_==0)
        {
            if (memory_!=nullptr)
            {
                for (auto& it : *memory_)
                {
                    delete[] it.buf;
                }
                delete memory_;
            }
            delete ref_count_;
        }
    }
}

template<typename T>
T* 
PoolAllocator<T>::allocate(std::size_t n)
{
    // Bump-allocate from the newest chunk; open a new, larger chunk when full.
    MemChunk<T>* mem_chunk=&memory_->back();
    if ((mem_chunk->used+n)>mem_chunk->buf_size)
    {
        default_buf_size_*=2;
        memory_->emplace_back();
        mem_chunk=&memory_->back();
        std::size_t buf_size=default_buf_size_;
        if (n>default_buf_size_)
        {
            buf_size=n;
        }
        mem_chunk->buf_size=buf_size;
        mem_chunk->buf=new T[mem_chunk->buf_size];
        mem_chunk->top=mem_chunk->buf;
    }
    T* r=mem_chunk->top;
    mem_chunk->top+=n;
    mem_chunk->used+=n;
    return r;
}

template<typename T>
void 
PoolAllocator<T>::deallocate(T* addr, std::size_t n) noexcept
{
    // LIFO-only reuse: reclaim memory only when it sits at the very end of the
    // newest chunk; any other freed block is abandoned until pool teardown.
    MemChunk<T>* mem_chunk=&memory_->back();
    if (mem_chunk->used>=n and (mem_chunk->top-n)==addr)
    {
        mem_chunk->used-=n;
        mem_chunk->top-=n;
    }
}

template<typename U1, typename U2>
bool operator==(PoolAllocator<U1> const& lhs, PoolAllocator<U2> const& rhs) noexcept
{
    return (std::is_same<U1, U2>::value and lhs.memory_==rhs.memory_);
}

Using your example modified in the following way:

#include <iostream>
#include <vector>
#include <random>   
#include "PoolAlloc.hh"

using namespace std;

int main() 
{
    // Random generator and distribution
    mt19937 gen(123456);
    uniform_real_distribution<> dis01(0., 1.);
    PoolAllocator<int> palloc{1000000};

    // Construct or delete ~10,000,000 vectors.
    vector< vector<int, PoolAllocator<int>> > v; //the inner vectors now allocate from the pool

    v.reserve(1000000); //assume some size.

    for(int i=0; i<10000000; ++i)
    {
        if(dis01(gen)<0.6) // if true: make new sub-vector
        {
            v.emplace_back(palloc); //new sub-vector
            v.back().reserve(10);

            for(int k=0; k<10; ++k)
                v.back().emplace_back(k); //fill it with some numbers
        }
        else // else, delete the last entry if there is one.
            if(!v.empty())
                v.pop_back();
    }

    cout<<"v.size()= "<<v.size()<<'\n';
    return 0;
}

The number of calls to malloc drops from ~6e6 down to 21. Total number of instructions drops from 3.7e9 to 2.5e9 (using -O3, measured with valgrind --tool=callgrind).

There are a couple of implementation details that will impact performance depending on how the allocator is used.

Currently multiple buffers are used: if one is full, another one is created. This way an existing buffer is never reallocated; reallocating it would invalidate the internal pointers of every vector stored in it and get you into a world of hurt (see comments).

The biggest question is how to deal with deallocated memory. Currently a trivial approach is used that makes deallocated memory available to later allocations only when it sits at the end of the newest buffer. For your example that is sufficient, as you only ever deallocate at the end of the buffer.

For more complex scenarios you will need a more sophisticated mechanism. Some data structure is needed to store the addresses and sizes of free memory chunks. Multiple concepts are possible here and their performance will vary with the concrete situation they are used in. I doubt there is a good one-size-fits-all solution here.

[1] http://howardhinnant.github.io/stack_alloc.html

Pier answered 4/5, 2016 at 17:16 Comment(4)
It is a nice effort from TFM to provide this example; however, I would strongly suggest using a well-tested library for this task. The code above has several limitations: the type T must be default constructible, and the code does not even provide the basic exception guarantee (memory will be leaked).Araiza
I think PoolAllocator::allocate has a fatal issue. Let's say we have a bunch of vectors allocated. They all have their internal pointers pointing into our mem_chunk_. Now mem_chunk_ gets full and we reallocate its buf. We end up with a bunch of invalid vectors with their pointers pointing into freed memory. Or maybe I am missing something?Patristic
@Alessandro Teruzzi: As I understood the OP, he asked for a general concept example because he did not want to disclose too much of the concrete problem. I provided a proof-of-concept answer. As I already pointed out in my answer, several details of the code need to be adapted to the concrete usage situation.Pier
@robyschek: Unfortunately that's very true. I'll update it later to use the multiple-buffer approach. What we would need otherwise is a record of all the objects using the memory, so we could inform them on reallocation. But I don't think the STL container-to-allocator interaction supports an operation like this, which means the idea of fully contiguous memory shared by multiple vectors cannot be realized with allocators.Pier

You might be able to gain some performance by writing an allocator that more efficiently reuses recently freed memory, especially if all the vectors will be size 10. Of course, if that's the case, you'll gain more performance by using an object of fixed size. If the allocation size for the vectors needs to be dynamic, then your problem is as abstract as general memory allocation, and you're unlikely to improve on the standard allocator.

You are not at all likely to improve performance vs. STL unless you're able to leverage information which applies to your specific case but not the more general case.

A much better solution would be not to delete the vector objects at all: leave them in the vector<vector<int>>, maintain an iterator/pointer to the logical "end" of the outer vector (decremented instead of deleting), and then, rather than emplacing a new element (constructing a vector), advance your iterator, test for .end(), and emplace only if needed; otherwise reuse the old vector. This assumes your program does not rely on side effects of the constructor or destructor (vector doesn't, but you're not telling us your actual use case).

Mock answered 30/4, 2016 at 23:5 Comment(0)

As I understand from https://en.wikipedia.org/wiki/Allocator_(C%2B%2B), C++ allocators reduce allocation and deallocation requests for a specific container. I assume this means that creating and deleting the containers themselves still requires calls to new and delete.

You might want to look at https://github.com/gperftools/gperftools. It is a replacement for malloc. It claims improvements in small-object allocation, especially in multi-threaded programs.

Lacagnia answered 1/5, 2016 at 0:44 Comment(1)
I have to agree with this one, tcmalloc (from gperftools) is really fast.Pennant

Custom allocators might solve some problems, but they're not a silver bullet. The example is not enough to know what the best solution would be. I'm going to suggest something different, not because it's better, but because it could be better in some instances.

v.resize(10E5, std::vector<int>(10));

Instead of

v.reserve(10E5);

But then you need an iterator to the next free slot in the vector and all that good stuff.

Astrometry answered 21/4, 2016 at 9:11 Comment(2)
As I wrote above in the comments, this question is about allocators. You are right that in this case there might be simpler solutions. In my actual code, something like this will not be possible.Cauley
Are the vectors of 10 ints always the same size? Do you need to run this function multiple times when the program is running?Astrometry

Have you tried the existing Boost pool allocator?

http://theboostcpplibraries.com/boost.pool

I think, if the goal is to reuse the memory of the std::vector objects themselves, some form of in-place construction/pooling is needed.

Answered 21/4, 2016 at 9:25 Comment(0)

The way I see it, custom allocators only offer a benefit over standard allocators when you know exactly how the memory will be used. In general, you are making a size/performance trade-off, and a custom allocator is what gives you control over that decision.

If in your example you can use page-size chunks for each list, then you can keep a free list of pages and hand them out for all future allocations. This would have a lot of memory overhead if you really only have ten ints in each list, but it could be a big win if the lists are larger, and it could be done with no calls to new or delete per int. This essentially creates a fixed-size memory pool for each list. When you are done with a list, you just put the page back on the free list and use it for the next list.

Gnathion answered 5/5, 2016 at 21:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.