Boost pool allocator slower than new
Asked Answered
S

1

8

So i made this container allocator memory_pools class based on boost pool :

memory_pools.hpp

#ifndef MEMORY_POOL_HPP
# define MEMORY_POOLS_HPP

// boost
# include <boost/pool/pool.hpp>
# include <boost/unordered_map.hpp>

template<typename ElementType>
class   memory_pools
{
public:
  template <typename>
  friend class memory_pools;

private:
  using pool = boost::pool<>;

public:
  using value_type = ElementType;
  using pointer = value_type*;
  using const_pointer = const value_type*;
  using reference = value_type&;
  using const_reference = const value_type&;
  using size_type = pool::size_type;
  using difference_type = pool::difference_type;

public:

  template<typename OtherElementType>
  struct rebind
  {
    using other = memory_pools<OtherElementType>;
  };

public:
  memory_pools();

  template<typename SourceElement>
  memory_pools(const memory_pools<SourceElement>&);

public:
  pointer   allocate(const size_type n);
  void  deallocate(const pointer ptr, const size_type n);

  template<typename... Args>
  void  construct(pointer, Args...);
  void  destroy(pointer);

public:
  bool  operator==(const memory_pools&);
  bool  operator!=(const memory_pools&);

private:
  using pools_map = boost::unordered_map<std::size_t, std::shared_ptr<pool>>;

private:
  std::shared_ptr<pools_map>      pools_map_;
  std::shared_ptr<pool>           pool_;
};

# include <memory_pools.ipp>

#endif

memory_pools.ipp

#ifndef MEMORY_POOLS_IPP
# define MEMORY_POOLS_IPP

template<typename ElementType>
memory_pools<ElementType>::memory_pools()
  :
  pools_map_(std::make_shared<pools_map>
             (pools_map
             {
               std::make_pair
                 (sizeof(ElementType),
                  make_shared<pool>(sizeof(ElementType)))
             })),
  pool_(pools_map_->at(sizeof(ElementType)))
{
}

template<typename ElementType>
template<typename SourceElement>
memory_pools<ElementType>::memory_pools
(const memory_pools<SourceElement>& rebinded_from)
  :
  pools_map_(rebinded_from.pools_map_),
  pool_(pools_map_->insert
        (std::make_pair(sizeof(ElementType),
                        make_shared<pool>(sizeof(ElementType)))).first->second)
  {
  }

template<typename ElementType>
typename memory_pools<ElementType>::pointer memory_pools<ElementType>::allocate
(const size_type n)
{
  pointer ret = static_cast<pointer>(pool_->ordered_malloc(n));

  if ((!ret) && n)
    throw std::bad_alloc();

  return (ret);
}

template<typename ElementType>
void        memory_pools<ElementType>::deallocate
(const pointer ptr, const size_type n)
{
  pool_->ordered_free(ptr, n);
}

template<typename ElementType>
template<typename... Args>
void        memory_pools<ElementType>::construct(pointer ptr, Args... args)
{
  new (ptr) ElementType(std::forward<Args>(args)...);
}

template<typename ElementType>
void        memory_pools<ElementType>::destroy(pointer ptr)
{
  ptr->~ElementType();
}

template<typename ElementType>
bool        memory_pools<ElementType>::operator==(const memory_pools& rhs)
{
  return (pools_map_ == rhs.pools_map_);
}

template<typename ElementType>
bool        memory_pools<ElementType>::operator!=(const memory_pools& rhs)
{
  return (pools_map_ != rhs.pools_map_);
}

#endif

Then when i test it using:

#include <memory_pools.hpp>

int     main(void)
{
  using pools_type = memory_pools<std::pair<const int, int>>;
  pools_type    pools;

  boost::unordered_map<int, int, boost::hash<int>, std::equal_to<int>, pools_type>      map;
  //boost::unordered_map<int, int, boost::hash<int>, std::equal_to<int>>      map;

  for (unsigned int i = 0; i < 20000; ++i)
    {
      map[i] = i + 1;
    }

  return (0);
}

With clang3.5 on macOSX 10.10, i got:

$ time ./a.out

real    0m1.873s
user    0m1.850s
sys     0m0.009s

Whereas when i launch:

#include <memory_pools.hpp>

int     main(void)
{
  using pools_type = memory_pools<std::pair<const int, int>>;
  pools_type    pools;

  //boost::unordered_map<int, int, boost::hash<int>, std::equal_to<int>, pools_type>      map;
  boost::unordered_map<int, int, boost::hash<int>, std::equal_to<int>>      map;

  for (unsigned int i = 0; i < 20000; ++i)
    {
      map[i] = i + 1;
    }

  return (0);
}

I have:

$ time ./a.out

real    0m0.019s
user    0m0.016s
sys     0m0.002s

Question

Is memory allocation using boost pool supposed to be that slow or is my test invalid for some reason ?


EDIT

After Carmeron's comment, i added the -O3 and -DNDEBUG flags, now i have:

$time ./a.out

real    0m0.438s
user    0m0.431s
sys     0m0.003s

for the memory_pools version, and:

$ time ./a.out

real    0m0.008s
user    0m0.006s
sys     0m0.002s

for the standard allocator version.

Question

The question still holds, is it normal it is slower ?

Scurrilous answered 24/10, 2014 at 16:59 Comment(3)
Have you checked the assembly code to ensure your test loop isn't being optimized away? All of this is with -O3 -DNDEBUG, right?Arguseyed
Depending on the STL implementation you might lose a lot of time in the rebind code.Rampant
Are you pre-allocating the pools? boost::pool just uses a free list, so unless you pre-allocate, it's entirely overhead above using malloc/new.Carpio
E
7

I have never used Boost's pool code, or even read it. But I do know some things about memory pools in general, and I would not expect the memory pool in your test to outperform malloc.

To understand this, you must first understand how malloc and free are implemented if you don't already. The answers to this question seem to provide a pretty good summary: How do malloc() and free() work?

Memory fragmentation is a hard problem for malloc() and free(), and there's no simple, fast solution. But it's much, much easier if you can guarantee that all of your allocations are the same size: this is how memory pools can win. But your test doesn't involve a lot of memory fragmentation, and probably doesn't free much memory at all. So in this test, malloc() wins, and pools lose. To refine your test, you might mix in a bunch of deletes, something like:

// Allocate 10,000 things
// Loop 10 times:
//   Allocate 1,000 things
//   Delete 1,000 things

Having said all that, if you really want to know why a particular piece of code performs the way it does, you should profile it. It's useful to think up theories about why a piece of code behaves a particular way, but you must also test your theories.

Exhibition answered 24/10, 2014 at 18:37 Comment(1)
I would try randomly allocating and deallocating things of random sizes by creating a list of allocations and randomly either allocating or, if the list isn't empty deallocating a random element of the list. Once you start to get runs of allocations you'll start to fragment your memory.Noli

© 2022 - 2024 — McMap. All rights reserved.