STL Containers, SBO and custom allocator conflicts

I've recently been fiddling with developing a custom allocator based on a memory pool that is shared between multiple instances of the allocator.

The intention was that the allocator be compatible with STL and Standard C++ containers such as vector, deque, map, string, etc.

However, something in particular has caused me some confusion. Various implementations of containers such as std::vector and std::string make use of Small Buffer Optimisation (SBO): storage inside the container object itself (which for a local variable means the stack) used for small initial memory requirements.

For example MSVC9.1 has the following member in the basic_string class:

union _Bxty
{   // storage for small buffer or pointer to larger one
   _Elem _Buf[_BUF_SIZE];
   _Elem *_Ptr;
   char _Alias[_BUF_SIZE];  // to permit aliasing
} _Bx;

I can't see how, when instantiating such containers, one can cajole the implementation into always using only the provided allocator and never SBO. I ask because one of the intentions of implementing custom allocators was to be able to use them in a shared memory context, where the amount of shared memory may be less than the SBO limit that some implementations use.
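To make the SBO behaviour observable, here is a minimal sketch that checks whether a string's character buffer lies inside the string object itself. The helper names `stored_in_object` and `measured_sbo_capacity` are hypothetical, and the result depends entirely on the standard library in use; an implementation without a small buffer would simply report 0.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: true when s.data() points inside the string object
// itself, i.e. the characters live in an in-object small buffer rather than
// in memory obtained from the allocator.
template <class String>
bool stored_in_object(const String& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(String);
    const char* data = reinterpret_cast<const char*>(s.data());
    std::less<const char*> lt;  // gives a total order even for unrelated pointers
    return !lt(data, obj_begin) && lt(data, obj_end);
}

// Hypothetical helper: grows a std::string one char at a time and reports the
// largest size that was still stored in-object (0 if there is no small buffer).
inline std::size_t measured_sbo_capacity() {
    std::string s;
    std::size_t last_inline = 0;
    for (std::size_t n = 1; n <= 256; ++n) {
        s.push_back('x');
        if (!stored_in_object(s)) break;
        last_inline = n;
    }
    assert(last_inline < sizeof(std::string));  // a buffer cannot exceed its object
    return last_inline;
}
```

Calling `measured_sbo_capacity()` on a given toolchain gives the threshold below which the allocator is never consulted for that implementation's std::string.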

For example, I would like to have a situation where I can have two instances of std::string, one per process, sharing a common block of memory which may be smaller than or equal to the SBO upper limit.

Possibly related: May std::vector make use of small buffer optimization?

typedef std::vector<int,mysharedmemallocator> shmvtype;

shmvtype v(2,0); //<-- if SBO then error as memory is allocated on
                 //stack not via the allocator

v[1] = 1234;  //<-- if SBO then error as wrong piece of memory
              // is being modified.

Let's look at another example that is not based on shared memory, as that seems to overcomplicate things for some people. Let's say I want to specialize my std::basic_string or std::vector etc. with an allocator that fills the memory it allocates with the value 0xAB before handing the pointer back to the caller, for no reason other than whimsy.

A container that is specialised with this new allocator, but that also uses SBO, will not have its SBO-based memory filled with the 0xAB pattern. So for example:

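typedef std::basic_string<char, std::char_traits<char>, myfillmemallocator> stype;

stype s;
s.resize(2);

// char may be signed, so compare as unsigned char
assert(static_cast<unsigned char>(s[0]) == 0xAB); // if SBO this will fail.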
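For reference, a minimal sketch of such a filling allocator, with a call counter added so that one can see whether the container consulted it at all. `FillAllocator`, `FillString` and `allocations_for` are hypothetical names used only for this illustration, and the sketch assumes a C++11 library that accepts minimal allocators through std::allocator_traits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <string>

// Hypothetical allocator: fills every block with 0xAB and counts allocate() calls.
template <class T>
struct FillAllocator {
    using value_type = T;

    // Function-local static keeps one counter per value type.
    static std::size_t& calls() { static std::size_t n = 0; return n; }

    FillAllocator() = default;
    template <class U> FillAllocator(const FillAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++calls();
        void* p = ::operator new(n * sizeof(T));
        std::memset(p, 0xAB, n * sizeof(T));  // the "whimsical" fill
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const FillAllocator<T>&, const FillAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const FillAllocator<T>&, const FillAllocator<U>&) noexcept { return false; }

using FillString = std::basic_string<char, std::char_traits<char>, FillAllocator<char>>;

// Number of FillAllocator<char>::allocate() calls made while building an n-char string.
inline std::size_t allocations_for(std::size_t n) {
    const std::size_t before = FillAllocator<char>::calls();
    FillString s(n, 'x');
    assert(s.size() == n);
    return FillAllocator<char>::calls() - before;
}
```

On an implementation with a small buffer, `allocations_for(2)` would be expected to report no calls, while a long string has to go through the allocator. Note also that resize() and the fill constructor write characters over whatever the allocator put there, so the 0xAB pattern would not be visible through s[0] even without SBO.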
Sightread answered 3/12, 2012 at 5:27 Comment(0)

"one of the intentions of implementing custom allocators was to be able to use them in a shared memory context"

This may be what you intend to do with it, but that's not why they exist. Indeed, with the exception of basic_string in C++98/03, it is not legal to share allocated memory between objects at all. They can share allocator objects, so they can get their memory from the same place. But it is illegal for modifications of one object to impact another that is unrelated; each instance must be separate.
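To illustrate the distinction, here is a minimal sketch in which two vectors obtain their storage from the same place yet remain separate objects. It assumes C++17's std::pmr facilities, which postdate this answer; the function names `inside` and `shared_source_separate_elements` are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory_resource>
#include <vector>

// True if p points into [begin, begin + size).
inline bool inside(const void* p, const unsigned char* begin, std::size_t size) {
    const unsigned char* q = static_cast<const unsigned char*>(p);
    std::less<const unsigned char*> lt;  // total order over pointers
    return !lt(q, begin) && lt(q, begin + size);
}

// Two vectors share one memory source but never share elements.
inline bool shared_source_separate_elements() {
    alignas(std::max_align_t) unsigned char buffer[1024];
    // null_memory_resource() as upstream: fail loudly instead of falling back to the heap.
    std::pmr::monotonic_buffer_resource pool(
        buffer, sizeof buffer, std::pmr::null_memory_resource());

    std::pmr::vector<int> a(4, 1, &pool);
    std::pmr::vector<int> b(4, 2, &pool);
    assert(a.size() == 4 && b.size() == 4);

    const bool same_source = inside(a.data(), buffer, sizeof buffer) &&
                             inside(b.data(), buffer, sizeof buffer);
    a[0] = 99;  // must not be visible through b
    const bool independent = a.data() != b.data() && b[0] == 2;
    return same_source && independent;
}
```

Both element arrays come out of `buffer`, but writing through `a` leaves `b` untouched: the allocator decides where the memory comes from, not who owns it.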

Copy-on-write strings only work because the system assumes that any non-const access to a character will write to it, thus performing a copy. And in C++11, even basic_string is forbidden from doing copy-on-write-style stuff like this.

"For example, I would like to have a situation where I can have two instances of std::string, one per process, sharing a common block of memory which may be smaller than or equal to the SBO upper limit."

That's not possible without writing your own class. The allocator only controls where the memory comes from. What you're wanting is a guaranteed copy-on-write string or some sort of shared string class.

What you want requires a container class specifically designed for this purpose.
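As a rough idea of what such a purpose-built class might look like, here is a sketch of a fixed-capacity string whose entire state lives inside the object. `FixedString` and `read_through_second_mapping` are hypothetical names, not part of any library, and real shared memory code would additionally need synchronisation and an actual mapping API; the second mapping is only simulated here by copying the raw bytes to another address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical fixed-capacity string: no pointers and no allocator, so the
// object's bytes mean the same thing at whatever address they are mapped.
template <std::size_t Capacity>
class FixedString {
public:
    // Copies at most Capacity chars; returns false if the text was truncated.
    bool assign(const char* text) {
        const std::size_t n = std::strlen(text);
        len_ = n < Capacity ? n : Capacity;
        std::memcpy(buf_, text, len_);
        buf_[len_] = '\0';
        return n <= Capacity;
    }
    const char* c_str() const { return buf_; }
    std::size_t size() const { return len_; }
    static constexpr std::size_t capacity() { return Capacity; }

private:
    std::size_t len_ = 0;
    char buf_[Capacity + 1] = {};
};

static_assert(std::is_trivially_copyable<FixedString<15>>::value,
              "the object must be relocatable byte for byte");

// Simulates a second view of the same bytes at a different address.
inline std::string read_through_second_mapping(const char* text) {
    FixedString<15> original;
    original.assign(text);
    FixedString<15> elsewhere;
    std::memcpy(&elsewhere, &original, sizeof original);
    return std::string(elsewhere.c_str(), elsewhere.size());
}
```

The agreed maximum size is part of the type here, which is the kind of well-defined allocation pattern that std::basic_string deliberately does not promise.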

Schapira answered 3/12, 2012 at 5:42 Comment(6)
Thanks for the comment Nicol, but just one question: can't one have an allocator that allocates shared memory and aliases the memory via an offset pointer concept? This is how it is normally done (boost.interprocess), so I'm not sure I understand why it can't be done. - Sightread
@Seminar: And what happens if you copy that string? What happens if you insert a character into that string, thus potentially causing reallocation? It's not clear what the semantics of any of these operations should be, but one thing is clear: whatever the semantics are, they are semantics the implementation itself needs to know about. That is, you need a specific container implementation that does things a certain way. - Schapira
Regarding copy: copying from the shared string to another string is an ordinary copy, and the target string is not expected to have shared memory. The same holds the other way round: when copying from a normal std::string to a shared string, the data gets copied into the shared memory provided by the specialised allocator. So this is irrelevant to the question. - Sightread
Regarding resize/append et al.: the container will use the allocator it has been specialised on. In shared memory semantics it is common to agree upon a common max size beforehand. So this is irrelevant to the question. - Sightread
If you get a chance, can you please review boost.interprocess? - Sightread
@Seminar: "In shared memory semantics it is common to agree upon a common max size beforehand" Then you should use a container that understands that there is a "common max size" that has been agreed upon. There is a strong interplay between the container's implementation and how memory gets allocated; you can't just put a size limit into the allocator and have the container work within it. If you want to share like this, you either need to use a container that has well-defined allocation patterns (which basic_string does not, intentionally) or you need to write such a container. - Schapira

For completeness: optimizing containers that store strings is very different from optimizing std::string itself.

"For example, I would like to have a situation where I can have two instances of std::string, one per process, sharing a common block of memory which may be smaller than or equal to the SBO upper limit."

If you write your own string class, pointers to the strings can be shared by containers, or by wrappers around containers if you need to keep a reference count to delete the strings later; you may also provide dereference getters and save pointer adapters. This is actually true even if you use std::string, meaning you could save shared strings from a string pool. How much customization pooled strings or garbage collection need in C++ (Java programmers would think of shared string constants and freeing memory) depends on the C++ implementation. Some give you part of that for free, I think (e.g. deleting a container deletes its contents, but doesn't delete what stored pointers point to). More than that is beyond the scope of this answer.

So a compromise could be storing pointers to std::strings in containers, and keeping either a set of the pointers to delete or a map from pointer to reference count. That assumes no smart pointer is needed, because you are not freeing memory the moment a reference count reaches zero but garbage collecting at some opportune point in the program instead. You could also wrap the container so that it checks the reference count when a pointer is removed. Pooling strings without wrappers or smart pointers would then only work for the wrapped containers or the periodic garbage collection scenario, not for individual string variables.

My use case for this is processing a large number of XML files. Some of the text is kept in more than one container while processing each file, and the containers and strings are used by multiple classes. They are therefore kept on the heap, since strings in a container on the stack would otherwise disappear if referenced through the class rather than passed to a function (debatable design flaw or feature). The containers and text are deleted after each file is processed. The delete is done by looping through the set or reference count map mentioned above, since calling clear on the containers, or deleting the containers, destroys the stored pointers but not the strings themselves. Saving pointers in the containers allows sharing the strings, at the code-writing cost of dereferencing them to get their value. You could also use references for easier syntax there.
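A minimal sketch of the pooling idea, under the assumption that letting a pool own the strings is acceptable in place of the manual delete loop described above. `StringPool` is a hypothetical class written for this illustration; it relies on the fact that pointers to elements of a std::unordered_set stay valid when the set rehashes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical pool: stores each distinct string once and hands out pointers
// that containers can share. Clearing the pool (e.g. after each file)
// releases all strings at once and invalidates every pointer handed out.
class StringPool {
public:
    const std::string* intern(const std::string& text) {
        // insert() returns the existing element if the text is already pooled.
        return &*strings_.insert(text).first;
    }
    std::size_t size() const { return strings_.size(); }
    void clear() { strings_.clear(); }

private:
    std::unordered_set<std::string> strings_;
};

// Example use: two containers referring to the same pooled text.
inline bool containers_share_one_string(StringPool& pool) {
    std::vector<const std::string*> titles;
    std::vector<const std::string*> index;
    titles.push_back(pool.intern("chapter one"));
    index.push_back(pool.intern("chapter one"));
    assert(*titles[0] == "chapter one");
    return titles[0] == index[0];  // same object, stored once
}
```

The containers only ever hold non-owning pointers, so clearing them does not free any text; the pool's `clear()` plays the role of the periodic clean-up step.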

Body answered 28/2, 2023 at 13:23 Comment(0)
