How to choose a fixed address for shared memory mapping

Asked 9/5, 2011 at 16:10 Answered 25/3, 2015 at 14:26

Solved c++ boost shared-memory interprocess

I would like to use shared memory between several processes, and would like to be able to keep using raw pointers (and stl containers).

For this purpose, I am using shared memory mapped at a fixed address:

segment = new boost::interprocess::managed_shared_memory(
    boost::interprocess::open_or_create,
    "MySegmentName",
    1048576, // alloc size
    (void *)0x400000000LL // fixed address
);

What is a good strategy for choosing this fixed address? For example, should I just use a pretty high number to reduce the chance that I run out of heap space?

Nadenenader answered 9/5, 2011 at 16:10 Comment(5)

If you are on Windows, VMMap can help you choose, but you should also realize that other applications on the system not under your control can inject their DLLs into your memory space, invalidating any decision you make. – Residency 9/5, 2011 at 17:31

You may want to consider some sort of election protocol between the applications that attempt to share space, where they can negotiate the common address to be used between them. Or, just use the correct boost::interprocess::shared_ptr<family> objects. They are designed for the best performance you can reliably get, given that they are effectively segment + offset. Is the application so performance sensitive that this extra indirect lookup matters? – Residency 9/5, 2011 at 17:33

Also, running out of heap space is unlikely. Heap managers ask the operating system for additional memory when they need to grow the heap. The heap is not (usually) one large arena, but many linked together. – Residency 9/5, 2011 at 17:35

Or, what @Joseph Garvin said while I was typing up all my comments. :-) – Residency 9/5, 2011 at 17:36

Why do you not instead use a STL allocator which allocates exclusively in a specific block of memory, given a start address and a length? You can then use a named memory mapping in every process (resulting in whatever-address) and hand the mapping's address+length to the allocator. This seems a bit less failure-prone to me than relying on and tampering with fixed addresses. – Snicker 9/5, 2011 at 17:56
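A rough sketch of the allocator idea from the comment above, assuming a simple bump strategy that never reuses memory. `Arena` and `ArenaAllocator` are hypothetical names, not part of the STL or Boost. The allocator only controls where the elements live. The container object itself and the pointers it stores are still raw, so this alone does not make the data usable at a different address in another process.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Bookkeeping for one block of memory: start address, length, bytes handed out.
// Assumption: `start` is suitably aligned for every type stored in the block.
struct Arena {
    char* start;
    std::size_t length;
    std::size_t used;
};

// Minimal bump allocator: hands out memory only from the arena and never reuses it.
template <class T>
struct ArenaAllocator {
    using value_type = T;
    Arena* arena;

    explicit ArenaAllocator(Arena* a) : arena(a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) : arena(other.arena) {}

    T* allocate(std::size_t n) {
        // Round the current offset up to the alignment of T.
        std::size_t aligned = (arena->used + alignof(T) - 1) / alignof(T) * alignof(T);
        std::size_t bytes = n * sizeof(T);
        if (aligned + bytes > arena->length) throw std::bad_alloc();
        arena->used = aligned + bytes;
        return reinterpret_cast<T*>(arena->start + aligned);
    }
    void deallocate(T*, std::size_t) {}  // intentionally a no-op
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return a.arena == b.arena;
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return !(a == b);
}
```

A `std::vector<int, ArenaAllocator<int>>` constructed with an `ArenaAllocator<int>` pointing at the mapped block then stores its elements inside that block.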

This is a hard problem. If you are forking a single program to create children, and only the parent and the children will use the memory segment, just be sure to map it before you fork. The children will automatically inherit the mapping from their parent and there's no need to use a fixed address.
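A minimal sketch of the map-before-fork case, assuming a POSIX system and using a plain anonymous `mmap` region as a stand-in for the Boost segment. The function name is hypothetical. The child writes through the same pointer value the parent holds, so no fixed address is needed.

```cpp
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Maps one shared region, forks, lets the child write into it, and returns
// what the parent reads afterwards (or -1 on failure).
int read_child_write_through_inherited_mapping() {
    void* mem = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    int* shared = static_cast<int*>(mem);
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        *shared = 42;  // the child inherited the mapping at the same address
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);  // wait so the child's write has happened
    int seen = *shared;
    munmap(mem, sizeof(int));
    return seen;
}
```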

If you aren't, then the first thing to consider is whether you really need to use raw STL containers instead of the boost interprocess containers. That you're already using boost interprocess to allocate the shared memory segment suggests you don't have any problem using boost, so the only advantage I can think of to using STL containers would be so you don't have to port existing code. Keep in mind that for it to work with fixed addresses, the containers and what they contain pointers to (assuming you're working with containers of pointers) will need to be kept in the shared memory space.

If you're certain that it's what you want, you'll have to figure out some method for them to negotiate an address. Keep in mind that the OS is allowed to reject your desired fixed memory address. It will reject an address if the page at that address has already been mapped into memory or allocated. Because different programs will have allocated different amounts of memory at different times, which pages are available and which are unavailable will vary across your programs.
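To illustrate what a rejected address looks like in practice, here is a sketch assuming POSIX `mmap` and an anonymous mapping in place of the real shared segment. `try_map_at` is a hypothetical helper. It passes the desired address as a hint and treats any other placement as a rejection.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Asks the OS for `len` bytes at `wanted` without forcing the placement.
// Returns the mapping if it landed exactly at `wanted`, otherwise nullptr.
void* try_map_at(void* wanted, std::size_t len) {
    void* got = mmap(wanted, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (got == MAP_FAILED) return nullptr;
    if (got != wanted) {
        // The range was not available, so the OS chose another place.
        munmap(got, len);
        return nullptr;
    }
    return got;
}
```

Calling it with an address that is already mapped in the current process returns `nullptr`, which is the "false" answer a process would send back during negotiation.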

So the programs need to reach consensus on a memory address. This means that several addresses might have to be tried and rejected. If it's possible that sometime after startup a new program will become interested, the search for consensus will have to start over again. The algorithm would look something like this:

1. Program A proposes memory address X to all other programs.
2. The other programs respond with true or false to indicate whether the memory mapping at address X succeeded.
3. If program A receives any false responses, goto #1.
4. Program A sends a message to the other programs letting them know the address has been validated and may be used.
5. If a new app becomes interested in the data, it must notify program A that it would like an address.
6. Program A then has to tell all the other programs to stop using the data and goto #1.
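The proposal loop in the steps above can be sketched in-process, with the message exchange replaced by callbacks. This is only an illustration under that assumption. `Participant` and `negotiate_address` are hypothetical names, and a real version would send each proposal over whatever IPC channel the programs share and would have the programs that succeeded unmap again after a failed round.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Each participant answers whether it could map the segment at the address.
using Participant = std::function<bool(std::uintptr_t)>;

// Proposes each candidate in turn. On success stores the first address that
// every participant accepted in `agreed` and returns true.
bool negotiate_address(const std::vector<std::uintptr_t>& candidates,
                       const std::vector<Participant>& participants,
                       std::uintptr_t& agreed) {
    for (std::uintptr_t address : candidates) {
        bool all_accepted = true;
        for (const Participant& can_map : participants) {
            if (!can_map(address)) {  // one "false" response rejects the proposal
                all_accepted = false;
                break;
            }
        }
        if (all_accepted) {
            agreed = address;
            return true;
        }
    }
    return false;  // no consensus among the proposed addresses
}
```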

To come up with what addresses A should propose, you could have A map a non-fixed memory segment, see what address it's mapped at, and propose that address. If it's unsatisfactory, map another segment and propose it instead. You will need to unmap the segments at some point, but you can't unmap them right away because if you unmap then remap a segment of the same size chances are the OS will give you the same address back over and over. Keep in mind that you may never reach consensus; there's no guarantee that there's a large enough segment at a common location across all the processes. This could happen if your programs all independently use almost all memory, say if they are backed up by a ton of swap (though if you care enough about performance to use shared memory hopefully you are avoiding swap).
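One way to generate proposals as described, sketched with POSIX `mmap` and anonymous probe mappings (an assumption, since the real segment would be the named shared memory). `collect_candidate_addresses` is a hypothetical helper. It keeps every probe mapped until the end so that the OS cannot hand back the same address twice.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdint>
#include <vector>

// Collects up to `count` addresses the OS picks for mappings of `len` bytes.
std::vector<std::uintptr_t> collect_candidate_addresses(std::size_t count,
                                                        std::size_t len) {
    std::vector<void*> probes;
    std::vector<std::uintptr_t> candidates;
    for (std::size_t i = 0; i < count; ++i) {
        // PROT_NONE: the probe only reserves address space, it is never accessed.
        void* p = mmap(nullptr, len, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) break;
        probes.push_back(p);
        candidates.push_back(reinterpret_cast<std::uintptr_t>(p));
    }
    // Release the probes only after all candidates have been recorded.
    for (void* p : probes) munmap(p, len);
    return candidates;
}
```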

All of the above assumes you're in a relatively constrained address space. If you're on 64-bit, this could work. Most computers' RAM + swap will be far less than what's allowed by 64 bits, so you could map the memory at a very far out fixed address that all processes are unlikely to have mapped already. I suggest at least 2^48, since current 64-bit x86 processors don't reach beyond that range (despite pointers being 64 bits, you can only plug in as much RAM as allowed by 48 bits, still a ton at the time of this writing). That said, there's no reason a smart heap allocator couldn't take advantage of the vastness of the address space to reduce its bookkeeping work, so to be truly robust you would still need to build consensus. Keep in mind that you will at least want the address to be configurable: even if we don't have that much memory anytime soon, between now and then someone else might have the same idea and pick your address.

To do the bidirectional communication you could use any of sockets, pipes, or another shared memory segment. Your OS may provide other forms of IPC. But strongly consider that you are probably now introducing more complexity than you would have to deal with if you just used the boost interprocess containers ;)

Photogram answered 9/5, 2011 at 17:30 Comment(4)

Thanks for this excellent discussion! My main motivation for using the STL was that this would allow me to use "raw" pointers instead of the offset pointers used by boost::interprocess -- this would allow me to use shared memory without a sacrifice to performance. Is there a way to make the request for an address work with a very high probability -- say, if I put it at 2^40, this should cause no issue for the heap (assuming that RAM + swap space is much less than 1TB) and might always succeed? – Nadenenader 9/5, 2011 at 18:32

@hrr: I am not sure that the purported performance loss is worth the likelihood of failure that comes from redesigning your own system, are you? It's your call, obviously, but the less code you write, the fewer mistakes you make. – Retribution 9/5, 2011 at 18:42

@Matthieu M.: Good point, indeed my goal is to have less code... My hope was that using a fixed mapping address would lead to a pretty straightforward solution (I described it here). There, I would only need to pass a second template parameter to std::vector -- so the required changes would be pretty minimal. – Nadenenader 9/5, 2011 at 18:55

@hrr: Hmm, performance in extreme situations could be another reason to do this :) I forgot to address 32-bit versus 64-bit. Indeed, with a 64-bit address space your chance of success is improved dramatically. I should edit my post to mention there's no guarantee that an unmapped segment of sufficient size exists in common across all the processes -- it's just extremely unlikely to not exist on 64-bit. I would go with higher than 2^40 though. The current 64-bit processors only actually scale to 48-bits of memory. So if you use 2^48, your code will work even on maxed out systems :) – Photogram 10/5, 2011 at 2:18

Read the address from a configuration file. That will allow easy experimentation, and make it easy to change the address as the circumstances change.
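A small sketch of the parsing side, assuming the configuration value arrives as a string such as "0x400000000" and that the program runs on a 64-bit target. `parse_address` is a hypothetical helper, and reading the file and looking up the key are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <string>

// Parses an address from one configuration value.
// Returns false if the text is not entirely a number.
bool parse_address(const std::string& text, std::uintptr_t& out) {
    try {
        std::size_t used = 0;
        // Base 0 lets std::stoull accept a 0x prefix for hexadecimal input.
        unsigned long long value = std::stoull(text, &used, 0);
        if (used != text.size()) return false;  // trailing garbage
        out = static_cast<std::uintptr_t>(value);
        return true;
    } catch (const std::exception&) {
        return false;  // not a number, or out of range
    }
}
```

The parsed value can then be cast to `void*` and passed as the address argument when opening the segment.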

Oestradiol answered 9/5, 2011 at 16:31 Comment(4)

Neat idea! Then, my main question is how to choose a number/address to put into the configuration file? – Nadenenader 9/5, 2011 at 17:22

@Nadenenader A bit by experimentation. You can use various tools to get a map of the binary, which will allow you to exclude certain addresses, but for the rest, you have to guess a little. – Oestradiol 9/5, 2011 at 17:58

I spoke too strongly. It's more or less likely to work under different circumstances :) To be as robust as possible you would need something more sophisticated than this though. – Photogram 10/5, 2011 at 2:33

@Joseph And I may have spoken too quickly. We didn't use standard containers; we needed a hash map, which wasn't available then, and our "data" and "keys" were either numeric values (double or int) or very short fixed length strings (char[]); the only dynamic allocation was for the nodes, and we managed that manually. All of which made things a lot easier. In theory, you should be able to do something with the STL using custom allocators; I'm sure it can be made to work, but I can't claim actually having done so. – Oestradiol 10/5, 2011 at 7:38

Don't use hard-coded absolute addresses as a shared memory area, for security reasons, even when you don't use forks or threads. This bypasses all ASLR protections. It gives any attacker predictable locations in the process's address space. It is pretty easy to search for such hard-coded pointers in a binary.

You've been chosen by http://reversingonwindows.blogspot.sg/2013/12/hardcoded-pointers.html as an example of how to make software less secure by bypassing ASLR. The second bad example is in the boost library.

The address space needs to be negotiated between the communicating parties at run-time.

Submarine answered 3/1, 2014 at 2:21 Comment(0)

My solution:

The initialising program allows the system to select an appropriate segment address. This address is written to disc and retrieved for use by subsequent programs as required.

Caveats: I am using 64 bit fedora 21 with Kdevelop 4.7 and find that 'void*' is 64 bits long. Writing to disc of the segment head address involves sprintf(bu, "%p", pointer); and writing a text file:

Recovery reads this file and decodes the hex number as a 'long long' value. This is returned to the caller where it is cast as (void*)
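The write and recovery steps described above might look roughly like this. It is a sketch, not the author's code. `save_address` and `load_address` are hypothetical names, and it assumes that "%p" prints a hexadecimal number (its format is implementation-defined) and that the pointer is not null.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Writes the segment address to a text file using "%p".
bool save_address(const char* path, void* address) {
    char text[32];
    std::snprintf(text, sizeof text, "%p", address);
    std::FILE* f = std::fopen(path, "w");
    if (!f) return false;
    bool ok = std::fputs(text, f) >= 0;
    return std::fclose(f) == 0 && ok;
}

// Reads the file back, decodes the hex number, and casts it to void*.
// Returns nullptr if the file cannot be read.
void* load_address(const char* path) {
    std::FILE* f = std::fopen(path, "r");
    if (!f) return nullptr;
    char text[32] = {0};
    char* line = std::fgets(text, sizeof text, f);
    std::fclose(f);
    if (!line) return nullptr;
    // Base 16 accepts the digits with or without a leading 0x.
    unsigned long long value = std::strtoull(text, nullptr, 16);
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(value));
}
```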

I have also found that grouping all the access routines into a single folder above the level of the individual processes (each as a project in its own right) has helped save my sanity at the expense of a single aberrant '#include' in the process files

David N Laine

Rift answered 25/3, 2015 at 14:26 Comment(0)
