Making more efficient use fork() and copy-on-write memory sharing
Asked Answered
N

1

7

I am a programmer developing a multiplayer online game using Linux based servers. We use an "instanced" architecture for our world. This means that each player entering a world area gets a copy of that area to play in with their party members, and independent of all the other players playing in the same area.

Internally we use a separate process for each instance. Initially each instance process would start up, load only the resources required for the given area, generate it's random terrain, and then allow new connections from players. The amount of memory used by an instance was typically about 25 meg including resources and the randomly generated level with entities.

In order to reduce the memory footprint of instances, and to speed up the spawn time, we changed to an approach where we create a single master instance that loads all the resources that any instance could possibly need (about 150 meg of memory) and then when a new instance is required, use the fork() function to spawn a new instance and utilise copy-on-write memory sharing so that the new instance only requires memory for it's "unique" data set. The footprint of the randomly generated level and entities which make up the unique data for each instance is about 3-4 meg of memory.

Unfortunately the memory sharing is not working as well as I think it could. A lot of memory pages seem to become unshared.

At first, as we load more of our data set in the prefork instance, the memory required for each forked instance goes down, but eventually there is an inflection point where loading more assets in the prefork actually increases the data used by each forked instance.

The best results we have had are loading about 80 meg of the data set pre fork, and then having the fresh instances demand load the rest. This results in about 7-10 extra meg per instance and an 80 meg fixed cost. Certainly a good improvement, but not the theoretical best.

If I load the entire 150 meg data set and then fork, each forked instance uses about 50 more meg of memory! Significantly worse than simply doing nothing.

My question is, how can I load all of my data set in the prefork instance, and make sure I'm only getting the minimum set of really unique per instance data as the memory footprint for each instance.


I have a theory as to what is happening here and I was wondering if someone would be able to help confirm for me that this is the case.

I think it's to do with the malloc free chain. Each memory page of the prefork instance probably has a few free spots of memory left in it. If, during random level generation, something is allocated that happens to fit in one of the free spots in a page, then that entire page would be copied into the forked process.

In windows you can create alternate heaps, and change the default heap used by the process. If this were possible, it would remove the problem. Is there any way to do such a thing in linux? My investigations seem to indicate that you can't.

Another possible solution would be if I could somehow discard the existing malloc free chain, forcing malloc to allocate fresh memory from the operating system for subsequent calls. I attempted to look at the implementation of malloc to see if this would be easily possible, but it seemed like it might be somewhat complex. If anyone has any ideas around this area or a suggestion of where to start with this approach, I'd love to hear it.

And finally if anyone has any other ideas about what might be going wrong here, I'd really like to hear them. Thanks a lot!

Nestle answered 24/3, 2012 at 11:2 Comment(0)
A
2

In windows you can create alternate heaps, and change the default heap used by the process. If this were possible, it would remove the problem. Is there any way to do such a thing in linux?

I Unix you can simply mmap(2) memory an bypass malloc altogether.

I would also ditch the whole "rely-on-cow" thing. I would have the master process mmap some memory (80M, 150M whatever), write stuff to it, mark it read-only via mprotect(2) for good measure and take it from there. This would solve the real issue and wouldn't force you to change the code down the road.

Annihilation answered 24/3, 2012 at 11:4 Comment(5)
I can't really bypass malloc/new entirely without a lot of work because this is a large existing code base, and all of our resource loaders quite happily use them. I'm not willing to rule out the approach, but I'd like to keep using someone elses memory allocator if possible.Nestle
@Nestle You can use a custom allocator or even hijack malloc. However, read the second paragraph. Stop relying on cow and do your own mmap.Annihilation
That would still require me to write a custom allocator, and while I could resort to this, I would like to find a solution that doesn't require me to do that. Some kind of wrapper approach around the existing malloc to facilitate my needs perhaps?Nestle
@Nestle You're not listening. You don't need anything custom if you mmap upfront.Annihilation
Negs is listening. The code loads by doing lots of malloc() calls and changing the whole thing over would be difficult. This is a classic question of programmer expense vs. hardware expense.Carpus

© 2022 - 2024 — McMap. All rights reserved.