jemalloc, mmap and shared memory?

Can jemalloc be modified to allocate from shared memory? The FreeBSD function dallocx() implies you can provide a pointer to use for allocation, but I don't see an obvious way to tell jemalloc to restrict all allocations to that memory (or to set a size, etc.).

The dallocx() function causes the memory referenced by ptr to be made available for future allocations.

If not, what is the level of effort for such a feature? I'm struggling to find an off-the-shelf allocation scheme that can allocate from a shared memory section that I provided.

Similarly, can jemalloc be configured to allocate from a locked region of memory to prevent swapping?

Feel free to point me to relevant code sections that require modification and provide any ideas or suggestions.

The idea I am exploring is this: jemalloc creates arenas/heaps to minimize contention when allocating in a threaded environment, and the concept seems scalable to allocating regions of shared memory in a multiprocessing environment. That is, I create N regions of shared memory using mmap(), and I want to leverage the power of jemalloc (or any allocation scheme) to allocate as efficiently as possible, with minimum contention, from one of those shared regions. If threads/processes are not accessing the same shared regions and arenas, the chance of contention is minimal and the malloc operation is faster.

This differs from a global pool allocator with a malloc()-style API, since those usually require a global lock, which effectively serializes allocation in user space. I'd like to avoid this.

edit 2:

Ideally an api like this:

// init the alloc context for two shmem pools
ctx1 = alloc_init(shm_region1_ptr);
ctx2 = alloc_init(shm_region2_ptr);

// ... a bunch of code determines that pool 2 should be used, based on some
// method of pool selection that minimizes the possibility of lock contention
// with other processes allocating shmem buffers ...

// allocate from pool 2
ptr = malloc(ctx2, size);
Waaf answered 15/6, 2015 at 1:45 Comment(8)
This strikes me as an XY problem. Do you specifically want properties of jemalloc for your shared-memory allocator? The whole point of jemalloc is that it attempts to avoid sharing even between threads in the same process (at a great expense in terms of memory usage) to optimize for performance. If you just want a shared-memory allocator with a malloc-like API, that's a much simpler topic and does not involve jemalloc. - Darius
AFAICT, dallocx() is equivalent to free(), so probably not what you want. - Larrisa
@Larrisa - yes, I guess I was overly optimistic that some hook for what I was after was provided. - Waaf
@R.. I clarified the question. I realize I am looking for something broad, less likely to be provided by a drop-in for malloc and more likely to be some sort of framework. However, I can see the need for an allocator to return a context associated with a memory pool that can be used for targeted allocation. - Waaf
If all you want is a malloc-like replacement that can allocate from shared memory and is good on multiprocess/multithreaded tasks, you can use this - Taw
While it is trivial to patch jemalloc to carve chunks from a shared memory segment, making it work in a multiprocess environment is a lot harder due to different address spaces. The same chunk can get mapped at different base addresses in different processes, hence using pointers in internal data structures is no longer an option. Also, jemalloc is unable to shrink the number of arenas, and that is likely to become an issue in a "multiprocess" jemalloc. - Denyse
@ChrisDodd, interesting, I will take a look. - Waaf
@R.. you say "...shared-memory allocator with a malloc-like API, that's a much simpler...". How do you suggest I do this? - Olly
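
As a rough illustration of the "simple shared-memory allocator" idea from the comments, here is a minimal sketch of a per-region bump allocator that hands out offsets instead of raw pointers, so it keeps working when the same region is mapped at different base addresses. This is not jemalloc code. The names shm_pool, pool_init, pool_alloc and pool_ptr are hypothetical, nothing is ever freed, and it assumes that size_t atomics are lock-free on the target and that an anonymous MAP_SHARED mapping (inherited across fork) is an acceptable stand-in for your own shared segment.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on strict C modes */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical header stored at the start of each shared region. */
typedef struct {
    size_t capacity;             /* usable bytes after the header */
    _Atomic size_t used;         /* bump offset shared by all mappers */
} shm_pool;

/* Header size rounded up so that payloads stay 16-byte aligned. */
#define POOL_HDR ((sizeof(shm_pool) + 15) & ~(size_t)15)

/* Map a shared region and initialise the pool header inside it. */
static shm_pool *pool_init(size_t region_size) {
    if (region_size <= POOL_HDR) return NULL;
    void *base = mmap(NULL, region_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return NULL;
    shm_pool *p = base;
    p->capacity = region_size - POOL_HDR;
    atomic_init(&p->used, 0);
    return p;
}

/* Reserve `size` bytes; returns an offset from the pool base, 0 on failure.
 * The CAS loop lets several threads or processes bump without a lock. */
static size_t pool_alloc(shm_pool *p, size_t size) {
    if (size == 0 || size > p->capacity) return 0;
    size_t need = (size + 15) & ~(size_t)15;
    size_t old = atomic_load(&p->used);
    do {
        if (need > p->capacity - old) return 0;   /* pool exhausted */
    } while (!atomic_compare_exchange_weak(&p->used, &old, old + need));
    return POOL_HDR + old;
}

/* Turn an offset into a pointer that is valid in *this* process only. */
static void *pool_ptr(shm_pool *p, size_t off) {
    assert(off == 0 || off >= POOL_HDR);
    return off ? (char *)p + off : NULL;
}
```

With two pools created by pool_init, a caller would pick one by whatever selection policy it likes, call pool_alloc on it, and store or pass the returned offset; each process converts the offset with pool_ptr against its own mapping. A real allocator would also need free lists and size classes, which is where most of the effort lies.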

Yes. But this was not true when you asked the question.

Jemalloc 4 (released in August of 2015) has a couple of mallctl namespaces that would be useful for this purpose; they allow you to specify per-arena, application-specific chunk allocation hooks. In particular, the arena.<i>.chunk_hooks namespace and the arenas.extend mallctl options are of use. An integration test exists that demonstrates how to consume this API.

Regarding the rationale, I would expect that the effective "messaging" overhead required to understand where contention on any particular memory segment lies would be similar to the overhead of just contending, since you're going to degrade into contending on a cache line to accurately update the "contention" value of a particular arena.

Since jemalloc already employs a number of techniques to reduce contention, you could get similar behavior in a highly threaded environment by creating additional arenas with opt.narenas. This would reduce contention, as fewer threads would be mapped to each arena, but since threads are effectively assigned to arenas round-robin, you may still end up with hot spots.

To get around this, you could do your contention counting and hotspot detection, and simply use the thread.arena mallctl interface to switch a thread onto an arena with less contention.
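
The contention counting described above could look roughly like the following sketch. It only covers the bookkeeping: the names arena_load, arena_enter, arena_leave and least_loaded_arena are hypothetical, the arena count and the 64-byte cache-line size are assumptions, and the counters are padded to separate cache lines so that updating one arena's count does not bounce the line holding another's. Switching the thread is left as a comment because it needs jemalloc linked in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_ARENAS 4     /* hypothetical number of arenas being tracked */
#define CACHE_LINE 64    /* assumed cache-line size in bytes */

/* One counter per arena, each on its own cache line. */
typedef struct {
    _Alignas(CACHE_LINE) atomic_uint active;
} arena_load;

static arena_load load[NUM_ARENAS];

/* Call around allocator activity on arena i to record current pressure. */
static void arena_enter(unsigned i) {
    assert(i < NUM_ARENAS);
    atomic_fetch_add_explicit(&load[i].active, 1, memory_order_relaxed);
}

static void arena_leave(unsigned i) {
    assert(i < NUM_ARENAS);
    atomic_fetch_sub_explicit(&load[i].active, 1, memory_order_relaxed);
}

/* Return the index with the smallest counter (lowest index wins ties).
 * The reads are racy by design: this is a heuristic, not a guarantee. */
static unsigned least_loaded_arena(void) {
    unsigned best = 0;
    unsigned best_load =
        atomic_load_explicit(&load[0].active, memory_order_relaxed);
    for (unsigned i = 1; i < NUM_ARENAS; i++) {
        unsigned cur =
            atomic_load_explicit(&load[i].active, memory_order_relaxed);
        if (cur < best_load) {
            best = i;
            best_load = cur;
        }
    }
    return best;
}

/* With jemalloc linked in, the chosen index could then be applied with
 *   unsigned idx = least_loaded_arena();
 *   mallctl("thread.arena", NULL, NULL, &idx, sizeof(idx));
 * assuming the counters are indexed by real jemalloc arena indices. */
```

Note that every arena_enter/arena_leave pair is itself a shared write, so this bookkeeping has a cost of its own, which is the trade-off raised earlier in this answer.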

Merkel answered 27/1, 2016 at 3:3 Comment(1)
The API changed with jemalloc v5 (2017). The canonware.org links are broken because the server is down, and the "integration test" link is broken because the file is gone. I believe the relevant links are now: arena.<i>.extent_hooks and arenas.create, and the integration test is github.com/jemalloc/jemalloc/blob/master/test/integration/… - Mccann
