malloc function interposition in the standard C and C++ libraries

I want to write a shared library in such a way that it is possible to isolate its memory usage from the application it is linked against. That is, if the shared library, let’s call it libmemory.so, calls malloc, I want to maintain that memory in a separate heap from the heap that is used to service calls to malloc made in the application. This question isn't about writing memory allocators; it is more about linking and loading the library and application together.

So far I’ve been experimenting with combinations of function interposition, symbol visibility, and linking tricks, but I can’t get this right because of one thing: the standard library. I cannot find a way to distinguish between standard library calls that internally use malloc when they are made from libmemory.so and the same calls made from the application. This is a problem because any standard library usage within libmemory.so pollutes the application heap.

My current strategy is to interpose definitions of malloc in the shared library as a hidden symbol. This works nicely and all of the library code works as expected except, of course, the standard library which is loaded dynamically at runtime. Naturally, I’ve been trying to find a way to statically embed the standard library usage so that it would use the interposed malloc in libmemory.so at compile time. I’ve tried -static-libgcc and -static-libstdc++ without success (and anyway, it seems this is discouraged). Is this the right answer?

What should I do?

P.S. Further reading is always appreciated, and help on the question tagging front would be nice.

Medicate answered 28/1, 2016 at 21:34 Comment(7)
Unlike Windows, the Unix userland is very strongly biased toward everything in a process using the same allocator. Merely having two implementations of malloc in the same process is liable to cause catastrophic malfunctions, because nobody worries about matching A_malloc with A_free and B_malloc with B_free, the implementations might get in a fight over which of them is allowed to call sbrk with a nonzero argument, etc. etc. etc. etc. - Driskell
Therefore, in order to give any sort of helpful answer, we need to know a whole lot more about why you think you need to do this thing in the first place. What is the larger problem you are trying to solve? Why does this seem like the path of least resistance? It would also be useful to know generally what libmemory.so does and which C and C++ library functions it uses. - Driskell
Thanks for the reply @zwol. I'm building a distributed shared memory allocator as a shared object and want to find a way to separate memory used by the shared object from memory used by the application. The shared object is complex: it has an allocator, has an embedded HTTP server, etc. It makes ample use of memory itself (albeit memory from a special allocator that I use throughout all the library code I write). The problem, as stated, is the memory the standard library uses: e.g., a call to fopen or gmtime_r might internally call malloc, thereby causing the application's heap allocator to be used. - Medicate
The first thing that comes to mind is: maybe the bulk of what the shared library does should be extracted to a separate daemon process. - Driskell
That is a substantial design change but is certainly possible. Are there any other approaches I can investigate with my current setup before I change course? - Medicate
I realize now that you didn't really answer my question. Why is it bad for the memory allocated directly or indirectly by your shared-object allocator to be lumped with the memory allocated by the application? - Driskell
I've been mulling this question over in my head, too. From a purist's point of view this just feels dirty. I would prefer to keep the memory separate if possible. It sounds like what I want is simply not possible, which makes the question you're asking more important. I think it may not matter; it is just distasteful, since I would be mixing local per-process memory with the distributed application's memory. Maybe I'll look into your daemon idea more, since that achieves complete isolation. - Medicate

I’ve tried -static-libgcc and -static-libstdc++ without success

Of course this wouldn't succeed: malloc doesn't live in libgcc or libstdc++; it lives in libc.

What you want to do is statically link libmemory.so with some alternative malloc implementation, such as tcmalloc or jemalloc, and hide all malloc symbols. Then your library and your application will have absolutely separate heaps.
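As a rough illustration of the symbol-hiding half of this, here is a minimal sketch assuming GCC or Clang on an ELF platform. The names `libmemory_alloc`, `libmemory_free`, `libmemory_owns`, and `LIB_HIDDEN` are hypothetical, and the fixed bump arena is only a stand-in for a real embedded allocator such as tcmalloc or jemalloc. In the actual library the hidden definitions would carry the names malloc, free, calloc, realloc, and so on; distinct names are used here only so the sketch can be compiled and run on its own without disturbing the system allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical private heap: a fixed arena with a bump pointer.
 * A real library would embed a full allocator here instead. */
#define ARENA_SIZE (64 * 1024)

static_assert(ARENA_SIZE % _Alignof(max_align_t) == 0,
              "arena size must be a multiple of the maximum alignment");

static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Hidden visibility keeps the symbol out of the shared object's dynamic
 * symbol table, so neither the application nor libc can bind to it. */
#define LIB_HIDDEN __attribute__((visibility("hidden")))

LIB_HIDDEN void *libmemory_alloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    if (size > SIZE_MAX - (align - 1))
        return NULL;                      /* rounding would overflow */
    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded > ARENA_SIZE - arena_used)
        return NULL;                      /* private arena exhausted */
    void *p = &arena[arena_used];
    arena_used += rounded;
    return p;
}

/* A bump allocator never reclaims memory; a real allocator would. */
LIB_HIDDEN void libmemory_free(void *p) { (void)p; }

/* Returns nonzero if p points into the library's private arena. */
LIB_HIDDEN int libmemory_owns(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)arena;
    return addr >= base && addr - base < ARENA_SIZE;
}
```

Building the library with `-fPIC -shared -fvisibility=hidden` would hide everything that is not explicitly exported, which is usually less error-prone than annotating each function by hand. Note that this only isolates calls made from the library's own object code; it does not change which malloc the dynamically linked libc uses internally.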

It goes without saying that you must never allocate something in your library and free it in the application, or vice versa.
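One common way to enforce that rule is to expose only create/destroy pairs, so memory is always released by the same allocator that produced it. The following is a sketch under assumptions: the `libmemory_buffer` type and its functions are hypothetical, and `lib_alloc`/`lib_free` forward to the system malloc purely so the example is runnable by itself. In the real library they would be the hidden private allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object whose storage is owned entirely by the library. */
typedef struct libmemory_buffer {
    size_t len;
    char  *data;
} libmemory_buffer;

/* Stand-ins for the library's hidden allocator (assumption: in the real
 * library these would not be the system malloc/free). */
static void *lib_alloc(size_t n) { return malloc(n); }
static void  lib_free(void *p)   { free(p); }

/* Allocates the object and a private copy of text from the library heap. */
libmemory_buffer *libmemory_buffer_create(const char *text)
{
    assert(text != NULL);
    libmemory_buffer *b = lib_alloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->len = strlen(text);
    b->data = lib_alloc(b->len + 1);
    if (b->data == NULL) {
        lib_free(b);
        return NULL;
    }
    memcpy(b->data, text, b->len + 1);   /* includes the terminating NUL */
    return b;
}

const char *libmemory_buffer_data(const libmemory_buffer *b) { return b->data; }
size_t      libmemory_buffer_len(const libmemory_buffer *b)  { return b->len; }

/* The application calls this instead of free(), so the memory goes back
 * to the allocator it came from. Passing NULL is a no-op. */
void libmemory_buffer_destroy(libmemory_buffer *b)
{
    if (b == NULL)
        return;
    lib_free(b->data);
    lib_free(b);
}
```

With this shape the application never sees a raw pointer it might be tempted to pass to its own free; it reads through the accessors and hands the object back through `libmemory_buffer_destroy`.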

In theory you could also link the malloc part of the system libc.a into your library, but in practice GLIBC, like most other UNIX C libraries, does not support partially static linking: if you link libc.a, you must not also link libc.so.

Update:

If libmemory.so makes use of a standard library function, e.g., gmtime_r, which is linked in dynamically, thereby resolving malloc at runtime, then libmemory.so mistakenly uses malloc provided at runtime (the one apparently from glibc)

There is nothing mistaken about that. Since you've hidden your malloc inside your library, there is no other malloc that gmtime_r could use.

Also, gmtime_r doesn't allocate memory, except for internal use by GLIBC itself, and such memory could be cleaned up by __libc_freeres, so it would be wrong to allocate this memory anywhere other than using GLIBC's malloc.

Now, fopen is another example you used, and fopen does malloc memory. Apparently you would like fopen to call your malloc (even though it's not visible to fopen) when called by your library, but call system malloc when called by the application. But how can fopen know who called it? Surely you are not suggesting that fopen walk the stack to figure out whether it was called by your library or by something else?

So, if you really want to make your library never call into system malloc, then you would have to statically link all other libc functions that you use and that may call malloc (and hide them in your library as well).

You could use something like uclibc or dietlibc to achieve that.

Mccallum answered 30/1, 2016 at 19:30 Comment(3)
Thanks for the response. I am linking libmemory.so with my own malloc symbols and I am also hiding these symbols in libmemory.so. libmemory.so's use of malloc is taken care of perfectly fine. The problem, per the original question, is in regards to the standard library. If libmemory.so makes use of a standard library function, e.g., gmtime_r, which is linked in dynamically, thereby resolving malloc at runtime, then libmemory.so mistakenly uses malloc provided at runtime (the one apparently from glibc -- thanks for that). Does this clarify, or am I missing something? - Medicate
Thanks, this is aligned with how I am thinking about the problem. I was under the impression that I couldn't use something like uclibc, dietlibc, or musl without replacing libc for the entire application. It sounds like you're suggesting that statically linking all libc functions that libmemory.so uses can be done (thereby using libmemory.so's internal malloc implementation) without touching the application: it would still use the normal dynamically linked libc. - Medicate
Is this impression incorrect? Can I build musl and statically link it to my libmemory.so without screwing up the other libc? - Medicate
