I've been experimenting with MALLOC_MMAP_THRESHOLD_ and MALLOC_MMAP_MAX_ env variables to affect memory management in a long-running Python 2 process. See http://man7.org/linux/man-pages/man3/mallopt.3.html
I got the idea from this bug report: http://bugs.python.org/issue11849
The results I have are encouraging: memory fragmentation is reduced and the typical high-water mark visible in memory used by long-running processes is lower.
My only concern is if there are other side effects that may bite back, when using such low level tweaks. Does anyone have any experience in using them?
Here is an example script that shows how those variables affect RSS memory in a script that generate a large dictionary: https://gist.github.com/lbolla/8e2640133032b0a6bb9c Just run "alloc.sh" and compare the output. Here is the output for me:
MALLOC_MMAP_THRESHOLD_=None MALLOC_MMAP_MAX_=None
N=9 RSS=120968
MALLOC_MMAP_THRESHOLD_=512 MALLOC_MMAP_MAX_=None
N=9 RSS=157008
MALLOC_MMAP_THRESHOLD_=1024 MALLOC_MMAP_MAX_=None
N=9 RSS=98484
MALLOC_MMAP_THRESHOLD_=2048 MALLOC_MMAP_MAX_=None
N=9 RSS=98484
MALLOC_MMAP_THRESHOLD_=4096 MALLOC_MMAP_MAX_=None
N=9 RSS=98496
MALLOC_MMAP_THRESHOLD_=100000 MALLOC_MMAP_MAX_=None
N=9 RSS=98528
MALLOC_MMAP_THRESHOLD_=512 MALLOC_MMAP_MAX_=0
N=9 RSS=121008
MALLOC_MMAP_THRESHOLD_=1024 MALLOC_MMAP_MAX_=0
N=9 RSS=121008
MALLOC_MMAP_THRESHOLD_=2048 MALLOC_MMAP_MAX_=0
N=9 RSS=121012
MALLOC_MMAP_THRESHOLD_=4096 MALLOC_MMAP_MAX_=0
N=9 RSS=121000
MALLOC_MMAP_THRESHOLD_=100000 MALLOC_MMAP_MAX_=0
N=9 RSS=121008
MALLOC_MMAP_THRESHOLD_=512 MALLOC_MMAP_MAX_=16777216
N=9 RSS=157004
MALLOC_MMAP_THRESHOLD_=1024 MALLOC_MMAP_MAX_=16777216
N=9 RSS=98484
MALLOC_MMAP_THRESHOLD_=2048 MALLOC_MMAP_MAX_=16777216
N=9 RSS=98484
MALLOC_MMAP_THRESHOLD_=4096 MALLOC_MMAP_MAX_=16777216
N=9 RSS=98496
MALLOC_MMAP_THRESHOLD_=100000 MALLOC_MMAP_MAX_=16777216
N=9 RSS=98528
As you can see, RSS used is about 20% less than vanilla Python for this example.