I have a large read-only data structure (a graph loaded in networkx, though this shouldn't be important) that I use in my web service. The web service is built in Flask and served through Gunicorn. It turns out that every Gunicorn worker I spin up holds its own copy of the data structure. Thus, my ~700 MB data structure, which is perfectly manageable with one worker, turns into a pretty big memory hog when I have 8 of them running. Is there any way I can share this data structure between Gunicorn processes so I don't have to waste so much memory?
Sharing Memory in Gunicorn?
Have you considered using something like Redis to store the data and access it from each process? Would be very similar to shared memory as far as speed goes. –
Ionize
I would, but we're talking about a complex graph that there's no easy way to store in Redis (Redis has no directed edge graphs or general graph support currently AFAIK). –
Interbreed
Did the solution work for you? If yes, can you let me know in detail how you did it? –
Indisposition
It looks like the easiest way to do this is to tell gunicorn to preload your application using the preload_app option, so the application is imported once in the master process before the workers are forked. This assumes that you can load the data structure as a module-level variable:
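from flask import Flask
from your.application import CustomDataStructure

# Loaded once at import time, in the gunicorn master when preload_app is set.
CUSTOM_DATA_STRUCTURE = CustomDataStructure('/data/lives/here')
app = Flask(__name__)
# @app.route handlers, etc.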
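To see why loading before the fork matters, here is a minimal stdlib-only sketch of the mechanism. It is not gunicorn's actual code: `BIG_DATA`, `worker_report` and `fork_workers` are made-up names, the dictionary is only a stand-in for the real graph, and it assumes a Unix system where `os.fork` is available.

```python
import os

# Stand-in for the expensive load; a real service would build the graph here.
# This runs once, in the parent, before any child process exists.
BIG_DATA = {n: n * n for n in range(100_000)}


def worker_report():
    """Describe what a worker sees: its pid, the object's address, its size."""
    return f"{os.getpid()} {id(BIG_DATA)} {len(BIG_DATA)}"


def fork_workers(count):
    """Fork `count` children after the data is loaded and collect their reports."""
    reports = []
    for _ in range(count):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: report through the pipe and exit immediately
            os.close(read_fd)
            os.write(write_fd, worker_report().encode())
            os._exit(0)
        os.close(write_fd)  # parent keeps only the read end
        with os.fdopen(read_fd) as pipe:
            reports.append(pipe.read().split())
        os.waitpid(pid, 0)
    return reports


if __name__ == "__main__":
    for pid, address, size in fork_workers(3):
        print(f"worker {pid}: object at {address}, {size} entries")
```

Each child reports the same object address as the parent without loading anything itself, because it inherited the parent's memory at fork time. With gunicorn the same ordering should come from starting the server with `gunicorn --preload your.module:app` (the module path is a placeholder) or from setting `preload_app = True` in the config file passed with `-c`.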
Alternatively, you could use a memory-mapped file (if you can wrap the shared memory with your custom data structure), run gunicorn with gevent workers so that a single process is enough, or use the multiprocessing module to spin up your own data-structure server that you connect to over IPC.
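As a rough illustration of the memory-mapped option, the sketch below stores a graph as flat integer arrays (a CSR-style layout) and wraps the mapping in a small read-only class. Everything here is an assumption made for the example: the file layout, the names `write_csr` and `MappedGraph`, and the requirement that nodes are numbered 0..n-1. A networkx graph would need to be converted to such a layout first.

```python
import mmap
import os
import struct
import tempfile


def write_csr(path, adjacency):
    """Write {node: [neighbors]} (nodes numbered 0..n-1) as a flat binary file.

    Layout: n, m, then n+1 offsets, then m neighbor ids, all little-endian uint32.
    """
    n = len(adjacency)
    offsets, targets = [0], []
    for node in range(n):
        targets.extend(adjacency[node])
        offsets.append(len(targets))
    with open(path, "wb") as f:
        f.write(struct.pack("<II", n, len(targets)))
        f.write(struct.pack(f"<{n + 1}I", *offsets))
        f.write(struct.pack(f"<{len(targets)}I", *targets))


class MappedGraph:
    """Read-only graph view over a memory-mapped file written by write_csr."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.n, self.m = struct.unpack_from("<II", self._map, 0)
        self._offsets_at = 8                      # right after the header
        self._targets_at = 8 + 4 * (self.n + 1)   # right after the offsets

    def neighbors(self, node):
        """Return the neighbor ids of `node`, read straight from the mapping."""
        start, end = struct.unpack_from(
            "<II", self._map, self._offsets_at + 4 * node
        )
        return list(
            struct.unpack_from(
                f"<{end - start}I", self._map, self._targets_at + 4 * start
            )
        )

    def close(self):
        self._map.close()
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "graph.bin")
        write_csr(path, {0: [1, 2], 1: [2], 2: [0]})
        graph = MappedGraph(path)
        print(graph.neighbors(0))
        graph.close()
```

Every worker that opens the same file read-only maps the same pages from the operating system's page cache, so the data should exist in physical memory once regardless of the number of workers. The cost is that lookups go through `struct` rather than native Python objects.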
The preload option is not working. Can you provide an example of how to use it with some dummy data structure? –
Indisposition
@Indisposition - you're probably better off asking another question with an example of your setup and what's not working. –
Kaylakayle
I have posted the question here: #35915087. It would be great if you could take a look at it. Thanks in advance. –
Indisposition
A great read, although it didn't help me catch the parent process in my setup while using a Uvicorn worker. I did manage to stumble upon a solution that I think is even cleaner than the preload method: using a Python config file for gunicorn, passed with -c gconfig.py. –
Sello
Afaic, the "preload + module-level variable" method relies on the copy-on-write mechanism: processes share physical memory until they change the data. OP mentions that the data is read-only. But in Python, referencing a variable will change its reference count. I am not sure at which level the copy-on-write works, but I guess a change in refcount will create a copy of the Python object that references large arrays in memory, and not of the entire data. Could someone confirm this? –
Tog
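The refcount part of that comment can be checked directly. The following is a small sketch, assuming CPython, where the reference count is stored in the object's own header; `Node` and `refcount_delta` are made-up names for the illustration.

```python
import sys


class Node:
    """Tiny stand-in for one object inside a large read-only structure."""

    def __init__(self, payload):
        self.payload = payload


def refcount_delta():
    """Return how much the refcount changes when the object is merely referenced."""
    node = Node(bytes(1024))
    before = sys.getrefcount(node)
    reader = node  # a read-style reference; the payload is never modified
    after = sys.getrefcount(node)
    del reader
    return after - before


if __name__ == "__main__":
    print(refcount_delta())
```

The counter changes even though the data is only read, so in a forked worker that write should dirty the memory page holding the object's header. Copy-on-write operates on whole pages rather than on Python objects, so a large buffer that the object merely points to, such as an array's data block, lives in other pages and can stay shared. A structure made of many small Python objects, like a dict-based graph, is likely to see more of its pages copied over time. CPython 3.7 and later also provide `gc.freeze()`, which can be called before forking to keep the garbage collector from touching existing objects.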