Sharing Memory in Gunicorn?


Asked 2/12, 2014 at 1:14 Answered 2/12, 2014 at 6:2

I have a large read-only data structure (a graph loaded in networkx, though this shouldn't be important) that I use in my web service. The web service is built in Flask and then served through Gunicorn. It turns out that every gunicorn worker I spin up holds its own copy of my data structure. Thus, my ~700 MB data structure, which is perfectly manageable with one worker, turns into a pretty big memory hog when I have 8 of them running. Is there any way I can share this data structure between gunicorn processes so I don't have to waste so much memory?

Interbreed answered 2/12, 2014 at 1:14 Comment(3)

Have you considered using something like Redis to store the data and access it from each process? Would be very similar to shared memory as far as speed goes. – Ionize 2/12, 2014 at 1:45

I would, but we're talking about a complex graph that there's no easy way to store in Redis (Redis has no directed edge graphs or general graph support currently AFAIK). – Interbreed 2/12, 2014 at 1:55

Did the solution work for you? If yes, can you let me know in detail how you did it? – Indisposition 11/3, 2016 at 6:28

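It looks like the easiest way to do this is to tell gunicorn to preload your application using the preload_app option. With preloading, the application is imported in the master process before the workers are forked, so the workers can share the memory pages of the loaded data through copy-on-write. This assumes that you can load the data structure as a module-level variable: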

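from flask import Flask
from your.application import CustomDataStructure

app = Flask(__name__)

# Loaded once at import time. With preload_app enabled, this runs in the
# gunicorn master process before the workers are forked.
CUSTOM_DATA_STRUCTURE = CustomDataStructure('/data/lives/here')

# @app.route handlers, etc.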
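For reference, the preload_app option can be set in a gunicorn config file, which is itself plain Python. The following is only a minimal sketch: the file name gunicorn.conf.py, the worker count (taken from the question), and the bind address are assumptions, not part of the original answer.

```python
# gunicorn.conf.py (hypothetical file name; pass it to gunicorn with -c)

# Import the application in the master process before forking workers,
# so module-level data is loaded once instead of once per worker.
preload_app = True

# Number of worker processes; 8 matches the setup described in the question.
workers = 8

# Address to listen on (assumed value for illustration).
bind = "127.0.0.1:8000"
```

Assuming the Flask module above is saved as myapp.py (a hypothetical name), this would be started with `gunicorn -c gunicorn.conf.py myapp:app`. The same setting is also available on the command line as `--preload`.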

Alternatively, you could use a memory-mapped file (if you can wrap the shared memory with your custom data structure), gevent with gunicorn to ensure that you're only using one process, or the multiprocessing module to spin up your own data-structure server which you connect to using IPC.
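As a rough illustration of the memory-mapped file option, here is a minimal sketch using the standard-library mmap module. The helper name open_shared_readonly and the tiny edge-list file are made up for the demo. It only helps if the data can be laid out as flat bytes in a file, which is not the case for a networkx graph object as-is.

```python
import mmap
import os
import tempfile


def open_shared_readonly(path):
    """Map a file read-only (hypothetical helper).

    Read-only mappings of the same file are backed by the OS page cache,
    so several processes mapping it can share the same physical memory.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping remains valid
        # after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


# Demo: a small temporary file stands in for the real data file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"node-a,node-b\nnode-b,node-c\n")

shared = open_shared_readonly(demo_path)

# Slicing reads straight from the mapping without loading the whole file.
first_edge = shared[: shared.find(b"\n")].decode()
```

Each worker would call the helper on the same path after it starts. The wrapping the answer mentions is the hard part: the custom data structure has to read from these bytes rather than from ordinary Python objects.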

Kaylakayle answered 2/12, 2014 at 6:2 Comment(5)

The preload option is not working. Can you provide an example of how to use it with a dummy data structure? – Indisposition 10/3, 2016 at 7:2

@Indisposition - you're probably better off asking another question with an example of your setup and what's not working. – Kaylakayle 10/3, 2016 at 15:39

I have posted the question here: #35915087. It would be great if you could look at it. Thanks in advance. – Indisposition 10/3, 2016 at 15:44

A great read, although it didn't help me catch the parent process while using a Uvicorn worker. I managed to stumble upon a solution that I think is even cleaner than the preload method: using a Python config file for gunicorn, passed with -c gconfig.py. – Sello 20/12, 2020 at 6:20

Afaic, the "preload + module-level variable" method relies on the copy-on-write mechanism -- that processes share physical memory until they change the data. OP mentions that the data is read only. But in Python referencing a variable will change its reference count. I am not sure at which level the copy-on-write works, but I guess a change in refcount will create a copy of the Python object that references large arrays in memory, and not of the entire data. Could someone confirm this? – Tog 19/4 at 3:24
