I am creating a data ingestion pipeline using Cloud Run. My Cloud Run service is invoked via Pub/Sub every time a file is dropped in a GCS bucket. I need to load some metadata that contains text for the data I am ingesting. This metadata changes infrequently, so I obviously do not want to reload it into memory on every execution. What is my best option? What I have been able to research so far is:
Option 1
You can also cache objects in memory if they are expensive to recreate on each service request. Moving this from the request logic to global scope results in better performance. https://cloud.google.com/run/docs/tips#run_tips_global_scope-java
In the example given at this link, does the heavyComputation function only get called once, at cold start? What if I need to re-trigger this function occasionally when the metadata is updated? I also find the following information troubling, in that it seems to say there is no guarantee whether other instances will reuse the object:
In Cloud Run, you cannot assume that service state is preserved between requests. However, Cloud Run does reuse individual container instances to serve ongoing traffic, so you can declare a variable in global scope to allow its value to be reused in subsequent invocations. Whether any individual request receives the benefit of this reuse cannot be known ahead of time.
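To make the re-trigger part concrete, this is roughly what I have in mind (a minimal sketch in Java, since that is the tab the link points to; loadMetadataFromGcs and the 10-minute TTL are placeholders, not my real code):

```java
import java.time.Duration;
import java.time.Instant;

public class MetadataCache {
    // Assumed refresh interval; would be tuned to how often the metadata changes.
    private static final Duration TTL = Duration.ofMinutes(10);

    // Immutable snapshot so the value and its load time are always read together.
    private record Snapshot(String value, Instant loadedAt) {}

    // Global scope: shared across requests served by this container instance.
    private static volatile Snapshot cached;

    public static String get() {
        Snapshot s = cached;
        if (s == null || Duration.between(s.loadedAt(), Instant.now()).compareTo(TTL) > 0) {
            synchronized (MetadataCache.class) {
                s = cached;
                // Re-check under the lock so only one request per instance reloads.
                if (s == null || Duration.between(s.loadedAt(), Instant.now()).compareTo(TTL) > 0) {
                    s = new Snapshot(loadMetadataFromGcs(), Instant.now());
                    cached = s;
                }
            }
        }
        return s.value();
    }

    // Hypothetical loader; the real one would read the metadata object from GCS.
    private static String loadMetadataFromGcs() {
        return "{\"example\": \"metadata\"}";
    }
}
```

As I understand it, the load would run once per container instance (not once per service), and the TTL would bound how stale any instance can get after a metadata update, without needing to redeploy.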
Option 2
Use something like Redis or Cloud Memorystore that is updated by a Cloud Function any time there are changes, with all instances of the Cloud Run service pulling metadata from Redis. Would this be less or more performant than option 1? Are there any other downsides to it?
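For reference, the read path I am imagining looks something like this (a sketch using Jedis; REDIS_HOST and the per-tenant key scheme are my assumptions, and as I understand it Memorystore is only reachable over private IP, so the Cloud Run service would also need a Serverless VPC Access connector):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class RedisMetadataClient {
    // The pool lives in global scope so connections are reused across requests.
    private static final JedisPool POOL =
        new JedisPool(System.getenv("REDIS_HOST"), 6379);

    public static String getMetadata(String tenantId) {
        try (Jedis jedis = POOL.getResource()) {
            // Hypothetical key scheme: one metadata blob per tenant,
            // written by the Cloud Function that reacts to metadata changes.
            return jedis.get("metadata:" + tenantId);
        }
    }
}
```

My understanding is that this trades option 1's in-process read for a network round trip per lookup, plus the operational cost of Memorystore and the connector, in exchange for all instances seeing updates immediately.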
If there are better ways of doing this, I would be very interested to hear them.
Update 1: I thought about it some more. Since my metadata is going to be different for each tenant, and each invocation of my Cloud Run code is going to ingest one file for one tenant, it would be a bad idea to load all tenants' metadata on each execution, even if it's cached. I might run separate Cloud Run services inside each tenant's project, though.
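A middle ground I am considering before splitting by project: keep the global-scope cache from option 1 but key it by tenant, so an instance only ever loads metadata for the tenants it actually serves. A sketch (loadMetadataForTenant is a placeholder; a real version would also need the TTL/refresh logic from the option 1 sketch, or a size-bounded cache such as Caffeine):

```java
import java.util.concurrent.ConcurrentHashMap;

public class TenantMetadataCache {
    // One entry per tenant, populated lazily as requests for that tenant arrive.
    private static final ConcurrentHashMap<String, String> CACHE = new ConcurrentHashMap<>();

    public static String get(String tenantId) {
        // computeIfAbsent loads at most once per tenant per container instance.
        return CACHE.computeIfAbsent(tenantId, TenantMetadataCache::loadMetadataForTenant);
    }

    // Placeholder: fetch this tenant's metadata from GCS, Redis, etc.
    private static String loadMetadataForTenant(String tenantId) {
        return "{\"tenant\": \"" + tenantId + "\"}";
    }
}
```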