I am creating a data ingestion pipeline using Cloud Run. My Cloud Run service is invoked via Pub/Sub every time a file is dropped in a GCS bucket. I need to load some metadata that contains text for the data I am ingesting. This metadata changes infrequently, so I obviously do not want to reload it into memory on every execution. What is my best option? What I have been able to research so far is:
Option 1
You can also cache objects in memory if they are expensive to recreate on each service request. Moving this from the request logic to global scope results in better performance. https://cloud.google.com/run/docs/tips#run_tips_global_scope-java
In the example given at this link, does the heavyComputation function only get called once, at cold start? What if I need to re-trigger this function occasionally when the metadata is updated? I also find the following information troubling, in that it seems to say there is no guarantee whether other instances will reuse the object:
In Cloud Run, you cannot assume that service state is preserved between requests. However, Cloud Run does reuse individual container instances to serve ongoing traffic, so you can declare a variable in global scope to allow its value to be reused in subsequent invocations. Whether any individual request receives the benefit of this reuse cannot be known ahead of time.
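To make the re-trigger part concrete, this is roughly what I have in mind (a minimal sketch in Java, since that is the tab the link points to; loadMetadataFromGcs and the 10-minute TTL are placeholders, not my real code):

```java
import java.time.Duration;
import java.time.Instant;

public class MetadataCache {
    // Assumed refresh interval; would be tuned to how often the metadata changes.
    private static final Duration TTL = Duration.ofMinutes(10);

    // Immutable snapshot so the value and its load time are always read together.
    private record Snapshot(String value, Instant loadedAt) {}

    // Global scope: shared across requests served by this container instance.
    private static volatile Snapshot cached;

    public static String get() {
        Snapshot s = cached;
        if (s == null || Duration.between(s.loadedAt(), Instant.now()).compareTo(TTL) > 0) {
            synchronized (MetadataCache.class) {
                s = cached;
                // Re-check under the lock so only one request per instance reloads.
                if (s == null || Duration.between(s.loadedAt(), Instant.now()).compareTo(TTL) > 0) {
                    s = new Snapshot(loadMetadataFromGcs(), Instant.now());
                    cached = s;
                }
            }
        }
        return s.value();
    }

    // Hypothetical loader; the real one would read the metadata object from GCS.
    private static String loadMetadataFromGcs() {
        return "{\"example\": \"metadata\"}";
    }
}
```

As I understand it, the load would run once per container instance (not once per service), and the TTL would bound how stale any instance can get after a metadata update, without needing to redeploy.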
Option 2
Use something like Redis or Cloud Memorystore that is updated by a Cloud Function any time there are changes, with all instances of the Cloud Run service pulling metadata from Redis. Would this be less or more performant than option 1? Are there any other downsides to it?
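For reference, the read path I am imagining looks something like this (a sketch using Jedis; REDIS_HOST and the per-tenant key scheme are my assumptions, and as I understand it Memorystore is only reachable over private IP, so the Cloud Run service would also need a Serverless VPC Access connector):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

public class RedisMetadataClient {
    // The pool lives in global scope so connections are reused across requests.
    private static final JedisPool POOL =
        new JedisPool(System.getenv("REDIS_HOST"), 6379);

    public static String getMetadata(String tenantId) {
        try (Jedis jedis = POOL.getResource()) {
            // Hypothetical key scheme: one metadata blob per tenant,
            // written by the Cloud Function that reacts to metadata changes.
            return jedis.get("metadata:" + tenantId);
        }
    }
}
```

My understanding is that this trades option 1's in-process read for a network round trip per lookup, plus the operational cost of Memorystore and the connector, in exchange for all instances seeing updates immediately.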
If there are better ways of doing this, I would be very interested to hear them.
Update 1: I thought about it some more. Since my metadata is going to be different for each tenant, and each invocation of my Cloud Run code is going to ingest one file for one tenant, it would be a bad idea to load all tenants' metadata on each execution, even if it's cached. I might run separate Cloud Run services inside each tenant's project, though.
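A middle ground I am considering before splitting by project: keep the global-scope cache from option 1 but key it by tenant, so an instance only ever loads metadata for the tenants it actually serves. A sketch (loadMetadataForTenant is a placeholder; a real version would also need the TTL/refresh logic from the option 1 sketch, or a size-bounded cache such as Caffeine):

```java
import java.util.concurrent.ConcurrentHashMap;

public class TenantMetadataCache {
    // One entry per tenant, populated lazily as requests for that tenant arrive.
    private static final ConcurrentHashMap<String, String> CACHE = new ConcurrentHashMap<>();

    public static String get(String tenantId) {
        // computeIfAbsent loads at most once per tenant per container instance.
        return CACHE.computeIfAbsent(tenantId, TenantMetadataCache::loadMetadataForTenant);
    }

    // Placeholder: fetch this tenant's metadata from GCS, Redis, etc.
    private static String loadMetadataForTenant(String tenantId) {
        return "{\"tenant\": \"" + tenantId + "\"}";
    }
}
```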