How does AWS Lambda serve multiple requests?
How does AWS Lambda serve multiple requests? I want to know: is it a multi-threaded kind of model here as well?

If I am calling a Lambda from API Gateway and there are 1000 requests to the API in 10 seconds, how many containers will be created and how many threads?

Ingamar answered 20/8, 2017 at 9:49 Comment(4)
What is your exact use case? Your question is too vague. – Kittle
Also, the unit of concurrency differs by use case: for stream-based event sources like Kinesis it is the number of shards, and for non-stream-based sources it is based on events (the number of DynamoDB items created, S3 objects pushed, or AWS API calls made). docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html – Kittle
If I am calling a Lambda from API Gateway and there are 1000 requests to the API in 10 seconds, how many containers will be created and how many threads? – Ingamar
1000 containers at default, IIRC, running simultaneously, and each execution is single-threaded. The parallelism is at a different abstraction layer, the container level (think separate hosts or Docker containers, depending on the implementation), rather than OS-level threads. At the client layer, this implementation detail should hold no meaning. Of course, if you need parallelism within each invocation, you would have to write your application in that fashion (e.g. the Node.js runtime would allow you to spawn a lot of async work). – Kittle

How does AWS Lambda serve multiple requests?

Independently.

I want to know: is it a multi-threaded kind of model here as well?

No, it is not a multi-threaded model in the sense that you are asking.

Your code can, of course, be written to use multiple threads and/or child processes to accomplish whatever purpose it is intended to accomplish for one invocation, but Lambda doesn't send more than one invocation at a time to the same container. The container is not used for a second invocation until the first one finishes. If a second request arrives while a first one is running, the second one will run in a different container.
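
This is easy to observe with a hypothetical Python handler: module-level state survives across invocations that land on the same container, yet it never needs locking, because a container processes only one invocation at a time.

```python
# Counter initialized once per container, at cold start.
invocation_count = 0

def lambda_handler(event, context):
    # Safe without any locking: Lambda never runs two invocations
    # concurrently in the same container. Concurrent requests go to
    # separate containers, each with its own copy of this counter.
    global invocation_count
    invocation_count += 1
    return {"container_invocations": invocation_count}
```

On a warm container the counter climbs with each sequential reuse; a concurrent request would instead see a fresh container where the count starts again at 1.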

If I am calling a Lambda from API Gateway and there are 1000 requests to the API in 10 seconds, how many containers will be created and how many threads?

As many containers will be created as are needed to process each of the arriving requests in its own container.

The duration of each invocation will be the largest determinant of this.

1000 very quick requests in 10 seconds are roughly equivalent to 100 requests in 1 second. Assuming each request finishes in less than 1 second and arrival times are evenly distributed, you could expect fewer than 100 containers to be created.

On the other hand, if 1000 requests arrived in 10 seconds and each request took 30 seconds to complete, you would have 1000 containers in existence during this event.
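
These estimates are an instance of Little's law: expected concurrency is roughly the arrival rate multiplied by how long each request holds a container. A rough sketch, using the hypothetical numbers from this answer:

```python
def estimated_containers(total_requests, window_seconds, duration_seconds):
    """Back-of-envelope concurrency estimate (Little's law):
    concurrency ~= arrival rate * time each request holds a container,
    capped by the total number of requests that exist."""
    arrival_rate = total_requests / window_seconds  # requests per second
    return min(arrival_rate * duration_seconds, total_requests)

# 1000 requests over 10 s, each finishing in about 1 s: ~100 containers.
fast = estimated_containers(1000, 10, 1)
# Same traffic, but each request takes 30 s: all 1000 requests are in
# flight at once, so ~1000 containers.
slow = estimated_containers(1000, 10, 30)
```

This ignores cold-start time and uneven arrivals, so treat it as a lower bound on the real container count rather than a prediction.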

After a spike in traffic inflates the number of containers, they will all tend to linger for a few minutes, ready to handle the additional load if it arrives, and then Lambda will start terminating them.

Piefer answered 20/8, 2017 at 14:5 Comment(8)
Do you have any reference for this statement: "The container is not used for a second invocation until the first one finishes."? And does that statement apply only to Java (per the OP) or also to Node? Thanks! – Overripe
@JohnLee A clear, unambiguous, authoritative citation is difficult to find, but this applies to all Lambda runtimes, and it is relatively easy to prove for yourself -- the /tmp directory and global data structures are two ways to do it -- they both persist across invocations when reuse occurs, but you will not be able to create a setup where two concurrent invocations share anything. You pay for a fixed allocation of memory, CPU, and disk for the duration of each invocation, and if multiple invocations shared a container, the performance would clearly go down... but it doesn't. – Piefer
@JohnLee Here we go ("instance" = "container"): "When your function is invoked more quickly than a single instance of your function can process events, Lambda scales by running additional instances. Each instance of your function handles only one request at a time, so you don't need to worry about synchronizing threads or processes. You can, however, use asynchronous language features to process batches of events in parallel, and save data to the /tmp directory for use in future invocations on the same instance." – Piefer
Will all these containers be created on the same machine? If not, how do you get hold of a long-running child process created in a previous invocation during the next invocation? – Intracutaneous
Or, if my function implements a long-running producer-consumer workload, then in case of a Lambda timeout, how will it get hold of all of the producers, consumers, and their states during the next invocation? – Intracutaneous
So running Node on Lambda clearly takes away its async/event-loop concept and makes it truly single-threaded? I've been trying to research this but there is little info to be found. – Tortious
@AndreasBergström Well, no, not in the way you're suggesting. Node on Lambda works the same way as Node elsewhere. The difference is that one Node.js process is never presented with more than one request from the outside world at the same time. If two invocations of the same Lambda function are being handled at the same time, there will be two different Node processes in two different containers, one handling each. Part of the magic of Lambda is the ability to create these independent containers as needed (and later destroy idle ones) without you paying for the hardware they run on. – Piefer
@Michael-sqlbot "Node on Lambda works the same way as Node elsewhere" -- that can't be true if I'm understanding Lambda correctly. When running a Node HTTP server outside FaaS, a single Node process/thread can handle other requests while awaiting I/O from libuv. In Lambda, invocations are sent to containers/functions one at a time. This means the whole async I/O aspect of Node is irrelevant? Since two requests a few ms apart will invoke two different containers instead of ending up on the same Node process's microtask queue? – Tortious

There are a few angles to discuss.

AWS Lambda does support handling requests in parallel, but any single instance / container of a Lambda will only process one request at a time. If all existing instances are busy then new ones will be provisioned (depending on concurrency settings, discussed below).

Within a single Lambda instance multi-threading is supported, but still only one request will be handled per instance. In practice, parallelization is rarely beneficial in Lambda: it adds significant overhead and is best reserved for processing very large data sets. Additionally, a Lambda needs more than one virtual core for threading to pay off, and cores are configured by raising the memory setting -- many Lambdas run with a memory setting low enough to have just one core.
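
As an illustration, here is a hypothetical Python handler that parallelizes work inside one invocation (the `double` helper and the `items` field are invented for the example). The instance still receives only one event at a time; the parallelism lives entirely inside that single request, and it only helps when the memory setting grants more than one vCPU and the per-item work is substantial.

```python
from concurrent.futures import ThreadPoolExecutor

def double(item):
    # Placeholder for real per-item work. I/O-bound work benefits most,
    # since CPU-bound threads contend for the instance's limited vCPUs.
    return item * 2

def lambda_handler(event, context):
    # Fan the batch out across threads *within* this one invocation;
    # map() preserves the input order in the results.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return {"results": list(pool.map(double, event.get("items", [])))}
```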

Determining exactly how many containers / instances are created isn't always possible due to there being many factors:

  • Lambda will reuse any existing, paused instances
  • Existing instances handle requests very quickly; a small number of warm instances can process many, many requests in the time it takes to provision new ones (especially with runtimes like Java or .NET Core, which often have startup times of 1+ seconds)
  • The concurrency settings of your Lambda are a significant factor
    • If you have Reserved Concurrency of X, you will never have more than X instances
    • If you have unreserved concurrency, then the limit is based on available concurrency. This defaults to 1000 instances per account, so if 990 instances of any Lambdas already exist then only 10 could be created
    • If you have provisioned concurrency then you will always have a minimum number of instances, reducing cold-starts
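
These settings can be applied with the AWS CLI; a sketch, where the function name my-fn, the alias live, and the numbers are all hypothetical:

```shell
# Reserved concurrency: this function never runs more than 100 concurrent
# instances, and those 100 are carved out of the account's shared pool.
aws lambda put-function-concurrency \
    --function-name my-fn \
    --reserved-concurrent-executions 100

# Provisioned concurrency: keep 10 instances initialized ahead of traffic.
# This must target a published version or alias, here an alias named "live".
aws lambda put-provisioned-concurrency-config \
    --function-name my-fn \
    --qualifier live \
    --provisioned-concurrent-executions 10
```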

But, to try to answer your story problem, let's assume you are sending your 1000 requests at a steady pace over 10 minutes (stretching the question's 10 seconds out to keep the arithmetic simple). That's one request every 600 milliseconds. Let's also assume your Java app is given a fairly high memory allocation and its initialization is relatively quick -- say, 1 second for a cold start. Once the cold start is complete, invocation is fast -- say, 10ms. And let's assume there are no instances when the traffic begins.

The first request will see a response time of ~1,010ms -- 1 second for a cold start, and 10ms for handling the request. A second request will arrive while the first is still processing, so it's likely that Lambda will provision a second instance, and the second request will see a similar response time.

By the time the third request comes in (1,200ms after the start), the first instance is idle again and can be reused -- so this request will not experience a cold start, and the response time will be about 10ms. From this point forward it's likely that no additional instances are needed -- but this all assumes a steady rate of requests.

But--changing any variable can have a big impact.

Edythedythe answered 16/9, 2021 at 0:54 Comment(0)

AWS Lambda serves multiple requests by scaling horizontally across multiple containers. By default, Lambda supports up to 1000 concurrent container executions per account.

There are 1000 requests in 10 secs to the API. How many containers will be created and how many threads?

Requests per second = 1000/10 = 100

There will be 100 parallel Lambda executions assuming each execution takes 1 second or more to complete.

Note: You can also spawn multiple threads, but it's difficult to predict the performance gain.

Also keep in mind that having multiple threads is not always efficient. The CPU available to your Lambda function is shared between all threads and processes your Lambda function creates. Generally you will not get more CPU in a Lambda function by running work in parallel among multiple threads. Your code in this case isn't actually running on two cores, but on two "hyperthreads" on a single core; depending on the workload, this may be better or worse than a single thread. The service team is looking at ways to better leverage multiple cores in the Lambda execution environment, and we will take your feedback as a +1 for that feature.

Reference: AWS Forum Post

For further details on concurrent executions of Lambda, refer to this AWS documentation.

Bashaw answered 20/8, 2017 at 10:22 Comment(5)
"Note: Lambda can support up to 1000 (combined total of processes and threads) concurrent executions by default." This isn't correct. Processes and threads are not a factor here. 1000 concurrent invocations means 1000 containers. Each invocation is entirely independent of any others. Anything you do in your code with processes and threads applies to one invocation -- not across them. – Piefer
@Michael Thanks for the input. I had the same sense, but on further reading I found that info in the AWS docs: docs.aws.amazon.com/lambda/latest/dg/limits.html. Is it a typo, or is my interpretation wrong? – Bashaw
You have combined two unrelated concepts in your interpretation. Note the title of the table: AWS Lambda Resource Limits per Invocation. This is the limit for threads and processes within one Lambda invocation. Concurrent invocations don't interact with each other. Each of them, independently, can create up to 1024 threads/processes (rarely something you'd need), each of them has 512M temp space, each of them has the amount of memory you provisioned, and they don't compete with each other for CPU cycles. The concurrent invocation limit is a similar number, only by coincidence. – Piefer
Thanks @Michael. Agreed on your points; I have updated the answer accordingly for future reference. – Bashaw
One other nuance is the "burst concurrency quota" -- depending on the AWS Region you're in, you will be limited in how many new instances can be provisioned. Up to this quota AWS will immediately provision new instances, but beyond it new instances are created at 500 per minute, with excess requests receiving 429 errors. docs.aws.amazon.com/lambda/latest/dg/invocation-scaling.html – Edythedythe
