There are a few angles to discuss.
AWS Lambda does support handling requests in parallel, but any single instance / container of a Lambda will only process one request at a time. If all existing instances are busy then new ones will be provisioned (depending on concurrency settings, discussed below).
Within a single Lambda instance multi-threading is supported, but each instance still handles only one request at a time. In practice parallelization is rarely beneficial in Lambda: it adds significant overhead and is best reserved for processing very large data sets. It also only helps if the instance has more than one virtual core, and core count scales with the memory setting--many Lambdas run with a memory setting low enough to get just one core.
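If you do want to try threading, it's worth confirming the instance actually sees multiple cores first. A minimal sketch (the handler name and return shape here are illustrative, not anything AWS requires):

```python
import os

def handler(event, context):
    # Threads only buy real parallelism if the runtime sees more than one core.
    # On Lambda the visible core count grows with the memory setting.
    cores = os.cpu_count() or 1
    return {"cores": cores, "parallel_worthwhile": cores > 1}

# Outside Lambda this simply reports the local machine's core count:
print(handler({}, None))
```

Deploying this at a low memory setting and again at a high one shows the difference directly.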
Determining exactly how many containers/instances are created isn't always possible, because many factors are involved:
- Lambda will reuse any existing idle ("warm") instances
- Existing instances are often very fast to handle requests; a small number of warm instances can process many, many requests in the time it takes to provision new ones (especially with runtimes like Java or .NET Core, which often have startup times of 1+ seconds)
- The concurrency settings of your Lambda are a significant factor
- If you have Reserved Concurrency of X, you will never have more than X instances
- If you have unreserved concurrency, the limit is based on the account's available concurrency. This defaults to 1,000 concurrent executions per account, so if 990 instances of any Lambdas already exist, only 10 more could be created
- If you have provisioned concurrency then a minimum number of instances is always kept initialized, reducing cold starts
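The unreserved-pool arithmetic can be sketched in a few lines. The 1,000 figure is the default account limit mentioned above; reserved concurrency is carved out of that shared pool up front, which is an assumption this toy helper bakes in:

```python
def available_concurrency(account_limit, reserved_total, unreserved_in_use):
    # Reserved concurrency is subtracted from the account limit up front;
    # what's left is the pool shared by all functions without a reserved setting.
    unreserved_pool = account_limit - reserved_total
    return max(unreserved_pool - unreserved_in_use, 0)

# Default 1,000 limit, nothing reserved, 990 instances already running:
print(available_concurrency(1000, 0, 990))  # 10
```

Raising the account limit (a quota request to AWS) changes the first argument; everything else follows the same arithmetic.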
But, to try to answer your story problem, let's assume you send your 1,000 requests at a steady pace over the 10 minutes: one request every 600 milliseconds. Let's also assume your Java app has a fairly high memory allocation and initializes relatively quickly--say, 1 second for a cold start. Once the cold start is complete, invocation is fast--say, 10ms. Finally, assume there are no warm instances when the traffic begins.
The first request will see a response time of ~1,010ms -- 1 second for a cold start, and 10ms for handling the request. A second request will arrive while the first is still processing, so it's likely that Lambda will provision a second instance, and the second request will see a similar response time.
By the time the third request comes in (1,200ms after the first) the first instance is idle again and can be reused--so this request will not experience a cold start, and the response time will be ~10ms. The second instance frees up shortly after, and from that point forward it's likely that no additional instances are needed--but this all assumes a steady rate of requests.
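The whole scenario can be checked with a toy simulation. All the numbers are the assumptions above (600ms between requests, a 1,000ms cold start, 10ms of handling), and the instance-reuse policy is a simplification of whatever Lambda actually does internally:

```python
# Toy model of the story problem: a new instance pays a cold start,
# an idle instance handles the request immediately.
COLD_START_MS = 1000
HANDLE_MS = 10
INTERVAL_MS = 600

def simulate(num_requests):
    instances = []  # each entry: the time at which that instance becomes free
    responses = []  # response time seen by each request
    for i in range(num_requests):
        arrival = i * INTERVAL_MS
        idle = [j for j, free_at in enumerate(instances) if free_at <= arrival]
        if idle:
            # Reuse a warm instance: no cold start
            instances[idle[0]] = arrival + HANDLE_MS
            responses.append(HANDLE_MS)
        else:
            # All instances busy (or none exist): provision a new one
            instances.append(arrival + COLD_START_MS + HANDLE_MS)
            responses.append(COLD_START_MS + HANDLE_MS)
    return responses, len(instances)

responses, count = simulate(10)
print(count)          # 2 instances provisioned
print(responses[:4])  # [1010, 1010, 10, 10]
```

With these assumptions only two instances are ever provisioned and every request after the second is warm; making the traffic bursty or the cold start slower changes the answer quickly.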
But--changing any variable can have a big impact.