I asked this question on AWS re:Post:
https://repost.aws/questions/QULRcA_-73QxuAOyGYWhExng/aws-application-load-balancer-and-http-2-persistent-connections-keep-alive
And I got this answer, which covers every aspect of the question in good detail:
<< So, when it comes to the concurrent connection limits of an Application Load Balancer, there is no upper limit on the amount of traffic it can serve; it can scale automatically to meet the vast majority of traffic workloads.
An ALB will scale up aggressively as traffic increases, and scale down conservatively as traffic decreases. As it scales up, new higher capacity nodes will be added and registered with DNS, and previous nodes will be removed. This effectively gives an ALB a dynamic connection pool to work with.
When working with the client behavior you have described, the main attribute you'll want to look at when configuring your ALB will be the Connection Idle Timeout setting. By default, this is set to 60 seconds, but can be set to a value of up to 4000 seconds. In your situation, you can set a value that will meet your need to maintain long-term connections of up to 30 minutes without the connection being terminated, in conjunction with utilizing HTTP keep-alive options within your application.
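For example, the idle timeout is exposed as the load balancer attribute idle_timeout.timeout_seconds; a minimal boto3 sketch of raising it to 30 minutes (the ARN below is a placeholder, not from the answer):

```python
# Minimal sketch (boto3; the load balancer ARN is a placeholder): raise the
# idle timeout to 30 minutes so long-lived keep-alive connections are not
# dropped at the 60-second default.
import boto3

elbv2 = boto3.client("elbv2")

elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/1234567890abcdef",  # placeholder
    Attributes=[
        {"Key": "idle_timeout.timeout_seconds", "Value": "1800"},  # 30 minutes
    ],
)
```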
As you might expect, an ALB will start with an initial capacity that may not immediately meet your workload. But as stated above, the ALB will scale up aggressively and scale down conservatively, scaling up in minutes and down in hours, based on the traffic received. I highly recommend checking out our best-practices page for evaluating ELB to learn more about scaling and how you can test your application to better understand how an ALB will behave under your traffic load. I will highlight from this page that, depending on how quickly traffic increases, the ALB may return an HTTP 503 error if it has not yet fully scaled to meet demand, but it will ultimately scale to the necessary capacity. When load testing, we recommend that traffic be increased by no more than 50 percent over a five-minute interval.
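As an illustration of that load-testing guidance (purely a sketch; the start and target rates are made up, not from the answer), a ramp that grows by at most 50 percent every five minutes could be laid out like this:

```python
# Rough sketch of a ramp schedule that respects the "no more than 50 percent
# increase per five-minute interval" guidance. Rates are illustrative only.
def ramp_schedule(start_rate: float, target_rate: float, step_minutes: int = 5):
    """Yield (minute, connections_per_second) pairs, growing at most 50% per step."""
    minute, rate = 0, start_rate
    while rate < target_rate:
        yield minute, rate
        rate = min(rate * 1.5, target_rate)
        minute += step_minutes
    yield minute, target_rate

for minute, rate in ramp_schedule(start_rate=1.0, target_rate=10.0):
    print(f"t={minute:3d} min: {rate:.2f} new connections/sec")
```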
When it comes to pricing, ALBs are charged for each hour the ALB is running and for the number of Load Balancer Capacity Units (LCUs) used per hour. LCUs are measured on a set of dimensions across which traffic is processed: new connections, active connections, processed bytes, and rule evaluations. You are charged only for the dimension with the highest usage in a given hour.
As an example using the ELB Pricing Calculator, assuming the ~20,000 connections are ramped up by 10 connections per second, with an average connection duration of 30 minutes (1800 seconds) and sending 1 request every 4 seconds for a total of 1GB of processed data per hour, you could expect a rough cost output of:
1 GB per hour / 1 GB processed bytes per hour per LCU (EC2 instances and IP addresses as targets) = 1 processed-bytes LCU
10 new connections per second / 25 new connections per second per LCU = 0.40 new-connections LCUs
10 new connections per second x 1,800 seconds = 18,000 active connections
18,000 active connections / 3,000 connections per LCU = 6 active-connections LCUs
1 rule per request - 10 free rules = -9 paid rules per request; Max(-9, 0) = 0.00 paid rules per request
Max(1 processed-bytes LCU, 0.40 new-connections LCUs, 6 active-connections LCUs, 0 rule-evaluation LCUs) = 6 maximum LCUs
1 load balancer x 6 LCUs x 0.008 LCU price per hour x 730 hours per month = 35.04 USD
Application Load Balancer LCU usage charges (monthly): 35.04 USD
>>
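To sanity-check that calculator output, here is a short Python sketch of the same LCU arithmetic, using only the thresholds and price quoted above:

```python
# Reproduces the LCU math from the quoted calculator example.
# Per-LCU thresholds and price as quoted: 1 GB processed bytes, 25 new
# connections/sec, 3,000 active connections, first 10 rules free,
# 0.008 USD per LCU-hour, 730 hours per month.
new_conn_per_sec = 10
avg_conn_seconds = 1800
processed_gb_per_hour = 1
rules_per_request = 1

processed_bytes_lcu = processed_gb_per_hour / 1
new_conn_lcu = new_conn_per_sec / 25
active_conn_lcu = (new_conn_per_sec * avg_conn_seconds) / 3000
rule_eval_lcu = max(rules_per_request - 10, 0)

max_lcu = max(processed_bytes_lcu, new_conn_lcu, active_conn_lcu, rule_eval_lcu)
monthly_cost = 1 * max_lcu * 0.008 * 730  # one load balancer
print(f"{max_lcu:.0f} LCUs -> {monthly_cost:.2f} USD/month")  # 6 LCUs -> 35.04 USD/month
```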