I asked this question on AWS re:Post:
https://repost.aws/questions/QULRcA_-73QxuAOyGYWhExng/aws-application-load-balancer-and-http-2-persistent-connections-keep-alive
And I got this answer, which covers every aspect of the question in good detail:
<< So, when it comes to the concurrent connection limits of an Application Load Balancer, there is no upper limit on the amount of traffic it can serve; it can scale automatically to meet the vast majority of traffic workloads.
An ALB will scale up aggressively as traffic increases, and scale down conservatively as traffic decreases. As it scales up, new higher capacity nodes will be added and registered with DNS, and previous nodes will be removed. This effectively gives an ALB a dynamic connection pool to work with.
When working with the client behavior you have described, the main attribute you'll want to look at when configuring your ALB will be the Connection Idle Timeout setting. By default, this is set to 60 seconds, but can be set to a value of up to 4000 seconds. In your situation, you can set a value that will meet your need to maintain long-term connections of up to 30 minutes without the connection being terminated, in conjunction with utilizing HTTP keep-alive options within your application.
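For example, the idle timeout is exposed as the load balancer attribute idle_timeout.timeout_seconds; a minimal boto3 sketch of raising it to 30 minutes (the ARN below is a placeholder, not from the answer):

```python
# Minimal sketch (boto3; the load balancer ARN is a placeholder): raise the
# idle timeout to 30 minutes so long-lived keep-alive connections are not
# dropped at the 60-second default.
import boto3

elbv2 = boto3.client("elbv2")

elbv2.modify_load_balancer_attributes(
    LoadBalancerArn="arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-alb/1234567890abcdef",  # placeholder
    Attributes=[
        {"Key": "idle_timeout.timeout_seconds", "Value": "1800"},  # 30 minutes
    ],
)
```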
As you might expect, an ALB will start with an initial capacity that may not immediately meet your workload. But as stated above, the ALB will scale up aggressively and scale down conservatively, scaling up in minutes and down in hours, based on the traffic received. I highly recommend checking out our best-practices page for evaluating ELB to learn more about scaling and how you can test your application to better understand how an ALB will behave under your traffic load. I will highlight from this page that, depending on how quickly traffic increases, the ALB may return an HTTP 503 error if it has not yet fully scaled to meet demand, but it will ultimately scale to the necessary capacity. When load testing, we recommend that traffic be increased by no more than 50 percent over a five-minute interval.
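As an illustration of that load-testing guidance (purely a sketch; the start and target rates are made up, not from the answer), a ramp that grows by at most 50 percent every five minutes could be laid out like this:

```python
# Rough sketch of a ramp schedule that respects the "no more than 50 percent
# increase per five-minute interval" guidance. Rates are illustrative only.
def ramp_schedule(start_rate: float, target_rate: float, step_minutes: int = 5):
    """Yield (minute, connections_per_second) pairs, growing at most 50% per step."""
    minute, rate = 0, start_rate
    while rate < target_rate:
        yield minute, rate
        rate = min(rate * 1.5, target_rate)
        minute += step_minutes
    yield minute, target_rate

for minute, rate in ramp_schedule(start_rate=1.0, target_rate=10.0):
    print(f"t={minute:3d} min: {rate:.2f} new connections/sec")
```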
When it comes to pricing, ALBs are charged for each hour the ALB is running and for the number of Load Balancer Capacity Units (LCUs) used per hour. LCUs are measured on a set of dimensions across which traffic is processed: new connections, active connections, processed bytes, and rule evaluations. You are charged only for the dimension with the highest usage in a given hour.
As an example using the ELB Pricing Calculator, assuming the ~20,000 connections are ramped up by 10 connections per second, with an average connection duration of 30 minutes (1800 seconds) and sending 1 request every 4 seconds for a total of 1GB of processed data per hour, you could expect a rough cost output of:
1 GB per hour / 1 GB processed bytes per hour per LCU (EC2 instances and IP addresses as targets) = 1 processed-bytes LCU
10 new connections per second / 25 new connections per second per LCU = 0.40 new-connections LCUs
10 new connections per second x 1,800 seconds = 18,000 active connections
18,000 active connections / 3,000 connections per LCU = 6 active-connections LCUs
1 rule per request - 10 free rules = -9 paid rules per request; Max(-9, 0) = 0.00 paid rules per request
Max(1 processed-bytes LCU, 0.40 new-connections LCUs, 6 active-connections LCUs, 0 rule-evaluation LCUs) = 6 maximum LCUs
1 load balancer x 6 LCUs x 0.008 LCU price per hour x 730 hours per month = 35.04 USD
Application Load Balancer LCU usage charges (monthly): 35.04 USD
>>
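To sanity-check that calculator output, here is a short Python sketch of the same LCU arithmetic, using only the thresholds and price quoted above:

```python
# Reproduces the LCU math from the quoted calculator example.
# Per-LCU thresholds and price as quoted: 1 GB processed bytes, 25 new
# connections/sec, 3,000 active connections, first 10 rules free,
# 0.008 USD per LCU-hour, 730 hours per month.
new_conn_per_sec = 10
avg_conn_seconds = 1800
processed_gb_per_hour = 1
rules_per_request = 1

processed_bytes_lcu = processed_gb_per_hour / 1
new_conn_lcu = new_conn_per_sec / 25
active_conn_lcu = (new_conn_per_sec * avg_conn_seconds) / 3000
rule_eval_lcu = max(rules_per_request - 10, 0)

max_lcu = max(processed_bytes_lcu, new_conn_lcu, active_conn_lcu, rule_eval_lcu)
monthly_cost = 1 * max_lcu * 0.008 * 730  # one load balancer
print(f"{max_lcu:.0f} LCUs -> {monthly_cost:.2f} USD/month")  # 6 LCUs -> 35.04 USD/month
```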