Deeper understanding of the idle timeout and scaling behavior of ALB
When your web browser or mobile device makes a TCP connection to an Elastic Load Balancer, the connection is used for the request and the response, and then remains open for a short time for possible reuse. This period is known as the load balancer's idle timeout. If no data has been sent or received by the time the idle timeout period elapses, the load balancer closes the connection.
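As a minimal sketch of the rule described above (all names here are hypothetical; the real ALB data plane is far more involved), the idle timeout simply means: close any connection whose last sent or received byte is older than the timeout:

```python
from dataclasses import dataclass

@dataclass
class Connection:
    conn_id: str
    last_activity: float  # epoch seconds of the last byte sent or received

def expired_connections(conns, now, idle_timeout=60.0):
    """Return the IDs of connections the load balancer would close:
    those with no data sent or received for idle_timeout seconds."""
    return [c.conn_id for c in conns if now - c.last_activity >= idle_timeout]

conns = [Connection("a", last_activity=100.0), Connection("b", last_activity=150.0)]
# "a" has been idle for 61 s (closed); "b" only 11 s (kept open for reuse)
print(expired_connections(conns, now=161.0))  # ['a']
```

Note that the timer resets on any activity in either direction, which is why long-polling connections can survive well past 60 seconds as long as bytes keep flowing.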
Hence, new connections never wait for previous connections to close before they reach the target, as each has its own socket. We usually recommend setting the idle timeout value slightly greater than your application's timeout value, after consulting your application team accordingly. By default, its value is 60 seconds.
In most cases, a 60-second timeout is long enough to allow for the potential reuse mentioned earlier. However, in some circumstances, different idle timeout values are more appropriate. Some applications benefit from a longer timeout because they create a connection and leave it open for polling or extended sessions. Other applications tend to make short, non-recurring requests to AWS, and the open connection will hardly ever end up being reused.
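If you decide a different value fits your workload, the idle timeout is a load balancer attribute you can change. A configuration sketch using boto3 (the ARN and the 120-second value below are placeholders, not values from this article):

```python
import boto3

# Hypothetical ARN -- substitute your own load balancer's ARN.
ALB_ARN = ("arn:aws:elasticloadbalancing:us-east-1:"
           "123456789012:loadbalancer/app/my-alb/abc123")

elbv2 = boto3.client("elbv2")

# The idle timeout is the "idle_timeout.timeout_seconds" attribute;
# attribute values are passed as strings, in seconds (default 60).
elbv2.modify_load_balancer_attributes(
    LoadBalancerArn=ALB_ARN,
    Attributes=[{"Key": "idle_timeout.timeout_seconds", "Value": "120"}],
)
```

The same change can be made from the console under the load balancer's attributes, or with the equivalent `aws elbv2 modify-load-balancer-attributes` CLI command.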
Please note that ALBs are designed to adjust to increasing amounts of traffic. By design, the Application Load Balancer automatically scales to an appropriate size based on the amount of traffic it receives: it scales up as traffic increases, and scales down after traffic decreases. This is done by adding or removing load balancer nodes of different sizes. Elastic Load Balancers scale up aggressively and scale down conservatively, scaling up in minutes and down in hours. When scaling, nodes in your load balancer may be replaced with higher-capacity nodes, or additional nodes may be added, so that the load balancer always keeps enough capacity to support your traffic. This scaling process happens dynamically, is not configurable by the customer, and goes unnoticed by the customer and by end users accessing the application.
In some cases, the traffic that immediately starts coming in to the load balancer will be greater than what the initial capacity configuration supports. Alternatively, if the load balancer is created and not used for some period of time (generally a few hours, but potentially as little as an hour), it may scale down before the traffic begins to reach it. This means the load balancer then has to scale up just to return to its initial capacity level.
I would recommend reading through the following articles on best practices in evaluating Elastic Load Balancing and on idle timeout control:
(1) https://aws.amazon.com/articles/best-practices-in-evaluating-elastic-load-balancing/
(2) https://aws.amazon.com/blogs/aws/elb-idle-timeout-control/
If traffic increases by less than 50% within a 5-minute interval, ALB scaling can handle that increase and will scale accordingly. In other words, as long as traffic grows at a rate of up to 50% over a 5-minute period, the ALB should be able to scale without any issues.
But if you are expecting an increase of 50% or more in the requests-per-second metric within a 5-minute interval (a sudden or drastic traffic spike), the ALB's scaling will not be able to keep up with that exponential increase: it will not have sufficient time to scale up.
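The 50%-per-5-minutes rule of thumb above can be checked mechanically against your traffic metrics. A small sketch (the function name and threshold parameter are illustrative, not an AWS API):

```python
def within_alb_scaling(prev_rps: float, new_rps: float, max_growth: float = 0.5) -> bool:
    """Return True if the request rate grew by at most max_growth (50%)
    over one 5-minute interval, i.e. a ramp ALB scaling should absorb."""
    if prev_rps <= 0:
        return False  # a cold start from zero gives scaling nothing to track
    return (new_rps - prev_rps) / prev_rps <= max_growth

print(within_alb_scaling(1000, 1400))  # +40% over 5 minutes -> True
print(within_alb_scaling(1000, 1800))  # +80% over 5 minutes -> False
```

For expected spikes steeper than this (flash sales, product launches), the usual guidance is to ramp traffic gradually or to contact AWS Support in advance.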
Related:
ALB: What will happen if I have too many connections that have to wait for the idle timeout to elapse before closing?
Some further details on the ALB's ability to scale in response to incoming traffic:
AWS Application Load Balancer and Http2 Persistent Connections (“keep alive”)
https://repost.aws/questions/QULRcA_-73QxuAOyGYWhExng
From the Elastic Load Balancer standpoint, the ALB supports the least outstanding requests (LOR) algorithm in addition to the round-robin algorithm. With this algorithm, as a new request comes in, the load balancer sends it to the target with the least number of outstanding requests.
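As a minimal sketch of the LOR idea (the data structures are hypothetical; the real algorithm runs inside the ALB nodes):

```python
def pick_target_lor(outstanding: dict) -> str:
    """Least outstanding requests: choose the target currently
    handling the fewest in-flight requests."""
    return min(outstanding, key=outstanding.get)

# In-flight request counts per registered target (illustrative numbers).
outstanding = {"target-1": 7, "target-2": 2, "target-3": 5}
chosen = pick_target_lor(outstanding)
outstanding[chosen] += 1  # the new request is now in flight on that target
print(chosen)  # target-2
```

Compared with round robin, which cycles through targets regardless of load, LOR naturally steers traffic away from targets bogged down by slow requests.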