Mitigating the impact when the ALB and ASG cannot scale out in time

Problem:
An ASG scales horizontally by adding instances to the target group; the ALB scales in a similar way, by replacing its nodes with higher-capacity ones. See the following AWS re:Post thread:

https://repost.aws/questions/QULRcA_-73QxuAOyGYWhExng

An ALB will scale up aggressively as traffic increases, and scale down conservatively as traffic decreases. As it scales up, new higher capacity nodes will be added and registered with DNS, and previous nodes will be removed. This effectively gives an ALB a dynamic connection pool to work with.

=> If traffic increases too quickly and too suddenly (say, by more than 50% within 5 minutes), the load balancer may not scale out in time.

Besides scaling lag inherent in how the service works, another fairly common failure is the InsufficientInstanceCapacity error, which prevents new instances from launching just when traffic spikes.
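To see why a fast spike outruns scale-out, here is a toy simulation. All numbers (per-instance capacity, launch delay, growth rate) are illustrative assumptions, not AWS figures:

```python
# Toy model: requests/sec grows 50% every minute for 5 minutes, while the
# ASG adds instances only after a fixed scale-out delay (detection + launch).
# All numbers are illustrative assumptions, not AWS-published figures.

CAPACITY_PER_INSTANCE = 100   # req/s one instance can serve (assumed)
SCALE_OUT_DELAY = 3           # minutes from breach to instance in service (assumed)

def simulate(minutes=8, start_rps=200, growth=1.5, start_instances=2):
    """Return per-minute (traffic, capacity) pairs showing the overload window."""
    instances, pending = start_instances, []   # pending: minutes until in service
    history = []
    rps = start_rps
    for t in range(minutes):
        # bring launched instances into service
        pending = [m - 1 for m in pending]
        instances += sum(1 for m in pending if m == 0)
        pending = [m for m in pending if m > 0]
        capacity = instances * CAPACITY_PER_INSTANCE
        if rps > capacity:
            # request enough instances to cover the shortfall, but they
            # only arrive SCALE_OUT_DELAY minutes later
            needed = -(-int(rps - capacity) // CAPACITY_PER_INSTANCE)
            pending += [SCALE_OUT_DELAY] * needed
        history.append((round(rps), capacity))
        rps *= growth if t < 5 else 1.0   # spike lasts 5 minutes
    return history

for minute, (traffic, capacity) in enumerate(simulate()):
    state = "OVERLOADED" if traffic > capacity else "ok"
    print(f"t={minute}m traffic={traffic} rps capacity={capacity} rps {state}")
```

In this toy run, capacity only catches up minutes after the spike ends; every mitigation below attacks some part of that overload window.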

=> How can this be mitigated?

Answer:

To mitigate the impact of a sudden surge in access, consider combining the following approaches:
・Approach 1: Use CloudFront to cache static content and reduce the load on the ALB and the EC2 instances.
For details on CloudFront, see [1].
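For CloudFront to actually offload the ALB, the origin must return cacheable responses for static paths. A minimal sketch of origin-side header logic (the path prefixes and TTLs are assumptions for illustration, not recommendations):

```python
# Sketch: choose Cache-Control headers on the origin so CloudFront can cache
# static content at the edge and pass dynamic requests through. The paths and
# TTLs below are illustrative assumptions, not values for any specific workload.

STATIC_PREFIXES = ("/static/", "/assets/", "/images/")

def cache_headers(path: str) -> dict:
    """Return response headers for a given request path."""
    if path.startswith(STATIC_PREFIXES):
        # long TTL: CloudFront serves these from edge caches during a spike
        return {"Cache-Control": "public, max-age=86400, immutable"}
    # dynamic content: always revalidate with the origin
    return {"Cache-Control": "no-cache"}

print(cache_headers("/static/app.css"))   # cached at the edge for a day
print(cache_headers("/api/checkout"))     # never cached
```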

・Approach 2: Use AWS WAF to protect AWS resources such as CloudFront and the ALB when the sudden surge in access comes from an attack.
For details on WAF, see [2]. You can also consider additional countermeasures against DDoS attacks in [3].
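As a concrete example of WAF-based protection, a WAFv2 rate-based rule blocks any single IP that exceeds a request threshold. A boto3-style sketch of the rule structure (the rule name, limit, and metric name are placeholder assumptions):

```python
# Sketch of a WAFv2 rate-based rule that blocks any single IP exceeding
# 2,000 requests per 5-minute window. Values are placeholder assumptions;
# the containing web ACL would be attached to the ALB or the CloudFront
# distribution.
rate_limit_rule = {
    "Name": "block-request-floods",          # placeholder name
    "Priority": 1,
    "Statement": {
        "RateBasedStatement": {
            "Limit": 2000,                   # requests per 5 minutes, per IP
            "AggregateKeyType": "IP",
        }
    },
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "BlockRequestFloods",  # placeholder metric name
    },
}

# With boto3, this rule would go in the Rules list of wafv2.create_web_acl(...)
print(rate_limit_rule["Statement"]["RateBasedStatement"]["Limit"])
```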

・Approach 3: Consider On-Demand Capacity Reservations or Reserved Instances to guard against the case where AWS runs out of capacity and cannot scale out EC2 instances. [4] [5]
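An On-Demand Capacity Reservation pins EC2 capacity in a specific AZ ahead of the event, so launches cannot fail with InsufficientInstanceCapacity. A boto3-style parameter sketch (the instance type, AZ, and count are placeholder assumptions):

```python
# Sketch: reserve EC2 capacity ahead of an expected spike so scale-out is not
# blocked by InsufficientInstanceCapacity. All values are placeholders.
reservation_params = {
    "InstanceType": "m5.large",              # must match the ASG's launch settings
    "InstancePlatform": "Linux/UNIX",
    "AvailabilityZone": "ap-northeast-1a",   # reservations are per-AZ
    "InstanceCount": 10,                     # peak instances you expect to need
    "EndDateType": "unlimited",              # or "limited" with an EndDate
    "InstanceMatchCriteria": "open",         # matching instances use it automatically
}

# With boto3 and credentials configured, this would be:
# boto3.client("ec2").create_capacity_reservation(**reservation_params)
print(reservation_params["InstanceCount"])
```

Note that a reservation is billed whether or not the instances run, so it fits events with a known time window.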

・Approach 4: Consider pre-warming the load balancer.
In certain cases, for example when you know in advance when a traffic spike will occur, you can contact AWS to warm up the ALB.
On receiving a warm-up request, AWS configures the load balancer's capacity based on the expected traffic volume.
For details on warm-up, see [6] and [7].

References:

[1] https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#RequestBehaviorCustomOrigin

Simultaneous requests for the same object (traffic spikes)
When a CloudFront edge location receives a request for an object and either the object isn’t currently in the cache or the object has expired, CloudFront immediately sends the request to your origin. If there’s a traffic spike—if additional requests for the same object arrive at the edge location before your origin responds to the first request—CloudFront pauses briefly before forwarding additional requests for the object to your origin. Typically, the response to the first request will arrive at the CloudFront edge location before the response to subsequent requests. This brief pause helps to reduce unnecessary load on your origin server. If additional requests are not identical because, for example, you configured CloudFront to cache based on request headers or cookies, CloudFront forwards all of the unique requests to your origin.

[2] docs.aws.amazon.com/waf/latest/developergui..

[3] docs.aws.amazon.com/waf/latest/developergui..

[4] docs.aws.amazon.com/AWSEC2/latest/UserGuide..

[5] docs.aws.amazon.com/AWSEC2/latest/UserGuide..

[6] aws.amazon.com/jp/articles/best-practices-i..

In certain scenarios, such as when flash traffic is expected, or in the case where a load test cannot be configured to gradually increase traffic, we recommend that you contact us to have your load balancer “pre-warmed”. We will then configure the load balancer to have the appropriate level of capacity based on the traffic that you expect.

[7] [Answer to "Is there a guideline for when to request ELB pre-warming?" | 1. Is there a guideline for when to pre-warm?]
 dev.classmethod.jp/articles/tsnote-alb-pre-..


More details on ELB pre-warming:

Q: Can you give an overview of pre-warming?

A: The Auto Scaling service is meant for scaling horizontally, i.e. adding instances behind the load balancer, not scaling the ELB nodes themselves. In a scenario like yours with a sudden rise in requests, as you can see in the article, we have a 'pre-warming' solution for the ELB, where we (AWS) configure the ELB to have the appropriate level of capacity to handle the surge of incoming traffic during the requested time period.

Pre-warming is done to ensure that the “sudden” rise in traffic can be handled by the ELB, although note that ELB generally does not need pre-warming in scenarios where there’s a slow/gradual rise in traffic.

++ Note that here the ELB is taken care of when ‘pre-warmed’, but you also have to make sure that the backend instances (application/service) are able to handle the high number of requests. Meaning, the processing capacity of the backend instances should also be good enough to handle the surge.


Whenever you are planning to get your load balancers pre-warmed, you shall have to provide answers to the following questionnaire and share it with AWS in order to verify and approve the pre-warming request:

1. ELB DNS name:
2. Event start date/time:
3. Event end date/time:
4. Expected requests per second (or expected concurrent connections for TCP listeners):
5. Expected rate of traffic increase:
6. Average amount of data passing through the ELB per request/response pair (In Bytes):
7. Expected percentage of traffic using SSL termination on the ELB:
8. Number of AZs that will be used for this load balancer:
9. Are keep-alives (persistent connections) used on the backend:
10. Is the backend currently scaled to the level it will be during the event:
11. A description of the traffic pattern you are expecting:
12. A brief description of your use case:

Q: How is the price calculated?
A: Please note that charges still apply per LCU used, i.e. the cost varies with the capacity consumed. In brief, the number of LCUs used is the varying factor here – aws.amazon.com/elasticloadbalancing/pricing
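The LCU calculation takes the maximum of four usage dimensions each hour. A sketch using the published ALB dimension divisors (the per-LCU-hour price below is a placeholder assumption; it varies by region, see the pricing page above):

```python
# ALB LCUs per hour = max over four dimensions. The divisors are the published
# ALB LCU definitions; the price per LCU-hour is a placeholder assumption,
# since it varies by region -- check the ELB pricing page for your region.
def alb_lcus(new_conn_per_sec, active_conn, processed_gb_per_hour, rule_evals_per_sec):
    return max(
        new_conn_per_sec / 25,        # 25 new connections/sec per LCU
        active_conn / 3000,           # 3,000 active connections/min per LCU
        processed_gb_per_hour / 1,    # 1 GB processed/hour per LCU
        rule_evals_per_sec / 1000,    # 1,000 rule evaluations/sec per LCU
    )

lcus = alb_lcus(new_conn_per_sec=100, active_conn=9000,
                processed_gb_per_hour=2.5, rule_evals_per_sec=500)
print(lcus)          # dominated by new connections: 100 / 25 = 4.0
print(lcus * 0.008)  # hourly cost at an assumed $0.008 per LCU-hour
```

This is why a pre-warmed ALB has no fixed "pre-warming fee": during the spike you simply consume (and pay for) more LCUs.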

Note: If you would like to get assistance on pricing, please reach out to the AWS Billing team as they will have more expertise regarding this.

Q: Currently the architecture is simple: internet -> ALB -> EC2 (no ASG). I understand the automatic way to handle high traffic is an ASG, but I hope you can suggest some sample architectures or AWS services that can reduce the impact as much as possible.

A: As far as you are concerned only with a sudden spike of traffic, pre-warming should be enough to make sure the load balancer is configured for the surge. But this holds only if the backend instances can also serve the rise in traffic. If not, Auto Scaling can help by scaling out with additional instances to meet the performance requirements. Read more about Auto Scaling with an ELB here – https://docs.aws.amazon.com/autoscaling/ec2/userguide/attach-load-balancer-asg.html
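Since the backend must scale alongside a pre-warmed ALB, a target-tracking policy on ALBRequestCountPerTarget keeps the instance count in step with request volume. A boto3-style sketch (the ASG name, load balancer/target group IDs, and target value are placeholder assumptions):

```python
# Sketch: target-tracking scaling policy that adds instances when the average
# request count per target rises. Names, IDs, and the target value below are
# placeholder assumptions.
policy_params = {
    "AutoScalingGroupName": "web-asg",               # placeholder ASG name
    "PolicyName": "keep-requests-per-target-low",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ALBRequestCountPerTarget",
            # format: app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>
            "ResourceLabel": "app/my-alb/0123456789abcdef/targetgroup/my-tg/fedcba9876543210",
        },
        "TargetValue": 1000.0,   # desired avg requests per instance (assumed)
    },
}

# With boto3 and credentials configured, this would be:
# boto3.client("autoscaling").put_scaling_policy(**policy_params)
print(policy_params["TargetTrackingConfiguration"]["TargetValue"])
```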

Q: I see that the article about pre-warming is pretty old, and it seems to be only for testing purposes: “We will need to know the start and end dates of your tests or expected flash traffic, the expected request rate per second and the total size of the typical request/response that you will be testing.”

Can I still use it? And can I use it in reality?

A: Pre-warming is not meant only for testing. The mechanism is also in place to handle upcoming events or any other occurrence where you expect a sudden rise in the number of requests. You can use it for real-world scenarios, not just testing.