AWS Load Balancer 502

I have microservices (written in different programming languages) running on an EC2 instance. In production I see occasional 502 Bad Gateway errors when these services call each other. The logs of the service being called show no sign that the API was hit at all.

For example, service A calls service B, but nothing in service B's logs indicates that a call arrived from service A.

Could this be an AWS load balancer issue? Any help would be appreciated. Thanks in advance.

Solution tried: We created HTTP/HTTPS connection agents in each service, but we still get this issue.

Update: In the LB logs the API call is logged, but the target response code shows "-" whereas the LB response code shows 502 or 504. Does this mean that the LB cannot handle the traffic, or that my application cannot?

Also, what could be a possible solution?
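The symptom described in the update (target status "-" with an LB status of 502 or 504) can be pulled out of the access logs with a short script. The following is a minimal sketch, assuming the standard space-separated ALB access log layout in which the LB status code and target status code are the 9th and 10th fields and the quoted request line is the 13th. The function name `find_unanswered` and the sample entries are hypothetical and only for illustration.

```python
# Sketch: list ALB access log entries where the load balancer returned
# 502/504 and the target never produced a status code ("-").
# Assumes the standard ALB field order; adjust the indexes if your logs differ.

ELB_STATUS_IDX = 8      # elb_status_code
TARGET_STATUS_IDX = 9   # target_status_code
REQUEST_IDX = 12        # start of the quoted "request" field


def find_unanswered(lines):
    """Return (elb_status, request) pairs for requests the target never answered."""
    hits = []
    for line in lines:
        # The first 12 fields contain no spaces, so a bounded split is safe.
        fields = line.split(" ", REQUEST_IDX)
        if len(fields) <= REQUEST_IDX:
            continue  # not a complete log entry
        elb_status = fields[ELB_STATUS_IDX]
        target_status = fields[TARGET_STATUS_IDX]
        if target_status == "-" and elb_status in ("502", "504"):
            # The remainder starts with the request line in double quotes.
            parts = fields[REQUEST_IDX].split('"')
            request = parts[1] if len(parts) > 1 else ""
            hits.append((elb_status, request))
    return hits


# Hypothetical sample entries, shaped like ALB access log lines.
SAMPLE = [
    'http 2018-05-04T11:56:00.000000Z app/my-lb/abc 10.0.0.1:4000 10.0.1.5:80 '
    '0.000 0.001 0.000 200 200 34 366 "GET http://example.com:80/a HTTP/1.1" '
    '"curl/7.46.0" - - arn:hypothetical "Root=1-x"',
    'http 2018-05-04T11:56:05.000000Z app/my-lb/abc 10.0.0.1:4001 10.0.1.5:80 '
    '0.001 -1 -1 502 - 34 0 "GET http://example.com:80/b HTTP/1.1" '
    '"curl/7.46.0" - - arn:hypothetical "Root=1-y"',
]

if __name__ == "__main__":
    for status, request in find_unanswered(SAMPLE):
        print(status, request)
```

Entries reported this way are requests the load balancer accepted but for which it never received a response from the target, which is consistent with the called service logging nothing.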

Groin answered 2/11, 2017 at 9:27 Comment(6)
You can enable LB logs. If traffic passes through it correctly, you will be able to see the output; you can also post the logs here. - Ferdie
In the LB logs the API call is logged, but the target response code shows "-" whereas the LB response code shows 502 or 504. Does this mean that the LB cannot handle the traffic, or that my application cannot? @KushVyas - Groin
@Root We have exactly the same problem. Do you still have it, or did you find a solution? - Abusive
@JanDoerrenhaus Yes, we have found the solution. - Groin
We are experiencing the exact same issue. - Oakley
@CadeEmbery Did you try draining the instances? - Groin

We had the same problem.

In our setup, an AWS Application ELB has a target group of 4 EC2 instances. On each of the EC2 instances, there is an Apache2 which forwards to a Tomcat.

The ELB has a default idle (keep-alive) timeout of 60 seconds. Apache2 has a default KeepAliveTimeout of 5 seconds. Once a connection has been idle for 5 seconds, Apache2 closes it, resetting its connection with the ELB. However, if a request comes in at precisely that moment, the ELB accepts it and decides which host to forward it to, and just then Apache closes the connection. The result is the 502 error code.

The solution is: When you have cascading proxies/LBs, either align their KeepAlive timeouts, or - preferably - make them a little longer the further down the line you get.

We set the ELB timeout to 60 seconds and the Apache2 timeout to 120 seconds. Problem gone.
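The rule above can be written down as a small check. This is only a sketch under assumptions: the helper name `misaligned_hops` is hypothetical, and you have to supply the timeout values yourself (on Apache the relevant directive is `KeepAliveTimeout`; on an Application Load Balancer it is the `idle_timeout.timeout_seconds` attribute). Equal timeouts are not flagged, since aligning them is described above as acceptable, though a little longer downstream is preferred.

```python
# Sketch: flag proxy hops whose keep-alive timeout is shorter than the
# timeout of the hop in front of them.

def misaligned_hops(chain):
    """chain: list of (name, timeout_seconds), ordered from the client side inward.

    Returns (upstream, downstream) name pairs where the downstream hop closes
    idle connections sooner than the upstream hop expects them to stay open.
    """
    problems = []
    for (up_name, up_timeout), (down_name, down_timeout) in zip(chain, chain[1:]):
        if down_timeout < up_timeout:
            problems.append((up_name, down_name))
    return problems


if __name__ == "__main__":
    # Defaults described in this answer: ELB 60 s, Apache2 5 s.
    print(misaligned_hops([("ELB", 60), ("Apache2", 5)]))
    # Values after the fix: ELB 60 s, Apache2 120 s.
    print(misaligned_hops([("ELB", 60), ("Apache2", 120)]))
```

With the default values the ELB/Apache2 pair is reported; with the adjusted values nothing is.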

Logistics answered 4/5, 2018 at 11:56 Comment(7)
We figured out the issue in our system. It was caused by EC2 instances being shut down immediately instead of waiting for the draining period. We already had the ELB set to 60 seconds and Apache at 120 seconds. - Groin
We are currently having the same issue. When this happens, can we see anything in the logs on the Apache side? - Royalty
@Royalty We didn't, no, because Apache does not notice anything being wrong. The ELB access logs show the request with the 502 status code, and the Apache access logs show nothing. - Abusive
@Jan Thank you for the information! It is actually the same for us. I checked the Apache access log and error log, but I could not find anything. We will try the same settings as you and see how it goes. - Royalty
This was so difficult to figure out. Thanks for this Q/A. It resolved my problem as soon as I increased the KeepAliveTimeout. - Sectorial
Apparently, if you do a packet capture on the "receiving end" (the application server), you may be able to "see the response FIN,ACK packets and the new request SYN packets cross paths in most cases, but it is hard to catch." - Alsace
I had the same problem, and changing the keep-alive timeout was the solution. However, it is better to set the keep-alive only slightly higher than the ALB idle timeout rather than 2x, so that the target web server does not wait too long when the connection between the ALB and the target is dropped because of network interruptions. - Agogue

Health checks use HTTP/2. I got my EC2 instances running NGINX to report as healthy by adding http2 to the listen 80 directive:

listen 80 default_server http2;

Harpoon answered 4/12, 2021 at 5:2 Comment(0)
