Gateway timeout with Spring Cloud Gateway and Nginx as reverse proxy

I created an API gateway for my application, and it acts as a front controller for the other microservices. In my production setup I use Nginx as a reverse proxy in front of the gateway.

The API gateway runs on port 8080.

Nginx is configured as follows:

gateway-api.conf:

server {
    listen 80;
    server_name api.example.com;
    location / {
        proxy_set_header        X-Real-IP       $remote_addr;
        proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_pass http://localhost:30010/;
        keepalive_timeout 500s;
    }
    keepalive_timeout 500s;
    access_log /var/log/nginx/api.log;  
    error_log /var/log/nginx/api_error.log;
}

Timeout settings in nginx.conf:

proxy_connect_timeout 300;
proxy_send_timeout 300;
proxy_read_timeout 300;
send_timeout 300;

Spring Cloud Gateway Gradle dependencies:

compile('org.springframework.cloud:spring-cloud-starter-gateway')
compile('org.springframework.cloud:spring-cloud-starter-openfeign')
compile('org.springframework.boot:spring-boot-starter-actuator')
compile('org.springframework.boot:spring-boot-starter-security')

springBootVersion=2.0.3.RELEASE
springDMPVersion=1.0.4.RELEASE
springPlatformBomVersion=Cairo-SR2
springCloudVersion=Finchley.RELEASE

Gateway application:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.domain.EntityScan;
import org.springframework.cloud.openfeign.EnableFeignClients;
import org.springframework.context.annotation.ComponentScan;

@SpringBootApplication
@ComponentScan(basePackages = {"com.example"})
@EntityScan(basePackages = {"com.example"})
@EnableFeignClients(basePackages = "com.example")
public class GatewayApplication {

    public static void main(String[] args) {
        SpringApplication.run(GatewayApplication.class, args);
    }
}

Problem statement:

In one of my microservices, one REST API takes more than 3 minutes to complete. If I call this API via Nginx (api.example.com), it fails after exactly 1 minute with HTTP status 504.

curl:

curl --request GET \
  --url http://api.example.com/hellomicroservice/api/take/moretime

error:

504 Timeout while reading the response from Server

There are no error logs in nginx or the API gateway.

Access log from nginx:

203.129.213.102 - - [01/Apr/2019:08:14:33 +0000] "GET hellomicroservice/api/take/moretime HTTP/1.1" 499 0 "-" "PostmanRuntime/7.3.0"

But when I call the same API directly on the gateway (port 8080), the request is processed successfully.

curl with gateway port:

curl --request GET \
  --url http://api.example.com:8080/hellomicroservice/api/take/moretime

Edit:

If I set the Nginx timeouts to less than 60 seconds (e.g. 30 seconds), the request times out after the specified interval. But if I set the Nginx timeouts to more than 60 seconds, say 300 seconds, the request still times out after 60 seconds.

Gerrilee answered 1/4, 2019 at 7:40 Comment(6)
What do the API gateway log and the nginx error log show when you curl via the reverse proxy? – Mccandless
No error logs in nginx or the gateway; I added the access logs. – Gerrilee
Can you try adding proxy_read_timeout 300s; in the server block: server { proxy_read_timeout 300s; – Mitzimitzie
@RadhaMohanMaheshwari Tried, not working. – Gerrilee
Can you try replacing localhost:8080 in the nginx settings with api.example.com:8080? – Corinnecorinth
In our production deployment we are using AWS EKS, and we are not facing this issue in production with the same config files. – Gerrilee

It seems the request timeouts are not the problem for you; it's the connection timeout. I think we need to look at the Connection header.

AFAIK, the Connection header defines whether the connection should be persistent and who has the authority to maintain or close it. If the connection is keep-alive, the connection will be persistent. For keep-alive connections, the client occasionally sends a TCP ping to ensure the server is still alive and holding the connection. For curl this defaults to every 60 seconds.

Now nginx has to be configured to accept connections and keep them alive for a while using the keepalive_timeout directive. If this is not set, nginx will not keep the connections alive.

This should be the reason why nginx logs 499. HTTP 499 is a custom nginx status which means the client closed the connection. In your case curl closed it. Why did curl close it? Because nginx did not respond to the 60-second TCP ping, as keep-alive is not enabled.

Setting keepalive_timeout to ~500s, or any value higher than the application's response time, should solve your problem.
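
For illustration, a minimal sketch of how this suggestion could look in the question's config (the port, server_name, and proxy_read_timeout value are taken from the question, not from this answer):

server {
    listen 80;
    server_name api.example.com;

    # keep client connections open longer than the slowest backend call (~3 minutes here)
    keepalive_timeout 500s;

    location / {
        # allow nginx to wait longer than the call duration for the upstream response
        proxy_read_timeout 300s;
        proxy_pass http://localhost:30010/;
    }
}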

Now, why did it work when hitting Tomcat directly? I think Spring sets the keep-alive timeout to infinite or a much higher value. Normally in Tomcat it is also 60 seconds.

I hope this solves your problem.

Malamud answered 8/4, 2019 at 8:37 Comment(5)
Setting keepalive_timeout 500; did not work, same error. When I checked the response headers I found Connection: close. – Gerrilee
Set it like keepalive_timeout 500s; and keep it in the server or location section. I think the unit 's' must be there. – Malamud
Tried putting keepalive_timeout 500s; in both the server and location sections, but same error. – Gerrilee
Maybe you should try the --no-keepalive option in curl. I assume Nginx is not respecting the keep-alive connection. You can also check the same with other clients like http-client or Fiddler. Keep-alive is only useful if you do multiple operations over the same connection. – Malamud
Connection just tells the origin server what to do with the TCP socket once the response is finished, the idea being that the client will send further requests along the stream. In this case that would be between curl/Postman and nginx. I think it's more likely that nginx is just getting bored and killing the request, which is odd as the read timeout is 5 minutes. – Jigsaw

Keepalive may still not be enabled for the upstream due to your configuration missing the proxy_http_version key.

Quote from: https://www.nginx.com/blog/tuning-nginx/#proxy_http_version

To enable keepalive connections to upstream servers you must also include the following directives in the configuration:

proxy_http_version 1.1;
proxy_set_header Connection "";

I'd also keep the keepalive_timeout in the config as Kris had suggested.
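
Putting the quoted directives together with an upstream keepalive pool, a sketch could look like this (the upstream name gateway_backend and the pool size of 32 are illustrative assumptions; the port and timeout values come from the question's config):

upstream gateway_backend {
    server 127.0.0.1:30010;
    keepalive 32;                        # pool of idle keepalive connections to the gateway
}

server {
    listen 80;
    server_name api.example.com;
    keepalive_timeout 500s;              # as suggested in the earlier answer

    location / {
        proxy_http_version 1.1;          # upstream keepalive requires HTTP/1.1
        proxy_set_header Connection "";  # clear the Connection header so nginx can reuse connections
        proxy_read_timeout 300s;
        proxy_pass http://gateway_backend/;
    }
}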

Lyrebird answered 9/4, 2019 at 20:25 Comment(1)
You might have a formatting problem. The example configs show that the "server" block is typically wrapped in an "http { }" block, and the "keepalive_timeout" key is typically a child of the http block, not of the server or location block. You might check your other keys to make sure they are all under the correct parent too. Also, do you have multiple nginx configs active in the configuration directory? – Lyrebird
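
For illustration, the nesting that comment describes might look like this in the main nginx.conf (this is a sketch of a minimal layout, not the question's actual file):

events { }

http {
    # keepalive_timeout as a direct child of the http block
    keepalive_timeout 500s;

    server {
        listen 80;
        server_name api.example.com;

        location / {
            proxy_pass http://localhost:30010/;
        }
    }
}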

Try putting your timeout settings in /etc/nginx/conf.d/timeout.conf (if it does not exist, create it). Set the settings below:

proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
Geode answered 10/4, 2019 at 4:38 Comment(0)

I guess this is one of those problems that can be caused by many different things. This is the solution that worked for me (I was also getting errors in /var/log/nginx/error.log):

2020/12/30 21:47:47 [error] 26765#26765: *13064 upstream timed out (110: Connection timed out) while connecting to upstream, client: XXX, server: example.com, request: "GET /eshop HTTP/1.0", upstream: "http://[::1]:8080/error_50x.html", host: "example.com"

Strangely enough, this was not happening on my laptop, only on my server. So I checked the IPs, and it turned out to be caused by a missing ::1 address. When I added it to the lo network device, I could no longer reproduce the timeouts.

sudo ip a add ::1/128 dev lo

Next (this is my understanding, I am not 100% sure about it): since the overhead of keeping a connection to a localhost Java service seems to be higher than simply dropping the connection and reconnecting when another request is made, it's recommended to use the following settings for proxies (in your nginx site .conf):

proxy_http_version 1.1;
proxy_set_header Connection "";

See https://mcmap.net/q/120196/-nginx-close-upstream-connection-after-request

Barrens answered 30/12, 2020 at 22:17 Comment(0)
