We have been discussing with one of our data providers an issue where some of our HTTP requests to them intermittently fail with "Connection reset" exceptions; we have also seen "The target server failed to respond" exceptions.
Many Stack Overflow posts point to some potential solutions, namely
- It's a pooling configuration issue, try reaping
- HttpClient version issue - several answers suggest that downgrading to HttpClient 4.5.1 (often from 4.5.3) fixes it. I'm using 4.5.12: https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient (a version-check sketch follows this list)
- The target server is actually failing to process the request (or CloudFront is, before the request reaches the origin server).
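To rule out the version theory on our side, I'm planning to log which HttpClient/HttpCore builds are actually loaded inside the WAR at runtime, since a Tomcat/webapp classpath conflict can silently pull in an older jar. This is only a sketch - the HttpClientVersionCheck class name is mine - but VersionInfo is part of HttpCore.

import org.apache.http.util.VersionInfo;

// Sketch: log the HttpClient/HttpCore releases actually visible to our classloader.
public final class HttpClientVersionCheck {

    public static void logVersions() {
        ClassLoader cl = HttpClientVersionCheck.class.getClassLoader();
        VersionInfo client = VersionInfo.loadVersionInfo("org.apache.http.client", cl);
        VersionInfo core = VersionInfo.loadVersionInfo("org.apache.http", cl);
        System.out.println("HttpClient release: " + (client != null ? client.getRelease() : "unknown"));
        System.out.println("HttpCore release:   " + (core != null ? core.getRelease() : "unknown"));
    }

    private HttpClientVersionCheck() {
    }
}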
I'm hoping this question will help me get to the bottom of the root cause.
Context
It's a Java web application hosted in AWS Elastic Beanstalk with 2-4 servers depending on load. The Java WAR file uses HttpClient 4.5.12 to communicate with the provider. Over the last few months we have seen:
45 x "Connection reset" exceptions (only 3 were timeouts over 30s; the others failed within 20ms)
To put this into context, we perform in the region of 10,000 requests to this supplier, so the error rate isn't excessive, but it is very inconvenient because our customers pay for the service that then fails.
Right now we are trying to focus on eliminating the "Connection reset" scenarios, and we have been advised to try the following:
1) Restart our app servers (a desperate just-in-case scenario)
2) Change the DNS servers to use Google's 8.8.8.8 & 8.8.4.4, so our requests take a different path (see the DNS-logging sketch after this list)
3) Assign a static IP to each server (so they can let us communicate without going through their CloudFront distribution)
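Related to suggestion 2, before changing resolvers I want to see which IP addresses we actually connect to when a reset happens. Below is a minimal sketch (not production code - the LoggingDnsResolver name is made up) that wraps HttpClient's SystemDefaultDnsResolver purely to log resolutions; as far as I can tell it would have to be passed to the PoolingHttpClientConnectionManager constructor rather than to HttpClientBuilder.setDnsResolver(), because we supply our own connection manager.

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

import org.apache.http.conn.DnsResolver;
import org.apache.http.impl.conn.SystemDefaultDnsResolver;

// Hypothetical helper: logs every DNS lookup so that a "Connection reset" can be
// correlated with the CloudFront edge IP that was used for that request.
public class LoggingDnsResolver implements DnsResolver {

    private final DnsResolver delegate = SystemDefaultDnsResolver.INSTANCE;

    @Override
    public InetAddress[] resolve(String host) throws UnknownHostException {
        InetAddress[] addresses = delegate.resolve(host);
        System.out.println("Resolved " + host + " -> " + Arrays.toString(addresses));
        return addresses;
    }
}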
We will work through those suggestions, but at the same time I want to understand where our HttpClient implementation might not be quite right.
Typical usage
User Request --> Our server (JAX-RS request) --> HttpClient to 3rd party --> Response received e.g. JSON/XML --> Massaged response is sent back (Our JSON format)
Technical details
Tomcat 8 with Java 8 running on 64-bit Amazon Linux
Maven dependencies:
- HttpClient 4.5.12
- HttpCore 4.4.13 (the version that HttpClient 4.5.12 requires, according to the Maven dependency tree)
- HttpMime 4.5.12
Typically an HTTP request will take anywhere between 200ms and 10 seconds, with timeouts set around 15-30s depending on the API we are invoking. I also use a connection pool, and given that most requests should complete within 30 seconds, I felt it was safe to evict anything older than double that period.
Any advice on whether these are sensible values is appreciated.
// max 200 requests in the connection pool
private static final int CONNECTIONS_MAX = 200;
// each 3rd-party API can only use up to 50, so worst case 4 APIs can be flooded before the pool is exhausted
private static final int CONNECTIONS_MAX_PER_ROUTE = 50;
// Our timeouts are typically 30s, so I'm assuming it's safe to close idle connections
// that are double that. I wasn't sure whether to close at 31s or wait 2 x typical = 60s.
private static final int CONNECTION_CLOSE_IDLE_MS = 60000;
// If a connection hasn't been used for 60s then we aren't busy and it can be removed from the pool
private static final int CONNECTION_EVICT_IDLE_MS = 60000;
// Is this per request or per packet? Either way all requests should finish within 30s
private static final int CONNECTION_TIME_TO_LIVE_MS = 60000;
// Validate pooled connections that have been inactive for at least 500ms before re-using them
private static final int CONNECTION_VALIDATE_AFTER_INACTIVITY_MS = 500; // was 30000 (haven't tested 500ms yet)
Additionally we tend to set the three timeouts to 30s, but I'm sure we can fine-tune these...
// The time allowed to establish the TCP connection with the remote host
.setConnectTimeout(...) // typical 30s - I guess this could be 5s (if we can't connect by then the remote server is stuffed/busy)
// The time allowed to obtain a connection from the connection manager (pool)
.setConnectionRequestTimeout(...) // typical 30s - I guess this only applies if our pool is saturated, in which case it is how long to wait for a free connection?
// Once the connection is established, the maximum period of inactivity to wait for response packets to arrive
.setSocketTimeout(...) // typical 30s - I believe this is the main one we care about: if we don't get our payload within 30s, give up
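To make the question concrete, this is the direction I'm leaning. It is a sketch only - the TimeoutDefaults class name and the 5s/2s figures are my guesses from the reasoning above, not values we have tested.

import org.apache.http.client.config.RequestConfig;

public final class TimeoutDefaults {

    // Candidate values, not what we currently run in production.
    public static final RequestConfig TUNED = RequestConfig.copy(RequestConfig.DEFAULT)
            // If the TCP connect hasn't completed within 5s the remote end is probably
            // struggling, and failing fast leaves room for our own retry logic.
            .setConnectTimeout(5000)
            // Only matters when all connections for the route are leased; waiting the
            // full 30s here just delays the inevitable, so give up (and log) sooner.
            .setConnectionRequestTimeout(2000)
            // Maximum inactivity between response packets once the request has been
            // sent - this stays at 30s as the real per-request budget for the API.
            .setSocketTimeout(30000)
            .build();

    private TimeoutDefaults() {
    }
}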
I have copied and pasted the main code we use for all GET/POST requests, but stripped out the unimportant aspects such as our retry logic, pre-cache and post-cache.
We are using a single PoolingHttpClientConnectionManager with a single CloseableHttpClient; they're both configured as follows...
private static PoolingHttpClientConnectionManager createConnectionManager() {
PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
cm.setMaxTotal(CONNECTIONS_MAX); // 200
cm.setDefaultMaxPerRoute(CONNECTIONS_MAX_PER_ROUTE); // 50
cm.setValidateAfterInactivity(CONNECTION_VALIDATE_AFTER_INACTIVITY_MS); // Was 30000 now 500
return cm;
}
private static CloseableHttpClient createHttpClient() {
httpClient = HttpClientBuilder.create()
.setConnectionManager(cm)
.disableAutomaticRetries() // our code does the retries
.evictIdleConnections(CONNECTION_EVICT_IDLE_MS, TimeUnit.MILLISECONDS) // 60000
.setConnectionTimeToLive(CONNECTION_TIME_TO_LIVE_MS, TimeUnit.MILLISECONDS) // 60000
.setRedirectStrategy(LaxRedirectStrategy.INSTANCE)
// .setKeepAliveStrategy() - The default implementation looks solely at the 'Keep-Alive' header's timeout token.
.build();
return httpClient;
}
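One thing I'm unsure about is the commented-out setKeepAliveStrategy line: if the supplier (or CloudFront) drops idle keep-alive connections sooner than our 60s idle/TTL settings, a pooled connection can look healthy and then die with a connection reset when we re-use it. A sketch of what I'm considering - the CappedKeepAliveStrategy name and the 20s cap are my own assumptions, not measured values:

import java.util.concurrent.TimeUnit;

import org.apache.http.HttpResponse;
import org.apache.http.conn.ConnectionKeepAliveStrategy;
import org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy;
import org.apache.http.protocol.HttpContext;

// Sketch: honour the server's Keep-Alive header when present, otherwise cap keep-alive
// at 20s instead of keeping the connection around until our 60s eviction kicks in.
public class CappedKeepAliveStrategy implements ConnectionKeepAliveStrategy {

    private static final long DEFAULT_KEEP_ALIVE_MS = TimeUnit.SECONDS.toMillis(20);

    @Override
    public long getKeepAliveDuration(HttpResponse response, HttpContext context) {
        long serverHintMs = DefaultConnectionKeepAliveStrategy.INSTANCE.getKeepAliveDuration(response, context);
        return (serverHintMs > 0) ? serverHintMs : DEFAULT_KEEP_ALIVE_MS;
    }
}

It would be wired in via .setKeepAliveStrategy(new CappedKeepAliveStrategy()) in createHttpClient(). I haven't tried this yet; it only helps if the resets come from re-using connections the far end has already closed.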
Every minute I have a thread that tries to reap connections
public static PoolStats performIdleConnectionReaper(Object source) {
synchronized (source) {
final PoolStats totalStats = cm.getTotalStats();
Log.info(source, "max:" + totalStats.getMax() + " avail:" + totalStats.getAvailable() + " leased:" + totalStats.getLeased() + " pending:" + totalStats.getPending());
cm.closeExpiredConnections();
cm.closeIdleConnections(CONNECTION_CLOSE_IDLE_MS, TimeUnit.MILLISECONDS); // 60000
return totalStats;
}
}
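The scheduling code itself isn't shown above. For reference, a minimal way to run the reaper once a minute would be something like the sketch below (illustrative only - the method and thread names are mine). One detail worth checking in any version of this: ScheduledExecutorService silently stops rescheduling a task that throws, so the reaper call is wrapped in a try/catch.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: run the reaper once a minute on a daemon thread.
private static ScheduledExecutorService startIdleConnectionReaper(final Object source) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(runnable -> {
        Thread t = new Thread(runnable, "httpclient-idle-reaper");
        t.setDaemon(true); // don't hold up Tomcat shutdown just for the reaper
        return t;
    });
    executor.scheduleAtFixedRate(() -> {
        try {
            performIdleConnectionReaper(source);
        } catch (RuntimeException ex) {
            // scheduleAtFixedRate stops rescheduling if the task throws, so never let it escape
        }
    }, 1, 1, TimeUnit.MINUTES);
    return executor;
}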
This is the custom method that performs all HttpClient GET/POST requests. It also does stats, pre-cache, post-cache and other useful stuff, but I've stripped all of that out, so this is the typical outline performed for each request. I've tried to follow the pattern in the HttpClient docs that says to consume the entity and close the response. Note that I don't close the httpClient, because one instance is used for all requests.
public static HttpHelperResponse execute(HttpHelperParams params) {
HttpHelperResponse ret = new HttpHelperResponse(); // declaration restored for readability - the real construction (stats, caching, etc.) is stripped out
boolean abortRetries = false;
while (!abortRetries && ret.getAttempts() <= params.getMaxRetries()) {
// 1 Create HttpClient - this is done once in the static initialiser:
//   CloseableHttpClient httpClient = createHttpClient();
// 2 Create one of the methods, e.g. HttpGet / HttpPost - Note this also adds HTTP headers
// (see separate method below)
HttpRequestBase request = createRequest(params);
// 3 Tell HTTP Client to execute the command
CloseableHttpResponse response = null;
HttpEntity entity = null;
boolean alreadyStreamed = false;
try {
response = httpClient.execute(request);
if (response == null) {
throw new Exception("Null response received");
} else {
final StatusLine statusLine = response.getStatusLine();
ret.setStatusCode(statusLine.getStatusCode());
ret.setReasonPhrase(statusLine.getReasonPhrase());
if (ret.getStatusCode() == 429) {
try {
final int delay = (int) (Math.random() * params.getRetryDelayMs());
Thread.sleep(500 + delay); // minimum 500ms + random amount up to delay specified
} catch (Exception e) {
Log.error(false, params.getSource(), "HttpHelper Rate-limit sleep exception", e, params);
}
} else {
// 4 Read the response
// 6 Deal with the response
// do something useful with the response body
entity = response.getEntity();
if (entity == null) {
throw new Exception("Null entity received");
} else {
ret.setRawResponseAsString(EntityUtils.toString(entity, params.getEncoding()));
ret.setSuccess();
if (response.getAllHeaders() != null) {
for (Header header : response.getAllHeaders()) {
ret.addResponseHeader(header.getName(), header.getValue());
}
}
}
}
}
} catch (Exception ex) {
if (ret.getAttempts() >= params.getMaxRetries()) {
Log.error(false, params.getSource(), ex);
} else {
Log.warn(params.getSource(), ex.getMessage());
}
ret.setError(ex); // If we subsequently get a response then the error will be cleared.
} finally {
ret.incrementAttempts();
// Any HTTP 2xx is considered successful, so stop retrying; also stop if a
// specific "do not retry" HTTP code has been returned
if (ret.getStatusCode() >= 200 && ret.getStatusCode() <= 299) {
abortRetries = true;
} else if (params.getDoNotRetryStatusCodes().contains(ret.getStatusCode())) {
abortRetries = true;
}
if (entity != null) {
try {
// and ensure it is fully consumed - hand it back to the pool
EntityUtils.consume(entity);
} catch (IOException ex) {
Log.error(false, params.getSource(), "HttpHelper Was unable to consume entity", params);
}
}
if (response != null) {
try {
// The underlying HTTP connection is still held by the response object
// to allow the response content to be streamed directly from the network socket.
// In order to ensure correct deallocation of system resources
// the user MUST call CloseableHttpResponse#close() from a finally clause.
// Please note that if response content is not fully consumed the underlying
// connection cannot be safely re-used and will be shut down and discarded
// by the connection manager.
response.close();
} catch (IOException ex) {
Log.error(false, params.getSource(), "HttpHelper Was unable to close a response", params);
}
}
// When using connection pooling we don't want to close the client, otherwise the connection
// pool will also be closed
// if (httpClient != null) {
// try {
// httpClient.close();
// } catch (IOException ex) {
// Log.error(false, params.getSource(), "HttpHelper Was unable to close httpClient", params);
// }
// }
}
}
return ret;
}
private static HttpRequestBase createRequest(HttpHelperParams params) {
...
request.setConfig(RequestConfig.copy(RequestConfig.DEFAULT)
// The time allowed to establish the TCP connection with the remote host
.setConnectTimeout(...) // typical 30s
// The time allowed to obtain a connection from the connection manager (pool)
.setConnectionRequestTimeout(...) // typical 30s
// Once the connection is established, the maximum period of inactivity to wait for response packets to arrive
.setSocketTimeout(...) // typical 30s
.build()
);
return request;
}
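Finally, because disableAutomaticRetries() is set, our loop currently treats every exception the same way. One refinement I'm considering (the helper below is hypothetical, not part of the code above) is to only retry the transient transport failures we actually observe - java.net.SocketException "Connection reset" and org.apache.http.NoHttpResponseException ("The target server failed to respond") - and only when the request is safe to repeat.

import java.net.SocketException;

import org.apache.http.NoHttpResponseException;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.methods.HttpRequestBase;

// Hypothetical helper: decide whether an exception thrown by httpClient.execute() is a
// transient transport failure worth retrying. POSTs are excluded because the request may
// already have reached the server before the connection died.
private static boolean isRetryableTransportFailure(Exception ex, HttpRequestBase request) {
    boolean safeToRepeat = !(request instanceof HttpPost);
    if (ex instanceof NoHttpResponseException) {
        // "The target server failed to respond" - often a stale pooled connection
        return safeToRepeat;
    }
    if (ex instanceof SocketException && String.valueOf(ex.getMessage()).contains("Connection reset")) {
        // the intermittent "Connection reset" we are trying to eliminate
        return safeToRepeat;
    }
    return false;
}

The catch block in execute() could call this to decide whether to continue the while loop, instead of blindly retrying every failure up to getMaxRetries().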
Comments from other users:
"Connection reset" is pretty much always a server-side issue, not a client-side one. – Ical
httpclient 4.5.11, I don't have a solution yet though... For now I'm making the request in a loop until it works (max 4 times), but I would like to know of a solution. – Epenthesis