HTTP 1.1 Pipelining
L

3

6

I have to implement an HTTP client in Java, and for my needs the most efficient approach seems to be HTTP pipelining (as per RFC 2616).

As an aside, I want to pipeline POSTs. (I am not talking about multiplexing. I am talking about pipelining, i.e. sending many requests over one connection before receiving any response: batching of HTTP requests.)
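To make the distinction concrete, here is a minimal sketch of what a pipelined batch of POSTs looks like on the wire: the requests are simply written back to back on one connection. The class name `PipelinedBatch`, the host `example.invalid`, and the path `/actions` are made up for illustration, and the sketch only builds the bytes; it does not open a socket or read responses.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of any library. It only builds the text a
// pipelining client would write to the connection.
final class PipelinedBatch {

    // Serializes one POST with an explicit Content-Length, so the server
    // can tell where this request ends and the next one begins.
    static String post(String host, String path, String body) {
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Content-Type: text/plain; charset=utf-8\r\n"
                + "Content-Length: " + length + "\r\n"
                + "\r\n"
                + body;
    }

    // Concatenates the requests back to back. Writing this to a single
    // connection before reading any response is what pipelining means.
    static String build(String host, String path, List<String> bodies) {
        StringBuilder batch = new StringBuilder();
        for (String body : bodies) {
            batch.append(post(host, path, body));
        }
        return batch.toString();
    }

    public static void main(String[] args) {
        // Assumed host, path, and bodies, purely for demonstration.
        System.out.print(build("example.invalid", "/actions",
                List.of("call=1", "call=2")));
    }
}
```

The responses would then have to be read from the same connection in the same order, which is where the difficulties discussed in the answers come from.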

I could not find a third-party library that explicitly states it supports pipelining. I could use, e.g., Apache HttpCore to build such a client or, if I have to, build it myself.

My problem is whether it is a good idea. I have not found any authoritative reference showing that HTTP pipelining is more than a theoretical model and is properly implemented by HTTP servers. Additionally, all browsers that support pipelining have the feature turned off by default.

So, should I try to implement such a client, or will I be in a lot of trouble because of server (or proxy) implementations? Is there any reference that gives guidelines on this?

If it is a bad idea, what would be the alternative programming model for efficiency? Separate TCP connections?

Llama answered 21/7, 2010 at 10:28 Comment(5)
Not quite what you need, but serf is a C library that implements HTTP pipelining: code.google.com/p/serf. I'm not 100% sure whether it supports pipelined POSTs, though. - Krefetz
Thank you, but I have to do it in Java. - Llama
@user384706 I have never tried serf, but if it does what you want and everything else fails, you can always try JNI/JNA. - Dibri
@luiscubal Thank you, but my problem is this: even if I used serf through JNI/JNA, is pipelining properly supported by servers and proxies, or will I be in trouble? For example, my understanding is that Apache HttpClient deliberately does not support pipelining. I could not find authoritative references or concrete examples showing that it is a feature actually used to increase performance. - Llama
For reference: "Java based HTTP Client which supports Pipelining" - #2777505 - Arsphenamine
A
8

I've implemented a pipelined HTTP client. The basic concept sounds easy, but error handling is very hard. The performance gain was so insignificant that we gave up on the idea a long time ago.

In my opinion, it doesn't make sense for the normal use case. It only has some benefit when the requests are logically connected. For example, you have a 3-request transaction and can send all of them in a batch. But normally, if requests can be pipelined, you can also combine them into one request.

These are some of the hurdles I remember:

  1. Keep-alive does not guarantee that the connection stays open. If you have 3 requests pipelined on the connection and the server drops the connection after the first response, you are supposed to retry the other two requests.

  2. When you have multiple connections, load balancing is also tricky. If there is no idle connection, you can either use a busy connection or create a new one.

  3. Timeouts are also tricky. When one request times out, you have to discard all the requests after it, because the responses must come back in order.
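The bookkeeping behind these hurdles can be sketched roughly as follows. This is not the client described above, only a minimal illustration under stated assumptions: `PipelineTracker` and its method names are hypothetical, requests are represented by plain string ids, and socket I/O and response parsing are left out entirely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical bookkeeping for one pipelined connection (ids only).
final class PipelineTracker {
    // Requests already written whose responses have not arrived yet,
    // oldest first. Responses must come back in exactly this order.
    private final Deque<String> inFlight = new ArrayDeque<>();

    void onRequestSent(String requestId) {
        inFlight.addLast(requestId);
    }

    // A complete response always belongs to the oldest in-flight request.
    // Throws NoSuchElementException if nothing is in flight.
    String onResponse() {
        return inFlight.removeFirst();
    }

    // Hurdle 1: the server closed the connection early. Everything still
    // in flight is unanswered and has to be retried on another connection.
    List<String> onConnectionClosed() {
        return drain();
    }

    // Hurdle 3: the oldest request timed out. Because responses are
    // ordered, the requests queued behind it are abandoned as well.
    List<String> onTimeout() {
        return drain();
    }

    // Hurdle 2: a connection pool can compare depths to choose between a
    // busy connection and a new one.
    int depth() {
        return inFlight.size();
    }

    private List<String> drain() {
        List<String> pending = new ArrayList<>(inFlight);
        inFlight.clear();
        return pending;
    }

    public static void main(String[] args) {
        PipelineTracker tracker = new PipelineTracker();
        tracker.onRequestSent("r1");
        tracker.onRequestSent("r2");
        tracker.onRequestSent("r3");
        System.out.println("answered: " + tracker.onResponse());
        System.out.println("to retry: " + tracker.onConnectionClosed());
    }
}
```

For POSTs, the retry list would also need the request bodies, which is why a real client has to keep them until the response has arrived.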

Ashjian answered 21/7, 2010 at 12:27 Comment(4)
@ZZ Coder Thank you! Did your client pipeline POSTs as well? My case is not the normal one. I want to pipeline real-time POSTs that trigger actions in a call center. Any info you remember, especially about server and proxy behavior, is appreciated! - Llama
Yes. It handles POST. There is no difference, except that you have to keep the body around if you implement retry logic. - Ashjian
@ZZ Coder - Regarding 1: With HTTP you have to implement retry logic anyway, and retry logic for pipelined connections isn't much different (the only difference being that, when retrying after a pipeline broke, you have to wait for the first response to see whether the connection is persistent). And most servers these days have pipelining enabled by default, so except on very bad network connections pipeline drops shouldn't occur too often, I guess. - Faus
@ZZ Coder - Regarding 3: Can you elaborate on what is tricky about it? It does not sound too tricky. By the way, there are HTTP clients that have pipelining enabled by default (msdn.microsoft.com/en-us/library/…); using those should be rather straightforward, I guess. Timeout -> retry, just as in 1. - Faus
F
10

POST should not be pipelined

8.1.2.2 Pipelining

A client that supports persistent connections MAY "pipeline" its requests (i.e., send multiple requests without waiting for each response). A server MUST send its responses to those requests in the same order that the requests were received.

Clients which assume persistent connections and pipeline immediately after connection establishment SHOULD be prepared to retry their connection if the first pipelined attempt fails. If a client does such a retry, it MUST NOT pipeline before it knows the connection is persistent. Clients MUST also be prepared to resend their requests if the server closes the connection before sending all of the corresponding responses.

Clients SHOULD NOT pipeline requests using non-idempotent methods or non-idempotent sequences of methods (see section 9.1.2). Otherwise, a premature termination of the transport connection could lead to indeterminate results. A client wishing to send a non-idempotent request SHOULD wait to send that request until it has received the response status for the previous request.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html
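One conservative reading of the quoted rule can be sketched as follows: runs of idempotent requests may share a pipelined batch, while every non-idempotent request is sent alone, after the responses to the previous batch have arrived. `PipelinePlanner` and `plan` are hypothetical names, the method set follows section 9.1.2 of the RFC, and the sketch makes no attempt to detect non-idempotent sequences of otherwise idempotent methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical planner; it looks at method names only.
final class PipelinePlanner {
    // Methods that RFC 2616 section 9.1.2 treats as idempotent.
    private static final Set<String> IDEMPOTENT =
            Set.of("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE");

    static boolean isIdempotent(String method) {
        return IDEMPOTENT.contains(method.toUpperCase(Locale.ROOT));
    }

    // Splits the request methods into batches. Each batch may be
    // pipelined; the client waits for all responses between batches.
    static List<List<String>> plan(List<String> methods) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String method : methods) {
            if (isIdempotent(method)) {
                current.add(method);
                continue;
            }
            // Non-idempotent: close the running batch, then send alone.
            if (!current.isEmpty()) {
                batches.add(current);
                current = new ArrayList<>();
            }
            batches.add(List.of(method));
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(plan(List.of("GET", "HEAD", "POST", "GET")));
    }
}
```

Under this reading, a stream consisting only of POSTs degenerates into one request per batch, that is, no pipelining at all.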

Fright answered 21/7, 2010 at 10:50 Comment(3)
Thanks for the reply. But SHOULD NOT means: "there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful" per RFC 2119. This is one of those cases, unless there is an implication in the SHOULD NOT definition that I am failing to understand. - Llama
@user384706 If your request is actually idempotent, perhaps you are really doing a PUT? - Maidie
@user384706 It means a lousy server could hiccup when you pipeline POSTs to it. True, that wouldn't be your fault, but when things don't work, things don't work. Whose fault it is doesn't matter. - Flammable
S
-1

Pipelining makes almost no difference to HTTP servers; they usually process the requests on a connection serially anyway: read a request, write a response, then read the next request...

But the client would very likely improve throughput by multiplexing. Websites usually have multiple machines with multiple CPUs, so why voluntarily limit your requests to a single line? Today it's more about horizontal scalability (concurrent requests). Of course, it's best to benchmark it.
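A rough sketch of the concurrent-request approach, with a cap on the number of simultaneous connections: `ConcurrentSender` and `sendAll` are hypothetical names, and each `Callable` is assumed to wrap one blocking HTTP call on its own connection. The demo tasks below are stand-ins that do no network I/O.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper; the thread pool size caps the open connections.
final class ConcurrentSender {

    // Runs the request tasks on at most maxConnections threads and
    // returns their results in submission order.
    static <T> List<T> sendAll(List<Callable<T>> requests, int maxConnections) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConnections);
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<T> future : pool.invokeAll(requests)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a request failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in tasks; a real task would perform one HTTP exchange.
        List<Callable<String>> tasks =
                List.of(() -> "200 OK #1", () -> "200 OK #2", () -> "200 OK #3");
        System.out.println(sendAll(tasks, 2));
    }
}
```

Unlike a pipeline, a slow or failed request here does not block or invalidate the others, at the cost of more open connections.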

Sod answered 21/7, 2010 at 21:21 Comment(2)
In pipelining, at least by definition, the interaction is not serial, since the requests come in batches. Also, what if there is a limit on the number of open connections to the same server? - Llama
It makes all the difference on a high-latency connection (dialup), even more so when it's a "long fat pipe" (satellite). It avoids the overhead of multiple TCP connections, but keeps most of the advantages. - Pelagic
