Apache HttpClient 4.x behaving strangely when uploading larger files?

I'm developing and testing a small, straightforward client-server application using Java (and Scala).

The server is based on com.sun.net.httpserver.HttpServer and allows files to be uploaded via a basic RESTful interface using POST and PUT operations. Uploads are restricted using Digest authentication, which we implemented ourselves; it is tested and works in browsers, curl and Apache HttpClient.

The upload client wraps Apache HttpClient 4.1.2 and executes PUT operations over HTTP to upload file entities. The Content-Type of the file is specified as application/xml in the header, and only a single file is uploaded at a time.

When uploading files of different sizes, I observed strange behaviour:

  • Files of 1,076,006 bytes or less are uploaded successfully.
  • Files of 1,122,158 bytes or more fail with a java.net.SocketException: Broken pipe.

(The exact critical size is unknown, since I created files of different sizes manually to approximate the maximum working size.)

The reason for the broken pipe is that, for files of that size, the client somehow ignores the WWW-Authenticate response, as the server logs show. "Ignores" means that it just sends multiple (4) requests containing no authentication header at all. Smaller files work well: the client correctly sends a request with the proper challenge response immediately after the WWW-Authenticate response, as it should.

The upload works in curl with files of all sizes, so no problem there.

So at this point, one could say: "There is some bug in your client." I kind of hope so, but I've also tried an open-source Java RESTClient (also wrapping Apache HttpClient), and it shows exactly the same behaviour!

We also tried this client over the internet, and the behaviour is the same as described. So right now I just hope I've failed to set something important in Apache HttpClient that leads to this erroneous behaviour, and that the developer of the open-source RESTClient missed it as well. Any ideas about what it could be would be great!

Britain answered 6/2, 2012 at 14:25 Comment(0)

Most likely a combination of several factors leads to this situation:

(1) Most likely your client does not use the 'expect-continue' handshake when sending a large request entity with a request that does not include an authentication header.

(2) The server detects early that the request fails its expectations and, instead of reading and discarding the full request body, responds early with a 401 status and closes the connection on its end. In my opinion, this is an HTTP protocol violation on the part of the server.

(3) While some HTTP agents can deal with early responses, Apache HttpClient cannot, due to a limitation of Java blocking I/O (a thread of execution can either read from or write to a blocking socket, but not both at the same time).
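Point (2) can be illustrated from the server side. The following is only a minimal sketch, assuming the 401 is produced inside a com.sun.net.httpserver handler: it reads and discards the whole request body before answering, so a client that is still writing does not hit a closed socket. The class name `DrainingAuthHandler`, the realm and the nonce are placeholders invented for this example, and real Digest verification is left out.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical handler: drains the request body before rejecting the request.
class DrainingAuthHandler implements HttpHandler {

    @Override
    public void handle(HttpExchange exchange) throws IOException {
        try {
            String authorization = exchange.getRequestHeaders().getFirst("Authorization");

            // Read and discard the complete body first, even if the request
            // is going to be rejected, so the client can finish writing.
            drain(exchange.getRequestBody());

            if (authorization == null) {
                // Placeholder challenge; a real server would generate a fresh nonce.
                exchange.getResponseHeaders().add("WWW-Authenticate",
                        "Digest realm=\"upload\", nonce=\"placeholder-nonce\"");
                exchange.sendResponseHeaders(401, -1); // -1: no response body
            } else {
                // Digest verification and storing the upload are omitted here.
                exchange.sendResponseHeaders(201, -1);
            }
        } finally {
            exchange.close();
        }
    }

    // Reads the stream to its end and returns the number of bytes discarded.
    static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
        }
        return total;
    }
}
```

Such a handler would be registered with something like `server.createContext("/upload", new DrainingAuthHandler())`. The trade-off is that the unauthenticated body is transferred in full once before the authenticated retry.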

There are multiple ways of addressing the issue, the 'expect-continue' handshake being the easiest and most natural one. Alternatively one can execute a simple HEAD or a GET request to force HTTP authentication prior to executing a large POST or PUT request. HttpClient is capable of re-using authentication data for subsequent requests in the same logical HTTP session.
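As a rough sketch of the first option, assuming the HttpClient 4.1.x API (it needs the HttpClient and HttpCore jars on the classpath), the handshake is switched on through the protocol parameters. The host, port, URL, file name and credentials below are placeholders, not values from the question.

```java
import java.io.File;

import org.apache.http.HttpResponse;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.FileEntity;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.HttpProtocolParams;
import org.apache.http.util.EntityUtils;

public class ExpectContinueUpload {
    public static void main(String[] args) throws Exception {
        DefaultHttpClient client = new DefaultHttpClient();
        try {
            // Send "Expect: 100-continue" and wait for the server's answer
            // before transmitting the request body.
            HttpProtocolParams.setUseExpectContinue(client.getParams(), true);

            // Placeholder credentials for the (assumed) upload server.
            client.getCredentialsProvider().setCredentials(
                    new AuthScope("localhost", 8080),
                    new UsernamePasswordCredentials("user", "secret"));

            // Placeholder URL and file; FileEntity is repeatable, so the
            // request can be re-sent after the 401 challenge.
            HttpPut put = new HttpPut("http://localhost:8080/upload/data.xml");
            put.setEntity(new FileEntity(new File("data.xml"), "application/xml"));

            HttpResponse response = client.execute(put);
            System.out.println(response.getStatusLine());
            EntityUtils.consume(response.getEntity());
        } finally {
            client.getConnectionManager().shutdown();
        }
    }
}
```

This only helps if the server answers the Expect header with the 401 challenge itself; as the comments below report, Sun's httpserver replies with 100 Continue before the application is involved, which is why the asker ended up using the second option.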

Chaqueta answered 7/2, 2012 at 12:40 Comment(4)
Thanks for the explanation, it makes perfect sense! For now I'm going with the 'expect-continue' solution. In the client it is just a matter of flipping a boolean. The handshake on the server is in progress now; I am quite convinced this should solve the problem. - Britain
Strangely, the underlying Sun httpserver always responds with a 100-continue without involving the application! It seems to me that this violates the protocol (see RFC 2616, "Use of the 100 (Continue) Status"; httpserver source: docjar.com/html/api/sun/net/httpserver/ServerImpl.java.html). But I've implemented your second solution, triggering the authentication with a second request before sending a large amount of data. This works, so thanks for that! - Britain
@Britain It is none of my business, but why are you using Sun's ServerImpl at all when there are so many decent embeddable HTTP servers these days? - Chaqueta
Because I'm doing a project at university, and that's what my professor has chosen and embedded... I've noticed that almost no one is using it. But I've learned some interesting things going the hard way, so in total it was not all bad ;) At least I will appreciate the decent servers more! - Britain
