HTTP request compression

General Use-Case

Imagine a client that is uploading large amounts of JSON. The Content-Type should remain application/json because that describes the actual data. Accept-Encoding and Transfer-Encoding seem to be for telling the server how it should format the response. It appears that responses use the Content-Encoding header explicitly for this purpose, but it is not a valid request header.

Is there something I am missing? Has anyone found an elegant solution?

Specific Use-Case

My use-case is that I have a mobile app that is generating large amounts of JSON (and some binary data in some cases but to a lesser extent) and compressing the requests saves a large amount of bandwidth. I am using Tomcat as my Servlet container. I am using Spring for its MVC annotations primarily just to abstract away some of the JEE stuff into a much cleaner, annotation-based interface. I also use Jackson for auto (de)serialization.

I also use nginx, but I am not sure if that's where I want the decompression to take place. The nginx nodes simply balance the requests, which are then distributed through the data center. It would be just as nice to keep the body compressed until it reaches the node that is going to process it.

Thanks in advance,

John

EDIT:

For those curious about the state of things at the time of writing, the discussion between @DaSourcerer and me was really helpful.

I ended up implementing a solution of my own. Note that this specifies the branch "ohmage-3.0", but it will soon be merged into the master branch. You might want to check there to see if I have made any updates/fixes.

https://github.com/ohmage/server/blob/ohmage-3.0/src/org/ohmage/servlet/filter/DecompressionFilter.java

Parabolic answered 10/12, 2013 at 22:49 Comment(2)
The GitHub link is broken! – Gerhardt
It looks like it was renamed: github.com/ohmage/server/blob/master/src/org/ohmage/jee/filter/… – Bade

It appears [Content-Encoding] is not a valid request header.

That is actually not quite true. As per RFC 2616, sec 14.11, Content-Encoding is an entity header, which means it can be applied to the entities of both HTTP responses and requests. Through the powers of multipart MIME messages, even selected parts of a request (or response) can be compressed.

However, web server support for compressed request bodies is rather slim. Apache supports it to a degree via the mod_deflate module. It's not entirely clear to me whether nginx can handle compressed requests.
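On the client side, a minimal sketch using only the JDK might look like the following: it gzips a JSON string and POSTs it with a `Content-Encoding: gzip` request header while leaving `Content-Type` as `application/json`. The class name `GzipJsonUpload`, its `gzip` and `post` helpers, and the localhost URL are hypothetical, and the receiving server still has to decompress the body itself, since server support is not a given.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipJsonUpload {

    // Gzips the UTF-8 bytes of a JSON document
    static byte[] gzip(String json) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(buffer)) {
            gzipOut.write(json.getBytes(StandardCharsets.UTF_8));
        } // closing the gzip stream writes the trailer into the buffer
        return buffer.toByteArray();
    }

    // POSTs the compressed body; Content-Type still describes the uncompressed JSON
    static int post(URL url, String json) throws IOException {
        byte[] body = gzip(json);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
            conn.setRequestProperty("Content-Encoding", "gzip");
            conn.setFixedLengthStreamingMode(body.length); // length of the compressed bytes
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"readings\":[1,2,3]}";
        System.out.println("compressed to " + gzip(json).length + " bytes");
        // Hypothetical endpoint; it must decompress request bodies itself
        URL url = URI.create("http://localhost:8080/upload").toURL();
        try {
            System.out.println("HTTP " + post(url, json));
        } catch (IOException e) {
            System.out.println("upload failed: " + e.getMessage());
        }
    }
}
```

For a tiny payload like the one in `main`, the gzip output can be larger than the input (the gzip header and trailer alone take 18 bytes); the savings show up on the large uploads described in the question.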

Apraxia answered 13/12, 2013 at 21:50 Comment(9)
Interesting. So, the lack of support is what was confusing me. It kind of makes sense the more I think about it. It's one thing to tell the server to respond in a way you understand, but it's completely different to just start speaking that way in hopes that the server will understand. Although, who doesn't have a GZIP implementation these days? – Parabolic
"it's completely different to just start speaking that way in hopes that the server will understand." Well, there is Expect: 100-continue for that. But that's an entirely different story... – Apraxia
Yeah! That makes perfect sense. I like it. The lack of support still saddens me, but I can do that on my own. Thanks! – Parabolic
Hm, per this post (see first answer), some web servers seem to have deliberately chosen not to implement this for security reasons. Would make sense, tbh. – Apraxia
Thanks for the heads up! Based on sections 7 and 3.7.2, it seems that the parts of a multipart request are considered to be their own entities; therefore, they should get their own headers. So I feel like I should be allowed to have a Content-Encoding (along with the Content-Type) on each part. However, most libraries that I am trying to use make this very difficult, if not impossible. :( – Parabolic
Well, compressing parts of a request is even rarer than compressing requests as a whole. Leaving the world of Java, I think Guzzle supports compressing selected parts of a multipart request. Then again, web server support for this is pretty much nil. – Apraxia
That makes sense. Compressing the parts is/was my original problem, but when I posted this question I decided to simplify it and make it more general. Thanks for all the info! I implemented a solution that seems to be working for now and will update my question. – Parabolic
There is a spec draft, RFC 7694 "Client-Initiated Content-Encoding". Maybe one day it will be accepted as recommended. – Hoofer
@SergeyPonomarev that is more than a mere draft. It is a fully-blown RFC. – Apraxia
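RFC 7694 also gives the server a way to push back: when it cannot decode a request's content coding, it ought to answer 415 (Unsupported Media Type) and list the codings it does accept in an `Accept-Encoding` response header. Below is a minimal sketch of just the decision logic, assuming a server that only understands gzip (`x-gzip` is treated as an alias, as HTTP/1.1 recommends). The class and method names are hypothetical, and wiring it into a servlet filter (`response.setHeader("Accept-Encoding", "gzip")` followed by `response.sendError(415)`) is left out so the example runs without a container.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class RequestCodingPolicy {

    // Codings this hypothetical server can decode; "identity" means no coding at all
    static final Set<String> SUPPORTED = Set.of("gzip", "x-gzip", "identity");

    // Returns every coding, across all Content-Encoding field values, that cannot be decoded
    static List<String> unsupported(List<String> contentEncodingValues) {
        List<String> rejected = new ArrayList<>();
        for (String value : contentEncodingValues) {
            for (String token : value.split(",")) {
                String coding = token.trim().toLowerCase(Locale.ROOT);
                if (!coding.isEmpty() && !SUPPORTED.contains(coding)) {
                    rejected.add(coding);
                }
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // Two header fields: "Content-Encoding: gzip" and "Content-Encoding: br"
        List<String> bad = unsupported(List.of("gzip", "br"));
        if (!bad.isEmpty()) {
            // A real filter would respond: 415 Unsupported Media Type, Accept-Encoding: gzip
            System.out.println("415, Accept-Encoding: gzip, rejected " + bad);
        }
    }
}
```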

Because the original code is no longer available, here is my version in case someone who comes here needs it. I use the "Content-Encoding: gzip" request header to tell the filter whether it needs to decompress the body.

Here's the code.

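@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException
{
    HttpServletRequest httpServletRequest = (HttpServletRequest) request;

    // Only requests whose body the client declared as gzip-compressed are wrapped
    String contentEncoding = httpServletRequest.getHeader("Content-Encoding");
    if (contentEncoding != null && contentEncoding.toLowerCase(java.util.Locale.ROOT).contains("gzip"))
    {
        try
        {
            // StreamHelper peeks at the gzip magic bytes and only inflates if they are present
            final InputStream decompressStream = StreamHelper.decompressStream(httpServletRequest.getInputStream());
            final ServletInputStream servletStream = new DecompressServletInputStream(decompressStream);

            httpServletRequest = new HttpServletRequestWrapper(httpServletRequest)
            {
                @Override
                public ServletInputStream getInputStream() throws IOException
                {
                    return servletStream;
                }

                @Override
                public BufferedReader getReader() throws IOException
                {
                    // Use the declared charset, falling back to UTF-8 (assumed for JSON bodies)
                    String charset = getCharacterEncoding();
                    return new BufferedReader(new InputStreamReader(servletStream, charset != null ? charset : "UTF-8"));
                }

                // The original length describes the compressed body, so report it as unknown
                @Override
                public int getContentLength()
                {
                    return -1;
                }

                @Override
                public long getContentLengthLong()
                {
                    return -1L;
                }
            };
        }
        catch (IOException e)
        {
            mLogger.error("error while handling the request", e);
        }
    }

    chain.doFilter(httpServletRequest, response);
}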

Simple ServletInputStream wrapper class

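public static class DecompressServletInputStream extends ServletInputStream
{
    private final InputStream inputStream;
    private boolean finished = false;

    public DecompressServletInputStream(InputStream input)
    {
        inputStream = input;
    }

    @Override
    public int read() throws IOException
    {
        int b = inputStream.read();
        finished = (b == -1);
        return b;
    }

    // Delegate bulk reads directly instead of InputStream's byte-at-a-time default
    @Override
    public int read(byte[] b, int off, int len) throws IOException
    {
        int n = inputStream.read(b, off, len);
        finished = (n == -1);
        return n;
    }

    // Abstract since Servlet 3.1 (needs ReadListener import); only blocking reads are supported
    @Override public boolean isFinished() { return finished; }
    @Override public boolean isReady() { return true; }
    @Override public void setReadListener(ReadListener listener) { throw new UnsupportedOperationException(); }

    @Override public void close() throws IOException { inputStream.close(); }
}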

Decompression stream code

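import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

public class StreamHelper
{

    /**
     * Gzip magic number, fixed values in the beginning to identify the gzip
     * format <br>
     * http://www.gzip.org/zlib/rfc-gzip.html#file-format
     */
    private static final byte GZIP_ID1 = 0x1f;
    /**
     * Gzip magic number, fixed values in the beginning to identify the gzip
     * format <br>
     * http://www.gzip.org/zlib/rfc-gzip.html#file-format
     */
    private static final byte GZIP_ID2 = (byte) 0x8b;

    /**
     * Return decompression input stream if needed.
     * 
     * @param input
     *            original stream
     * @return a gzip-decoding stream if the data starts with the gzip magic
     *         number, otherwise a stream over the original bytes
     * @throws IOException
     *             exception while reading the input
     */
    public static InputStream decompressStream(InputStream input) throws IOException
    {
        PushbackInputStream pushbackInput = new PushbackInputStream(input, 2);

        // read() may return fewer than 2 bytes (or -1 for an empty body),
        // so keep reading and push back only the bytes actually read
        byte[] signature = new byte[2];
        int count = 0;
        while (count < signature.length)
        {
            int n = pushbackInput.read(signature, count, signature.length - count);
            if (n == -1)
            {
                break;
            }
            count += n;
        }
        if (count > 0)
        {
            pushbackInput.unread(signature, 0, count);
        }

        if (count == 2 && signature[0] == GZIP_ID1 && signature[1] == GZIP_ID2)
        {
            return new GZIPInputStream(pushbackInput);
        }
        return pushbackInput;
    }
}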
Abduction answered 17/3, 2015 at 9:55 Comment(1)
Very nice. In Spring Boot, this can be put inside a simple Filter bean. – Delmydeloach

When you send the request, add this header:

"Accept-Encoding": "gzip, deflate"

Client code:

HttpUriRequest request = new HttpGet(url);
request.addHeader("Accept-Encoding", "gzip");

@JulianReschke pointed out that there can be a case of:

"Content-Encoding" : "gzip, gzip"

so extended server code will be:

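// Needs java.util.{ArrayList, List, Locale} and java.util.zip.InflaterInputStream as well
InputStream in = response.getEntity().getContent();

// Collect the codings from every Content-Encoding header field, in order
List<String> codings = new ArrayList<>();
for (Header header : response.getHeaders("Content-Encoding")) {
    for (String token : header.getValue().split(",")) {
        String coding = token.trim().toLowerCase(Locale.ROOT);
        if (!coding.isEmpty()) codings.add(coding);
    }
}

// Codings are listed in the order they were applied, so undo them in reverse
for (int i = codings.size() - 1; i >= 0; i--) {
    String coding = codings.get(i);
    if (coding.equals("gzip") || coding.equals("x-gzip")) {
        in = new GZIPInputStream(in);
    } else if (coding.equals("deflate")) {
        in = new InflaterInputStream(in); // HTTP "deflate" is zlib-wrapped
    } else if (!coding.equals("identity")) {
        throw new IOException("Unsupported Content-Encoding: " + coding);
    }
}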

I suppose nginx is used as a load balancer or proxy, so you need to configure Tomcat to do the compression.

Add the following attributes to the Connector in Tomcat's server.xml:

<Connector 
compression="on"
compressionMinSize="2048"
compressableMimeType="text/html,application/json"
... />

Accepting gzipped requests in Tomcat is a different story: you'll have to put a filter in front of your servlets to decompress request bodies. You can find more about that here.

Thulium answered 10/12, 2013 at 23:29 Comment(11)
First, thanks for responding! This is what I'm getting at. The Connector will allow my Servlet and all of my code to act as if there was no compression, and then Tomcat, on the way out the door, will compress it. I was wondering if there was something similar in the other direction. The "Server Code" above requires that I do this in my code. This means things like Spring and Jackson's auto-deserialization are lost. It's not hard to emulate on my own, but given that it happens on the way out, why isn't there something similar on the way in? – Parabolic
Interesting. This is another case of someone GZIP'ing on the way out (not in), but it has given me an idea. I will use a custom Filter that will continue the chain with a custom ServletRequest (not response) class. I want to keep this unchecked until I have implemented the solution, but I will be sure to check it once I have. – Parabolic
This code will fail to properly handle more complex versions of Content-Encoding, such as "gzip, gzip". – Iggy
@JulianReschke can you help me improve the answer? – Thulium
Well, you'll need to do more work in parsing. 1) Combine all field values using "," as separator, 2) split the resulting value on ",", 3) ensure you understand all tokens and process them in the right order. – Iggy
@JulianReschke, do you think all this can be reduced to checking whether the encoding contains the word "gzip"? It is dirty, but I am quite sure it covers most cases. – Thulium
No; for instance, when the field value is "gzip, gzip", you would have to ungzip twice. – Iggy
You only check one header field instance, and you have hard-wired the number of gzip tokens you support... – Iggy
@JulianReschke I really have no clue about header instances. Can you give me an example where this snippet won't work? – Thulium
For instance, with "...Content-Encoding: gzip\nContent-Encoding: foo\n...", your code would ignore the second header field. – Iggy
The complex encoding fix looks backwards to me. Codings are supposed to be listed "in the order in which they were applied" (developer.mozilla.org/en-US/docs/Web/HTTP/Headers/…), so wouldn't they need to be decoded in reverse order? (Obviously it doesn't matter for "gzip,gzip", but it would if the value were "gzip,deflate,compress".) – Mora
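To illustrate the ordering point from the last comment, here is a self-contained JDK sketch (the class and method names are hypothetical) that applies a list of codings in the listed order and then undoes them in reverse, the way a body sent with `Content-Encoding: deflate, gzip` would have to be decoded. It only supports `gzip` and `deflate` (HTTP's `deflate` is the zlib-wrapped format, which is what `DeflaterOutputStream`/`InflaterInputStream` use by default) and collects tokens from every header field value, which also covers the multiple-header case raised above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class ContentCodings {

    // Flattens all Content-Encoding field values into one ordered list of codings
    static List<String> parse(List<String> fieldValues) {
        List<String> codings = new ArrayList<>();
        for (String value : fieldValues) {
            for (String token : value.split(",")) {
                String coding = token.trim().toLowerCase(Locale.ROOT);
                if (!coding.isEmpty()) codings.add(coding);
            }
        }
        return codings;
    }

    // Applies the codings in the order listed (the first one listed is applied first)
    static byte[] encode(byte[] data, List<String> codings) throws IOException {
        for (String coding : codings) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (OutputStream out = wrapOut(buffer, coding)) {
                out.write(data);
            }
            data = buffer.toByteArray();
        }
        return data;
    }

    // Undoes the codings in reverse: the last one applied is the outermost layer
    static InputStream decode(InputStream in, List<String> codings) throws IOException {
        for (int i = codings.size() - 1; i >= 0; i--) {
            String coding = codings.get(i);
            switch (coding) {
                case "gzip": case "x-gzip": in = new GZIPInputStream(in); break;
                case "deflate": in = new InflaterInputStream(in); break;
                case "identity": break;
                default: throw new IOException("Unsupported Content-Encoding: " + coding);
            }
        }
        return in;
    }

    private static OutputStream wrapOut(OutputStream out, String coding) throws IOException {
        switch (coding) {
            case "gzip": case "x-gzip": return new GZIPOutputStream(out);
            case "deflate": return new DeflaterOutputStream(out);
            case "identity": return out;
            default: throw new IOException("Unsupported Content-Encoding: " + coding);
        }
    }

    public static void main(String[] args) throws IOException {
        // Two header fields, equivalent to "Content-Encoding: deflate, gzip"
        List<String> codings = parse(List.of("deflate", "gzip"));
        byte[] body = encode("{\"ok\":true}".getBytes(StandardCharsets.UTF_8), codings);
        try (InputStream in = decode(new ByteArrayInputStream(body), codings)) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

Decoding in the forward order instead would hand the outer gzip layer to `InflaterInputStream`, which fails with a `ZipException` rather than silently producing wrong data.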
