Why does Tomcat return different headers for HEAD and GET requests to my RESTful API?

My original goal was to verify HTTP chunked transfer, but I accidentally found this inconsistency.

The API is designed to return a file to the client. I called it with both the HEAD and GET methods, and different headers were returned.

For GET, I get these headers: (This is what I expected.)

Transfer-Encoding: chunked, no Content-Length.

For HEAD, I get these headers:

Content-Length: 1017118720, no Transfer-Encoding.

According to this thread, HEAD and GET SHOULD return identical headers, but they are not strictly required to.

My question is:

If Transfer-Encoding: chunked is used because the file is fed to the client dynamically and the Tomcat server cannot know its size beforehand, how can Tomcat know the Content-Length when the HEAD method is used? Does Tomcat just dry-run the handler and count all the file bytes? Why doesn't it simply return the same Transfer-Encoding: chunked header?

Below is my RESTful API implemented with Spring Web MVC:

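import java.io.IOException;
import java.io.InputStream;

import javax.servlet.ServletContext;
import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServletResponse;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChunkedTransferAPI {

    @Autowired
    ServletContext servletContext;

    @RequestMapping(value = "bootfile.efi", method = { RequestMethod.GET, RequestMethod.HEAD })
    public void doHttpBoot(HttpServletResponse response) {

        String filename = "/bootfile.efi";
        // No Content-Length or Transfer-Encoding is set here; the container decides.
        try (InputStream input = servletContext.getResourceAsStream(filename)) {
            if (input == null) {
                // getResourceAsStream returns null when the resource does not exist
                response.sendError(HttpServletResponse.SC_NOT_FOUND);
                return;
            }
            ServletOutputStream output = response.getOutputStream();
            // Copy in blocks instead of one byte at a time
            byte[] buffer = new byte[8192];
            int read;
            while ((read = input.read(buffer)) != -1) {
                output.write(buffer, 0, read);
            }
            output.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}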

ADD 1

In my code, I didn't explicitly add any headers, so it must be Tomcat that adds the Content-Length and Transfer-Encoding headers as it sees fit.

So, what are the rules for Tomcat to decide which headers to send?

ADD 2

Maybe it's related to how Tomcat works. I hope someone can shed some light here. Otherwise, I will debug into the source of Tomcat 8 and share the result. But that may take a while.

Kentigerma answered 15/12, 2015 at 13:12

Does Tomcat just dry-run the handler and count all the file bytes?

Yes, the default implementation of javax.servlet.http.HttpServlet.doHead() does that.

You can look at helper classes NoBodyResponse, NoBodyOutputStream in HttpServlet.java
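
To make the dry-run idea concrete, here is a minimal sketch in plain Java. It is not the actual HttpServlet source, and `CountingDiscardStream`, `BodyWriter` and `dryRunContentLength` are hypothetical names used only for illustration. The handler writes its body into a stream that throws the bytes away and only counts them; that count is what can then be reported as Content-Length.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class HeadDryRunSketch {

    /** Stands in for the GET handler: anything that writes a body to a stream. */
    interface BodyWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Discards every byte written to it and only remembers how many there were. */
    static final class CountingDiscardStream extends OutputStream {
        private long count;

        @Override
        public void write(int b) {
            count++;
        }

        @Override
        public void write(byte[] buf, int off, int len) {
            count += len;
        }

        long getCount() {
            return count;
        }
    }

    /** Runs the handler against the counting stream and returns the body size. */
    static long dryRunContentLength(BodyWriter handler) {
        CountingDiscardStream sink = new CountingDiscardStream();
        try {
            handler.writeTo(sink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.getCount();
    }

    public static void main(String[] args) {
        // A made-up handler that produces its body one byte at a time.
        long length = dryRunContentLength(out -> {
            for (int i = 0; i < 5000; i++) {
                out.write(i & 0xFF);
            }
        });
        System.out.println("Content-Length: " + length);
    }
}
```

The price of this approach is that the whole body is still produced for every HEAD request, even though none of it is sent.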

The DefaultServlet class (the Tomcat servlet used to serve static files) is smarter. It can send the correct Content-Length value, and it can also serve GET requests for part of a file (via the Range header). You can forward your request to that servlet with

  servletContext.getNamedDispatcher("default").forward(request, response);
Ulita answered 24/12, 2015 at 13:53

Although it seems strange, it might make sense to send the size only in response to a HEAD request and chunked in response to a GET request, depending on the type of data that has to be returned by the server.

While your API seems to provide a static file, you also talk about dynamically created files or data, so I will be talking in general here (also for webservers in general).

First let's have a look at the different usages for GET and HEAD:

  • With GET the client is requesting the whole file or data (or a range of the data), and wants it as fast as possible. So there is no specific reason for the server to send the size of the data first, especially when it could start sending faster/sooner in chunked mode. So the fastest possible way is preferred here (the client will have the size after the download anyway).

  • With HEAD, on the other hand, the client usually wants some specific information. This could just be a check on existence or 'last-changed', but it could also be used if the client wants a certain part of the data (with a range request, including a check to see if range requests are supported for that request), or just needs to know the size of the data up front for some reason.

Let's look at some possible scenarios:

Static file:

HEAD: there's no reason to not include the size in the response-header because that information is available.

GET: most of the time the size will be included in the header and the data sent in one go, unless there are specific performance reasons to send it in chunks. On the other hand, it seems you are expecting chunked transfer for your file, so this could make sense here.

Live logfile:

Ok, somewhat strange, but possible: downloading a file where the size could change while downloading.

HEAD: again, the client probably wants the size, and the server can easily provide the size of the file at that specific time in the header.

GET: since log lines could be added while downloading, the size is unknown up front. The only option is to send chunked.
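
Chunked framing is what makes this possible: in HTTP/1.1 every chunk carries its own size in hexadecimal, and a zero-size chunk ends the body, so the total never has to be known up front. A minimal sketch of that framing follows; the class and method names are hypothetical, and trailers and chunk extensions are left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class ChunkedFramingSketch {

    /** Frames each piece of data as "<hex size>CRLF<data>CRLF", then ends with "0CRLFCRLF". */
    static byte[] encodeChunked(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            if (chunk.length == 0) {
                continue; // a zero-size chunk would be read as the end of the body
            }
            byte[] sizeLine = (Integer.toHexString(chunk.length) + "\r\n")
                    .getBytes(StandardCharsets.US_ASCII);
            out.write(sizeLine, 0, sizeLine.length);
            out.write(chunk, 0, chunk.length);
            out.write('\r');
            out.write('\n');
        }
        byte[] terminator = "0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        out.write(terminator, 0, terminator.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Two log lines that became available at different times.
        byte[] framed = encodeChunked(Arrays.asList(
                "first line\n".getBytes(StandardCharsets.US_ASCII),
                "second line, appended later\n".getBytes(StandardCharsets.US_ASCII)));
        System.out.print(new String(framed, StandardCharsets.US_ASCII));
    }
}
```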

Table with fixed-sized records:

Let's imagine a server needs to send back a table with fixed-length records coming from multiple sources/databases:

HEAD: size is probably wanted by the client. The server could quickly do a query for count in each database, and send the calculated size back to the client.

GET: instead of first running a count query against each database, the server is better off sending the resulting records from each database in chunks right away.
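
For the HEAD case in this scenario the arithmetic is trivial. A rough sketch, assuming each database has already answered its count query; the row counts and the 64-byte record size are made-up example values, and the names are hypothetical:

```java
class FixedRecordSizeSketch {

    /** Sums the per-database row counts and multiplies by the fixed record size. */
    static long contentLength(long[] rowCounts, int recordBytes) {
        long total = 0;
        for (long rows : rowCounts) {
            // addExact/multiplyExact throw ArithmeticException instead of silently overflowing
            total = Math.addExact(total, Math.multiplyExact(rows, (long) recordBytes));
        }
        return total;
    }

    public static void main(String[] args) {
        long[] rowCounts = { 1000, 250, 4 }; // hypothetical results of three count queries
        System.out.println("Content-Length: " + contentLength(rowCounts, 64));
    }
}
```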

Dynamically generated zip-files:

Maybe not common, but an interesting example.

Imagine you want to provide dynamically generated zip-files to the user based on some parameters.

Let's first have a look at the structure of a zip-file:

There are two parts: first there's a block for each file: a small header followed by the compressed data for that file. Then there's a list of all the files inside the zip-file (including sizes/positions).

So the prepared blocks for each file could be pre-generated on disk (and the names/sizes stored in some data structure).

HEAD: the client probably wants to know the size here. The server can easily calculate the size of all the needed blocks + the size of the second part with the list of the files inside.

If the client wants to extract a single file, it could directly ask for the last part of the file (with a range request) to get the list, and then ask for that single file with a second request. Although the size is not strictly needed to get the last n bytes, it could be handy if, for example, you wanted to store the different parts in a sparse file with the same size as the full zip-file.

GET: no need to do the calculations first (including generating the second part to know its size). It would be better and faster to just start sending each block in chunks.
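
As a sketch of the HEAD calculation for the zip scenario: in the ZIP format the fixed part of a local file header is 30 bytes, a central directory entry is 46 bytes, and the end-of-central-directory record is 22 bytes. The code below assumes the simplest possible archive (no extra fields, no comments, no data descriptors, no Zip64); the class and method names and the example entries are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class ZipSizeSketch {

    static final int LOCAL_HEADER_FIXED = 30;   // local file header without the name
    static final int CENTRAL_HEADER_FIXED = 46; // central directory entry without the name
    static final int END_RECORD_FIXED = 22;     // end-of-central-directory record

    /**
     * Total archive size from entry names and their already-compressed sizes.
     * Assumes no extra fields, comments, data descriptors or Zip64 records.
     */
    static long totalSize(String[] names, long[] compressedSizes) {
        if (names.length != compressedSizes.length) {
            throw new IllegalArgumentException("names and sizes must have the same length");
        }
        long total = END_RECORD_FIXED;
        for (int i = 0; i < names.length; i++) {
            int nameLength = names[i].getBytes(StandardCharsets.UTF_8).length;
            total += LOCAL_HEADER_FIXED + nameLength + compressedSizes[i]; // first part: one block per file
            total += CENTRAL_HEADER_FIXED + nameLength;                    // second part: the file list
        }
        return total;
    }

    public static void main(String[] args) {
        String[] names = { "a.txt", "b.bin" }; // hypothetical pre-generated blocks
        long[] compressedSizes = { 100, 200 };
        System.out.println("Content-Length: " + totalSize(names, compressedSizes));
    }
}
```

The offset where the file list starts falls out of the same sums, which is what a client would need for the range request described above.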

Fully dynamically generated file:

In this case it wouldn't be very efficient to return the size to a HEAD request of course, since the whole file would need to be generated just to know its size.

Blindly answered 24/12, 2015 at 16:33
