How should an HTTP client properly parse a *chunked* HTTP response body?

Asked 24/1, 2010 at 15:57 Answered 24/1, 2010 at 16:17

When chunked HTTP transfer encoding is used, why does the server need both to write out the chunk size in bytes and to end the subsequent chunk data with CRLF?

Doesn't this make sending binary data "CRLF-unclean" and the method a bit redundant?

What if the data has a 0x0D followed by 0x0A in it somewhere (i.e. a CRLF pair that is actually part of the data)? Is the client then expected to adhere to the chunk size explicitly provided at the head of the chunk or choke on the first CRLF it encounters in the data?

My understanding so far of expected client behaviour is to simply take the chunk size provided by the server, proceed to the next line, then read exactly this number of bytes from the following data (CRLF or no CRLF therein), then skip the CRLF following the data and repeat the procedure until there are no more chunks. Is this compliant behaviour? If so, what is the point of the CRLF after each data chunk then? Readability?

I have done some Web searching on this and also did some reading of the HTTP 1.1 specification, but a definitive answer seems to be eluding me.

Oriente answered 24/1, 2010 at 15:57 Comment(0)

A chunked consumer does not scan the message body for a CRLF pair. It first reads the specified number of bytes, and then reads two more bytes to confirm that they are CR and LF. If they're not, the message body is ill-formed, and either the size was specified improperly or the data was otherwise corrupted.

The trailing CRLF is a belt-and-suspenders assurance (per RFC 2616 section 3.6.1, Chunked Transfer Coding), but it also serves to maintain the consistent rule that fields start at the beginning of the line.
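The procedure described above (read the size, read exactly that many bytes, then verify the two bytes that follow) can be sketched roughly as below. This is only an illustration and not code from the answer: the function name `decode_chunked` is made up, it assumes a blocking binary file-like object, it ignores chunk extensions, and it discards any trailer headers.

```python
import io


def decode_chunked(stream):
    """Decode a chunked body from a binary file-like object (hypothetical helper)."""
    hex_digits = b"0123456789abcdefABCDEF"
    body = bytearray()
    while True:
        # The size line is the only place where we look for a line ending.
        size_line = stream.readline()
        if not size_line.endswith(b"\r\n"):
            raise ValueError("chunk-size line not terminated by CRLF")
        # Anything after ';' is a chunk extension; this sketch ignores it.
        token = size_line[:-2].split(b";", 1)[0].strip()
        if not token or any(c not in hex_digits for c in token):
            raise ValueError("chunk size is not a hexadecimal number")
        size = int(token.decode("ascii"), 16)
        if size == 0:
            break  # last-chunk
        # The data is opaque: read exactly `size` bytes, never scan for CRLF.
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk data")
        body += data
        # The two bytes after the data must be CR LF, otherwise the framing is off.
        if stream.read(2) != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
    # Skip optional trailer headers up to the terminating blank line.
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):
            break
    return bytes(body)


# The second chunk deliberately contains CRLF pairs as ordinary data.
wire = b"4\r\nWiki\r\n6\r\na\r\nb\r\n\r\n0\r\n\r\n"
payload = decode_chunked(io.BytesIO(wire))
```

A real client reading from a socket would also have to cope with short reads and would probably want an upper bound on the chunk size; both are left out here.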

Diskson answered 24/1, 2010 at 16:17 Comment(3)

Thanks for the explanation. Do you take it from the RFC 2616 document, or somewhere else? Does your explanation also imply that the response chunk MAY NOT contain a CRLF combination as part of the data itself? – Oriente 24/1, 2010 at 23:12

It follows from the EBNF in the RFC; note that chunk-data consists of OCTET, which suggests that those bytes are not to be interpreted. A response chunk can certainly contain CRLF. I've implemented a chunked codec twice now, both times in Java, and in each case I did not do any interpretation of the content of the chunk data. It's opaque to the chunk framing. The decoder determines the expected length, reads that many bytes, and then ensures that the next two bytes are CR and LF. – Diskson 25/1, 2010 at 1:40

That makes it perfectly clear to me. Octets rule. Thank you for your time. – Oriente 25/1, 2010 at 12:52

The CRLF after each chunk is probably just for better readability, as it’s not necessary given the chunk size at the beginning of each chunk. But the CRLF after the “chunk header” is necessary, as there may be additional information after the chunk size (see Chunked Transfer Coding):

      chunk          = chunk-size [ chunk-extension ] CRLF
                       chunk-data CRLF
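To illustrate why the header line needs its own terminator, here is a rough sketch of parsing one `chunk-size [ chunk-extension ] CRLF` line. The helper name `parse_chunk_header` and the extension name in the example are invented for illustration, and the sketch assumes simple `name=value` extensions with no `;` inside quoted values.

```python
def parse_chunk_header(line):
    """Split one chunk header line into (size, extensions). Hypothetical helper."""
    if not line.endswith(b"\r\n"):
        raise ValueError("chunk header not terminated by CRLF")
    # Everything before the first ';' is the hexadecimal chunk size.
    size_token, _, ext_part = line[:-2].partition(b";")
    size = int(size_token.strip().decode("ascii"), 16)
    extensions = {}
    for item in ext_part.split(b";"):
        if not item.strip():
            continue
        # An extension may be a bare name or a name=value pair.
        name, _, value = item.partition(b"=")
        extensions[name.strip().decode("ascii")] = (
            value.strip().strip(b'"').decode("ascii")
        )
    return size, extensions


# "1a" is hexadecimal for 26; "name=value" is a made-up extension.
size, extensions = parse_chunk_header(b"1a;name=value\r\n")
```

Because the extensions have no declared length, the CRLF is the only thing that tells the reader where the header stops and the counted chunk data begins.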

Horrocks answered 24/1, 2010 at 16:5 Comment(2)

But, even with additional information, is it not redundant to supply BOTH the chunk data size AND the CRLF after? That was sort of what I could not get my head around - why BOTH? You take the size of the chunk, read the N specified bytes ahead, and that is it for the actual chunk data, proceed from there on to assume trailer headers or a CRLF, without a CRLF preceding optional headers. – Oriente 24/1, 2010 at 23:13

Thank you for your time. "seh" has answered my question, but nevertheless, all digestible information is valuable ;-) – Oriente 25/1, 2010 at 12:54
