How to split header values?
D

3

5

I'm parsing HTTP headers. I want to split the header values into arrays where it makes sense.

For example, Cache-Control: no-cache, no-store should return ['no-cache','no-store'].

HTTP RFC2616 says:

Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma. The order in which header fields with the same field-name are received is therefore significant to the interpretation of the combined field value, and thus a proxy MUST NOT change the order of these field values when a message is forwarded

But I'm not sure if the reverse is true -- is it safe to split on comma?

I've already found one example where this causes problems. My User-Agent string, for example, is

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36

i.e., it contains a comma after "KHTML". Obviously I don't have more than one user agent, so it doesn't make sense to split this header.

Is the User-Agent string the only exception, or are there more?

Dorotea answered 9/4, 2015 at 21:22 Comment(0)
E
6

No, it is not safe to split headers on commas. For example, Accept: foo/bar;p="A,B,C", bob/dole;x="apples,oranges" is a valid header, but if you split it on commas to get a list of media types, you get broken fragments such as foo/bar;p="A, because the commas inside the quoted parameter values are not list separators.
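To make the failure concrete, here is a minimal Python sketch comparing a naive split with one that ignores commas inside quoted strings. The function name split_list_header is hypothetical, and this is only an illustration under the assumption that the header uses the usual quoted-string syntax; it is not a full parser for any header's grammar.

```python
def split_list_header(value: str) -> list[str]:
    """Split on commas, but not on commas inside double-quoted strings."""
    items = []
    current = []
    in_quotes = False
    escaped = False
    for ch in value:
        if escaped:
            # Character after a backslash inside quotes is taken literally.
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            current.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            # A comma outside quotes ends the current list element.
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    # Drop empty elements such as those produced by "a,,b".
    return [item for item in items if item]


accept = 'foo/bar;p="A,B,C", bob/dole;x="apples,oranges"'

naive = [part.strip() for part in accept.split(",")]
aware = split_list_header(accept)

print(naive)  # five fragments, none of them a complete media range
print(aware)  # two elements, one per media range
```

Even the quote-aware version only recovers the list elements; interpreting each element (parameters, q-values, and so on) still depends on the grammar of the specific header.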

The correct answer is that each header is specified using ABNF, most of them in various RFCs, e.g. Accept: is defined in RFC7231 Section 5.3.2.

I had this specific problem and wrote a parser and tested it on edge cases. Not only is parsing the header non-trivial, interpreting it and giving the correct result is also non-trivial.

Some headers are more complex than others, but essentially each header has its own grammar, which should be respected for correct (and secure) processing.

Effortless answered 4/2, 2016 at 4:40 Comment(0)
C
1

if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]

So it's the other way around. You can only assume that Field: value1, value2 is equivalent to two separate fields, Field: value1 and Field: value2, when the specs say that Field supports #(value), i.e. a comma-separated list of values.
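The direction the RFC does guarantee, combining repeated fields into one, could be sketched in Python like this. The set LIST_VALUED and the function combine_repeated_fields are hypothetical names, and the set is a tiny illustrative subset chosen for the example, not an authoritative list. Set-Cookie is used as the non-list example because it is commonly repeated yet is not defined as a comma-separated list.

```python
from collections import defaultdict

# Assumption: an illustrative subset of fields defined as #(values) lists.
LIST_VALUED = {"cache-control", "accept-encoding", "vary"}


def combine_repeated_fields(fields):
    """Merge repeated list-valued fields in arrival order; keep others separate."""
    grouped = defaultdict(list)
    for name, value in fields:
        # Field names are case-insensitive, so normalize them.
        grouped[name.lower()].append(value.strip())

    combined = {}
    for name, values in grouped.items():
        if name in LIST_VALUED:
            # Append each subsequent value to the first, separated by a comma.
            combined[name] = [", ".join(values)]
        else:
            # Not a list-valued field: joining could change the meaning.
            combined[name] = values
    return combined


received = [
    ("Cache-Control", "no-cache"),
    ("Cache-Control", "no-store"),
    ("Set-Cookie", "a=1"),
    ("Set-Cookie", "b=2"),
]
result = combine_repeated_fields(received)
print(result)
```

Order is preserved within each field name, which matches the requirement that the order of repeated fields is significant.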

Coated answered 9/4, 2015 at 21:38 Comment(2)
Well that's what I was trying to say with "I'm not sure if the reverse is true". Is there somewhere I can find a list of headers that support comma separation so that I can create a black or white list at least? - Dorotea
@Mark I've been trying to find those, but RFC2616, RFC7230 and RFC7231 aren't quite exhaustive. - Coated
D
1

Reading through the specs, I've concluded the following headers support multiple (comma-separated) values:

  • Accept
  • Accept-Charset
  • Accept-Encoding
  • Accept-Language
  • Accept-Patch
  • Accept-Ranges
  • Allow
  • Cache-Control
  • Connection
  • Content-Encoding
  • Content-Language
  • Expect
  • If-Match
  • If-None-Match
  • Pragma
  • Proxy-Authenticate
  • TE
  • Trailer
  • Transfer-Encoding
  • Upgrade
  • Vary
  • Via
  • Warning
  • WWW-Authenticate
  • X-Forwarded-For

You can use this to create a whitelist of splittable headers.
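As a rough Python sketch of that whitelist approach: the names SPLITTABLE and parse_header_value are hypothetical, and the regular expression is an assumption of mine that skips commas inside double-quoted strings, so that values like Accept: foo/bar;p="A,B" are not cut in half. Headers not on the list are returned as a single value.

```python
import re

# The headers listed above, lowercased because field names are case-insensitive.
SPLITTABLE = frozenset(name.lower() for name in (
    "Accept", "Accept-Charset", "Accept-Encoding", "Accept-Language",
    "Accept-Patch", "Accept-Ranges", "Allow", "Cache-Control", "Connection",
    "Content-Encoding", "Content-Language", "Expect", "If-Match",
    "If-None-Match", "Pragma", "Proxy-Authenticate", "TE", "Trailer",
    "Transfer-Encoding", "Upgrade", "Vary", "Via", "Warning",
    "WWW-Authenticate", "X-Forwarded-For",
))

# One list element: a run of non-comma, non-quote characters and/or
# complete quoted strings (which may contain commas and escaped quotes).
_ITEM = re.compile(r'(?:[^,"]|"(?:\\.|[^"\\])*")+')


def parse_header_value(name: str, value: str) -> list[str]:
    """Return a list of values; only whitelisted headers are split."""
    if name.lower() not in SPLITTABLE:
        return [value.strip()]
    return [m.strip() for m in _ITEM.findall(value) if m.strip()]


ua = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36")

print(parse_header_value("Cache-Control", "no-cache, no-store"))
print(parse_header_value("User-Agent", ua))
print(parse_header_value("Accept", 'foo/bar;p="A,B", text/html'))
```

This is a sketch rather than a robust parser: an unbalanced quote is silently dropped, and headers such as WWW-Authenticate remain awkward because commas can separate both challenges and the parameters within a challenge, so a per-header parser is still the safer route there.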

Dorotea answered 9/4, 2015 at 23:19 Comment(2)
@Coated I deliberately left Set-Cookie off actually. The example from Wikipedia shows sessionToken=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT -- there's a comma in the middle of the date. I'm not sure there is a way to put more than one Set-Cookie on a line, is there? - Dorotea
What about Accept: foo/bar;p="A,B"? This is valid, but splitting on the comma would not give the desired results. Better to use specific parsers based on the RFC specifications, e.g. github.com/ioquatix/http-accept - Effortless
