I'm parsing HTTP headers. I want to split the header values into arrays where it makes sense.
For example, Cache-Control: no-cache, no-store
should return ['no-cache','no-store'].
RFC 2616 (section 4.2) says:
Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma. The order in which header fields with the same field-name are received is therefore significant to the interpretation of the combined field value, and thus a proxy MUST NOT change the order of these field values when a message is forwarded.
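To make the direction the RFC does guarantee concrete, here is a minimal sketch of the combining step. It assumes the headers arrive as (name, value) pairs in the order received. `combine_headers` is a hypothetical helper of my own, not part of any library, and joining with ", " rather than a bare "," is my choice.

```python
def combine_headers(pairs):
    """Fold repeated header fields into one value per field name.

    Hypothetical helper: `pairs` is a list of (name, value) tuples in the
    order they were received. Later values are appended to the first,
    separated by a comma, so the received order is preserved.
    """
    combined = {}
    for name, value in pairs:
        key = name.lower()  # header field names are case-insensitive
        if key in combined:
            combined[key] = combined[key] + ", " + value
        else:
            combined[key] = value
    return combined


received = [
    ("Cache-Control", "no-cache"),
    ("Cache-Control", "no-store"),
]
print(combine_headers(received))  # {'cache-control': 'no-cache, no-store'}
```

This only covers combining; it says nothing about whether the combined value can be split again.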
But I'm not sure if the reverse is true -- is it safe to split on comma?
I've already found one case where splitting causes problems. My User-Agent string is
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36
i.e., it contains a comma after "KHTML". Obviously I don't have more than one user agent, so it doesn't make sense to split this header.
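Here is a small sketch that reproduces the problem. `naive_split` is just an illustrative name for the obvious approach of splitting on every comma and trimming whitespace.

```python
def naive_split(value):
    """Split a header value on every comma and trim each part."""
    return [part.strip() for part in value.split(",")]


cache_control = "no-cache, no-store"
user_agent = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36"
)

# Works as intended for a list-valued header.
print(naive_split(cache_control))  # ['no-cache', 'no-store']

# Cuts the single User-Agent value in two at the comma inside the parentheses.
print(naive_split(user_agent))
```

The second call returns two fragments, the first ending in "(KHTML" and the second starting with "like Gecko)", and neither is a meaningful value on its own.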
Is the User-Agent string the only exception, or are there more?