Should cookie values be URL encoded?

When setting cookies, PHP url-encodes the cookie value (at least when not using setrawcookie) and it url-decodes the cookie value before making it available to the application in $_COOKIE.

Is this an accepted standard? If I set a raw cookie value of a%3Db, would I get back a=b in most web programming languages (through their respective cookie-reading mechanisms)?
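
For reference, the round trip being asked about can be sketched in Python with the standard `urllib.parse` functions. This is only an analogue of the behaviour described for PHP, not PHP's implementation, and the helper names `encode_cookie_value` and `decode_cookie_value` are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_cookie_value(value: str) -> str:
    # safe="" percent-encodes every reserved character, including "=" and "/".
    return quote(value, safe="")


def decode_cookie_value(raw: str) -> str:
    # Reverses the percent-encoding applied above.
    return unquote(raw)


print(encode_cookie_value("a=b"))    # a%3Db
print(decode_cookie_value("a%3Db"))  # a=b
```

A reader that does not URL-decode would hand back `a%3Db` unchanged, which is why the answer depends on the framework.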

Illiberal answered 10/3, 2018 at 4:1 Comment(2)
It's perfectly acceptable, even if it's not strictly mandatory. There are only a handful of values that must be URL encoded, but just blanket URL encoding everything makes it much easier to work with cookies.Picot
What are allowed characters in cookies? was the question I was really looking for.Landgrabber

Yes. While it's not required by the spec, RFC 6265 says the following (the emphasis is in the original document, not added):

To maximize compatibility with user agents, servers that wish to store arbitrary data in a cookie-value SHOULD encode that data, for example, using Base64 [RFC4648].

In my experience, most web frameworks and cookie libraries provide methods for encoding and decoding cookie values. In many cases, especially in frameworks and high-level languages, this is abstracted away and done automatically.

This answer provides a fairly detailed account of the history behind the values allowed in cookies. Might be of interest to you.
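
As a minimal sketch of the Base64 suggestion quoted above, the following Python uses the URL-safe alphabet from RFC 4648, which swaps "+" and "/" for "-" and "_". Stripping the "=" padding is a choice made here rather than something RFC 6265 asks for, and the helper names `to_cookie_token` and `from_cookie_token` are hypothetical.

```python
import base64


def to_cookie_token(data: bytes) -> str:
    # URL-safe alphabet: "-" and "_" replace "+" and "/"; padding is stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def from_cookie_token(token: str) -> bytes:
    # Restore the padding that to_cookie_token removed before decoding.
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding)


raw = b"\xfb\xff"
print(base64.b64encode(raw).decode("ascii"))  # +/8=
print(to_cookie_token(raw))                   # -_8
```

The resulting token contains only letters, digits, "-", and "_", so it survives a reader that URL-decodes the value as well as one that does not.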

Hermineherminia answered 10/3, 2018 at 4:12 Comment(5)
Sadly, as I recently found out through a lot of headache, Base64 includes the character "+", and there seems to be no way (not through a POST or GET request, and definitely not through the $_COOKIE array) to read a cookie in PHP with the "+" character intact. With Base64 so prevalent for transmitting binary data in "safe" characters, the inclusion of "+" (and "=", for that matter) in Base64 seems a major disconnect with browser development. Fortunately there are many other characters to choose from, so it's not difficult to substitute others in those few cases. But it's a PITA.Kiangsi
@Kiangsi There is an url-safe variant of the base64 encoding, which avoids + and =.Transposal
@Transposal - Thanks. I ended up writing my own short routine to do it. Besides the '+' and '=' chars giving me trouble, I wanted to be able to use the 'processed' name directly to create a temporary directory on the server. So it turned out the '/' char was also a problem (because it looks like a folder/directory separator in a file-related call). Fortunately there were plenty of usable chars to choose from.Kiangsi
Late to this answer, but the question was whether they should be URL encoded, which is a specific scheme for escaping otherwise-parser-breaking characters, whereas this answer is about whether cookies should be "encoded" in the sense of "transformed into something that cannot be trivially rewritten". The answer to the actual question is the short and sweet "only a handful of characters must be URL encoded, but everyone uses blanket URL encoding because it makes things much easier than having to escape and then decode only a few specific things".Picot
@Mike'Pomax'Kamermans yeah, good clarification. As I understand it, the question asks if URL-encoding the cookie value is a good/standard practice. Encoding the values is a standard (but not required) practice (to rephrase your statement: as an easy way to have valid values and avoid a custom encoder/decoder) -- The specific scheme used for encoding will vary by framework. For example, PHP uses url encoding, while other frameworks may use base64 (like the Flask framework, for example) or some other scheme. Developers may also choose to format the value themselves (e.g. PHP's setrawcookie).Hermineherminia
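
The "+" problem reported in the comments above is consistent with a reader that applies form-style URL decoding, in which "+" means a space. The sketch below uses Python's `unquote_plus` as a stand-in for such a reader; that any particular framework decodes cookies this way is an assumption, not something verified here.

```python
from urllib.parse import unquote, unquote_plus

token = "a+b/c="  # shaped like standard Base64 output

print(unquote(token))           # a+b/c=  (plain percent-decoding keeps "+")
print(unquote_plus(token))      # a b/c=  (form-style decoding turns "+" into a space)
print(unquote_plus("a%2Bb/c="))  # a+b/c=  (a percent-encoded "+" survives)
```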

sytech's answer (which I have accepted) is certainly correct, as it quotes the spec, but since the spec is rather vague, here's an overview of how some web frameworks actually handle the matter:

RFC6265:           "for example Base64"
PHP:               URL encode
Go:                raw
Node.js + Express: URL encode
Illiberal answered 10/3, 2018 at 4:1 Comment(0)

Stolen from NCZOnline:

There is some confusion over encoding of a cookie value. The commonly held belief is that cookie values must be URL-encoded, but this is a fallacy even though it is the de facto implementation. The original specification indicates that only three types of characters must be encoded: semicolon, comma, and white space. The specification indicates that URL encoding may be used but stops short of requiring it. The RFC makes no mention of encoding whatsoever. Still, almost all implementations perform some sort of URL encoding on cookie values. In the case of name=value formats, the name and value are typically encoded separately while the equals sign is left as is.

Imperceptive answered 11/7, 2018 at 20:49 Comment(4)
I would not put any stock in what that article says. It pre-dates RFC 6265 and refers to RFCs that are long since obsolete.Genesia
Note that as per the text in RFC 6265, "HTTP State Management", this is factually incorrect. Control codes, all whitespace, and the four characters ", ,, ;, and \ are all not permitted and must be escaped.Picot
@Mike'Pomax'Kamermans source that these are illegal? They are being used on an external backend we're using...Virgenvirgie
You mean the link in the comment you are responding to?Picot

A little late to the party, but since the currently accepted answer is off the mark (it quotes a part of the RFC that makes an optional suggestion about further encoding, i.e. Base64, for truly arbitrary data, which would be in addition to URL encoding, since regular Base64 itself contains characters that require URL encoding), I'll give a more explicit answer:

RFC6265 from 2011 specifies the permitted bytes in a cookie value string:

cookie-pair       = cookie-name "=" cookie-value
[...]
cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                      ; US-ASCII characters excluding CTLs,
                      ; whitespace DQUOTE, comma, semicolon,
                      ; and backslash

Specifically, the RFC does not allow the mentioned characters to be used unencoded. For these characters there is no confusion: they must be encoded. The way the RFC writes this is a little informal (i.e., the human-readable comment "; ..." does not fully match the formal list of "%..." values before it).
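
The grammar quoted above translates directly into a regular expression. The following Python is a sketch of such a check; the function name `is_valid_cookie_value` is hypothetical, and the byte ranges are copied from the `cookie-octet` rule.

```python
import re

# cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
_COOKIE_OCTET = r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]"

# cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
_COOKIE_VALUE = re.compile(rf'(?:{_COOKIE_OCTET}*|"{_COOKIE_OCTET}*")')


def is_valid_cookie_value(value: str) -> bool:
    # fullmatch ensures the whole string conforms, not just a prefix.
    return _COOKIE_VALUE.fullmatch(value) is not None


print(is_valid_cookie_value("a%3Db"))  # True
print(is_valid_cookie_value("a=b"))    # True ("=" is %x3D, inside %x3C-5B)
print(is_valid_cookie_value("a;b"))    # False (semicolon is excluded)
print(is_valid_cookie_value("a b"))    # False (whitespace is excluded)
```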

So clients must URL-decode at least the mentioned codes (and, of course, all bytes starting at 128, since US-ASCII is 7-bit only). Since all bytes involved in URL-encoding are themselves (of course) included in the bytes that need to be encoded, there is no risk in point-blank url-encoding the complete string (i.e. accepting any 2-digit hex value and decoding it to its byte).
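
One way to picture this is an encoder that escapes only what the grammar forbids, paired with a decoder that accepts any `%XX` sequence. The Python below is a sketch under one added assumption that the RFC does not state: the encoder also escapes a literal "%", so that blanket decoding stays unambiguous. The names `encode_minimal` and `decode_any` are hypothetical.

```python
from urllib.parse import unquote_to_bytes


def _is_cookie_octet(b: int) -> bool:
    # Byte ranges from the cookie-octet rule in RFC 6265.
    return (b == 0x21 or 0x23 <= b <= 0x2B or 0x2D <= b <= 0x3A
            or 0x3C <= b <= 0x5B or 0x5D <= b <= 0x7E)


def encode_minimal(value: str) -> str:
    out = []
    for b in value.encode("utf-8"):
        # Assumption: "%" (0x25) is escaped too, so decoding is unambiguous.
        if _is_cookie_octet(b) and b != 0x25:
            out.append(chr(b))
        else:
            out.append(f"%{b:02X}")
    return "".join(out)


def decode_any(raw: str) -> str:
    # Blanket decoding: every %XX sequence is turned back into its byte.
    return unquote_to_bytes(raw).decode("utf-8")


print(encode_minimal("a=b; c"))  # a=b%3B%20c
print(decode_any("a=b%3B%20c"))  # a=b; c
```

Because the decoder does not care which characters were escaped, the same `decode_any` also reads values produced by a blanket URL encoder.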

Clients should be able to process all values that are specifically allowed by the RFC. Since the

Verdugo answered 13/5 at 11:23 Comment(0)
