Characters allowed in GET parameter

Which characters are allowed in GET parameters without encoding or escaping them? I mean something like this:

http://www.example.org/page.php?name=XYZ

What can you have there instead of XYZ? I think only the following characters:

  • a-z (A-Z)
  • 0-9
  • -
  • _

Is this the full list or are there additional characters allowed?

York answered 21/9, 2009 at 16:59 Comment(2)
Possible duplicate of "HTTP URL - allowed characters in parameter names" - Roz
@j0k: Not a real dupe. In the other question escaping is required, whereas here the goal is to avoid it. - Heteroclite
G
136

There are reserved characters, which have a reserved meaning: the general delimiters (:/?#[]@) and the sub-delimiters (!$&'()*+,;=).

There is also a set of unreserved characters (alphanumerics and -._~), which are not to be encoded.

That means that anything outside the unreserved set is supposed to be %-encoded when it is not being used for its special meaning (e.g. when passed as part of a GET parameter value).

See also RFC3986: Uniform Resource Identifier (URI): Generic Syntax
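
As a minimal sketch of the rule above, the following JavaScript encodes everything except the unreserved characters. The name encodeRfc3986 is hypothetical and chosen only for illustration. It builds on the built-in encodeURIComponent, which already leaves the unreserved set alone but also leaves ! ' ( ) * unencoded, so those five are handled in a second pass.

```javascript
// Hypothetical helper: percent-encode everything except the
// RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~).
function encodeRfc3986(value) {
  // encodeURIComponent leaves A-Z a-z 0-9 - _ . ! ~ * ' ( ) unencoded,
  // so the sub-delimiters ! ' ( ) * are encoded in a second pass.
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeRfc3986("~my_start-page.en")); // ~my_start-page.en
console.log(encodeRfc3986("a&b=c d"));           // a%26b%3Dc%20d
```

The page name from the comments below passes through unchanged because it contains only unreserved characters.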

Gabble answered 21/9, 2009 at 17:9 Comment(16)
Thank you very much! So I have to add . and ~ to my list? Can I write index.php?page=start_en-new~. without escaping it? - York
It would be somewhat too bold a statement to say you can't, but you shouldn't. If you were to normalize the URI you'd have to escape unreserved characters (and only unreserved), but it is very likely that it will actually work unescaped. - Gabble
Generally, you have an escape function that escapes everything that needs to be escaped, and you normally use it to escape all the parameters you pass. - Gabble
So I shouldn't use ~ and . unescaped, either? So only alphanumeric? Is urlencode() in PHP the function you mean? I could pass all characters to urlencode() and see what comes out unescaped!? - York
OMG, I hadn't looked carefully at your example. I thought that was just a generic bunch of special characters ;-) No, you don't have to escape those, of course, as they are unreserved. Sorry for the confusion. As for urlencode(), I have no idea if it works correctly (that's not always the case with PHP functions), but if it does then yes, you can test with it ;-) Like I said: escape everything but unreserved. - Gabble
:) Thanks. So I create a page with the name "~my_start-page.en" and pass the name via GET without any problems, correct? page.php?name=~my_start-page.en - York
Yes, that should be it. Those characters are safe as query parameters with no escaping. Whether you will have problems processing that name later I don't know, but you can pass it with no problems ;-) - Gabble
You're right, ~ and . seem to work fine. But what about the other answers here? They mention other characters which can be used unencoded as well. Why didn't you mention them? Are the other answers wrong? - York
I did mention the RFC on URI syntax, didn't I? And the newest of all the RFCs mentioned, too! ;-) Actually, like I said, some other approaches to escaping may go unpunished, but they are still not standard-conformant. As long as URIs are to be normalized and compared for equality in normalized form, the punishment will follow the crime ;-) - Gabble
So the RFCs mentioned in the other questions are about 8 years older and contain special chars which aren't allowed unencoded anymore? - York
I haven't really read those RFCs, so I don't know what they deal with. But those who reference them say that they deal with the characters allowed in a URL. Obviously, & is also allowed, but it has a special meaning, so I suspect they answered a different question. - Gabble
The RFC says that it is actually allowed to leave the characters / and ? unescaped. I was looking this up because Swift does not escape these in their stringByAddingPercentEncodingForURLQueryParameter method! (Correctly, apparently) - Imago
I believe this answer is incorrect. The RFC does not say that all unreserved characters need to be percent-encoded. It actually says, "If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed." The answer by @dmitriy explains it well. - Vacillation
@paulkernfeld, isn't that exactly what the answer says? - Gabble
I'm not sure! I'm having trouble parsing the final sentence of the answer. It might help to concisely and explicitly state exactly which characters must be percent-encoded at the beginning of the answer. - Vacillation
@paulkernfeld, I believe it is. As for putting it at the beginning of the answer, I don't feel comfortable referring to entities that are not yet defined :) Also, I do not see how the praised answer by dmitriy deals with it any better. It's a fine answer, just more verbose, with more RFC content copied, and 9 years late :) - Gabble
S
26

The question asks which characters are allowed in GET parameters without encoding or escaping them.

According to RFC3986 (general URL syntax) and RFC7230, section 2.7.1 (HTTP/S URL syntax), the only characters you need to percent-encode are those outside of the query set (see the definition below).

However, additional specifications such as HTML5, Web forms, and the obsolete Indexed search W3C recommendation add a special meaning to some characters, notably = & + ;.

Other answers here suggest that most of the reserved characters should be encoded, including "/" and "?". That's not correct. In fact, RFC3986, section 3.4 suggests leaving the "/" and "?" characters unencoded:

it is sometimes better for usability to avoid percent-encoding those characters.

RFC3986 defines the query component as:

query       = *( pchar / "/" / "?" )
pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~" 

A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component.
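
The grammar above can be turned into a regular expression to check whether a raw query string uses only characters the RFC allows. This is a rough sketch rather than a complete URI validator, and the names QUERY_RE and isValidQuery are hypothetical.

```javascript
// Sketch of the RFC 3986 `query` production as a regular expression.
const UNRESERVED = "A-Za-z0-9\\-._~";
const SUB_DELIMS = "!$&'()*+,;=";

// query = *( pchar / "/" / "?" )
// pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
const QUERY_RE = new RegExp(
  `^(?:[${UNRESERVED}${SUB_DELIMS}:@/?]|%[0-9A-Fa-f]{2})*$`
);

// Returns true when every character of `query` (the part after "?",
// without any "#fragment") is allowed by the grammar.
function isValidQuery(query) {
  return QUERY_RE.test(query);
}

console.log(isValidQuery("name=XYZ"));      // true
console.log(isValidQuery("a/b?c=d&e=~f"));  // true
console.log(isValidQuery("a b"));           // false (raw space)
console.log(isValidQuery("x[0]=1"));        // false ([ and ] are outside the query set)
```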

The conclusion is that the XYZ part should encode:

  • special: # % = & ;
  • Space
  • sub-delims
  • out of query set: [ ]
  • non-ASCII characters

The exception is when the special symbols = & ; are used as key=value separators.

Encoding other characters is allowed but not necessary.
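
To illustrate the difference between = and & as separators and the same characters as data, here is a small sketch that assembles a query string. The helper buildQuery is hypothetical. It uses encodeURIComponent, which encodes more than the strict minimum listed above, and that is allowed.

```javascript
// Hypothetical helper: join key=value pairs with literal "=" and "&"
// separators, percent-encoding each key and value individually.
function buildQuery(params) {
  return Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}

// "=" and "&" inside the value are data, so they are encoded;
// the separators added by buildQuery stay literal.
console.log(buildQuery({ name: "a=b&c", page: "start en" }));
// name=a%3Db%26c&page=start%20en
```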

Spiderwort answered 21/3, 2018 at 7:18 Comment(3)
Doesn't presence in the "sub-delims" set mean that "!" / "$" / "&" ... are "being used as a delimiter of, or within, the component" and therefore should be percent-encoded? - Gladiatorial
Sub-delimiters are not delimiters in the query and therefore should not be escaped. - Maltreat
Characters in the reserved set are protected from normalization and are therefore safe to be used by scheme-specific and producer-specific algorithms for delimiting data subcomponents within a URI. - Skeie
C
9

All of the rules concerning the encoding of URIs (which comprise URNs and URLs) are specified in RFC1738 and RFC3986. Here's a TL;DR of these long and boring documents:

Percent-encoding, also known as URL encoding, is a mechanism for encoding information in a URI under certain circumstances. The characters allowed in a URI are either reserved or unreserved. Reserved characters are those characters that sometimes have special meaning, but they are not the only characters that need encoding.

There are 66 unreserved characters that don't need any encoding: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.~

There are 18 reserved characters which need to be encoded when they are used as data rather than as delimiters: !*'();:@&=+$,/?#[], and all the other characters must be encoded.

To percent-encode a character, simply concatenate "%" and its ASCII value as two hexadecimal digits. The PHP functions urlencode and rawurlencode do this job for you, as do the JavaScript functions encodeURIComponent and encodeURI (note that encodeURI is meant for whole URIs and leaves reserved characters such as & and = unencoded).
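
As a rough sketch of the mechanism, the following JavaScript percent-encodes every byte of a string by hand and then compares the two built-in functions. The helper percentEncodeAll is hypothetical, and encoding non-ASCII text as UTF-8 bytes first is an assumption that matches what encodeURIComponent does.

```javascript
// Hypothetical helper: percent-encode every byte of the string's
// UTF-8 representation, whether or not it strictly needs encoding.
function percentEncodeAll(text) {
  return Array.from(new TextEncoder().encode(text))
    .map((byte) => "%" + byte.toString(16).toUpperCase().padStart(2, "0"))
    .join("");
}

console.log(percentEncodeAll("&")); // %26
console.log(percentEncodeAll(" ")); // %20

// The built-ins differ in what they leave alone:
console.log(encodeURIComponent("a&b=c")); // a%26b%3Dc
console.log(encodeURI("a&b=c"));          // a&b=c
```

For a single parameter value, encodeURIComponent is usually the one to reach for, since encodeURI would let & and = through and change how the query is split.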

Crass answered 20/9, 2017 at 12:30 Comment(0)
B
8

I did a test using the Chrome address bar and a $QUERY_STRING in bash, and observed the following:

~!@$%^&*()-_=+[{]}\|;:',./? and grave (backtick) are passed through as plaintext.

Space, ", < and > are converted to %20, %22, %3C and %3E respectively.

# is ignored, since it is used by ye olde anchor.

Personally, I'd say bite the bullet and encode with base64 :)
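
A similar experiment can be sketched without a browser by using the WHATWG URL class that Node.js and browsers provide. This is an assumption-laden stand-in for the address-bar test above: it shows what the URL parser does, and the Chrome address bar may treat some characters differently. The variable names are illustrative only.

```javascript
// Parse a URL whose query contains a raw space, double quote, < and >.
const parsed = new URL('http://www.example.org/page.php?name=a b"<>');
console.log(parsed.search); // ?name=a%20b%22%3C%3E

// "#" starts the fragment, so it never reaches the query string.
const withHash = new URL("http://www.example.org/page.php?name=a#b");
console.log(withHash.search); // ?name=a
console.log(withHash.hash);   // #b
```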

Bullion answered 18/12, 2013 at 8:40 Comment(3)
These characters you mention are probably the ones that will be escaped in HTML, not the query string. I don't believe =, ? and & can be passed in plain text. - Bathsheeb
Appreciate your effort, but it really doesn't mean a lot to us, as a reserved character could be accepted by Chrome today but not tomorrow, or other clients could reject them. It is much safer to go with the official definition, which is: ALPHA / DIGIT / "-" / "." / "_" / "~" - Silvers
Actually, this has nothing to do with what the client (browser) is able to send, but rather with what the server is going to do with it. If a server maps delims or sub-delims directly to a file system, it might cause a problem. - Skeie
E
5

From RFC 1738 on which characters are allowed in URLs:

Only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

The reserved characters are ";", "/", "?", ":", "@", "=" and "&", which means you would need to URL-encode them if you wish to use them as data.

Emalia answered 21/9, 2009 at 17:16 Comment(2)
Thanks! Are you sure that I can use $+!'()" without escaping them? - York
RFC 1738 is obsolete, see rfc-editor.org/info/rfc1738 - Undercut
O
3

Alphanumeric characters and all of

~ - _ . ! * ' ( ) ,

are valid within a URL.

All other characters must be encoded.

Overtop answered 21/9, 2009 at 17:1 Comment(2)
Thanks, you've understood everything correctly. I want to know which characters I can use without encoding them. Are you sure that !*'(), are such characters? - York
According to ctford's answer, which refers to RFC 1738, the dollar sign is also a special character that does not need encoding. - Jacquerie
S
0

"." | "!" | "~" | "*" | "'" | "(" | ")" are also acceptable [RFC2396]. Really, anything can be in a GET parameter if it is properly encoded.

Spleen answered 21/9, 2009 at 17:0 Comment(3)
But those have special meaning, so if you want to send % or + you have to encode them. - Kearney
Yeah, I don't know why I wrote % - Spleen
Thank you! I only want to know which characters can be used WITHOUT encoding or escaping them. I should have pointed this out better. So can I really use *!'()| without encoding them? - York
R
0

When passing special characters, I got an error saying the value could not be decoded. You can use encodeURIComponent to avoid this. For example, I resolved my issue with:

// Replace the SEARCH_TEXT placeholder with the percent-encoded JSON value.
updateUrl = updateUrl.replace(
  "SEARCH_TEXT",
  encodeURIComponent(JSON.stringify(searchText))
);
Relaxation answered 3/4 at 9:46 Comment(0)
