What is a valid URL query string?
B

3

13

What characters are allowed in a URL query string?

Do query strings have to follow a particular format?

Bibb answered 14/11, 2012 at 5:29 Comment(1)
Any character that does not need to be escaped is allowed in a URL; see what needs to be escaped in this question: #2323264Chat
L
13

Per https://www.rfc-editor.org/rfc/rfc3986

In section 2.2 Reserved Characters, the following characters are listed:

reserved = gen-delims / sub-delims

gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

The spec then says:

If data for a URI component would conflict with a reserved character’s purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.

Next, in section 2.3 Unreserved Characters, the following are listed:

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
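
To make the grammar above concrete, here is a minimal sketch that checks a string against the query rule from section 3.4 of the same RFC (query = *( pchar / "/" / "?" ), where pchar = unreserved / pct-encoded / sub-delims / ":" / "@"). The function name is_valid_query is made up for illustration, and this checks syntax only; it says nothing about how a server will interpret the characters.

```python
import re

# RFC 3986, section 3.4: query = *( pchar / "/" / "?" )
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
_QUERY_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~]"   # unreserved
    r"|%[0-9A-Fa-f]{2}"      # pct-encoded: "%" followed by two hex digits
    r"|[!$&'()*+,;=]"        # sub-delims
    r"|[:@/?])*"             # the extra characters a query may contain
)


def is_valid_query(query: str) -> bool:
    """Return True if `query` (without the leading "?") matches the RFC 3986 query rule."""
    return _QUERY_RE.fullmatch(query) is not None


print(is_valid_query("key=value&secondkey=secondvalue"))
print(is_valid_query("q=a b"))   # a raw space is not allowed
print(is_valid_query("q=a%20b")) # the percent-encoded form is
```

Anything that fails this check (a space, "#", "[", "]", non-ASCII characters, and so on) would have to be percent-encoded before it goes into the query.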

Lehet answered 14/11, 2012 at 5:36 Comment(2)
RFC 3986 - Section 3.4 specifically describes the query string and notably includes the sub-delims and a handful of others. In summary: A-Z, a-z, 0-9, -, ., _, ~, !, $, &, ', (, ), *, +, ,, ;, =, :, @, /, ?Heartthrob
@Heartthrob It's been a while since your comment, but what does your summary mean in plain english? Do these characters need to be encoded or not? I've looked at section 3.4 but didn't see a list.Uterine
P
12

Wikipedia has your answer: http://en.wikipedia.org/wiki/Query_string

"URL Encoding: Some characters cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character # can be used to further specify a subsection (or fragment) of a document; the character = is used to separate a name from a value. A query string may need to be converted to satisfy these constraints. This can be done using a schema known as URL encoding.

In particular, encoding the query string uses the following rules:

  • Letters (A-Z and a-z), numbers (0-9) and the characters '.','-','~' and '_' are left as-is
  • SPACE is encoded as '+' or %20[citation needed]
  • All other characters are encoded as %FF hex representation with any non-ASCII characters first encoded as UTF-8 (or other specified encoding)

The octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation. The encoding of SPACE as '+' and the selection of "as-is" characters distinguishes this encoding from RFC 1738."
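
As a rough illustration of the encoding rules quoted above, Python's standard urllib.parse module has two helpers: quote percent-encodes a space as %20, while quote_plus uses the form-style '+'. The sample value is made up; passing safe="" is a choice made here so that "/" is encoded as well.

```python
from urllib.parse import quote, quote_plus

value = "a b&c=d/é"  # sample value: space, reserved characters, one non-ASCII letter

# Letters and digits stay as-is; everything else becomes %XX,
# with the non-ASCII "é" first encoded as UTF-8 (two bytes: C3 A9).
encoded_percent = quote(value, safe="")
encoded_plus = quote_plus(value)

print(encoded_percent)  # a%20b%26c%3Dd%2F%C3%A9
print(encoded_plus)     # a+b%26c%3Dd%2F%C3%A9
```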

Regarding the format, a query string is, by convention, a series of name-value pairs. The ? separates the query string from the rest of the URL. Name-value pairs are separated from one another by an ampersand (&), while the name (key) and value within each pair are separated by an equals sign (=), e.g. http://domain.com?key=value&secondkey=secondvalue
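
A small sketch of building such a query string in Python, assuming the standard library's urllib.parse.urlencode; the parameter names and values are only examples. urlencode joins the pairs with & and = and escapes the values form-style, so a literal & inside a value cannot be mistaken for a separator.

```python
from urllib.parse import urlencode

# Example parameters; the second value contains a space and an ampersand.
params = {"key": "value", "secondkey": "second value & more"}

query = urlencode(params)
url = "http://domain.com/?" + query

print(query)  # key=value&secondkey=second+value+%26+more
print(url)
```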

Under Structure in the Wikipedia reference I provided:

  • The question mark is used as a separator and is not part of the query string.
  • The query string is composed of a series of field-value pairs
  • Within each pair, the field name and value are separated by an equals sign, '='.
  • The series of pairs is separated by the ampersand, '&' (or semicolon, ';' for URLs embedded in HTML and not generated by a ...; see below).
  • W3C recommends that all web servers support semicolon separators in addition to ampersand separators[6] to allow application/x-www-form-urlencoded query strings in URLs within HTML documents without having to entity escape ampersands.
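
Going the other way, the structure described in the list above can be taken apart with Python's urllib.parse. This is only a sketch using the example URL from this answer: urlsplit drops the "?" separator, and parse_qsl splits the pairs on "&" and each pair on "=". Whether ";" is also treated as a pair separator depends on the library and version, so it is not relied on here.

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://domain.com?key=value&secondkey=secondvalue"

# The "?" is a separator and does not end up in the query component.
query = urlsplit(url).query

# Split into (field, value) pairs, decoding any percent-escapes on the way.
pairs = parse_qsl(query)

print(query)  # key=value&secondkey=secondvalue
print(pairs)  # [('key', 'value'), ('secondkey', 'secondvalue')]
```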
Philoprogenitive answered 14/11, 2012 at 5:37 Comment(3)
Can you provide a citation for the final paragraph?Appendicitis
I added that paragraph based on personal experience but I've updated and added more information that I could find to back it up. In doing so, I noticed that key-values are not only separated by an ampersand but can be by a semi-colon although I've never come across it before. Also, the question mark is not part of the QS but is rather a separator.Philoprogenitive
In the text of the answer: "each name value pair is prefixed with an ampersand" the wording ("prefixed") is misleading. Farther down, there is the correct "...pairs is separated...".Gehring
S
1

This link has the answer and formatted values you all need.

https://perishablepress.com/url-character-codes/

For your convenience, this is the list:

<     %3C
>     %3E
#     %23
%     %25
{     %7B
}     %7D
|     %7C
\     %5C
^     %5E
~     %7E
[     %5B
]     %5D
`     %60
;     %3B
/     %2F
?     %3F
:     %3A
@     %40
=     %3D
&     %26
$     %24
+     %2B
"     %22
space     %20
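
The codes in this list are just each character's byte value written as "%" plus two hex digits, so they can be reproduced rather than memorised. A minimal sketch, assuming UTF-8 for anything outside ASCII; percent_encode_char is a name made up for this example.

```python
def percent_encode_char(ch: str) -> str:
    """Percent-encode every UTF-8 byte of `ch` as %XX (uppercase hex)."""
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))


# The characters from the list above, in the same order.
for ch in '<>#%{}|\\^~[]`;/?:@=&$+" ':
    label = "space" if ch == " " else ch
    print(f"{label}     {percent_encode_char(ch)}")
```

Note that this encodes unconditionally; a real encoder would leave unreserved characters such as ~ alone.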
Silvanus answered 21/7, 2015 at 10:3 Comment(1)
Note that link-only answers are discouraged, SO answers should be the end-point of a search for a solution (vs. yet another stopover of references, which tend to get stale over time). Please consider adding a stand-alone synopsis here, keeping the link as a reference.Linkwork
