Query string degenerate cases

I am looking for a correct regular expression for validating URI query strings. I found some answers here or here, but I still have doubts about the edge cases where the key or the value could be empty. For example, should the following be treated as valid query strings?

?&&
?=
?a=
?a=&
?=a
?&=a
Kvass answered 6/2, 2017 at 13:28

I am looking [...] for a correct regular expression for [valid] URI query strings.

Sure thing, no prob. As per RFC 3986, appendix B, here it is:

^([^#]*)$

If you want something more elaborate, you can check section 3.4 for the allowed characters in addition to percent-encoded entities. The regex would look something like this:

^(%[[:xdigit:]]{2}|[[:print:]])*$
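
To make the section 3.4 character set concrete, here is a minimal Python sketch that spells out the grammar `query = *( pchar / "/" / "?" )` character class by character class. It is stricter than `[[:print:]]`, which would also accept characters such as a space or `#`. The names `QUERY_RE` and `is_valid_query` are made up for illustration, and the input is assumed to be the query without its leading `?`.

```python
import re

# Characters permitted by the RFC 3986 section 3.4 grammar:
#   query = *( pchar / "/" / "?" )
#   pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
QUERY_RE = re.compile(
    r"(?:[A-Za-z0-9\-._~]"   # unreserved
    r"|%[0-9A-Fa-f]{2}"      # pct-encoded
    r"|[!$&'()*+,;=]"        # sub-delims
    r"|[:@/?])*"             # extra characters allowed in a query
)


def is_valid_query(query: str) -> bool:
    """Return True if `query` (without the leading '?') fits the grammar."""
    return QUERY_RE.fullmatch(query) is not None


# The examples from the question, with the leading '?' removed.
for candidate in ["&&", "=", "a=", "a=&", "=a", "&=a"]:
    print(repr(candidate), is_valid_query(candidate))
```

This only checks the encoding, not the key/value structure, which is why all of the examples from the question pass.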

As far as RFC 3986 is concerned, all your examples are valid. The RFC tells us how the query string has to be encoded while saying little about how it has to be structured. Older RFCs keep shifting authority over the structure of query strings between CGI and HTTP without ever formally specifying a grammar (see e.g. RFC 3875, sec. 4.1.7, RFC 2396, sec. 3.4, RFC 1808, sec. 2.1, …).

An interesting note can be found in RFC 7320, section 2.4:

Applications MUST NOT directly specify the syntax of queries, as this can cause operational difficulties for deployments that do not support a particular form of a query. […] HTML constrains the syntax of query strings used in form submission. New form languages SHOULD NOT emulate it, but instead allow creation of a broader variety of URIs

For a full validity check on such form-submission query strings, you would have to implement the algorithm for decoding form data recommended by the W3C. It could be done with a regex, but I would advise against it for reasons of sanity.

With regard to your examples: I believe they are all valid. How they are interpreted should be left to the receiving application. Some are not as much of a fringe case as you may think, though: ?&& is simply an empty dictionary while ?=a could map to { "": "a" }.
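
As one illustration of how a receiving application might interpret these, here is a small sketch using Python's standard `urllib.parse.parse_qsl`. The helper name `parse_query` is hypothetical, and other frameworks may well make different choices (for instance dropping empty keys or keeping repeated keys as lists).

```python
from urllib.parse import parse_qsl


def parse_query(query: str) -> dict:
    """Parse a query string into a dict, keeping empty keys and values.

    Assumption: repeated keys collapse to the last value, because the
    pairs are fed into a plain dict.
    """
    raw = query[1:] if query.startswith("?") else query
    return dict(parse_qsl(raw, keep_blank_values=True))


print(parse_query("?&&"))   # {}
print(parse_query("?=a"))   # {'': 'a'}
print(parse_query("?a=&"))  # {'a': ''}
```

Without `keep_blank_values=True`, `parse_qsl` silently drops pairs whose value is empty, so `?a=` would come back as an empty dictionary.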

Piteous answered 9/2, 2017 at 2:57 Comment(2)
Promising future standard here... - Arnst
@FranklinYu I've seen that. However, it bears mentioning that this is a "living standard" which does not seek to replace RFC 3986. If anything, it is augmenting it. - Piteous
