Validate URL query string with regex
H

7

12

I'm trying to validate a query string with regex. Note that I'm not trying to extract the values, only to validate the syntax. I'm doing this to practice regex, so I'd appreciate help rather than "use this lib", although seeing how it has been done in a library would help me, so show me if you've got one.

So, these are the requirements:

  • It must start with a question mark.
  • It may contain keys with or without values; a key and its value are separated by an equals sign, and pairs are separated by ampersands.

I've got pretty far, but I'm having trouble expressing in regex that the equals sign and the ampersand must appear in a certain order without having to repeat match groups. This is what I've got so far:

#^\?([\w\-]+((&|=)([\w\-]+)*)*)?$#

It correctly matches ?abc=123&def=345, but it also incorrectly matches for example ?abc=123=456.
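
To see the problem concretely, the pattern can be run against a few sample strings. The following is a minimal sketch in Python; the question does not name a language, so Python's `re` module and the `re.ASCII` flag are assumptions, and the `#...#` delimiters are dropped because Python does not use them. The helper name `accepts` is made up for illustration.

```python
import re

# The question's first attempt, without the #...# delimiters.
# re.ASCII keeps \w limited to [a-zA-Z0-9_].
FIRST_ATTEMPT = re.compile(r"^\?([\w\-]+((&|=)([\w\-]+)*)*)?$", re.ASCII)


def accepts(query: str) -> bool:
    """Return True if the string matches the pattern from start to end."""
    return FIRST_ATTEMPT.match(query) is not None


# A well-formed query string.
print(accepts("?abc=123&def=345"))
# Malformed, yet accepted: (&|=) lets either separator appear anywhere.
print(accepts("?abc=123=456"))
```

The second call is accepted because nothing in `(&|=)` ties an `=` to the key that precedes it.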

I could go overkill and do something like...

/^\?([\w\-]+=?([\w\-]+)?(&[\w\-]+(=?[\w\-]*)?)*)?$/

... but I don't want to repeat the match groups which are the same anyway.

How can I tell regex that the separators between values must alternate between & and = without repeating match groups or causing catastrophic backtracking?

Thank you.

Edit:

I'd like to clarify that this is not meant for a real-world implementation; for that, you should use the built-in library of your language, which is most likely available. I'm asking this question because I want to improve my regex skills, and parsing a query string seemed like a rewarding challenge.

Hillhouse answered 30/5, 2014 at 16:33 Comment(1)
This question pops up in Google on "query string regex" search. I must note that what happens here should not be used live even if you are limited to regex-based solutions, since it lacks this point and who knows what else (I don't, but there are many pieces of specs I don't know).Lundberg
S
16

This seems to be what you want:

^\?([\w-]+(=[\w-]*)?(&[\w-]+(=[\w-]*)?)*)?$

This treats each "pair" as a key followed by an optional value (which may be blank). The pattern has a first pair, followed by zero or more repetitions of an & and another pair, and the whole expression (except for the leading ?) is optional. Doing it this way prevents matching ?&abc=def.

Also note that hyphen doesn't need escaping when last in the character class, allowing a slight simplification.

You seem to want to allow hyphens anywhere in keys or values. If keys need to be hyphen free:

^\?(\w+(=[\w-]*)?(&\w+(=[\w-]*)?)*)?$
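
One way to check the behaviour described in this answer is to run its first pattern (the one that allows hyphens in keys) against a handful of inputs. This is a sketch in Python, which is an assumption since the question names no language; `re.ASCII` restricts `\w` to `[a-zA-Z0-9_]`, and the helper name `is_valid_query` is made up for illustration.

```python
import re

# First pattern from the answer: key, optional "=value", then any number of
# "&key" or "&key=value" groups; everything after the "?" is optional.
QUERY = re.compile(r"^\?([\w-]+(=[\w-]*)?(&[\w-]+(=[\w-]*)?)*)?$", re.ASCII)


def is_valid_query(candidate: str) -> bool:
    """Return True if the candidate has the expected query-string syntax."""
    return QUERY.match(candidate) is not None


samples = [
    "?",                  # bare question mark
    "?abc",               # key without a value
    "?abc=",              # key with a blank value
    "?abc=123&def=345",   # two full pairs
    "?abc=123=456",       # second "=" inside one pair
    "?&abc=def",          # pair list starting with "&"
    "?abc=1&",            # trailing "&" with no key after it
]
for sample in samples:
    print(sample, is_valid_query(sample))
```

The first four samples are accepted and the last three are rejected, because every `&` must be followed by a key and each pair allows at most one `=`.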
Switch answered 30/5, 2014 at 16:51 Comment(6)
Thanks, that's what I'm using. I was looking for a way not to repeat the match groups though.Robinrobina
There's no way unfortunately. It would be nice to be able to label a chunk and reuse it, but no.Switch
Thank you, that was the answer I was looking for =)Robinrobina
While perfectly valid, this is not matching: "?q=denbosch&lat=45.234342364&long=-45.234342364"Breadroot
@BartBurg Add a dot to the character classes: ^\?(\w+(=[\w.-]*)?(&\w+(=[\w.-]*)?)*)?$Switch
. - : % + added in param value ^\?([\w-]+(=[\w.\-:%+]*)?(&[\w-]+(=[\w.\-:%+]*)?)*)?$Receptacle
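
Where the regex flavor offers no way to reuse a sub-pattern, the host language can still keep the source free of duplication by assembling the pattern from a named fragment. The following is a minimal sketch in Python (an assumption; the thread is language-neutral) using the extended value class from the last comment. The names `KEY`, `VALUE`, `PAIR`, `QUERY` and `is_valid_query` are illustrative only, and the groups are non-capturing since nothing is extracted.

```python
import re

KEY = r"[\w-]+"            # key: word characters and hyphens
VALUE = r"[\w.\-:%+]*"     # value: may be blank; also allows . - : % +
PAIR = rf"{KEY}(?:={VALUE})?"  # key with an optional "=value"

# The pair fragment is written once and interpolated twice.
QUERY = re.compile(rf"^\?(?:{PAIR}(?:&{PAIR})*)?$", re.ASCII)


def is_valid_query(candidate: str) -> bool:
    """Return True if the candidate has the expected query-string syntax."""
    return QUERY.match(candidate) is not None


print(is_valid_query("?q=denbosch&lat=45.234342364&long=-45.234342364"))
print(is_valid_query("?abc=123=456"))
```

The compiled expression still contains the pair twice; only the source text avoids the repetition.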
P
5

You can use this regex:

^\?([^=]+=[^=]+&)+[^=]+(=[^=]+)?$

What it does is:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  \?                       '?'
--------------------------------------------------------------------------------
  (                        group and capture to \1 (1 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [^=]+                    any character except: '=' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    =                        '='
--------------------------------------------------------------------------------
    [^=]+                    any character except: '=' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
  )+                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
--------------------------------------------------------------------------------
  [^=]+                    any character except: '=' (1 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \2 (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    =                        '='
--------------------------------------------------------------------------------
    [^=]+                    any character except: '=' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )?                       end of \2 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \2)
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
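
Because `[^=]` accepts any character other than `=`, including spaces and `&`, and because the first group must occur at least once, the pattern's behaviour on short inputs may be surprising. A small sketch in Python (an assumption, as the thread names no language) that probes it; the helper name `matches` is made up for illustration.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r"^\?([^=]+=[^=]+&)+[^=]+(=[^=]+)?$")


def matches(candidate: str) -> bool:
    """Return True if the candidate matches the pattern."""
    return PATTERN.match(candidate) is not None


# Two pairs: the first fills the mandatory "key=value&" group.
print(matches("?abc=123&def=345"))
# A single pair: there is no "&", so the mandatory group cannot match.
print(matches("?abc=123"))
# Keys and values made of spaces: [^=] does not exclude whitespace.
print(matches("? = & "))
```

So a single `key=value` pair is rejected, while a string whose keys and values are only spaces is accepted.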
Placia answered 30/5, 2014 at 16:46 Comment(3)
this matches "? = "Switch
@Amit Joki Where did you get that printout?Viceregal
google.co.in/…Placia
V
2

I agree with Andy Lester, but a possible regex solution is

#^\?([\w-]+=[\w-]*(&[\w-]+=[\w-]*)*)?$#

which is very much like what you posted.

I haven't tested it, and you didn't say what language you're using, so it may need a little tweaking.

Viceregal answered 30/5, 2014 at 16:45 Comment(2)
Yes, this is for practice =) Thank you for your answer, this is pretty much what I'm using right now, but it still repeats the match groups, which I wanted to avoid.Robinrobina
@HelgeTalvikSöderström Why? It's simply not possible.Viceregal
E
1

This might not be a job for regexes, but for existing tools in your language of choice. Regexes are not a magic wand you wave at every problem that happens to involve strings. You probably want to use existing code that has already been written, tested, and debugged.

In PHP, use the parse_url function.

Perl: URI module.

Ruby: URI module.

.NET: 'Uri' class
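
For comparison, Python's standard library offers the same kind of tooling through `urllib.parse`. This is a brief sketch, assuming Python and a made-up example URL. These functions parse rather than validate, so a string such as `abc=123=456` still yields a result.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical URL, used only for illustration.
url = "http://example.com/search?abc=123&def=345"

# urlsplit separates the URL into components; .query excludes the "?".
query = urlsplit(url).query
print(query)

# parse_qsl turns the query string into a list of (key, value) tuples.
pairs = parse_qsl(query)
print(pairs)

# Only the first "=" splits key from value; the rest stays in the value.
lenient = parse_qsl("abc=123=456")
print(lenient)
```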

Eleonoreleonora answered 30/5, 2014 at 16:38 Comment(4)
Thank you for your reply, but if you read my post again, you'll see I'm doing this to practice regex, so I'd appreciate help rather than "use this lib".Robinrobina
Yes, I saw that, and I also know that people will find this answer anyway. I'm thinking about future users as well as you.Eleonoreleonora
Well, I interpret that as providing an answer for people who may be looking for another reply, stumbling upon this question... I'd appreciate if we stayed on subject and discussed regex, not the particular example I use for practice.Robinrobina
parse_url() will pretty much try to parse anything... you can't use it as validation. It even returns a result for JSON strings :)Carner
S
1

I made this.

function isValidURL(url) {
  // based off https://mathiasbynens.be/demo/url-regex. testing https://regex101.com/r/pyrDTK/2
  // Uses JavaScript regex syntax: \uXXXX escapes for the non-ASCII range and the i (case-insensitive) flag.
  var pattern = /^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:\/?)(?:(?:\?(?:(?!&|\?)(?:\S))+=(?:(?!&|\?)(?:\S))+)(?:&(?:(?!&|\?)(?:\S))+=(?:(?!&|\?)(?:\S))+)*)?$/i;
  return pattern.test(url);
}

Base: https://mathiasbynens.be/demo/url-regex

Testing: https://regex101.com/r/pyrDTK/4/

Suspend answered 6/6, 2017 at 18:46 Comment(1)
works for my example ags_s=fa_ds.coOfilia
M
0

When you need to validate a very complex URL, you may use this regex:

`^(https|ftp|http|ftps):\/\/([a-z\d_]+\.)?(([a-zA-Z\d_]+)(\.[a-zA-Z]{2,6}))(\/[a-zA-Z\d_\%\-=\+]+)*(\?)?([a-zA-Z\d=_\+\%\-&\{\}\:]+)?`
Modillion answered 22/8, 2020 at 2:53 Comment(0)
R
0
/^\?([\w-]+(=[\w.\-:%+]*)?(&[\w-]+(=[\w.\-:%+]*)?)*)?$/

\w = [a-zA-Z0-9_]

\? = a literal '?'

The regex above allows a-z A-Z 0-9 _ . - : % + in parameter values.

Receptacle answered 26/11, 2021 at 4:55 Comment(0)