Should I use ^ and $ in html5 input regex pattern validation?

3

21

I've mostly seen examples without the ^ (circumflex) and $ (dollar) characters marking the beginning and end of the string being matched. At first I did not find anything about this in the HTML5 spec and wondered whether they are implicit in the pattern. It turns out the HTML5 spec states that they are implicit:

The compiled pattern regular expression, when matched against a string, must have its start anchored to the start of the string and its end anchored to the end of the string. This implies that the regular expression language used for this attribute is the same as that used in JavaScript, except that the pattern attribute is matched against the entire value, not just any subset (somewhat as if it implied a ^(?: at the start of the pattern and a )$ at the end).
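The quoted rule can be approximated in plain JavaScript. This is only a rough sketch: matchesPattern is a hypothetical helper, not a browser API, and it ignores details such as regex flags and how browsers treat empty values. It wraps the pattern in ^(?: and )$ as the spec text suggests, and shows why the non-capturing group matters when the pattern contains alternation.

```javascript
// Hypothetical helper: approximates how the spec says a pattern is applied.
function matchesPattern(pattern, value) {
  // Wrap the pattern in ^(?: ... )$ so that the whole value must match.
  const compiled = new RegExp("^(?:" + pattern + ")$");
  return compiled.test(value);
}

// With the group, the anchors apply to the whole alternation.
console.log(matchesPattern("a|b", "a"));  // true
console.log(matchesPattern("a|b", "ab")); // false

// Without the group, ^ binds only to "a" and $ only to "b".
console.log(/^a|b$/.test("ab"));          // true
```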

In type="text" inputs, the pattern works fine using either format. In type="tel" inputs, however, I had to remove the characters for the regex to work as expected. I've tested in both Opera and Firefox.

Is this a browser bug? Should I file a bug in bugzilla etc.?


Edit: It seems that I've stumbled upon a weird bug, because I'm unable to create a reduced test case. A simple input in a page doesn't show the behavior stated above. However, the question remains: should I, or should I not, use the darn ^ and $ anchors?

Epochal answered 4/2, 2012 at 16:6 Comment(8)
developer.mozilla.org/en/HTML/Forms_in_HTML mentions this on the tel input: "Line breaks are automatically stripped from the input value, but no other syntax is enforced, because telephone numbers vary widely internationally. You can use attributes such as pattern and maxlength to restrict values entered in the control." It does not mention anything specific about the tel type in its pattern attribute description.Anatropous
Actually, if I check with Firefox, I do not see any strange behaviour in the tel inputs. Could you give a full example including code, what you expected, and what you got instead?Anatropous
Yeah, I've just noticed that, and appended an edit to the question. My form is a little bit tricky to post in here, but I'll see what I can do.Epochal
Can you at least post the exact code of the input element it concerns? And please provide a link when you refer to a specification (I hope it's from W3.org)Anatropous
And as the W3 specification states, the ^ and $ are implied. This means that you do not need to put them there explicitly.Anatropous
I took it from the whatwg draft spec actually, but it's there in the w3c draft as well: w3.org/TR/html5/…Epochal
But what everyone likes to know: is there a bug or not? Please show some examples or otherwise I will flag this question as 'RESOLVED - NOT REPRODUCIBLE' :-)Anatropous
An isolated type="tel" input behaves as expected, so the bug must be related to other factors which I have not tested thoroughly yet, like JavaScript interaction or something to do with page reloads, which are not covered in this question. My form contains sensitive data, so I cannot disclose the full source code at the moment, sorry. Having said that, I believe that this question can be marked as resolved, unless anybody has further considerations. Thanks everyone!Epochal
12

The HTML Standard's section on the pattern attribute still states that it is always anchored at the start and end, as already quoted in the question:

The compiled pattern regular expression, when matched against a string, must have its start anchored to the start of the string and its end anchored to the end of the string.

We can use a simple test snippet to confirm this behavior:

<form>
  <input required pattern="abc">
  <button>Submit</button>
</form>

You will notice that the form above rejects values of foo abc and abc foo; only typing exactly the string abc will be accepted. This demonstrates that pattern="abc" is equivalent to pattern="^abc$" and that you don't need to specify the ^ and $ explicitly.
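The same check can be sketched outside the browser. Assuming the whole-value matching quoted above, the hypothetical helper wholeValueMatch (an illustration, not a browser API) wraps a pattern the way the spec describes and compares pattern="abc" with pattern="^abc$" on the three inputs mentioned:

```javascript
// Hypothetical helper: emulates the implicit ^(?: ... )$ wrapping.
function wholeValueMatch(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

// Compare the bare pattern with the explicitly anchored one.
for (const value of ["abc", "foo abc", "abc foo"]) {
  const bare = wholeValueMatch("abc", value);
  const anchored = wholeValueMatch("^abc$", value);
  console.log(JSON.stringify(value), bare, anchored);
}
// "abc" true true
// "foo abc" false false
// "abc foo" false false
```

In this sketch the explicit anchors are redundant but harmless, because ^(?:^abc$)$ accepts exactly the same strings as ^(?:abc)$.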

As far as I can tell, the competing answer here claiming that browsers used to implement a different behavior, in violation of spec, is completely false. You can download Firefox 15 from https://ftp.mozilla.org/pub/firefox/releases/15.0/win32/en-GB/ and test out the snippet above in it yourself, and you'll see that the behavior is just like in a modern browser. Or, since you probably can't be bothered, you can check out this screenshot of me doing so for you:

Screenshot of this answer in Firefox 15 showing the form above rejecting input of "abc foo"

Photocompose answered 6/12, 2018 at 23:4 Comment(0)
2

According to the standard, the regex is anchored at the start and end. However, in practice (tested FF 15 and Chrome 21) it is anchored at the start only!

So if you want to be compatible both with the standard and reality, you should anchor your regex with a $ explicitly. Whether to use ^ also is up to you - it is not necessary.

Predicative answered 25/9, 2012 at 14:17 Comment(1)
-1; I downloaded Firefox 15 from ftp.mozilla.org/pub/firefox/releases/15.0/win32/en-GB and it does not exhibit the bug you describe here.Photocompose
-2

Of course you know phone numbers come in different forms,

e.g.

  • while being in Vienna, Austria, dialing "4000" will connect you to the City Hall.
  • while being in Innsbruck, Austria, you need to dial "014000" to dial the Vienna City Hall
  • while being in New York, USA you need to dial +4314000 to dial the same number.

This has historical reasons: the old mechanical system delegated the job of connecting the call from one device to the next with every digit. (This is also why extensions are at the end of a number and not at the start, as opposed to the DNS, where you can extend domain names at the front but not at the end.)

Now, a regex with both anchors ^ and $ will match a phone number only if it is given in exactly the same form. With only the $ anchor, it will reliably match the same phone number as long as no different extension is given. With no anchors, i.e. dropping ^ and $, it will match independent of location codes and extensions, but will introduce unreliability:

Using "4000" as a pattern for the Vienna City Hall will match "4000", "014000" and "+4314000", but it will also match "+44140001" which is a German Bank.
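For comparison, under the whole-value matching that the HTML Standard describes for the pattern attribute, the anchoring choices above would have to be spelled with explicit wildcards. A rough sketch in plain JavaScript follows; matchesPattern is a hypothetical helper rather than a browser API, and the sample numbers are the ones used in this answer:

```javascript
// Hypothetical helper: emulates the implicit ^(?: ... )$ wrapping.
function matchesPattern(pattern, value) {
  return new RegExp("^(?:" + pattern + ")$").test(value);
}

const numbers = ["4000", "014000", "+4314000", "+44140001"];

for (const n of numbers) {
  const exact = matchesPattern("4000", n);        // whole value must be "4000"
  const endsWith = matchesPattern(".*4000", n);   // any prefix, then "4000"
  const contains = matchesPattern(".*4000.*", n); // "4000" anywhere
  console.log(n, exact, endsWith, contains);
}
// 4000 true true true
// 014000 false true true
// +4314000 false true true
// +44140001 false false true
```

In this sketch, ".*4000" plays the role of the "end anchor only" variant, and ".*4000.*" plays the role of the unanchored one.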

Handhold answered 4/2, 2012 at 16:24 Comment(7)
I'm a little confused now. Maybe I should've posted an example with my question. Using "^[\d]{10}$" for instance, doesn't match 1234567890 in a tel input, but it should, shouldn't it?Epochal
Where did this talk of phone numbers come from? Has the question changed?Daiquiri
No, it talks about the TEL input specificallyEpochal
That does give some reason for the derived implementations. Maybe this is also stated somewhere by Mozilla or Opera?Anatropous
So, putting in context, if the ^ and $ anchors are implicit in the pattern, then I would have to append a "*?" to the "4000" in order for it to match "4000", "014000" and "+4314000".Epochal
Putting it in context, I'd recommend using the end anchor ($), but not the start anchor (^). For me this is the sweet spot between reliability and breadth of match.Handhold
-1 because most of this answer doesn't address the question that was asked, and the bit that finally does address it, in the final two paragraphs, is wrong. <input type="tel" pattern="4000"> will reject input of 014000, contrary to this answer.Photocompose
