Regex to match alphanumeric characters, underscore, periods and dash, allowing dot and dash only in the middle
L

5

5

Presently, I am using this:

if (preg_match('/^[a-zA-Z0-9_]+([a-zA-Z0-9_]*[.-]?[a-zA-Z0-9_]*)*[a-zA-Z0-9_]+$/', $product)) {
    return true;
} else {
    return false;
}

For example, I want to match:

  1. pro.duct-name_
  2. _pro.duct.name
  3. p.r.o.d_u_c_t.n-a-m-e
  4. product.-name
  5. ____pro.-_-.d___uct.nam._-e

But I don't want to match:

  1. pro..ductname
  2. .productname-
  3. -productname.
  4. -productname
Lorenz answered 26/5, 2012 at 6:58 Comment(5)
Edited the examples so that they are easier to understand. Does it need further explanation? Please let me know, I would be glad to clarify further. - Lorenz
Why shouldn't pro..ductname match? The dots are in the middle. - Rupp
Is it only the dot that must not appear twice in a row, or any character? - Jigging
Because I don't want to match a dot or dash twice consecutively. Dot and dash can appear any number of times in the middle, but not consecutively. Now, what happens if a dot and a dash appear one after the other? We allow product.-name - Lorenz
Q: "Is it only the dot that must not appear twice in a row, or any character?" A: Dot and dash must not appear twice in a row; any alphanumeric character can be repeated, so ppppppppp should match. - Lorenz
M
11

The answer would be

/^[a-zA-Z0-9_]+([-.][a-zA-Z0-9_]+)*$/

if strings containing .- or -. did not have to match. Why would you want them to match, anyway? But if you really need these strings to match too, a possible solution is

/^[a-zA-Z0-9_]+((\.(-\.)*-?|-(\.-)*\.?)[a-zA-Z0-9_]+)*$/

The single . or - of the first regex is replaced by a sequence of alternating . and - characters: it starts with either . or -, is optionally followed by -. or .- pairs respectively, and optionally ends with one more - or . respectively, so that sequences of even length are allowed too. This is probably overkill, but the current requirements appear to call for it. If a run of at most 2 alternating . and - characters is enough, the regex becomes

/^[a-zA-Z0-9_]+((\.-?|-\.?)[a-zA-Z0-9_]+)*$/

Test here or here
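
As a rough check, the three patterns can be run against the examples from the question. This is only a sketch in PHP: the helper `unexpectedResults()`, the array keys and the variable names are made up for illustration and are not part of the answer.

```php
<?php
// The three patterns from the answer; the array keys are illustrative labels only.
$patterns = [
    'single'      => '/^[a-zA-Z0-9_]+([-.][a-zA-Z0-9_]+)*$/',
    'alternating' => '/^[a-zA-Z0-9_]+((\.(-\.)*-?|-(\.-)*\.?)[a-zA-Z0-9_]+)*$/',
    'max_two'     => '/^[a-zA-Z0-9_]+((\.-?|-\.?)[a-zA-Z0-9_]+)*$/',
];

// Examples copied from the question.
$shouldMatch = [
    'pro.duct-name_',
    '_pro.duct.name',
    'p.r.o.d_u_c_t.n-a-m-e',
    'product.-name',
    '____pro.-_-.d___uct.nam._-e',
];
$shouldNotMatch = ['pro..ductname', '.productname-', '-productname.', '-productname'];

// Hypothetical helper: returns the inputs whose match result differs from $expected.
function unexpectedResults(string $pattern, array $inputs, bool $expected): array
{
    $unexpected = [];
    foreach ($inputs as $input) {
        if ((preg_match($pattern, $input) === 1) !== $expected) {
            $unexpected[] = $input;
        }
    }
    return $unexpected;
}

foreach ($patterns as $label => $pattern) {
    $bad = array_merge(
        unexpectedResults($pattern, $shouldMatch, true),
        unexpectedResults($pattern, $shouldNotMatch, false)
    );
    echo $label, ': ', $bad ? 'differs for ' . implode(', ', $bad) : 'agrees with all examples', "\n";
}
```

With these inputs, the first pattern rejects the two examples that contain .- or -. while the other two agree with every example. The second and third differ only on longer runs: a string such as a.-.b is accepted by the second pattern and rejected by the third.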

Mcmurray answered 26/5, 2012 at 7:55 Comment(3)
The second one actually works. Thanks a lot, though I must admit I do not completely understand the sequence in your second regex. - Lorenz
And I love this bit, (\.-?|-\.?)[a-zA-Z0-9_]+, in the regex. That solves the problem. Great logic. - Lorenz
:-) Thanks. I added a last regex taking into account what you just wrote. - Mcmurray
J
3

Try this

(?im)^([a-z_][\w\.\-]+)(?![\.\-])\b

UPDATE 1

(?im)^([a-z_](?:[\.\-]\w|\w)+(?![\.\-]))$

UPDATE 2

(?im)^([a-z_](?:\.\-\w|\-\.\w|\-\w|\.\w|\w)+)$

Explanation

<!--
(?im)^([a-z_](?:\.\-\w|\-\.\w|\-\w|\.\w|\w)+)$

Match the remainder of the regex with the options: case insensitive (i); ^ and $ match at line breaks (m) «(?im)»
Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
Match the regular expression below and capture its match into backreference number 1 «([a-z_](?:\.\-\w|\-\.\w|\-\w|\.\w|\w)+)»
   Match a single character present in the list below «[a-z_]»
      A character in the range between “a” and “z” «a-z»
      The character “_” «_»
   Match the regular expression below «(?:\.\-\w|\-\.\w|\-\w|\.\w|\w)+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      Match either the regular expression below (attempting the next alternative only if this one fails) «\.\-\w»
         Match the character “.” literally «\.»
         Match the character “-” literally «\-»
         Match a single character that is a “word character” (letters, digits, and underscores) «\w»
      Or match regular expression number 2 below (attempting the next alternative only if this one fails) «\-\.\w»
         Match the character “-” literally «\-»
         Match the character “.” literally «\.»
         Match a single character that is a “word character” (letters, digits, and underscores) «\w»
      Or match regular expression number 3 below (attempting the next alternative only if this one fails) «\-\w»
         Match the character “-” literally «\-»
         Match a single character that is a “word character” (letters, digits, and underscores) «\w»
      Or match regular expression number 4 below (attempting the next alternative only if this one fails) «\.\w»
         Match the character “.” literally «\.»
         Match a single character that is a “word character” (letters, digits, and underscores) «\w»
      Or match regular expression number 5 below (the entire group fails if this one fails to match) «\w»
         Match a single character that is a “word character” (letters, digits, and underscores) «\w»
Assert position at the end of a line (at the end of the string or before a line break character) «$»
-->

And you could test it here.
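
For anyone who wants to try UPDATE 2 from PHP, here is a small sketch. The `/` delimiters and the variable names are my own assumptions; the pattern itself is copied from above.

```php
<?php
// UPDATE 2 pattern wrapped in PHP delimiters (the delimiters are an assumption).
$pattern = '/(?im)^([a-z_](?:\.\-\w|\-\.\w|\-\w|\.\w|\w)+)$/';

// Examples copied from the question.
$shouldMatch = [
    'pro.duct-name_',
    '_pro.duct.name',
    'p.r.o.d_u_c_t.n-a-m-e',
    'product.-name',
    '____pro.-_-.d___uct.nam._-e',
];
$shouldNotMatch = ['pro..ductname', '.productname-', '-productname.', '-productname'];

// Extra inputs chosen for illustration: a leading digit, and a two-line string.
$extra = ['123product', "good\n-bad"];

foreach (array_merge($shouldMatch, $shouldNotMatch, $extra) as $input) {
    $result = preg_match($pattern, $input) === 1 ? 'match' : 'no match';
    // json_encode() makes the embedded line break visible in the output.
    echo json_encode($input), ' => ', $result, "\n";
}
```

Two things to keep in mind with this pattern: the first character class is [a-z_], so a name starting with a digit such as 123product is rejected, and because of the m option a multi-line string matches as soon as one of its lines qualifies.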

Jigging answered 26/5, 2012 at 7:7 Comment(7)
\w is not the same as [a-zA-Z0-9_] - Mcmurray
I don't know if this is what @Walter is referring to, but to elaborate a bit, the PHP manual says: "A 'word' character is any letter or digit or the underscore character, that is, any character which can be part of a Perl 'word'. The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the 'fr' (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w." - Stirpiculture
@WalterTross: Can you provide an example where it doesn't work? The test works fine for the OP's example data. - Stirpiculture
product.-name is required to match (see the OP's comments), and doesn't. 123product should match too, given the OP's first regex. The part (?![\.\-]) is not needed, because it is implied by what precedes it. [\.\-] is more readable as [-.] - Mcmurray
@WalterTross: Thanks for pointing that out. See my update 2. It is also true that there is no need for an extra negative lookahead. If 123product has to match, the pattern becomes simpler: just replace the first character class with \w. The OP has never commented on this. - Jigging
Thanks @Jigging for explaining your answer. Now I understand the syntax. It should also now match product.-name. I didn't emphasize 123product because I could have added that myself. The main thing that was giving me problems was having alphanumerics/underscore at the beginning and end, plus multiple appearances of dot and dash in the middle without either of them appearing twice consecutively. Thanks a lot for the answer. I really appreciate it. :) - Lorenz
@banskt: You're welcome. Does it solve your problem? If you need any update, let me know. - Jigging
S
1

This should do:

/^[A-Za-z0-9_]([.-]?[A-Za-z0-9_]+)*[.-]?[A-Za-z0-9_]$/

It will make sure that the word begins and ends with an alphanumeric or underscore character. The group in the middle will make sure that there is at most one period or dash in a row, followed by at least one alphanumeric or underscore character.

Sadoff answered 26/5, 2012 at 7:22 Comment(0)
T
0
/^[A-Z0-9_][A-Z0-9_.-]*[A-Z0-9_]$/i

This makes sure the first and last character is not a dash or period; the rest in between may consist of any character (within your chosen set).
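
A quick sketch of what this pattern accepts, assuming PHP and a few inputs taken from the question (the variable names are illustrative):

```php
<?php
// Pattern copied from this answer.
$pattern = '/^[A-Z0-9_][A-Z0-9_.-]*[A-Z0-9_]$/i';

// Inputs from the question, plus a single-character string added for illustration.
$inputs = ['pro.duct-name_', 'product.-name', 'pro..ductname', '.productname-', '-productname', 'a'];

$results = [];
foreach ($inputs as $input) {
    $results[$input] = preg_match($pattern, $input) === 1;
    printf("%-16s %s\n", $input, $results[$input] ? 'match' : 'no match');
}
```

Note that pro..ductname is accepted here, since nothing in the middle class prevents consecutive dots or dashes, and that a single-character string is rejected because the pattern needs separate first and last characters.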

Takahashi answered 26/5, 2012 at 7:15 Comment(0)
J
0

The regex below matches any string made of letters, digits, underscores and dashes, with exactly one dot in the middle.

/^[A-Za-z0-9_-]+(\.){1}[A-Za-z0-9_-]+$/i

hope this helps

Janson answered 26/5, 2012 at 7:22 Comment(0)
