Java: Regular expression where each character occurs 0-1 times

Problem:

  1. Match words in which each character from the allowed set occurs at most once.

  2. The word must be of a certain size, let's say "{2,5}"

  3. One specific char must be in the word, let's say char "e"

What I've got:

word.matches("^[abcde]{2,5}$");

This matches any word of 2 to 5 characters drawn from a, b, c, d, and e, with no limit on how often each one repeats. Therefore the words "abba" and "dead" are matched even though "abba" uses the char "b" twice and "dead" uses the char "d" twice. The expression also does not check whether the char "e" is in the word: "abba" matches without containing it.

What I want is a match where each character is used at most once, the word is 2-5 letters long, and the char "e" is in the word. A legitimate match would be "bead", for instance, since each char is used only once and the char "e" is in the word.

Fourierism answered 29/7, 2013 at 20:1 Comment(1)
That would be a complicated regular expression. I'd suggest not using regex for that at all. Regex isn't capable of "match this once anywhere in the string". - Andee
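For comparison, the non-regex route suggested in the comment above could look roughly like the following. This is only a sketch: the class name `WordCheck`, the method `isValid`, and the constant `ALLOWED` are made up for illustration, and the allowed set "abcde", the 2-5 length limit, and the required "e" are taken from the question.

```java
import java.util.HashSet;
import java.util.Set;

final class WordCheck {
    // Allowed characters, taken from the question's character class [abcde].
    private static final String ALLOWED = "abcde";

    static boolean isValid(String word) {
        // Rule 2: the word must be 2 to 5 characters long.
        if (word.length() < 2 || word.length() > 5) {
            return false;
        }
        Set<Character> seen = new HashSet<>();
        for (char c : word.toCharArray()) {
            // Only characters from the allowed set may appear.
            if (ALLOWED.indexOf(c) < 0) {
                return false;
            }
            // Rule 1: Set.add returns false if the character was already seen.
            if (!seen.add(c)) {
                return false;
            }
        }
        // Rule 3: the word must contain an "e".
        return seen.contains('e');
    }

    public static void main(String[] args) {
        System.out.println(isValid("bead")); // true
        System.out.println(isValid("abba")); // false: "b" repeats, no "e"
        System.out.println(isValid("dead")); // false: "d" repeats
        System.out.println(isValid("abcd")); // false: no "e"
    }
}
```

This is longer than a single regex, but each of the three rules is a separate, readable check.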

You could use an expression like:

^(?=[abcd]*e)(?:([abcde])(?![abcde]*?\1)){2,5}$

Some comments:

^
(?=[abcd]*e)     # make sure there is an "e"
(?:
  ([abcde])      # match a character and capture it
  (?!            # make sure it's not repeated
    [abcde]*?
    \1           # reference to the previously matched char
  )
){2,5}
$
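In Java, the commented form can be kept close to how it is written above by compiling with `Pattern.COMMENTS`, which ignores whitespace and `#` comments up to the end of each line. A Java string literal cannot span lines, so the sketch below joins the lines with concatenation and explicit `\n`; the class name `UniqueCharRegex` and the helper `isValid` are illustrative only.

```java
import java.util.regex.Pattern;

final class UniqueCharRegex {
    // Same expression as above. Note the doubled backslash in "\\1":
    // the Java compiler turns it into the regex backreference \1.
    static final Pattern WORD = Pattern.compile(
          "^\n"
        + "(?=[abcd]*e)     # make sure there is an \"e\"\n"
        + "(?:\n"
        + "  ([abcde])      # match a character and capture it\n"
        + "  (?!            # make sure it's not repeated\n"
        + "    [abcde]*?\n"
        + "    \\1          # reference to the previously matched char\n"
        + "  )\n"
        + "){2,5}\n"
        + "$",
        Pattern.COMMENTS);

    static boolean isValid(String word) {
        return WORD.matcher(word).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("bead"));  // true
        System.out.println(isValid("abba"));  // false
        System.out.println(isValid("dead"));  // false
        System.out.println(isValid("ebba"));  // false
    }
}
```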
Prefix answered 29/7, 2013 at 20:7 Comment(7)
Seems to work, I thought it was going to be much more complicated. Nice answer +1. - Raymund
@RohitJain, # comments are valid comments in regex strings when using (?x) or the equivalent flag. So please leave those in. Docs. - Prefix
@Qtax. True for Perl. Not for Java. You can't break regex strings into multiple lines using (?x), you have to use normal string concatenation. - Hoarhound
Thanks for the help. Parts 2 and 3 of the problem are solved by this! However, part 1, which requires each char to appear only once, is not. For instance "ebba" and "bedda" are matched. - Fourierism
@Qtax. Sorry about the confusion. I was saying that you can't have newlines without concatenation. - Hoarhound
@Lillem4n, I'm guessing that you didn't quote it properly. (Output/print the string to see what you got.) When quoting you need to escape the backslash with another backslash. Like: "^(?=[abcd]*e)(?:([abcde])(?![abcde]*?\\1)){2,5}$" - Prefix
@Qtax, Thank you for pointing that out. The whole problem is now solved! Thank you Qtax and especially Rohit Jain. Noticed I have some reading to do about assertions! - Fourierism
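The quoting issue from the last comments can be reproduced with a small sketch. It assumes the failing attempt used a single backslash, which the thread suggests but does not show; the class and constant names are made up for illustration. In a Java string literal, `"\1"` is an octal escape for the character U+0001, so the regex engine never sees a backreference and the "not repeated" check silently stops working.

```java
final class BackslashEscaping {
    // Single backslash: the compiler turns "\1" into the character U+0001,
    // so the lookahead looks for that control character instead of group 1.
    static final String UNESCAPED =
        "^(?=[abcd]*e)(?:([abcde])(?![abcde]*?\1)){2,5}$";

    // Doubled backslash: the regex engine receives \1, a real backreference.
    static final String ESCAPED =
        "^(?=[abcd]*e)(?:([abcde])(?![abcde]*?\\1)){2,5}$";

    public static void main(String[] args) {
        System.out.println("ebba".matches(UNESCAPED));  // true  (repeats slip through)
        System.out.println("ebba".matches(ESCAPED));    // false
        System.out.println("bedda".matches(UNESCAPED)); // true  (repeats slip through)
        System.out.println("bedda".matches(ESCAPED));   // false
        System.out.println("bead".matches(ESCAPED));    // true
    }
}
```

Printing the pattern string, as suggested above, makes the difference visible: the escaped version shows `\1`, the unescaped one shows an unprintable character.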
