Regular Expression in Bash Script

I'm trying to check if a string matches this format:

10_06_13

i.e. today's date, or a similar date in the form "2digits_2digits_2digits".

What I've done:

regex='([0-9][0-9][_][0-9][0-9][_][0-9][0-9])'
if [[ "$incoming_string" =~ $regex ]]
then
   # Do awesome stuff here
fi

This works to a certain extent. But when the incoming string equals 011_100_131, it still passes the regex check. How can I fix my regex to only accept the right format?

Yonyona answered 10/6, 2013 at 16:2 Comment(2)
Note that the underscores don't need to be in square brackets. _ matches the same thing as [_]. - Colson
011_100_131 would not match with your regex. 011_10_131 would. - Technique

=~ succeeds if the string on the left contains a match for the regex on the right. If you want to know whether the whole string matches the regex, you need to "anchor" the regex on both sides, like this:

regex='^[0-9][0-9][_][0-9][0-9][_][0-9][0-9]$'
if [[ $incoming_string =~ $regex ]]
then
  # Do awesome stuff here
fi

The ^ only succeeds at the beginning of the string, and the $ only succeeds at the end.
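
To see the difference, here is a minimal sketch that runs the same pattern with and without anchors against the strings mentioned in this thread. The names check, unanchored and anchored are made up for this illustration, and the pattern uses a plain _ instead of [_], which matches the same thing.

```shell
#!/usr/bin/env bash

# Same pattern twice: once as-is, once anchored at both ends.
unanchored='[0-9][0-9]_[0-9][0-9]_[0-9][0-9]'
anchored="^${unanchored}\$"

# Hypothetical helper: print "match" or "no match" for string $1 against regex $2.
check() {
  if [[ $1 =~ $2 ]]; then
    echo "match"
  else
    echo "no match"
  fi
}

for s in 10_06_13 011_10_131 011_100_131; do
  printf '%-12s unanchored: %-9s anchored: %s\n' \
    "$s" "$(check "$s" "$unanchored")" "$(check "$s" "$anchored")"
done
```

Only 10_06_13 should pass the anchored check. 011_10_131 passes the unanchored one because it contains 11_10_13, while 011_100_131 fails both.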

Notes:

  1. I removed the unnecessary () from the regex and "" from the [[ ... ]].
  2. The bash manual is poorly worded, since it says that =~ succeeds if the string matches.
Afoot answered 10/6, 2013 at 16:14 Comment(3)
Damn, I was so close! I presumed '^' was to exclude characters. Thank you very much! :) - Yonyona
@Robbie: ^ means "excluding" when it is the first thing in a character set ([...]), and it means "anchored" when it is the first thing in a pattern. Otherwise, it just matches ^ (but that's not true in all regex implementations; sometimes it means "match the beginning of a line"). I agree that it's confusing until you get used to it. - Afoot
As mentioned above, you could replace [_] with _ without changing what the regex matches. - Technique
