Pattern matching digits does not work in egrep?
M

5

75

Why can't I match the string

"1234567-1234567890"

with the given regular expression

\d{7}-\d{10}

with egrep from the shell like this:

egrep \d{7}-\d{10} file

?

Marquise answered 6/7, 2010 at 10:36 Comment(4)
I just tried /\d{7}-\d{10}/ and it works fine with that string.Burnsed
It doesn't work; I wrote the string above into a file, but nothing matches.Marquise
@persistent: maybe you need to whip up a hex editor and see what those digit characters really are.Potion
Hey :D ; they're just plain digits written from standard output with echo: echo string > fileMarquise
P
96

egrep doesn't recognize the \d shorthand for the digit character class, so you need to use something like [0-9] instead.

Moreover, while it's not absolutely necessary in this case, it's a good habit to quote the regex to prevent misinterpretation by the shell. Thus, something like this should work:

egrep '[0-9]{7}-[0-9]{10}' file
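
As a quick way to check the command above, here is a minimal sketch that writes the string from the question into a temporary file and searches it. It uses grep -E, which runs the same extended-regex matcher as egrep; the variable names tmpfile and matched are only for illustration.

```shell
# Create a throwaway sample file containing the string from the question.
tmpfile=$(mktemp)
printf '%s\n' '1234567-1234567890' > "$tmpfile"

# grep -E uses the same ERE syntax as egrep.
# Single quotes keep the shell from touching the brackets and braces.
matched=$(grep -E '[0-9]{7}-[0-9]{10}' "$tmpfile")
printf '%s\n' "$matched"

# Clean up the sample file.
rm -f "$tmpfile"
```

If the pattern matches, the line is printed back unchanged.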

Potion answered 6/7, 2010 at 10:42 Comment(12)
Actually he only needs to quote the regex if it contains shell meta-characters. And now that it no longer contains backslashes, it doesn't, so quoting is optional.Grecian
@sepp2k: do you need quotes for a space? I think you do. I guess you can argue that a space is a shell metacharacter. Anyway, I think it's best to always quote, just as it's best to always use curly braces.Potion
Then how would it work with grep instead? I'm interested in the \d prefix.Marquise
@persistent: according to the comparison chart I linked, neither POSIX ERE (egrep) nor POSIX BRE (grep) knows \d, \s, \w, \b, etc. Also, \d is not a prefix; it's a shorthand for the digit character class, supported by many but not all flavors.Potion
Well, that's odd; where are they specified, then, if not in (e)grep?Marquise
It's not a prefix; within the syntax it has to be written as a prefix before the brackets; thanks for the correction anyway ;)Marquise
@persistent: different flavors of regex do things differently, which is why it's important to mention which flavor you're using when asking regex questions. I'd guess that Perl popularized the \d shorthand, and everyone else followed later.Potion
@polygenelubricants: Yes, you need quotes with spaces (or put a backslash before every space). And sure, it doesn't hurt to always quote.Grecian
@persistent: you can't use \d with grep/egrep; you can use its expanded form [0-9], which is practically the same thing but slightly longer. In some flavors that support Unicode, \d is not the same as [0-9] because it also includes some other Unicode digit characters.Potion
Well mate, I already knew about the [block]{} form; I was interested in \d. Thanks.Marquise
I don't ever use [0-9], when I really mean [[:digit:]]. Plus these character classes are supported almost everywhere, and are defined in POSIX.Tisbe
'\d' wasn't working for me, but '\s' and '\w' did today. grep (GNU grep) 2.23 Packaged by Homebrew.Feline
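
To illustrate the quoting discussion in the comments above, here is a small sketch of what the shell hands to a command when the pattern from the question is left unquoted. printf stands in for egrep so the received argument can be inspected; the variable names unquoted and quoted are only for illustration.

```shell
# Unquoted: the shell removes the backslashes before the command sees them.
unquoted=$(printf '%s\n' \d{7}-\d{10})

# Single-quoted: the pattern reaches the command unchanged.
quoted=$(printf '%s\n' '\d{7}-\d{10}')

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
```

So with the unquoted form, egrep never even receives the backslashes; it is asked to look for the letter d repeated, not for digits.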
B
27

For completeness:

In fact, egrep does support character classes, in the POSIX named form that is written inside a bracket expression. They include:

  • [:alnum:]
  • [:alpha:]
  • [:cntrl:]
  • [:digit:]
  • [:graph:]
  • [:lower:]
  • [:print:]
  • [:punct:]
  • [:space:]
  • [:upper:]
  • [:xdigit:]

Example (note the double brackets):

egrep '[[:digit:]]{7}-[[:digit:]]{10}' file
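
A minimal sketch of why the double brackets matter, assuming grep -E as the egrep equivalent: the outer brackets form the bracket expression and the inner [:digit:] names the class, so several classes can share one bracket expression. The variable names here are only for illustration.

```shell
sample='1234567-1234567890'

# [[:digit:]] is a bracket expression containing the POSIX digit class.
if printf '%s\n' "$sample" | grep -Eq '^[[:digit:]]{7}-[[:digit:]]{10}$'; then
    digit_result='match'
else
    digit_result='no match'
fi

# Classes can be combined inside one bracket expression.
if printf '%s\n' 'A1B2' | grep -Eq '^[[:upper:][:digit:]]+$'; then
    combined_result='match'
else
    combined_result='no match'
fi

printf '%s\n%s\n' "$digit_result" "$combined_result"
```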
Bildungsroman answered 22/1, 2013 at 16:47 Comment(2)
Just a complaint about grep: [[:digit:]] is worse than [0-9] in every possible way. None of these are shorthand, and they are harder to remember than the default regex syntax. E.g., [[:lower:]] is harder to remember, read and write than [a-z].Godber
@Godber But grep -E i.e. egrep supports both [:digit:] and [0-9] so where is the complaint? If you're comparing with \d it's arguable. d could stand for anything, a bit like one letter variable names. Still seems \d has become more popular. I think grep character classes pre-date Perl \dImmolation
B
22

You can use \d if you pass grep the "Perl regex" option (-P), e.g.:

grep -P "\d{9}"
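
Since -P is not available in every grep (see the FreeBSD comment below), a script may want to probe for it first. This is only a sketch under that assumption: it tries -P on a trivial input and falls back to the portable [0-9] form otherwise. The variable names sample and result are only for illustration.

```shell
sample='123456789'

# Probe: does this grep accept -P and understand \d?
if printf '1\n' | grep -Pq '\d' 2>/dev/null; then
    # Perl-compatible mode is available, so \d can be used.
    result=$(printf '%s\n' "$sample" | grep -P '\d{9}')
else
    # Fallback that works with any grep supporting -E.
    result=$(printf '%s\n' "$sample" | grep -E '[0-9]{9}')
fi

printf '%s\n' "$result"
```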

Bhatt answered 16/10, 2015 at 19:9 Comment(3)
Yes. Thank you!Chock
Not in FreeBSD. Option -P is not supported, unfortunately.Nowell
If you're running AWS CodeBuild you'll still have this issue and you'll need to use grep -PDunfermline
G
11

Use [0-9] instead of \d. egrep doesn't know \d.

Grecian answered 6/7, 2010 at 10:44 Comment(2)
According to regular-expressions.info/gnu.html, repetition in grep is \{7\}.Potion
@polygenelubricants: Sure, I thought he was asking about \d.Grecian
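
For reference, the repetition syntax mentioned in the comment above can be sketched like this: plain grep (basic regex) wants the braces escaped, while grep -E (the egrep equivalent) takes them bare. The variable names are only for illustration.

```shell
sample='1234567-1234567890'

# Basic regex (plain grep): interval braces are written \{ and \}.
bre_result=$(printf '%s\n' "$sample" | grep '[0-9]\{7\}-[0-9]\{10\}')

# Extended regex (grep -E / egrep): braces are written without backslashes.
ere_result=$(printf '%s\n' "$sample" | grep -E '[0-9]{7}-[0-9]{10}')

printf '%s\n%s\n' "$bre_result" "$ere_result"
```

Both commands should print the sample line.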
M
-3

Try this one:

egrep '(\d{7}-\d{10})' file
Manriquez answered 6/7, 2010 at 10:45 Comment(2)
Traditional egrep did not support the { metacharacter, and some egrep implementations support \{ instead, so portable scripts should avoid { in egrep patterns and should use [{] to match a literal {.Manriquez
However, neither traditional egrep nor GNU egrep supports \d, and that's why this does not work, not because of the {. Still, it'd be useful to keep the { issue in mind if you ever have to be compatible with traditional egrep.Grecian
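
As a small illustration of the [{] advice in the first comment, this sketch matches a literal brace by putting it in a bracket expression, so it cannot be read as the start of a repetition count. The sample string and variable names are hypothetical.

```shell
sample='a{b'

# [{] is a bracket expression containing only the brace, so it is always literal.
literal_brace=$(printf '%s\n' "$sample" | grep -E 'a[{]b')

printf '%s\n' "$literal_brace"
```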