Tcl regular expressions

set d(aa1) 1
set d(aa2) 1
set d(aa3) 1
set d(aa4) 1
set d(aa5) 1
set d(aa6) 1
set d(aa7) 1
set d(aa8) 1
set d(aa9) 1
set d(aa10) 1
set d(aa11) 1
set regexp "a*\[1-9\]"
set res [array names d -glob $regexp]
puts "res = $res"

In this case, the result is:

res = aa11 aa6 aa2 aa7 aa3 aa8 aa4 aa9 aa5 aa1

But when I change the regexp from a*\[1-9\] to a*\[1-10\], the result becomes:

res = aa11 aa10 aa1
Penninite answered 20/6, 2014 at 11:1 Comment(1)
Globs aren't regexps; they're a more restricted language that is easier to write but not as powerful. (Vento)

You have an error in your character class.

  • [1-10] does not mean a number from 1 to 10.
  • It means the range 1-1 (i.e., simply a 1) or a 0, so the class matches only the characters 1 and 0. This explains your output.
  • To express a number from 1 to 10, use this: (?:10?|[2-9]) (one of several ways to do it).
  • Therefore your regex becomes a*(?:10?|[2-9]), which has to be passed with -regexp rather than -glob.
  • Note that if your engine does not allow non-capturing groups, you need to remove the ?:, giving a*(10?|[2-9])
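
The points above can be checked with a short sketch. The test strings and the anchors ^ and $ are additions made for this illustration, not part of the answer; the anchors are there so that the whole string has to match.

```tcl
# Test single strings against the glob class [1-10].
# Braces keep Tcl from treating the brackets as command substitution.
set results {}
foreach s {1 0 2 10} {
    lappend results [string match {[1-10]} $s]
}
puts $results   ;# only the single characters 1 and 0 match

# The alternation from the list above, anchored for this sketch
# (the anchors are an assumption added here).
set re {^a*(?:10?|[2-9])$}
set matched {}
foreach name {aa1 aa9 aa10 aa11} {
    lappend matched [regexp $re $name]
}
puts $matched   ;# aa11 is rejected once the pattern is anchored
```

The first loop shows that the class accepts only 1 and 0; the second shows the alternation accepting 1 through 10 while rejecting 11.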
Solander answered 20/6, 2014 at 11:11 Comment(0)

You need to be sure what you're trying to match, because glob-style matching and regexp-style matching differ in many respects.

From the docs, glob has the following:

  • * matches any sequence of characters in string, including a null string.
  • ? matches any single character in string.
  • [chars] matches any character in the set given by chars. If a sequence of the form x-y appears in chars, then any character between x and y, inclusive, will match. When used with -nocase, the end points of the range are converted to lower case first. Whereas {[A-z]} matches _ when matching case-sensitively (since _ falls between the Z and a), with -nocase this is considered like {[A-Za-z]} (and probably what was meant in the first place).
  • \x matches the single character x. This provides a way of avoiding the special interpretation of the characters *?[]\ in pattern.

Since you are using glob-style matching, your current expression (a*\[1-9\]) matches an a, followed by any sequence of characters, followed by a single digit from 1 through 9 (meaning it would also match something like abcjdne1).
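
As a rough illustration of that behaviour, the following sketch runs the same glob against a small array. The array contents here are made up for the example.

```tcl
# Hypothetical array used only for this illustration.
array set d {aa1 1 aa10 1 abcjdne1 1 b1 1}

# In a glob, * matches any sequence of characters, not "zero or more a".
set res [lsort [array names d -glob {a*[1-9]}]]
puts "res = $res"
```

Here aa10 is skipped because it ends in 0, and b1 is skipped because it does not start with a, while abcjdne1 is accepted.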

If you want to match at least one a followed by a number from 1 through 10, you will need something like this, using the -regexp mode:

set regexp {a+(?:[1-9]|10)}
set res [array names d -regexp $regexp]

I believe this regexp is the more natural one for a beginner: (?:[1-9]|10) means either 1 through 9, or 10. You can also use the form that zx81 suggested, (?:10?|[2-9]), which means 1 with an optional 0 (for 10), or 2 through 9.

+ means that a must appear at least once for the array name to match.

If you now need to match the full names, you will need to use anchors:

^a+(?:[1-9]|10)$
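
To see what the anchors change, here is a minimal sketch that rebuilds the array from the question and compares both patterns. The variable names loose and strict are chosen for this example only.

```tcl
# Rebuild the array from the question: aa1 through aa11.
foreach n {1 2 3 4 5 6 7 8 9 10 11} {
    set d(aa$n) 1
}

# Unanchored: the pattern only has to match somewhere inside the name,
# so aa11 is accepted because it contains aa1.
set loose [lsort -dictionary [array names d -regexp {a+(?:[1-9]|10)}]]

# Anchored: the whole name has to match, so aa11 is rejected.
set strict [lsort -dictionary [array names d -regexp {^a+(?:[1-9]|10)$}]]

puts "loose  = $loose"
puts "strict = $strict"
```

lsort -dictionary is used only to make the output order predictable, since array names returns the keys in no particular order.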

Note: Glob matching cannot express "at least one a followed by a number from 1 through 10", because alternation (the pipe |) and quantifiers (?, + and *) as they behave in regexp are not supported by glob matching.

One last thing: put the pattern in braces so that you do not have to escape it (unless the pattern needs variable or command substitution and you can't do otherwise).
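
A small sketch of the difference, assuming a hypothetical variable max for the case where substitution is needed:

```tcl
# Double quotes: the brackets must be escaped, or Tcl would try to
# run 1-9 as a command.
set quoted "a*\[1-9\]"

# Braces: no substitution takes place, so nothing needs escaping.
set braced {a*[1-9]}

puts [expr {$quoted eq $braced}]

# When part of the pattern comes from a variable, quotes (with escapes)
# are still required. max is a made-up name for this example.
set max 9
set built "a*\[1-$max\]"
puts $built
```

Both forms produce the same pattern string; the braced one is just easier to read and harder to get wrong.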

Comedic answered 20/6, 2014 at 11:45 Comment(1)
You might need to anchor those REs; Tcl doesn't ever anchor them by default… (Vento)
