How do I extract all matches with a Tcl regex?

Hi everybody, I need a solution for this regular expression. My problem is to extract all the hex numbers in the form H'xxxx. I used the regexp below, but instead of all the hex values I get only one number. How do I get every hex number from this string?

set hex "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"
set res [regexp -all {H'([0-9A-Z]+)&} $hex match hexValues]
puts "$res H$hexValues"

The output I get is `5 H4D52`.

Dialectologist answered 30/7, 2010 at 7:30 Comment(4)
Does the single quote need to be escaped, I wonder? `H\'([0-9A-Z]+)\&` – Seddon
If you are dealing with hex numbers, `[0-9A-F]` should suffice. – Diu
@Zabba, a single quote has no special meaning in a regex, or even in Tcl generally. – Volney
@relet, this works too: `[[:xdigit:]]` -- tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm#M31 – Volney
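As a small illustration of Volney's suggestion, here is a sketch that uses the POSIX class `[[:xdigit:]]` in place of an explicit bracket range. It assumes the input string from the question and collects only the captured digits; note that `[[:xdigit:]]` also accepts lowercase `a-f`.

```tcl
# Input string from the question.
set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"

# [[:xdigit:]] matches 0-9, a-f and A-F.
# -all -inline returns a flat list {whole group1 whole group1 ...}.
set hexValues {}
foreach {whole digits} [regexp -all -inline {H'([[:xdigit:]]+)} $text] {
    # Keep only the captured digits, not the whole H'xxxx match.
    lappend hexValues $digits
}
puts $hexValues
# 22EF 2354 4BD4 4C4B 4D52 4DC9
```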

### On `-all -inline`

From the `regexp` documentation:

-all : Causes the regular expression to be matched as many times as possible in the string, returning the total number of matches found. If this is specified with match variables, they will contain information for the last match only.

-inline : Causes the command to return, as a list, the data that would otherwise be placed in match variables. When using -inline, match variables may not be specified. If used with -all, the list will be concatenated at each iteration, such that a flat list is always returned. For each match iteration, the command will append the overall match data, plus one element for each subexpression in the regular expression.
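To make the two switches concrete, the sketch below (using the question's input) contrasts `-all` with match variables, which reproduces the "only the last match" behaviour from the question, against `-all -inline`, which returns every match. The `\d`-free pattern `H'([0-9A-F]+)` is used here, without the trailing `&`.

```tcl
set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"

# -all with match variables: the result is the match count, but the
# variables hold only the LAST match.
set count [regexp -all {H'([0-9A-F]+)} $text whole lastHex]
puts "$count $lastHex"
# 6 4DC9

# -all -inline: no match variables allowed; every match and its
# capture are returned together as one flat list.
set all [regexp -all -inline {H'([0-9A-F]+)} $text]
puts [llength $all]
# 12
```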

Thus, to return all matches (including captures by groups) as a flat list in Tcl, you can write:

set matchTuples [regexp -all -inline $pattern $text]

If the pattern has groups 0…N-1 (group 0 being the overall match, so there are N-1 capturing groups), then each match contributes an N-tuple to the list. Thus the number of actual matches is the length of this list divided by N. You can then use `foreach` with N variables to iterate over each tuple of the list.

For example, with N = 2 (one capturing group), you have:

set numMatches [expr {[llength $matchTuples] / 2}]

foreach {group0 group1} $matchTuples {
   # $group0 is the whole match, $group1 is the first capture
}
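The same idea extends to more groups. The sketch below uses a hypothetical `key=value` input and a pattern with two capturing groups, so N = 3 (overall match plus two captures); the input and pattern are illustrative only and not from the question.

```tcl
# Hypothetical input: comma-separated key=value pairs.
set text "a=1,b=22,c=333"
# Two capturing groups, so each match is a 3-tuple {whole key value}.
set pattern {(\w+)=(\d+)}

set matchTuples [regexp -all -inline $pattern $text]
set numMatches [expr {[llength $matchTuples] / 3}]
puts $numMatches
# 3

# foreach with three variables walks the list one tuple at a time.
set pairs {}
foreach {whole key value} $matchTuples {
    lappend pairs $key $value
}
puts $pairs
# a 1 b 22 c 333
```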

### Sample code

Here's a solution for this specific problem, annotated with output as comments (see also on ideone.com):

set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"
set pattern {H'([0-9A-F]{4})}

set matchTuples [regexp -all -inline $pattern $text]

puts $matchTuples
# H'22EF 22EF H'2354 2354 H'4BD4 4BD4 H'4C4B 4C4B H'4D52 4D52 H'4DC9 4DC9
# \_________/ \_________/ \_________/ \_________/ \_________/ \_________/
#  1st match   2nd match   3rd match   4th match   5th match   6th match

puts [llength $matchTuples]
# 12

set numMatches [expr {[llength $matchTuples] / 2}]
puts $numMatches
# 6
puts $numMatches
# 6

foreach {whole hex} $matchTuples {
   puts $hex
}
# 22EF
# 2354
# 4BD4
# 4C4B
# 4D52
# 4DC9
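If you only want the hex values themselves as a list (rather than printing them one by one), a compact variant is to map over the `{whole hex}` pairs. This sketch assumes Tcl 8.6 or later, since `lmap` was added in 8.6; on 8.5 the `foreach` loop above with `lappend` does the same job.

```tcl
# Assumes Tcl 8.6+ (lmap is not available in 8.5).
set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"
set pattern {H'([0-9A-F]{4})}

# lmap accepts a multi-variable list like foreach; the body's result
# (here, just the captured group) becomes one element of the new list.
set hexValues [lmap {whole hex} [regexp -all -inline $pattern $text] {set hex}]
puts $hexValues
# 22EF 2354 4BD4 4C4B 4D52 4DC9

# The number of values is simply the length of that list.
puts [llength $hexValues]
# 6
```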

### On the pattern

Note that I've changed the pattern slightly:

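* Instead of `[0-9A-Z]+`, a pattern such as `[0-9A-F]{4}` is more specific: it matches exactly 4 hexadecimal digits
* If you insist on matching the `&`, then the last hex string (`H'4DC9` in your input) cannot be matched, because nothing follows it
   * This explains why you get `4D52` (and a count of `5`) in the original script: that's the last match followed by `&`
   * Maybe get rid of the `&`, or use `(&|$)` instead, i.e. a `&` or the end of the string `$`. Note that `(&|$)` is itself a capturing group and adds an extra element to each `-inline` tuple; the non-capturing form `(?:&|$)` avoids that.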
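The effect of the trailing `&` can be checked directly on the question's input. This sketch compares the original requirement of `&`, a capturing `(&|$)`, and a non-capturing `(?:&|$)`.

```tcl
set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"

# Requiring a trailing & misses H'4DC9, which ends the string.
set withAmp [regexp -all {H'[0-9A-F]{4}&} $text]
puts $withAmp
# 5

# (&|$) matches all six values but captures the & (or ""), so each
# -inline match is a 3-tuple: 6 * 3 = 18 elements.
set capturing [regexp -all -inline {H'([0-9A-F]{4})(&|$)} $text]
puts [llength $capturing]
# 18

# (?:&|$) is non-capturing, so each match stays a {whole hex} pair.
set tuples [regexp -all -inline {H'([0-9A-F]{4})(?:&|$)} $text]
puts [llength $tuples]
# 12
puts [lindex $tuples end]
# 4DC9
```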

### References

* [regular-expressions.info/Finite Repetition](http://www.regular-expressions.info/repeat.html), [Anchors](http://www.regular-expressions.info/anchors.html)
Oneal answered 30/7, 2010 at 8:50 Comment(6)
Thanks. In that situation I don't know the count of hexadecimal values; how do I evaluate it? – Dialectologist
@polygen I tried using an array, but the array size shows only one element: only the last match is stored in it. Could you please provide an example? – Dialectologist
@polygene: I suggest you edit that into your answer, as it's the idiomatic method. – Rascon
@polygene: +1: Looks pretty good. If you're writing more Tcl, remember to put braces round expressions (unless you really know what you're doing) because that lets them be compiled and avoids problems that are similar in flavor to SQL injection attacks; braced expressions are hazard-free and fast. – Rascon
@Donal Fellows, how do I use the `-about` option in `regexp`? – Dialectologist
@Malli: `regexp -about $RE` returns a two-item list describing features of the RE. One item is the number of capturing groups, the other is a list of features (actually a dump of the RE's internal flags field bits). If you want more info, ask a question properly! – Rascon
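Following Donal's description, one practical use of `-about` is to compute N (the tuple size for `-all -inline`) from the pattern instead of hard-coding it. This is a sketch assuming, as the Tcl `regexp` manual states, that the first element of the `-about` result is the subexpression count; the second element (the flags list) is not relied on here.

```tcl
set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"
set pattern {H'([0-9A-F]{4})}

# -about needs only the pattern; it does not attempt a match.
# First element: number of capturing groups (1 for this pattern).
set numGroups [lindex [regexp -about $pattern] 0]

# Each -inline match contributes the whole match plus one element per group.
set n [expr {$numGroups + 1}]
set tuples [regexp -all -inline $pattern $text]
set numMatches [expr {[llength $tuples] / $n}]
puts "$numGroups $numMatches"
# 1 6
```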

I'm not Tclish, but I think you need to use both the -inline and -all options:

regexp -all -inline {H'([0-9A-Z]+)&} $string

EDIT: Here it is again, this time with a corrected regex (see the comments):

regexp -all -inline {H'[0-9A-F]+} $string
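Without a capturing group, `-all -inline` returns only the whole matches, so the result still carries the `H'` prefix. A minimal sketch of stripping it, assuming the trailing `&` is dropped from the pattern so the final value (which has no `&` after it) is also matched:

```tcl
set text "V5CCH,IA=H'22EF&H'2354&H'4BD4&H'4C4B&H'4D52&H'4DC9"

# No capturing group: one list element per match.
set matches [regexp -all -inline {H'[0-9A-F]+} $text]
puts $matches
# H'22EF H'2354 H'4BD4 H'4C4B H'4D52 H'4DC9

# Drop the two-character H' prefix from each match.
set hexValues {}
foreach m $matches {
    lappend hexValues [string range $m 2 end]
}
puts $hexValues
# 22EF 2354 4BD4 4C4B 4D52 4DC9
```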
Tristichous answered 30/7, 2010 at 8:32 Comment(2)
But the `-inline` output is similar to writing `regexp -all {H'([0-9A-Z]+)&} $string match` and then `puts $match`. The regexp above produces `H'22EF& 22EF H'2354& 2354 H'4BD4& 4BD4 H'4C4B& 4C4B H'4D52& 4D52`. I don't want this output; I need only the hexadecimal values. – Dialectologist
I was only demonstrating the use of `-all -inline`, but @poly is right: you need to get rid of those parentheses. They aren't needed for grouping, and they're adding a lot of unwanted substrings to the results array. – Tristichous

© 2022 - 2024 — McMap. All rights reserved.