Emacs Lisp: matching a repeated pattern in a compact manner?

Let's suppose I have an RGB string (format: #<2 hex digits><2 hex digits><2 hex digits>) like this:

"#00BBCC"

and I'd like to match and capture its <2 hex digits> elements in a more compact manner than by using the obvious:

"#\\([[:xdigit:]]\\{2\\}\\)\\([[:xdigit:]]\\{2\\}\\)\\([[:xdigit:]]\\{2\\}\\)"

I've tried:

"#\\([[:xdigit:]]\\{2\\}\\)\\{3\\}"

and:

"#\\(\\([[:xdigit:]]\\{2\\}\\)\\{3\\}\\)"

But the most they matched has been the first <2 hex digits> element.

Any idea? Thank you.

Quintan answered 1/2, 2012 at 23:41 Comment(7)
Why do you want to do that? For readability? - Allin
Just curiosity: I wonder whether regexps can match repeated patterns. - Quintan
The problem is that you wouldn't be able to refer to 3 different groups then, right? So how would you extract the R,G,B values separately? - Allin
I'm not sure I understand you. I'd just like to know whether regexps can match and capture repeating patterns. The answer could also just be: no, they can't. - Quintan
Sorry for being unclear. What I mean is that if you want to capture the R,G,B values each in a separate group, you cannot use repeating patterns because you'll end up with only one group, right? If one big group is okay for you, Sean's answer is what you're looking for. - Allin
@Allin So basically you've answered it (i.e. it can't be done), but comments can't be marked as accepted answers. - Quintan
Ok, I summed up the discussion so far in a new answer. Cheers. - Allin
A
3

If you want to capture R, G and B in different subgroups, so that you can extract them using (match-string group), you need to have three separate parenthesized groups in your regexp at some point.

\(...\)\(...\)\(...\)

Otherwise, if you use a repeat pattern such as

\(...\)\{3\}

you have only one group, and after the match it will only contain the value of the last match. So, say, if you have something along the lines of

\([[:xdigit:]]\{2\}\)\{3\}

it will match a string like "A0B1C2", but (match-string 1) will only contain the contents of the last match, i.e. "C2", because the regexp defines only one group.
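This is easy to check by evaluating a small test. The following is only a sketch: the function name my-repeated-group-demo is made up for illustration, and the leading "#" is added to the regexp so that it fits the sample string from the question.

```elisp
;; Hypothetical helper: returns the whole match and the contents of
;; group 1 for a regexp that repeats a single group three times.
(defun my-repeated-group-demo (str)
  (when (string-match "#\\([[:xdigit:]]\\{2\\}\\)\\{3\\}" str)
    (list (match-string 0 str)     ; the whole match
          (match-string 1 str))))  ; only the last repetition of the group

(my-repeated-group-demo "#00BBCC")  ; → ("#00BBCC" "CC")
```

The whole string matches, but "00" and "BB" are no longer available once the group has been reused for the last repetition.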

Thus you basically have two options: use a compact regexp, such as your third one, and do some extra substring processing to extract the hex numbers, as Sean suggests; or use a more complex regexp, such as your first one, which lets you access the three sub-matches more conveniently.

If you're mostly worried about code readability, you could always do something like

(let ((hex2 "\\([[:xdigit:]]\\{2\\}\\)"))
  (concat "#" hex2 hex2 hex2))

to construct such a more complex regexp in a somewhat less redundant way, as per tripleee's suggestion.
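For example, the constructed regexp could be used as follows. This is a minimal sketch: the function name my-rgb-components is hypothetical, and anchoring the regexp to the whole string with \` and \' is an assumption about what is wanted, not something the question requires.

```elisp
;; Hypothetical helper: builds the three-group regexp from one fragment
;; and returns the three captured components, or nil if STR is not of
;; the form #RRGGBB.
(defun my-rgb-components (str)
  (let* ((hex2 "\\([[:xdigit:]]\\{2\\}\\)")
         (re (concat "\\`#" hex2 hex2 hex2 "\\'")))
    (when (string-match re str)
      (list (match-string 1 str)
            (match-string 2 str)
            (match-string 3 str)))))

(my-rgb-components "#00BBCC")  ; → ("00" "BB" "CC")
```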

Allin answered 3/2, 2012 at 21:31 Comment(0)
I
6

You can make the regexp shorter at the expense of some extra code:

(defun match-hex-digits (str)
  (when (string-match "#[[:xdigit:]]\\{6\\}" str)
    (list (substring (match-string 0 str) 1 3)
          (substring (match-string 0 str) 3 5)
          (substring (match-string 0 str) 5 7))))
Iconostasis answered 2/2, 2012 at 1:3 Comment(3)
Nice alternative idea, I'm upvoting this. The extra code could be refactored into a function. - Quintan
Why not then (let ((xx "\\([[:xdigit:]]\\{2\\}\\)")) (string-match (concat "#" xx xx xx) str))? - Hat
@Hat That is what I'm using in my code. Actually, now that I think about it, it could be refactored into a more generic function than the solution suggested by Sean, because then patterns could match strings of varied lengths. - Quintan
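
The more generic function mentioned in the last comment might look roughly like the sketch below. The name my-match-hex-groups and its argument order are assumptions made for illustration; the idea is simply to repeat the group fragment N times when building the regexp, so that each repetition gets its own group number.

```elisp
;; Hypothetical helper: matches "#" followed by exactly N two-digit hex
;; groups and returns them as a list of strings, or nil on no match.
(defun my-match-hex-groups (n str)
  (let* ((hex2 "\\([[:xdigit:]]\\{2\\}\\)")
         ;; Repeat the fragment N times so there are N separate groups.
         (re (concat "\\`#" (apply #'concat (make-list n hex2)) "\\'")))
    (when (string-match re str)
      (mapcar (lambda (i) (match-string i str))
              (number-sequence 1 n)))))

(my-match-hex-groups 3 "#00BBCC")    ; → ("00" "BB" "CC")
(my-match-hex-groups 4 "#00BBCCFF")  ; → ("00" "BB" "CC" "FF")
```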
I
0

Several years after my original response, Emacs has a much nicer way to do this, with the pcase macro.

(defun match-hex-digits (str)
  (pcase str
    ((rx "#" (let r (= 2 xdigit)) (let g (= 2 xdigit)) (let b (= 2 xdigit)))
     (list r g b))))
Iconostasis answered 6/2, 2023 at 19:42 Comment(0)
