clojure regex named groups

I have a problem with re-find in Clojure. Currently I'm doing:

(re-find #"-(?<foo>\d+)-(?<bar>\d+)-(?<toto>\d+)-\w{1,4}$" 
"http://www.bar.com/f-a-c-a-a3-spok-ser-2-phse-2-1-6-ti-105-cv-9-31289-824-gu" )

My result is fine:

["-9-31289-824-gu" "9" "31289" "824"]

But I would prefer to have a hash looking like:

{:foo "9" :bar "31289" :toto "824"}

I understand that java.util.regex.Matcher/group does something like that, but I haven't been able to use it correctly. Thanks for your help.

Humpage answered 17/9, 2014 at 13:41 Comment(0)

The JDK didn't support named capture groups until JDK 7.

Here's the announcement on the Oracle blog.

Quote:

This convenient feature has been missed in Java RegEx for years, now it finally got itself in JDK7 b50.

Since Clojure supports JDK >= 6, you're out of luck if you're looking for something native (Clojure uses Java regex Patterns and Matchers behind the scenes).

You can always use external libraries, like named-re. That one gives you exactly what you need.

Calling

(require 'named-re.core)
(re-find #"-(?<foo>\d+)-(?<bar>\d+)-(?<toto>\d+)-\w{1,4}$" 
     "http://www.bar.com/f-a-c-a-a3-spok-ser-2-phse-2-1-6-ti-105-cv-9-31289-824-gu" )

will return

{:toto "824", :bar "31289", :foo "9", :0 "-9-31289-824-gu"}
Mother answered 17/9, 2014 at 14:8 Comment(2)
I was confused by this example until I checked the project source. A big caveat about using this is that it modifies the Clojure reader and changes the rules for evaluating #"", so that it no longer returns a java.util.regex.Pattern object. - Rochellrochella
@Rochellrochella Yeah, it's kind of hard to implement that functionality while still returning a Pattern object, since it doesn't support it. But to be honest, it was the first thing I encountered ;) - Mother

The Java regex library that Clojure is built against (Java 1.6) doesn't support named capturing groups.

However, you can use Clojure's zipmap function to combine name keys and re-find's captured groups into a map. Groups that aren't matched will get a nil value for the name key.

(zipmap [:foo :bar :toto]
        (rest (re-find #"-(\d+)-(\d+)-(\d+)-\w{1,4}$" 
                        "http://www.bar.com/f-a-c-a-a3-spok-ser-2-phse-2-1-6-ti-105-cv-9-31289-824-gu")))

=> {:foo "9" :bar "31289" :toto "824"}
Touchmenot answered 17/9, 2014 at 14:10 Comment(2)
Consider writing a regex to extract the named groups from a pattern and match up the groups using zipmap to give Clojure named patterns :) - Crabbed
Note that this approach breaks down when the regex is modified such that a new (anonymous) group is added. This is one of the use cases that named groups are designed to solve: reducing the fragility inherent in anonymous groups. <rant warning> It's a great shame that Clojure seems to have mostly stalled out with JDK 1.6 capabilities, especially as it has only supported JDK 1.8+ for some time. Named regex support was added in JDK 1.7, yet Clojure still lacks direct support for them. - Indomitable
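The suggestion in the first comment could be sketched roughly as follows. This is only a minimal sketch, assuming Java 7 or newer (so that the pattern compiles at all) and assuming that every capturing group in the pattern is named; `pattern-group-names` and `re-find-named` are hypothetical helper names, not part of Clojure.

```clojure
(defn pattern-group-names
  "Extracts the named-group names from the source of `pattern`, in order of
  appearance, as keywords. Naive: it does not account for escaped parentheses
  or character classes that happen to contain the text `(?<name>`."
  [pattern]
  (map (comp keyword second)
       (re-seq #"\(\?<([a-zA-Z][a-zA-Z0-9]*)>" (str pattern))))

(defn re-find-named
  "Like re-find, but returns a map of group name -> matched text, or nil when
  there is no match. Assumes every capturing group in `pattern` is named."
  [pattern s]
  (when-let [groups (re-find pattern s)]
    ;; (first groups) is the whole match; the rest are the groups in order.
    (zipmap (pattern-group-names pattern) (rest groups))))

(re-find-named #"-(?<foo>\d+)-(?<bar>\d+)-(?<toto>\d+)-\w{1,4}$"
               "http://www.bar.com/f-a-c-a-a3-spok-ser-2-phse-2-1-6-ti-105-cv-9-31289-824-gu")
```

Because the names are matched up by position, an anonymous group added to the pattern would still shift the values, which is the fragility described in the second comment.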

For a long time this was a limitation in Java: there was no API for getting the list of named capture groups. See this question.

When working with Java versions that do not support this feature, all you can do is use external libraries.

If you don't need a map, you can use the solution described in the Clojure documentation. In your case, it could look like this:

(let [matcher (re-matcher #"-(?<foo>\d+)-(?<bar>\d+)-(?<toto>\d+)-\w{1,4}$"
                          "http://www.bar.com/f-a-c-a-a3-spok-ser-2-phse-2-1-6-ti-105-cv-9-31289-824-gu")]
  (re-find matcher)
  (re-groups matcher)
  (.group matcher "foo"))

Although this solution is not perfect (the matcher is a mutable Java object), it works.
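If a map is wanted after all, the same `.group` lookup can be wrapped in a small helper. This is a minimal sketch, assuming Java 7 or newer; `groups-by-name` is a hypothetical helper name, and the group names have to be passed in explicitly because older Java versions cannot list them.

```clojure
(defn groups-by-name
  "Returns a map of keyword -> text matched by the named group of the same
  name, for the first match of `pattern` in `s`; nil when nothing matches."
  [pattern s group-names]
  (let [matcher (re-matcher pattern s)]
    ;; The mutable matcher never leaves this function.
    (when (.find matcher)
      (into {}
            (map (fn [k] [k (.group matcher ^String (name k))]))
            group-names))))

(groups-by-name #"-(?<foo>\d+)-(?<bar>\d+)-(?<toto>\d+)-\w{1,4}$"
                "http://www.bar.com/f-a-c-a-a3-spok-ser-2-phse-2-1-6-ti-105-cv-9-31289-824-gu"
                [:foo :bar :toto])
```

Since each value is looked up by name rather than by position, adding another group to the pattern should not shift the results.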

As written in the other answer that I linked, there is a solution as of Java 20, which was released on 21 March 2023.

(let [matcher (re-matcher #"-(?<foo>\d+)-(?<bar>\d+)-(?<toto>\d+)-\w{1,4}$"
                          "http://www.bar.com/f-a-c-a-a3-spok-ser-2-phse-2-1-6-ti-105-cv-9-31289-824-gu")]
  (re-find matcher)
  (re-groups matcher)
  (.namedGroups matcher))

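This gives you the group names: the result is a map from each group name to its group number, not to the matched text. Pass each name to `.group` to get the values you want.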
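A minimal sketch of building the keyword map from the question on Java 20 or newer, using the names reported by `namedGroups` together with `.group`. The helper name `named-groups-map` is hypothetical.

```clojure
(defn named-groups-map
  "Returns a map of keyword group name -> matched text for the first match of
  `pattern` in `s`, or nil when there is no match. Requires Java 20+, where
  Matcher has a namedGroups method."
  [pattern s]
  (let [matcher (re-matcher pattern s)]
    (when (.find matcher)
      (into {}
            ;; Each entry is group name -> group number; only the name is used.
            (map (fn [entry]
                   (let [group-name (key entry)]
                     [(keyword group-name) (.group matcher ^String group-name)])))
            (.namedGroups matcher)))))

(named-groups-map #"-(?<foo>\d+)-(?<bar>\d+)-(?<toto>\d+)-\w{1,4}$"
                  "http://www.bar.com/f-a-c-a-a3-spok-ser-2-phse-2-1-6-ti-105-cv-9-31289-824-gu")
```

On Java 20 or newer this should return a map equal to `{:foo "9" :bar "31289" :toto "824"}`, though the key order when printed may differ.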

I had to install JDK 20 manually; the JRE alone did not work for me. After installing JDK 20, Clojure picked it up and it worked without any further configuration.

Obviously, this does not work at all with ClojureScript.

Tympanist answered 21/10, 2023 at 18:11 Comment(0)
