Scala capture group using regex

Let's say I have this code:

val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).foreach(println)

I expected findAllIn to only return 483, but instead, it returned two483three. I know I could use unapply to extract only that part, but I'd have to have a pattern for the entire string, something like:

 val pattern = """one.*two(\d+)three""".r
 val pattern(aMatch) = string
 println(aMatch) // prints 483

Is there another way of achieving this, without using the classes from java.util directly, and without using unapply?

Oversweet answered 16/6, 2010 at 5:29 Comment(0)
N
120

Here's an example of how you can access group(1) of each match:

val string = "one493two483three"
val pattern = """two(\d+)three""".r
pattern.findAllIn(string).matchData foreach {
   m => println(m.group(1))
}

This prints "483" (as seen on ideone.com).
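If you want the captured values as a collection rather than printing them, the same idea can be sketched with findAllMatchIn, which yields one Match per occurrence. The object name CaptureGroups and the helper digitsBetween are made up for this illustration, and the second input string is an invented example with two occurrences:

```scala
import scala.util.matching.Regex

object CaptureGroups {
  val pattern: Regex = """two(\d+)three""".r

  // Collect capture group 1 of every match, in order of appearance.
  // (digitsBetween is a hypothetical helper name, not part of the original answer.)
  def digitsBetween(input: String): List[String] =
    pattern.findAllMatchIn(input).map(_.group(1)).toList

  def main(args: Array[String]): Unit = {
    println(digitsBetween("one493two483three"))        // List(483)
    println(digitsBetween("two1three and two22three")) // List(1, 22)
  }
}
```

Mapping over the Match objects keeps group(0), the whole match, out of the result entirely.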


The lookaround option

Depending on the complexity of the pattern, you can also use lookarounds to only match the portion you want. It'll look something like this:

val string = "one493two483three"
val pattern = """(?<=two)\d+(?=three)""".r
pattern.findAllIn(string).foreach(println)

The above also prints "483" (as seen on ideone.com).

Nunley answered 16/6, 2010 at 6:51 Comment(1)
You can also use pattern.findAllMatchIn(string).foreach... instead. - Hardi
H
52
val string = "one493two483three"
val pattern = """.*two(\d+)three.*""".r

string match {
  case pattern(a483) => println(a483) //matched group(1) assigned to variable a483
  case _ => // no match
}
Heriot answered 23/11, 2015 at 9:13 Comment(4)
This is the simplest way by far. You use the regex object ("pattern") in a match/case and extract the group into the variable a483. The catch is that the pattern must have wildcards on both sides, because it has to match the entire string: val pattern = """.*two(\d+)three.*""".r - Cranmer
Yes. I don't think the above is immediately clear, but once you understand that it's assigning the digit-matching group to the variable 'a483', it makes more sense. Perhaps rewrite it in a clearer fashion? - Trixi
This is the Scala way with regex. For people who don't understand the magic behind this answer, try searching for "scala regex extractor" or "scala unapply regex". - Haploid
The semantics are unclear. Is this the first, last, or a random match from the string? - Rescind
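On the question of which occurrence gets captured: a regex used as an extractor has to match the whole string, so the answer depends on how the leading wildcard backtracks. The sketch below uses an invented input with two occurrences; the object name WhichMatch and the helper extract are hypothetical. A greedy leading .* ends up on the last occurrence, a reluctant .*? on the first, and Regex's unanchored variant also finds the first without needing any wildcards:

```scala
import scala.util.matching.Regex

object WhichMatch {
  // Invented input with two candidate occurrences.
  val input = "two1three two2three"

  val anchored: Regex   = """two(\d+)three""".r        // must match the whole string
  val greedy: Regex     = """.*two(\d+)three.*""".r    // greedy prefix: last occurrence
  val reluctant: Regex  = """.*?two(\d+)three.*""".r   // reluctant prefix: first occurrence
  val unanchored: Regex = """two(\d+)three""".r.unanchored // searches within the string: first occurrence

  // Hypothetical helper: run the extractor and wrap the captured group in an Option.
  def extract(regex: Regex, s: String): Option[String] = s match {
    case regex(digits) => Some(digits)
    case _             => None
  }

  def main(args: Array[String]): Unit = {
    println(extract(anchored, input))   // None
    println(extract(greedy, input))     // Some(2)
    println(extract(reluctant, input))  // Some(1)
    println(extract(unanchored, input)) // Some(1)
  }
}
```

So the result is never random, but it does depend on the quantifiers around the group.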
C
23

Starting with Scala 2.13, as an alternative to regex solutions, it's also possible to pattern match a String by unapplying a string interpolator:

"one493two483three" match { case s"${x}two${y}three" => y }
// String = "483"

Or even:

val s"${x}two${y}three" = "one493two483three"
// x: String = one493
// y: String = 483

If you expect non-matching input, you can add a default case:

"one493deux483three" match {
  case s"${x}two${y}three" => y
  case _                   => "no match"
}
// String = "no match"
Chema answered 26/6, 2019 at 21:22 Comment(0)
R
16

You want to look at group(1); you're currently looking at group(0), which is "the entire matched string".

See this regex tutorial.

Repine answered 16/6, 2010 at 5:41 Comment(1)
Can you illustrate on the input I provided? I tried to call group(1) on what's returned by findAllIn but I get an IllegalStateException. - Oversweet
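One way to illustrate this on the question's input, without calling group on the iterator returned by findAllIn, is findFirstMatchIn, which returns an Option[Match] whose groups can be read directly. This is only a sketch; the object name FirstGroup and the helper groups are made up for the example:

```scala
import scala.util.matching.Regex

object FirstGroup {
  val pattern: Regex = """two(\d+)three""".r

  // Hypothetical helper: returns (group 0, group 1) of the first match, if any.
  // group(0) is the entire matched text; group(1) is the first parenthesised group.
  def groups(input: String): Option[(String, String)] =
    pattern.findFirstMatchIn(input).map(m => (m.group(0), m.group(1)))

  def main(args: Array[String]): Unit = {
    println(groups("one493two483three")) // Some((two483three,483))
    println(groups("one493deux483three")) // None
  }
}
```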
S
5
def extractFileNameFromHttpFilePathExpression(expr: String): String = {
  // Group 1 captures the file name. The dot before the extension is escaped
  // so that it matches only a literal "." and not any character.
  val regex = "http4.*\\/(\\w+\\.(xlsx|xls|zip))$".r
  // findFirstMatchIn returns Option[Match] (findAllMatchIn returns Iterator[Match]);
  // Match has methods to access capture groups.
  regex.findFirstMatchIn(expr) match {
    case Some(m) => m.group(1)
    case None    => "regex_error"
  }
}

extractFileNameFromHttpFilePathExpression(
  "http4://testing.bbmkl.com/document/sth1234.zip") // returns "sth1234.zip"
Steamtight answered 13/7, 2018 at 9:51 Comment(0)
