Is there a quick way to find every match of a regular expression in Ruby? I've looked through the Regexp class in the Ruby standard library and searched on Google, to no avail.
Using scan should do the trick:
string.scan(/regex/)
class Regex \n def scan(string) \n string.scan(self) \n end \n end
–
Dulsea /(?=(...))/
. –
Okwu scan does not support back-referencing in the regex (unlike match) –
Epochal /(?=(...))/.flatten
–
Verditer str = "a1ab2cd3d" and we wish to find all digits that are preceded and followed by the same letter. We could use the regex r = /(?<=(\p{Alpha}))\d(?=\1)/. Then str.scan(r) #=> [["a"], ["d"]], which is not what is wanted but understandable because of the way scan treats capture groups. We can, however, obtain the desired result as follows: str.gsub(r).to_a #=> ["1", "3"]. My point is that scan is not always the solution. –
Overact
To find all the matching strings, use String's scan method.
str = "A 54mpl3 string w1th 7 numb3rs scatter36 ar0und"
str.scan(/\d+/)
#=> ["54", "3", "1", "7", "3", "36", "0"]
If you want MatchData, which is the type of the object returned by the Regexp match method, use:
str.to_enum(:scan, /\d+/).map { Regexp.last_match }
#=> [#<MatchData "54">, #<MatchData "3">, #<MatchData "1">, #<MatchData "7">, #<MatchData "3">, #<MatchData "36">, #<MatchData "0">]
The benefit of using MatchData is that you can use methods like offset:
match_datas = str.to_enum(:scan, /\d+/).map { Regexp.last_match }
match_datas[0].offset(0)
#=> [2, 4]
match_datas[1].offset(0)
#=> [7, 8]
See these questions if you'd like to know more:
- "How do I get the match data for all occurrences of a Ruby regular expression in a string?"
- "Ruby regular expression matching enumerator with named capture support"
- "How to find out the starting point for each match in ruby"
Reading about the special variables $&, $', $1, $2 in Ruby will be helpful too.
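As a rough illustration of those special variables, here is a small sketch. The sample string and the names seen, after and record are made up for this example. The variables are set by the most recent successful match, so they can be read inside a scan block:

```ruby
str = "w1th scatter3r"
seen = []

# With a block, scan sets the match variables ($~, $&, $', $1, ...)
# for each match before the block runs.
str.scan(/(\d+)([m-t])/) do
  after = $'                 # text following the current match
  record = {
    whole: $&,               # the entire matched text
    digits: $1,              # first capture group
    letter: $2,              # second capture group
    rest: after,
    start: $~.begin(0)       # $~ is the MatchData, same as Regexp.last_match
  }
  seen << record
end
```

Each entry in seen then describes one match: the whole text, both captures, what follows it, and where it starts.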
If you have a regexp with groups:
str="A 54mpl3 string w1th 7 numbers scatter3r ar0und"
re=/(\d+)[m-t]/
you can use String's scan method to find matching groups:
str.scan re
#> [["54"], ["1"], ["3"]]
To find the matching pattern:
str.to_enum(:scan,re).map {$&}
#> ["54m", "1t", "3r"]
Or, to get the complete MatchData for each match:
str.to_enum(:scan,re).map{Regexp.last_match}
#> [#<MatchData "54m" 1:"54">, #<MatchData "1t" 1:"1">, #<MatchData "3r" 1:"3">]
str.to_enum(:scan,re).map {$~}
#> [#<MatchData "54m" 1:"54">, #<MatchData "1t" 1:"1">, #<MatchData "3r" 1:"3">]
str.scan(/\d+[m-t]/) # => ["54m", "1t", "3r"] is more idiomatic than str.to_enum(:scan,re).map {$&}
–
Valenta /(\d+)[m-t]/ not /\d+[m-t]/. Writing re = /(\d+)[m-t]/; str.scan(re) is the same as str.scan(/(\d+)[m-t]/), but that gives #> [["54"], ["1"], ["3"]] and not ["54m", "1t", "3r"]. The question was: if I have a regular expression with a group and want to capture all the matched patterns without changing the regular expression (leaving the group in place), how can I do it? In this sense, a possible solution, albeit a little cryptic and difficult to read, was: str.to_enum(:scan,re).map {$&}
–
Rumilly
You can use string.scan(your_regex).flatten. If your regex contains groups, the captures are returned in a single flat array.
string = "A 54mpl3 string w1th 7 numbers scatter3r ar0und"
your_regex = /(\d+)[m-t]/
string.scan(your_regex).flatten
=> ["54", "1", "3"]
The regex can use named groups as well.
string = 'group_photo.jpg'
regex = /\A(?<name>.*)\.(?<ext>.*)\z/
string.scan(regex).flatten
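The named-group example above does not show its result, so here is a minimal sketch of what it evaluates to. The variable names flat and named are only for illustration. scan discards the group names, while match keeps them:

```ruby
string = 'group_photo.jpg'
regex = /\A(?<name>.*)\.(?<ext>.*)\z/

# scan returns one inner array of captures per match; flatten merges them
# into a single array holding "group_photo" and "jpg".
flat = string.scan(regex).flatten

# match keeps the group names; named_captures returns a Hash keyed by
# the group names as strings ("name" and "ext").
named = string.match(regex).named_captures
```

If the names matter, the MatchData route is probably the more convenient one.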
You can also use gsub; it's just one more way to get MatchData.
str.gsub(/\d/).map{ Regexp.last_match }
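To make the gsub approach concrete, here is a short sketch that reuses the string and pattern from the comment earlier on this page (digits preceded and followed by the same letter). Called without a block or replacement, gsub returns an Enumerator over the whole matches, which sidesteps the way scan treats capture groups. The names via_scan, via_gsub and data are illustrative only:

```ruby
str = "a1ab2cd3d"
r = /(?<=(\p{Alpha}))\d(?=\1)/

# scan returns the capture group (the surrounding letter), not the digit.
via_scan = str.scan(r)

# Block-less gsub enumerates the full matched text instead.
via_gsub = str.gsub(r).to_a

# The same enumerator can be mapped to MatchData objects.
data = str.gsub(r).map { Regexp.last_match }
```

Here via_scan holds the letters, via_gsub holds the digits, and data holds one MatchData per digit.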
your_regex = /(\d+)[m-t]/ and you won't need to use flatten. Your final example uses last_match, which in this case is probably safe, but is a global and could possibly be overwritten if any regex was matched prior to calling last_match. Instead it's probably safer to use string.match(regex).captures # => ["group_photo", "jpg"] or string.scan(/\d+/) # => ["54", "3", "1", "7", "3", "0"] as shown in other answers, depending on the pattern and needs. –
Valenta
If you have capture groups () inside the regex for other purposes, the proposed solutions with String#scan and String#match are problematic:
- String#scan only gets what is inside the capture groups;
- String#match only gets the first match, rejecting all the others;
- String#matches (proposed function) gets all the matches.
In this case, we need a solution that matches the regex without considering the capture groups.
String#matches
With refinements you can monkey patch the String class to implement String#matches, and the method will only be available inside the scope that activates the refinement with using. It is a tidy way to monkey patch classes in Ruby.
Setup
/lib/refinements/string_matches.rb
# This module adds a String refinement that returns every match, like repeated String#match calls
# 1. `String#scan` only gets what is inside the capture groups (inside the parens)
# 2. `String#match` only gets the first match
# 3. `String#matches` (proposed function) gets all the matches
module StringMatches
  refine String do
    def matches(regex)
      scan(/(?<matching>#{regex})/).flatten
    end
  end
end
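This relies on named capture groups: when a pattern contains a named group, Ruby treats the unnamed groups in it as non-capturing.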
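The effect of the named group can be checked in isolation. The following is a small sketch with illustrative variable names (text, inner, plain, wrapped) that compares scan on the bare pattern with scan on the same pattern wrapped in a named group:

```ruby
text = "function(1, 2, 3)"
inner = /function\((\d), (\d), (\d)\)/

# On its own, the pattern captures the three digits.
plain = text.scan(inner)

# Wrapped in a named group, the unnamed groups stop capturing, so only
# the named group (the whole match) comes back.
wrapped = text.scan(/(?<matching>#{inner})/)
```

plain contains the three digits, while wrapped contains just the full matched text.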
Usage
rails c
> require 'refinements/string_matches'
> using StringMatches
> 'function(1, 2, 3) + function(4, 5, 6)'.matches(/function\((\d), (\d), (\d)\)/)
=> ["function(1, 2, 3)", "function(4, 5, 6)"]
> 'function(1, 2, 3) + function(4, 5, 6)'.scan(/function\((\d), (\d), (\d)\)/)
=> [["1", "2", "3"], ["4", "5", "6"]]
> 'function(1, 2, 3) + function(4, 5, 6)'.match(/function\((\d), (\d), (\d)\)/)[0]
=> "function(1, 2, 3)"
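Outside a Rails console, the same refinement can be tried in a single plain Ruby file. This is only a sketch: the Demo module and its run method are hypothetical names used to show that the refined method is visible where using is active and nowhere else.

```ruby
module StringMatches
  refine String do
    # Return every full match, ignoring the pattern's own capture groups.
    def matches(regex)
      scan(/(?<matching>#{regex})/).flatten
    end
  end
end

module Demo
  # The refinement is active from this line to the end of the module body.
  using StringMatches

  def self.run(text, pattern)
    text.matches(pattern)
  end
end

inside = Demo.run('function(1, 2, 3) + function(4, 5, 6)',
                  /function\((\d), (\d), (\d)\)/)

# Without `using` in this scope, String#matches is not defined here.
outside = begin
  'abc'.matches(/b/)
rescue NoMethodError
  :not_available
end
```

Keeping the using call inside the module limits the patch to code that explicitly asks for it.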
Return an array of MatchData objects
#scan is very limited: it only returns a simple array of strings (or arrays of captures)! It is far more powerful and flexible to get an array of MatchData objects.
I'll provide two approaches (using same logic), one using a PORO and one using a monkey patch:
PORO:
class MatchAll
  def initialize(string, pattern)
    raise ArgumentError, 'must pass a String' unless string.is_a?(String)
    raise ArgumentError, 'must pass a Regexp pattern' unless pattern.is_a?(Regexp)
    @string = string
    @pattern = pattern
    @matches = []
  end

  def match_all
    recursive_match
  end

  private

  # Search again from the end of the previous match until nothing is found.
  def recursive_match(prev_match = nil)
    index = prev_match.nil? ? 0 : prev_match.offset(0)[1]
    matching_item = @string.match(@pattern, index)
    # String#match returns nil once there are no more matches.
    return @matches if matching_item.nil?
    @matches << matching_item
    recursive_match(matching_item)
  end
end
USAGE:
test_string = 'a green frog jumped on a green lilypad'
MatchAll.new(test_string, /green/).match_all
=> [#<MatchData "green">, #<MatchData "green">]
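The same idea can also be written with a loop instead of recursion. The following is a sketch of that variation, not the approach above: match_all here is a hypothetical top-level helper, and it additionally steps past empty matches so that a pattern such as /x*/ cannot make it loop forever.

```ruby
# Hypothetical helper (not part of Ruby's core API): collect every
# MatchData for pattern in string, searching left to right.
def match_all(string, pattern)
  matches = []
  index = 0
  while index <= string.length && (m = string.match(pattern, index))
    matches << m
    # Resume after this match; move one extra character after an empty
    # match so the search position always advances.
    index = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
  end
  matches
end

found = match_all('a green frog jumped on a green lilypad', /green/)
starts = found.map { |m| m.begin(0) }
```

For the test string used above, starts holds the index at which each "green" begins.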
Monkey patch
I don't typically condone monkey-patching, but in this case:
- we're doing it the right way by "quarantining" our patch into its own module
- I prefer this approach because 'string'.match_all(/pattern/) is more intuitive (and looks a lot nicer) than MatchAll.new('string', /pattern/).match_all
module RubyCoreExtensions
  module String
    module MatchAll
      def match_all(pattern)
        raise ArgumentError, 'must pass a Regexp pattern' unless pattern.is_a?(Regexp)
        recursive_match(pattern)
      end

      private

      # Search again from the end of the previous match until nothing is found.
      def recursive_match(pattern, matches = [], prev_match = nil)
        index = prev_match.nil? ? 0 : prev_match.offset(0)[1]
        matching_item = self.match(pattern, index)
        # String#match returns nil once there are no more matches.
        return matches if matching_item.nil?
        matches << matching_item
        recursive_match(pattern, matches, matching_item)
      end
    end
  end
end
I recommend creating a new file and putting the patch there (assuming you're using Rails): /lib/ruby_core_extensions/string/match_all.rb
To use our patch we need to make it available:
# within application.rb
require './lib/ruby_core_extensions/string/match_all.rb'
Then be sure to include it in the String class. You could put this wherever you want; for example, right under the require statement we just wrote above. After you include it once, it will be available everywhere, even outside the class where you included it.
String.include RubyCoreExtensions::String::MatchAll
USAGE: And now when you use #match_all you get results like:
test_string = 'hello foo, what foo are you going to foo today?'
test_string.match_all /foo/
=> [#<MatchData "foo">, #<MatchData "foo">, #<MatchData "foo">]
test_string.match_all /hello/
=> [#<MatchData "hello">]
test_string.match_all /none/
=> []
I find this particularly useful when I want to match multiple occurrences and then get useful information about each occurrence, such as the indexes at which it starts and ends (e.g. match.offset(0) => [first_index, last_index], where last_index is the index just past the end of the match).
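As a rough sketch of what the offsets make possible, the following collects the position of every occurrence and uses it to slice the original string. It uses the to_enum(:scan) idiom from an earlier answer rather than the patch above, so it runs without any setup. The variable names are illustrative.

```ruby
text = 'hello foo, what foo are you going to foo today?'

# One MatchData per occurrence.
matches = text.to_enum(:scan, /foo/).map { Regexp.last_match }

# offset(0) is [start, end], where end is the index just past the match.
ranges = matches.map { |m| m.offset(0) }

# Exclusive ranges built from the offsets recover each matched substring.
slices = ranges.map { |first, last| text[first...last] }
```

Because the end index is exclusive, text[first...last] (three dots) returns exactly the matched text.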
String#scan
? –
Refection