Swift 3 - How do I extract captured groups in regular expressions?
C

5

25

I am using Swift 3 and trying to access captured groups.

let regexp = "((ALREADY PAID | NOT ALR | PROVIDER MAY | READY | MAY BILL | BILL YOU | PAID)((.|\\n)*))(( \\d+)(\\.+|-+)(\\d\\d))"

// check if some substring is in the recognized text
if let range = stringText.range(of:regexp, options: .regularExpression) {
    let result = tesseract.recognizedText.substring(with:range)
}

I want to extract the last two digits captured by (\d\d), so for the text ALREADY PAID asfasdfadsfasdf 39.15 the result would be 15. The regex builder linked below shows what I want. Normally I would use $8 to get the 8th captured group, but I don't know how to do that in Swift 3.

http://regexr.com/3fh1e

Craftsman answered 14/3, 2017 at 15:22 Comment(2)
Never use (.|\\n)*, just use .* and add a (?s) at the pattern start (or use the corresponding flag). – Syncytium
Use rangeAt(...). Examples here: https://mcmap.net/q/537748/-optional-capture-groups-with-nsregularexpressions-in-swift and here: https://mcmap.net/q/23817/-swift-extract-regex-matches and here: https://mcmap.net/q/537749/-how-to-parse-a-string-of-hex-into-ascii-equivalent-in-swift-2 – Medallist
Q
38

but I don't know how to do that in Swift 3.

When you receive a match from NSRegularExpression, what you get is an NSTextCheckingResult. You call rangeAt to get a specific capture group (in Swift 4 and later the method is spelled range(at:)).

Example:

let s = "hey ho ha"
let pattern = "(h).*(h).*(h)"
// our goal is capture group 3, "h" in "ha"
let regex = try! NSRegularExpression(pattern: pattern)
let result = regex.matches(in:s, range:NSMakeRange(0, s.utf16.count))
let third = result[0].rangeAt(3) // <-- !!
third.location // 7
third.length // 1
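
Applied to the pattern from the question, the same idea might look like the sketch below. It assumes Swift 4 or later, where rangeAt(_:) is spelled range(at:) and Range(_:in:) converts an NSRange back to string indices. The names text and lastTwoDigits are illustrative, not from the original post.

```swift
import Foundation

// Sample text and pattern taken from the question.
let text = "ALREADY PAID asfasdfadsfasdf 39.15"
let pattern = "((ALREADY PAID | NOT ALR | PROVIDER MAY | READY | MAY BILL | BILL YOU | PAID)((.|\\n)*))(( \\d+)(\\.+|-+)(\\d\\d))"

var lastTwoDigits: String? = nil
if let regex = try? NSRegularExpression(pattern: pattern),
   let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text)),
   // Group 8 is the final (\d\d); group 0 would be the whole match.
   let range = Range(match.range(at: 8), in: text) {
    lastTwoDigits = String(text[range])
}
print(lastTwoDigits ?? "no match") // prints "15"
```

Using firstMatch instead of matches avoids indexing into an empty array when nothing matches.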
Quinby answered 14/3, 2017 at 15:31 Comment(4)
third is an NSRange, how do you convert it to the Range type required to use it in s.substring? or otherwise, where is the "h" result here? Is the only way to convert s to NSString? Is there a simpler way to use regex? This looks excessive. – Fennessy
@Fennessy NSRange to Range conversion for strings is a new Swift 4 feature. – Quinby
@Fennessy It's annoying that regular expressions are a Cocoa feature, not a Swift feature, but that's how it is. And Cocoa thinks in NSString and NSRange, obviously. But in Swift 4 Range and NSRange are mutually coercible even for strings, so it's really no problem. – Quinby
@Efren: With respect to NSRange/Range conversion in connection with NSRegularExpression, this Q&A might be of interest. – Medallist
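
A minimal sketch of the conversion these comments discuss, assuming Swift 4 or later: Range(_:in:) turns the NSRange of a capture group into a Range<String.Index>, which can then subscript the original string directly, with no detour through NSString. The name thirdGroup is illustrative.

```swift
import Foundation

let s = "hey ho ha"
let regex = try! NSRegularExpression(pattern: "(h).*(h).*(h)")
let wholeString = NSRange(s.startIndex..., in: s)  // measured in UTF-16 units
let match = regex.firstMatch(in: s, range: wholeString)!

let third = match.range(at: 3)                     // NSRange: location 7, length 1
var thirdGroup = ""
if let range = Range(third, in: s) {               // nil if the group did not participate
    thirdGroup = String(s[range])
}
print(thirdGroup) // prints "h"
```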
S
43

Swift 4, Swift 5

extension String {
    func groups(for regexPattern: String) -> [[String]] {
        do {
            let text = self
            let regex = try NSRegularExpression(pattern: regexPattern)
            let matches = regex.matches(in: text,
                                        range: NSRange(text.startIndex..., in: text))
            return matches.map { match in
                return (0..<match.numberOfRanges).map {
                    let rangeBounds = match.range(at: $0)
                    guard let range = Range(rangeBounds, in: text) else {
                        return ""
                    }
                    return String(text[range])
                }
            }
        } catch let error {
            print("invalid regex: \(error.localizedDescription)")
            return []
        }
    }
}

example:

let res = "1my 2own 3string".groups(for:"(([0-9]+)[a-z]+) ")

(lldb) po res
▿ 2 elements
  ▿ 0 : 3 elements
    - 0 : "1my "
    - 1 : "1my"
    - 2 : "1"
  ▿ 1 : 3 elements
    - 0 : "2own "
    - 1 : "2own"
    - 2 : "2"
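
One behaviour worth knowing: a capture group that is optional in the pattern and does not take part in a match comes back as an empty string, because Range(_:in:) returns nil for its NSRange. The sketch below illustrates this with a compact stand-in for the extension, so it runs on its own. The sample input and pattern are made up for the illustration.

```swift
import Foundation

// Compact stand-in for the groups(for:) extension, same behaviour.
extension String {
    func groups(for regexPattern: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regexPattern) else { return [] }
        let matches = regex.matches(in: self, range: NSRange(startIndex..., in: self))
        return matches.map { match -> [String] in
            (0..<match.numberOfRanges).map { index -> String in
                // A group that did not participate has location NSNotFound.
                guard let range = Range(match.range(at: index), in: self) else { return "" }
                return String(self[range])
            }
        }
    }
}

// The digit group is optional, so it is absent from the second match.
let optionalGroups = "a1 b".groups(for: "([a-z])([0-9])?")
print(optionalGroups) // [["a1", "a", "1"], ["b", "b", ""]]
```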
Soberminded answered 6/12, 2018 at 13:0 Comment(1)
Wow, nice to see the "modern language" Swift doing something that needs 2 lines of code in 25 years old Javascript 🤦🏻‍♂️ – Blakeley
S
8

As ever, a simple extension seems to be the way around Swift's bizarre overcomplication...

extension NSTextCheckingResult {
    func groups(testedString:String) -> [String] {
        var groups = [String]()
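        for i in 0 ..< self.numberOfRanges {
            // An optional group that did not participate has location NSNotFound,
            // so Range(_:in:) returns nil; record an empty string rather than crash.
            guard let range = Range(self.range(at: i), in: testedString) else {
                groups.append("")
                continue
            }
            groups.append(String(testedString[range]))
        }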
        return groups
    }
}

Use it like this:

// NSRange counts UTF-16 code units, so use utf16.count rather than count
if let match = myRegex.firstMatch(in: someString, range: NSMakeRange(0, someString.utf16.count)) {
     let groups = match.groups(testedString: someString)
     //... do something with groups
}
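
NSRange lengths count UTF-16 code units, not Characters, which matters when building the search range. The sketch below shows what can go wrong with a Character count when the string contains an emoji. The helper firstID and the sample string are hypothetical, added only for illustration.

```swift
import Foundation

let someString = "👍 id=42"
let myRegex = try! NSRegularExpression(pattern: "id=(\\d+)")

// Hypothetical helper: returns capture group 1 for a given search range.
func firstID(in range: NSRange) -> String? {
    guard let match = myRegex.firstMatch(in: someString, range: range),
          let groupRange = Range(match.range(at: 1), in: someString) else { return nil }
    return String(someString[groupRange])
}

// The emoji is 1 Character but 2 UTF-16 units, so count (7) is one short
// of utf16.count (8) and the search range stops before the final digit.
let truncated = firstID(in: NSMakeRange(0, someString.count))
let complete = firstID(in: NSMakeRange(0, someString.utf16.count))
print(truncated ?? "nil", complete ?? "nil") // prints "4 42"
```

NSRange(someString.startIndex..., in: someString) builds the same full range without counting by hand.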
Stemson answered 17/7, 2018 at 15:16 Comment(0)
T
0

A slightly altered version based on @Vyacheslav's answer, with a different error-handling approach:

enum ParsingError: Error {
    // You can pass more info here with parameter(s) if you want, e.g. `case let invalidRange(originalString, failedAtRange)`
    case invalidRange 
}

protocol StringUtilityRequired {
    var stringUtility: StringUtility { get }
}

extension StringUtilityRequired {
    var stringUtility: StringUtility { StringUtility() }
}

enum StringUtility {
    func groups(_ str: String, pattern: String) throws -> [[String]] {
        let regex = try NSRegularExpression(pattern: pattern)
        let matches = regex.matches(in: str, range: NSRange(str.startIndex..., in: str))
        return try matches.map { match throws in
            return try (0 ..< match.numberOfRanges).map { range throws in
                let rangeBounds = match.range(at: range)
                guard let range = Range(rangeBounds, in: str) else {
                    throw ParsingError.invalidRange
                }
                return String(str[range])
            }
        }
    }

    // This component is stateless; it doesn't have any side effect
    case pure
    init() { self = .pure }
}

Usage:

struct MyComponent: StringUtilityRequired {
    func myFunc() throws {
        let groups = try stringUtility.groups("Test 123", pattern: "(.+)\\s(.+)")
        print(groups)
    }
}
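
On the calling side, a throwing version lets the caller tell a bad pattern apart from a group that did not participate in the match. The sketch below uses a free function, captureGroups, as a hypothetical stand-in for the method above so that it runs on its own. The describe helper and its messages are likewise made up for illustration.

```swift
import Foundation

enum ParsingError: Error { case invalidRange }

// Hypothetical free-function stand-in for the throwing groups method.
func captureGroups(in str: String, pattern: String) throws -> [[String]] {
    let regex = try NSRegularExpression(pattern: pattern)
    let matches = regex.matches(in: str, range: NSRange(str.startIndex..., in: str))
    return try matches.map { match -> [String] in
        try (0 ..< match.numberOfRanges).map { index -> String in
            guard let range = Range(match.range(at: index), in: str) else {
                throw ParsingError.invalidRange
            }
            return String(str[range])
        }
    }
}

func describe(_ str: String, _ pattern: String) -> String {
    do {
        let groups = try captureGroups(in: str, pattern: pattern)
        return String(describing: groups)
    } catch ParsingError.invalidRange {
        return "a group did not participate in the match"
    } catch {
        // NSRegularExpression throws when the pattern does not compile.
        return "pattern did not compile"
    }
}

print(describe("Test 123", "(.+)\\s(.+)"))   // all groups present
print(describe("b", "([a-z])([0-9])?"))      // optional digit group is absent
print(describe("b", "(unclosed"))            // unbalanced parenthesis
```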
Transeunt answered 29/7, 2020 at 16:47 Comment(0)
T
0

If you are stuck with <iOS16 ...

This pasta may save someone some time.

This example specifically finds ALL the matching GROUPS in the simple case of JUST ONE group to find, which is a common use case.

Example,

input "blah blah height=3 blh height=13 height=7 blah"

regex height=(\d+)

result, the strings ["3", "13", "7"]

so ...

extension String {
    
    func collectEmAll(_ regex: String) -> [String] {
        let rx = try! NSRegularExpression(pattern: regex)
        let rr = rx.matches(in: self, range: NSMakeRange(0, self.utf16.count))
        return rr.compactMap { self.substring(with: $0.range(at: 1)) }
    }
    
    func substring(with nsrange: NSRange) -> String? {
        guard let range = Range(nsrange, in: self) else { return nil }
        return String(self[range])
    }
}

Don't forget that in the idiotic range(at:) system, 0 is the "whole thing" and 1 is the first capture group, which is what you typically want, hence "1" above.

test

print("blah height=3 blah height=13 height=7".collectEmAll("height=(\\d+)"))

Wherever possible use the new regex features in latest iOS.
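
For comparison, a sketch of the same extraction with the native Regex type, assuming Swift 5.7 or later and a deployment target of iOS 16 / macOS 13 or later. The variable names are illustrative.

```swift
let input = "blah blah height=3 blh height=13 height=7 blah"

// #/.../# is the extended regex literal; capture group 1 is exposed as output.1.
let heights = input.matches(of: #/height=(\d+)/#).map { String($0.output.1) }
print(heights) // ["3", "13", "7"]
```

Here the captures are typed, so there is no NSRange conversion and no group index to get wrong at run time.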

Trilley answered 8/3 at 23:37 Comment(0)
