How to use Swift literal regex expressions in switch case pattern statements?

Based on the examples from the WWDC 2022 presentation slides, the following is expected to compile and run OK:

import Foundation
import RegexBuilder

switch "abc" {
    case /\w+/:
        print("matched!")
    default:
        print("not matched.")
}

However, the following error is produced:

Expression pattern of type Regex<Substring> cannot match values of type String

Can the switch case statement with a Swift regex literal expression be somehow modified to function OK? How would one use the new Swift 5.7 regex capabilities in the switch case pattern statement?

Highstepper answered 11/1, 2023 at 6:14 Comment(0)

From what I have found, the "matching with regexes in switch statement" feature has not been implemented, because people were arguing about what the exact semantics should be. In a case such as

switch "---abc---" {
case /\w+/:
    print("foo")
default:
    print("bar")
}

which branch should the switch statement run? Should it count as a match only if the whole string matches the regex, or is it enough only for a substring of the switched string to match? In other words, is it wholeMatch or firstMatch? See more of the discussion here.
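To make the two candidate semantics concrete, here is a minimal sketch that runs both methods on the same input. The variable names are only illustrative, and the extended #/…/# delimiters are used here only so the snippet does not depend on the bare-slash literal setting; the pattern is the same as /\w+/.

```swift
let input = "---abc---"
let word = #/\w+/#   // same pattern as /\w+/, written with extended delimiters

// wholeMatch succeeds only if the regex consumes the entire input.
let whole = try? word.wholeMatch(in: input)

// firstMatch succeeds if any substring of the input matches.
let first = try? word.firstMatch(in: input)

print(whole == nil)             // true: the dashes are not word characters
print(first?.output ?? "none")  // abc
```

So under wholeMatch semantics the switch above would take the default branch, and under firstMatch semantics it would take the regex branch.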

In the end, they were not able to come to a conclusion, and

The proposal has been accepted with modifications (the modification being to subset out ~= for now).

So the ~= operator was not added for Regex<Output>, so you cannot use it in a switch.

You can add it yourself if you want, if you can decide between the two semantics :) For example:

func ~=(regex: Regex<Substring>, str: String) -> Bool {
    // errors count as "not match"
    (try? regex.wholeMatch(in: str)) != nil
}
Horntail answered 11/1, 2023 at 7:6 Comment(4)
Through the years, I developed the "normally I'd use a regex like /^…$/ to match the full string" coding habit. So, firstMatch works very well with the patterns I have. Easy choice. Thanks.Highstepper
I am keeping your answer as accepted because it provided timely, correct and critical information. However, I ended up not using the ~= approach since Swift will likely define ~= in the future. I later figured out an alternate solution that appears to be future-proof, expressive, and readable. (See my posted answer for details)Highstepper
Note that defining your own ~= will not create errors down the road, because overload resolution favors operations defined outside the standard library over those defined in the standard library. It could result in confusion for developers reading your source, however.Runesmith
You can replace the global ~= with one on your own custom wrapper type. See this really similar example, where I add a .substrings field which returns a SubstringMatchSource, which has an ~= operator which matches strings by their prefix. You can do something similar for partial/whole matches of regexes.Optometrist
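A rough sketch of the wrapper-type idea from the last comment, adapted to regexes. RegexPattern, whole(_:), first(_:) and classify(_:) are hypothetical names invented for this illustration, not part of the standard library or of the linked example. The ~= operator is declared on the wrapper, so nothing is defined for Regex itself.

```swift
// Hypothetical wrapper: holds a predicate built from a regex and a chosen semantics.
struct RegexPattern {
    let matches: (String) -> Bool

    // Pattern-match operator scoped to the wrapper type, not to Regex.
    static func ~= (pattern: RegexPattern, value: String) -> Bool {
        pattern.matches(value)
    }
}

// The whole string must match. Errors count as "not match".
func whole(_ regex: Regex<Substring>) -> RegexPattern {
    RegexPattern(matches: { (try? regex.wholeMatch(in: $0)) != nil })
}

// Any substring may match. Errors count as "not match".
func first(_ regex: Regex<Substring>) -> RegexPattern {
    RegexPattern(matches: { (try? regex.firstMatch(in: $0)) != nil })
}

func classify(_ s: String) -> String {
    switch s {
    case whole(#/\w+/#): return "whole"
    case first(#/\w+/#): return "partial"
    default: return "none"
    }
}

print(classify("abc"))        // whole
print(classify("---abc---"))  // partial
print(classify("---"))        // none
```

The choice between the two semantics is then spelled out at each case, rather than being fixed once by a global operator.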

Can the switch case statement with a Swift regex literal expression be somehow modified to function OK?

Yes, a case let … where … pattern with a /regex/ literal can be used. This approach can also be implemented in a way that expressly avoids potential ~= future ambiguities relative to the standard library. This approach is an alternative to directly defining ~=.

Note that defining your own ~= will not create errors down the road, because overload resolution favors operations defined outside the standard library over those defined in the standard library. It could result in confusion for developers reading your source, however. - Stephen Canon comment -

Discussion…

Match-Part-Or-Whole Example - A fundamental approach where the original regex pattern /^…$/ is used to match an entire line:

extension String {
    func matchFirst(_ regex: Regex<Substring>) -> Bool {
        // errors count as "not match"
        (try? regex.firstMatch(in: self)) != nil
    }
}

switch "---abc---" {
    case let s where s.matchFirst(/^\w+$/):
        print("entire line contains alphanumerics: '\(s)'")
    case let s where s.matchFirst(/\w+/):
        print("alphanumerics found in string: '\(s)'")
    default:
        print("no alphanumerics found")
}

Whole-Match-Only Example - A "whole match only" regex approach where a partial match is not possible:

extension String {
    func matchWhole(_ regex: Regex<Substring>) -> Bool {
        // errors count as "not match"
        (try? regex.wholeMatch(in: self)) != nil
    }
}

switch "---abc---" {
    case let s where s.matchWhole(/\w+/):
        print("all alphanumerics: '\(s)'")
    //case partial match not available. whole or nothing.
    default:
        print("no match for /\\w+/")
}

I ended up using the "classic" Match-Part-Or-Whole Example approach instead of the Whole-Match-Only Example and func ~= approaches for the following reasons:

  • func ~= - could possibly be defined by Swift at some future time. Possible future confusion.
  • Whole-Match-Only Example - does not support both partial and full matches. Less expressive.
  • Match-Part-Or-Whole Example
    • leaves ~= undefined which allows for the possible future definition by Swift. Avoids possible future confusion.
    • does support both partial and full matches. More expressive.
    • ^…$ is expressly stated for a full line match. More readable.

Note: Extending String with both convenience wrappers, such as matchFirst and matchWhole, can allow for either approach to be chosen at the point of use. This approach provides the following benefits:

  • expressive
  • co-locates both choices in the point-of-use autocompletion list
  • avoids the conflict of one vs the other in the lower level extension
  • does not presume any interpretation for the not-yet-officially-defined ~=.

extension String {
    func matchFirst(_ regex: Regex<Substring>) -> Bool {
        // errors count as "not match"
        (try? regex.firstMatch(in: self)) != nil
    }

    func matchWhole(_ regex: Regex<Substring>) -> Bool {
        // errors count as "not match"
        (try? regex.wholeMatch(in: self)) != nil
    }
}
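One limitation of the helpers above is that they only accept Regex<Substring>, so a regex with capture groups (whose output is a tuple) would not type-check. A possible variant, offered as a sketch rather than as the answer's own method, makes the helpers generic over the regex output. The describe(_:) function and its sample patterns are made up for illustration, and the #/…/# delimiters are used only so the snippet does not depend on the bare-slash literal setting.

```swift
extension String {
    // Generic over Output, so regexes with capture groups are accepted too.
    func matchFirst<Output>(_ regex: Regex<Output>) -> Bool {
        // errors count as "not match"
        (try? regex.firstMatch(in: self)) != nil
    }

    func matchWhole<Output>(_ regex: Regex<Output>) -> Bool {
        // errors count as "not match"
        (try? regex.wholeMatch(in: self)) != nil
    }
}

func describe(_ line: String) -> String {
    switch line {
    case let s where s.matchWhole(#/(\d+)-(\d+)/#):  // regex with two captures
        return "range: \(s)"
    case let s where s.matchFirst(#/\d+/#):
        return "contains a number: \(s)"
    default:
        return "no digits"
    }
}

print(describe("10-20"))   // range: 10-20
print(describe("abc 42"))  // contains a number: abc 42
print(describe("abc"))     // no digits
```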

Historic Footnote

The ^ begin-anchor and $ end-anchor syntax has been part of Regular Expressions since the 1970s with qed and ed PDP-7 AT&T Bell Labs Unix editors.

QED Text Editor (1970 Bell Telephone Laboratories technical memorandum)

ed (see man ed or info ed on POSIX and Open Group compliant Unix-like systems)

See also man ed and info ed on modern BSD/Linux/Unix systems. It's still there.

The ^ begin-anchor and $ end-anchor syntax was also carried forward to other regular-expression tools and standards, such as sed, grep (named after the ed command g/re/p, "global regular expression print"), the Perl Compatible Regular Expressions (PCRE) library, and POSIX Basic Regular Expressions (BRE).

If the /^…$/ pattern is implied and hidden for the sake of compact code, then the expressive capability of the regex is reduced.

Seeing, reading, and writing with ^ begin-anchor and $ end-anchor syntax can be natural (and even expected) for an experienced REGEX user.

Highstepper answered 11/1, 2023 at 20:25 Comment(0)
