How to do Erlang pattern matching using regular expressions?

When I write Erlang programs which do text parsing, I frequently run into situations where I would love to do a pattern match using a regular expression.

For example, I wish I could do something like this, where ~ is a "made up" regular expression matching operator:

my_function(String ~ ["^[A-Za-z]+[A-Za-z0-9]*$"]) ->
    ....

I know about the regular expression module (re), but AFAIK you cannot call arbitrary functions in patterns or in guards.

Also, I wish string matching could be done case-insensitively. This would be handy when parsing HTTP headers, for example. I would love to do something like this, where "Str ~ {Pattern, Options}" means "match Str against pattern Pattern using options Options":

handle_accept_language_header(Header ~ {"Accept-Language", [case_insensitive]}) ->
    ...

Two questions:

  1. How do you typically handle this using just standard Erlang? Is there some mechanism or coding style that comes close to this in terms of conciseness and readability?

  2. Is there any work (an EEP?) going on in Erlang to address this?

Laos answered 2/11, 2009 at 11:12 Comment(1)
I doubt an EEP to add regular expressions as patterns would be supported. All current patterns can be evaluated in constant time; regexps can't. (length/1 is perhaps the only exception to the constant-time rule.) - Bristling

You really don't have much choice other than to run your regexps in advance and then pattern match on the results. Here's a very simple example that approaches what I think you're after, but it suffers from the flaw that you have to write each regexp twice. You could make this less painful by using a macro to define each regexp in one place.

-module(multire).

-compile(export_all).

multire([], _) ->
    nomatch;
multire([RE|RegExps], String) ->
    case re:run(String, RE, [{capture, none}]) of
        match ->
            RE;
        nomatch ->
            multire(RegExps, String)
    end.


test(Foo) ->
    test2(multire(["^Hello","world$","^....$"],Foo),Foo).

test2("^Hello",Foo) ->
    io:format("~p matched the hello pattern~n",[Foo]);
test2("world$",Foo) ->
    io:format("~p matched the world pattern~n",[Foo]);
test2("^....$",Foo) ->
    io:format("~p matched the four chars pattern~n",[Foo]);
test2(nomatch,Foo) ->
    io:format("~p failed to match~n",[Foo]).
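Here is a minimal sketch of the macro idea mentioned above. The module name `multire_macros`, the macro names `HELLO_RE`, `WORLD_RE` and `FOUR_CHARS_RE`, and the atoms returned by `classify/1` are all hypothetical. Because a macro expands to a string literal, the same name can appear both in the list passed to `multire/2` and as a pattern in the `case`, so each regexp is written only once:

```erlang
-module(multire_macros).
-export([classify/1]).

%% Each regexp is defined exactly once. The macros expand to string
%% literals, so they are legal in patterns as well as in expressions.
-define(HELLO_RE, "^Hello").
-define(WORLD_RE, "world$").
-define(FOUR_CHARS_RE, "^....$").

%% Return the first regexp in the list that matches String, or nomatch.
multire([], _String) ->
    nomatch;
multire([RE | RegExps], String) ->
    case re:run(String, RE, [{capture, none}]) of
        match   -> RE;
        nomatch -> multire(RegExps, String)
    end.

%% Dispatch on which regexp matched, using the same macros as patterns.
classify(Foo) ->
    case multire([?HELLO_RE, ?WORLD_RE, ?FOUR_CHARS_RE], Foo) of
        ?HELLO_RE      -> hello;
        ?WORLD_RE      -> world;
        ?FOUR_CHARS_RE -> four_chars;
        nomatch        -> nomatch
    end.
```

The order of the list still decides priority: "Hello world" is classified as `hello` because `?HELLO_RE` is tried first.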
Uppish answered 2/11, 2009 at 15:35 Comment(0)

A possibility could be to use Erlang Web-style annotations (macros) combined with the re Erlang module. An example is probably the best way to illustrate this.

This is what your final code will look like:

[...]
?MATCH({Regexp, Options}).
foo(_Args) ->
  ok.
[...]

The MATCH macro would be executed just before your foo function. The flow of execution will fail if the regexp pattern is not matched.

Your match function will be declared as follows:

?BEFORE.
match({Regexp, Options}, TgtMod, _TgtFun, TgtFunArgs) ->
    %% Look up the string to check in the target function's arguments
    String = proplists:get_value(string, TgtFunArgs),
    case re:run(String, Regexp, Options) of
        nomatch ->
            {error, {TgtMod, match_error, []}};
        {match, _Captured} ->
            {proceed, TgtFunArgs}
    end.

Please note that:

  • ?BEFORE declares that the annotation is executed before your target function (an ?AFTER annotation is also available).
  • match_error is your error handler, defined in your module; it contains the code you want to run when a match fails (possibly nothing, just stopping the execution flow).
  • This approach keeps the regexp syntax and options identical to those of the re module, which avoids confusion.

More information about Erlang Web annotations is available here:

http://wiki.erlang-web.org/Annotations

and here:

http://wiki.erlang-web.org/HowTo/CreateAnnotation

The software is open source, so you might want to reuse their annotation engine.

Enzymolysis answered 2/11, 2009 at 16:16 Comment(0)

You can use the re module:

re:run(String, "^[A-Za-z]+[A-Za-z0-9]*$").
re:run(String, "^[A-Za-z]+[A-Za-z0-9]*$", [caseless]).
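To get closer to pattern matching, you can ask re:run/3 to return the captured groups as lists and then pattern match on that result in a case. This is a sketch; the module `re_capture`, the function `split_ident/1` and its return values are hypothetical:

```erlang
-module(re_capture).
-export([split_ident/1]).

%% Split an identifier such as "abc123" into its leading letters and
%% trailing digits. caseless makes [a-z] match upper-case letters too;
%% {capture, all_but_first, list} returns only the groups, as strings.
split_ident(String) ->
    case re:run(String, "^([a-z]+)([0-9]*)$",
                [caseless, {capture, all_but_first, list}]) of
        {match, [Letters, Digits]} -> {ok, Letters, Digits};
        nomatch                    -> error
    end.
```

The `{match, [Letters, Digits]}` pattern binds the captures directly, which is about as close as standard Erlang gets to a regexp in a clause head.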

EDIT:

match(String, Regexps) ->
  %% Skip regexps until one matches; return it (with its options, if any)
  case lists:dropwhile(
         fun({Regexp, Opts}) -> re:run(String, Regexp, Opts) =:= nomatch;
            (Regexp) -> re:run(String, Regexp) =:= nomatch
         end,
         Regexps) of
    [R|_] -> R;
    []    -> nomatch
  end.

example(String) ->
  %% "RE1", "RE2", "RE3" are placeholders for real patterns
  Regexps = ["^RE1$", {"^RE2$", [caseless]}, "^RE3$"],
  case match(String, Regexps) of
    nomatch -> handle_error();
    Regexp  -> handle_regexp(String, Regexp)
    %% ...
  end.
Neapolitan answered 2/11, 2009 at 11:29 Comment(4)
Yes, the re module does a great job at regular expressions, but AFAIK you cannot call functions while pattern matching or in guards. - Laos
If only I understood what you mean by pattern matching... should Erlang make up a regular expression for you that matches the string, or what? - Neapolitan
I think what he would like is something like an is_match(RegExp, S) BIF for use in guards, so: foo(X) when is_match(RE1, X) -> one_thing(); foo(X) when is_match(RE2, X) -> another_thing(). etc. - Uppish
OK, I added an example of what could be done, if that's the case. - Neapolitan
  1. For strings, you could use the re module and then iterate over the results. AFAIK there is no other way to do it; that is what regexes are for.

  2. For HTTP headers, since there can be many, I would iterate over the result set rather than write one (potentially very long) expression.

  3. EEP work: I don't know.
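For the case-insensitive header matching asked about in the question, one alternative to regexps is to normalize the header name once and then use ordinary pattern matching on lower-case literals. This sketch assumes OTP 20 or later (for string:lowercase/1) and header names given as strings (lists); the module `header_dispatch` and its return values are hypothetical:

```erlang
-module(header_dispatch).
-export([handle_header/2]).

%% Lower-case the header name once, then dispatch with plain patterns.
handle_header(Name, Value) ->
    dispatch(string:lowercase(Name), Value).

dispatch("accept-language", Value) ->
    {accept_language, Value};
dispatch("content-type", Value) ->
    {content_type, Value};
dispatch(Other, _Value) ->
    {unknown, Other}.
```

This keeps the clause heads readable and avoids running a regexp per header.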

Washout answered 2/11, 2009 at 11:35 Comment(0)
  1. Erlang does not handle regular expressions in patterns.
  2. No.
Outthink answered 5/11, 2009 at 0:22 Comment(0)

You can't pattern match on regular expressions, sorry. So you have to do

my_function(String) -> Matches = re:run(String, "^[A-Za-z]+[A-Za-z0-9]*$"),
                       ...
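Filled out into a complete function, the approach is to run the regexp first and then branch on its result. In this sketch the module name `ident_check` and the return values `{ok, String}` and `{error, invalid}` are assumptions:

```erlang
-module(ident_check).
-export([my_function/1]).

%% Run the regexp first, then pattern match on what re:run/3 returned.
%% {capture, none} makes re:run/3 return just match or nomatch.
my_function(String) ->
    case re:run(String, "^[A-Za-z]+[A-Za-z0-9]*$", [{capture, none}]) of
        match   -> {ok, String};
        nomatch -> {error, invalid}
    end.
```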
Churchlike answered 2/11, 2009 at 14:1 Comment(0)
