How do I prioritize two overlapping expressions? (Ragel)

I have two expressions:

ident = alpha . (alnum|[._\-])*;
string = (printable1)+;
  # printable1 includes almost all Windows-1252 characters with glyphs.
main := ( ident % do_ident | string % do_string )
  # The do_* actions have been defined, and generate tokens.

Obviously, any ident is a string. Ragel has priority operators to overcome this. But no matter how I've tried to set the priorities, either some idents execute both actions, or some valid strings are ignored (valid strings with a valid ident as a prefix, for example: ab$).

I have found one way around it, without using priorities:

main := ( ident % do_ident | (string - ident) % do_string )

But if I have more than a few overlapping expressions, this will get cumbersome. Is this the only practical way?

Any help with the correct way to do this would be appreciated.
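For readers who want to see what the subtraction workaround computes, here is a minimal Python model of it (not Ragel code). It assumes the whole token is already isolated, approximates `printable1` with printable non-space ASCII because its real definition is not shown, and uses hypothetical names (`PATTERNS`, `classify`). Subtracting every higher-priority pattern from each lower-priority one amounts to "first pattern in priority order that matches the whole token":

```python
import re

# Stand-in for: ident = alpha . (alnum|[._\-])*
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9._\-]*")
# Assumption: printable1 is approximated by printable, non-space ASCII.
STRING = re.compile(r"[!-~]+")

# Highest priority first; more overlapping patterns would be added here.
PATTERNS = [("ident", IDENT), ("string", STRING)]


def classify(token):
    """Return the name of the first pattern matching the whole token.

    Models ident | (string - ident): a lower-priority pattern is only
    reached when every higher-priority pattern has failed.
    """
    for name, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return name
    return None


print(classify("ab"))   # ident
print(classify("ab$"))  # string
```

In this model the cost of adding a pattern is one list entry, whereas in Ragel each subtraction has to be written out by hand, which is the cumbersome part the question describes.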

Dorkas answered 30/1, 2014 at 21:11 Comment(2)
I have the same question as you. Did you find a good answer, or did you just keep the approach you suggested? Princeling
I ended up going with my method above, subtracting various higher-priority definitions from lower-priority ones. It turned out I only needed to do this for 4 out of 14 patterns. The resulting code is not pretty, but it works. Dorkas

Take a look at section '6.3 Scanners' in the Ragel Guide.

main := |*
    ident  => do_ident;
    string => do_string;
*|;

Note: When using scanners, have ts, te, and act defined in the host language.
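As a rough illustration of what the scanner above decides, here is a small Python model (not Ragel code, and not how Ragel works internally: Ragel compiles the patterns into one state machine, while this sketch tries each regex in turn). It assumes `printable1` is printable non-space ASCII and skips whitespace by hand. The names `RULES` and `scan` are hypothetical; `ts` and `te` only mirror the token-start and token-end variables mentioned in the note. The longest match wins, and on a tie the pattern listed first wins:

```python
import re

# Stand-in for: ident = alpha . (alnum|[._\-])*
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9._\-]*")
# Assumption: printable1 is approximated by printable, non-space ASCII.
STRING = re.compile(r"[!-~]+")

# Same order as the scanner: ident first, then string.
RULES = [("ident", IDENT), ("string", STRING)]


def scan(data):
    """Split data into (kind, text) tokens by longest match."""
    tokens = []
    p = 0
    while p < len(data):
        if data[p].isspace():  # whitespace only separates tokens
            p += 1
            continue
        ts = p                 # token start
        kind, te = None, ts    # best pattern so far and its token end
        for name, pattern in RULES:
            m = pattern.match(data, ts)
            # Strictly longer wins, so ties go to the earlier rule.
            if m and m.end() > te:
                kind, te = name, m.end()
        if kind is None:
            raise ValueError(f"no pattern matches at offset {ts}")
        tokens.append((kind, data[ts:te]))
        p = te
    return tokens


print(scan("ab ab$ x.y-1"))
# [('ident', 'ab'), ('string', 'ab$'), ('ident', 'x.y-1')]
```

Here `ab` is an ident because both patterns match two characters and ident is listed first, while `ab$` is a string because string matches one character more.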

Aristate answered 4/2, 2014 at 12:26 Comment(4)
The problem with Ragel scanners is that they test the input against every pattern to find the longest match. But for quoted strings, I have 4 patterns, 3 of which are proper subsets of the default. Each string will get checked against 4 patterns, and for the 3 special cases that's 2 complete scans of that string. For unquoted strings, there are 6 sub-patterns, with the same problem. I can eliminate some of those, but overlap will remain. I can do it without scanners; it's just a bit more trouble to write. Rather than double the lexing time, I'll just have to spend more coding time. :( Dorkas
Wow, true, a scanner wouldn't work for you. I have no visibility into your larger project, but I'd recommend isolating the overlapping issue in a single machine instantiation. Then just use fhold when you can identify that an issue can exist, fgoto to address the overlap, and fret to jump back into the main machine, like in this example. Aristate
Looking at that example code, that's a lot more programming than my method above of subtracting out patterns. So far, my graph has 54 states and 152 transitions. Ragel is generating it quite well, but I've found that Graphviz is horrible for looking at it. :P Dorkas
For more detail, I'm reading data from a file sort of like JSON, but definitely not. I need ints, quoted ints, floats, dates (in the form 999.1.31), quoted dates, identifiers, quoted identifiers, strings, quoted strings, comments (start with #, run to EOL), and '{', '}', '=' symbols. Whitespace is irrelevant, except that it can break up other tokens. Character set is Windows-1252, with most of it valid for strings and quoted strings. Dorkas

Looks like your issue is that all valid identifiers are also valid strings, and you just want the input to be interpreted as an identifier first if possible. You can force it to accept an identifier first by embedding a priority in the leaving action for ident, which overrides all transitions for string:

main := ( ident %(ident_vs_string, 1) % do_ident | string $(ident_vs_string, 0) % do_string )

This ensures that once the leaving transition for a valid ident is taken, the machine stops exploring the competing string transitions, whether they continue the string or leave it.

Be careful with how this combined expression is terminated. Whatever expression follows the identifier/string must start with a character not permissible in either, so that the exit transitions are well defined.

Kriss answered 2/10, 2014 at 23:7 Comment(0)
