whitespace in flex patterns leads to "unrecognized rule"

The flex info manual allows whitespace in regular expressions through the "x" modifier of the (?r-s:pattern) form. It also gives a simple example (without whitespace):

(?:foo)         same as  (foo)

but the following program fails to compile with the error "unrecognized rule":

BAD (?:foo)
%%
{BAD} {}

I cannot find any form of (? that is acceptable as a rule pattern. Is the manual in error, or do I misunderstand?

Anagram asked 24/10, 2018 at 19:1
Works fine for me. Is it possible that you're reading the info page for a newer version of flex than the one you're using? – Swiercz
Yes and no. In simplifying my example, I did cross versions from 2.5 to 2.6, which might be why it failed. What I've learned since is that the ignore-whitespace option can't be used in a definition (1st section) because it conflicts with the format rules, – Anagram

The example in your question does not seem to reflect the question itself, since it shows neither whitespace nor an x flag. So I'm going to assume that the pattern which is failing for you is something like

BAD      (?x:two | lines | 
             of | words)
%%
{BAD}    { }

And, indeed, that will not work. Although you can use extended format in a pattern, you can only use it in a definition if it doesn't contain a newline. The definition terminates at the last non-whitespace character on the definition line.

Anyway, definitions are overused. You could write the above as

%%
(?x:two | lines |
    of | words )     { }

This saves anyone reading your code from having to search for the definition.

I do understand that you might want to use a very long pattern in a rule, which is awkward, particularly if you want to use it twice. Regardless of the issue with newlines, this tends to run into problems with Flex's definition length limit (2047 characters). My approach has been to break the very long pattern into a series of definitions, and then define another symbol which concatenates the pieces.
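Here is a minimal sketch of that split-and-join approach. It assumes flex's default (non-lex-compatible) mode, in which each {NAME} is wrapped in parentheses when it is expanded, so the pieces can be joined with | safely. The names WORDS_A, WORDS_B and KEYWORD are hypothetical, and the short word lists stand in for a much longer real list. Each piece sits on a single line with no embedded or trailing whitespace, which sidesteps both the newline restriction and the pre-2.6 trailing-whitespace problem.

```lex
%option noyywrap
/* Hypothetical pieces of one long alternation, one line each */
WORDS_A  alpha|beta|gamma|delta
WORDS_B  epsilon|zeta|eta|theta
/* Join the pieces; flex parenthesizes each {NAME} as it expands it */
KEYWORD  {WORDS_A}|{WORDS_B}
%%
{KEYWORD}   { printf("keyword: %s\n", yytext); }
.|\n        { /* ignore everything else */ }
%%
int main(void)
{
    return yylex();
}
```

Built with something like `flex words.l && cc lex.yy.c -o words` (the file name is arbitrary), running `echo "beta zeta" | ./words` should print one line for each keyword. The combined symbol can then be used in as many rules as needed, and each individual definition stays well under the length limit.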

Before v2.6, Flex did not chop whitespace off the end of the definition line, which also led to mysterious "unrecognized rule" errors. The manual seems to still reflect the v2.5 behaviour:

The definition is taken to begin at the first non-whitespace character following the name and continuing to the end of the line.

Batsman answered 24/10, 2018 at 20:40
Thank you. It's a shame the manual isn't clearer about the limitations of definitions. In the actual use case, the pattern is 60 words and referred to in a couple of places amidst other definitions. The least bad answer was not a model of clarity: a (very long) single-line non-whitespace definition. – Anagram
@James: Actually, my memory was playing tricks on me; the manual is actually clear. Changed my answer (but it probably doesn't help you). I knew I'd run into this problem trying to use definitions for very long patterns. – Batsman