Flex (the fast lexical analyzer) does not seem to support regex lookahead assertions

I tried to use the following regex in flex to define an int token:

int    (?<!\w)(([1-9]\d*)|0)(?!\w)

I want the second line below to be invalid:

int a = 123;
int b = 123f; // the '123' here should not be matched as an int

However, I got this:

bad character: <
bad character: !
bad character: \
...

What's more, the ? in the regex seems to have been ignored, which confuses me. Does flex not support lookahead and lookbehind assertions such as (?!xxx) or (?<!xxx)?

I am new to flex and would really appreciate some help.

Littell answered 11/3, 2014 at 12:56 Comment(4)
See stratulat.com/Regular_Expressions_Flex.html for reference. Your regex seems to work; perhaps the problem is in your code. - Fluoridate
Your code snippets look like C, not ActionScript. Are you using the "fast lexical analyzer" or Adobe Flex? - Acidimetry
Your regex works: regexr.com?38gcv Please post your code. - Zincography
I am using the fast lexical analyzer; sorry I did not mention that clearly. - Littell

That's correct. Flex does not support negative lookahead assertions. It also does not support \w or \d, although it does allow POSIX-style character classes ([[:alpha:]], [[:digit:]], [[:alnum:]], etc.).
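As a rough illustration, the Perl-style shorthands can be spelled out with POSIX classes in the definitions section. The names DIGIT and WORD are made up for this sketch; flex does not predefine them, and the empty actions are placeholders.

```lex
DIGIT   [[:digit:]]
WORD    [[:alnum:]_]

%%
{DIGIT}+    { /* roughly what \d+ would match in Perl-style syntax */ }
{WORD}+     { /* roughly what \w+ would match (ASCII letters, digits, underscore) */ }
```

Because both rules match a run of digits at the same length, the first one listed wins for input like 123.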

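Flex regular expressions are quite different from JavaScript-style or Perl/Python-style "regular" expressions. For one thing, flex's regular expressions are really regular in the formal sense: they describe regular languages, which is what lets flex compile them into a finite automaton.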

The complete list of pattern syntax that flex accepts is in the "Patterns" section of the flex manual. Anything not described in that section of the manual is not implemented by flex.

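There is very little point in using "lookbehind" with flex, because flex always matches the longest token starting at the current input point. It does not search the input for a pattern, so whatever precedes the current token has already been consumed as part of an earlier token.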

Flex does implement a limited form of positive lookahead (called trailing context), using the / operator (which is not part of any regular expression library I know of). You could use that to match a sequence of digits only when it is not immediately followed by a letter:

[[:digit:]]+/[^[:alpha:]]

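But you'll then also need a pattern that does match a sequence of digits followed by an alphabetic character, because flex does not search for a matching token. Without one, input such as 123f is not skipped: the rule above matches 12 (the 3 that follows is not a letter), and the remaining 3f falls through to whatever other rules exist.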
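One way this could be put together is sketched below. It is an illustration rather than a definitive scanner: the messages, the set of rules, and the file name scanner.l are all assumptions made for the example. It relies on longest-match instead of the / operator. Once a rule exists that swallows digits followed by letters, that rule wins on length for 123f, and the plain integer rule wins ties for 123 because it is listed first.

```lex
%option noyywrap

%{
#include <stdio.h>
/* Hypothetical demo scanner: prints one line per token it recognizes. */
%}

%%
0|[1-9][[:digit:]]*             { printf("INT: %s\n", yytext); }
[[:digit:]]+[[:alnum:]_]*       { printf("malformed number: %s\n", yytext); }
[[:alpha:]_][[:alnum:]_]*       { printf("IDENT: %s\n", yytext); }
[[:space:]]+                    { /* skip whitespace */ }
.                               { printf("CHAR: %s\n", yytext); }
%%

int main(void)
{
    yylex();    /* reads stdin until end of input */
    return 0;
}
```

Assuming the file is saved as scanner.l, it can be built with `flex scanner.l` followed by `cc lex.yy.c -o scanner`. Fed the two lines from the question, it reports 123 as an integer and 123f as a malformed number. A number with a leading zero such as 007 also lands in the malformed rule, which mirrors the (([1-9]\d*)|0) intent of the original regex.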

Omniscient answered 14/3, 2014 at 19:8 Comment(2)
Thanks a lot. So can I take it that flex leaves out lookahead assertions simply because they make the matcher much more complicated, or are there other reasons? - Littell
@sstt: In general, lookahead assertions complicate the scanner, but flex does support positive lookahead (with some restrictions), and any regular expression can be negated into another regular expression. Any non-trivial lookahead assertion requires scanning input characters twice, so such assertions should be avoided if you want a fast scanner. In any event, they are rarely needed in lexical analysis. For really complex lexical analysis, use start conditions to build a state machine. - Omniscient
