Lex/Flex: Regular expression for string literals in C/C++?
I have been looking at the ANSI C grammar.

This page includes a lot of regular expressions in Lex/Flex for ANSI C.

I am having trouble understanding the regular expression for string literals.

The page gives the regular expression as \"(\\.|[^\\"])*\"

As I understand it, \" stands for a double quote, \\ is the escape character, . is any character except the escape character, and * means zero or more times.

[^\\"] means any character except \ and ".

So, in my opinion, the regular expression should be \"(\\.)*\".

Can you give some strings for which my regular expression fails?

Or, put differently: why did they use [^\\"]?

Petes answered 31/1, 2015 at 3:50 Comment(1)
If you have " as the delimiter, then \\. would match a literal dot, not a backslash and a dot. Ado

The regex \"(\\.)*\" that you proposed matches only strings whose contents consist entirely of a \ followed by some character, repeated any number of times, like:

"\z\x\p\r"

This regular expression would therefore fail to match a string like:

"hello"
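
This is easy to check outside of Lex. The sketch below uses Python's re module as a stand-in, on the assumption that these particular patterns behave the same way there as in Flex; the names proposed and accepts are made up for the illustration.

```python
import re

# The pattern proposed in the question: between the quotes, only
# backslash-plus-one-character pairs are allowed.
proposed = re.compile(r'\"(\\.)*\"')

def accepts(pattern, text):
    """Return True if the pattern matches the whole of text."""
    return pattern.fullmatch(text) is not None

print(accepts(proposed, r'"\z\x\p\r"'))  # contents are escape pairs only
print(accepts(proposed, '"hello"'))      # contents are ordinary characters
```

The first call reports a match and the second does not, because after the opening quote the pattern can only continue with a backslash or the closing quote.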

The string "hello" would be matched by the regex \".*\", but that regex would also match """" or "\", neither of which is a valid string literal.

To get rid of these invalid matches we can use \"[^\\"]*\", which forbids backslashes and double quotes between the delimiters. But this now fails to match a string like "\a\a\a", which is a valid string literal.

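As we saw, \"(\\.)*\" does match this string, so all we need to do is combine the two with alternation to get \"(\\.|[^\\"])*\". Each repetition then consumes either an escape sequence (a backslash plus the character after it) or a single character that is neither a backslash nor a double quote.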
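
To see the four candidates side by side, here is a small sketch, again using Python's re module rather than Flex itself. The dictionary keys, the sample list (including the extra "say \"hi\"" case) and the helper accepts are assumptions made for the illustration. It also asks whether the whole input is one literal, whereas a real scanner would take the longest matching prefix.

```python
import re

# The four candidate patterns discussed above, keyed by illustrative names.
PATTERNS = {
    "escapes_only": re.compile(r'\"(\\.)*\"'),
    "anything":     re.compile(r'\".*\"'),
    "no_specials":  re.compile(r'\"[^\\"]*\"'),
    "combined":     re.compile(r'\"(\\.|[^\\"])*\"'),
}

# Candidate inputs, written exactly as they would appear in C source.
SAMPLES = [
    '"hello"',
    '""""',
    r'"\"',
    r'"\a\a\a"',
    r'"say \"hi\""',
]

def accepts(name, text):
    """Return True if the named pattern matches the whole of text."""
    return PATTERNS[name].fullmatch(text) is not None

# Print one row per sample, one yes/no column per pattern.
for text in SAMPLES:
    row = ", ".join(
        f"{name}={'yes' if accepts(name, text) else 'no'}" for name in PATTERNS
    )
    print(f"{text:<16} {row}")
```

Of the four, only the combined pattern accepts "hello", "\a\a\a" and "say \"hi\"" while rejecting """" and "\".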

Loyal answered 31/1, 2015 at 4:46 Comment(0)