I'm writing a lexer as part of a university course. One of the brain teasers (extra assignments that don't count toward the grade) our professor gave us is how we could implement comments inside string literals.
Our string literals start and end with an exclamation mark, e.g. !this is a string literal!
Our comments start and end with three periods, e.g. ...This is a comment...
Removing comments from string literals was relatively straightforward: match the string literal via /!.*!/
and remove the comment via regex. If there are three or more consecutive periods but no closing periods, throw an error.
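For reference, here is a minimal sketch of that simple approach outside the lexer. The names COMMENT and strip_comments are made up for illustration, and the lazy `*?` quantifier is an assumption on my part so that two comments in one literal are removed separately:

```python
import re

# A comment is three periods, anything on the same line, then three periods.
# COMMENT and strip_comments are hypothetical names used only in this sketch.
COMMENT = re.compile(r'\.\.\.[^\r\n]*?\.\.\.')

def strip_comments(literal):
    """Remove ...comment... spans from an already matched string literal."""
    stripped = COMMENT.sub("", literal)
    # Three periods left over means a comment was opened but never closed.
    if "..." in stripped:
        raise ValueError("unterminated comment in string literal")
    return stripped

match = re.match(r'!.*!', '!String with a comment ... comment ...!')
print(strip_comments(match.group(0)))
# → !String with a comment !
```

This works only as long as the literal contains no exclamation marks of its own, since /!.*!/ knows nothing about escapes.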
However, I want to take this even further. I want to implement the escaping of the exclamation mark within the string literal. Unfortunately, I can't seem to get both comments and exclamation mark escapes working together.
What I want to create are string literals that can contain both comments and exclamation mark escapes. How could this be done?
Examples:
!Normal string!
!String with escaped \! exclamation mark!
!String with a comment ... comment ...!
!String \! with both ... comments can have unescaped exclamation marks!!!... !
This is my current code that can't ignore exclamation marks inside comments:
def t_STRING_LITERAL(t):
    r'![^!\\]*(?:\\.[^!\\]*)*!'
    # remove the escape characters from the string
    t.value = re.sub(r'\\!', "!", t.value)
    # remove single line comments
    t.value = re.sub(r'\.\.\.[^\r\n]*\.\.\.', "", t.value)
    return t
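The problem can be reproduced without the lexer by running the same pattern through plain `re` on the last example (a small sketch; STRING_LITERAL, source and first are just names picked for this snippet, and I'm assuming the pattern matches the same way inside PLY):

```python
import re

# Same pattern as the t_STRING_LITERAL docstring above.
STRING_LITERAL = re.compile(r'![^!\\]*(?:\\.[^!\\]*)*!')

source = r'!String \! with both ... comments can have unescaped exclamation marks!!!... !'
first = STRING_LITERAL.match(source)
print(first.group(0))
# → !String \! with both ... comments can have unescaped exclamation marks!
```

The escaped \! is skipped correctly, but the token ends at the first unescaped exclamation mark inside the comment, so the comment never gets its closing periods and the rest of the line is left over.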