Detecting and skipping line comments with Flex
How can I detect single-line comments such as // in Flex and skip them?

Also, for /* comments, will the following snippet be enough?

   "/*"         { comment(); }
  %%

    comment()
    {
        char c, c1;
    loop:
        while ((c = input()) != '*' && c != 0)
            putchar(c);

        if ((c1 = input()) != '/' && c != 0)
        {
            unput(c1);
            goto loop;
        }

        if (c != 0)
            putchar(c1);
    }
Greenshank answered 20/8, 2014 at 1:2 Comment(0)
30

Why don't you just use regular expressions to recognize the comments? The whole point of lex/flex is to save you from having to write lexical scanners by hand. The code you present should work (if you put the pattern /* at the beginning of the line), but it's a bit ugly, and it is not obvious that it will work.

Your question says that you want to skip comments, but the code you provide uses putchar() to print the comment, except for the /* at the beginning. Which is it that you want to do? If you want to echo the comments, you can use an ECHO action instead of doing nothing.

Here are the regular expressions:

Single line comment

This one is easy because in lex/flex, . won't match a newline. So the following will match from // to the end of the line, and then do nothing.

"//".*                                    { /* DO NOTHING */ }
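To see what that pattern consumes, here is a rough C sketch of the same match done by hand, outside of flex. The name line_comment_length is made up for this illustration. It only mirrors "//".* (everything up to, but not including, the newline) and says nothing about how flex implements the match internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of flex: returns how many characters
   the pattern "//".* would match at the start of s, or 0 if s does
   not begin with "//". The newline itself is never included, just as
   '.' never matches a newline in flex. */
size_t line_comment_length(const char *s)
{
    assert(s != NULL);

    if (s[0] != '/' || s[1] != '/')
        return 0;

    size_t n = 2;                       /* the "//" itself */
    while (s[n] != '\0' && s[n] != '\n')
        n++;                            /* the ".*" part */
    return n;
}
```

Because the newline is left in the input, whatever rule normally handles \n still sees it, which is convenient if that rule is where lines are counted.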

Multiline comment

This is a bit trickier, and the fact that * is a regular expression character as well as a key part of the comment marker makes the following regex a bit hard to read. I use [*] as a pattern which recognizes the character *; in flex/lex, you can use "*" instead. Use whichever you find more readable. Essentially, the regular expression matches sequences of characters ending with a (string of) * until it finds one where the next character is a /. In other words, it has the same logic as your C code.

[/][*][^*]*[*]+([^*/][^*]*[*]+)*[/]       { /* DO NOTHING */ }

The above requires the terminating */; an unterminated comment will force the lexer to back up to the beginning of the comment and accept some other token, usually a / division operator. That's likely not what you want, but it's not easy to recover from an unterminated comment since there's no really good way to know where the comment should have ended. Consequently, I recommend adding an error rule:

[/][*][^*]*[*]+([^*/][^*]*[*]+)*[/]       { /* DO NOTHING */ }
[/][*]                                    { fatal_error("Unterminated comment"); }
Hydrolysis answered 20/8, 2014 at 4:12 Comment(9)
Your solution for multiline comments is simple and great, but I still have a problem: I need to increment a line counter for each commented line, so I need to account for the \n inside comments. Is that possible while keeping the main idea of your regex solution? Thanks - Greenshank
flex will count lines for you (with %option yylineno). See yylineno - Hydrolysis
Hmm, that's strange. Flex is returning wrong line numbers for my input: it counts only the line with the opening comment symbol - Greenshank
I have just created another question to discuss this point: #25416937 - Greenshank
The single line comment solution doesn't appear to handle line splicing. - Blanchette
@DanO: Guilty as charged. Handling line-splicing inline in (f)lex patterns is possible, and it will become less of a pain once trigraphs really vanish. But it's still ugly and IMHO doesn't really contribute to learning (f)lex. In most applications, the overhead of doing phase 2 as a separate phase is not noticeable, and the code is much simpler. You can do it in YY_INPUT, for example, where the entire input is copied into the flex buffer anyway. I'd be happy to write that up if anyone ever asked about implementing line splicing in (f)lex. - Hydrolysis
@Hydrolysis is "//"(\\\n|\\\r|\\\r\n|[^\r\n])* not correct for the line splicing case? I'm not sure if this inhibits learning c grammar or flex or both or neither :) - Blanchette
@DanO: Not correct. The line splice could happen between the initial //. Also, if you support trigraphs, the line splice can be spelt ??/<newline>. - Hydrolysis
(Some) real-world compilers do accept \r but that's an extension. Other real-world compilers (gcc & clang, for example) allow trailing whitespace after the backslash, which does save a certain amount of grief. So it's not a simple question. - Hydrolysis
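For anyone curious what "doing phase 2 as a separate phase" could look like, here is a minimal sketch of a splicing pre-pass in plain C. The function name splice_lines is invented for this example, and it is deliberately simplistic: it handles only a backslash immediately followed by \n, ignoring trigraphs, \r\n line endings, and trailing whitespace after the backslash, all of which the comments above point out as complications. How to hook it into YY_INPUT is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pre-pass: removes every backslash-newline pair from
   the NUL-terminated buffer in place and returns the new length.
   Assumption: plain "\\\n" splices only (no trigraphs, no "\r\n"). */
size_t splice_lines(char *buf)
{
    assert(buf != NULL);

    char *out = buf;
    const char *in = buf;
    while (*in != '\0') {
        if (in[0] == '\\' && in[1] == '\n') {
            in += 2;            /* drop the splice entirely */
            continue;
        }
        *out++ = *in++;         /* keep everything else */
    }
    *out = '\0';
    return (size_t)(out - buf);
}
```

After this pass, a comment opener that was split across two lines by a backslash becomes an ordinary // again, so the simple "//".* rule applies unchanged. The cost is that line numbers in the spliced text no longer match the original file unless they are tracked separately.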
2

For //, you can read until you find the end of the line (\n) or EOF, in case the comment is at the end of the file. For example:

static void
skip_single_line_comment(void)
{
  int c;

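  /* Read until we find \n or the end of input.
     Some flex versions report end of input as 0 rather than EOF. */
  while((c = input()) != '\n' && c != EOF && c != 0)
    ;

  /* Nothing to push back: unput() expects a real character,
     and the end-of-input marker is not one. */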
}

As for multi-line comments /* */, you can read until you see a *, then peek at the next character: if it is a /, the comment has ended; otherwise, keep skipping. Reaching EOF first means the comment was never closed:

static void
skip_multiple_line_comment(void)
{
  int c;

  for(;;)
  {
    switch(input())
    {
      /* End of input before the closing delimiter: the comment is
         unclosed. Some flex versions report end of input as 0. */
      case EOF:
      case 0:
        fprintf(stderr, "Error: unclosed comment, expected */\n");
        exit(EXIT_FAILURE);
      /* Is it the end of comment? */
      case '*':
        if((c = input()) == '/')
          return;
        /* Not the end: push the character back so that a second '*'
           can still start the closing delimiter. */
        if(c != EOF && c != 0)
          unput(c);
        break;
      default:
        /* skip this character */
        break;
    }
  }
}

Complete file:

%{
#include <stdio.h>
#include <stdlib.h>   /* exit() */

static void skip_single_line_comment(void);
static void skip_multiple_line_comment(void);

%}

%option noyywrap

%%
"//"              { puts("short comment was skipped ");
                    skip_single_line_comment();}

"/*"              { puts("long comment begins ");
                    skip_multiple_line_comment();
                    puts("long comment ends");}

" "               { /* empty */ }
[\r\n\t]          { /* empty */ }
.                 { fprintf(stderr, "Tokenizing error: '%c'\n", *yytext);
                    yyterminate(); }
%%

static void
skip_single_line_comment(void)
{
  int c;

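  /* Read until we find \n or the end of input.
     Some flex versions report end of input as 0 rather than EOF. */
  while((c = input()) != '\n' && c != EOF && c != 0)
    ;

  /* Nothing to push back: unput() expects a real character,
     and the end-of-input marker is not one. */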
}

static void
skip_multiple_line_comment(void)
{
  int c;

  for(;;)
  {
    switch(input())
    {
      /* End of input before the closing delimiter: the comment is
         unclosed. Some flex versions report end of input as 0. */
      case EOF:
      case 0:
        fprintf(stderr, "Error: unclosed comment, expected */\n");
        exit(EXIT_FAILURE);
      /* Is it the end of comment? */
      case '*':
        if((c = input()) == '/')
          return;
        /* Not the end: push the character back so that a second '*'
           can still start the closing delimiter. */
        if(c != EOF && c != 0)
          unput(c);
        break;
      default:
        /* skip this character */
        break;
    }
  }
}

int main(int argc, char **argv)
{
  yylex();
  return 0;
}
Apropos answered 20/8, 2014 at 2:14 Comment(0)
-1

To detect single-line comments:

^"//"    printf("This is a comment line\n");

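This says that any line which starts with // will be considered a comment line. Note that the rule itself matches only the // at the start of a line (because of the ^ anchor); the rest of the line is left for other rules to consume, and a // that follows code on the same line is not detected.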

To detect multi-line comments:

^"/*"[^*]*|[*]*"*/" printf("This is a Multiline Comment\n");

Explanation:

^"/*" This says beginning should be /*.

[^*]* includes all characters including \n but excludes *.

[*]* says 0 or more number of stars.

[^*]|[*]* - "or" operator is applied to get any string.

"*/" specifies */ as end.

This will work perfectly in lex.

Below is the complete code of lex file :

%{
#include <stdio.h>
int v=0;
%}
%%
^"//"    printf("This is a comment line\n");
^"/*"[^*]*|[*]*"*/" printf("This is a Multiline Comment\n");
.|\n {}
%%
int yywrap()
{
    return 1;
}
int main(void)
{
    yylex();
    return 0;
}
Catatonia answered 3/1, 2017 at 6:48 Comment(1)
Multiline comment here clearly doesn't work. Consider: /* foo bar* */ - Citrin
