Removing nested comments with lex

How can I write a lex (or flex) program that removes nested comments from text and prints only the text that is not inside comments? I probably need to track whether I am inside a comment and count how many block-comment opening "tags" are currently open.

Let's use these rules:
1. Block comment

/*
block comment
*/

2. line comment

// line comment

3. Comments can be nested.

Example 1

show /* comment /* comment */ comment */ show

output:

show  show

Example 2

show /* // comment
comment
*/
show

output:

show 
show 

Example 3

show
///* comment
comment
// /*
comment
//*/ comment
//
comment */
show

output:

show
show
Laubin answered 17/10, 2012 at 20:56 Comment(0)

You got the theory right. Here's a simple implementation; could be improved.

%x COMMENT
%%
%{
   int comment_nesting = 0;
%}

"/*"            BEGIN(COMMENT); ++comment_nesting;
"//".*          /* // comments to end of line */

<COMMENT>[^*/]* /* Eat non-comment delimiters */
<COMMENT>"/*"   ++comment_nesting;
<COMMENT>"*/"   if (--comment_nesting == 0) BEGIN(INITIAL);
<COMMENT>[*/]   /* Eat a / or * if it doesn't match comment sequence */

  /* Could have been .|\n ECHO, but this is more efficient. */
([^/]*([/][^/*])*)* ECHO;  
%%
Boustrophedon answered 18/10, 2012 at 7:34 Comment(0)

This is exactly what you need: yy_push_state(COMMENT). It uses a stack to store the start conditions, which comes in handy for nested constructs. In flex, the stack functions are only available with %option stack.
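
A minimal sketch of that idea, assuming flex (yy_push_state and yy_pop_state are flex extensions, not part of POSIX lex). The depth of the start-condition stack takes the place of a nesting counter. The main function, and the choice to recognize line comments only outside block comments, are assumptions made for illustration:

```lex
%option noyywrap stack
%x COMMENT
%%
"/*"              yy_push_state(COMMENT);   /* enter the outermost comment */
"//".*            /* drop a line comment up to, not including, the newline */

<COMMENT>"/*"     yy_push_state(COMMENT);   /* one more nesting level */
<COMMENT>"*/"     yy_pop_state();           /* back to COMMENT or to INITIAL */
<COMMENT>(.|\n)   /* discard everything else inside a comment */
%%
int main(void)
{
    /* Text not matched by any rule is echoed by flex's default rule. */
    yylex();
    return 0;
}
```

With the spec saved under a hypothetical name such as nested.l, it should build with something like "flex nested.l && cc lex.yy.c -o strip". For Example 1 from the question, each "/*" pushes COMMENT and each "*/" pops it, so the scanner is back in INITIAL only after the second "*/" and prints "show  show". Some compilers may warn that the generated yy_top_state function is unused.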

Yovonnda answered 9/3, 2013 at 16:59 Comment(0)

I am afraid that @rici's answer might be wrong. First, we need to record the line number, and we might need to change the file's line directive later. Second, we are given an open_sign and a close_sign. We have the following principles:

1) Use an integer for stack control: push on an open sign, pop on a close sign.
2) Eat up the characters that come before EOF or a close sign and contain no open sign:
<comments>{open} {no_open_sign++;}
<comments>\n {curr_lineno++;}
<comments>. /* eat any other character by doing nothing; {open} and {close} win because they match more text */
3) Errors can happen when no_open_sign drops to zero, hence
<comments>{close}  /* similar to the post above */
4) EOF should not occur inside the comment, hence you need a rule
<comments><<EOF>> {return ERROR_TOKEN;}

To make it more robust, you also need another close-sign checking rule outside of the comment state.

In practice, you should also use negative lookahead and lookbehind regular-expression syntax if your lexical analyzer supports it.
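
Put together, the principles above might look like the following flex sketch. The names comments, open, close, no_open_sign, curr_lineno and ERROR_TOKEN come from this answer; the value of ERROR_TOKEN, the error messages, and the main function are assumptions, and line comments are left out to keep it short:

```lex
%option noyywrap
%{
#include <stdio.h>
#define ERROR_TOKEN 256        /* assumed value; any nonzero code works here */
static int no_open_sign = 0;   /* current nesting depth */
static int curr_lineno  = 1;   /* line currently being scanned */
%}
open    "/*"
close   "*/"
%x comments
%%
{open}              { no_open_sign = 1; BEGIN(comments); }
{close}             {
                        /* close sign with no matching open sign */
                        fprintf(stderr, "line %d: unexpected close sign\n", curr_lineno);
                        return ERROR_TOKEN;
                    }
\n                  { curr_lineno++; ECHO; }

<comments>{open}    { no_open_sign++; }
<comments>{close}   { if (--no_open_sign == 0) BEGIN(INITIAL); }
<comments>\n        { curr_lineno++; }
<comments>.         { /* eat any other character */ }
<comments><<EOF>>   {
                        fprintf(stderr, "line %d: end of file inside comment\n", curr_lineno);
                        return ERROR_TOKEN;
                    }
%%
int main(void)
{
    /* yylex() returns 0 at a normal end of input. */
    return yylex() == ERROR_TOKEN ? 1 : 0;
}
```

Counting newlines both inside and outside the comment state keeps curr_lineno accurate even though comment text is discarded, which is what makes the two error messages useful.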

Subclavius answered 6/10, 2016 at 5:41 Comment(0)