How do I make a nom whitespace parser that also skips line-oriented comments?

I'm writing a parser for a text-based format in nom 4.2.2, and I'm using the whitespace facility to skip whitespace. I have to use a custom parser because this format treats some unusual characters as whitespace. Following the example on that page, I've made one using eat_separator.

How do I efficiently extend my space parser to also consume line comments from # to end-of-line? These comments can appear anywhere except within strings. I always want to throw away the contents of the comment: there's nothing like pre-processor directives.

Kuopio answered 5/3, 2019 at 18:54

That's a tricky issue; I had it as well when writing a Python parser.

Here is how I ended up implementing "line break optionally preceded by a comment":

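// One or more line breaks, each optionally preceded by trailing
// spaces and a `#` comment running up to the '\n'.
named!(pub newline<StrSpan, ()>,
  map!(
    many1!(
      tuple!(
        spaces_nonl,
        opt!(preceded!(char!('#'), many0!(none_of!("\n")))),
        char!('\n')
      )
    ),
    |_| ()
  )
);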

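// Whitespace that may span lines (and therefore comments).
// `escaped_newline` is not defined in this snippet; see the linked source.
named!(pub spaces_nl<StrSpan, ()>,
  map!(many0!(alt!(one_of!(" \t\x0c") => { |_| () } | escaped_newline | newline)), |_| ())
);
// Whitespace that stays on the current line.
named!(pub spaces_nonl<StrSpan, ()>,
  map!(many0!(alt!(one_of!(" \t\x0c") => { |_| () } | escaped_newline)), |_| ())
);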
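
To make it easier to see what spaces_nl consumes, here is a minimal sketch of the same idea in plain Rust, with no nom involved. The function name skip_ws_and_comments is hypothetical and not part of nom or of my parser. It is also a simplification: it leaves out escaped_newline, and unlike the newline parser above it accepts a comment that ends at end-of-input without a trailing '\n'.

```rust
/// Hypothetical helper (not part of nom): skips spaces, tabs, form feeds,
/// newlines, and `#` comments that run to the end of the line.
/// Returns whatever input is left after the skipped prefix.
fn skip_ws_and_comments(input: &str) -> &str {
    let mut rest = input;
    loop {
        // Drop plain whitespace first.
        let trimmed =
            rest.trim_start_matches(|c: char| matches!(c, ' ' | '\t' | '\x0c' | '\n'));
        match trimmed.strip_prefix('#') {
            // A comment: discard everything up to the next newline.
            // The newline itself is consumed on the next loop iteration.
            Some(after_hash) => {
                rest = match after_hash.find('\n') {
                    Some(idx) => &after_hash[idx..],
                    None => "", // comment runs to end of input
                };
            }
            // Neither whitespace nor a comment: we are done.
            None => return trimmed,
        }
    }
}
```

For example, skip_ws_and_comments("  # c\n  foo") returns "foo". Because a skipper like this only runs between tokens, a '#' inside a string literal is never seen by it as long as the string parser consumes the whole literal itself.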

You can then write a variant of ws! that uses this new parser. I copy-pasted the macro from nom and replaced the separator argument passed to sep! (and the trailing separator call) with spaces_nl:

/// Like `ws!()`, but ignores comments as well
macro_rules! ws_comm (
  ($i:expr, $($args:tt)*) => (
    {
      use nom::Convert;
      use nom::Err;

      match sep!($i, spaces_nl, $($args)*) {
        Err(e) => Err(e),
        Ok((i1,o))    => {
          match spaces_nl(i1) {
            Err(e) => Err(Err::convert(e)),
            Ok((i2,_))    => Ok((i2, o))
          }
        }
      }
    }
  )
);

Related code, in case you are curious: https://github.com/ProgVal/rust-python-parser/blob/1e03122f030e183096d7d3271907106678036f56/src/helpers.rs

Zahavi answered 5/3, 2019 at 19:30
