How to parse a symmetric quoted string using nom in Rust?

How should I parse a quoted string similar to Rust's raw strings using nom? I want to parse the following:

"A standard string"
#"A string containing ["] a quote"#
##"A string containing ["#] a quote and hash "##

How would I do this, requiring an equal number of '#' symbols at the start and end, while allowing #'ed strings to contain unescaped quotes and hashes?

Classieclassification answered 24/5, 2020 at 4:32

This would be my approach (using nom-5.1.1):

extern crate nom;

use nom::{
  IResult,
  multi::{count, fold_many0, many_till},
  bytes::complete::{tag, take},
  sequence::pair
};

fn quoted_str(input: &str) -> IResult<&str, &str> {

  // Count number of leading #
  let (remaining, hash_count) = fold_many0(tag("#"), 0, |acc, _| acc + 1)(input)?;

  // Match "
  let (remaining, _) = tag("\"")(remaining)?;

  // Take until closing " plus # (repeated hash_count times)
  let closing = pair(tag("\""), count(tag("#"), hash_count));
  let (remaining, (inner, _)) = many_till(take(1u32), closing)(remaining)?;

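  // Extract inner range. many_till yields one &str per character, so sum the
  // byte lengths instead of counting elements (which breaks on non-ASCII input).
  let offset = hash_count + 1;
  let length: usize = inner.iter().map(|s| s.len()).sum();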

  Ok((remaining, &input[offset .. offset + length]))
}

#[test]
fn run_test() {
  assert_eq!(quoted_str("\"ABC\""), Ok(("", "ABC")));
  assert_eq!(quoted_str("#\"ABC\"#"), Ok(("", "ABC")));
  assert_eq!(quoted_str("##\"ABC\"##"), Ok(("", "ABC")));
  assert_eq!(quoted_str("###\"ABC\"###"), Ok(("", "ABC")));

  assert_eq!(quoted_str("#\"ABC\"XYZ\"#"), Ok(("", "ABC\"XYZ")));
  assert_eq!(quoted_str("#\"ABC\"#XYZ\"#"), Ok(("XYZ\"#", "ABC")));
  assert_eq!(quoted_str("#\"ABC\"##XYZ\"#"), Ok(("#XYZ\"#", "ABC")));

  assert_eq!(quoted_str("##\"ABC\"XYZ\"##"), Ok(("", "ABC\"XYZ")));
  assert_eq!(quoted_str("##\"ABC\"#XYZ\"##"), Ok(("", "ABC\"#XYZ")));
  assert_eq!(quoted_str("##\"ABC\"##XYZ\"##"), Ok(("XYZ\"##", "ABC")));
  assert_eq!(quoted_str("##\"ABC\"###XYZ\"##"), Ok(("#XYZ\"##", "ABC")));

  assert_eq!(quoted_str("\"ABC\"XYZ"), Ok(("XYZ", "ABC")));
  assert_eq!(quoted_str("#\"ABC\"#XYZ"), Ok(("XYZ", "ABC")));
  assert_eq!(quoted_str("##\"ABC\"##XYZ"), Ok(("XYZ", "ABC")));
}

If performance is important to you, the implicit vector allocation in many_till could be avoided by writing a fold_many_till function based on the code for fold_many0 and many_till. It seems nom does not currently provide such a function.
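
Another way to sidestep the allocation, without writing a new combinator, is to scan the input by hand. The following is only a minimal sketch using the standard library, not nom's API: the function name quoted_str_no_alloc and the Option return type are assumptions made for illustration. It returns (remaining, inner) in the same order as nom's IResult tuple, so it could presumably be wrapped to compose with other nom parsers.

```rust
/// Hypothetical allocation-free variant (std only, not part of nom).
/// Returns `Some((remaining, inner))` on success, `None` otherwise.
fn quoted_str_no_alloc(input: &str) -> Option<(&str, &str)> {
    // Count leading '#' characters ('#' is a single byte in UTF-8).
    let hash_count = input.bytes().take_while(|&b| b == b'#').count();

    // The opening quote must directly follow the hashes.
    let body = input[hash_count..].strip_prefix('"')?;

    // Find the first '"' that is followed by at least `hash_count` hashes.
    for (idx, _) in body.match_indices('"') {
        let after_quote = &body[idx + 1..];
        let trailing = after_quote.bytes().take_while(|&b| b == b'#').count();
        if trailing >= hash_count {
            // Consume exactly `hash_count` hashes; extra ones stay in the remainder.
            return Some((&after_quote[hash_count..], &body[..idx]));
        }
    }

    // No matching closing delimiter was found.
    None
}
```

Because it works with byte offsets reported by match_indices, this sketch slices on valid character boundaries even when the quoted text contains non-ASCII characters. For example, quoted_str_no_alloc("##\"ABC\"#XYZ\"##") should yield Some(("", "ABC\"#XYZ")).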

Pythian answered 24/5, 2020 at 16:59