How do I create a streaming parser in nom?
I've created a few non-trivial parsers in nom, so I'm pretty familiar with it at this point. All the parsers I've created so far are given the entire input slice up front.

I'd like to create a streaming parser, which I assume means that I can continue to feed bytes into the parser until it is complete. I've had a hard time finding any documentation or examples that illustrate this, and I also question my assumption of what a "streaming parser" is.

My questions are:

  • Is my understanding of what a streaming parser is correct?
  • If so, are there any good examples of a parser using this technique?
Guesstimate answered 22/10, 2017 at 17:6 Comment(1)
You can find an accurate answer to the question of what a streaming/online parser is, as opposed to an offline parser, here. – Mysticism
nom parsers neither maintain a buffer to feed more data into, nor do they maintain "state" where they previously needed more bytes.

But if you take a look at the IResult structure you see that you can return a partial result or indicate that you need more data.

There seem to be some structures provided to handle streaming: I think you are supposed to create a Consumer from a parser using the consumer_from_parser! macro, implement a Producer for your data source, and call run until it returns None (and start again when you have more data). Examples and docs seem to be mostly missing so far - see bottom of https://github.com/Geal/nom :)

Also it looks like most functions and macros in nom are not documented well (or at all) regarding their behavior when hitting the end of the input. For example, take_until! returns Incomplete if the input isn't long enough to contain the substring it looks for, but returns an error if the input is long enough yet doesn't contain the substring.

Also nom mostly uses either &[u8] or &str for input; you can't signal an actual "end of stream" through these types. You could implement your own input type (related traits: nom::{AsBytes,Compare,FindSubstring,FindToken,InputIter,InputLength,InputTake,Offset,ParseTo,Slice}) to add a "reached end of stream" flag, but the nom provided macros and functions won't be able to interpret it.

All in all I'd recommend splitting streamed input through some other means into chunks you can handle with simple non-streaming parsers (maybe even use synom instead of nom).
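That recommendation can be sketched without nom at all: frame the stream on a cheap delimiter first, then run an ordinary non-streaming parser on each complete chunk. The key=value split below is a hypothetical stand-in for any such per-chunk parser:

```rust
use std::io::{BufRead, BufReader, Cursor, Read};

// Split the incoming stream into lines, then hand each complete line
// to a plain, non-streaming parser (here: a trivial `key=value` split).
fn parse_stream<R: Read>(stream: R) -> Vec<(String, String)> {
    BufReader::new(stream)
        .lines()
        .filter_map(|line| {
            let line = line.ok()?;
            let (k, v) = line.split_once('=')?;
            Some((k.to_string(), v.to_string()))
        })
        .collect()
}

fn main() {
    let input = Cursor::new("a=1\nb=2\n");
    assert_eq!(
        parse_stream(input),
        vec![("a".into(), "1".into()), ("b".into(), "2".into())]
    );
}
```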

Gasolier answered 22/10, 2017 at 21:15 Comment(1)
Is there a way to do this in the newest nom version? – Ritualist
As far as I understand, nom's architecture has changed since this question was originally asked (which is why the accepted answer didn't work for me).

I was struggling with the same question, and the more I found out about it, the more I realised it was not that easy and straightforward.

I wrote a blog post about my findings. In short, it comes down to the following steps:

  • Create something that streams in the data (e.g. an iterator that serves bytes as they are read from a file).
  • Decide what it is that you want to stream out (usually something like "log lines" or "video frames", etc.).
  • Build an iterator that outputs these things. In the iterator struct, keep track of the unparsed data. In the next() function, first try the parser on the current unparsed data. If it returns Err(Err::Incomplete(_)), read more data into the buffer and try again; once the parser succeeds, yield the parsed object from the iterator.
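The steps above can be sketched with stdlib types only, modeling nom's Err(Err::Incomplete(_)) with a small hypothetical enum so the buffering logic stands out:

```rust
use std::io::Read;

// Minimal stand-in for nom's result type: a streaming parser either
// succeeds (remaining input + value), needs more data, or fails.
#[allow(dead_code)]
enum Parse<'a> {
    Done(&'a str, String),
    Incomplete,
    Error,
}

// Hypothetical parser: one newline-terminated line. With nom, the
// `Incomplete` case would be `Err(Err::Incomplete(_))`.
fn parse_line(input: &str) -> Parse<'_> {
    match input.find('\n') {
        Some(i) => Parse::Done(&input[i + 1..], input[..i].to_string()),
        None => Parse::Incomplete, // no newline yet: need more data
    }
}

// Iterator that owns the unparsed tail and refills it on Incomplete.
struct Lines<R: Read> {
    reader: R,
    buf: String,
}

impl<R: Read> Iterator for Lines<R> {
    type Item = String;
    fn next(&mut self) -> Option<String> {
        loop {
            match parse_line(&self.buf) {
                Parse::Done(rest, line) => {
                    let rest = rest.to_string();
                    self.buf = rest; // keep only the unparsed tail
                    return Some(line);
                }
                Parse::Incomplete => {
                    // Pull another chunk from the source and retry.
                    let mut chunk = [0u8; 1024];
                    match self.reader.read(&mut chunk) {
                        Ok(0) | Err(_) => return None, // stream exhausted
                        Ok(n) => self
                            .buf
                            .push_str(&String::from_utf8_lossy(&chunk[..n])),
                    }
                }
                Parse::Error => return None, // a real parse error
            }
        }
    }
}

fn main() {
    let source = std::io::Cursor::new("one\ntwo\ntrailing");
    let lines: Vec<String> =
        Lines { reader: source, buf: String::new() }.collect();
    assert_eq!(lines, vec!["one", "two"]); // "trailing" never completes
}
```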

See the blog post and this GitHub repo for more info.

Update:

After writing the blog post I ran into winnow, and I decided to use winnow for my own project. If you can choose your architecture, I would advise you to use winnow.

Colorful answered 19/6, 2023 at 11:0 Comment(0)
Here is a minimal working example. As @Stefan wrote, "I'd recommend splitting streamed input through some other means into chunks you can handle".

What somewhat works (and I'd be glad for suggestions on how to improve it) is to combine the File::bytes() iterator with Iterator::take, collecting only as many bytes as necessary and passing them to nom::bytes::streaming::take.

let reader = file.bytes();
let buf = reader.take(length).collect::<B>()?;
let (_input, chunk) = take(length)(&*buf)...; 

The complete function can look like this:

/// Parse the first handful of bytes and return the bytes interpreted as UTF8
fn parse_first_bytes(file: std::fs::File, length: usize) -> Result<String> {
    type B = std::result::Result<Vec<u8>, std::io::Error>;
    let reader = file.bytes();

    let buf = reader.take(length).collect::<B>()?;
    let (_input, chunk) = take(length)(&*buf)
        .finish()
        .map_err(|nom::error::Error { input: _, code: _ }| eyre!("..."))?;
    let s = String::from_utf8_lossy(chunk);

    Ok(s.to_string())
}

Here is the rest of main for an implementation similar to Unix' head command.

use color_eyre::Result;
use eyre::eyre;
use nom::{bytes::streaming::take, Finish};
use std::{fs::File, io::Read, path::PathBuf};
use structopt::StructOpt;

#[derive(Debug, StructOpt)]
#[structopt(about = "A minimal example of parsing a file only partially. 
  This implements the POSIX 'head' utility.")]
struct Args {
    /// Input File
    #[structopt(parse(from_os_str))]
    input: PathBuf,
    /// Number of bytes to consume
    #[structopt(short = "c", default_value = "32")]
    num_bytes: usize,
}

fn main() -> Result<()> {
    let args = Args::from_args();
    let file = File::open(args.input)?;

    let head = parse_first_bytes(file, args.num_bytes)?;
    println!("{}", head);

    Ok(())
}
Eighteen answered 29/11, 2020 at 21:37 Comment(1)
I cannot give myself a bounty... so if anyone has an improvement or an alternative approach, please post your own answer. – Eighteen
