Chunked Parsing with FParsec

Is it possible to submit input to an FParsec parser in chunks, as from a socket? If not, is it possible to retrieve the current result and unparsed portion of an input stream so that I might accomplish this? I'm trying to run the chunks of input coming in from SocketAsyncEventArgs without buffering entire messages.

Update

The reason for noting the use of SocketAsyncEventArgs was to denote that sending data to a CharStream might result in asynchronous access to the underlying Stream. Specifically, I'm looking at using a circular buffer to push the data coming in from the socket. I remember the FParsec documentation noting that the underlying Stream should not be accessed asynchronously, so I had planned on manually controlling the chunked parsing.

Ultimate questions:

  1. Can I use a circular buffer under my Stream passed to the CharStream?
  2. Do I not need to worry myself with manually controlling the chunking in this scenario?
Brail asked 17/1, 2012 at 7:27

Comment (1)
FParsec works on CharStream, so the answer is probably yes. I will defer to better answers (hopefully) though. - Sallee

The normal version of FParsec (though not the Low-Trust version) reads the input chunk-wise, or "block-wise", as I call it in the CharStream documentation. Thus, if you construct a CharStream from a System.IO.Stream and the content is large enough to span multiple CharStream blocks, you can start parsing before you've fully retrieved the input.
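As a minimal sketch of parsing directly from a Stream, the following uses FParsec's runParserOnStream, which constructs the CharStream internally. The grammar (whitespace-separated integers) and the names numbers and parseNumbers are assumptions made up for illustration, and a MemoryStream stands in for the socket-backed stream.

```fsharp
open System.IO
open System.Text
open FParsec

// Hypothetical grammar for illustration: whitespace-separated integers.
let numbers : Parser<int32 list, unit> =
    spaces >>. many (pint32 .>> spaces) .>> eof

// Any readable Stream can be passed here; a NetworkStream over a socket
// would be handed over in the same way.
let parseNumbers (input: Stream) : ParserResult<int32 list, unit> =
    runParserOnStream numbers () "socket input" input Encoding.UTF8

// Usage: an in-memory stream stands in for the socket.
let demo = new MemoryStream(Encoding.UTF8.GetBytes "1 2 3\n4 5")
match parseNumbers demo with
| Success (values, _, _) -> printfn "parsed %A" values
| Failure (message, _, _) -> printfn "parse error: %s" message
```

With input this small everything fits into a single block, so the block-wise reading only becomes visible with larger inputs.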

Note, however, that the CharStream consumes the input stream in blocks of a fixed (but configurable) size, i.e. it calls the Read method of the System.IO.Stream as often as necessary to fill a complete block. Hence, if you parse the input faster than new input arrives, the CharStream may block even though some unparsed input is already available, because there is not yet enough input to fill a complete block.
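The blocking behaviour can be illustrated with a simplified read loop. This is only a sketch of the "read until the block is full" pattern, not FParsec's actual implementation; the function name fillBlock and the block size used below are assumptions.

```fsharp
open System.IO

/// Reads from the stream until `block` is full or the stream ends.
/// Returns the number of bytes actually read.
let fillBlock (stream: Stream) (block: byte[]) : int =
    let mutable filled = 0
    let mutable atEnd = false
    while not atEnd && filled < block.Length do
        // On a socket-backed stream this call waits until at least one
        // byte is available, so the loop can stall on a partly filled block.
        let n = stream.Read(block, filled, block.Length - filled)
        if n = 0 then atEnd <- true else filled <- filled + n
    filled

// Usage: 10 bytes of input read in blocks of 4 bytes.
let source = new MemoryStream(Array.zeroCreate<byte> 10)
let block = Array.zeroCreate<byte> 4
printfn "%d" (fillBlock source block)
```

A consumer built on such a loop cannot see the bytes already stored in the block until the loop returns, which is why a slow sender can stall the parser.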

Update

The answer(s) to your ultimate questions: 42.

  • How you implement the Stream from which you construct the CharStream is entirely up to you. The restriction you're remembering that excludes parallel access only applies to the CharStream class, which isn't thread safe.

  • Implementing the Stream as a circular buffer will likely restrict the maximum distance over which you can backtrack.

  • The block size of the CharStream influences how far you can backtrack when the Stream does not support seeking.

  • The simplest way to parse input asynchronously is to do the parsing in an async task (i.e. on a background thread). In the task you could simply read the socket synchronously, or, if you don't trust the buffering by the OS, you could use a stream class like the BlockingStream described in the article you linked in the second comment below.

  • If the input can be easily separated into independent chunks (e.g. lines for a line-based text format), it might be more efficient to chunk it up yourself and then parse the input chunk by chunk.
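The last two points can be combined in a short sketch: read the stream synchronously line by line, parse each line on its own with FParsec's run function, and do all of this in a background task. The "key=value" line format and the names entry, parseLines and parseInBackground are hypothetical and only serve the illustration.

```fsharp
open System.IO
open System.Text
open FParsec

// Hypothetical line format for illustration: letters, '=', then an integer.
let entry : Parser<string * int32, unit> =
    many1Satisfy isLetter .>> pchar '=' .>>. pint32 .>> eof

/// Reads the stream synchronously, one line at a time, and parses every
/// line independently, so a bad line does not affect the others.
let parseLines (input: Stream) : ParserResult<string * int32, unit> list =
    use reader = new StreamReader(input, Encoding.UTF8)
    let results = ResizeArray<ParserResult<string * int32, unit>>()
    let mutable line = reader.ReadLine()
    while not (isNull line) do
        results.Add(run entry line)
        line <- reader.ReadLine()
    List.ofSeq results

/// Runs the blocking read-and-parse loop on a thread-pool thread.
let parseInBackground (input: Stream) =
    async { return parseLines input } |> Async.StartAsTask

// Usage: an in-memory stream stands in for the socket.
let input = new MemoryStream(Encoding.UTF8.GetBytes "a=1\nbc=22\nbad")
let results = (parseInBackground input).Result
printfn "%d lines parsed" results.Length
```

Because every line is parsed from its own string, backtracking is never limited by the CharStream block size or by a circular buffer underneath the Stream.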

Scouting answered 17/1, 2012 at 9:48

Comments (4)
Thanks, Stephan. I revised my question to better reflect what I'm trying to understand. I was initially excited about the CharStream approach, but the note I found previously about asynchronous access to the underlying Stream gave me pause. - Brail
Would something like this work? msdn.microsoft.com/en-us/magazine/cc163290.aspx - Brail
I've updated my reply. Does this answer your questions? I'm not sure which note regarding asynchronous access to the underlying Stream you are referring to. Could you tell me where exactly you found that note? - Scouting
I believe the note was in the old documentation. I can't find the reference in the new documentation. Thanks! This answers all of my questions. :) - Brail
