Lexing and parsing concurrently in F#
Is there an easy way to get lexing and parsing to run concurrently when using fslex and fsyacc?

Vitriform answered 15/7, 2012 at 18:7 Comment(15)
What is the purpose of concurrently running fslex and fsyacc? Usually parsing is done after lexing, as the output of lexing is the input of parsing. The lexing phase builds tokens from an input stream, then the parser checks the correctness of the tokens with respect to a grammar. So I don't know why one would run them concurrently. Can you be a bit more precise? P.S.: I'm not criticizing your question; I'm sure you have valid reasons to ask it. I'm just curious. :)Gossip
@MSX: Performance. Lexing and parsing are often performance critical (IME) and often take ~50% of the total time each, so lexing on one core and concurrently parsing the lexical tokens on another core gives a potential 2x speedup. Similarly, compression/decompression and disk IO can both be performance critical and can be done concurrently. Despite this potential there appears to be no way to do this using F# and/or .NET without doing a huge rewrite.Vitriform
Right. Didn't think about that. I just voted up your question. As far as I know this cannot be done in fslex and fsyacc.Gossip
@JonHarrop I'm currently doing that huge rewrite you mentioned: see my fsharp-tools project. I'm currently working to get my new tools to par with fslex and fsyacc, and once that's done I'm planning to implement new backends (for generating the F# code implementing the lexer/parser). If you're still interested in this, please open a Github issue on the project so we can discuss further.Kreiner
While I am not completely sure, I believe that fsyacc-generated parsers actually only call the associated lexer when needed, i.e., the lexer runs only when the parser needs a new token.Xerography
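This pull model follows from the shape of an fsyacc-generated entry point: it takes the lexer as a plain function argument and calls it once per token it needs, so anything with the right signature can stand in for the real lexer. A minimal sketch of the idea (the `Token` type and `parseSum` are illustrative stand-ins, not actual fslex/fsyacc output):

```fsharp
// Illustrative token type, not generated code.
type Token = Num of int | Eof

// An fsyacc-style entry point receives the lexer as a function and
// pulls one token at a time, exactly when it needs one.
let parseSum (getToken: unit -> Token) =
    let rec loop acc =
        match getToken () with
        | Num n -> loop (acc + n)
        | Eof -> acc
    loop 0

// Because the parser only sees a function, the tokens can come from
// anywhere -- e.g. a pre-filled queue instead of a lexbuf.
let tokens = System.Collections.Generic.Queue [Num 1; Num 2; Eof]
let result = parseSum (fun () -> tokens.Dequeue ())
// result = 3
```

This substitutability is what makes the concurrent scheme below possible: the "lexer" function handed to the parser can pull tokens from a queue filled by another thread.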
Not sure if this is what you're asking, but there is a similar answer here: #6533220Grotesque
No answer? Really? Anyone? Anyone? Bueller? Bueller?Penury
Are you asking for 1) the ability to use multiple processors to parse a single input file, or 2) the ability to use multiple processors to parse separate files, where each file is parsed on a single processor?Hessian
@280Z28: I am asking for the ability to lex and parse concurrently, e.g. lex on one core and pass lexical tokens to a second core for parsing.Vitriform
@jonharrop: As far as I know, lexing and parsing are - for most tasks - not the time-critical steps in a compiler/program. They both run in O(n), with n the size of the input, while for instance semantic analysis, liveness analysis and tiling take far more time.Amata
@CommuSoft: That's great, but I am not doing semantic analysis, liveness analysis or tiling, because I am not writing a compiler.Vitriform
It's still not clear if it's possible to split work among multiple fslex instances by splitting the inputs. This would be sensible, producing a "chunky" parallelization possibility, instead of a "chatty" one.Bleacher
@GregC: I am not sure what you mean by "splitting the inputs" to fslex. The input is a single stream of characters processed by a state machine that is embarrassingly sequential. I am interested in lexing and parsing concurrently.Vitriform
@JonHarrop: All I am saying is that maybe this should not be parallelized, because parallelization opportunities might exist at a higher level of abstraction. For example, don't parallelize C# compiler's lex/yacc stage, but parallelize working with several C# source files.Bleacher
Make the lexer stream tokens to an Rx observable, let the parser be a subscriber of the lexer stream... profit :) Jokes aside, it is an interesting topic and should definitely be possible to achieve. Never heard of any tool for it, though.Beaver
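The producer/consumer pipelining discussed in these comments can be sketched without any lexer generator at all, using a standard bounded queue. The snippet below uses `System.Collections.Concurrent.BlockingCollection`; the `Token` type and the toy "lexer"/"parser" bodies are illustrative, not from fslex/fsyacc:

```fsharp
open System.Collections.Concurrent
open System.Threading.Tasks

// Illustrative token type; a real lexer would emit generated token cases.
type Token = Int of int | Plus | Eof

let pipeline (input: string list) =
    // Bounded queue: the lexer blocks if it runs too far ahead of the parser.
    use tokens = new BlockingCollection<Token>(boundedCapacity = 1024)

    // "Lexer" on a worker thread: turn raw input into tokens.
    let lexer = Task.Run(fun () ->
        for word in input do
            match word with
            | "+" -> tokens.Add Plus
            | n   -> tokens.Add (Int (int n))
        tokens.Add Eof
        tokens.CompleteAdding())

    // "Parser" on the calling thread: consume tokens as they arrive.
    let mutable acc = 0
    for tok in tokens.GetConsumingEnumerable() do
        match tok with
        | Int n -> acc <- acc + n
        | _ -> ()
    lexer.Wait()
    acc

// pipeline ["1"; "+"; "2"; "+"; "3"] evaluates to 6
```

The bounded capacity gives the throttling for free: `Add` blocks when the queue is full, so the lexer cannot race arbitrarily far ahead of the parser.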
First of all, in real-world use lexing and parsing are time critical, especially if you need to process tokens before parsing - for example, filtering and collecting comments, or resolving context-dependent conflicts. In such cases the parser often waits for the lexer.

Now, the answer to the question: you can run lexing and parsing concurrently with MailboxProcessor.

The core of the idea: run the lexer inside a MailboxProcessor. The lexer produces new tokens, processes them, and posts them to the mailbox; the parser receives the next token whenever it needs one. The lexer is usually faster than the parser, so sometimes it should be throttled to wait for the parser. Code is provided below; you can tune the timeouts and traceStep to find the optimum for your solution.

// Assumes the lexer and parser generated from the Calc grammar (Calc.Lexer,
// Calc.Parser, the lexer_reply token type) and an open TextReader `sr`,
// as in the full solution linked below.

[<Literal>]
let traceStep = 200000L

// Shared counters, used by both the tokenizer and getNextToken.
let lastTokenNum = ref 0L
let i = ref 0L
let timeOfIteration = ref System.DateTime.Now

let tokenizerFun =
    let lexbuf = Lexing.LexBuffer<_>.FromTextReader sr
    fun (chan: MailboxProcessor<lexer_reply>) ->
        let post = chan.Post
        async {
            while not lexbuf.IsPastEndOfStream do
                lastTokenNum := 1L + !lastTokenNum
                if (!lastTokenNum % traceStep) = 0L then
                    let oldTime = !timeOfIteration
                    timeOfIteration := System.DateTime.Now
                    let mSeconds = int64 ((!timeOfIteration - oldTime).Duration().TotalMilliseconds)
                    // Throttle the lexer if it runs too far ahead of the parser.
                    if int64 chan.CurrentQueueLength > 2L * traceStep then
                        int (int64 chan.CurrentQueueLength * mSeconds / traceStep)
                        |> System.Threading.Thread.Sleep
                let tok = Calc.Lexer.token lexbuf
                // Process tokens here: filter comments, add context-dependent information, etc.
                post tok
        }

use tokenizer = new MailboxProcessor<_>(tokenizerFun)

let getNextToken (lexbuf: Lexing.LexBuffer<_>) =
    // Block until the lexer has posted the next token (timeout in ms).
    let res = tokenizer.Receive 150000 |> Async.RunSynchronously
    i := 1L + !i
    if (!i % traceStep) = 0L then
        let oldTime = !timeOfIteration
        timeOfIteration := System.DateTime.Now
        let seconds = (!timeOfIteration - oldTime).TotalSeconds
        printfn "tokens: %d, time: %f s" !i seconds
    res

let res =
    tokenizer.Start()
    // The parser pulls tokens from the mailbox, so the lexbuf passed here is a stub.
    Calc.Parser.file getNextToken <| Lexing.LexBuffer<_>.FromString "*this is stub*"
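A minimal, self-contained sketch of the same pattern, with no fslex/fsyacc dependency (the `Token` type and the summing "parser" are illustrative):

```fsharp
// Illustrative token type standing in for generated lexer output.
type Token = Num of int | Eof

// The "lexer": an agent that posts tokens into its own mailbox.
let tokenizer (input: int list) =
    new MailboxProcessor<Token>(fun chan ->
        async {
            for n in input do chan.Post (Num n)
            chan.Post Eof
        })

// The "parser": pulls tokens one at a time, as fsyacc would via getNextToken.
let run input =
    let mb = tokenizer input
    mb.Start()
    let rec loop acc =
        match mb.Receive 15000 |> Async.RunSynchronously with
        | Num n -> loop (acc + n)
        | Eof -> acc
    loop 0

// run [1; 2; 3] evaluates to 6
```

Calling `Receive` from outside the agent body, as here and in the answer's `getNextToken`, works because the parser is the only reader; the agent body only posts.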

The full solution is available here: https://github.com/YaccConstructor/ConcurrentLexPars This solution only demonstrates a full implementation of the described idea; its performance comparison is not representative, because the semantic computation is very simple and there is no token processing.

For performance comparison results, see the full report: https://docs.google.com/document/d/1K43g5jokNKFOEHQJVlHM1gVhZZ7vFK2g9CJHyAVtUtg/edit?usp=sharing It compares the sequential and concurrent solutions for a parser of a T-SQL subset. Sequential: 27 sec; concurrent: 20 sec.

We also use this technique in a production T-SQL translator.

Friday answered 24/6, 2014 at 11:45 Comment(0)
