Correctly parsing line indentations in uu-parsinglib in Haskell

I want to create a parser combinator that collects all lines below the current position whose indentation level is greater than or equal to some i. I think the idea is simple:

Consume a line - if its indentation is:

  • ok -> do it for next lines
  • wrong -> fail

Let's consider the following code:

import qualified Text.ParserCombinators.UU as UU
import           Text.ParserCombinators.UU hiding(parse)
import           Text.ParserCombinators.UU.BasicInstances hiding (Parser)

-- end of line
pEOL   = pSym '\n'

pSpace = pSym ' '
pTab   = pSym '\t'

indentOf s = case s of
    ' '  -> 1
    '\t' -> 4

-- return the indentation level (number of columns of whitespace at the beginning of the line; a tab counts as 4)
pIndent = (+) <$> (indentOf <$> (pSpace <|> pTab)) <*> pIndent `opt` 0

-- returns tuple of (indentation level, result of parsing the second argument)
pIndentLine p = (,) <$> pIndent <*> p <* pEOL

-- SHOULD collect all lines below whose indentation is greater than or equal to i
myParse p i = do
    (lind, expr) <- pIndentLine p
    if lind < i
        then pFail
        else do
            rest <- myParse p i `opt` []
            return $ expr:rest

-- sample inputs
s1 = " a\
   \\n a\
   \\n"

s2 = " a\
   \\na\
   \\n"

-- execution
pProgram = myParse (pSym 'a') 1 

parse p s = UU.parse ( (,) <$> p <*> pEnd) (createStr (LineColPos 0 0 0) s)

main :: IO ()
main = do 
    print $ parse pProgram s1
    print $ parse pProgram s2
    return ()

This gives the following output:

("aa",[])
Test.hs: no correcting alternative found

The result for s1 is correct. For s2, the parser should consume the first "a" and then stop consuming input. Where does this error come from?

Asked by Sac, 14/8, 2013 at 16:45

The parsers you are constructing will always try to proceed; if necessary, input will be discarded or added. However, pFail is a dead end: it acts as a unit element for <|>.

In your parser, however, there is no other alternative available when the input does not conform to the language recognised by the parser. In your specification you say you want the parser to fail on input s2. Now it fails with a message saying that it fails, and you are surprised.

Maybe you do not want it to fail, but rather to stop accepting further input? In that case, replace pFail by return [].

Note that the text:

do
    rest <- myParse p i `opt` []
    return $ expr:rest

can be replaced by (expr:) <$> (myParse p i `opt` [])

A natural way to solve your problem is probably something like

pIndented p = do
    i <- pIndent  -- indentation of the first line (pIndent is defined below)
    (:) <$> p <* pEOL <*> pMany (pToken (replicate i ' ') *> p <* pEOL)

pIndent = length <$> pMany (pSym ' ')

Enshroud answered 15/8, 2013 at 8:13. Comments (4):
Thank you, but it does not yet completely solve my problem (I've updated the code in the question). What if the indentation could consist of spaces or tabs, where a tab counts as 4 spaces? - Sac
Adding return [] does not work as we want: if we replace pFail with return [], the second "a" will be consumed by the parser (it will be consumed and [] will be returned). I do not want the second "a" in example s2 to be consumed. - Sac
If you want to handle tabs properly (and not just replace each tab by four spaces), you will have to program your own small finite state machine which "knows" what a tab stands for. - Enshroud
Could you please tell me more about how such a state machine could be used with uu-parsinglib? Additionally, you said that "pFail acts as a unit element for <|>", so (maybe I'm wrong) it should still work: note that it is used in the expression myParse p i `opt` [], so if myParse fails, then opt should return [], shouldn't it? - Sac
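
The finite state machine for tabs mentioned in the comments is not spelled out in the thread. A minimal sketch in plain Haskell, independent of uu-parsinglib, might look like the following. It assumes tab stops every 4 columns (the question counts a tab as 4 columns; treating them as tab stops is an assumption), and the names tabWidth, step, indentWidth and takeIndented are hypothetical. It works on a list of lines rather than on a parser state, so it illustrates the idea and is not a drop-in combinator.

```haskell
import Data.List (foldl')

-- Assumed tab width: tab stops every 4 columns.
tabWidth :: Int
tabWidth = 4

-- One step of the state machine; the state is the current column.
step :: Int -> Char -> Int
step col '\t' = col + tabWidth - col `mod` tabWidth  -- jump to the next tab stop
step col _    = col + 1                              -- a space advances one column

-- Width of the leading whitespace of a line, honouring tab stops.
indentWidth :: String -> Int
indentWidth = foldl' step 0 . takeWhile (`elem` " \t")

-- Split lines into the leading block indented by at least i columns
-- and the untouched remainder; the first offending line is not consumed.
takeIndented :: Int -> [String] -> ([String], [String])
takeIndented i = span ((>= i) . indentWidth)
```

In GHCi, indentWidth "  \tx" evaluates to 4, and takeIndented 1 [" a", "a"] evaluates to ([" a"],["a"]), so the second line of s2 stays in the remainder instead of being consumed.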
