Error message on match fail in Rebol Parse

Asked 4/7, 2013 at 19:36 Answered 4/7, 2013 at 23:48

PEG-based parser generators usually provide limited error reporting on invalid inputs. From what I have read, the parse dialect of Rebol is inspired by PEG grammars extended with regular expressions.

For example, typing the following in JavaScript:

d8> function () {}

gives the following error, because no identifier was provided in declaring a global function:

(d8):1: SyntaxError: Unexpected token (
function () {}
         ^

The parser is able to pinpoint exactly the position during parsing where an expected token is missing. The character position of the expected token is used to position the arrow in the error message.

Does the parse dialect in Rebol provide built-in facilities to report the line and column of errors on invalid inputs?

Otherwise, are there examples out there of custom-rolled parse rules that provide such error reporting?

Saltine answered 4/7, 2013 at 19:36 Comment(2)

You ask "line and column of invalid token/rule". Are you asking about how to tell when there's a problem with the dialected block of rules you pass in, or for tools those rules can use to report on problems in the input to the parse process itself? Editing this question to add an idealized example of what you're looking for could be helpful. – Monotint 4/7, 2013 at 19:54

@HostileFork I am asking for the second case, when the input is invalid. – Saltine 4/7, 2013 at 20:17

I've done very advanced Rebol parsers which manage live and mission-critical TCP servers, and doing proper error reporting was a requirement. So this is important!

Probably one of the most distinctive aspects of Rebol's PARSE is that you can include direct evaluation within the rules. So you can set variables to track the parse position, or the error messages, etc. (It's very easy because treating code and data as the same thing is a core idea in Rebol.)

So here's the way I did it. Before each match rule is attempted, I save the parse position into "here" (by writing here:) and then also save an error into a variable using code execution (by putting (error: {some error string}) in parentheses so that the parse dialect runs it). If the match rule succeeds, we don't need to use the error or position...and we just go on to the next rule. But if it fails we will have the last state we set to report after the failure.

Thus the pattern in the parse dialect is simply:

; use PARSE dialect handling of "set-word!" instances to save parse
; position into variable named "here"

here:

; escape out of the parse dialect using parentheses, and into the DO 
; dialect to run arbitrary code.  Here we run code that saves an error
; message string into a variable named "error"

(error: "<some error message relating to rule that follows>")

; back into the PARSE dialect again, express whatever your rule is,
; and if it fails then we will have the above to use in error reporting

what: (ever your) [rule | {is}]

That's basically what you need to do. Here is an example for phone numbers:

digit: charset "0123456789"

phone-number-rule: [
    here:
    (error: "invalid area code")
    ["514" | "800" | "888" | "916" | "877"]

    here:
    (error: "expecting dash")
    "-"

    here:
    (error: "expecting 3 digits")
    3 digit

    here:
    (error: "expecting dash")
    "-"

    here:
    (error: "expecting 4 digits")
    4 digit

    (error: none)
]

Then you can see it in action. Notice that we set error to none if we reach the end of the parse rules. PARSE will return false if there is still more input to process, so if we notice there is no error set but PARSE returns false anyway... we failed because there was too much extra input:

input: "800-22r2-3333"

if not parse input phone-number-rule [
    if none? error [
        error: "too much data for phone number"
    ]
]

either error [
    column: length? copy/part input here
    print rejoin ["error at position:" space column]
    print error
    print input
    print rejoin [head insert/dup copy "" space column "^^"]
    print newline
][
    print {all good}
]

The above will print the following:

error at position: 4
expecting 3 digits
800-22r2-3333
    ^

Obviously, you could do much more potent stuff, since whatever you put in parens will be evaluated just like normal Rebol source code. It's really flexible. I even have parsers which update progress bars while loading huge datasets... :-)
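
Since the question asks for a line as well as a column, the saved position can also be turned into a line/column pair for multi-line input. The following is only a minimal sketch under assumptions: line-and-column is a hypothetical helper name (it is not built into Rebol), and it assumes the position passed in belongs to the same string that was given to PARSE.

```rebol
; Hypothetical helper (not built in): walk the text between the head of
; the input and the saved parse position, counting newlines on the way.
line-and-column: func [
    start [string!] "head of the parsed input"
    pos   [string!] "position saved with a set-word such as here:"
    /local line col
][
    line: 1
    col: 1
    foreach ch copy/part start pos [
        either ch = newline [
            line: line + 1
            col: 1
        ][
            col: col + 1
        ]
    ]
    reduce [line col]   ; both values are 1-based
]

; assumed usage after a failed parse, reusing input and here from above
set [line col] line-and-column input here
print ["error at line" line "column" col]
```

Note that both numbers are 1-based, so for the single-line phone number input the column reported this way should be one more than the zero-based offset printed above.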

Buitenzorg answered 4/7, 2013 at 23:48 Comment(6)

Sorry for all the rewriting, but I thought it was a good enough example to explain a little more. :-) – Monotint 5/7, 2013 at 0:39

I know you come from a language for which it's natural, but please don't start adding double semicolons; it's both ugly and unnecessary. – Buitenzorg 5/7, 2013 at 2:49

I'll be honest, I was trying to keep it simple and direct. it was short and sweet on purpose, too much text and it sometimes becomes less accessible (it makes it look like parse is complex). I'll leave it as you edited... this time (pride wounded ;-) (albeit with double semi-colon removed) ;-P – Buitenzorg 5/7, 2013 at 2:57

Thanks, so basically the point is the dialect is sufficiently flexible to add manual error support. Given the metaprogramming facilities of the language it should be possible to automate the insertion of basic error handling through parsing rules. Are there documented examples? – Saltine 5/7, 2013 at 5:2

If you look around you'll find quite a few PARSE primers and one or two in-depth looks. But PARSE, as useful as it is, is still a bit misunderstood by the common Rebol programmer. It lacks hard-core documentation. – Buitenzorg 5/7, 2013 at 11:58

Because of the code-is-data aspect of Rebol, building PARSE rules on the fly is a common idiom. In fact, I have built two-tiered compilers which built parse rules on the fly and whose output, when parsing input, was other, more specific parse rules. – Buitenzorg 5/7, 2013 at 12:1

Here is a simple example of finding the position during parsing a string which could be used to do what you ask.

Let us say that our code is only valid if it contains a and b characters, anything else would be illegal input.

code-rule: [
    some [
        "a" |
        "b"
    ] 
    [ end | mark: (print [ "Failed at position" index? mark ]) ]
]

Let's check that with some valid code

>> parse "aaaabbabb" code-rule
== true

Now we can try again with some invalid input

>> parse "aaaabbXabb" code-rule
Failed at position 7
== false

This is a rather simplified example language, but it should be easy to extend to a more complex example.
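
One possible extension, offered as a sketch rather than as part of the rule above, is to store the failing index instead of printing it, so the caller decides how to report it. The name first-bad-index is hypothetical. The sketch also swaps some for any, on the assumption that a bad first character should be reported too (with some, the rule fails before the mark is ever reached); a side effect is that empty input counts as valid.

```rebol
; Hypothetical wrapper: returns none when the whole input is valid,
; otherwise the 1-based index of the first character that did not match.
first-bad-index: func [text [string!] /local mark bad][
    bad: none
    parse text [
        any ["a" | "b"]
        [end | mark: (bad: index? mark)]
    ]
    bad
]

probe first-bad-index "aaaabbabb"    ; valid input
probe first-bad-index "aaaabbXabb"   ; invalid input
```

The returned index can then be fed into whatever reporting code is needed, for example to draw a caret under the offending character.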

Kitkitchen answered 4/7, 2013 at 23:33 Comment(2)

You are very welcome to come and chat about this and other ways of using parse in our stackoverflow chat room chat.stackoverflow.com/rooms/291/rebol-and-red – Kitkitchen 4/7, 2013 at 23:38

I actually needed 20 reputation points before being able to post there. Thanks to all the answerers for pushing me over the 20 pts mark ;-). – Saltine 5/7, 2013 at 5:4
