How to express a branch in the Rebol PARSE dialect?

Asked 24/5, 2015 at 11:33 Answered 25/5, 2015 at 6:59

I have a MySQL schema like the one below:

data: {
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `name` varchar(10) DEFAULT '' COMMENT 'the name',
    `content` text COMMENT 'something',
}

Now I want to extract some info from it: the field name, the type, and the comment, if any. The expected result:

["id" "int" "" "name" "varchar" "the name" "content" "text" "something" ]

My code is:

parse data [
    any [ 
        thru {`} copy field to {`} {`}
        thru some space copy field-type to [ {(} | space]
        (comm: "")
        opt [ thru {COMMENT} thru some space thru {'} copy comm to {'}]
        (repend temp field repend temp field-type either comm [ repend temp comm ][ repend temp ""])
    ]
]

But I get something like this:

["id" "int" "the name" "content" "text" "something"]

I know the opt [...] line is not right.

What I want to express is: if the COMMENT keyword is found first, extract the comment; if a line feed is found first, continue with the next iteration of the loop. I don't know how to express that. Can anyone help?

Reagent answered 24/5, 2015 at 11:33 Comment(0)

Where possible, I much favour building up a set of grammar rules with positive terms to match the target input. I find it more literate, precise, flexible, and easier to debug. In your snippet above, we can identify five core components:

space: use [space][
    space: charset "^-^/ "
    [some space]
]

word: use [letter][
    letter: charset [#"a" - #"z" #"A" - #"Z" "_"]
    [some letter]
]

id: use [letter][
    letter: complement charset "`"
    [some letter]
]

number: use [digit][
    digit: charset "0123456789"
    [some digit]
]

string: use [char][
    char: complement charset "'"
    [any [some char | "''"]]
]

With terms defined, writing a rule that describes the grammar of the input is relatively trivial:

result: collect [
    parsed?: parse/all data [ ; parse/all for Rebol 2 compatibility
        opt space
        some [
            (field: type: none comment: copy "")
            "`" copy field id "`"
            space 
            copy type word opt ["(" number ")"]
            any [
                space [
                    "COMMENT" space "'" copy comment string "'"
                    | word | "'" string "'" | number
                ]
            ]
            opt space "," (keep reduce [field type comment])
            opt space
        ]
    ]
]

As an added bonus, we can validate the input.

if parsed? [new-line/all/skip result true 3]

One wee application of new-line to smarten things up a little should yield:

== [
    "id" "int" "" 
    "name" "varchar" "the name" 
    "content" "text" "something"
]
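
Reduced to its bare shape, the branch asked about in the question is an ordered choice inside a loop: try the COMMENT alternative first, otherwise consume one character and try again, and stop at the terminator. A minimal sketch of that shape, assuming Rebol 2; the one-line input and the names `line`, `comm` and `not-comma` are made up for illustration and are not part of the rules above:

```rebol
line: "varchar(10) DEFAULT '' COMMENT 'the name',"
comm: copy ""
not-comma: complement charset ","

parse/all line [
    any [
        ; branch 1: the keyword is next, so capture the quoted text
        "COMMENT '" copy comm to "'" skip
        ; branch 2: anything else that is not the terminator
        | not-comma
    ]
    ","    ; the loop stops at the comma that ends the column definition
]

probe comm
```

With this input, `comm` should end up as "the name"; for a definition without a COMMENT clause the first branch never fires and `comm` stays an empty string.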

Blindage answered 25/5, 2015 at 6:59 Comment(0)

I think this is closer to what you are after.

data: {
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `name` varchar(10) DEFAULT '' COMMENT 'the name',
    `content` text COMMENT 'something',
}
temp: []
parse data [
  any [ 
    thru {`} copy field to {`} {`}
    some space copy field-type to [ {(} | space]
    (comm: copy "")
    opt [ thru {COMMENT} some space thru {'} copy comm to {'}]
    (repend temp field repend temp field-type either comm [ repend temp comm ][ repend temp ""])
  ]
]
probe temp

To break down the differences:

1. Set up a word, temp, with an empty block.
2. Changed thru some space to just some space, as this moves forward through the series in the same way. Note that the following returns false:
```
parse "   " [ thru some space ]
```
3. Changed comm: "" to comm: copy "" to make sure you get a new string each time you extract the comment (this does not seem to affect the output, but it is good practice).
4. Changed {COMMENT} thru some space to {COMMENT} some space, as per point 2.
5. Added a probe at the end for debugging.
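
The `comm: copy ""` change guards against a common Rebol pitfall: a literal string in code that runs repeatedly is the same string each time, so in-place modifications accumulate. A minimal sketch, assuming Rebol 2 or Red behaviour; the function names `shared` and `fresh` are made up for illustration:

```rebol
shared: func [/local s] [
    s: ""           ; the same literal string on every call
    append s "x"
]

fresh: func [/local s] [
    s: copy ""      ; a new empty string on every call
    append s "x"
]

shared
probe shared    ; second call: the literal was already modified once
fresh
probe fresh     ; second call: starts from an empty string again
```

The second `shared` call should show "xx", while `fresh` shows "x" every time. In the parse rule above the difference is not visible because, as far as I can tell, `copy comm to {'}` assigns a new string to comm rather than modifying the literal in place.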

As a note, you can use ?? (almost) anywhere in a parse rule to help with debugging; it shows your current position in the input.

Kept answered 24/5, 2015 at 12:13 Comment(4)

Thanks! After running your code, I got this: ["id" "int" "the name" "name" "varchar" "the name" "content" "text" "something"]. It seems that in the first iteration it tries the first alternative, fails, and then switches to the next one. After adding some debug output, like [thru {COMMENT} some space thru {'} copy comm to {'} {,} (print 1) | thru {,} (print 2)], only 2 was printed. Why is that? – Reagent 24/5, 2015 at 13:25

BTW, there is a typo: copy comm to {'} {',} should be copy comm to {'} {,} but I can't correct it as the edit must be at least 6 characters. – Reagent 24/5, 2015 at 13:34

Hi Wayne, I just updated it to be closer to your original (i.e. fewer changes, to show how close you were to having it correct). Hope that helps -John – Kept 24/5, 2015 at 14:16

Sorry it is not correct yet. I'll have a look at this again tomorrow. – Kept 24/5, 2015 at 14:41

Use parse/all for string parsing:

data: {
    `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `name` varchar(10) DEFAULT '' COMMENT 'the name',
    `content` text COMMENT 'something',
}
nodata:   charset { ()'}
dat: complement nodata

collect [   
    parse/all data [
        some [
            thru {`} copy field to {`} (keep field) skip 
            some " " copy type some dat ( keep type   comm:  copy "" )  
            copy rest thru "," (
                parse/all rest [
                    some [
                        [","   (keep comm) ]  
                     |  ["COMMENT"   some nodata copy comm to "'"  ]
                     |  skip                        
                    ]
                ]
            )
        ]
    ]
]
== ["id" "int" "" "name" "varchar" "the name" "content" "text" "something"]
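
On why `parse/all` matters here: in Rebol 2, plain `parse` on a string skips whitespace between rule elements, so rules that match spaces explicitly (such as `some " "` above) only see them with `/all`. A small sketch of the difference, assuming Rebol 2 semantics; the input string is made up:

```rebol
; Without /all, whitespace between matched elements is skipped automatically.
probe parse "a b" ["a" "b"]

; With /all, every character, including the space, must be matched by the rules.
probe parse/all "a b" ["a" "b"]
probe parse/all "a b" ["a" " " "b"]
```

Under Rebol 2 the first and third calls should succeed and the second should fail, since nothing in its rule consumes the space.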

Another (better) solution with pure parse:

collect [   
    probe parse/all data [
        some [
            thru {`} copy field to {`} (keep field) skip 
            some " " copy type some dat ( keep type   comm:  ""  further: [])  
            some [ 
            ","   (keep comm  further:  [ to end  skip]) 
            |  ["COMMENT"   some nodata copy comm to "'"  ]
            |  skip  further                     
            ]
        ]
    ]
]

Interpretative answered 25/5, 2015 at 0:40 Comment(1)

I added a solution with pure parsing – Interpretative 25/5, 2015 at 8:26

I figured out an alternative: read the data as a block! of lines rather than a single string!, and parse each line separately.

data: read/lines %data.txt  ; file literals need the % prefix
probe data
temp: copy []

foreach d data [
    parse d [ 
        thru {`} copy field to {`} {`}
        thru some space copy field-type to [ {(} | space]
        (comm: "")
        opt [ thru {COMMENT} thru some space thru {'} copy comm to {'}]
        (repend temp field repend temp field-type either comm [ repend temp comm ][ repend temp ""])
    ]
]

probe temp

Reagent answered 24/5, 2015 at 12:1 Comment(0)
