What is the NLTK FCFG grammar standard/specification?

NLTK (Natural Language Toolkit) lets you parse an FCFG grammar using nltk.FCFG.fromstring([grammar string here]). Where is the FCFG grammar format specification*? I googled it to death, but all I could find was this.

*i.e. grammar language specification

Scrubber answered 12/3, 2016 at 21:10 Comment(3)
Use the source, Luke. (Stagestruck)
As the file shows: one NT (NonTerminal) expanded per line, right-hand-sides separated by pipes, terminals in quotes, probabilities for rhs adding up to 1. (Mccorkle)
Did the question change recently? (Valley)

From the demo:

>>> from nltk import CFG
>>> grammar = CFG.fromstring("""
... S -> NP VP
... PP -> P NP
... NP -> Det N | NP PP
... VP -> V NP | VP PP
... Det -> 'a' | 'the'
... N -> 'dog' | 'cat'
... V -> 'chased' | 'sat'
... P -> 'on' | 'in'
... """)

The syntax of the grammar string is as follows:

  • Each line is a rule made up of a left-hand side (LHS) and a right-hand side (RHS), separated by the arrow ->
  • Only one non-terminal can be on the LHS of the arrow ->
  • The RHS can be made up of any combination of one or more non-terminals and/or terminals.
  • Terminal strings need to be enclosed in quotation marks.
  • Symbols on the RHS are separated by spaces.
  • Each non-terminal (LHS) can have one or more RHS alternatives, and the alternatives are delimited by the pipe symbol |
  • By convention, CFG non-terminals are capitalized, but this is not required.
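
To see these rules in action, here is a minimal sketch that feeds the demo grammar to a chart parser. The example sentence and the variable names are my own choices for illustration, not part of the demo.

```python
from nltk import CFG
from nltk.parse import ChartParser

# Same toy grammar as in the demo: one rule per line,
# alternatives separated by |, terminals in quotes.
grammar = CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | NP PP
VP -> V NP | VP PP
Det -> 'a' | 'the'
N -> 'dog' | 'cat'
V -> 'chased' | 'sat'
P -> 'on' | 'in'
""")

# Illustrative sentence (an assumption, not from the demo).
tokens = "the dog chased a cat".split()

parser = ChartParser(grammar)
trees = list(parser.parse(tokens))
for tree in trees:
    print(tree)
```

Each alternative separated by | becomes its own production, which you can check with grammar.productions().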

Also, see https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols and https://en.wikipedia.org/wiki/Context-free_grammar

Valley answered 12/3, 2016 at 23:54 Comment(0)

The question was asking for FCFG (Feature Grammars) not plain CFG.

I think you can just add square brackets to the nonterminals and put a feature name, an equals sign and a value inside the brackets. The value can be a variable (starting with a question mark), a terminal symbol (a simple value such as 'sg' or 3) or a nested feature structure. I found this example on the internet (http://www.nltk.org/howto/featgram.html) and it works on my laptop.

from nltk import grammar, parse

g = """
% start DP
DP[AGR=?a] -> D[AGR=?a] N[AGR=?a]
D[AGR=[NUM='sg', PERS=3]] -> 'this' | 'that'
D[AGR=[NUM='pl', PERS=3]] -> 'these' | 'those'
D[AGR=[NUM='pl', PERS=1]] -> 'we'
D[AGR=[PERS=2]] -> 'you'
N[AGR=[NUM='sg', GND='m']] -> 'boy'
N[AGR=[NUM='pl', GND='m']] -> 'boys'
N[AGR=[NUM='sg', GND='f']] -> 'girl'
N[AGR=[NUM='pl', GND='f']] -> 'girls'
N[AGR=[NUM='sg']] -> 'student'
N[AGR=[NUM='pl']] -> 'students'
"""

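# Use a distinct name so the imported `grammar` module is not shadowed
feat_grammar = grammar.FeatureGrammar.fromstring(g)
tokens = 'these girls'.split()
parser = parse.FeatureEarleyChartParser(feat_grammar)
trees = parser.parse(tokens)
for tree in trees:
    print(tree)
    tree.draw()  # opens a Tk window, so this needs a graphical display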

It seems that it doesn't matter whether the feature terminal symbols are quoted or not.
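
A quick way to check that observation is to build the same feature structure with and without quotes and compare the results. This is only a sketch using nltk.featstruct.FeatStruct directly, on the assumption that the grammar reader treats feature values the same way; the variable names are mine.

```python
from nltk.featstruct import FeatStruct

# Same features, once with a quoted value and once with a bare one.
quoted = FeatStruct("[NUM='sg', PERS=3]")
bare = FeatStruct("[NUM=sg, PERS=3]")

# Compare the two structures and inspect the stored value types.
same = (quoted == bare)
print(same)
print(type(quoted['NUM']).__name__, type(bare['NUM']).__name__)
print(type(quoted['PERS']).__name__)
```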

Cryotherapy answered 26/11, 2018 at 13:48 Comment(0)

Wartena is right: the question is indeed about FCFG, i.e. feature-based context-free grammars. See https://nltk.org/book/ch09.html. Here is some light on the features in his FCFG:

  • NUM corresponds to number (singular/plural)
  • GND is gender (male/female/other): languages like French or German assign a grammatical gender to nouns, including those naming objects
  • PERS is person (first person I/we, second person you, third person he/she/it/they)
  • The question mark introduces a variable (like variables in Prolog)
  • AGR stands for agreement features: the set of features such as NUM, PERS and GND (similarly, TENSE stands for tense)
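
To illustrate how the variable enforces agreement, here is a small sketch based on a trimmed-down version of the grammar from the other answer. The helper count_parses and the test phrases are hypothetical names I made up for this example.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Trimmed-down version of the grammar from the other answer.
# The shared variable ?a forces D and N to carry compatible AGR values.
feat_grammar = FeatureGrammar.fromstring("""
% start DP
DP[AGR=?a] -> D[AGR=?a] N[AGR=?a]
D[AGR=[NUM='sg', PERS=3]] -> 'this' | 'that'
D[AGR=[NUM='pl', PERS=3]] -> 'these' | 'those'
N[AGR=[NUM='sg', GND='f']] -> 'girl'
N[AGR=[NUM='pl', GND='f']] -> 'girls'
""")

parser = FeatureEarleyChartParser(feat_grammar)

def count_parses(phrase):
    """Hypothetical helper: number of parse trees for a phrase."""
    return len(list(parser.parse(phrase.split())))

agree = count_parses("these girls")  # plural determiner, plural noun
clash = count_parses("this girls")   # singular determiner, plural noun
print(agree, clash)
```

The phrase whose NUM values cannot be unified should yield no parse, while the agreeing phrase yields a tree whose DP node carries the merged AGR structure.
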
Sensor answered 3/8, 2020 at 8:29 Comment(0)
