Is there a ready-to-use English grammar that I can just load and use in NLTK? I've searched around for examples of parsing with NLTK, but it seems that I have to manually specify a grammar before parsing a sentence.
Thanks a lot!
You can take a look at pyStatParser, a simple Python statistical parser that returns NLTK parse Trees. It comes with public treebanks and generates the grammar model only the first time you instantiate a Parser object (in about 8 seconds). It uses the CKY algorithm and parses average-length sentences (like the one below) in under a second.
>>> from stat_parser import Parser
>>> parser = Parser()
>>> print parser.parse("How can the net amount of entropy of the universe be massively decreased?")
(SBARQ
  (WHADVP (WRB how))
  (SQ
    (MD can)
    (NP
      (NP (DT the) (JJ net) (NN amount))
      (PP
        (IN of)
        (NP
          (NP (NNS entropy))
          (PP (IN of) (NP (DT the) (NN universe))))))
    (VP (VB be) (ADJP (RB massively) (VBN decreased))))
  (. ?))
You can also run python example.py with the default text hardcoded. Very easy to use and embeddable. – Schiffman
For Python 3, I ran 2to3 --output-dir=stat_parser3 -W -n stat_parser, then rm -rf stat_parser, mv stat_parser3 stat_parser, python setup.py build and python setup.py install, and it worked, thanks @Apprehensive – Indene
I used the 2to3 tool to "manually" convert all the files from Python 2 to Python 3. – Siler
It parses "the Sun rises from the East" as (SINV (NP (NP (DT the) (NNP Sun) (NNP rises)) (PP (IN from) (NP (DT the) (NNP East)))) (. .)). Shouldn't "rises" be a VP? How do we avoid interpreting "rises" as a proper noun? – Kilogrammeter
My library, spaCy, provides a high-performance dependency parser.
Installation:
pip install spacy
python -m spacy.en.download all
Usage:
from spacy.en import English
nlp = English()
doc = nlp(u'A whole document.\nNo preprocessing required. Robust to arbitrary formatting.')
for sent in doc.sents:
    for token in sent:
        if token.is_alpha:
            print token.orth_, token.tag_, token.head.lemma_
Choi et al. (2015) found spaCy to be the fastest dependency parser available. It processes over 13,000 sentences per second on a single thread. On the standard WSJ evaluation it scores 92.7%, over 1% more accurate than any of CoreNLP's models.
Note that spacy.en.download all initiates a download that appears to be over 600 MB! – Lucania
It should be for sent in doc.sents: to iterate over the sentences. – Regress
With newer versions: import spacy, then nlp = spacy.load('en'), and then process your sentences as: doc = nlp(u'Your unprocessed document here'). – Wiring
The download command is now python -m spacy download en. – Sennar
There are a few grammars in the nltk_data distribution. In your Python interpreter, issue nltk.download().
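For example, a quick sketch (the large_grammars package is one of the "grammars" collections the downloader lists; take the exact names as an illustration):
import nltk
nltk.download('large_grammars')  # a grammars collection in nltk_data
grammar = nltk.data.load('grammars/large_grammars/atis.cfg')  # a CFG for the ATIS domain
parser = nltk.parse.ChartParser(grammar)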
You could take a treebank from nltk_data and derive a CFG from it by simply turning tree fragments (a node and its direct subnodes) into rules. But you probably won't find a "real" grammar unless you look into statistical parsing; no one builds non-stochastic grammars anymore since they just don't work, except for very domain-specific applications. – Beebe
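A minimal sketch of that treebank-to-CFG idea, using the 10% Penn Treebank sample that ships in nltk_data and NLTK's induce_pcfg (the sample sentence is just an illustration; coverage is limited to the sample's vocabulary):
import nltk
from nltk import Nonterminal, induce_pcfg, ViterbiParser
from nltk.corpus import treebank

nltk.download('treebank')  # the 10% Penn Treebank sample

# Turn every tree fragment (a node and its direct subnodes) into a production
productions = []
for tree in treebank.parsed_sents():
    productions += tree.productions()

# Induce a probabilistic CFG and parse with it (slow, but it works)
grammar = induce_pcfg(Nonterminal('S'), productions)
parser = ViterbiParser(grammar)
for tree in parser.parse('Mr. Vinken is chairman .'.split()):
    print(tree)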
There is a library called Pattern. It is quite fast and easy to use.
>>> from pattern.en import parse
>>>
>>> s = 'The mobile web is more important than mobile apps.'
>>> s = parse(s, relations=True, lemmata=True)
>>> print s
'The/DT/B-NP/O/NP-SBJ-1/the mobile/JJ/I-NP/O/NP-SBJ-1/mobile' ...
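For a more readable, column-aligned view of those tags, Pattern's documentation (at the URL below) also shows a pprint helper; a sketch, assuming the documented pattern.en API:
>>> from pattern.en import pprint
>>> pprint(s)  # prints one token per row with its tags aligned in columns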
Use the MaltParser. It gives you a pretrained English grammar, along with pretrained models for some other languages. MaltParser is a dependency parser, not a simple bottom-up or top-down parser.
Just download MaltParser from http://www.maltparser.org/index.html and use it from NLTK like this:
import nltk
parser = nltk.parse.malt.MaltParser()
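With recent NLTK versions the constructor wants explicit locations; a hedged sketch (the directory and the pretrained model name, e.g. engmalt.linear-1.7.mco from the MaltParser site, depend on what you downloaded):
from nltk.parse.malt import MaltParser

# parser_dirname: the unpacked MaltParser distribution (contains the jar)
# model_filename: a pretrained English model from maltparser.org
mp = MaltParser('/path/to/maltparser-1.9.2', '/path/to/engmalt.linear-1.7.mco')
graph = mp.parse_one('I shot an elephant in my pajamas .'.split())
print(graph.tree())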
I've tried NLTK, PyStatParser and Pattern. IMHO Pattern is the best English parser introduced in this thread, because it supports pip install and has fancy documentation on its website (http://www.clips.ua.ac.be/pages/pattern-en). I couldn't find reasonable documentation for NLTK (and it gave me inaccurate results by default, and I couldn't find how to tune it). pyStatParser is much slower than described above in my environment (about one minute for initialization, and a couple of seconds to parse long sentences; maybe I didn't use it correctly).
The nltk tool here is like PyStatParser in that it builds a grammar that is a PCFG, i.e. a Probabilistic Context-Free Grammar - cs.columbia.edu/~mcollins/courses/nlp2011/notes/pcfgs.pdf – Schiffman
Did you try POS tagging in NLTK?
import nltk
from nltk import word_tokenize
text = word_tokenize("And now for something completely different")
nltk.pos_tag(text)
The output is something like this:
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]
I got this example from here: NLTK_chapter03.
I found out that NLTK works well with the parser grammar developed by Stanford.
Syntax Parsing with Stanford CoreNLP and NLTK
It is very easy to start using Stanford CoreNLP and NLTK. All you need is a small preparation, after which you can parse sentences with the following code:
from nltk.parse.corenlp import CoreNLPParser
parser = CoreNLPParser()
parse = next(parser.raw_parse("I put the book in the box on the table."))
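raw_parse yields ordinary nltk.Tree objects, so the standard Tree API applies; for instance, to draw the result:
parse.pretty_print()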
Preparation:
You can use the following code to run CoreNLPServer:
import os
from nltk.parse.corenlp import CoreNLPServer
# The server needs to know the location of the following files:
# - stanford-corenlp-X.X.X.jar
# - stanford-corenlp-X.X.X-models.jar
STANFORD = os.path.join("models", "stanford-corenlp-full-2018-02-27")
# Create the server
server = CoreNLPServer(
    os.path.join(STANFORD, "stanford-corenlp-3.9.1.jar"),
    os.path.join(STANFORD, "stanford-corenlp-3.9.1-models.jar"),
)
# Start the server in the background
server.start()
Do not forget to stop the server by executing server.stop()
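To guarantee that cleanup, one option is to combine the two snippets with try/finally (a sketch using the server and parser exactly as defined above):
server.start()
try:
    parser = CoreNLPParser()
    parse = next(parser.raw_parse("I put the book in the box on the table."))
    parse.pretty_print()
finally:
    server.stop()  # always shut down the background Java process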
SpaCy 2024
With current spaCy, iterating a Doc directly yields Tokens, so the old nested loop
for token in sent:
raises
TypeError: 'spacy.tokens.token.Token' object is not iterable
Solution:
from itertools import chain
from typing import Callable, Iterable
from spacy import load
from spacy.language import Language
from spacy.tokens import Doc, Token

# Sam Redway ==> "You can now load the package via spacy.load('en_core_web_sm')"
nlp: Language = load('en_core_web_sm')

def alpha_tokens(text: str) -> list:
    doc: Doc = nlp(text)
    tok: Iterable[Token] = chain(doc)  # a Doc iterates over its Tokens directly
    f: Callable[[Token], bool] = lambda token: token.is_alpha
    tok = filter(f, tok)
    return list(tok)
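Usage of the helper (alpha_tokens is just the name chosen for the wrapper above):
print(alpha_tokens(u'A whole document.\nNo preprocessing required.'))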