Parse raw text with MaltParser in Java

I found that NLTK in Python does this via the *raw_parse* function, but I need to use Java. I found that ClearTK has a MaltParser wrapper, but there is no documentation for it. I'm looking for a function or a project that first converts raw English text to a CoNLL file that MaltParser can use and then parses it with MaltParser. Any help is appreciated.

Folkrock asked 30/6, 2013 at 17:06
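
Outside of any wrapper API, one workable route from Java is to write the CoNLL input file yourself and run the MaltParser jar on it in parse mode. The sketch below only builds that command line, and launches it when given `--run`. The jar name `maltparser-1.7.2.jar`, the pre-trained model name `engmalt.linear-1.7`, and the file names are assumptions to replace with whatever you actually downloaded and produced. The options used (`-c`, `-w`, `-i`, `-o`, `-m parse`) are MaltParser's command-line options as I understand them from its user guide; double-check them against the version you use.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MaltCommand {

    // Builds a command line that runs MaltParser in parse mode on a CoNLL input file.
    static List<String> buildParseCommand(Path jar, String model, Path workDir,
                                          Path input, Path output) {
        List<String> cmd = new ArrayList<>();
        cmd.addAll(Arrays.asList("java", "-jar", jar.toString()));
        cmd.addAll(Arrays.asList("-c", model));              // model (configuration) name, without .mco
        cmd.addAll(Arrays.asList("-w", workDir.toString())); // working directory, assumed to hold the .mco file
        cmd.addAll(Arrays.asList("-i", input.toString()));   // tokenized, POS-tagged CoNLL input
        cmd.addAll(Arrays.asList("-o", output.toString()));  // parsed CoNLL output
        cmd.addAll(Arrays.asList("-m", "parse"));            // parse mode (as opposed to learn)
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder names (assumptions): replace with your jar, model and data files.
        List<String> cmd = buildParseCommand(
                Paths.get("maltparser-1.7.2.jar"), "engmalt.linear-1.7",
                Paths.get("."), Paths.get("input.conll"), Paths.get("output.conll"));
        System.out.println(String.join(" ", cmd));

        // Only launch the parser when explicitly asked, since the jar and model must exist.
        if (args.length > 0 && args[0].equals("--run")) {
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            System.out.println("MaltParser exited with code " + p.waitFor());
        }
    }
}
```

Shelling out keeps the Java side independent of MaltParser's internal classes; the trade-off is that the input still has to be tokenized, tagged and written as CoNLL beforehand.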

There are examples that come with the MaltParser 1.7.2 distribution, in the folder examples/apiexamples/srcex.

However, these examples only show how to run MaltParser programmatically after tokenization and part-of-speech tagging have already been performed (and after the output of these steps has been converted to a CoNLL-like format).
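
The CoNLL-like input mentioned above is one token per line with tab-separated columns and a blank line after each sentence. A minimal sketch of that conversion step, assuming you already have parallel token and tag arrays from some tokenizer and tagger, is below. It writes the ten CoNLL-X columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). Filling LEMMA and FEATS with `_`, reusing the fine-grained tag as the coarse tag, and putting dummy values in HEAD and DEPREL (on the assumption that the parser overwrites them in parse mode) are my choices, not something the MaltParser examples prescribe.

```java
import java.util.Arrays;
import java.util.List;

public class ConllWriter {

    // Formats one tokenized, POS-tagged sentence as CoNLL-X lines (one token per line).
    static String toConll(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            sb.append(String.join("\t",
                    Integer.toString(i + 1), // ID, 1-based
                    tokens[i],               // FORM
                    "_",                     // LEMMA (not available)
                    tags[i],                 // CPOSTAG: here simply the fine-grained tag
                    tags[i],                 // POSTAG
                    "_",                     // FEATS
                    "0",                     // HEAD: dummy, assumed to be overwritten when parsing
                    "_",                     // DEPREL: dummy
                    "_",                     // PHEAD
                    "_"))                    // PDEPREL
              .append('\n');
        }
        return sb.toString();
    }

    // A CoNLL file is each sentence's lines followed by a blank line.
    static String toConllFile(List<String[]> tokenSentences, List<String[]> tagSentences) {
        if (tokenSentences.size() != tagSentences.size()) {
            throw new IllegalArgumentException("need one tag array per sentence");
        }
        StringBuilder sb = new StringBuilder();
        for (int s = 0; s < tokenSentences.size(); s++) {
            sb.append(toConll(tokenSentences.get(s), tagSentences.get(s))).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> tokens = Arrays.asList(
                new String[] {"John", "sleeps", "."},
                new String[] {"Mary", "runs", "!"});
        List<String[]> tags = Arrays.asList(
                new String[] {"NNP", "VBZ", "."},
                new String[] {"NNP", "VBZ", "."});
        System.out.print(toConllFile(tokens, tags));
    }
}
```

Writing the returned string to a file (for example with `java.nio.file.Files.write`) should give input in the shape MaltParser's CoNLL-X reader expects; check the column layout against the data format of the model you load.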

Since I currently cannot offer a better (simpler/shorter) alternative, I can at least share a link to a Groovy script that performs tokenization, part-of-speech tagging (using OpenNLP), and dependency parsing (using MaltParser). The tools are made interoperable using UIMA. If you are familiar with Maven, it should be quite straightforward to derive a Java version of that script.

Mind you, this is not the best answer, but at this point it is possibly better than nothing.

Note: I'm a developer on both Apache UIMA and DKPro Core (the project to which the link points).

Mental answered 24/7, 2013 at 19:08
I believe none of those parse raw text. They all take CoNLL-formatted input. – Harquebusier
What can I say, you're right... Stupid me... In order to run MaltParser on raw text, one needs a tokenizer and a part-of-speech tagger. – Mental
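
Raw text first has to go through a tokenizer and a part-of-speech tagger before any CoNLL input exists. The sketch below only marks where those two steps sit. The regex-based sentence splitter and tokenizer are deliberately naive (they split `don't` into `don`, `'`, `t` and `3.5` into `3`, `.`, `5`, unlike Penn Treebank-style tokenization), and `Tagger` is a hypothetical interface with a toy dictionary stub standing in for a real tagger such as OpenNLP's.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyPreprocessor {

    // Hypothetical interface: a real POS tagger (e.g. OpenNLP's) would be adapted to this.
    interface Tagger {
        String[] tag(String[] tokens);
    }

    // A token is a run of word characters or a single punctuation character.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // Naive sentence splitting: break after . ! or ? followed by whitespace.
    static String[] splitSentences(String text) {
        return text.trim().split("(?<=[.!?])\\s+");
    }

    static String[] tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Toy lexicon-based stub, NOT a real tagger; unknown words default to NN.
        Map<String, String> lexicon = new HashMap<>();
        lexicon.put("John", "NNP");
        lexicon.put("Mary", "NNP");
        lexicon.put("sleeps", "VBZ");
        lexicon.put("runs", "VBZ");
        lexicon.put(".", ".");
        lexicon.put("!", ".");
        Tagger tagger = tokens -> {
            String[] tags = new String[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                tags[i] = lexicon.getOrDefault(tokens[i], "NN");
            }
            return tags;
        };

        // Print each sentence in word/TAG form, ready to be converted to CoNLL lines.
        for (String sentence : splitSentences("John sleeps. Mary runs!")) {
            String[] tokens = tokenize(sentence);
            String[] tags = tagger.tag(tokens);
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < tokens.length; i++) {
                line.append(tokens[i]).append('/').append(tags[i]).append(' ');
            }
            System.out.println(line.toString().trim());
        }
    }
}
```

Keeping the tagger behind an interface means the toy stub can be swapped for a wrapper around a trained tagger without touching the tokenization or CoNLL-writing code.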