I found that NLTK in Python does this via the *raw_parse* function, but I need to use Java. I found that ClearTK has a MaltParser wrapper, but there is no documentation about it. I'm looking for a function or a project that first converts raw English text to a CoNLL file that MaltParser can use and then parses it with MaltParser. Any help is appreciated.
There are examples that come with the MaltParser 1.7.2 distribution in the folder examples/apiexamples/srcex.
However, these examples only show how to run MaltParser programmatically after tokenization and part-of-speech tagging have already been performed (and after the output of these steps has been converted to a CoNLL-like format).
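For reference, a CoNLL-X row has ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). Below is a minimal sketch, assuming tokenization and POS tagging are already done, of turning one sentence into such rows. The class and method names are hypothetical, and copying the fine-grained tag into CPOSTAG is an assumption. Which columns a model actually reads depends on how it was trained, so compare against the srcex examples before feeding these rows to MaltParser.

```java
import java.util.List;

public class ConllFormatter {

    /**
     * Builds one CoNLL-X row per token. Columns that are unknown before
     * parsing (lemma, features, head, relation) are filled with "_".
     */
    public static String[] toConllX(List<String> tokens, List<String> posTags) {
        if (tokens.size() != posTags.size()) {
            throw new IllegalArgumentException("tokens and POS tags must have the same length");
        }
        String[] rows = new String[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            String tag = posTags.get(i);
            rows[i] = String.join("\t",
                    Integer.toString(i + 1), // ID: 1-based token index
                    tokens.get(i),           // FORM
                    "_",                     // LEMMA (unknown)
                    tag,                     // CPOSTAG (assumption: same as POSTAG)
                    tag,                     // POSTAG
                    "_",                     // FEATS (unknown)
                    "_", "_", "_", "_");     // HEAD, DEPREL, PHEAD, PDEPREL (left for the parser)
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] rows = toConllX(List.of("John", "sees", "Mary", "."),
                                 List.of("NNP", "VBZ", "NNP", "."));
        for (String row : rows) {
            System.out.println(row);
        }
    }
}
```

Each sentence becomes one block of rows; in a CoNLL file, sentences are separated by a blank line.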
Since I currently cannot offer a better (simpler/shorter) alternative, I can at least share a link to a Groovy script that performs tokenization, part-of-speech tagging (using OpenNLP) and dependency parsing (using MaltParser). The tools are made interoperable using UIMA. If you are familiar with Maven, it should be quite straightforward to derive a Java version of that script.
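The first step of such a pipeline is splitting raw text into sentences and tokens. As a rough illustration only, here is a regex-based stand-in. It is not OpenNLP's tokenizer and it does not follow Penn Treebank conventions (for example, it keeps "don't" as a single token), which a tagger or parser model trained on PTB-style data would expect. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Crude stand-in for a real tokenizer (e.g. OpenNLP); not Penn Treebank-compliant.
public class NaiveTokenizer {

    // Sentence boundary: whitespace preceded by '.', '!' or '?'.
    // Crude: it also splits after abbreviations such as "Dr.".
    private static final Pattern SENTENCE_END = Pattern.compile("(?<=[.!?])\\s+");

    // A token is a run of letters/digits (optionally joined by an internal
    // apostrophe or hyphen) or any single non-whitespace symbol.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+(?:['-][\\p{L}\\p{N}]+)*|\\S");

    /** Returns one list of tokens per detected sentence. */
    public static List<List<String>> tokenize(String text) {
        List<List<String>> sentences = new ArrayList<>();
        for (String sentence : SENTENCE_END.split(text.trim())) {
            List<String> tokens = new ArrayList<>();
            Matcher m = TOKEN.matcher(sentence);
            while (m.find()) {
                tokens.add(m.group());
            }
            if (!tokens.isEmpty()) { // skip empty input
                sentences.add(tokens);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (List<String> sentence : tokenize("John sees Mary. She waves!")) {
            System.out.println(String.join(" ", sentence));
        }
    }
}
```

In a real pipeline, each token list would then go to a POS tagger and be converted to CoNLL rows before parsing.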
Mind you, this is not the best answer, but at this point it is possibly better than nothing.
Note: I'm a developer on both Apache UIMA and DKPro Core (the project to which the link points).