Existing API for NLP in C++?
N

6

16

Are there any existing NLP APIs for C++? The closest thing I have found is CLucene, a port of Lucene. However, it seems somewhat obsolete, and its documentation is far from complete.

Ideally, such an API would support tokenization, stemming, and PoS tagging.

Nabila answered 19/3, 2014 at 13:36 Comment(3)
possible duplicate of c/c++ NLP libraryRosarosabel
@Rosarosabel this question is four years old and doesn't have an adequate answer, since it focuses on PoS taggers.Nabila
I think the OP meant "Part of Speech" for PoSSneaker
R
11

Freeling is written in C++ too, although most people just use their binaries to run the tools: http://devel.cpl.upc.edu/freeling/downloads?order=time&desc=1

You could try DyNet. It is a general-purpose neural network framework, but much of it is geared toward NLP because its maintainers come from the NLP research community.

Or perhaps Marian-NMT. It was designed for sequence-to-sequence machine translation, but many NLP tasks can be framed as sequence-to-sequence tasks.


Outdated

Maybe you can try Ellogon (http://www.ellogon.org/). It has GUI support and also offers a C/C++ API for NLP.

Rosarosabel answered 26/3, 2014 at 20:4 Comment(2)
Both seem interesting, I'll give them a look.Nabila
Both seem to be dead projects. Ellogon for sure is.Fawnia
R
6

If you drop the C++ restriction, you get NLTK (Python), which is a perfect fit.

The remaining effort is then interfacing between Python and C++.

Restivo answered 2/4, 2014 at 6:19 Comment(2)
Nah, we know NLTK. A Cython or C/C++ port would dramatically improve its processing speed on realistic big data.Rosarosabel
NLTK is a toy and an educational system (and it was designed as one), not a practical solution.Fawnia
C
3

Apache Lucy would get you part of the way there. It is under active development.

Change answered 21/3, 2014 at 21:2 Comment(1)
It's a good start, but since it is a search engine, there are a lot of features I don't need. Also, its NLP capability is a bit limited. I'll keep an eye on it, though.Nabila
E
3

Maybe you can use Weka-C++. It's the very popular Weka library for machine learning and data mining (including NLP) ported from Java to C++.

Weka supports tokenization and stemming, but you'll probably need to train a classifier for PoS tagging.

I have only used Weka with Java though, so I'm afraid I can't give you more details on the C++ version.
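
To make the tokenization and stemming steps concrete, here is a minimal sketch in plain standard-library C++. It does not use Weka's API or any other library mentioned here. The function names `tokenize` and `naive_stem` are made up for illustration, and the suffix list is a toy assumption rather than a real stemming algorithm such as Porter's.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split text into lowercase tokens made of ASCII letters and digits.
// Any other character (spaces, punctuation) acts as a separator.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current.push_back(static_cast<char>(std::tolower(ch)));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}

// Toy stemmer (an assumption for illustration, NOT the Porter algorithm):
// strip the first matching suffix, but only if at least three
// characters of the word remain afterwards.
std::string naive_stem(const std::string& word) {
    static const std::vector<std::string> suffixes = {"ing", "ed", "es", "s"};
    for (const auto& suffix : suffixes) {
        if (word.size() > suffix.size() + 2 &&
            word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0) {
            return word.substr(0, word.size() - suffix.size());
        }
    }
    return word;
}
```

Calling `tokenize` on a sentence and passing each token to `naive_stem` turns "parsers" into "parser", for example. A real stemmer handles far more cases, and PoS tagging would still need a trained model on top of this.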

Eulogist answered 2/4, 2014 at 14:2 Comment(1)
While Weka itself is quite good, Weka C++ seems to be a dead project (last commit in 2007).Nabila
O
1

There is TurboParser by André Martins at CMU, which also has a Python wrapper. There is an online demo for it.

Oldfangled answered 12/5, 2016 at 0:27 Comment(0)
A
1

This project provides free (even for commercial use) state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection as well as tools for training custom extractors and relation detectors.

MITIE is built on top of dlib, a high-performance machine-learning library. It makes use of several state-of-the-art techniques, including distributional word embeddings and Structural Support Vector Machines. MITIE offers several pre-trained models providing varying levels of support for both English and Spanish, trained using a variety of linguistic resources (e.g., CoNLL 2003, ACE, Wikipedia, Freebase, and Gigaword). The core MITIE software is written in C++, but bindings for several other languages, including Python, R, Java, C, and MATLAB, allow a user to quickly integrate MITIE into his/her own applications.

https://github.com/mit-nlp/MITIE

Airedale answered 25/7, 2016 at 15:9 Comment(0)
