Unsupervised named entity recognition (NER) with a custom controlled vocabulary for crosslink suggestions in Java

2

6

I'm looking for a Java library that can do named entity recognition (NER) against a custom controlled vocabulary, without needing labeled training data first. I searched on SE, but most of the questions there are rather unspecific.

Consider the following use-case:

  • an editor is inputting articles in a CMS (about 500 words).
  • the text may contain references (in plain text) to entities of a specific domain. e.g:
    • names of points of interest, like bars, restaurants, as well as neighborhoods, etc.
  • a controlled vocabulary of these entities exists (about 5,000 entities).
    • I imagine an entity to be a -tuple in the vocabulary
  • after finishing the text, the user should be able to save the document.
  • This triggers a workflow that scans the text against the vocabulary by comparing it with the names of the entities. A 100% match is not required: 97% Jaro-Winkler similarity, or whatever measure is appropriate (I'm not familiar with the algorithms NER uses), may be enough. I need this threshold to be configurable.
  • Hits are returned to the controller on the server side. The controller in turn returns JSON containing the matched entities to the client, where they are presented to the editor as suggested crosslinks.
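
To make the matching step concrete, here is a minimal sketch of what I have in mind, using only the JDK. The names `CrosslinkSuggester`, `Suggestion` and `suggest` are made up for illustration, and sliding a window of words over the text is just my assumption of how such a scan could work, not what any NER library actually does. It compares each window against each entity name with Jaro-Winkler similarity and keeps the hits at or above a configurable threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch only: fuzzy-matches word windows of a text against entity names. */
final class CrosslinkSuggester {

    /** Hypothetical result type: vocabulary name, matched text, similarity score. */
    static final class Suggestion {
        final String entity;
        final String matchedText;
        final double score;
        Suggestion(String entity, String matchedText, double score) {
            this.entity = entity;
            this.matchedText = matchedText;
            this.score = score;
        }
    }

    /** Jaro similarity in [0, 1]. */
    static double jaro(String a, String b) {
        if (a.equals(b)) return 1.0;
        int la = a.length(), lb = b.length();
        if (la == 0 || lb == 0) return 0.0;
        int window = Math.max(0, Math.max(la, lb) / 2 - 1);
        boolean[] matchedA = new boolean[la], matchedB = new boolean[lb];
        int matches = 0;
        for (int i = 0; i < la; i++) {
            int lo = Math.max(0, i - window), hi = Math.min(lb - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matchedB[j] && a.charAt(i) == b.charAt(j)) {
                    matchedA[i] = matchedB[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Matched characters that appear in a different order count as half-transpositions.
        int k = 0, halfTranspositions = 0;
        for (int i = 0; i < la; i++) {
            if (!matchedA[i]) continue;
            while (!matchedB[k]) k++;
            if (a.charAt(i) != b.charAt(k)) halfTranspositions++;
            k++;
        }
        double m = matches;
        return (m / la + m / lb + (m - halfTranspositions / 2.0) / m) / 3.0;
    }

    /** Jaro-Winkler: Jaro plus a bonus for a shared prefix of up to 4 characters. */
    static double jaroWinkler(String a, String b) {
        double j = jaro(a, b);
        int max = Math.min(4, Math.min(a.length(), b.length()));
        int prefix = 0;
        while (prefix < max && a.charAt(prefix) == b.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    /** Compares every n-word window of the text with every n-word entity name. */
    static List<Suggestion> suggest(String text, List<String> vocabulary, double threshold) {
        List<String> words = new ArrayList<>();
        for (String w : text.split("[^\\p{L}\\p{N}']+")) {
            if (!w.isEmpty()) words.add(w);
        }
        List<Suggestion> hits = new ArrayList<>();
        for (String name : vocabulary) {
            String[] nameWords = name.trim().split("\\s+");
            int n = nameWords.length;
            String target = String.join(" ", nameWords).toLowerCase(Locale.ROOT);
            for (int i = 0; i + n <= words.size(); i++) {
                String candidate = String.join(" ", words.subList(i, i + n));
                double score = jaroWinkler(candidate.toLowerCase(Locale.ROOT), target);
                if (score >= threshold) hits.add(new Suggestion(name, candidate, score));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> vocabulary = List.of("Cafe Americain", "Vondelpark", "Jordaan");
        String text = "We had dinner at Cafe Amercain near the Vondelpark.";
        for (Suggestion s : suggest(text, vocabulary, 0.9)) {
            System.out.println(s.entity + " <- \"" + s.matchedText + "\" (" + s.score + ")");
        }
    }
}
```

With about 5,000 entities and 500 words per article this brute-force loop is a few million comparisons per save, which is probably acceptable, but it ignores hyphenated names, overlapping hits and ranking, which is why I would rather reuse an existing library.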

Ideally, I'm looking for a project that uses NER to suggest crosslinks within a CMS environment, so that I can piggyback on it. I'm sure plugins like this exist for WordPress, for example, but I'm not sure whether something similar exists in Java.

Any more general pointers to NER libraries that work with custom controlled vocabularies are welcome as well.

Mozzarella answered 5/10, 2011 at 15:2 Comment(0)
3

For people looking this up in the future:

LingPipe's "Approximate Dictionary-Based Chunking" covers this; see the named entity tutorial: http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html

(URL edited.)

Mozzarella answered 6/12, 2011 at 22:22 Comment(0)
1

I'm not sure whether these will be helpful:

http://www-nlp.stanford.edu/software/CRF-NER.shtml
http://cogcomp.cs.illinois.edu/page/software

Shay answered 3/12, 2011 at 9:56 Comment(1)
As far as I can tell, that will only recognize the names of famous or well-known people. "George Washington" shows up as a person, but my name did not. - Spider
