Disease named entity recognition
P

5

6

I have a bunch of text documents that describe diseases. Those documents are in most cases quite short and often only contain a single sentence. An example is given here:

Primary pulmonary hypertension is a progressive disease in which widespread occlusion of the smallest pulmonary arteries leads to increased pulmonary vascular resistance, and subsequently right ventricular failure.

What I need is a tool that finds all disease terms (e.g. "pulmonary hypertension" in this case) in the sentences and maps them to a controlled vocabulary like MeSH.

Thanks in advance for your answers!

Pampa answered 25/9, 2012 at 8:15 Comment(2)
That sounds very specific and not a programming problem per se. At least not as expressed here.Paulettapaulette
Seems this is more of a data mining question?Volcano
R
6

Here are two pipelines that are specifically designed for medical document parsing:

Both use the Unified Medical Language System (UMLS), so you need a (free) UMLS license. Both are written in Java and are more or less easy to set up.

Radiate answered 14/5, 2013 at 3:8 Comment(1)
I'm not sure I'd classify them as "easy to set up" but they do work rather well. A new version of MetaMap was released late last year as well.Atrophied
V
2

See http://www.ebi.ac.uk/webservices/whatizit/info.jsf

Whatizit is a text processing system that allows you to do textmining tasks on text. The tasks come defined by the pipelines in the drop down list of the above window and the text can be pasted in the text area.

You could also ask biostars: http://www.biostars.org/show/questions/

Virginavirginal answered 25/9, 2012 at 14:56 Comment(0)
N
2

There are many tools for that. Some popular ones:

Most of them come with predefined models, i.e. they have already been trained on general datasets (news articles, etc.). Your texts are quite domain-specific, however, so you may first want to build an annotated corpus and re-train one of those tools to adapt it to your data.

More simply, as a first test, you can try a dictionary-based approach: compile a list of entity names and match it against the text, either exactly or approximately. This operation is described, for instance, in LingPipe's tutorial.
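
To give an idea of what such a dictionary-based first test could look like, here is a minimal sketch in plain Python (standard library only, not LingPipe). The lexicon, the `EXAMPLE:` identifiers, the function name `find_diseases` and the 0.9 similarity threshold are all assumptions made for illustration; in practice you would load the terms and their real identifiers from MeSH or a similar vocabulary.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lexicon: disease term -> vocabulary identifier.
# The identifiers are placeholders, NOT real MeSH IDs.
LEXICON = {
    "pulmonary hypertension": "EXAMPLE:0001",
    "right ventricular failure": "EXAMPLE:0002",
    "heart failure": "EXAMPLE:0003",
}


def tokenize(text):
    """Return (lowercased token, start offset, end offset) triples."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"[A-Za-z0-9]+", text)]


def find_diseases(text, lexicon=LEXICON, threshold=0.9):
    """Greedy longest-match lookup with exact or approximate matching.

    A window of n tokens is only compared with lexicon terms that also
    have n tokens, so a trailing word cannot be absorbed into a match.
    """
    # Group lexicon terms by their number of tokens.
    by_len = {}
    for term, concept_id in lexicon.items():
        by_len.setdefault(len(term.split()), []).append((term, concept_id))

    tokens = tokenize(text)
    max_len = max(by_len)
    hits = []
    i = 0
    while i < len(tokens):
        best = None
        # Try the longest window first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tok for tok, _, _ in tokens[i:i + n])
            for term, concept_id in by_len.get(n, []):
                score = SequenceMatcher(None, candidate, term).ratio()
                if score >= threshold and (best is None or score > best[0]):
                    best = (score, n, term, concept_id)
            if best:
                break
        if best:
            score, n, term, concept_id = best
            start, end = tokens[i][1], tokens[i + n - 1][2]
            hits.append((text[start:end], term, concept_id, round(score, 2)))
            i += n  # skip past the matched span
        else:
            i += 1
    return hits


if __name__ == "__main__":
    sentence = ("Primary pulmonary hypertension is a progressive disease in "
                "which widespread occlusion of the smallest pulmonary arteries "
                "leads to increased pulmonary vascular resistance, and "
                "subsequently right ventricular failure.")
    for hit in find_diseases(sentence):
        print(hit)
```

Each hit is a tuple of the matched text, the lexicon term, its identifier and the similarity score. Character-level similarity is a crude stand-in for real approximate matching, and a lexicon-only approach will miss synonyms and abbreviations that are not listed, so treat this as a baseline before moving to one of the trained tools.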

Nesselrode answered 4/5, 2013 at 20:34 Comment(0)
K
0

Open Targets has a module for this as part of LINK. It's not meant to be used directly, so it might require some hacking and tinkering, but it's the most complete medical NER (named entity recognition) tool I've found for Python. For more info, read their blog post.

Kielce answered 6/4, 2018 at 8:37 Comment(0)
P
0

MER is a bash script whose example uses a lexicon generated from the Disease Ontology: https://github.com/lasigeBioTM/MER

Pastorate answered 28/4, 2018 at 16:25 Comment(1)
Links are fantastic, but they should never be the only piece of information in your answer. meta.stackexchange.com/questions/8231/…Lam
