NLP to find relationship between entities

My current understanding is that it's possible to extract entities from a text document using toolkits such as OpenNLP or Stanford NLP.

However, is there a way to find relationships between these entities?

For example, consider the following text:

"As some of you may know, I spent last week at CERN, the European high-energy physics laboratory where the famous Higgs boson was discovered last July. Every time I go to CERN I feel a deep sense of reverence. Apart from quick visits over the years, I was there for three months in the late 1990s as a visiting scientist, doing work on early Universe physics, trying to figure out how to connect the Universe we see today with what may have happened in its infancy."

Entities: I (author), CERN, Higgs boson

Relationships:
- I "visited" CERN
- CERN "discovered" Higgs boson

Thanks.

Determine answered 6/3, 2013 at 23:27 Comment(5)
You should look at entity linking and anaphora resolution. - Crippling
@2er0 - any good link or starting point for this? - Determine
Check out this conference: nist.gov/tac/2013/KBP/EntityLinking/index.html - Crippling
Before any entity linking, you have to look at Named Entity Recognition (nltk.googlecode.com/svn/trunk/doc/book/ch07.html) to know that something is an entity. Linking then tells you which mentions refer to the same entity. After that you need something else to do the relationship linking, possibly slot filling. - Crippling
Your task is rather big, so break it down into NER, then entity linking, then slot filling. Actually, the whole set of tasks could be a full knowledge base population task. =) Google more; I'm not an expert in this, but I did some work on it previously. - Crippling

You can extract verbs with their dependants using Stanford Parser, for example. E.g., you might get "dependency chains" like

"I :: spent :: at :: CERN". 
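
A minimal sketch of how such a chain could be assembled, assuming a dependency parse is already available. The parse here is hand-written for "I spent last week at CERN" rather than produced by Stanford Parser, and the relation labels and helper names (`children`, `dependency_chains`) are illustrative assumptions.

```python
# Hand-written toy parse of "I spent last week at CERN" (not real parser output).
# Each entry: (token index, word, index of head token, relation label); head -1 marks the root.
TOY_PARSE = [
    (0, "I", 1, "nsubj"),
    (1, "spent", -1, "root"),
    (2, "last", 3, "amod"),
    (3, "week", 1, "dobj"),
    (4, "at", 1, "prep"),
    (5, "CERN", 4, "pobj"),
]

def children(parse, head, label):
    """Return indices of tokens attached to `head` with the given relation label."""
    return [idx for idx, _, h, rel in parse if h == head and rel == label]

def dependency_chains(parse):
    """Build 'subject :: verb :: preposition :: object' chains around the root verb."""
    words = {idx: word for idx, word, _, _ in parse}
    root = next(idx for idx, _, h, _ in parse if h == -1)
    chains = []
    for subj in children(parse, root, "nsubj"):
        for prep in children(parse, root, "prep"):
            for obj in children(parse, prep, "pobj"):
                chains.append(" :: ".join([words[subj], words[root], words[prep], words[obj]]))
    return chains

print(dependency_chains(TOY_PARSE))  # ['I :: spent :: at :: CERN']
```

With a real parser, only the construction of `TOY_PARSE` would change; the traversal stays the same.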

It is much harder to recognise that "I spent at CERN", "I visited CERN", and "CERN hosted my visit" (etc.) denote the same kind of event. How to do this is beyond the scope of an SO question, but you can read up on the literature on paraphrase recognition (here is one overview paper). There is also a related question on SO.

Once you can cluster similar chains, you'd need to find a way to label them. You could simply choose the verb of the most common chain in a cluster.
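
The labelling step could look roughly like this. The sketch assumes the clustering has already been done by some other method, that entity mentions were replaced by the placeholders X and Y, and that the verb is always the second field of a chain. The cluster contents are made up for illustration.

```python
from collections import Counter

# Hypothetical clusters of chains, assumed to come from a paraphrase-clustering step not shown here.
clusters = [
    ["X :: visited :: Y", "X :: visited :: Y", "X :: spent :: at :: Y", "Y :: hosted :: X"],
    ["X :: discovered :: Y", "X :: found :: Y", "X :: discovered :: Y"],
]

def label_cluster(chains):
    """Label a cluster with the verb of its most frequent chain."""
    most_common_chain = Counter(chains).most_common(1)[0][0]
    # The verb is assumed to be the second field of the chain.
    return most_common_chain.split(" :: ")[1]

labels = [label_cluster(c) for c in clusters]
print(labels)  # ['visited', 'discovered']
```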

If, however, you have a pre-defined set of relation types you want to extract and lots of texts manually annotated for these relations, then the approach could be very different, e.g., using machine learning to learn how to recognize a relation type based on annotated data.
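
As a rough illustration of the supervised route, here is a small multinomial Naive Bayes classifier over the words that appear between two entities. The training examples, the relation names, and the choice of Naive Bayes are all assumptions made for the sketch. A real system would use far more annotated data and richer features.

```python
import math
from collections import Counter, defaultdict

# Hypothetical annotated examples: (words between the two entities, relation type).
TRAIN = [
    ("spent last week at", "visited"),
    ("was a visiting scientist at", "visited"),
    ("travelled to", "visited"),
    ("discovered the", "discovered"),
    ("announced the discovery of", "discovered"),
    ("first observed the", "discovered"),
]

def train(examples):
    """Count words per relation type for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the relation type with the highest add-one-smoothed log probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAIN)
print(predict(model, "was at"))
print(predict(model, "discovered"))
```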

Ahner answered 7/3, 2013 at 11:16 Comment(0)

Yes absolutely. This is called Relation Extraction. Stanford has developed several useful tools for working on this problem.

Here is their website: http://deepdive.stanford.edu/relation_extraction
Here is a Python wrapper on GitHub: https://github.com/philipperemy/Stanford-OpenIE-Python

In general, here is how the process works:

results = extract_entity_relations("Barack Obama was born in Hawaii.")
print(results)
# [['Barack Obama','was born in', 'Hawaii']]

Note that only triples of the form (subject, predicate, object) are extracted.
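
Once you have triples in that shape, a simple way to work with them is to group them by subject. This sketch does not call any extraction tool. The list is hard-coded, the second triple is made up for illustration, and `index_by_subject` is a hypothetical helper.

```python
from collections import defaultdict

# Triples in (subject, predicate, object) form; hard-coded here instead of coming from an extractor.
triples = [
    ["Barack Obama", "was born in", "Hawaii"],
    ["CERN", "discovered", "Higgs boson"],
]

def index_by_subject(triples):
    """Group (predicate, object) pairs under each subject entity."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)

graph = index_by_subject(triples)
print(graph["Barack Obama"])  # [('was born in', 'Hawaii')]
```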

Brewage answered 28/10, 2018 at 20:58 Comment(0)

I don't know if you're still interested, but CoreNLP has added a new annotator called OpenIE (Open Information Extraction), which should do what you're looking for. Check it out: OpenIE

Algar answered 24/3, 2017 at 4:44 Comment(0)

Similar to the Stanford parser, you can also use the Google Cloud Natural Language API, where you send a string and get a dependency tree in the response.

You can test this API first to see if it works well with your corpus: https://cloud.google.com/natural-language/

The goal is a subject-predicate-object (SPO) triple, where the predicate describes the relationship. The API does not return triples directly; you'll need to write a script that traverses the dependency graph and pulls them out.
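
A sketch of that traversal, under stated assumptions: the token list is hand-written for "CERN discovered the Higgs boson." and uses a simplified layout (`text`, `head`, `label`), not a captured API reply. The label names and the helpers `phrase` and `extract_spo` are illustrative.

```python
# Hand-written tokens for "CERN discovered the Higgs boson."
# The dict layout is a simplified assumption, not a captured API response.
tokens = [
    {"text": "CERN", "head": 1, "label": "NSUBJ"},
    {"text": "discovered", "head": 1, "label": "ROOT"},
    {"text": "the", "head": 4, "label": "DET"},
    {"text": "Higgs", "head": 4, "label": "NN"},
    {"text": "boson", "head": 1, "label": "DOBJ"},
    {"text": ".", "head": 1, "label": "P"},
]

def phrase(tokens, index, keep=("NN",)):
    """Join a token with its direct modifiers whose label is in `keep`, in sentence order."""
    parts = [i for i, t in enumerate(tokens) if t["head"] == index and t["label"] in keep]
    return " ".join(tokens[i]["text"] for i in sorted(parts + [index]))

def extract_spo(tokens):
    """Return (subject, predicate, object) around the root verb, or None if a part is missing."""
    root = next(i for i, t in enumerate(tokens) if t["label"] == "ROOT")
    subj = [i for i, t in enumerate(tokens) if t["head"] == root and t["label"] == "NSUBJ"]
    obj = [i for i, t in enumerate(tokens) if t["head"] == root and t["label"] == "DOBJ"]
    if not subj or not obj:
        return None
    return (phrase(tokens, subj[0]), tokens[root]["text"], phrase(tokens, obj[0]))

print(extract_spo(tokens))  # ('CERN', 'discovered', 'Higgs boson')
```

This only handles a plain subject-verb-object sentence. Prepositional objects, passives, and conjunctions would each need their own rules.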

Bice answered 31/1, 2018 at 19:41 Comment(0)

There are many ways to do relation extraction. As others have mentioned, you need to know about NER and coreference resolution first. Different techniques require different approaches. Nowadays, distant supervision is the most common; to detect relations between entities, it typically relies on an existing knowledge base such as Freebase.
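
The core idea of distant supervision can be sketched in a few lines: any sentence that mentions both entities of a known fact is treated as a (noisy) training example for that fact's relation. The `KB` dictionary here is a tiny stand-in for a knowledge base such as Freebase. The facts, sentences, and relation names are made up for illustration, and entity matching is plain substring search.

```python
# A tiny stand-in for a knowledge base; facts and sentences are made up for illustration.
KB = {
    ("Barack Obama", "Hawaii"): "place_of_birth",
    ("CERN", "Higgs boson"): "discovered",
}

SENTENCES = [
    "Barack Obama was born in Hawaii.",
    "The Higgs boson was discovered at CERN last July.",
    "Barack Obama visited CERN.",
]

def distant_labels(kb, sentences):
    """Label every sentence that mentions both entities of a KB fact with that fact's relation."""
    labelled = []
    for sentence in sentences:
        for (e1, e2), relation in kb.items():
            # Naive substring matching; a real system would use NER and entity linking.
            if e1 in sentence and e2 in sentence:
                labelled.append((sentence, e1, e2, relation))
    return labelled

for row in distant_labels(KB, SENTENCES):
    print(row)
```

The third sentence gets no label because no fact in `KB` links its two entities. The labelled rows would then feed a supervised classifier.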

Scholem answered 26/2, 2019 at 16:19 Comment(0)
