Extracting (subject,predicate,object) from dependency tree [closed]

I'm interested in extracting triples (subject,predicate,object) from questions.

For example, I would like to transform the following question:

Who is the wife of the president of the USA?

to:

(x,isWifeOf,y) ∧ (y,isPresidentof,USA)

x and y are unknowns that we have to find in order to answer the question (∧ denotes conjunction).
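
To make the target format concrete, here is a minimal sketch of how the conjunction above could be held as data. The names `Var`, `Triple`, `query` and `unknowns` are hypothetical and chosen only for illustration; they do not come from any parser or library.

```python
from typing import NamedTuple


class Var(NamedTuple):
    """An unknown to be resolved, such as x or y (hypothetical helper type)."""
    name: str


class Triple(NamedTuple):
    """One (subject, predicate, object) statement (hypothetical helper type)."""
    subject: object
    predicate: str
    obj: object


x, y = Var("x"), Var("y")

# The conjunction is represented as a plain list: every triple must hold.
query = [
    Triple(x, "isWifeOf", y),
    Triple(y, "isPresidentof", "USA"),
]


def unknowns(triples):
    """Return the distinct variables of a query, in order of first appearance."""
    seen = []
    for t in triples:
        for term in (t.subject, t.obj):
            if isinstance(term, Var) and term not in seen:
                seen.append(term)
    return seen


print([v.name for v in unknowns(query)])  # prints ['x', 'y']
```

Keeping variables as a separate type from constants such as "USA" makes it easy to tell which terms still have to be looked up.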

I have read a lot of papers on this topic, and I would like to perform this task using an existing parser such as the Stanford parser. I know that parsers output two types of data:

  • parse structure tree (constituency relations)
  • dependency tree (dependency relations)

Some papers try to build triples from the parse structure tree (e.g., Triple Extraction from Sentences), but this approach seems too weak to deal with complicated questions.

On the other hand, dependency trees contain a lot of information relevant to triple extraction. Many papers claim to do this, but I could not find one that explicitly gives a detailed procedure or algorithm. Most of the time, the authors say that they analyze the dependencies to produce triples according to rules they do not give.

Does anyone know of a paper with more information on extracting (subject,predicate,object) from the dependency tree of a question?

Tynishatynwald answered 13/10, 2014 at 16:45 Comment(7)
This is certainly an interesting question, but it's not really on topic here at Stack Overflow. It's too broad, for one (as there are potentially lots and lots of answers), and requests for off-site resources, books, etc., are specifically off-topic: (from the close reasons): "Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it." - Cataclysm
All that said, how would you (programmatically) determine that "(x,isWifeOf,y) ∧ (y,isPresidentof,USA)" is the desired triplification? How do you determine which things should be constants and predicates? Why not (x,isFirstLadyOf,USA)? This will be especially important when you start handling genuinely n-ary relations. - Cataclysm
There is no canonical form for triples (many combinations are possible). In the above example, "(x,isWifeOf,y) ∧ (y,isPresidentof,USA)" is better than "(x,isFirstLadyOf,USA)" because a database (such as DBpedia) is more likely to contain entries for isPresidentof or isWifeOf than for isFirstLadyOf. As a first step, any correct triples would be fine... - Tynishatynwald
Yes, I agree; my point was that your question doesn't have the technical specifications for what triples you're trying to extract from the given text. That's part of the reason that it's too broad and open ended for Stack Overflow (but quite possibly a question for a forum built for discussion). - Cataclysm
@Tynishatynwald Hi, mate, have you found a proper strategy to extract the triplet? Did you end up using typed dependencies, the parse tree, or maybe both? - Alphard
You mentioned using either the parse tree OR the dependency parse. In my research I found that it's often useful to use BOTH. Here I describe my approach: ieeexplore.ieee.org/document/7489041/?tp=&arnumber=7489041 - Thermit
Update: although these are not new ideas, this task is addressed by a method called Semantic Role Labeling (SRL), an NLP task that aims to label the semantic role of each entity retrieved from text. It attempts to find "agents", "goals", "results", etc., which could be used to identify roles and fill triplets. Keep in mind that this is still an open field of research, pursued by projects such as FrameNet. For more information, read: en.wikipedia.org/wiki/Semantic_role_labeling and web.stanford.edu/~jurafsky/slp3/18.pdf - Electrophotography

Textacy has a decent implementation of triple extraction. It is built on top of spaCy, a popular NLP library for Python. You seem to be specifically interested in the underlying algorithm for triple extraction, so looking at the source code of its implementation may give you some inspiration. See here: https://textacy.readthedocs.io/en/stable/_modules/textacy/extract.html#subject_verb_object_triples
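
To give a feel for what a rule over a dependency tree can look like, here is a toy sketch. It is not Textacy's code and not a method from any paper. The parse of the example question is written by hand, with labels assumed to follow a Universal-Dependencies-like scheme (`nsubj`, `cop`, `nmod`, `case`); a real parser may label or attach things differently. The single rule only covers questions shaped like "Who is the X of the Y of Z?", and `Token`, `PARSE` and `extract_triples` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int    # 1-based position in the sentence
    text: str
    head: int   # idx of the head token, 0 for the root
    dep: str    # dependency label (assumed UD-like)
    pos: str    # coarse part-of-speech tag


# Hand-written parse of "Who is the wife of the president of the USA?"
# (an assumption, not the output of a real parser).
PARSE = [
    Token(1, "Who", 4, "nsubj", "PRON"),
    Token(2, "is", 4, "cop", "VERB"),
    Token(3, "the", 4, "det", "DET"),
    Token(4, "wife", 0, "root", "NOUN"),
    Token(5, "of", 7, "case", "ADP"),
    Token(6, "the", 7, "det", "DET"),
    Token(7, "president", 4, "nmod", "NOUN"),
    Token(8, "of", 10, "case", "ADP"),
    Token(9, "the", 10, "det", "DET"),
    Token(10, "USA", 7, "nmod", "PROPN"),
]


def children(tokens, head, dep):
    """Return the dependents of `head` that carry the label `dep`."""
    return [t for t in tokens if t.head == head.idx and t.dep == dep]


def extract_triples(tokens):
    """Follow the chain of "of"-modifiers from the root, one triple per link."""
    root = next(t for t in tokens if t.head == 0)
    variables = iter("xyzuvw")
    triples = []
    subject = next(variables)  # stands for the referent of the wh-word
    node = root
    while True:
        # Keep only nominal modifiers introduced by the preposition "of".
        modifiers = [
            m for m in children(tokens, node, "nmod")
            if any(c.text.lower() == "of" for c in children(tokens, m, "case"))
        ]
        if not modifiers:
            break
        mod = modifiers[0]
        # Proper nouns become constants; common nouns get a fresh variable.
        obj = mod.text if mod.pos == "PROPN" else next(variables)
        triples.append((subject, f"is{node.text.capitalize()}Of", obj))
        subject, node = obj, mod
    return triples


print(extract_triples(PARSE))
# prints [('x', 'isWifeOf', 'y'), ('y', 'isPresidentOf', 'USA')]
```

A real system would need many such rules (passives, relative clauses, verbs as predicates), which is probably why the papers rarely list them all.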

Simeon answered 30/10, 2020 at 21:17 Comment(0)
