Software to extract word functions like subject, predicate, object, etc.

Asked 18/5, 2015 at 17:10 Answered 11/6, 2020 at 23:57

I need to extract the relations between the words in a sentence. I'm mostly interested in identifying the subject, predicate and object. For example, for the following sentence:

She gave him a pen

I'd like to have:

She_subject gave_predicate him a pen_object.

Can Stanford NLP do that? I've tried their relation annotator, but it didn't work as I expected. Is there other software that can produce this result?

Origen answered 18/5, 2015 at 17:10 Comment(0)

According to http://nlp.stanford.edu/software/lex-parser.shtml, Stanford NLP does have a parser which can identify the subject and predicate of a sentence. You can try it out online at http://nlp.stanford.edu:8080/parser/index.jsp. You can use the typed dependencies to identify the subject, predicate, and object.

From the example page, the sentence My dog also likes eating sausage will give you this parse:

(ROOT
  (S
    (NP (PRP$ My) (NN dog))
    (ADVP (RB also))
    (VP (VBZ likes)
      (S
        (VP (VBG eating)
          (NP (NN sausage)))))
    (. .)))

The parser can also generate dependencies:

poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)

The nsubj dependency links the main predicate to its subject: in this case, likes and dog. The numbers give the position of each word in the sentence, counting from one because index 0 is reserved for the artificial ROOT node. The dobj dependency links a verb to its direct object; here it attaches sausage to eating rather than to likes. The xcomp dependency marks eating as a clausal complement of likes, so what the dog likes is the whole phrase eating sausage.
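If you only have the parser's text output, a rough way to pull these relations out is to parse the dependency lines yourself. This is just a sketch in Python, not the Stanford API: the names parse_dependencies and by_relation are made up for illustration, and the regex assumes the "rel(head-N, dependent-M)" format shown above.

```python
import re

# Matches one typed-dependency line such as "nsubj(likes-4, dog-2)".
# Assumption: the output uses exactly this "rel(head-N, dep-M)" layout.
DEP_RE = re.compile(r"^([\w:]+)\((\S+)-(\d+), (\S+)-(\d+)\)$")


def parse_dependencies(text):
    """Return a list of (relation, (head, index), (dependent, index)) tuples."""
    triples = []
    for line in text.strip().splitlines():
        match = DEP_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a dependency line
        rel, head, head_idx, dep, dep_idx = match.groups()
        triples.append((rel, (head, int(head_idx)), (dep, int(dep_idx))))
    return triples


def by_relation(triples, relation):
    """Return (head_word, dependent_word) pairs for one relation name."""
    return [(head[0], dep[0]) for rel, head, dep in triples if rel == relation]


# The dependency output quoted above.
output = """
poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)
"""

triples = parse_dependencies(output)
print(by_relation(triples, "root"))   # main predicate, attached to ROOT
print(by_relation(triples, "nsubj"))  # (predicate, subject)
print(by_relation(triples, "dobj"))   # (verb, direct object)
```

Note that the dobj pair comes back with eating as its head, so to reach the object from the main predicate you would follow xcomp first.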

This also works when the predicate is not a verb: My dog is large and in charge gives:

poss(dog-2, My-1)
nsubj(large-4, dog-2)
cop(large-4, is-3)
root(ROOT-0, large-4)
cc(large-4, and-5)
conj(large-4, in-6)
pobj(in-6, charge-7)

This tells us that large is the main predicate (nsubj(large-4, dog-2)), but there was a copula (cop(large-4, is-3)), as well as a conjunction and a preposition with an object.
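To make that reading concrete, here is a small Python sketch that starts from the root relation and looks up what hangs off the main predicate. The tuples are copied by hand from the output above, and describe_predicate is a made-up helper, not something the parser provides.

```python
# Dependencies copied from the parse above, as (relation, head, dependent).
deps = [
    ("poss", "dog-2", "My-1"),
    ("nsubj", "large-4", "dog-2"),
    ("cop", "large-4", "is-3"),
    ("root", "ROOT-0", "large-4"),
    ("cc", "large-4", "and-5"),
    ("conj", "large-4", "in-6"),
    ("pobj", "in-6", "charge-7"),
]


def describe_predicate(deps):
    """Summarise the main predicate: its word, subject, copula and conjuncts."""
    # The main predicate is whatever the root relation points at.
    root = next(dep for rel, head, dep in deps if rel == "root")

    def dependents(relation):
        # Dependents of the main predicate with the given relation name.
        return [dep for rel, head, dep in deps if rel == relation and head == root]

    subjects = dependents("nsubj")
    copulas = dependents("cop")
    return {
        "predicate": root,
        "subject": subjects[0] if subjects else None,
        "copula": copulas[0] if copulas else None,  # None when the predicate is a plain verb
        "conjuncts": dependents("conj"),
    }


summary = describe_predicate(deps)
print(summary)
```

A non-empty "copula" entry is the hint that the predicate is an adjective or noun joined to the subject by a form of "to be", rather than an ordinary verb.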

I'm not familiar with the API, so I can't give exact code. Perhaps someone else who knows the API can do that. The parser is documented at the Stanford NLP doc site. You might also find the answer to Tools for text simplification (Java) helpful. There's more information about the dependency format in The Stanford Dependency Manual.

Statant answered 18/5, 2015 at 17:38 Comment(7)

No, I'm not looking for part of speech tagging. I already have POS tagging with Stanford NLP. But POS tagging marks verbs, nouns, adjectives etc, but doesn't distinguish, for example, which are subject and which are object. – Origen 18/5, 2015 at 17:40

@Maximus That'll teach me to write answers before I wake up. Edited. – Statant 18/5, 2015 at 17:53

Thanks, I've tried the link, but I can't seem to find the predicate or subject. Can you please paste code into your question and pinpoint where it shows me the predicate? – Origen 18/5, 2015 at 19:20

@Maximus I added some commentary. I hope it's helpful. Unfortunately I'm not familiar with the Stanford NLP API, so this is about as much help as I can give; if it doesn't answer your question, hopefully someone more knowledgeable than me will post a better answer. – Statant 18/5, 2015 at 20:0

Thanks a lot, I'll try to understand the syntax. Best! – Origen 19/5, 2015 at 7:28

But what happens if the subject is a phrase, like "The boy and girl love cakes"? From the example it looks like the relationship is only between two words, and not between two subtrees (of the parsed tree) – Arrowy 13/4, 2018 at 9:23

@NadavB I don't know, but I can't imagine it would be useful if it couldn't handle conjunctions. The documentation I linked has some quick getting-started examples so you could always modify one of those to try it out yourself. – Statant 13/4, 2018 at 18:18
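On the subtree question: each dependency does relate only two words, but you can recover a whole phrase by collecting everything that hangs below the subject head. A Python sketch of that idea follows. Big caveat: the edges are written by hand in the Stanford Dependencies style as a guess at what a parse of "The boy and girl love cakes" might look like; they are not real parser output, and subtree and phrase are made-up helper names.

```python
from collections import defaultdict

# ASSUMPTION: hand-written edges, not actual parser output.
# Format: (relation, head, dependent), words tagged with their 1-based position.
deps = [
    ("det", "boy-2", "The-1"),
    ("nsubj", "love-5", "boy-2"),
    ("cc", "boy-2", "and-3"),
    ("conj", "boy-2", "girl-4"),
    ("root", "ROOT-0", "love-5"),
    ("dobj", "love-5", "cakes-6"),
]


def subtree(deps, node):
    """Return node plus all of its descendants, ordered by sentence position."""
    children = defaultdict(list)
    for _rel, head, dep in deps:
        children[head].append(dep)
    found, stack = [], [node]
    while stack:  # depth-first walk down the dependency edges
        current = stack.pop()
        found.append(current)
        stack.extend(children[current])
    return sorted(found, key=lambda tok: int(tok.rsplit("-", 1)[1]))


def phrase(tokens):
    """Strip the position suffixes and join the words back together."""
    return " ".join(tok.rsplit("-", 1)[0] for tok in tokens)


subject_head = next(dep for rel, head, dep in deps if rel == "nsubj")
print(phrase(subtree(deps, subject_head)))  # The boy and girl
```

So nsubj only names the head word (boy), and the conj and det edges below it supply the rest of the subject phrase.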

Stanford parser can do it :) You need to look at the dependency parser though. Have a look at the bottom of this page: http://nlp.stanford.edu/software/lex-parser.shtml:

 subject: nsubj(snapped, rain), 
 or direct object: dobj(shut, hub))
 ...

Or have a look at this page (Stanford Dependencies): http://nlp.stanford.edu/software/stanford-dependencies.shtml

And to understand the annotations have a look at this: http://nlp.stanford.edu/software/dependencies_manual.pdf

Dockage answered 19/5, 2015 at 2:2 Comment(0)

I prefer to use spaCy for this. Its displaCy visualizer draws the dependency tree for a sentence, and you can try it on the official website:

Website for Displacy Demo

There you can see that the subject word has the dependency "nsubj" (nominal subject), and the predicate is the word whose dependency is "ROOT", meaning it does not depend on any other word.
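In code, the same idea might look roughly like the Python sketch below. spaCy tokens expose .text and .dep_, and the function only relies on those two attributes. To keep the sketch runnable without spaCy installed, it is demonstrated on stand-in tokens whose labels I assigned by hand for the question's sentence (the "dative" label for "him" in particular is my assumption); subject_predicate_object is a made-up name.

```python
from types import SimpleNamespace


def subject_predicate_object(tokens):
    """Pick out the ROOT word, the first nsubj and the first dobj.

    Works on any iterable of objects with .text and .dep_ attributes,
    which is the shape of the tokens in a spaCy Doc.
    """
    result = {"subject": None, "predicate": None, "object": None}
    for tok in tokens:
        if tok.dep_ == "ROOT":
            result["predicate"] = tok.text
        elif tok.dep_ == "nsubj" and result["subject"] is None:
            result["subject"] = tok.text
        elif tok.dep_ == "dobj" and result["object"] is None:
            result["object"] = tok.text
    return result


# ASSUMPTION: stand-in tokens with hand-assigned labels, not real model output.
tokens = [
    SimpleNamespace(text="She", dep_="nsubj"),
    SimpleNamespace(text="gave", dep_="ROOT"),
    SimpleNamespace(text="him", dep_="dative"),
    SimpleNamespace(text="a", dep_="det"),
    SimpleNamespace(text="pen", dep_="dobj"),
]

result = subject_predicate_object(tokens)
print(result)
```

With spaCy and an English pipeline such as en_core_web_sm installed, you should be able to pass spacy.load("en_core_web_sm")("She gave him a pen") to the same function, though the exact labels depend on the model you load.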

Orian answered 11/6, 2020 at 23:57 Comment(0)
