I'm trying to parse the verbs in a corpus, store them in dictionaries, and count how many times each verb appears as transitive, intransitive, or ditransitive. How can I use spaCy to go through the verbs and label each occurrence as transitive, intransitive, or ditransitive?
How To Parse Verbs Using Spacy
Welcome to StackOverflow! Please read the info about how to ask a good question and how to give a reproducible example. This will make it much easier for others to help you. – Flap
Here, I summarize the code from Mirith/Verb-categorizer. Basically, you can loop through the VERB tokens and look at their children in the dependency parse to classify each one as transitive, intransitive, or ditransitive. An example is as follows.
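First, import spacy and load an English model:
import spacy
# The 'en' shortcut was removed in spaCy v3; load a model by its full name.
# Install it first with: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')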
Suppose you have parsed some example text into tokens:
tokens = nlp('I like this dog. It is pretty good. I saw a bird. We arrived at the classroom door with only seven seconds to spare.')
You can create the following function to map each VERB token to a more specific type:
def check_verb(token):
    """Check verb type given spacy token"""
    if token.pos_ == 'VERB':
        indirect_object = False
        direct_object = False
        for item in token.children:
            # spaCy's English models label an indirect object as 'dative',
            # so it is counted here rather than with the direct objects.
            if item.dep_ in ("iobj", "dative", "pobj"):
                indirect_object = True
            if item.dep_ == "dobj":
                direct_object = True
        if indirect_object and direct_object:
            return 'DITRANVERB'
        elif direct_object and not indirect_object:
            return 'TRANVERB'
        elif not direct_object and not indirect_object:
            return 'INTRANVERB'
        else:
            # Indirect object without a direct object: leave unclassified.
            return 'VERB'
    else:
        # Non-verbs keep their coarse part-of-speech tag.
        return token.pos_
Example
[check_verb(t) for t in tokens]  # ['PRON', 'TRANVERB', 'DET', 'NOUN', 'PUNCT', ...]
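The question also asks for per-verb counts stored in dictionaries, which the function above does not do by itself. Here is a minimal sketch of one way to tally them, assuming you feed it (lemma, label) pairs. The names `count_verb_types` and `VERB_LABELS` are my own and not part of spaCy or the Verb-categorizer repo, and the pairs are hand-written so the snippet runs without a model installed.

```python
from collections import Counter, defaultdict

# Labels returned by check_verb that we want to count (hypothetical constant).
VERB_LABELS = {"TRANVERB", "INTRANVERB", "DITRANVERB"}

def count_verb_types(pairs):
    """Tally how often each verb lemma occurs with each label."""
    counts = defaultdict(Counter)
    for lemma, label in pairs:
        if label in VERB_LABELS:
            counts[lemma][label] += 1
    return counts

# With spaCy, the pairs could come from the parsed document, e.g.:
#   pairs = ((t.lemma_, check_verb(t)) for t in tokens)
# Hand-written pairs are used here so the sketch runs on its own.
pairs = [
    ("like", "TRANVERB"),
    ("see", "TRANVERB"),
    ("arrive", "INTRANVERB"),
    ("see", "INTRANVERB"),
    ("I", "PRON"),  # non-verb labels are ignored
]
counts = count_verb_types(pairs)
print(dict(counts["see"]))  # {'TRANVERB': 1, 'INTRANVERB': 1}
```

Keying on the lemma rather than the surface form groups "saw", "sees" and "seen" under one entry. Tokens that come back with the plain 'VERB' label (an indirect object but no direct object) are skipped here; add that label to `VERB_LABELS` if you want them counted too.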
In the context of the question, this answer is totally right. But a warning to people just glancing at this code: you need a corpus to answer the question "is verb V transitive, intransitive, or ditransitive?" because just observing some number of times that V is used e.g. intransitively doesn't mean it can't also be used transitively. – Hertfordshire