How to extract the verbs and all corresponding adverbs from a text?

I am using n-grams in Python (NLTK) to find the verbs and their corresponding adverbs in an input text. What I have done so far:

Input text: "He is talking weirdly. A horse can run fast. A big tree is there. The sun is beautiful. The place is well decorated.They are talking weirdly. She runs fast. She is talking greatly.Jack runs slow."

Code:

`finder2 = BigramCollocationFinder.from_words(wrd for (wrd,tags) in posTagged if tags in('VBG','RB','VBN',))
scored = finder2.score_ngrams(bigram_measures.raw_freq)
print sorted(finder2.nbest(bigram_measures.raw_freq, 5))`

From my code, I got the output: [('talking', 'greatly'), ('talking', 'weirdly'), ('weirdly', 'talking'),('runs','fast'),('runs','slow')] which is the list of verbs and their corresponding adverbs.

What I am looking for:

I want to group each verb with all of its corresponding adverbs. For example: ('talking' - 'greatly', 'weirdly'), ('runs' - 'fast', 'slow'), etc.

Snuffbox answered 27/1, 2016 at 6:10 Comment(1)
If you are looking for verbs and adverbs, why is your post title about subjects and objects? Please edit the question so you can get sensible answers.Inequitable
I
1

You already have a list of all verb-adverb bigrams, so you're just asking how to consolidate them into a dictionary that gives all adverbs for each verb. But first let's re-create your bigrams in a more direct way:

import nltk

pairs = list()
for (w1, tag1), (w2, tag2) in nltk.bigrams(posTagged):
    # keep a pair only when a verb (any VB* tag) is immediately followed by an adverb
    if tag1.startswith("VB") and tag2 == "RB":
        pairs.append((w1, w2))

Now for your question: We'll build a dictionary with the adverbs that follow each verb. I'll store the adverbs in a set, not a list, to get rid of duplications.

from collections import defaultdict
consolidated = defaultdict(set)
for verb, adverb in pairs:
    consolidated[verb].add(adverb)

The defaultdict provides an empty set for verbs that haven't been seen before, so we don't need to check by hand.
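To see the two steps working together without running a tagger, here is a minimal sketch. The tagged tokens are written by hand and only stand in for what `nltk.pos_tag` might return (the tags are an assumption, not real tagger output), and `zip` is used in place of `nltk.bigrams` so the snippet has no dependencies.

```python
from collections import defaultdict

# Hand-written (word, tag) pairs standing in for POS-tagger output (assumed tags).
pos_tagged = [
    ("He", "PRP"), ("is", "VBZ"), ("talking", "VBG"), ("weirdly", "RB"), (".", "."),
    ("She", "PRP"), ("runs", "VBZ"), ("fast", "RB"), (".", "."),
    ("She", "PRP"), ("is", "VBZ"), ("talking", "VBG"), ("greatly", "RB"), (".", "."),
    ("Jack", "NNP"), ("runs", "VBZ"), ("slow", "RB"), (".", "."),
]

# Adjacent token pairs; zip(seq, seq[1:]) yields the same pairs as nltk.bigrams(seq).
pairs = [
    (w1, w2)
    for (w1, tag1), (w2, tag2) in zip(pos_tagged, pos_tagged[1:])
    if tag1.startswith("VB") and tag2 == "RB"
]

# Group the adverbs under each verb; a set drops repeated adverbs.
consolidated = defaultdict(set)
for verb, adverb in pairs:
    consolidated[verb].add(adverb)

for verb in sorted(consolidated):
    print(verb, sorted(consolidated[verb]))
```

Because the tags are still attached when the pairs are built, an adverb-then-verb sequence such as ('weirdly', 'talking') never makes it into the result.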

Depending on the details of your assignment, you might also want to case-fold and lemmatize your verbs so that the adverbs from "Driving recklessly" and "I drove carefully" are recorded together:

wnl = nltk.stem.WordNetLemmatizer()
...
for verb, adverb in pairs:
    verb = wnl.lemmatize(verb.lower(), "v")
    consolidated[verb].add(adverb)
Inequitable answered 31/1, 2016 at 10:59 Comment(3)
Thank you very much for the help. It helped me a lot.Snuffbox
do you know how to represent these outputs in a matrix form? A row will be (verb, adv1, adv2,adv3) like this.Snuffbox
Just print each key, value in the dictionary as key, list(value) (since the value is a set).Inequitable
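A possible sketch of that row layout, assuming the dictionary looks like the one built in the answer (the sample contents here are written by hand for illustration). The adverbs are sorted so that the row order is stable, since sets have no fixed order.

```python
# Sample data in the same shape as the consolidated dictionary (hand-written).
consolidated = {"talking": {"weirdly", "greatly"}, "runs": {"fast", "slow"}}

# One row per verb: the verb first, then its adverbs in alphabetical order.
rows = [[verb] + sorted(adverbs) for verb, adverbs in sorted(consolidated.items())]

for row in rows:
    print("\t".join(row))
```

The rows will have different lengths when verbs have different numbers of adverbs, so this is a ragged list of lists rather than a true rectangular matrix.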
D
-1

I think you are losing information you will need for this. You need to retain the part-of-speech data somehow, so that bigrams like ('weirdly', 'talking') can be processed in the correct manner.

It may be that the bigram finder can accept the tagged word tuples (I'm not familiar with nltk). Or, you may have to resort to creating an external index. If so, something like this might work:

part_of_speech = {word:tag for word,tag in posTagged}
best_bigrams = finder2.nbest(... as you like it ...)

verb_first_bigrams = [b if part_of_speech[b[1]] == 'RB' else (b[1],b[0]) for b in best_bigrams]

Then, with the verbs in front, you can transform it into a dictionary or list-of-lists or whatever:

adverbs_for = {}
for verb,adverb in verb_first_bigrams:
    if verb not in adverbs_for:
        adverbs_for[verb] = [adverb]
    else:
        adverbs_for[verb].append(adverb)
Dinge answered 27/1, 2016 at 7:8 Comment(12)
That's a fail if the same word occurs as verb and noun: "Don't just talk the talk"Inequitable
I think there is something wrong in that code:

finder2 = BigramCollocationFinder.from_words(wrd for (wrd,tag) in posTagged if tag in ('VB','VBN','RB','VBD'))
best_bigrams = finder2.nbest(bigram_measures.raw_freq, 10)
verb_first_bigrams = [b if pos[b[1]] == 'RB' else (b[1],b[0]) for b in best_bigrams]
print verb_first_bigrams
adverbs_for = {}
for verb,adverb in verb_first_bigrams:
    if verb not in adverbs_for:
        adverbs_for[verb] = [adverb]
    else:
        adverbs_for[verb].append(adverb)
print adverbs_for

Snuffbox
The output of this code looks like this: {'talking': ['weirdly', 'decorated'], 'decorated': ['well'], 'run': ['well', 'weirdly']}. This is not correct.Snuffbox
Why is that not correct? The verbs are keys to the dictionary, the adverbs are in lists for each corresponding verb. You should be able to do whatever you want with them now, no?Dinge
@Inequitable True, but in this case the OP is filtering only verbs and adverbs. For the general case, it would probably be better to keep the tag data as long as possible. But we don't know what problem he's actually solving...Dinge
@Austin, look here: {talking: weirdly, decorated}. 'decorated' is not an adverb, so this is wrong. You are absolutely right that the verbs are the keys and the adverbs are in lists for each corresponding verb. But that is not what happens here; note that the paired words come from different sentences.Snuffbox
And {run: weirdly} should not be possible, but it appears in the output. If keeping the tags makes this easier, that is no problem.Snuffbox
@SOUBHIKRAKSHIT I think the problem lies here: finder2 = BigramCollocationFinder.from_words(wrd for (wrd,tags) in posTagged if tags in('VBG','RB','VBN',)) If you have a bunch of text, and you are telling the filter "just pull out the adverbs and verbs" then you'll get a list of adverbs and verbs, somewhat randomly. I think you need to separate the text into sentences, first.Dinge
@SOUBHIK, did you try it? You miss the point: With the input "talk/VB the/D talk/NN", the noun talk will overwrite the verb talk in the part_of_speech dictionary, and you'll never see it again. It won't be there for the filter to select. Your approach is incorrect.Inequitable
@Inequitable Good point, really. But another issue is that in sentences like "he runs fast", fast is tagged JJ even though it is an adverb. Let's treat this as a first step, then figure out a way to get rid of these ambiguities. If you suggest any other way, I will definitely work on that.Snuffbox
@Austin Thanks for your suggestions. I am trying to find a way with it.Snuffbox
@SOUBHIK, dealing with incorrect (or inconvenient) POS tags is on a whole different level from only expecting one POS tag per word. This answer should be fixed to deal with more of them.Inequitable
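The overwrite problem raised in these comments can be reproduced in a few lines. This is only an illustration with hand-written tags for "talk the talk" (the tags are assumed, not tagger output): a word-to-tag dictionary keeps one tag per word, while filtering the tagged tuples directly keeps every occurrence.

```python
# Hand-tagged tokens for "talk the talk" (assumed tags, for illustration only).
pos_tagged = [("talk", "VB"), ("the", "DT"), ("talk", "NN")]

# A word -> tag index keeps only the LAST tag seen for each word.
part_of_speech = {word: tag for word, tag in pos_tagged}
print(part_of_speech["talk"])  # NN: the verb reading has been overwritten

# Filtering the (word, tag) tuples themselves keeps the verb occurrence.
verbs = [(word, tag) for word, tag in pos_tagged if tag.startswith("VB")]
print(verbs)  # [('talk', 'VB')]
```

This is why keeping the tags attached to each token for as long as possible is the safer design when the same word can appear with more than one part of speech.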
