How to extract the verbs and all corresponding adverbs from a text?

Asked 27/1, 2016 at 6:10 Answered 31/1, 2016 at 10:59

Using n-grams in Python, my aim is to find verbs and their corresponding adverbs in an input text. What I have done:

Input text: "He is talking weirdly. A horse can run fast. A big tree is there. The sun is beautiful. The place is well decorated. They are talking weirdly. She runs fast. She is talking greatly. Jack runs slow."

Code:

`finder2 = BigramCollocationFinder.from_words(wrd for (wrd,tags) in posTagged if tags in('VBG','RB','VBN',))
scored = finder2.score_ngrams(bigram_measures.raw_freq)
print sorted(finder2.nbest(bigram_measures.raw_freq, 5))`

From my code, I got the output [('talking', 'greatly'), ('talking', 'weirdly'), ('weirdly', 'talking'), ('runs', 'fast'), ('runs', 'slow')], which is the list of verbs and their corresponding adverbs.

What I am looking for:

I want to find each verb together with all of its corresponding adverbs. For example: ('talking' - 'greatly', 'weirdly'), ('runs' - 'fast', 'slow'), etc.

Snuffbox answered 27/1, 2016 at 6:10 Comment(1)

If you are looking for verbs and adverbs, why is your post title about subjects and objects? Please edit the question so you can get sensible answers. – Inequitable 27/1, 2016 at 8:33

You already have a list of all verb-adverb bigrams, so you're just asking how to consolidate them into a dictionary that gives all adverbs for each verb. But first let's re-create your bigrams in a more direct way:

import nltk

pairs = []
for (w1, tag1), (w2, tag2) in nltk.bigrams(posTagged):
    # Keep a pair only when a verb (any VB* tag) is immediately followed by an adverb.
    if tag1.startswith("VB") and tag2 == "RB":
        pairs.append((w1, w2))
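
To try the adjacent-pair filter without running a tagger, here is a minimal sketch. The tags in `pos_tagged` are assigned by hand purely for illustration (a real tagger may disagree, for example by tagging "fast" as JJ), and `zip` stands in for `nltk.bigrams` so that the snippet has no dependencies.

```python
# Hand-tagged tokens: an assumption for illustration, not real tagger output.
pos_tagged = [
    ("He", "PRP"), ("is", "VBZ"), ("talking", "VBG"), ("weirdly", "RB"), (".", "."),
    ("She", "PRP"), ("runs", "VBZ"), ("fast", "RB"), (".", "."),
    ("She", "PRP"), ("is", "VBZ"), ("talking", "VBG"), ("greatly", "RB"), (".", "."),
    ("Jack", "NNP"), ("runs", "VBZ"), ("slow", "RB"), (".", "."),
]

# zip(seq, seq[1:]) yields each token together with its right-hand neighbour.
pairs = [
    (w1, w2)
    for (w1, tag1), (w2, tag2) in zip(pos_tagged, pos_tagged[1:])
    if tag1.startswith("VB") and tag2 == "RB"
]

print(pairs)
# [('talking', 'weirdly'), ('runs', 'fast'), ('talking', 'greatly'), ('runs', 'slow')]
```

Because the pairs are taken from the full tagged sequence, an adverb can only pair with the verb directly before it, so reversed pairs such as ('weirdly', 'talking') do not appear.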

Now for your question: we'll build a dictionary of the adverbs that follow each verb. I'll store the adverbs in a set, not a list, to get rid of duplicates.

from collections import defaultdict
consolidated = defaultdict(set)
for verb, adverb in pairs:
    consolidated[verb].add(adverb)

The defaultdict provides an empty set for verbs that haven't been seen before, so we don't need to check by hand.

Depending on the details of your assignment, you might also want to case-fold and lemmatize your verbs so that the adverbs from "Driving recklessly" and "I drove carefully" are recorded together:

wnl = nltk.stem.WordNetLemmatizer()
...
for verb, adverb in pairs:
    verb = wnl.lemmatize(verb.lower(), "v")
    consolidated[verb].add(adverb)

Inequitable answered 31/1, 2016 at 10:59 Comment(3)

Thank you very much for the help. It helped me a lot. – Snuffbox 1/2, 2016 at 21:7

do you know how to represent these outputs in a matrix form? A row will be (verb, adv1, adv2,adv3) like this. – Snuffbox 1/2, 2016 at 21:23

Just print each key, value in the dictionary as key, list(value) (since the value is a set). – Inequitable 2/2, 2016 at 0:5
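
A minimal sketch of that suggestion, assuming a `consolidated` dictionary shaped like the one built in the answer (the sample contents are written out by hand here). Sorting is an added choice, not part of the original suggestion, since sets have no fixed order.

```python
# Sample data in the shape produced by the answer: verb -> set of adverbs.
consolidated = {
    "talking": {"greatly", "weirdly"},
    "runs": {"fast", "slow"},
}

# One row per verb: the verb first, then its adverbs in alphabetical order.
rows = [[verb] + sorted(adverbs) for verb, adverbs in sorted(consolidated.items())]

for row in rows:
    print(row)
# ['runs', 'fast', 'slow']
# ['talking', 'greatly', 'weirdly']
```

The rows have different lengths whenever verbs have different numbers of adverbs, so a strictly rectangular matrix would need padding.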

-1

I think you are losing information you will need for this. You need to retain the part-of-speech data somehow, so that bigrams like ('weirdly', 'talking') can be processed in the correct manner.

It may be that the bigram finder can accept the tagged word tuples (I'm not familiar with nltk). Or, you may have to resort to creating an external index. If so, something like this might work:

part_of_speech = {word:tag for word,tag in posTagged}
best_bigrams = finder2.nbest(... as you like it ...)

verb_first_bigrams = [b if part_of_speech[b[1]] == 'RB' else (b[1],b[0]) for b in best_bigrams]

Then, with the verbs in front, you can transform it into a dictionary or list-of-lists or whatever:

adverbs_for = {}
for verb,adverb in verb_first_bigrams:
    if verb not in adverbs_for:
        adverbs_for[verb] = [adverb]
    else:
        adverbs_for[verb].append(adverb)

Dinge answered 27/1, 2016 at 7:8 Comment(12)

That's a fail if the same word occurs as verb and noun: "Don't just talk the talk" – Inequitable 27/1, 2016 at 22:54

I think there is something wrong in that code.

finder2 = BigramCollocationFinder.from_words(wrd for (wrd,tag) in posTagged if tag in ('VB','VBN','RB','VBD'))
best_bigrams = finder2.nbest(bigram_measures.raw_freq, 10)
verb_first_bigrams = [b if pos[b[1]] == 'RB' else (b[1],b[0]) for b in best_bigrams]
print verb_first_bigrams

adverbs_for = {}
for verb,adverb in verb_first_bigrams:
    if verb not in adverbs_for:
        adverbs_for[verb] = [adverb]
    else:
        adverbs_for[verb].append(adverb)
print adverbs_for

– Snuffbox 28/1, 2016 at 4:16

The output of this code is: {'talking': ['weirdly', 'decorated'], 'decorated': ['well'], 'run': ['well', 'weirdly']}. This is not correct. – Snuffbox 28/1, 2016 at 4:19

Why is that not correct? The verbs are keys to the dictionary, the adverbs are in lists for each corresponding verb. You should be able to do whatever you want with them now, no? – Dinge 28/1, 2016 at 5:21

@Inequitable True, but in this case the OP is filtering only verbs and adverbs. For the general case, it would probably be better to keep the tag data as long as possible. But we don't know what problem he's actually solving... – Dinge 28/1, 2016 at 5:23

@Austin, look at {talking: weirdly, decorated}: 'decorated' is not an adverb, so this is wrong. You are absolutely right that the verbs are the keys and the adverbs are in lists for each corresponding verb. But that is not what happens here, and note that the words come from different sentences. – Snuffbox 28/1, 2016 at 13:9

Likewise, {run: weirdly} should not be possible, but it appears in the output. If keeping the tags makes this easier, that is no problem. – Snuffbox 28/1, 2016 at 13:16

@SOUBHIKRAKSHIT I think the problem lies here: finder2 = BigramCollocationFinder.from_words(wrd for (wrd,tags) in posTagged if tags in('VBG','RB','VBN',)) If you have a bunch of text, and you are telling the filter "just pull out the adverbs and verbs" then you'll get a list of adverbs and verbs, somewhat randomly. I think you need to separate the text into sentences, first. – Dinge 28/1, 2016 at 16:26
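
The effect described in this comment can be reproduced with a small sketch. The tags are assigned by hand for illustration, and `zip` is used in place of the collocation finder, so this shows only how filtering before pairing creates neighbours that were never adjacent in the text.

```python
# Two hand-tagged sentences (the tags are an assumption, not tagger output).
pos_tagged = [
    ("He", "PRP"), ("is", "VBZ"), ("talking", "VBG"), ("weirdly", "RB"), (".", "."),
    ("The", "DT"), ("place", "NN"), ("is", "VBZ"), ("well", "RB"), ("decorated", "VBN"), (".", "."),
]

# Filtering first, as in the question, discards everything between the kept words...
kept = [word for word, tag in pos_tagged if tag in ("VBG", "RB", "VBN")]

# ...so pairing the survivors joins words across the sentence boundary.
spurious = list(zip(kept, kept[1:]))

print(kept)      # ['talking', 'weirdly', 'well', 'decorated']
print(spurious)  # [('talking', 'weirdly'), ('weirdly', 'well'), ('well', 'decorated')]
```

Here ('weirdly', 'well') pairs the last kept word of the first sentence with the first kept word of the second.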

@SOUBHIK, did you try it? You miss the point: With the input "talk/VB the/D talk/NN", the noun talk will overwrite the verb talk in the part_of_speech dictionary, and you'll never see it again. It won't be there for the filter to select. Your approach is incorrect. – Inequitable 28/1, 2016 at 22:51
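
A minimal sketch of the overwrite described in this comment, using hand-assigned tags for "talk the talk" (DT is used for the determiner here as an assumption about the tag set):

```python
# The same word appears twice with different tags.
pos_tagged = [("talk", "VB"), ("the", "DT"), ("talk", "NN")]

# A dict keeps one value per key, so the later tag replaces the earlier one.
part_of_speech = {word: tag for word, tag in pos_tagged}

print(part_of_speech)  # {'talk': 'NN', 'the': 'DT'}
```

The verb reading of "talk" is gone from the lookup, so any filter that relies on `part_of_speech` can no longer see it.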

@Inequitable Good point, really. But another issue is that in sentences like "he runs fast", fast is tagged JJ even though it is an adverb. Let's treat this as a first step, and then figure out a way to get rid of these ambiguities. If you suggest any other way, I will definitely work on that. – Snuffbox 29/1, 2016 at 9:43

@Austin Thanks for your suggestions. I am trying to find a way with it. – Snuffbox 29/1, 2016 at 9:44

@SOUBHIK, dealing with incorrect (or inconvenient) POS tags is on a whole different level from only expecting one POS tag per word. This answer should be fixed to deal with more of them. – Inequitable 29/1, 2016 at 16:31
