Detecting POS tag patterns along with specified words

I need to identify certain POS tags before or after certain specified words. For example, the following tagged sentence:

[('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]

can be abstracted to the form "would be" + Adjective

Similarly:

[('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

is of the form "am able to" + Verb

How can I check for this type of pattern in sentences? I am using NLTK.

Certain answered 8/1, 2016 at 8:59 Comment(6)
What do you mean "checking"?Libenson
I meant how do I detect that a pattern of the form "am able to" + Verb exists in a sentence. Or, for example, something like "would be" + Comparative Adjective exists in a sentence.Certain
So do you want to print True if it exists or?Libenson
yep, I've seen examples with just matching POS, but in my case I need to match both words and POS tags, if that makes sense...Certain
Also note that 'JJ' isn't a comparative adjective - it's just an adjective.Libenson
@Certain You need to be clearer in terms of what you want to achieve. Can you give a specific input sentence and the desired output you need?Hambrick

Assuming you want to check literally for "would" followed by "be", followed by some adjective, you can do this:

def would_be(tagged):
    # range replaces Python 2's xrange; the bound keeps tagged[i+2] in range
    return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in range(len(tagged) - 2))

The input is a POS tagged sentence (list of tuples, as per NLTK).

It checks whether the list contains three consecutive elements such that "would" is immediately followed by "be", and "be" is immediately followed by a word tagged as an adjective ('JJ'). It returns True as soon as this "pattern" is matched.

You can do something very similar for the second type of sentence:

def am_able_to(tagged):
    # range replaces Python 2's xrange; the bound keeps tagged[i+3] in range
    return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in range(len(tagged) - 3))
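
Since the index arithmetic is easy to get wrong (an off-by-one in the range bound raises a list index out of range error), here is a minimal sketch of the same "would be" + adjective check written with zip instead of indices. The name would_be_zip is hypothetical and used only for illustration.

```python
def would_be_zip(tagged):
    """Hypothetical variant of would_be that slides a 3-token window with zip."""
    # zip stops at the shortest input, so a sentence with fewer than three
    # tokens simply yields no windows instead of raising IndexError.
    windows = zip(tagged, tagged[1:], tagged[2:])
    return any(
        (w1, w2, t3) == ('would', 'be', 'JJ')
        for (w1, _), (w2, _), (_, t3) in windows
    )
```

Each window is three consecutive (word, tag) tuples; the first two are compared by word and the third by tag, mirroring the indexed version.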

Here's a driver for the program:

s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]

def would_be(tagged):
    return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in range(len(tagged) - 2))

def am_able_to(tagged):
    return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in range(len(tagged) - 3))

sent1 = ' '.join(s[0] for s in s1)
sent2 = ' '.join(s[0] for s in s2)

print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s1), sent1))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s1), sent1))

print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s2), sent2))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s2), sent2))

This correctly outputs:

Is 'This feature would be nice to have' of type 'would be' + adj? True
Is 'This feature would be nice to have' of type 'am able to' + verb? False
Is 'I am able to delete the group functionality' of type 'would be' + adj? False
Is 'I am able to delete the group functionality' of type 'am able to' + verb? True

If you'd like to generalize this, you can choose, position by position, whether to compare the literal word or its POS tag.
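
One way to sketch that generalization is a single matcher that takes the pattern as data. The function name matches_pattern and the ('word', ...) / ('tag', ...) pattern format below are assumptions made for illustration; they are not part of NLTK.

```python
def matches_pattern(tagged, pattern):
    """Return True if `pattern` matches a contiguous run of tokens in `tagged`.

    `tagged` is a list of (word, tag) tuples. `pattern` is a list of
    (kind, value) pairs, where kind is 'word' (compare the token text)
    or 'tag' (compare the POS tag). This format is hypothetical.
    """
    def token_matches(token, element):
        word, tag = token
        kind, value = element
        if kind == 'word':
            return word == value
        if kind == 'tag':
            return tag == value
        raise ValueError("unknown pattern kind: {!r}".format(kind))

    n = len(pattern)
    # The range is empty when the sentence is shorter than the pattern,
    # so no window ever indexes past the end of the list.
    return any(
        all(token_matches(tok, el) for tok, el in zip(tagged[i:i + n], pattern))
        for i in range(len(tagged) - n + 1)
    )


# The two patterns from the question, expressed as data.
WOULD_BE_ADJ = [('word', 'would'), ('word', 'be'), ('tag', 'JJ')]
AM_ABLE_TO_VERB = [('word', 'am'), ('word', 'able'), ('word', 'to'), ('tag', 'VB')]
```

With this shape, matches_pattern(s1, WOULD_BE_ADJ) plays the role of would_be(s1), and a new pattern only needs a new list. To require a comparative adjective, for instance, the last element would be ('tag', 'JJR'), the Penn Treebank tag for comparative adjectives.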

Libenson answered 8/1, 2016 at 12:32 Comment(7)
If I were to do something generic like am_able_to(s1), I get a list index out of range error. Other than that, it works. Thanks!Certain
Corrected the functions.Libenson
thanks Erip. I tested would_be on another sentence "I am able to delete the group functionality" and I still get a list index out of range error.Certain
@Certain What's the POS tag list for that sentence? I don't have nltk on this machine.Libenson
[('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]Certain
@Certain It worked on my machine. Are you sure you copied the new functions?Libenson
Strange, yes I did copy those over. Did you try this new sentence with the would_be function? That's where I get the error...Certain
