How to train a naive bayes classifier with pos-tag sequence as a feature?
Asked Answered
C

1

6

I have two classes of sentences. Each has reasonably distinct pos-tag sequence. How can I train a Naive-Bayes classifier with POS-Tag sequence as a feature? Does Stanford CoreNLP/NLTK (Java or Python) provide any method for building a classifier with pos-tag as a feature? I know in python NaiveBayesClassifier allows for building a NB classifier but it uses contains-a-word as feature but can it be extended to use pos-tag-sequence as a feature ?

Caesarea answered 27/2, 2015 at 11:50 Comment(2)
Do you really need to use NaiveBayesClassifier? Have you looked at CRF? By the way, have you read this chapter: nltk.org/book/ch06.html ?Emilioemily
Thanks for the link. I ended up using concatenated pos-tags-sequence and containsPosSequence as a feature...Caesarea
H
6

If you know how to train and predict texts (or sentences in your case) using nltk's naive bayes classifier and words as features, than you can easily extend this approach in order to classify texts by pos-tags. This is because the classifier don't care about whether your feature-strings are words or tags. So you can simply replace the words of your sentences by pos-tags using for example nltk's standard pos tagger:

sent = ['So', 'they', 'have', 'internet', 'on', 'computers' , 'now']
tags = [t for w, t in nltk.pos_tag(sent)]
print tags

['IN', 'PRP', 'VBP', 'JJ', 'IN', 'NNS', 'RB']

As from now you can proceed with the "contains-a-word" approach.

Hayley answered 28/2, 2015 at 15:51 Comment(1)
Adding to your answer, as the question says "sequence", we can chain the pos tags of the sentence like [IN][PRP][VBP][JJ][IN][NNS][RS] and define a feature like say conatinsPrpVbpSequence and set it to True for an occurance of [PRP][VBP]....Caesarea

© 2022 - 2024 — McMap. All rights reserved.