Finding the conditional probability of a trigram in Python NLTK
S

2

9

I have started learning NLTK and am following a tutorial from here, which builds a conditional frequency distribution from bigrams like this:

import nltk
from nltk.corpus import brown
cfreq_brown_2gram = nltk.ConditionalFreqDist(nltk.bigrams(brown.words()))

However, I want to find the conditional probability using trigrams. When I change nltk.bigrams to nltk.trigrams, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "home/env/local/lib/python2.7/site-packages/nltk/probability.py", line 1705, in __init__
    for (cond, sample) in cond_samples:
ValueError: too many values to unpack (expected 2)

How can I calculate the conditional probability using trigrams?

Slaughter answered 28/6, 2016 at 6:25 Comment(2)
Could you post your code in the question please? I have a hunch as to what's going on, but can only confirm it if I see exactly what you're doing in your script. - Perform
I just ran the three lines of code (updated in my question). But instead of bigrams I want trigrams to be used for conditional probability. - Slaughter
P
12

nltk.ConditionalFreqDist expects its data as a sequence of (condition, item) tuples. nltk.trigrams returns tuples of length 3, which causes the exact error you posted.
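
To make the failure concrete, here is a minimal sketch in plain Python 3 that does not use NLTK at all. The helper `count_pairs` is hypothetical; it only imitates the `for (cond, sample) in cond_samples:` loop visible in the traceback, and is not NLTK's actual implementation.

```python
def count_pairs(cond_samples):
    """Hypothetical stand-in for the unpacking loop shown in the traceback."""
    counts = {}
    for (cond, sample) in cond_samples:  # each item must have exactly 2 elements
        counts.setdefault(cond, {}).setdefault(sample, 0)
        counts[cond][sample] += 1
    return counts

trigram = [("the", "quick", "brown")]

# A 3-tuple cannot be unpacked into (cond, sample), so this raises ValueError.
try:
    count_pairs(trigram)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Reshaping each trigram into ((w0, w1), w2) gives the loop the pairs it expects.
pairs = [((w0, w1), w2) for w0, w1, w2 in trigram]
pair_counts = count_pairs(pairs)
```

The first call fails in the same way as the question's code, while the reshaped input is counted under the two-word condition.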

From your post it's not exactly clear what you want to use as conditions, but the convention when doing language modeling is to condition the last word on its predecessors. The following code demonstrates how you'd implement that.

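# Reuses the imports from the question (nltk, brown).
# Each trigram (w0, w1, w2) becomes the pair ((w0, w1), w2): condition, then sample.
brown_trigrams = nltk.trigrams(brown.words())
condition_pairs = (((w0, w1), w2) for w0, w1, w2 in brown_trigrams)
cfd_brown = nltk.ConditionalFreqDist(condition_pairs)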
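
Once the distribution is built, the conditional probability itself can be read off it. The following is a sketch rather than a definitive recipe: it assumes a small made-up token list in place of brown.words() so that it runs without downloading a corpus, and it uses unsmoothed maximum-likelihood estimates (relative frequencies), which may not be what you want for a real language model.

```python
import nltk

# Toy token list standing in for brown.words() (an assumption for illustration).
tokens = "the cat sat on the mat and the cat sat on the sofa".split()

# Condition the third word on the two words before it.
condition_pairs = (((w0, w1), w2) for w0, w1, w2 in nltk.trigrams(tokens))
cfd = nltk.ConditionalFreqDist(condition_pairs)

# Option 1: relative frequency straight from the FreqDist for one condition,
# i.e. count(w0, w1, w2) / count of all trigrams starting with (w0, w1).
p_mat = cfd[("on", "the")].freq("mat")

# Option 2: wrap the counts in a ConditionalProbDist with an MLE estimator.
cpd = nltk.ConditionalProbDist(cfd, nltk.MLEProbDist)
p_sofa = cpd[("on", "the")].prob("sofa")
```

In this toy data "on the" is followed once by "mat" and once by "sofa", so both estimates are 0.5. Passing a different estimator class instead of nltk.MLEProbDist is how smoothing would be added.
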
Perform answered 29/6, 2016 at 11:24 Comment(0)
M
-1

You can use the n-gram model described here.

An example for usage:

from nltk.util import ngrams

text = '...'  # input text; renamed so it does not shadow the built-in input()
N = 3
trigrams = ngrams(text.split(), N)
for grams in trigrams:
    print(grams)

I strongly encourage you to read the documentation above; I hope it helps.

Maryland answered 28/6, 2016 at 6:43 Comment(2)
I don't have a problem getting trigrams. That can be done easily using nltk.trigrams. What I want is to find the conditional probability using trigrams. - Slaughter
I might be misunderstanding you here, but can't you quite easily calculate the probability after extracting the trigrams, e.g. into a dictionary? - Sidman
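
The dictionary approach suggested in the last comment could look roughly like the sketch below. It assumes Python 3 and uses only the standard library; the function name `trigram_probabilities` and the sample sentence are made up for illustration, and the probabilities are plain relative frequencies with no smoothing.

```python
from collections import Counter, defaultdict

def trigram_probabilities(tokens):
    """Map each (w0, w1) context to {w2: P(w2 | w0, w1)} using raw counts."""
    counts = defaultdict(Counter)
    # zip over three offset views of the list yields consecutive trigrams.
    for w0, w1, w2 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w0, w1)][w2] += 1

    probs = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        probs[context] = {w: c / total for w, c in followers.items()}
    return probs

# Hypothetical sample text, not taken from the Brown corpus.
tokens = "the cat sat on the mat and the cat sat on the sofa".split()
probs = trigram_probabilities(tokens)
p_mat = probs[("on", "the")]["mat"]
```

This gives the same numbers as the relative frequencies from nltk.ConditionalFreqDist, at the cost of writing the counting loop yourself.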
