Error using Stanford POS Tagger in NLTK Python

I am trying to use the Stanford POS Tagger in NLTK, but I am not able to run the example code given here: http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanford

import nltk
from nltk.tag.stanford import POSTagger
st = POSTagger(r'english-bidirectional-distim.tagger',r'D:/stanford-postagger/stanford-postagger.jar')
st.tag('What is the airspeed of an unladen swallow?'.split())

I have already set the following environment variables:

CLASSPATH = D:/stanford-postagger/stanford-postagger.jar
STANFORD_MODELS =  D:/stanford-postagger/models/
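If editing the system environment variables is inconvenient (a console window that was already open does not see variables added through System Properties until it is reopened), the same variables can be set from inside the script before the tagger is created. This is a minimal sketch that assumes NLTK reads these variables from `os.environ` at lookup time. `configure_stanford_env` is a hypothetical helper, not part of NLTK:

```python
import os

def configure_stanford_env(jar_path, models_dir):
    """Point CLASSPATH and STANFORD_MODELS at the Stanford POS tagger files.

    Hypothetical helper: it only changes os.environ for the current
    process, so it must run before the tagger is constructed.
    """
    # Fail early with a specific message instead of a generic LookupError.
    if not os.path.isfile(jar_path):
        raise IOError("jar not found: %s" % jar_path)
    if not os.path.isdir(models_dir):
        raise IOError("models directory not found: %s" % models_dir)
    os.environ["CLASSPATH"] = jar_path
    os.environ["STANFORD_MODELS"] = models_dir
    return os.environ["STANFORD_MODELS"]

# Usage, with the paths from the question:
# configure_stanford_env("D:/stanford-postagger/stanford-postagger.jar",
#                        "D:/stanford-postagger/models/")
```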

Here is the error I keep getting:

Traceback (most recent call last):

File "D:\pos_stanford.py", line 4, in <module>
    st = POSTagger(r'english-bidirectional-distim.tagger',
         r'D:/stanford-postagger/stanford-postagger.jar')  
... LookupError: NLTK was unable to find the english-bidirectional-distim.tagger file! Use software specific configuration paramaters or set the STANFORD_MODELS environment variable.
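The error says NLTK looked for a file with exactly that name and did not find it. Before changing any paths, it can help to list the `.tagger` files that are actually installed and look for near misses. This is a minimal sketch using the standard `difflib` module. `suggest_model_name` is a hypothetical helper, and the 0.8 similarity cutoff is an arbitrary choice:

```python
import os
import difflib

def suggest_model_name(requested, models_dir):
    """Return installed .tagger files whose names are close to `requested`.

    A close match usually means the file name is misspelled, rather than
    the directory being wrong.
    """
    installed = sorted(f for f in os.listdir(models_dir) if f.endswith(".tagger"))
    if requested in installed:
        return [requested]
    # Best matches come first; cutoff=0.8 is an arbitrary threshold.
    return difflib.get_close_matches(requested, installed, n=3, cutoff=0.8)

# Usage, with the name and directory from the question:
# print(suggest_model_name("english-bidirectional-distim.tagger",
#                          "D:/stanford-postagger/models"))
```

If the file name is misspelled, the correctly spelled file in the models directory should be returned as the first suggestion.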

Some forums suggest changing this line (line 45 of C:\Python27\lib\site-packages\nltk\tag\stanford.py, in __init__):

env_vars=('STANFORD_MODELS'), verbose=verbose)

so that it has a trailing comma:

env_vars=('STANFORD_MODELS',), verbose=verbose)

That doesn't solve the problem either. Please help me solve this issue.
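The suggested comma fix is about Python syntax, not paths: `('STANFORD_MODELS')` is just a string in parentheses, while `('STANFORD_MODELS',)` is a one-element tuple. If the lookup code loops over `env_vars`, the string version is iterated character by character, so the `STANFORD_MODELS` variable is never consulted. The sketch below illustrates the difference. `env_vars_checked` is a hypothetical stand-in for such a loop, not NLTK's actual code:

```python
# Parentheses alone do not make a tuple; the trailing comma does.
without_comma = ('STANFORD_MODELS')
with_comma = ('STANFORD_MODELS',)

def env_vars_checked(env_vars):
    """Return the names a loop of the form `for env_var in env_vars:`
    would look up (illustration only)."""
    return [env_var for env_var in env_vars]

print(type(without_comma).__name__)          # str
print(env_vars_checked(without_comma)[:3])   # ['S', 'T', 'A']
print(env_vars_checked(with_comma))          # ['STANFORD_MODELS']
```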

Other information: I am using Windows 7 64-bit, Python 2.7 32-bit, and NLTK 2.0.

Proprioceptor asked 8/4, 2014 at 7:27. Comments:
I noticed you're using forward slashes (/) in your environment paths; on Windows they should be backslashes (\). Also, try running the script from the same directory as your models to avoid path issues. - Caldera
I tried using backslashes too; it didn't work. - Proprioceptor
I also tried changing the directories and so on, but it made no difference. - Proprioceptor
Try unpacking the models jar and make sure the english-bidirectional-distim.tagger file is in STANFORD_MODELS\edu\stanford\nlp\models\pos-tagger\english-bidirectional\, where STANFORD_MODELS is either the directory that variable points to or your script's current working directory. - Caldera
@jkoreska: Incorrect; Windows has accepted forward slashes since at least 2003. Forward slashes are preferred in Python because they avoid escaping and raw strings (r''). - Scripture
@Scripture Yeah, thanks, now I feel old :/ - Caldera
@Caldera No worries, I couldn't believe it myself when a friend told me several years ago. Microsoft certainly made this change quietly... almost surrendering to the UNIX path convention. - Scripture
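Scripture's point about slashes can be checked with the standard `ntpath` module, which applies Windows path rules on any operating system. This is a small illustration, and the variable names are arbitrary:

```python
import ntpath

# Forward slashes: no escaping needed, and Windows accepts them in paths.
fwd = 'D:/stanford-postagger/models/english-bidirectional-distsim.tagger'

# Raw string: every backslash is kept literally.
raw = r'D:\stanford-postagger\models\english-bidirectional-distsim.tagger'

# normpath rewrites '/' as '\', so both spellings name the same file.
same_file = ntpath.normpath(fwd) == raw   # True

# Plain string with backslashes: '\n' and '\t' are escape sequences,
# so this "path" actually contains a newline and a tab character.
broken = 'C:\new\table'
has_control_chars = '\n' in broken and '\t' in broken   # True
```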

Note: I'm posting this as an answer in case others run into the same issue in the future.

I finally found out what I did wrong; it turned out to be a simple blunder.

The tagger file name is not 'english-bidirectional-distim.tagger' but 'english-bidirectional-distsim.tagger'.
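For reference, here is the question's example with the corrected file name. It is wrapped in a hypothetical `make_tagger` helper that checks both files exist before handing them to NLTK. This sketch assumes NLTK 2.0, where the class is `nltk.tag.stanford.POSTagger`. It also assumes that passing the full model path means NLTK does not need the `STANFORD_MODELS` lookup:

```python
import os

# Note the spelling: "distsim", not "distim".
MODEL_NAME = 'english-bidirectional-distsim.tagger'

def make_tagger(models_dir, jar_path):
    """Build NLTK's wrapper around the Stanford POS tagger (hypothetical helper).

    The file checks run first, so a typo or a wrong directory fails with a
    clear message. NLTK is imported only after the checks pass.
    """
    model_path = os.path.join(models_dir, MODEL_NAME)
    for path in (model_path, jar_path):
        if not os.path.isfile(path):
            raise IOError('not found: %s' % path)
    # NLTK 2.0 class name; later NLTK 3 releases call it StanfordPOSTagger.
    from nltk.tag.stanford import POSTagger
    return POSTagger(model_path, jar_path)

# Usage, with the paths from the question:
# st = make_tagger('D:/stanford-postagger/models',
#                  'D:/stanford-postagger/stanford-postagger.jar')
# print(st.tag('What is the airspeed of an unladen swallow?'.split()))
```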

Proprioceptor answered 20/4, 2014 at 11:55
