NLTK Stanford POS tagger error: Java command failed

I'm trying to use the nltk.tag.stanford module to tag a sentence (starting with the wiki's example), but I keep getting the following error:

Traceback (most recent call last):
  File "test.py", line 28, in <module>
    print st.tag(word_tokenize('What is the airspeed of an unladen swallow ?'))
  File "/usr/local/lib/python2.7/dist-packages/nltk/tag/stanford.py", line 59, in tag
    return self.tag_sents([tokens])[0]
  File "/usr/local/lib/python2.7/dist-packages/nltk/tag/stanford.py", line 81, in tag_sents
    stdout=PIPE, stderr=PIPE)
  File "/usr/local/lib/python2.7/dist-packages/nltk/internals.py", line 160, in java
    raise OSError('Java command failed!')
OSError: Java command failed!

or the following LookupError:

LookupError: 

===========================================================================
NLTK was unable to find the java file!
Use software specific configuration paramaters or set the JAVAHOME environment variable.
===========================================================================

This is the example code:

>>> from nltk.tag.stanford import POSTagger
>>> st = POSTagger('/usr/share/stanford-postagger/models/english-bidirectional-distsim.tagger',
...                '/usr/share/stanford-postagger/stanford-postagger.jar') 
>>> st.tag('What is the airspeed of an unladen swallow ?'.split()) 

I also tried word_tokenize instead of split(), but it didn't make any difference.

I also reinstalled Java (the JDK), and all my searches were unsuccessful. I tried things like nltk.internals.config_java() or ...

Note: I'm using Linux (Xubuntu).

Baronet asked 27/11, 2014 at 12:54

If you read through the documentation embedded in nltk/internals.py (lines 58 - 175), you should find your answer easily enough: NLTK requires the full path to the Java binary.

If that path is not specified, NLTK searches the system for a Java binary, and if none is found, it raises a LookupError.

Based on a bit of research, I believe you have a few options:
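The lookup described above can be approximated with a short sketch. This is a simplified, hypothetical model of the search order (an explicit path, then the JAVAHOME and JAVA_HOME environment variables, then the directories on PATH); it is not NLTK's actual code, and the name `find_java` is made up for illustration. It needs Python 3.3+ for `shutil.which`.

```python
import os
import shutil


def find_java(explicit_path=None, env=None):
    """Return the path of a java binary, or raise LookupError.

    Hypothetical, simplified model of the lookup order: an explicit
    path first, then JAVAHOME / JAVA_HOME, then the PATH.
    """
    env = os.environ if env is None else env
    names = ["java", "java.exe"]

    candidates = []
    if explicit_path:
        candidates.append(explicit_path)
    for var in ("JAVAHOME", "JAVA_HOME"):
        value = env.get(var)
        if not value:
            continue
        # The variable may name the binary itself, a directory that
        # contains it, or a JDK root with a bin/ subdirectory.
        candidates.append(value)
        for name in names:
            candidates.append(os.path.join(value, name))
            candidates.append(os.path.join(value, "bin", name))

    # Return the first candidate that exists as a regular file.
    for path in candidates:
        if os.path.isfile(path):
            return path

    # Fall back to searching the directories listed in PATH.
    for name in names:
        found = shutil.which(name, path=env.get("PATH", ""))
        if found:
            return found

    raise LookupError("No java binary found; set JAVAHOME or pass a full path.")
```

Calling `find_java()` with no arguments inspects the real environment of the running process, which is also the environment NLTK sees.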

1) Add the following code to your project (not a great solution)

import os
java_path = "path/to/java"  # replace with the full path to your java binary
os.environ['JAVAHOME'] = java_path  # set this before creating or using the tagger
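A slightly more defensive variant of option 1 checks that the path really is an executable file before exporting it, so a typo fails immediately with a clear message instead of surfacing later inside NLTK. `set_javahome` is a hypothetical helper, and the path in the usage line is only an example based on the directory names mentioned in the comments below.

```python
import os


def set_javahome(java_path):
    """Validate java_path, then export it as JAVAHOME for this process.

    Hypothetical helper: raises ValueError up front rather than letting
    the tagger fail later with a less specific error.
    """
    if not os.path.isfile(java_path):
        raise ValueError("not a file: %s" % java_path)
    if not os.access(java_path, os.X_OK):
        raise ValueError("not executable: %s" % java_path)
    # Only affects this Python process and the child processes it starts.
    os.environ["JAVAHOME"] = java_path
    return java_path


# Example (adjust the path to your own installation):
# set_javahome("/usr/lib/jvm/java-7-openjdk-amd64/bin/java")
```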

2) Uninstall and reinstall NLTK, preferably in a virtualenv (better, but still not great)

pip uninstall nltk
sudo -E pip install nltk

3) Set the Java environment variables (the most pragmatic solution, IMO)

Edit the system-wide profile file, /etc/profile:

sudo gedit /etc/profile

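Add the following lines at the end:

# Adjust to your JDK directory (e.g. /usr/lib/jvm/java-7-openjdk-amd64)
JAVA_HOME=/usr/lib/jvm/jdk1.7.0
# $HOME is your home directory; $HOME/bin is optional here
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export JAVA_HOME
export PATH
# NLTK only needs java on the PATH (or JAVAHOME); JRE_HOME is not required
# Log out and back in (or run: source /etc/profile) for this to take effect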
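Once the new profile is in effect (for example after logging out and back in), a quick way to confirm that Python, and therefore NLTK, sees it is to check the variable and where `java` resolves on the PATH. This is a sketch; `describe_java_env` is a hypothetical name, and it only reports what it finds without changing anything.

```python
import os
import shutil


def describe_java_env(env=None):
    """Report JAVA_HOME and which java binary the PATH resolves to."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    # shutil.which returns None when no executable "java" is on the PATH.
    on_path = shutil.which("java", path=env.get("PATH", ""))
    expected = os.path.join(java_home, "bin", "java") if java_home else None
    return {
        "JAVA_HOME": java_home,
        "java_on_path": on_path,
        # True only if JAVA_HOME/bin/java exists as a file.
        "java_home_ok": bool(expected and os.path.isfile(expected)),
    }


if __name__ == "__main__":
    for key, value in describe_java_env().items():
        print(key, "=", value)
```

Because the profile lines append $JAVA_HOME/bin to the end of PATH, a java binary in an earlier PATH directory (such as /usr/bin) is still the one that runs; `java_on_path` shows which one that is.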
Crustacean answered 27/11, 2014 at 13:27
In /usr/lib/jvm/ I have three directories: default-java, java-1.7.0-openjdk-amd64 and java-7-openjdk-amd64. Which one should I use for the path? I tried all of them but still get the error, and I've tried all of your suggestions except uninstalling and reinstalling NLTK. How do you suggest I uninstall and reinstall it? - Baronet
@Kasra java-1.7.0-openjdk-amd64, I believe. - Crustacean
I rewrote the code in a new .py file and ran it again, and now I get this error from nltk/internals.py: print(stderr.decode(sys.stdout.encoding)) TypeError: decode() argument 1 must be string, not None. Are you familiar with it? - Baronet
Hmm, I'm trying to replicate the issue now. In the meantime, try java-7-openjdk-amd64 and see if that works. - Crustacean
What are the $HOME and JRE_HOME variables in your third solution? They aren't defined beforehand and just suddenly appear. - Nita
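The TypeError reported in the comments above comes from decoding Java's stderr with `sys.stdout.encoding`, which can be None (for example in Python 2 when output is redirected). The failure then hides the actual Java error message. A sketch of a defensive decode, assuming UTF-8 is an acceptable fallback; `decode_stderr` is a hypothetical helper, not an NLTK function.

```python
import sys


def decode_stderr(raw, stream=None):
    """Decode bytes captured from a subprocess's stderr."""
    stream = sys.stdout if stream is None else stream
    # stream.encoding may be None; passing None to bytes.decode()
    # is what raises the TypeError, so fall back to UTF-8 instead.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # errors="replace" keeps the message readable even with invalid bytes.
    return raw.decode(encoding, errors="replace")
```

With a guard like this, printing the captured stderr would show why the Java command failed rather than raising a second, unrelated error.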

I also ran into this problem when using NLTK for the first time. After spending a few hours on it, I finally got it to work.

This is what I did:

  1. Uninstall and reinstall the nltk package.
  2. Add both the JAVAHOME and JAVA_HOME environment variables (in my case, C:\Program Files\Java\jdk1.8.0_241\bin\).
  3. Also add the same value (C:\Program Files\Java\jdk1.8.0_241\bin\) to the Path environment variable.

Then, naturally, restart your terminal.

That worked for me on Windows 7 64-bit.
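Steps 2 and 3 can also be mirrored from inside a script, which is handy for trying out a JDK directory before changing the system settings. This sketch only changes the environment of the current Python process (and the child processes it launches, such as the Java call NLTK makes), so nothing persists and no terminal restart is involved. `use_java_dir` is a hypothetical helper, and the directory in the usage line is the one from this answer.

```python
import os


def use_java_dir(bin_dir, env=None):
    """Point JAVAHOME, JAVA_HOME and PATH at bin_dir for this process only."""
    env = os.environ if env is None else env
    env["JAVAHOME"] = bin_dir
    env["JAVA_HOME"] = bin_dir
    # Prepend so this directory is searched before any other java on PATH.
    current = env.get("PATH", "")
    env["PATH"] = bin_dir + os.pathsep + current if current else bin_dir
    return env


# Example (adjust to your installation):
# use_java_dir(r"C:\Program Files\Java\jdk1.8.0_241\bin")
```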

Lumbard answered 12/2, 2020 at 19:37