NLTK and Stopwords Fail #lookuperror
Asked Answered
A

7

69

I am trying to start a project of sentiment analysis and I will use the stop words method. I made some research and I found that nltk have stopwords but when I execute the command there is an error.

What I do is the following, in order to know which are the words that nltk use (like what you may found here http://www.nltk.org/book/ch02.html in section4.1):

from nltk.corpus import stopwords
stopwords.words('english')

But when I press enter I obtain

---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
<ipython-input-6-ff9cd17f22b2> in <module>()
----> 1 stopwords.words('english')

C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __getattr__(self, attr)
 66
 67     def __getattr__(self, attr):
---> 68         self.__load()
 69         # This looks circular, but its not, since __load() changes our
 70         # __class__ to something new:

C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __load(self)
 54             except LookupError, e:
 55                 try: root = nltk.data.find('corpora/%s' % zip_name)
---> 56                 except LookupError: raise e
 57
 58         # Load the corpus.

LookupError:
**********************************************************************
  Resource 'corpora/stopwords' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
- 'C:\\Users\\Meru/nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- 'C:\\Users\\Meru\\Anaconda\\nltk_data'
- 'C:\\Users\\Meru\\Anaconda\\lib\\nltk_data'
- 'C:\\Users\\Meru\\AppData\\Roaming\\nltk_data'
**********************************************************************

And, because of this problem things like this cannot run properly (obtaining the same error):

>>> from nltk.corpus import stopwords
>>> stop = stopwords.words('english')
>>> sentence = "this is a foo bar sentence"
>>> print [i for i in sentence.split() if i not in stop]

Do you know what may be problem? I must use words in Spanish, do you recomend another method? I also thought using Goslate package with datasets in english

Thanks for reading!

P.D.: I use Ananconda

Anton answered 1/11, 2014 at 22:5 Comment(0)
S
164

You don't seem to have the stopwords corpus on your computer.

You need to start the NLTK Downloader and download all the data you need.

Open a Python console and do the following:

>>> import nltk
>>> nltk.download()
showing info http://nltk.github.com/nltk_data/

In the GUI window that opens simply press the 'Download' button to download all corpora or go to the 'Corpora' tab and only download the ones you need/want.

Swap answered 1/11, 2014 at 22:26 Comment(1)
Alternatively, if you want to avoid the GUI and know what you want to download: nltk.download("stopwords") Crissman
P
17

I tried from ubuntu terminal and I don't know why the GUI didn't show up according to tttthomasssss answer. So I followed the comment from KLDavenport and it worked. Here is the summary:

Open your terminal/command-line and type python then

>>> import nltk .>>> nltk.download("stopwords")

This will store the stopwords corpus under the nltk_data. For my case it was /home/myusername/nltk_data/corpora/stopwords.

If you need another corpus then visit nltk data and find the corpus with their ID. Then use the ID to download like we did for stopwords.

Pogonia answered 19/10, 2017 at 21:42 Comment(1)
This worked great but I am surprised that this isn't something you can do with pip. Instead you have to script it to pull these resources on each environment.Corticate
V
9
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
STOPWORDS = set(stopwords.words('english'))
Virga answered 10/4, 2020 at 12:20 Comment(0)
I
3

If you want to manually install NLTK Corpus.

1) Go to http://www.nltk.org/nltk_data/ and download your desired NLTK Corpus file.

2) Now in a Python shell check the value of nltk.data.path

3) Choose one of the path that exists on your machine, and unzip the data files into the corpora sub directory inside.

4) Now you can import the data from nltk.corpos import stopwords

Reference: https://medium.com/@satorulogic/how-to-manually-download-a-nltk-corpus-f01569861da9

Idealistic answered 1/5, 2017 at 14:12 Comment(0)
N
1

import nltk

nltk.download()

  • A GUI pops up and in that go the Corpora section, select the required corpus.
  • Verified Result
Nevile answered 13/3, 2020 at 14:50 Comment(0)
Z
1

You can use the following commands

 import nltk

 nltk.download()

After hitting enter, a popup will open up, from where you can download all the required corpora and other nltk tools as well.

Zephyrus answered 26/7, 2021 at 19:57 Comment(0)
C
0
import nltk
nltk.download()

Click on download button when gui prompted. It worked for me.(nltk.download('stopwords') doesn't work for me)

Centigrade answered 16/8, 2017 at 6:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.