Python nltk download and download_shell both freeze (hang) on punkt attempt

I am using NLTK 2.0.4, installed for EPD's Python 2.7.3 (not Canopy), on Ubuntu 12.10. In the terminal I type:

In [96]: nltk.download_shell()
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d

Download which package (l=list; x=cancel)?
  Identifier> punkt
    Downloading package 'punkt' to /home/espears/nltk_data...

And then it freezes. The relevant punkt.zip file is written to the stated directory, but the download interface never returns control to the prompt.

This example is with IPython, but I tried the same with the regular Python 2.7.3 interpreter and got the same result.

When I try to unzip the file directly with unzip, I get an error saying that the end-of-central-directory signature was not found and that the file cannot be unzipped. See below:

espears@computer ~/nltk_data/tokenizers $ unzip punkt.zip 
Archive:  punkt.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of punkt.zip or
        punkt.zip.zip, and cannot find punkt.zip.ZIP, period.

This happens with both nltk.download() and nltk.download_shell() in the same way.

I can inspect the .zip file using du to see that initially its size grows from 0 MB to about 2.7 MB, so it is actually downloading something and the file is not empty. But it stops at 2.7 MB (which may or may not correspond to the expected full size of the file) and then the Python shell downloader freezes.
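One way to tell a truncated download from a complete one, without relying on the file size, is to ask Python's standard zipfile module whether the archive has a valid end-of-central-directory record. The following is only a minimal sketch, assuming Python 3 and the path from the question; the helper name check_zip is hypothetical and not part of NLTK.

```python
import os
import zipfile


def check_zip(path):
    """Return a short status string for a possibly truncated zip download.

    check_zip is a hypothetical helper for illustration, not an NLTK function.
    """
    if not os.path.exists(path):
        return "missing"
    if not zipfile.is_zipfile(path):
        # No end-of-central-directory record was found, which is what
        # a download that stopped partway through usually looks like.
        return "incomplete or not a zip"
    with zipfile.ZipFile(path) as zf:
        # testzip() returns the name of the first corrupt member, or None.
        bad = zf.testzip()
    return "ok" if bad is None else "corrupt member: " + bad


if __name__ == "__main__":
    # Path taken from the question; adjust for your own nltk_data location.
    path = os.path.expanduser("~/nltk_data/tokenizers/punkt.zip")
    print(path, "->", check_zip(path))
```

If this reports "incomplete or not a zip" while the file size has stopped growing, the download most likely stalled rather than finished.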

Thermoscope answered 17/1, 2014 at 20:5 Comment(2)
Possibly this problem? support.enthought.com/entries/… - Genvieve
No, I am not using Canopy. This is an older distribution from Enthought. I am also using it via IPython, but can confirm that the same hanging happens if used directly from the Python terminal. Note that I experience the same issue even when I use download_shell, which bypasses the graphics concerns. - Thermoscope

I had the same problem and downloaded the necessary items manually from the following link:

http://nltk.org/nltk_data/

Not the desired solution, but will work until this is fixed.

UPDATE:

I was actually able to run nltk.download() to install cmudict. Maybe this issue only affects certain packages?
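For the manual route, the downloaded archive still has to be unpacked into the place NLTK looks for it. The sketch below assumes Python 3, that the package zip contains a single top-level folder named after the package (e.g. punkt/), and that punkt belongs under the tokenizers subfolder as shown in the question. The function name install_package is hypothetical, not an NLTK API.

```python
import os
import zipfile


def install_package(zip_path, category, data_dir=None):
    """Unpack a manually downloaded data zip into <data_dir>/<category>/.

    install_package is a hypothetical helper for illustration only.
    """
    if data_dir is None:
        # Default location used in the question; an assumption for other setups.
        data_dir = os.path.expanduser("~/nltk_data")
    target = os.path.join(data_dir, category)
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # The archive is assumed to contain its own top-level package folder.
        zf.extractall(target)
    return target


# Example usage (paths are assumptions):
# install_package("/tmp/punkt.zip", "tokenizers")
```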

Withershins answered 21/1, 2014 at 1:6 Comment(0)

I had the same problem with nltk 3.0.01b. I downloaded the "book" package and monitored the download from the task manager's network display while at the same time checking the size of the target folder (AppData\Roaming\nltk_data on my Windows 7 system). The network traffic ceased and the folder stopped growing at a size of 379 MB. But the Python shell was locked. The following was the last message displayed:

showing info http://nltk.github.com/nltk_data/

However, if you close the Tk window that lists the available download items, the nltk.download() call terminates and the shell prompt comes back.

Concubine answered 19/8, 2014 at 18:16 Comment(0)

Most probably it is not stuck; it may still be downloading. It downloads at a much slower rate than you would expect, even with good internet connectivity. I kept checking the folder size using a while loop: it kept increasing slowly and the download eventually succeeded. It would probably have worked if you had waited. Unzipping might have failed because you tried to unzip before the entire file had downloaded.
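The size check described above can be sketched roughly as follows. This is not the original loop, just a minimal Python 3 illustration for a single file; the function name wait_until_stable and its parameters are hypothetical. Note that a size that stops changing does not by itself prove the download completed, since a stalled download looks the same.

```python
import os
import time


def wait_until_stable(path, interval=5.0, stable_checks=3, sleep=time.sleep):
    """Poll a file's size until it is unchanged for stable_checks polls.

    Hypothetical helper for illustration; returns the final size in bytes.
    """
    last = -1
    unchanged = 0
    while unchanged < stable_checks:
        size = os.path.getsize(path) if os.path.exists(path) else 0
        if size == last:
            unchanged += 1
        else:
            # Size changed since the previous poll, so restart the count.
            unchanged = 0
            last = size
        print("size: %d bytes" % size)
        sleep(interval)
    return last


# Example usage (path is an assumption):
# wait_until_stable(os.path.expanduser("~/nltk_data/tokenizers/punkt.zip"))
```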

Bladderwort answered 16/6, 2016 at 5:49 Comment(0)

If anyone is still struggling to download NLTK packages, you can go to Google Colab and download the packages there:

!pip install nltk

import nltk
nltk.download('all', download_dir='/content/nltk_data')

This creates a folder. Zip the folder using:

!zip -r /content/nltk_data.zip /content/nltk_data

Mount Google Drive:

from google.colab import drive
drive.mount('/content/gdrive')

Drag the zip file to your drive. You can then download it directly from your Google Drive.

Unbacked answered 12/11, 2023 at 7:39 Comment(0)
