To download a particular dataset or model, use the nltk.download() function, e.g. if you are looking to download the punkt sentence tokenizer, use:
$ python3
>>> import nltk
>>> nltk.download('punkt')
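Once punkt is downloaded, you can sanity-check it with the sentence tokenizer it backs; a minimal sketch (the example sentence is arbitrary):

>>> from nltk.tokenize import sent_tokenize
>>> sent_tokenize("NLTK is installed. The punkt model works.")
['NLTK is installed.', 'The punkt model works.']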
If you're unsure which data/model you need, you can start out with the basic list of data and models:
>>> import nltk
>>> nltk.download('popular')
It will download a list of "popular" resources.
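To verify that a particular resource actually landed in your nltk_data directory, you can ask NLTK to locate it; nltk.data.find() returns the path if the resource is installed and raises a LookupError otherwise:

>>> import nltk
>>> nltk.data.find('tokenizers/punkt')    # path to the installed resource
>>> nltk.data.find('corpora/stopwords')   # raises LookupError if missing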
Ensure that you have the latest version of NLTK, because it is always improving and constantly maintained:
$ pip install --upgrade nltk
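To confirm which version you ended up with after upgrading, a quick one-liner:

$ python3 -c "import nltk; print(nltk.__version__)"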
EDITED
If you want to avoid errors when downloading larger datasets (e.g. panlex_lite) with nltk, from https://mcmap.net/q/263475/-nltk-panlex_lite-giving-me-error:
$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python
>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it's already installed.
>>> dler.download('popular')
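If you'd rather do the cleanup from Python instead of the shell (so the home directory expands for whoever runs it), a rough equivalent of the two rm commands above, assuming the default nltk_data location under your home directory:

>>> import os, shutil
>>> corpora = os.path.expanduser('~/nltk_data/corpora')
>>> os.remove(os.path.join(corpora, 'panlex_lite.zip'))   # same as the rm above
>>> shutil.rmtree(os.path.join(corpora, 'panlex_lite'))   # same as the rm -r above

Then continue with the same Downloader() steps shown above.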
And if anyone wants to find the nltk_data directory, see https://mcmap.net/q/211871/-nltk-doesn-39-t-add-nltk_data-to-search-path
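You can also inspect the directories NLTK searches straight from Python; nltk.data.path is a plain list, checked in order:

>>> import nltk
>>> print(nltk.data.path)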
And to configure the nltk_data path, see https://mcmap.net/q/209709/-how-to-config-nltk-data-directory-from-code
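For a quick in-code tweak (the paths here are placeholders), you can either add a directory to the search path or point the downloader at a specific location with download_dir:

>>> import nltk
>>> nltk.data.path.append('/path/to/nltk_data')                # extra search location
>>> nltk.download('punkt', download_dir='/path/to/nltk_data')  # download to a specific directory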
You can install nltk however you want, but you then use nltk.download() to download the corpus data after you've installed it. – Supernaturalism