NLTK - Download all nltk data except corpora from command line without Downloader UI


Asked 25/6, 2016 at 16:46 Answered 30/7, 2016 at 19:55

Solved python nlp nltk corpus nltk-trainer


We can download all nltk data using:

> import nltk
> nltk.download('all')

Or specific data using:

> nltk.download('punkt')
> nltk.download('maxent_treebank_pos_tagger')

But I want to download all data except the 'corpora' files, for example all chunkers, grammars, models, stemmers, taggers, tokenizers, etc.

Is there any way to do so without the Downloader UI? Something like:

> nltk.download('all-taggers')

Stanch answered 25/6, 2016 at 16:46 Comment(1)

i think i looked into this at some point, and couldn't find a way to do it. the source code is here, for what it's worth. – Darrondarrow 10/7, 2016 at 15:36


List all corpora package ids and set _status_cache[pkg.id] = 'installed' for each one.

This marks every corpora package as already installed, so those packages are skipped when we call the downloader's download() method.

If you're unsure which corpora or packages you need, use nltk.download('popular') instead of downloading all corpora and models.

import nltk

dwlr = nltk.downloader.Downloader()

for pkg in dwlr.corpora():
    dwlr._status_cache[pkg.id] = 'installed'

dwlr.download('popular')
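The same trick can be wrapped in a small helper and pointed at the 'all' collection, which is closer to what the question asks for. This is only a sketch: skip_packages and download_all_but_corpora are hypothetical names, and _status_cache is a private attribute of the Downloader, so it may change between NLTK versions.

```python
def skip_packages(downloader, packages, status="installed"):
    """Mark each package as already installed in the downloader's cache.

    Assumes the downloader keeps a dict named _status_cache keyed by
    package id (a private detail of nltk.downloader.Downloader).
    """
    for pkg in packages:
        downloader._status_cache[pkg.id] = status
    return downloader


def download_all_but_corpora():
    # Imported lazily so skip_packages can be used without NLTK installed.
    from nltk.downloader import Downloader

    dwlr = Downloader()
    # Pretend every corpus is installed, then request the 'all' collection.
    skip_packages(dwlr, dwlr.corpora())
    dwlr.download("all")


# download_all_but_corpora()  # uncomment to run; needs network access
```

Because the corpora are only marked in memory, a fresh Downloader() instance will see them as not installed again.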

To download all packages in a specific folder (subdirectory):

import nltk

dwlr = nltk.downloader.Downloader()

# Available subdirs: chunkers, corpora, grammars, help, misc,
# models, sentiment, stemmers, taggers, tokenizers
for pkg in dwlr.packages():
    # Download only the packages stored under the chosen subdir
    if pkg.subdir == 'taggers':
        dwlr.download(pkg.id)
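Inverting the subdir check gives "everything except corpora" without touching the status cache. A minimal sketch, assuming each package object exposes id and subdir as in the loop above; select_package_ids and download_except are hypothetical helper names, not part of NLTK.

```python
def select_package_ids(packages, exclude_subdirs=("corpora",)):
    """Return the ids of packages whose subdir is not in exclude_subdirs."""
    excluded = set(exclude_subdirs)
    return [pkg.id for pkg in packages if pkg.subdir not in excluded]


def download_except(exclude_subdirs=("corpora",)):
    # Imported lazily so the selection helper works without NLTK installed.
    from nltk.downloader import Downloader

    dwlr = Downloader()
    # Download every package that lives outside the excluded subdirs.
    for pkg_id in select_package_ids(dwlr.packages(), exclude_subdirs):
        dwlr.download(pkg_id)


# download_except()  # uncomment to run; needs network access
```

Passing several names, for example download_except(("corpora", "help")), should skip more than one folder.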

Stanch answered 30/7, 2016 at 19:55 Comment(0)
