How to get all words from spacy vocab?
D

2

19

I need all the words from the spaCy vocab. Suppose I initialize my spaCy model as:

nlp = spacy.load('en')

How do I get the text of words from nlp.vocab?

Dovetail answered 2/2, 2019 at 17:18 Comment(3)
What do you need exactly? spaCy's vocabulary dict for English? - Hammer
Yes. I am trying to use nlp.vocab without explicitly downloading it from a URL, and then use it as a corpus for spell correction (SymSpell). - Dovetail
Also related: https://mcmap.net/q/665856/-spacy-word-in-vocabulary - Invidious
B
35

You can get it as a list like this:

list(nlp.vocab.strings)
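
Note that nlp.vocab.strings is spaCy's string store, so it can hold more than plain words (for example tag and label names, numbers, and punctuation). If you only want word-like entries, one option is to filter the list afterwards. Below is a minimal sketch, assuming that "word" means a purely alphabetic string; `extract_words`, `min_length`, and `sample_strings` are hypothetical names used for illustration, not part of spaCy. The sample list stands in for list(nlp.vocab.strings) so the snippet runs without a model installed.

```python
from typing import Iterable, List


def extract_words(strings: Iterable[str], min_length: int = 1) -> List[str]:
    """Return the sorted, de-duplicated, lowercased alphabetic entries."""
    words = set()
    for s in strings:
        # str.isalpha() is False for empty strings and for anything that
        # contains digits, punctuation, whitespace, or underscores.
        if s.isalpha() and len(s) >= min_length:
            words.add(s.lower())
    return sorted(words)


# Stand-in for list(nlp.vocab.strings); with spaCy loaded you would pass
# nlp.vocab.strings directly instead.
sample_strings = ["would", "Would", "nsubj", "_SP", "2019", "", "it's", "apple"]
print(extract_words(sample_strings))
```

This is only a heuristic: alphabetic label names such as "nsubj" still pass the filter, and contractions such as "it's" are dropped, so adjust the condition to whatever your spell-correction corpus needs.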
Burleson answered 4/2, 2019 at 5:9 Comment(0)
A
8

As of spaCy v3.0, the 'en' shortcut is no longer available, so first download a model package:

python -m spacy download en_core_web_sm

and then load it by its full package name, e.g.:

import spacy
nlp = spacy.load("en_core_web_sm")
words = set(nlp.vocab.strings)
word = 'would'
print(f"Is '{word}' an English word: {word in words}")  # True
Arminius answered 19/5, 2021 at 7:53 Comment(1)
Why does this give me 780334, while apparently there are 20k words in the vocab? - Saurel
