Use spacy Spanish Tokenizer

Asked 22/3, 2017 at 9:40 Answered 14/12, 2019 at 13:17

I have always used the spaCy library with English or German.

To load the library I used this code:

import spacy
nlp = spacy.load('en')

I would like to use the Spanish tokenizer, but I do not know how to do it, because spaCy does not seem to have a Spanish model. I've tried this:

python -m spacy download es

and then:

nlp = spacy.load('es')

But without any success.

Does anyone know how to tokenize a Spanish sentence with spaCy in the proper way?

Grogram answered 22/3, 2017 at 9:40 Comment(0)

For versions up to 1.6, this code works properly:

from spacy.es import Spanish
nlp = Spanish()

but in version 1.7.2 a small change is necessary:

from spacy.es import Spanish
nlp = Spanish(path=None)

Source: @honnibal in the Gitter chat

Grogram answered 22/3, 2017 at 11:33 Comment(1)

This didn't work for me. Maybe the API has been updated? See, for example, #47295816 – Mattos 23/8, 2018 at 14:12

You will have to download a Spanish language model from the command line. In the model names, "es" stands for Spanish, "sm" for the small model size and "md" for the medium model size. Currently two pretrained Spanish models are available:

es_core_news_sm
es_core_news_md

Choose the small or the medium-sized version and download it using the command line:

python -m spacy download es_core_news_sm

python -m spacy download es_core_news_md
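If you want to script the download step, here is a minimal sketch that builds the model name and the matching command from a size code. The helper names spanish_model_name and download_command are made up for illustration, and the list of sizes only covers the two models mentioned in this answer, so check the documentation before relying on it.

```python
import sys

# Sizes named in this answer; other sizes may exist, so check the spaCy docs.
SPANISH_SIZES = ("sm", "md")


def spanish_model_name(size="sm"):
    """Return the package name of a Spanish model, e.g. 'es_core_news_sm'."""
    if size not in SPANISH_SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SPANISH_SIZES}")
    return f"es_core_news_{size}"


def download_command(size="sm"):
    """Build the argument list for 'python -m spacy download <model>'."""
    # sys.executable points at the interpreter currently running this script,
    # so the model is installed into the same environment.
    return [sys.executable, "-m", "spacy", "download", spanish_model_name(size)]


if __name__ == "__main__":
    # Only print the command here; nothing is downloaded.
    print(" ".join(download_command("md")))
```

To actually run the download from Python, the list returned by download_command can be passed to subprocess.run(..., check=True).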

Then load the model of your choice in Python using the name of the model:

import spacy
nlp = spacy.load("es_core_news_sm") # or spacy.load("es_core_news_md")

# do something with the model, e.g. tokenize the text
text_in_spanish = "Hola, me llamo Ana."  # example sentence, replace with your own text
doc = nlp(text_in_spanish)
for token in doc:
    print(token.text)

Check the documentation for model updates: https://spacy.io/models/es
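If all you need is tokenization, a downloaded model may not be required at all. The sketch below assumes spaCy 2.0 or later, where spacy.blank("es") creates a Spanish pipeline that contains only the rule-based tokenizer. The function name tokenize_spanish is just an illustrative helper, not part of spaCy.

```python
def tokenize_spanish(text):
    """Split Spanish text into token strings using only spaCy's tokenizer."""
    # Imported inside the function so the helper can be defined
    # even on a machine where spaCy is not installed yet.
    import spacy

    # A blank pipeline has the language-specific tokenizer but no tagger,
    # parser or NER, so no pretrained model has to be downloaded.
    nlp = spacy.blank("es")
    return [token.text for token in nlp(text)]
```

Calling tokenize_spanish("Hola mundo") returns a list of token strings. For part-of-speech tags, lemmas or entities you still need one of the pretrained models above.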

Sorce answered 14/12, 2019 at 13:17 Comment(0)

This works for me:

python -m spacy download es_core_news_sm


import spacy
nlp = spacy.load("es_core_news_sm")

Wulf answered 3/11, 2019 at 8:37 Comment(0)
