Which deep learning algorithm does spaCy use when we train a custom model?

Which learning algorithm does spaCy use?

spaCy has its own deep learning library, called Thinc, which it uses under the hood for its NLP models. For most (if not all) tasks, spaCy uses a deep neural network based on a CNN with a few tweaks. Specifically for Named Entity Recognition, spaCy uses:

A transition-based approach borrowed from shift-reduce parsers, which is described in the paper Neural Architectures for Named Entity Recognition by Lample et al. Matthew Honnibal describes how spaCy uses this in a YouTube video.
A framework called "Embed. Encode. Attend. Predict" (starting here in the video; slides here):
- Embed: Words are embedded using a Bloom filter, which means that word hashes are kept as keys in the embedding dictionary instead of the words themselves. This keeps the embedding dictionary more compact, at the cost of words potentially colliding and ending up with the same vector representation.
- Encode: The list of words is encoded into a sentence matrix to take context into account. spaCy uses a CNN for encoding.
- Attend: Decide which parts are more informative given a query, and get problem-specific representations.
- Predict: spaCy uses a multilayer perceptron for inference.
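
To make the Embed step more concrete, here is a minimal sketch of hashed embeddings in plain Python. It is not spaCy's or Thinc's actual implementation: the function names (`hashed_rows`, `make_table`, `embed`), the use of MD5, the table size and the choice to sum several hashed rows per word are all assumptions made for illustration. It only shows the idea that a word is looked up by its hash in a fixed-size table, so unseen words still get a vector and different words may collide.

```python
import hashlib


def hashed_rows(word, n_rows, n_hashes=2):
    """Map a word to `n_hashes` row indices of a fixed-size table."""
    rows = []
    for seed in range(n_hashes):
        # Seeding the hash gives several different indices per word.
        digest = hashlib.md5(f"{seed}:{word}".encode("utf-8")).digest()
        rows.append(int.from_bytes(digest[:8], "little") % n_rows)
    return rows


def make_table(n_rows, width):
    """Toy table with fixed values; a trained model would learn these."""
    return [[((r * 31 + c * 17) % 13) / 13.0 for c in range(width)]
            for r in range(n_rows)]


def embed(word, table, n_hashes=2):
    """Sum the rows selected by the word's hashes into one vector."""
    rows = hashed_rows(word, len(table), n_hashes)
    return [sum(table[r][c] for r in rows) for c in range(len(table[0]))]


# The table has only 50 rows, however large the vocabulary gets.
table = make_table(n_rows=50, width=4)
vector = embed("apple", table)
```

Because the table size is fixed, memory does not grow with the vocabulary. Using more than one hash per word makes it less likely that two words share every row, which softens the effect of collisions.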

Advantages of this framework, per Honnibal, are:

- Mostly equivalent to sequence tagging (another task spaCy offers models for)
- Shares code with the parser
- Easily excludes invalid sequences
- Arbitrary features are easily defined
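
The transition-based idea, and the point about excluding invalid sequences, can be sketched roughly as follows. This is a toy illustration, not spaCy's code. The SHIFT / OUT / REDUCE action set follows the Lample et al. paper, while the helper names (`valid_actions`, `apply_actions`) and the rule that OUT is only allowed when the stack is empty are assumptions of this sketch. In a real model a neural network scores the valid actions at each step; here the action sequence is simply given.

```python
def valid_actions(buffer, stack, labels):
    """List the actions allowed in the current state."""
    actions = []
    if buffer:
        actions.append("SHIFT")    # move next word onto the stack
        if not stack:
            actions.append("OUT")  # next word is not part of an entity
    if stack:
        # Close the words on the stack as one labelled entity.
        actions.extend(f"REDUCE-{label}" for label in labels)
    return actions


def apply_actions(words, actions, labels):
    """Replay an action sequence and return (start, end, label) spans."""
    buffer = list(words)
    stack = []       # token indices of the entity being built
    entities = []
    position = 0     # index of the next word in the buffer
    for action in actions:
        if action not in valid_actions(buffer, stack, labels):
            raise ValueError(f"invalid action {action!r} at token {position}")
        if action == "SHIFT":
            stack.append(position)
            buffer.pop(0)
            position += 1
        elif action == "OUT":
            buffer.pop(0)
            position += 1
        else:
            label = action.split("-", 1)[1]
            entities.append((stack[0], stack[-1] + 1, label))
            stack.clear()
    if buffer or stack:
        raise ValueError("action sequence ended before the parse was complete")
    return entities


labels = ["PER", "LOC"]
words = ["Mark", "Watney", "visited", "Mars"]
actions = ["SHIFT", "SHIFT", "REDUCE-PER", "OUT", "SHIFT", "REDUCE-LOC"]
entities = apply_actions(words, actions, labels)
print(entities)  # [(0, 2, 'PER'), (3, 4, 'LOC')]
```

Since the model can only choose among `valid_actions`, ill-formed outputs (for example, closing an entity that was never opened) cannot be produced at all, rather than having to be filtered out afterwards.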

For a full overview, Matthew Honnibal describes how the model works in this YouTube video. The slides can be found here.

Note: This information is based on slides from 2017. The engine might have changed since then.

When adding a new entity type, should we create a blank model or train an existing one?

Theoretically, when fine-tuning a spaCy model with new entities, you have to make sure the model doesn't forget its representations for previously learned entities. The best option, where feasible, is to train a model from scratch, but that may not be practical if you lack data or resources.
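
If you do fine-tune an existing model, one commonly suggested way to reduce forgetting is to mix examples of the old entity types into each round of updates alongside the new ones. The answer above does not prescribe this, so treat the following as an illustrative sketch only. The helper `mix_for_update`, its `revision_per_new` parameter and the sample sentences are hypothetical; the examples use the `(text, {"entities": [(start, end, label)]})` shape with character offsets.

```python
import random


def mix_for_update(new_examples, revision_examples, revision_per_new=2, seed=0):
    """Combine new-label examples with a sample of old-label examples."""
    rng = random.Random(seed)
    # Take at most `revision_per_new` old examples per new example.
    n_revision = min(len(revision_examples),
                     revision_per_new * len(new_examples))
    mixed = list(new_examples) + rng.sample(list(revision_examples), n_revision)
    rng.shuffle(mixed)  # avoid feeding all new examples in one block
    return mixed


# Examples for the new entity type (GADGET is a made-up label).
new_examples = [
    ("I bought a new iPhone", {"entities": [(15, 21, "GADGET")]}),
]
# Examples covering entity types the existing model already knows.
revision_examples = [
    ("Apple is based in Cupertino",
     {"entities": [(0, 5, "ORG"), (18, 27, "GPE")]}),
    ("Angela Merkel visited Paris",
     {"entities": [(0, 13, "PERSON"), (22, 27, "GPE")]}),
    ("Nothing to tag here", {"entities": []}),
]

mixed = mix_for_update(new_examples, revision_examples, revision_per_new=2)
```

The mixed list would then be used as the training data for the update step, so that every pass also reminds the model of the entity types it already handles.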

EDIT Feb 2021: spaCy version 3 now also supports Transformer-based pipelines, in addition to the CNN-based models described above.
