Which deep learning algorithm does spaCy use when we train a custom model?

When we train a custom model, I see that there are dropout and n_iter parameters to tune, but which deep learning algorithm does spaCy use to train custom models? Also, when adding a new entity type, is it better to create a blank model or to train an existing one?

Guesswork asked 24/2, 2020 at 17:33

Which learning algorithm does spaCy use?

spaCy has its own deep learning library, Thinc, which it uses under the hood for its NLP models. For most (if not all) tasks, spaCy uses a deep neural network based on a CNN, with a few tweaks. Specifically, for named entity recognition, spaCy uses:

  1. A transition-based approach borrowed from shift-reduce parsers, described in the paper "Neural Architectures for Named Entity Recognition" by Lample et al. Matthew Honnibal explains how spaCy uses it in a YouTube video.

  2. A framework called "Embed. Encode. Attend. Predict", which Honnibal covers later in the same video (slides are also available):

    • Embed: Words are embedded using Bloom embeddings (a hashing trick related to Bloom filters): hashes of the words, rather than the words themselves, serve as keys into the embedding table. This keeps the table compact, at the cost that different words may collide and end up sharing vector representations.

    • Encode: The sequence of word vectors is encoded into a sentence matrix so that each word's representation takes its context into account. spaCy uses a CNN for encoding.

    • Attend: Decide which parts of the input are most informative given a query, producing problem-specific representations.

    • Predict: spaCy uses a multilayer perceptron (MLP) to make the final prediction.
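To make the four steps concrete, here is a minimal pure-Python sketch with random, untrained weights. It is not spaCy's or Thinc's implementation: the toy sizes, the use of `zlib.crc32` as the hash function, the width-3 convolution, and using the first token's encoding as the attention query are all illustrative assumptions.

```python
import math
import random
import zlib

# Hypothetical toy sizes; spaCy's real models are far larger.
DIM, N_ROWS, N_HASHES, N_CLASSES = 4, 64, 2, 3
rng = random.Random(0)

def matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def relu_layer(weights, x):
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

TABLE = matrix(N_ROWS, DIM)                                  # Embed: hashed table
W_CONV = matrix(DIM, 3 * DIM)                                # Encode: width-3 filter
W_HIDDEN, W_OUT = matrix(DIM, DIM), matrix(N_CLASSES, DIM)   # Predict: MLP

def embed(word):
    # Bloom-style: hash the word with several seeds (crc32 is a stand-in hash)
    # and sum the selected rows; no per-word vocabulary is stored.
    rows = [zlib.crc32(f"{seed}:{word}".encode()) % N_ROWS for seed in range(N_HASHES)]
    return [sum(TABLE[r][d] for r in rows) for d in range(DIM)]

def encode(vectors):
    # Each output row sees the word plus one neighbour on each side.
    pad = [0.0] * DIM
    padded = [pad] + vectors + [pad]
    return [relu_layer(W_CONV, padded[i - 1] + padded[i] + padded[i + 1])
            for i in range(1, len(padded) - 1)]

def attend(rows, query):
    # Weight each encoded row by its dot-product similarity to the query.
    weights = softmax([sum(q * v for q, v in zip(query, row)) for row in rows])
    return [sum(w * row[d] for w, row in zip(weights, rows)) for d in range(DIM)]

def predict(vector):
    # One hidden ReLU layer, then a softmax over the output classes.
    hidden = relu_layer(W_HIDDEN, vector)
    return softmax([sum(w * h for w, h in zip(row, hidden)) for row in W_OUT])

words = "Apple is buying a UK startup".split()
encoded = encode([embed(w) for w in words])
probs = predict(attend(encoded, query=encoded[0]))
print(len(encoded), [round(p, 3) for p in probs])
```

Because the table is indexed by hashes, its size stays fixed no matter how many distinct words the model sees; that is the compactness trade-off described in the Embed step.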

Advantages of this framework, per Honnibal, are:

  1. Mostly equivalent to sequence tagging (another task spaCy offers models for)
  2. Shares code with the parser
  3. Easily excludes invalid sequences
  4. Arbitrary features are easily defined
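The first and third advantages are easiest to see in a toy transition system. The sketch below assumes BILUO-style actions (B = begin, I = inside, L = last, U = unit/single-token, O = outside), which is how spaCy's NER transitions are commonly described; the scores are hand-made stand-ins for model output, and `is_valid` and `greedy_parse` are hypothetical helpers, not spaCy functions. Invalid actions are filtered out before the highest-scoring one is chosen, so the output can never contain a malformed entity.

```python
from typing import Dict, List, Optional, Tuple

def is_valid(action: str, open_label: Optional[str], is_last_token: bool) -> bool:
    """Return True if `action` is legal given the current parser state."""
    move, _, label = action.partition("-")
    if open_label is None:
        # No entity is open: start one (B needs a following token), tag a
        # single-token entity (U), or mark the token as outside (O).
        if move == "B":
            return not is_last_token
        return move in ("U", "O")
    # An entity is open: continue it (I) or close it (L), with the same label.
    if move == "I":
        return label == open_label and not is_last_token
    if move == "L":
        return label == open_label
    return False

def greedy_parse(tokens: List[str], scores: List[Dict[str, float]]) -> List[Tuple[int, int, str]]:
    """Pick the best *valid* action per token; return (start, end, label) spans.

    Assumes every action appears in each score dict, so a valid one always exists.
    """
    spans, open_label, start = [], None, 0
    for i in range(len(tokens)):
        last = i == len(tokens) - 1
        valid = {a: s for a, s in scores[i].items() if is_valid(a, open_label, last)}
        move, _, label = max(valid, key=valid.get).partition("-")
        if move == "B":
            open_label, start = label, i
        elif move == "L":
            spans.append((start, i + 1, open_label))
            open_label = None
        elif move == "U":
            spans.append((i, i + 1, label))
    return spans

labels = ["ORG", "GPE"]
actions = ["O"] + [f"{m}-{l}" for m in "BILU" for l in labels]
tokens = ["Apple", "is", "opening", "in", "New", "York"]
# Hand-made scores: "I-ORG" on token 1 and "I-GPE" on the last token score
# highest but are invalid there, so they are skipped.
hints = [{"U-ORG": 0.9}, {"I-ORG": 0.7, "O": 0.6}, {"O": 0.8}, {"O": 0.8},
         {"B-GPE": 0.9}, {"I-GPE": 0.9, "L-GPE": 0.8}]
scores = [{a: h.get(a, 0.0) for a in actions} for h in hints]
print(greedy_parse(tokens, scores))  # [(0, 1, 'ORG'), (4, 6, 'GPE')]
```

With one action per token, this is essentially sequence tagging (advantage 1), and the validity check is what rules out impossible sequences (advantage 3).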

For a full overview, Matthew Honnibal describes how the model works in the YouTube video mentioned above; the slides are available as well.

Note: This information is based on slides from 2017. The engine might have changed since then.

When adding a new entity type, should we create a blank model or train an existing one?

Theoretically, when fine-tuning a spaCy model with new entity types, you have to make sure the model doesn't forget what it learned about the previously known entities (so-called catastrophic forgetting). The best option, if possible, is to train a model from scratch, but that may not be feasible due to a lack of data or resources.
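One common mitigation when you do fine-tune is to mix in examples labeled by the original model, so the old entity types keep appearing in training (often called pseudo-rehearsal). The sketch below only builds such a mixed training set; `build_rehearsal_set` and `old_model_predict` are hypothetical names, and the `(start_char, end_char, label)` triples mirror the character-offset format spaCy v2's training examples put under the `"entities"` key. With a real pipeline, `old_model_predict` could return `[(e.start_char, e.end_char, e.label_) for e in nlp(text).ents]`.

```python
import random
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]        # (start_char, end_char, label)
Example = Tuple[str, List[Span]]   # (text, entity spans)

def build_rehearsal_set(new_examples: List[Example], unlabeled_texts: List[str],
                        old_model_predict: Callable[[str], List[Span]],
                        ratio: float = 1.0, seed: int = 0) -> List[Example]:
    """Mix new-entity examples with texts annotated by the original model."""
    rng = random.Random(seed)
    # `ratio` = revision examples per new example, capped by the texts available.
    n_revision = min(len(unlabeled_texts), int(len(new_examples) * ratio))
    revision = [(text, old_model_predict(text))
                for text in rng.sample(unlabeled_texts, n_revision)]
    mixed = list(new_examples) + revision
    rng.shuffle(mixed)  # interleave old-label and new-label examples
    return mixed

def fake_old_model(text: str) -> List[Span]:
    # Hypothetical stand-in for the pretrained model's predictions.
    return [(0, 5, "ORG")] if text.startswith("Apple") else []

new = [("I love horses", [(7, 13, "ANIMAL")])]
unlabeled = ["Apple hires staff", "The weather was nice"]
mixed = build_rehearsal_set(new, unlabeled, fake_old_model, ratio=2.0)
print(mixed)
```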

EDIT Feb 2021: spaCy version 3 adds support for transformer-based pipelines (for example via the spacy-transformers package), alongside the CNN-based ones.

Ornithopod answered 25/2, 2020 at 12:4
In the example code for training a custom entity, we can see SGD used as the optimizer. So, correct me if I am wrong: basically there is a trained CNN model under the hood, and while training custom entities, spaCy uses SGD on top of the CNN model to adjust the parameters so that the error function is minimized? – Guesswork
Yes, SGD is the optimizer used; however, the model itself is different from an ordinary LSTM or CRF, which are often used for NER. – Ornithopod
Okay, got it. So what you are saying is that spaCy's core model is based on a CNN but also incorporates features from other architectures like LSTMs, CRFs, etc.? – Guesswork
I don't think it incorporates an LSTM or a CRF. This is probably why it's much faster than recurrent models. If I had to sum it up in one sentence, I would say "CNN on top of Bloom embeddings with attention". – Ornithopod
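The thread above hinges on what SGD, dropout and the iteration count actually do during training. As a hedged illustration on a linear model rather than spaCy's network, the loop below shows the same three knobs: `n_iter` passes over shuffled data, a dropout rate `drop` applied during training only, and an SGD update that nudges each weight against the error gradient. The function and parameter names are hypothetical, chosen to echo the `n_iter` and dropout settings mentioned in the question.

```python
import random
from typing import List, Tuple

def train(data: List[Tuple[List[float], float]], n_iter: int = 20, drop: float = 0.2,
          learn_rate: float = 0.1, seed: int = 0) -> Tuple[List[float], float]:
    """Fit y ~ w.x + b with plain SGD, applying (inverted) dropout to the inputs."""
    if not 0.0 <= drop < 1.0:
        raise ValueError("drop must be in [0, 1)")
    rng = random.Random(seed)
    data = list(data)                  # copy so shuffling doesn't touch the caller's list
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(n_iter):            # n_iter = number of passes over the data
        rng.shuffle(data)
        for x, y in data:
            # Dropout: zero each input with probability `drop`, scale survivors up.
            xd = [0.0 if rng.random() < drop else xi / (1.0 - drop) for xi in x]
            error = sum(wi * xi for wi, xi in zip(w, xd)) + b - y
            # SGD: step each parameter against the gradient of 0.5 * error ** 2.
            w = [wi - learn_rate * error * xi for wi, xi in zip(w, xd)]
            b -= learn_rate * error
    return w, b

data = [([x], 2 * x + 1) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
print(train(data, n_iter=300, drop=0.0, learn_rate=0.5))  # close to ([2.0], 1.0)
print(train(data, n_iter=20, drop=0.2))  # dropout adds noise, so the fit is approximate
```

The optimizer only decides how parameters move; the model it updates (here a single linear layer, in spaCy a CNN-based network) is a separate choice, which is the distinction the comments are drawing.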
