How does spacy use word embeddings for Named Entity Recognition (NER)? - McMap


Asked 12/6, 2017 at 6:8 Answered 30/1, 2018 at 20:55

python nlp named-entity-recognition spacy


I'm trying to train an NER model using spaCy to identify locations, (person) names, and organisations. I want to understand how spaCy recognises entities in text, but I haven't been able to find an answer. From this issue on GitHub and this example, it appears that spaCy trains an averaged perceptron on a number of features of the text, such as POS tags, prefixes, suffixes, and other character- and word-based features.

However, nowhere in the code does spaCy appear to use the GloVe embeddings (although each word in the sentence/document has one, provided it is present in the GloVe corpus).

My questions are:

Are these used in the NER system now?
If I were to switch out the word vectors for a different set, should I expect performance to change in a meaningful way?
Where in the code can I find out how (if at all) spaCy is using the word vectors?

I've tried looking through the Cython code, but I'm not able to understand whether the labelling system uses word embeddings.

Pompeii answered 12/6, 2017 at 6:8 Comment(3)

Did you find out anything? I'd love the same information. – Worry 9/10, 2017 at 7:54

Sadly, no - I wasn't able to and eventually gave up the search. I used MITIE instead - github.com/mit-nlp/MITIE. – Pompeii 9/10, 2017 at 11:10

See an answer on the internals of the spaCy NER here. – Gyroscope 17/8, 2020 at 8:12


spaCy does use word embeddings for its NER model, which is a multilayer CNN. Matthew Honnibal, the creator of spaCy, made quite a nice video about how its NER works here. All three English models use GloVe vectors trained on Common Crawl, but the smaller models "prune" the number of vectors by mapping similar words to the same vector (link).
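To give a rough idea of what that kind of pruning means, here is a minimal pure-Python sketch. It is not spaCy's actual implementation: the function names (`cosine`, `prune_vectors`, `lookup`) and the toy 2-dimensional vectors are made up for illustration, and it assumes the words are ordered from most to least frequent. It keeps the first `keep` vectors and points every other word at its most similar kept word.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prune_vectors(vectors, keep):
    """Keep the first `keep` vectors; remap the rest to their nearest kept word.

    `vectors` is a dict of word -> vector, assumed to be ordered by frequency.
    Returns (table, remap): the reduced vector table and a word -> kept-word map.
    """
    words = list(vectors)
    kept = words[:keep]
    table = {word: vectors[word] for word in kept}
    remap = {}
    for word in words[keep:]:
        # Pick the kept word whose vector is most similar to this word's vector.
        remap[word] = max(kept, key=lambda k: cosine(vectors[word], vectors[k]))
    return table, remap


def lookup(word, table, remap):
    """Return the vector for `word`, following the remapping if it was pruned."""
    return table[remap.get(word, word)]


# Toy data, invented for this example.
toy = {
    "dog": [0.9, 0.1],
    "car": [0.1, 0.9],
    "puppy": [0.8, 0.2],
    "truck": [0.2, 0.8],
}
table, remap = prune_vectors(toy, keep=2)
print(remap)  # {'puppy': 'dog', 'truck': 'car'}
```

After pruning, "puppy" and "dog" share one row, so the table is smaller at the cost of those two words becoming indistinguishable by vector alone.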

It's quite doable to add custom vectors. There's an overview of the process in the spaCy docs, plus some example code on GitHub.
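As a minimal sketch of the basic mechanics (not the example code linked above), something like the following should work, assuming spaCy and NumPy are installed. The 3-dimensional vectors and the `custom_vectors` dict are made up for illustration; real vectors would come from your own embedding file.

```python
import numpy
import spacy

# Blank English pipeline: tokenizer only, no pretrained vectors.
nlp = spacy.blank("en")

# Hypothetical toy vectors; replace with vectors loaded from your own source.
custom_vectors = {
    "london": numpy.asarray([0.1, 0.2, 0.3], dtype="float32"),
    "paris": numpy.asarray([0.1, 0.25, 0.3], dtype="float32"),
}

# Register each word's vector in the pipeline's vocabulary.
for word, vector in custom_vectors.items():
    nlp.vocab.set_vector(word, vector)

doc = nlp("london")
print(doc[0].has_vector, doc[0].vector)
```

Note that this only makes the vectors available on the vocabulary. For them to influence entity predictions, the NER model still has to be trained (or retrained) with those vectors in place.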

Glitter answered 30/1, 2018 at 20:55 Comment(2)

Four years member and never finished the tour ;-) – Eddi 30/1, 2018 at 21:18

More information (derived from the youtube video) can be found here – Gyroscope 17/8, 2020 at 8:12
