What is the network structure inside a Tensorflow Embedding Layer?

The TensorFlow Embedding layer (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding) is easy to use, and there are plenty of articles about "how to use" embeddings (https://machinelearningmastery.com/what-are-word-embeddings/, https://www.sciencedirect.com/topics/computer-science/embedding-method). However, I want to know the implementation of the "Embedding layer" itself in TensorFlow or PyTorch. Is it word2vec? Is it CBOW? Is it a special Dense layer?

Obtest answered 9/6, 2021 at 3:3 Comment(0)

Structure-wise, both the Dense layer and the Embedding layer are hidden layers with neurons in them. The difference lies in the way they operate on their inputs and their weight matrices.

A Dense layer performs operations with its weight matrix: it multiplies the inputs by the matrix, adds biases, and applies an activation function. An Embedding layer, by contrast, uses its weight matrix as a look-up dictionary.

The Embedding layer is best understood as a dictionary that maps integer indices (which stand for specific words) to dense vectors. It takes integers as input, it looks up these integers in an internal dictionary, and it returns the associated vectors. It’s effectively a dictionary lookup.
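To make that difference concrete, here is a rough sketch (the sizes 8 and 4 are arbitrary, chosen only for illustration): the Dense layer transforms a float vector with a matrix multiply, a bias and an activation, while the Embedding layer simply gathers rows of its weight matrix for the given integer ids.

import numpy as np
import tensorflow as tf

dense = tf.keras.layers.Dense(4, activation="relu")   # computes activation(x @ W + b)
embedding = tf.keras.layers.Embedding(8, 4)           # uses W as a look-up table

x = tf.random.normal((1, 8))                          # float features for the Dense layer
dense_out = dense(x)                                  # shape (1, 4)

ids = tf.constant([[3, 5]])                           # integer token ids for the Embedding layer
emb_out = embedding(ids)                              # shape (1, 2, 4)

# The embedding output is literally a row of the weight matrix.
W = embedding.get_weights()[0]                        # shape (8, 4)
np.testing.assert_allclose(emb_out.numpy()[0, 0], W[3], rtol=1e-6)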

from keras.layers import Embedding

embedding_layer = Embedding(1000, 64)

Here 1000 means the number of words in the dictionary (the vocabulary size) and 64 means the dimensionality of each word's vector. Intuitively, the embedding layer, just like any other layer, will try to find a vector of 64 real numbers [n1, n2, ..., n64] for each word. This vector will represent the semantic meaning of that particular word, and the layer will learn it during training using backpropagation, just like any other layer.
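As a quick (made-up) usage example, feeding the layer a batch of integer token ids returns one 64-dimensional vector per id; the ids below are arbitrary and only need to lie in [0, 1000).

import tensorflow as tf

embedding_layer = tf.keras.layers.Embedding(1000, 64)

token_ids = tf.constant([[12, 7, 999, 3, 0],
                         [5, 5, 42, 17, 8]])   # two "sentences" of five tokens each

vectors = embedding_layer(token_ids)
print(vectors.shape)   # (2, 5, 64): one 64-dimensional vector per token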

When you instantiate an Embedding layer, its weights (its internal dictionary of token vectors) are initially random, just as with any other layer. During training, these word vectors are gradually adjusted via backpropagation, structuring the space into something the downstream model can exploit. Once fully trained, the embedding space will show a lot of structure—a kind of structure specialized for the specific problem for which you’re training your model.

-- Deep Learning with Python by F. Chollet
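To see that the look-up table really is adjusted by backpropagation like any other weight matrix, here is a small self-contained sketch; the model, the random data, and all sizes are made up purely for illustration.

import numpy as np
import tensorflow as tf

emb = tf.keras.layers.Embedding(1000, 64)
model = tf.keras.Sequential([
    emb,
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.build(input_shape=(None, 10))

before = emb.get_weights()[0].copy()            # randomly initialised look-up table

x = np.random.randint(0, 1000, size=(32, 10))   # fake token ids
y = np.random.randint(0, 2, size=(32, 1))       # fake labels
model.fit(x, y, epochs=1, verbose=0)

after = emb.get_weights()[0]
print(np.abs(after - before).sum() > 0)         # True: backprop updated the embedding vectors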


Edit - How is "backpropagation" used to train the look-up matrix of the Embedding layer?

The Embedding layer is similar to a linear layer without any activation function. Theoretically, an Embedding layer also performs a matrix multiplication but doesn't add any non-linearity to it through an activation function. So backpropagation through the Embedding layer works just as it does for any linear layer. In practice, though, we don't actually perform a matrix multiplication in the embedding layer, because the inputs (integer indices) are equivalent to one-hot encoded vectors, and multiplying a one-hot vector by the weight matrix is as easy as a look-up.
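Here is a tiny NumPy sketch of that equivalence (the vocabulary size, dimension and token id are arbitrary): multiplying a one-hot row vector by the weight matrix selects exactly one row, which is what the look-up does directly.

import numpy as np

vocab_size, dim = 6, 3
W = np.random.rand(vocab_size, dim)   # stands in for the embedding weight matrix

token_id = 4
one_hot = np.zeros(vocab_size)
one_hot[token_id] = 1.0

# one_hot @ W picks out row `token_id` of W, so the layer can skip the
# multiplication and index the matrix directly.
np.testing.assert_allclose(one_hot @ W, W[token_id])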

Kingship answered 9/6, 2021 at 3:29 Comment(10)
However, I want to know the network structure behind "from keras.layers import Embedding". Is it a 1000 x 64 units Dense layer?Obtest
@Obtest yes, you are somewhat correct. The embedding layer is a 1000 x 64 layer, but don't call it a Dense layer. A Dense layer performs operations like matrix multiplication on its weight matrix, whereas an Embedding layer uses the weight matrix as a look-up dictionary. So structurally they are both layers with neurons in them; the Dense layer performs operations on its weights while the Embedding layer doesn't.Kingship
Thank you! Can you be more specific? Or how can we use a "backpropagation" algorithm to train that look-up matrix?Obtest
Embedding layer is similar to the linear layer without any activation function. Theoretically, the Embedding layer also performs matrix multiplication but doesn't add any non-linearity to it by using any kind of activation function. So backpropagation in the Embedding layer is similar to that of any linear layer. But practically, we don't do any matrix multiplication in the embedding layer because the inputs are generally one-hot encoded and the matrix multiplication of the weights by a one-hot encoded vector is as easy as a look-up.Kingship
I am sorry I actually forgot to mention the role of activation function in the Dense layer in my answer so I edited it.Kingship
I see! Thank you again!Obtest
I'm confused by your claim that the input to an embedding layer is generally one-hot encoded, @Kingship. This answer suggests that the input is actually an index value?Gallardo
@Gallardo please read the third last paragraph of that answer...Kingship
@Kingship If you mean the paragraph that starts Here 1000 means the number....., then no, that's not what I'm referring to. I'm talking about the input to the model, not the output.Gallardo
@Gallardo No, I was referring to the third-last paragraph of the answer that you referred to. It starts with For an intuition of how this table lookup is implemented....Kingship
