What is the network structure inside a Tensorflow Embedding Layer?

The TensorFlow Embedding layer (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding) is easy to use, and there are plenty of articles about "how to use" embeddings (https://machinelearningmastery.com/what-are-word-embeddings/, https://www.sciencedirect.com/topics/computer-science/embedding-method). However, I want to know the implementation of the "Embedding layer" itself in TensorFlow or PyTorch. Is it word2vec? Is it CBOW? Is it a special Dense layer?

Obtest answered 9/6, 2021 at 3:3 Comment(0)

Structure-wise, both the Dense layer and the Embedding layer are hidden layers with neurons in them. The difference lies in the way they operate on their inputs and their weight matrices.

A Dense layer performs operations with its weight matrix: it multiplies the inputs by the matrix, adds biases, and applies an activation function. An Embedding layer, by contrast, uses its weight matrix as a look-up dictionary.

The Embedding layer is best understood as a dictionary that maps integer indices (which stand for specific words) to dense vectors. It takes integers as input, it looks up these integers in an internal dictionary, and it returns the associated vectors. It’s effectively a dictionary lookup.
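To make that difference concrete, here is a rough sketch (the sizes 8 and 4 are arbitrary, chosen only for illustration): the Dense layer transforms a float vector with a matrix multiply, a bias and an activation, while the Embedding layer simply gathers rows of its weight matrix for the given integer ids.

import numpy as np
import tensorflow as tf

dense = tf.keras.layers.Dense(4, activation="relu")   # computes activation(x @ W + b)
embedding = tf.keras.layers.Embedding(8, 4)           # uses W as a look-up table

x = tf.random.normal((1, 8))                          # float features for the Dense layer
dense_out = dense(x)                                  # shape (1, 4)

ids = tf.constant([[3, 5]])                           # integer token ids for the Embedding layer
emb_out = embedding(ids)                              # shape (1, 2, 4)

# The embedding output is literally a row of the weight matrix.
W = embedding.get_weights()[0]                        # shape (8, 4)
np.testing.assert_allclose(emb_out.numpy()[0, 0], W[3], rtol=1e-6)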

from keras.layers import Embedding

embedding_layer = Embedding(1000, 64)

Here 1000 means the number of words in the dictionary (the vocabulary size) and 64 means the dimensionality of each word's vector. Intuitively, the embedding layer, just like any other layer, will try to find a vector of 64 real numbers [n1, n2, ..., n64] for each word. This vector will represent the semantic meaning of that particular word, and the layer will learn it during training using backpropagation, just like any other layer.
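As a quick (made-up) usage example, feeding the layer a batch of integer token ids returns one 64-dimensional vector per id; the ids below are arbitrary and only need to lie in [0, 1000).

import tensorflow as tf

embedding_layer = tf.keras.layers.Embedding(1000, 64)

token_ids = tf.constant([[12, 7, 999, 3, 0],
                         [5, 5, 42, 17, 8]])   # two "sentences" of five tokens each

vectors = embedding_layer(token_ids)
print(vectors.shape)   # (2, 5, 64): one 64-dimensional vector per token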

When you instantiate an Embedding layer, its weights (its internal dictionary of token vectors) are initially random, just as with any other layer. During training, these word vectors are gradually adjusted via backpropagation, structuring the space into something the downstream model can exploit. Once fully trained, the embedding space will show a lot of structure—a kind of structure specialized for the specific problem for which you’re training your model.

-- Deep Learning with Python by F. Chollet
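To see that the look-up table really is adjusted by backpropagation like any other weight matrix, here is a small self-contained sketch; the model, the random data, and all sizes are made up purely for illustration.

import numpy as np
import tensorflow as tf

emb = tf.keras.layers.Embedding(1000, 64)
model = tf.keras.Sequential([
    emb,
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.build(input_shape=(None, 10))

before = emb.get_weights()[0].copy()            # randomly initialised look-up table

x = np.random.randint(0, 1000, size=(32, 10))   # fake token ids
y = np.random.randint(0, 2, size=(32, 1))       # fake labels
model.fit(x, y, epochs=1, verbose=0)

after = emb.get_weights()[0]
print(np.abs(after - before).sum() > 0)         # True: backprop updated the embedding vectors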


Edit - How is "backpropagation" used to train the look-up matrix of the Embedding layer?

The Embedding layer is similar to a linear layer without any activation function. Theoretically, an Embedding layer also performs a matrix multiplication but doesn't add any non-linearity to it through an activation function. So backpropagation through the Embedding layer works just as it does for any linear layer. In practice, though, we don't actually perform a matrix multiplication in the embedding layer, because the inputs (integer indices) are equivalent to one-hot encoded vectors, and multiplying a one-hot vector by the weight matrix is as easy as a look-up.
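Here is a tiny NumPy sketch of that equivalence (the vocabulary size, dimension and token id are arbitrary): multiplying a one-hot row vector by the weight matrix selects exactly one row, which is what the look-up does directly.

import numpy as np

vocab_size, dim = 6, 3
W = np.random.rand(vocab_size, dim)   # stands in for the embedding weight matrix

token_id = 4
one_hot = np.zeros(vocab_size)
one_hot[token_id] = 1.0

# one_hot @ W picks out row `token_id` of W, so the layer can skip the
# multiplication and index the matrix directly.
np.testing.assert_allclose(one_hot @ W, W[token_id])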

Kingship answered 9/6, 2021 at 3:29 Comment(10)
However, I want to know the network structure behind "from keras.layers import Embedding". Is it a 1000 x 64 units Dense layer?Obtest
@Obtest yes, you are somewhat correct. The embedding layer is a 1000 x 64 layer, but don't call it a Dense layer. A Dense layer performs operations like matrix multiplication on its weight matrix, whereas an Embedding layer uses the weight matrix as a look-up dictionary. So structurally they are both layers with neurons in them; the Dense layer performs operations on its weights while the Embedding layer doesn't.Kingship
Thank you! Can you be more specific? Or how can we use a "backpropagation" algorithm to train that look-up matrix?Obtest
Embedding layer is similar to the linear layer without any activation function. Theoretically, the Embedding layer also performs matrix multiplication but doesn't add any non-linearity to it by using any kind of activation function. So backpropagation in the Embedding layer is similar to that of any linear layer. But practically, we don't do any matrix multiplication in the embedding layer because the inputs are generally one-hot encoded and the matrix multiplication of the weights by a one-hot encoded vector is as easy as a look-up.Kingship
I am sorry I actually forgot to mention the role of activation function in the Dense layer in my answer so I edited it.Kingship
I see! Thank you again!Obtest
I'm confused by your claim that the input to an embedding layer is generally one-hot encoded, @Kingship. This answer suggests that the input is actually an index value?Gallardo
@Gallardo please read the third last paragraph of that answer...Kingship
@Kingship If you mean the paragraph that starts Here 1000 means the number....., then no, that's not what I'm referring to. I'm talking about the input to the model, not the output.Gallardo
@Gallardo No, I was referring to the third-last paragraph of the answer that you referred to. It starts with For an intuition of how this table lookup is implemented....Kingship
