The Embedding layer creates embedding vectors from the input words (I still don't fully understand the math myself), much like word2vec or pre-computed GloVe vectors would.
Before I get to your code, let's make a short example.
texts = ['This is a text', 'This is not a text']
First we turn these sentences into vectors of integers, where each integer is the index assigned to that word in the dictionary, and the order of the integers preserves the order of the words.
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
max_review_length = 6 # maximum length of the sentence
embedding_vector_length = 3
top_words = 10
# num_words caps the vocabulary size; if there are more unique words, only the top_words most frequent ones are kept
tokenizer = Tokenizer(num_words=top_words)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
word_index = tokenizer.word_index
input_dim = len(word_index) + 1  # vocabulary size + 1 for the padding index 0 (computed for reference; the Embedding layer below uses top_words)
print('Found %s unique tokens.' % len(word_index))
# max_review_length is the maximum length of the input text; shorter sequences are left-padded with zeros, e.g. [..., 0, 0, 1, 3, 50] where 1, 3, 50 are word indices
data = pad_sequences(sequences, maxlen=max_review_length)
print('Shape of data tensor:', data.shape)
print(data)
[Out:]
'This is a text' --> [0 0 1 2 3 4]
'This is not a text' --> [0 1 2 5 3 4]
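To see which integer was assigned to which word, you can print the tokenizer's dictionary (indices are assigned by descending word frequency, which is why 'this' gets 1 and 'not', appearing only once, gets 5):

print(word_index)
# {'this': 1, 'is': 2, 'a': 3, 'text': 4, 'not': 5}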
Now you can feed these into the Embedding layer.
from keras.models import Sequential
from keras.layers import Embedding
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length, mask_zero=True))
model.compile(optimizer='adam', loss='categorical_crossentropy')  # the loss is just a placeholder here; we only call predict below
output_array = model.predict(data)
output_array contains an array of size (2, 6, 3): 2 input reviews (sentences in my case), 6 is the maximum number of words in each review (max_review_length), and 3 is embedding_vector_length.
E.g.
array([[[-0.01494285, -0.007915 , 0.01764857],
[-0.01494285, -0.007915 , 0.01764857],
[-0.03019481, -0.02910612, 0.03518577],
[-0.0046863 , 0.04763055, -0.02629668],
[ 0.02297204, 0.02146662, 0.03114786],
[ 0.01634104, 0.02296363, -0.02348827]],
[[-0.01494285, -0.007915 , 0.01764857],
[-0.03019481, -0.02910612, 0.03518577],
[-0.0046863 , 0.04763055, -0.02629668],
[-0.01736645, -0.03719328, 0.02757809],
[ 0.02297204, 0.02146662, 0.03114786],
[ 0.01634104, 0.02296363, -0.02348827]]], dtype=float32)
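You can verify the shape directly:

print(output_array.shape)
# (2, 6, 3)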
In your case you have a vocabulary of 5000 words, reviews of at most 500 words (longer ones are truncated), and each of those 500 words is turned into a vector of size 32.
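As a sketch, that setup would correspond to something like this (the numbers come from your description, not from my example above):

model.add(Embedding(5000, 32, input_length=500))
# 5000 = vocabulary size, 32 = embedding vector length, 500 = maximum review length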
You can get mapping between the word indexes and embedding vectors by running:
model.layers[0].get_weights()
In the output below, top_words was 10, so we have a mapping for 10 words, and you can see that the rows for indices 0, 1, 2, 3, 4 and 5 match the vectors in output_array above.
[array([[-0.01494285, -0.007915 , 0.01764857],
[-0.03019481, -0.02910612, 0.03518577],
[-0.0046863 , 0.04763055, -0.02629668],
[ 0.02297204, 0.02146662, 0.03114786],
[ 0.01634104, 0.02296363, -0.02348827],
[-0.01736645, -0.03719328, 0.02757809],
[ 0.0100757 , -0.03956784, 0.03794377],
[-0.02672029, -0.00879055, -0.039394 ],
[-0.00949502, -0.02805768, -0.04179233],
[ 0.0180716 , 0.03622523, 0.02232374]], dtype=float32)]
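For instance, to look up the vector for a single word by its index (a small sketch using the names from the example above):

weights = model.layers[0].get_weights()[0]  # shape (top_words, embedding_vector_length)
idx = word_index['text']                    # 4 in this example
print(weights[idx])                         # [ 0.01634104  0.02296363 -0.02348827]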
As mentioned in https://stats.stackexchange.com/questions/270546/how-does-keras-embedding-layer-work, these vectors are initialized randomly and optimized by the network's optimizer just like any other parameter of the network.
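A quick way to convince yourself of that is to train for one step and compare the weights before and after (a minimal sketch; the binary labels are made up purely for demonstration):

import numpy as np
from keras.layers import Dense, Flatten

model = Sequential()
# no mask_zero here, since Flatten does not support masking
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')

before = model.layers[0].get_weights()[0].copy()
model.fit(data, np.array([1, 0]), epochs=1, verbose=0)  # toy labels, one per sentence
after = model.layers[0].get_weights()[0]
print(np.allclose(before, after))  # False -- the embedding vectors were updated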