How to get word vectors from Keras Embedding Layer
I'm currently working with a Keras model which has an embedding layer as its first layer. In order to visualize the relationships and similarities between words, I need a function that returns the mapping of every word in the vocabulary to its vector (e.g. 'love' - [0.21, 0.56, ..., 0.65, 0.10]).

Is there any way to do it?

Ayah answered 8/7, 2018 at 18:53 Comment(0)
You can get the word embeddings by using the get_weights() method of the embedding layer (essentially, the weights of an embedding layer are the embedding vectors):

# if you have access to the embedding layer explicitly
embeddings = embedding_layer.get_weights()[0]

# or access the embedding layer through the constructed model 
# first `0` refers to the position of embedding layer in the `model`
embeddings = model.layers[0].get_weights()[0]

# `embeddings` has a shape of (num_vocab, embedding_dim) 

# `word_to_index` is a mapping (i.e. dict) from words to their index, e.g. `love`: 69
words_embeddings = {w:embeddings[idx] for w, idx in word_to_index.items()}

# now you can use it like this for example
print(words_embeddings['love'])  # possible output: [0.21, 0.56, ..., 0.65, 0.10]
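
For completeness, here is a minimal, self-contained sketch of the above. The vocabulary, dimensions and word_to_index mapping are made up; in practice word_to_index would come from your tokenizer, and the vectors only become meaningful after the model has been trained:

import numpy as np
from tensorflow.keras.layers import Embedding

# hypothetical vocabulary; index 0 is usually reserved for padding
word_to_index = {'love': 1, 'keras': 2, 'embeddings': 3}
vocab_size = len(word_to_index) + 1
embedding_dim = 8

embedding_layer = Embedding(input_dim=vocab_size, output_dim=embedding_dim)
_ = embedding_layer(np.array([[1, 2, 3]]))   # call the layer once so its weight matrix is created

embeddings = embedding_layer.get_weights()[0]   # shape: (vocab_size, embedding_dim)
words_embeddings = {w: embeddings[idx] for w, idx in word_to_index.items()}
print(words_embeddings['love'])   # an 8-dimensional vector (random here, since nothing was trained)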
Lamonicalamont answered 8/7, 2018 at 19:30 Comment(9)
with this line 'words_embeddings = {w: embeddings[idx] for w, idx in tokenizer.word_index}' I get the following exception: IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices. tokenizer.word_index returns a mapping from words to their index.Ayah
@dermaschder I think you have forgotten to call items() on the dictionary, i.e. tokenizer.word_index.items().Lamonicalamont
Do not forget to add a special token in case you add padding to your input for some use cases such as LSTMs. You can add an index for the padding as well, something like word_to_index['__padding__'] = 0 Fabrianne
@today, are embeddings in Keras static or dynamic (sometimes called contextualized embeddings)?Criminate
@Criminate If you are referring to contextualized vs. non-contextualized embeddings, then this is not at all related to Keras or the DL framework you are using. Instead, it is related to the method or algorithm you are using. The embedding layer by itself is only a lookup table: given an integer index, it returns a vector corresponding to that index. It's you, as the designer of the method or architecture of the model, who decides whether to use it in a way that gives you contextualized or non-contextualized embeddings.Lamonicalamont
Thank you @Lamonicalamont, the integer index part makes sense. The learned dense vector it returns, is it closer to a static embedding or to a contextualized/dynamic one? If I learn them with a downstream LSTM prediction task, will they become contextualized?Criminate
@Criminate Contextualized embeddings are not achieved with just an embedding layer; it's the architecture of the model (besides the embedding layer) which produces contextualized embeddings. The values in the embedding layer are fixed (after training), and therefore, given two sentences like "the bank account" and "the bank of the river", the vector produced by the embedding layer for the word "bank" is exactly the same for the two sentences. So you must add other layers, whether RNN or Transformer layers, on top in order to produce contextualized embeddings (as the output of those layers, not the embedding layer).Lamonicalamont
Thank you very much @today. It makes sense and was really helpful :)Criminate
How can I import embedding_layer? It says embedding_layer is not defined, and I couldn't find it in the documentation.Motivity
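
Regarding the static-vs-contextualized discussion in the comments above, here is a small illustrative sketch (token ids, dimensions and weights are made up and randomly initialized): the embedding layer returns the exact same vector for the same id, while a bidirectional LSTM on top produces context-dependent outputs at that position.

import numpy as np
from tensorflow.keras.layers import Embedding, LSTM, Bidirectional

# hypothetical token ids: 'the bank account' vs. 'the bank of the river', with 'bank' = id 2
sent1 = np.array([[1, 2, 3]])
sent2 = np.array([[1, 2, 4, 1, 5]])

emb = Embedding(input_dim=10, output_dim=4)
# static: the vector for id 2 ('bank') is identical in both sentences
print(np.allclose(emb(sent1)[0, 1], emb(sent2)[0, 1]))   # True

bilstm = Bidirectional(LSTM(4, return_sequences=True))
# contextual: a layer that sees the whole sentence produces different vectors at the 'bank' position
print(np.allclose(bilstm(emb(sent1))[0, 1], bilstm(emb(sent2))[0, 1]))   # False (with random weights)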
