How to get word vectors from Keras Embedding Layer
I'm currently working with a Keras model which has an embedding layer as its first layer. In order to visualize the relationships and similarities between words, I need a function that returns the mapping of every word in the vocabulary to its vector (e.g. 'love' - [0.21, 0.56, ..., 0.65, 0.10]).

Is there any way to do it?

Ayah answered 8/7, 2018 at 18:53 Comment(0)
You can get the word embeddings by using the get_weights() method of the embedding layer (essentially, the weights of an embedding layer are the embedding vectors):

# if you have access to the embedding layer explicitly
embeddings = embedding_layer.get_weights()[0]

# or access the embedding layer through the constructed model 
# first `0` refers to the position of embedding layer in the `model`
embeddings = model.layers[0].get_weights()[0]

# `embeddings` has a shape of (num_vocab, embedding_dim) 

# `word_to_index` is a mapping (i.e. dict) from words to their index, e.g. `love`: 69
words_embeddings = {w:embeddings[idx] for w, idx in word_to_index.items()}

# now you can use it like this for example
print(words_embeddings['love'])  # possible output: [0.21, 0.56, ..., 0.65, 0.10]
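
For completeness, here is a minimal, self-contained sketch of the above. The vocabulary, dimensions and word_to_index mapping are made up; in practice word_to_index would come from your tokenizer, and the vectors only become meaningful after the model has been trained:

import numpy as np
from tensorflow.keras.layers import Embedding

# hypothetical vocabulary; index 0 is usually reserved for padding
word_to_index = {'love': 1, 'keras': 2, 'embeddings': 3}
vocab_size = len(word_to_index) + 1
embedding_dim = 8

embedding_layer = Embedding(input_dim=vocab_size, output_dim=embedding_dim)
_ = embedding_layer(np.array([[1, 2, 3]]))   # call the layer once so its weight matrix is created

embeddings = embedding_layer.get_weights()[0]   # shape: (vocab_size, embedding_dim)
words_embeddings = {w: embeddings[idx] for w, idx in word_to_index.items()}
print(words_embeddings['love'])   # an 8-dimensional vector (random here, since nothing was trained)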
Lamonicalamont answered 8/7, 2018 at 19:30 Comment(9)
with this line 'words_embeddings = {w: embeddings[idx] for w, idx in tokenizer.word_index}' I get the following exception: IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices. tokenizer.word_index returns a mapping from words to their index.Ayah
@dermaschder I think you have forgotten to call items() on the dictionary, i.e. tokenizer.word_index.items().Lamonicalamont
Do not forget to add a special token in case you add padding to your input for some use cases such as LSTMs. You can add an index for the padding as well, something like word_to_index['__padding__'] = 0 Fabrianne
@today, are embeddings in Keras static or dynamic (sometimes called contextualized embeddings)?Criminate
@Criminate If you are referring to contextualized vs. non-contextualized embeddings, then this is not at all related to Keras or the DL framework you are using. Instead, it is related to the method or algorithm you are using. The embedding layer by itself is only a lookup table: given an integer index, it returns a vector corresponding to that index. It's you, as the designer of the method or architecture of the model, who decides whether to use it in a way that gives you contextualized or non-contextualized embeddings.Lamonicalamont
Thank you @Lamonicalamont, the integer index part makes sense. The learned dense vector it returns, is it closer to a static embedding or to a contextualized/dynamic one? If I learn them with a downstream LSTM prediction task, will they become contextualized?Criminate
@Criminate Contextualized embeddings are not achieved with just an embedding layer; it's the architecture of the model (besides the embedding layer) which produces contextualized embeddings. The values in the embedding layer are fixed (after training), and therefore, given two sentences like "the bank account" and "the bank of the river", the vector produced by the embedding layer for the word "bank" is exactly the same for the two sentences. So you must add other layers, whether RNN or Transformer layers, on top in order to produce contextualized embeddings (as the output of those layers, not the embedding layer).Lamonicalamont
Thank you very much @today. It makes sense and was really helpful :)Criminate
How can I import embedding_layer? It says embedding_layer is not defined, and I couldn't find it in the documentation.Motivity
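
Regarding the static-vs-contextualized discussion in the comments above, here is a small illustrative sketch (token ids, dimensions and weights are made up and randomly initialized): the embedding layer returns the exact same vector for the same id, while a bidirectional LSTM on top produces context-dependent outputs at that position.

import numpy as np
from tensorflow.keras.layers import Embedding, LSTM, Bidirectional

# hypothetical token ids: 'the bank account' vs. 'the bank of the river', with 'bank' = id 2
sent1 = np.array([[1, 2, 3]])
sent2 = np.array([[1, 2, 4, 1, 5]])

emb = Embedding(input_dim=10, output_dim=4)
# static: the vector for id 2 ('bank') is identical in both sentences
print(np.allclose(emb(sent1)[0, 1], emb(sent2)[0, 1]))   # True

bilstm = Bidirectional(LSTM(4, return_sequences=True))
# contextual: a layer that sees the whole sentence produces different vectors at the 'bank' position
print(np.allclose(bilstm(emb(sent1))[0, 1], bilstm(emb(sent2))[0, 1]))   # False (with random weights)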
