How to get both the word embeddings vector and context vector of a given word by using word2vec?
from gensim.models import word2vec

sentences = word2vec.Text8Corpus('TextFile')
model = word2vec.Word2Vec(sentences, size=200, min_count=2, workers=4)
print(model['king'])

Is the output vector the context vector of 'king' or the word embedding vector of 'king'? How can I get both the context vector and the word embedding vector of 'king'? Thanks!

Acroterion asked 9/9, 2016 at 7:28
It is the embedding vector for 'king'.

If you use hierarchical softmax, the context vectors are:

model.syn1

and if you use negative sampling they are:

model.syn1neg

A word's context vector can then be accessed via its vocabulary index:

model.syn1[model.vocab[word].index]
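
As a minimal sketch of getting both vectors for one word, assuming the old-style (pre-1.0) gensim attribute layout used in this thread (newer releases moved the word vectors under model.wv, e.g. model.wv.vocab and model.wv.syn0, while syn1/syn1neg stayed on the model):

from gensim.models import word2vec

sentences = word2vec.Text8Corpus('TextFile')
# Train explicitly with negative sampling so the context vectors land in syn1neg;
# with hierarchical softmax (hs=1, negative=0) they would be in syn1 instead.
model = word2vec.Word2Vec(sentences, size=200, min_count=2, workers=4, hs=0, negative=5)

idx = model.vocab['king'].index     # row of 'king' in both weight matrices
word_vec = model.syn0[idx]          # input / center-word embedding; same as model['king']
context_vec = model.syn1neg[idx]    # output / context vector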
Mima answered 19/12, 2016 at 0:1
A 'context vector' is also a 'word embedding' vector. 'Word embedding' just means mapping vocabulary words to vectors of real numbers.

I assume you meant the center word's vector when you said 'word embedding' vector.

In the word2vec algorithm, training creates two different vectors for each word: one for when 'king' is the center word and one for when it appears as a context word.

I don't know exactly how gensim treats these two vectors, but normally people either average the center and context vectors or concatenate them. It might not be the most principled way to combine them, but it works very well in practice.

So when you call model['king'] on some set of pre-trained vectors, what you see might be an averaged or concatenated version of the two, depending on how those vectors were produced; gensim itself, as the answer above notes, returns the center-word (input) vector.
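
For illustration, a minimal sketch of both combinations, reusing the old-style gensim attributes from the answer above (the variable names here are mine):

import numpy as np

idx = model.vocab['king'].index
center = model.syn0[idx]       # center-word embedding
context = model.syn1neg[idx]   # context vector, for a negative-sampling model

averaged = (center + context) / 2                  # keeps the original dimensionality
concatenated = np.concatenate([center, context])   # doubles the dimensionality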

Trilley answered 2/4, 2017 at 17:6
