I am using Doc2Vec to get vectors from words.
Please see my code below:
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# One TaggedDocument per line of test.txt; the line index is used as the tag.
with open('test.txt', 'r') as f:
    trainings = [TaggedDocument(words=data.strip().split(","), tags=[i]) for i, data in enumerate(f)]

model = Doc2Vec(vector_size=5, epochs=55, seed=1, dm_concat=1)
model.build_vocab(trainings)
model.train(trainings, total_examples=model.corpus_count, epochs=model.epochs)
model.save("doc2vec.model")

model = Doc2Vec.load('doc2vec.model')
# Print the learned vector for each document tag.
for i in range(len(model.docvecs)):
    print(i, model.docvecs[i])
I have a test.txt file with 2 lines, and both lines have the same content ("a").
I trained Doc2Vec and got the model, but even though the 2 lines are identical, Doc2Vec gave me 2 different vectors:
0 [ 0.02730868 0.00393569 -0.08150548 -0.04009786 -0.01400406]
1 [ 0.03916578 -0.06423566 -0.05350181 -0.00726833 -0.08292392]
I don't know why this happened. I thought these vectors would be the same. Can you explain this? And if I want the same vectors for the same words, what should I do?
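Some background that may explain the result: in Doc2Vec every tag gets its own trainable vector, initialized randomly, and the loop above gives the two lines different tags (0 and 1) through `enumerate`. The model therefore learns two independent document vectors, and training does not force them to land on the same point, especially on a two-line corpus. If the goal is one vector per distinct text, one option is to give identical lines the same tag. Below is a minimal sketch of that idea, assuming lines are split on commas as in the question; `tag_by_content` is a hypothetical helper, and the `namedtuple` fallback is only a stand-in with the same two fields so the snippet runs without gensim installed.

```python
from collections import namedtuple

try:
    from gensim.models.doc2vec import TaggedDocument
except ImportError:
    # Hypothetical stand-in with the same fields, only so this sketch runs without gensim.
    TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])


def tag_by_content(lines):
    """Give every distinct (stripped) line one shared integer tag.

    Identical lines map to the same tag, so Doc2Vec would learn a single
    document vector for them instead of one vector per line.
    """
    tag_for = {}  # stripped line text -> integer tag
    docs = []
    for line in lines:
        key = line.strip()
        # setdefault hands out 0, 1, 2, ... to each new distinct line.
        tag = tag_for.setdefault(key, len(tag_for))
        docs.append(TaggedDocument(words=key.split(","), tags=[tag]))
    return docs, tag_for


if __name__ == "__main__":
    docs, tag_for = tag_by_content(["a\n", "a\n", "b,c\n"])
    for d in docs:
        print(d.words, d.tags)
```

With this tagging, both "a" lines share tag 0, so after training there is only one vector to look up for them.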
docs = [data.strip().split(" ") for data in f]
for doc in docs:
    vec = model.infer_vector(doc)
    print(vec)
But I don't know if this is what I need. What do you think about this? Am I right? Thank you. – Retinol
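On the `infer_vector` approach: inference runs a short, randomized training pass of its own, so repeated inference on the same words can return slightly different vectors, and an inferred vector generally will not match a trained one exactly. Comparing vectors with cosine similarity, rather than expecting identical numbers, is a more forgiving check. A minimal pure-Python sketch, assuming plain lists or 1-D arrays of floats; `cosine_similarity` is a hypothetical helper name, not a gensim function.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # The two document vectors printed in the question, used as sample input.
    vec0 = [0.02730868, 0.00393569, -0.08150548, -0.04009786, -0.01400406]
    vec1 = [0.03916578, -0.06423566, -0.05350181, -0.00726833, -0.08292392]
    print(round(cosine_similarity(vec0, vec1), 3))
```

Values close to 1.0 mean the two vectors point in nearly the same direction; a threshold for "same enough" would have to be chosen for your own data.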