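import numpy as np
from numpy import linalg
import gensim

def cosine(vector1, vector2):
    # Cosine similarity: dot product divided by the product of the norms
    cosV12 = np.dot(vector1, vector2) / (linalg.norm(vector1) * linalg.norm(vector2))
    return cosV12

model = gensim.models.doc2vec.Doc2Vec.load('Model_D2V_Game')
string = '民生 为了 父亲 我 要 坚强 地 ...'
words = string.split(' ')
# Infer a vector for a document that was part of the training data
vector1 = model.infer_vector(doc_words=words, alpha=0.1, min_alpha=0.0001, steps=5)
# Trained document vector stored in the model at index 0
vector2 = model.docvecs.doctag_syn0[0]
print(cosine(vector2, vector1))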
-0.0232586
I used training data to train a doc2vec model. Then I used infer_vector() to generate a vector for a document that is in the training data. But the two vectors are different: the cosine similarity between vector2, the vector saved in the doc2vec model, and vector1, the vector generated by infer_vector(), is very small (-0.0232586). This does not seem reasonable...
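To put -0.0232586 in context: cosine similarity ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction), and two unrelated random vectors in a high-dimensional space usually come out close to 0. The sketch below is a variant of the `cosine` function above; the name `cosine_similarity`, its zero-norm guard, and the 300-dimensional random vectors are illustrative assumptions, not part of the original code.

```python
import numpy as np


def cosine_similarity(u, v):
    """Return the cosine of the angle between u and v, in [-1, 1].

    Returns 0.0 if either vector has zero length, since the angle is undefined.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 0.0
    return float(np.dot(u, v) / denom)


# Parallel, orthogonal and opposite vectors give roughly 1, 0 and -1
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(cosine_similarity([1.0, 0.0], [-3.0, 0.0]))

# Two independent random vectors (assumed size 300) are nearly orthogonal
rng = np.random.default_rng(0)
a = rng.standard_normal(300)
b = rng.standard_normal(300)
print(cosine_similarity(a, b))  # typically close to 0
```

A value near 0 therefore suggests that the inferred vector bears no relation to the trained one, rather than being a slightly noisy copy of it.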
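I found my mistake. I should use string=u'民生 为了 父亲 我 要 坚强 地 ...' instead of string='民生 为了 父亲 我 要 坚强 地 ...'. After this correction, the cosine similarity rises to 0.889342. (In Python 2 a plain string literal is a byte string, so the split tokens presumably did not match the unicode words in the model's vocabulary; infer_vector() skips unknown words, so the inferred vector was essentially untrained.)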
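One way to catch this kind of mismatch before calling infer_vector() is to check how many tokens actually occur in the model's vocabulary. The sketch below runs in Python 3, where `bytes` vs `str` mirrors Python 2's `str` vs `unicode`; it assumes the source file is UTF-8 encoded and the model's vocabulary holds unicode words. The helper `known_token_ratio` and the small `vocab` set are hypothetical stand-ins for the real model vocabulary.

```python
def known_token_ratio(tokens, vocab):
    """Fraction of tokens that appear in vocab (0.0 for an empty token list)."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocab) / len(tokens)


# Hypothetical vocabulary standing in for the words the model was trained on
vocab = {"民生", "为了", "父亲", "我", "要", "坚强", "地"}

text = "民生 为了 父亲 我 要 坚强 地"
raw = text.encode("utf-8")

# Like a Python 2 '...' literal: UTF-8 byte tokens, none of which match
byte_tokens = raw.split(b" ")
# Like a Python 2 u'...' literal: decoded text tokens, which do match
text_tokens = raw.decode("utf-8").split(" ")

print(known_token_ratio(byte_tokens, vocab))  # 0.0
print(known_token_ratio(text_tokens, vocab))  # 1.0
```

A ratio near 0 means infer_vector() would have almost nothing to train on, which is consistent with the near-zero similarity reported in the question.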