How to combine word embedding vectors into one vector?

I understand word embedding methods (skip-gram, CBOW) and I know that Google provides pretrained word2vec models that map a word to a vector. My problem is this: we have a clause containing a subject, object, verb, etc., and each word has already been embedded with the Google vectors. How can we combine these word vectors into a single vector representing the whole clause?

Example: for the clause V = "dog bites man", after word embedding we have V1, V2, V3, mapping to "dog", "bites", and "man" respectively, and we assume V = V1 + V2 + V3. How can we compute V? I would appreciate an explanation using real vectors as an example.

Kwasi answered 27/6, 2017 at 17:12 Comment(1)
Thanks for your previous help. I have succeeded in finding vectors for triples of words from the GoogleNews dataset with Python. Now my question is: to find the similarity between an input triple and all other triples, which method is best? We have more than a hundred thousand word triples and we want to build a similarity matrix.Kwasi

A vector is basically just a list of numbers. You add two vectors by adding the numbers in the same position in each list. Here's an example:

a = [1, 2, 3]
b = [4, 5, 6]
c = a + b  # vector addition (mathematical notation; in Python, + would concatenate the lists)
c is [(1+4), (2+5), (3+6)], or [5, 7, 9]

As indicated in this question, a simple way to do this in Python is:

list(map(sum, zip(a, b)))  # [5, 7, 9]

Vector addition is part of linear algebra. If you don't understand operations on vectors and matrices, the math around word vectors will be very hard to follow, so you may want to learn more about linear algebra in general.

Normally, adding word vectors together is a reasonable way to approximate a sentence vector, since for most sets of words there is only one plausible order. However, your example of "Dog bites man" versus "Man bites dog" shows the weakness of addition: the result doesn't change with word order, so both sentences get the same vector even though their meanings are very different.
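A minimal sketch of that weakness, using made-up 3-dimensional toy vectors (real word2vec vectors are typically 300-dimensional, and these values are invented purely for illustration):

```python
# Toy word vectors; the values are invented for illustration only
embeddings = {
    "dog":   [0.25, 0.50, 0.75],
    "bites": [1.00, 0.25, 0.50],
    "man":   [0.50, 1.00, 0.25],
}

def sentence_vector(sentence):
    """Sum the word vectors of a sentence, component by component."""
    vectors = [embeddings[word] for word in sentence.split()]
    return [sum(components) for components in zip(*vectors)]

v1 = sentence_vector("dog bites man")
v2 = sentence_vector("man bites dog")
# Addition ignores word order, so both sentences get the same vector:
# v1 == v2 == [1.75, 1.75, 1.5]
```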

For methods of getting sentence vectors that are affected by word order, look into doc2vec or the just-released InferSent.

Boast answered 6/7, 2017 at 1:40 Comment(2)
Try this example with this popular model. It seems to deal with word order.Janeljanela
This answer is from 2017. These days any Transformer model deals with word order, but they do not use the "word vectors" described in this question.Boast

Two solutions:

  1. Use vector addition of the constituent words of a phrase - this typically works well because addition is a good approximation of semantic composition.

  2. Use paragraph vectors (doc2vec), which can encode an arbitrary-length sequence of words as a single vector.

Yvonneyvonner answered 28/6, 2017 at 9:50 Comment(3)
Great, I really appreciate your help. Would you please explain the rule of vector addition (how does it work)? Can you give me an example of real vectors combined with this method?Kwasi
A vector is a sequence of real numbers, so to add two vectors you simply add each corresponding component. E.g. if a=(1,2) and b=(3,1) are two vectors, then a+b=(1+3,2+1)=(4,3)Yvonneyvonner
Do you know what code/Python script performs this vector addition?Kwasi

In this paper: https://arxiv.org/pdf/2004.07464.pdf, the authors combined image embeddings and text embeddings by concatenating them.

X = [TE; IE]  (concatenation, not element-wise addition)

Here X is the fused embedding, with TE and IE the text and image embeddings respectively. If TE and IE each have dimension 2048, X will have length 2 * 2048 = 4096. You can use this directly, or if you want to reduce the dimensionality you can use t-SNE/PCA or https://arxiv.org/abs/1708.03629 (implemented here: https://github.com/vyraun/Half-Size).
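The concatenation itself is a one-liner with NumPy; the random vectors below are just stand-ins for real text and image embeddings:

```python
import numpy as np

te = np.random.rand(2048)  # stand-in for a text embedding
ie = np.random.rand(2048)  # stand-in for an image embedding

x = np.concatenate([te, ie])  # fused embedding
print(x.shape)  # (4096,)
```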

Lew answered 17/11, 2021 at 15:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.