How to train a reverse embedding, like vec2word?

How do you train a neural network to map from a vector representation to one-hot vectors? The example I'm interested in is where the vector representation is the output of a word2vec embedding, and I'd like to map onto the individual words that were in the language used to train the embedding. So I guess this is vec2word?

In a bit more detail: if I understand correctly, a cluster of points in embedding space represents similar words. So if you sample points from that cluster and use them as inputs to vec2word, the output should be a mapping to similar individual words?

I guess I could do something like an encoder-decoder, but does it have to be that complicated / use that many parameters?

There's this TensorFlow tutorial on how to train word2vec, but I can't find anything on how to do the reverse. I'm happy to do it with any deep-learning library, and it's OK if the approach uses sampling / is probabilistic.

Thanks a lot for your help, Ajay.

Nutritive answered 20/4, 2017 at 9:22 Comment(2)
You're interested in language modeling, correct? Do the examples in the TensorFlow RNN tutorial help? – Selfstarter
Hi @AllenLavoie, thanks for the pointer. No, the TF examples didn't help that much; maybe you want to update them with something on this question? Someone else helped me, though. It's quite simple actually: you just take the matrix-vector product of the embedding weights and the query vector, and then rank the entries of the resulting vector. I'm not explaining it well, but look here: github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/…. Thanks, Ajay – Nutritive
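
For anyone landing here, a minimal sketch of what that comment describes, using gensim's KeyedVectors.similar_by_vector, which scores every vocabulary word against the query vector and ranks the results. The tiny training corpus (gensim's bundled common_texts) and the hyperparameters are placeholders for illustration, not anything from the original question:

    # Rank all vocabulary words by similarity to an arbitrary query vector.
    from gensim.models import Word2Vec
    from gensim.test.utils import common_texts

    # Train a toy word2vec model (placeholder corpus and settings).
    model = Word2Vec(sentences=common_texts, vector_size=100, min_count=1)

    # Take any point in the embedding space as the query; here we reuse an
    # existing word's vector, but it could be a sampled/perturbed point.
    query = model.wv["computer"]

    # similar_by_vector computes similarity between the query and every
    # embedding row, then returns the top-ranked (word, score) pairs.
    for word, similarity in model.wv.similar_by_vector(query, topn=5):
        print(word, similarity)

With a real query vector the top hit is the nearest word in the vocabulary, which is exactly the vec2word mapping asked about above.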
S
1

The easiest thing you can do is use the nearest-neighbor word. Given a query feature f_q of an unknown word and a reference feature set of known words R = {f_r}, find the nearest f_r* to f_q, and use the word corresponding to f_r* as f_q's word.
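
A self-contained sketch of that nearest-neighbor lookup in plain NumPy; the vocabulary and the random embedding matrix below are hypothetical stand-ins for the reference set R:

    import numpy as np

    words = ["king", "queen", "man", "woman"]      # hypothetical vocabulary
    embeddings = np.random.rand(len(words), 50)    # hypothetical reference features f_r

    def vec2word(f_q):
        """Return the word whose embedding is closest (by cosine) to f_q."""
        # Normalize rows so a dot product equals cosine similarity.
        norms = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        q = f_q / np.linalg.norm(f_q)
        scores = norms @ q                         # similarity of f_q to every f_r
        return words[int(np.argmax(scores))]       # f_r* -> its word

    print(vec2word(embeddings[0]))                 # recovers "king"
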

Strachan answered 14/4, 2018 at 11:7 Comment(0)
