I have a set of pre-trained word2vec word vectors and a corpus. I want to use the word vectors to represent words in the corpus. The corpus has some words in it that I don't have trained word vectors for. What's the best way to handle those words for which there is no pre-trained vector?
I've heard several suggestions:
Use a vector of zeros for every missing word.
Use a vector of random numbers for every missing word (with a bunch of suggestions on how to bound the random values).
An idea I had: use the element-wise mean of all the pre-trained vectors, so that each position holds the mean of that position across every pre-trained vector.
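To make the three options concrete, here is a minimal sketch of what I mean. It assumes the pre-trained vectors are available as a plain dict of NumPy arrays; the `pretrained` dict, the toy 3-dimensional vectors, the helper names, and the random bound of 0.25 are all made up for illustration and are not from any particular library.

```python
import numpy as np

# Hypothetical toy stand-in for the pre-trained vectors: word -> 1-D array.
pretrained = {
    "cat": np.array([0.2, -0.4, 0.6]),
    "dog": np.array([0.4, 0.0, 0.2]),
}
dim = 3
# Stack all known vectors into a (vocab_size, dim) matrix.
matrix = np.stack(list(pretrained.values()))

rng = np.random.default_rng(0)  # seeded so runs are repeatable


def oov_zero():
    """Option 1: a vector of zeros."""
    return np.zeros(dim)


def oov_random(scale=0.25):
    """Option 2: uniform random values in [-scale, scale); the bound is an arbitrary assumption."""
    return rng.uniform(-scale, scale, size=dim)


def oov_mean():
    """Option 3: the element-wise mean of all pre-trained vectors."""
    return matrix.mean(axis=0)


def lookup(word, fallback):
    """Return the pre-trained vector if there is one, else call the fallback."""
    vec = pretrained.get(word)
    return vec if vec is not None else fallback()


print(lookup("bird", oov_zero))
print(lookup("bird", oov_random))
print(lookup("bird", oov_mean))
```

Note that `oov_random` draws a fresh vector on every call, so the same missing word would get a different vector each time unless the result is cached per word.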
Does anyone with experience of this problem have thoughts on how best to handle it?