I am building a TensorFlow model for an NLP task, and I am using the pretrained GloVe 300d word-embedding dataset.
Obviously, some tokens can't be resolved to embeddings because they were not included in the corpus the embedding model was trained on, e.g. rare names.
I could replace those tokens with zero vectors, but rather than dropping this information on the floor, I would prefer to encode it somehow and include it in my training data.
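For context, here is a minimal sketch of the zero-vector fallback I mean (the file path and helper names are just placeholders for illustration):

```python
import numpy as np

EMBEDDING_DIM = 300  # GloVe 300d

def load_glove(path):
    """Parse a GloVe text file into a dict: word -> 300d float32 vector."""
    glove = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            glove[word] = np.asarray(values, dtype=np.float32)
    return glove

def embed(token, glove):
    """Look up a token, falling back to a zero vector for OOV words."""
    return glove.get(token, np.zeros(EMBEDDING_DIM, dtype=np.float32))
```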
Say I have the word 'raijin', which can't be resolved to an embedding vector. What would be the best way to encode it consistently with the GloVe embeddings? What is the best approach for converting it to a 300d vector?
Thank you.