I've been reading some NLP with Deep Learning papers and found Fine-tuning seems to be a simple but yet confusing concept. There's been the same question asked here but still not quite clear.
Fine-tuning pre-trained word embeddings to task-specific word embeddings as mentioned in papers like Y. Kim, “Convolutional Neural Networks for Sentence Classification,” and K. S. Tai, R. Socher, and C. D. Manning, “Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks,” had only a brief mention without getting into any details.
My question is:
Word Embeddings generated using word2vec or Glove as pretrained word vectors are used as input features (X)
for downstream tasks like parsing or sentiment analysis, meaning those input vectors are plugged into a new neural network model for some specific task, while training this new model, somehow we can get updated task-specific word embeddings.
But as far as I know, during the training, what back-propagation does is updating the weights (W)
of the model, it does not change the input features (X)
, so how exactly does the original word embeddings get fine-tuned? and where do these fine-tuned vectors come from?