Implement word2vec in Keras
I would like to implement the word2vec algorithm in Keras. Is this possible? How can I fit the model? Should I use a custom loss function?

Sonnnie answered 25/10, 2016 at 15:35

Comment: I found this before asking, but it's too old and messy. Here is a newer implementation: github.com/SimonPavlik/word2vec-keras-in-gensim/blob/keras106/… (Corneous)

Is this possible?

You've already answered it yourself: yes. In addition to word2veckeras, which uses gensim, here is another CBOW implementation with no extra dependencies (for the record, I'm not affiliated with that repo). You can use these as examples.

How can I fit the model?

Since the training data is a large corpus of sentences, the most convenient method is model.fit_generator, which "fits the model on data generated batch-by-batch by a Python generator". The generator runs indefinitely, yielding (word, context, target) CBOW (or SG) tuples, and you specify samples_per_epoch and nb_epoch to limit the training. This way you decouple the sentence analysis (tokenization, word index table, sliding window, etc.) from the actual Keras model, and save a lot of resources.
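For example, here is a minimal sketch of such a generator. It assumes `corpus` (an iterable of tokenized sentences) and `word2idx` (a token-to-index dict, with index 0 reserved for padding) already exist, and uses the Keras 1.x fit_generator signature described above:

```python
import numpy as np
from keras.preprocessing.sequence import pad_sequences

# Sketch only: `corpus` and `word2idx` are assumed to exist already,
# with word index 0 reserved for padding.
def cbow_generator(corpus, word2idx, vocab_size, window=2):
    while True:  # fit_generator consumes the generator indefinitely
        for sentence in corpus:
            ids = [word2idx[w] for w in sentence if w in word2idx]
            contexts, centers = [], []
            for i, center in enumerate(ids):
                context = ids[max(0, i - window):i] + ids[i + 1:i + 1 + window]
                contexts.append(context)
                centers.append(center)
            if not contexts:
                continue
            X = pad_sequences(contexts, maxlen=2 * window)  # pad short contexts
            y = np.zeros((len(centers), vocab_size))
            y[np.arange(len(centers)), centers] = 1  # one-hot center words
            yield X, y

# Keras 1.x call, matching the argument names above; Keras 2 renamed
# them to steps_per_epoch and epochs.
# model.fit_generator(cbow_generator(corpus, word2idx, vocab_size),
#                     samples_per_epoch=1000000, nb_epoch=5)
```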

Should I use custom loss function?

CBOW minimizes the distance between the predicted and true distributions of the center word, so in the simplest form categorical_crossentropy will do. If you implement negative sampling, which is a bit more complex yet much more efficient, the loss changes to binary_crossentropy. A custom loss function is unnecessary.
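For illustration, a minimal CBOW model along these lines might look as follows (the sizes are arbitrary; the Lambda layer averages the context embeddings before the softmax):

```python
from keras.models import Sequential
from keras.layers import Embedding, Lambda, Dense
import keras.backend as K

vocab_size, embed_dim, window = 10000, 100, 2  # illustrative sizes

# CBOW: embed the context words, average them, and predict the
# center word with a softmax over the whole vocabulary.
model = Sequential()
model.add(Embedding(vocab_size, embed_dim, input_length=2 * window))
model.add(Lambda(lambda x: K.mean(x, axis=1), output_shape=(embed_dim,)))
model.add(Dense(vocab_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

After training, the word vectors are simply the rows of the Embedding layer's weight matrix, i.e. model.layers[0].get_weights()[0].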

For anyone interested in the details of the math and the probabilistic model, I highly recommend Stanford's CS224D class. Here are the lecture notes on word2vec, CBOW, and Skip-Gram.
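To make the negative-sampling variant concrete, here is a sketch using the skip-gram pair generator that ships with Keras (keras.preprocessing.sequence.skipgrams); the layer names assume Keras 2:

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, Embedding, Dot, Flatten, Activation
from keras.preprocessing.sequence import skipgrams

vocab_size, embed_dim = 10000, 100  # illustrative sizes

# Two embedding tables, as in the original word2vec: one for the
# center word and one for the context word.
word_in = Input(shape=(1,), dtype='int32')
context_in = Input(shape=(1,), dtype='int32')
word_vec = Embedding(vocab_size, embed_dim)(word_in)        # (batch, 1, dim)
context_vec = Embedding(vocab_size, embed_dim)(context_in)  # (batch, 1, dim)

# Dot product squashed to the probability that the pair is a true
# (word, context) pair rather than a sampled negative.
score = Flatten()(Dot(axes=-1)([word_vec, context_vec]))
prob = Activation('sigmoid')(score)

sg_model = Model([word_in, context_in], prob)
sg_model.compile(loss='binary_crossentropy', optimizer='adam')

# skipgrams() builds positive and negative (word, context) pairs with
# binary labels; `sentence_ids` stands in for one indexed sentence.
sentence_ids = [3, 17, 5, 42, 8]
pairs, labels = skipgrams(sentence_ids, vocabulary_size=vocab_size,
                          window_size=2, negative_samples=1.0)
pairs = np.array(pairs, dtype='int32')
sg_model.train_on_batch([pairs[:, 0:1], pairs[:, 1:2]],
                        np.array(labels, dtype='float32').reshape(-1, 1))
```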

Another useful reference: a word2vec implementation in pure NumPy and C.

Intertexture answered 9/10, 2017 at 13:2

Comment: It was an old question, but thank you for your detailed answer! (Corneous)
