I want to construct word embeddings for documents using the word2vec tool. I know how to find a vector embedding corresponding to a single word (unigram). Now, I want to find a vector for a bigram. Is it possible to construct a bigram word embedding using word2vec? If yes, how?
Bigram vector representations using word2vec
The following snippet will get you the vector representation of a bigram. Note that the bigram you want to convert to a vector needs an underscore instead of a space between the words, e.g. bigram2vec(unigrams, "this report") is wrong; it should be bigram2vec(unigrams, "this_report"). For more details on generating the unigrams, please see the gensim.models.word2vec.Word2Vec class here.
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases

def bigram2vec(unigrams, bigram_to_search):
    # unigrams: the tokenized corpus, a list of lists of tokens
    bigrams = Phrases(unigrams)          # learn which word pairs to merge into bigrams
    model = Word2Vec(bigrams[unigrams])  # train on the phrase-merged corpus
    if bigram_to_search in model.wv.key_to_index:  # model.vocab in gensim < 4
        return model.wv[bigram_to_search]
    return None
What is unigrams here? – Cinematograph
Good question, unigrams is the corpus words represented as a list. More details with an example here: radimrehurek.com/gensim/models/phrases.html – Kissel
Note that unigrams must be a list of lists. Further, model.vocab.keys() no longer works; it's replaced with model.wv.key_to_index. – Linebreeding
from gensim.models import Word2Vec, Phrases – Ethnography