word2vec Questions

5

I tried to follow this, but somehow I wasted a lot of time and ended up with nothing useful. I just want to train a GloVe model on my own corpus (a ~900 MB corpus.txt file). I downloaded the files provi...
Statecraft asked 24/2, 2018 at 11:10
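GloVe does not fit vectors to the raw text directly. The reference C tools shipped with GloVe (vocab_count, cooccur, shuffle and glove, as chained by its demo.sh script) first build a word-word co-occurrence matrix and then train on it. The pure-Python sketch below shows only that counting step, assuming whitespace-tokenized lines and the 1/distance weighting described in the GloVe paper. The function name `cooccurrence_counts` and the window size are illustrative; for a ~900 MB corpus the disk-backed C tools are the practical route, not an in-memory dict.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def cooccurrence_counts(
    sentences: Iterable[List[str]], window: int = 2
) -> Dict[Tuple[str, str], float]:
    """Symmetric co-occurrence counts; a pair d tokens apart adds 1/d."""
    counts: Dict[Tuple[str, str], float] = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Look only to the right, then record both directions to stay symmetric.
            for d in range(1, window + 1):
                j = i + d
                if j >= len(tokens):
                    break
                weight = 1.0 / d
                counts[(word, tokens[j])] += weight
                counts[(tokens[j], word)] += weight
    return dict(counts)


corpus = [line.split() for line in ["the cat sat on the mat"]]
X = cooccurrence_counts(corpus, window=2)
print(X[("cat", "on")])  # 0.5
```

Words two positions apart contribute half as much as neighbours, which is the distance weighting GloVe applies before fitting.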

5

Solved

I want to use a pre-trained word2vec model, but I don't know how to load it in Python. This file is a MODEL file (703 MB). It can be downloaded here: http://devmount.github.io/GermanWordEmbeddings...
Skyway asked 17/9, 2016 at 16:40

4

I have trained a Word2Vec model using Gensim 3.8.0. Later I tried to use the pretrained model with Gensim 4.0.0 on GCP. I used the following code: model = KeyedVectors.load_word2vec_format(wv_path...
Villous asked 30/3, 2021 at 9:28

2

Solved

I am using Word2vec through gensim with Google's pretrained vectors trained on Google News. I have noticed that the word vectors I can access by doing direct index lookups on the Word2Vec object ar...
Perrone asked 16/3, 2016 at 11:31

2

Solved

I am using the gensim word2vec package in Python. I would like to retrieve the W and W' weight matrices that have been learned during skip-gram training. It seems to me that model.syn0 gives me the f...
Foreshadow asked 15/12, 2016 at 11:19

6

Given a model, e.g. from gensim.models.word2vec import Word2Vec documents = ["Human machine interface for lab abc computer applications", "A survey of user opinion of computer system response ti...
Visser asked 23/2, 2018 at 5:26

5

I'm working on a project using Word2vec and gensim, model = gensim.models.Word2Vec( documents = 'userDataFile.txt', size=150, window=10, min_count=2, workers=10) model = gensim.model.Word2Vec.lo...
Deaconry asked 7/11, 2018 at 18:49

9

I want to create a text file that is essentially a dictionary, with each word being paired with its vector representation through word2vec. I'm assuming the process would be to first train word2vec...
Chrysolite asked 15/7, 2015 at 20:50
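For a trained gensim model, `model.wv.save_word2vec_format("vectors.txt", binary=False)` writes exactly this kind of file. The sketch below shows the plain-text word2vec layout it produces, a header line "count dim" followed by one "word v1 v2 ..." line per word, using only the standard library and a plain dict standing in for the model. The function names are hypothetical, and words containing spaces are not handled.

```python
import os
import tempfile
from typing import Dict, List


def write_word2vec_text(path: str, vectors: Dict[str, List[float]]) -> None:
    """Write a header "count dim", then one line per word: the word and its values."""
    dims = {len(v) for v in vectors.values()}
    if len(dims) != 1:
        raise ValueError("need at least one vector, all of the same length")
    dim = dims.pop()
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{len(vectors)} {dim}\n")
        for word, vec in vectors.items():
            f.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")


def read_word2vec_text(path: str) -> Dict[str, List[float]]:
    """Read the same format back into a dict, checking the header."""
    with open(path, encoding="utf-8") as f:
        count, dim = map(int, f.readline().split())
        result: Dict[str, List[float]] = {}
        for line in f:
            parts = line.rstrip("\n").split(" ")
            values = [float(x) for x in parts[1:]]
            if len(values) != dim:
                raise ValueError(f"bad vector length for {parts[0]!r}")
            result[parts[0]] = values
    if len(result) != count:
        raise ValueError("header count does not match the number of lines")
    return result


path = os.path.join(tempfile.mkdtemp(), "vectors.txt")
write_word2vec_text(path, {"view": [0.1, -0.2, 0.3], "scene": [0.0, 0.5, 1.0]})
with open(path, encoding="utf-8") as f:
    print(f.readline().strip())  # 2 3
```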

19

I'm trying to import gensim with import gensim but get the following error: ImportError Traceback (most recent call last) <ipython-input-5-50007be813d4> in <module>() ----> 1 import g...
Vasiliu asked 12/9, 2017 at 5:33

3

Solved

I have to use a word2vec module containing tons of Chinese characters. The module was trained by my coworkers using Java and is saved as a bin file. I installed gensim and tried to load the modul...
Lamasery asked 23/12, 2015 at 2:24
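gensim reads this format with `KeyedVectors.load_word2vec_format(path, binary=True)`, which also accepts `encoding` and `unicode_errors` arguments. When a file written by another tool fails to load, a small standalone reader can show where the layout differs. The sketch below assumes the common binary word2vec layout (ASCII header "count dim", then each word's bytes, a space, and `dim` 4-byte floats, usually followed by a newline). The `byteorder` parameter is an assumption added for files from other writers: Java's DataOutputStream writes floats big-endian, so if the coworkers' code used it, `byteorder=">"` would be needed. Whether it did is not known from the question.

```python
import io
from typing import BinaryIO, Dict

import numpy as np


def read_word2vec_binary(
    stream: BinaryIO, encoding: str = "utf-8", byteorder: str = "<"
) -> Dict[str, np.ndarray]:
    """Parse binary word2vec data: header "count dim", then word, space, dim float32s."""
    count, dim = map(int, stream.readline().decode("ascii").split())
    dtype = np.dtype(byteorder + "f4")
    vectors: Dict[str, np.ndarray] = {}
    for _ in range(count):
        word = bytearray()
        while True:
            ch = stream.read(1)
            if ch == b"":
                raise EOFError("file ended in the middle of a word")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline left after the previous vector
                word.extend(ch)
        raw = stream.read(4 * dim)
        if len(raw) != 4 * dim:
            raise EOFError("file ended in the middle of a vector")
        # UTF-8 multibyte characters never contain a space byte, so splitting is safe.
        vectors[word.decode(encoding)] = np.frombuffer(raw, dtype=dtype).astype(np.float32)
    return vectors


buf = io.BytesIO()
buf.write(b"2 3\n")
for w, v in [("中文", [0.1, 0.2, 0.3]), ("词", [1.0, 0.0, -1.0])]:
    buf.write(w.encode("utf-8") + b" " + np.asarray(v, dtype="<f4").tobytes() + b"\n")
buf.seek(0)
vecs = read_word2vec_binary(buf)
print(list(vecs))  # ['中文', '词']
```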

5

Solved

After training a word2vec model using python gensim, how do you find the number of words in the model's vocabulary?
Silverman asked 24/2, 2016 at 7:39
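With a trained model, gensim 4.x gives the vocabulary size as `len(model.wv)` (equivalently `len(model.wv.index_to_key)`), and gensim 3.x as `len(model.wv.vocab)`. The sketch below shows what that number counts: distinct tokens that occur at least `min_count` times (gensim's default is 5). It is an approximation that ignores other pruning options such as `trim_rule`, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def vocabulary_size(sentences: Iterable[List[str]], min_count: int = 5) -> int:
    """Count distinct tokens that appear at least min_count times."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return sum(1 for c in counts.values() if c >= min_count)


sentences = [["a", "b", "a"], ["a", "c", "b"]]
print(vocabulary_size(sentences, min_count=2))  # 2
```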

4

Solved

I am trying to build a Word2vec model, but when I try to reshape the vector for tokens, I get this error. Any idea? wordvec_arrays = np.zeros((len(tokenized_tweet), 100)) for i in range(len...
Roxy asked 25/5, 2021 at 12:30
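The excerpt cuts off before the error message, so the cause can only be guessed. Two frequent pitfalls with this loop are a tweet whose tokens are all out of vocabulary (leaving nothing to average or reshape) and a vector size that differs from the hard-coded 100. A defensive sketch of filling such an array by averaging, assuming `vectors` is any mapping from word to array; a gensim `KeyedVectors` also supports `in` and `[]`. The function name is hypothetical, and `size` must match the model's vector size (`vector_size` in gensim 4, `size` in 3.x).

```python
from typing import List, Mapping, Sequence

import numpy as np


def document_vectors(
    tokenized_docs: Sequence[List[str]],
    vectors: Mapping[str, np.ndarray],
    size: int = 100,
) -> np.ndarray:
    """One row per document: the mean vector of its in-vocabulary tokens."""
    out = np.zeros((len(tokenized_docs), size), dtype=np.float32)
    for i, tokens in enumerate(tokenized_docs):
        known = [vectors[t] for t in tokens if t in vectors]
        if known:  # documents with no known tokens keep their all-zero row
            out[i] = np.mean(known, axis=0)
    return out


vecs = {"good": np.array([1.0, 0.0, 0.0]), "day": np.array([0.0, 1.0, 0.0])}
print(document_vectors([["good", "day"], ["unknown"]], vecs, size=3).shape)  # (2, 3)
```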

3

Solved

E.g. we train a word2vec model using gensim: from gensim import corpora, models, similarities from gensim.models.word2vec import Word2Vec documents = ["Human machine interface for lab abc compute...
Jos asked 22/2, 2017 at 3:0

1

In a paper titled "Machine Learning at the Limit," Canny et al. report substantial word2vec processing speed improvements. I'm working with the BIDMach library used in this paper, and cannot f...
Chaffee asked 1/4, 2017 at 15:25

1

A few papers on the topics of word and document embeddings (word2vec, doc2vec) mention that they used the Stanford CoreNLP framework to tokenize/lemmatize/POS-tag the input words/sentences: The ...
Evangelical asked 29/5, 2018 at 12:3

8

Solved

I have trained a word2vec model using a corpus of documents with Gensim. Once the model is trained, I write the following piece of code to get the raw feature vector of a word, say "view". my...
Mistreat asked 18/5, 2015 at 11:24

3

Solved

I have been trying word2vec for a while now using gensim's word2vec library. My question is: do I have to remove stopwords from my input text? Because, based on my initial experimental results, ...
Shult asked 11/1, 2016 at 12:49
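word2vec already discounts very frequent words, which include most stopwords, through frequent-word subsampling (the `sample` parameter, default `1e-3` in gensim). The sketch below reproduces the keep-probability formula from the original word2vec C implementation, where `threshold = sample * total_words`. gensim applies an equivalent per-word probability, but treat exact parity with any particular gensim version as an assumption. The function name and the corpus counts are made up for illustration.

```python
import math


def keep_probability(count: int, total_words: int, sample: float = 1e-3) -> float:
    """Chance that one occurrence of a word survives frequent-word subsampling."""
    threshold = sample * total_words
    p = (math.sqrt(count / threshold) + 1.0) * threshold / count
    return min(p, 1.0)  # rare words are always kept


total = 1_000_000  # hypothetical corpus size in tokens
for word, count in [("the", 100_000), ("view", 1_000), ("rare", 10)]:
    print(word, round(keep_probability(count, total), 3))
# the 0.11
# view 1.0
# rare 1.0
```

In this toy setting about 89% of the occurrences of "the" would be dropped during training, which is one reason manual stopword removal often changes results less than expected.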

5

I have a large collection of texts, where each text is rapidly growing. I need to implement a similarity search. The idea is to embed each word as word2vec, and represent each text as a normalized...
Ciapha asked 23/2, 2017 at 6:45
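The idea in this question (embed words, represent each text as a normalized average) can stay cheap for growing texts. Normalizing a mean gives the same unit vector as normalizing the sum, so it is enough to store one running sum per text and add new word vectors as the text grows. A sketch under that assumption, with a hypothetical class name and a brute-force cosine scan standing in for a real nearest-neighbor index:

```python
from typing import Dict, List, Tuple

import numpy as np


class GrowingTextIndex:
    """Per-text sums of word vectors; appending words updates a text in O(dim)."""

    def __init__(self, vectors: Dict[str, np.ndarray], dim: int) -> None:
        self.vectors = vectors
        self.dim = dim
        self.sums: Dict[str, np.ndarray] = {}

    def _sum(self, tokens: List[str]) -> np.ndarray:
        total = np.zeros(self.dim)
        for t in tokens:
            if t in self.vectors:  # out-of-vocabulary tokens are ignored
                total += self.vectors[t]
        return total

    def add_tokens(self, text_id: str, tokens: List[str]) -> None:
        # The sum points in the same direction as the mean, so cosine ranking is unchanged.
        current = self.sums.get(text_id, np.zeros(self.dim))
        self.sums[text_id] = current + self._sum(tokens)

    def most_similar(self, tokens: List[str], k: int = 1) -> List[Tuple[str, float]]:
        query = self._sum(tokens)
        qn = np.linalg.norm(query)
        if qn == 0:
            return []
        results = []
        for text_id, s in self.sums.items():
            n = np.linalg.norm(s)
            if n > 0:
                results.append((text_id, float(s @ query / (n * qn))))
        results.sort(key=lambda r: r[1], reverse=True)
        return results[:k]


vecs = {w: np.array(v) for w, v in {"cat": [1.0, 0.0], "dog": [0.9, 0.1],
                                     "car": [0.0, 1.0], "road": [0.1, 0.9]}.items()}
index = GrowingTextIndex(vecs, dim=2)
index.add_tokens("pets", ["cat", "dog"])
index.add_tokens("traffic", ["car"])
index.add_tokens("traffic", ["road"])  # the text grows; only its stored sum changes
print(index.most_similar(["dog"])[0][0])  # pets
```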

0

I'm trying to run this query with the Elasticsearch Python client: curl -X GET "localhost:9200/articles/_knn_search" -H 'Content-Type: application/json' -d ' { "knn": { "field...
Wheeled asked 1/6, 2022 at 13:7

6

Solved

The LDA model generates different topics every time I train it on the same corpus. By setting np.random.seed(0), the LDA model will always be initialized and trained in exactly the same way. Is i...
Arnett asked 16/1, 2016 at 20:5

6

Solved

I am playing around with FastText, https://pypi.python.org/pypi/fasttext, which is quite similar to Word2Vec. Since it seems to be a pretty new library with not too many built-in functions yet, I was...
Aerophyte asked 13/2, 2017 at 14:33

2

Solved

If I increase the model size of my word2vec model I start to get this kind of exception in my log: org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 6 ...
Barron asked 23/4, 2016 at 19:38

3

Solved

On this page, it says that: [...] skip-gram inverts contexts and targets, and tries to predict each context word from its target word [...] However, looking at the training dataset it prod...
Halfhour asked 10/7, 2016 at 1:21
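The question concerns which word is the input in skip-gram training pairs. A minimal sketch of pair generation for a fixed symmetric window is below; it omits the per-target random window shrinking and frequent-word subsampling that the original word2vec C code applies, and the function name is hypothetical. With a symmetric window every (target, context) pair has a mirror (context, target) pair, so the full set of pairs looks the same whichever side is labeled the input, which may explain why a generated dataset appears to run in the "other" direction.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 1) -> List[Tuple[str, str]]:
    """(target, context) pairs: each target predicts every word within `window`."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


print(skipgram_pairs(["the", "quick", "brown", "fox"], window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```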

3

Solved

I am using gensim doc2vec. I want to know whether there is an efficient way to get the vocabulary size from doc2vec. One crude way is to count the total number of words, but if the data is huge (1GB or mo...
Hiccup asked 12/1, 2017 at 8:7

2

I'm working on a word2vec model using gensim in Python, but I found that the results are words sharing the same theme; synonyms are only part of the results. Can I find synonyms of a word based on...
Stealthy asked 6/6, 2017 at 9:39
