doc2vec Questions
1
A few papers on the topics of word and document embeddings (word2vec, doc2vec) mention that they used the Stanford CoreNLP framework to tokenize/lemmatize/POS-tag the input words/sentences:
The ...
Evangelical asked 29/5, 2018 at 12:3
10
Solved
I am trying to load my saved model from s3 using joblib
import pandas as pd
import numpy as np
import json
import subprocess
import sqlalchemy
from sklearn.externals import joblib
ENV = 'dev'
mod...
Boudicca asked 19/5, 2020 at 14:36
3
Solved
I am using gensim doc2vec. I want know if there is any efficient way to know the vocabulary size from doc2vec. One crude way is to count the total number of words, but if the data is huge(1GB or mo...
2
Solved
I am using Word2Vec and using a wiki trained model that gives out the most similar words. I ran this before and it worked but now it gives me this error even after rerunning the whole program. I tr...
2
I am trying to follow the official Doc2Vec Gensim tutorial mentioned here - https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb
I modified the code in line 10...
3
I have two directories from which I want to read their text files and label them, but I don't know how to do this via TaggedDocument. I thought it would work as TaggedDocument([Strings],[Labels]) b...
1
I'm trying to find out the similarity between 2 documents. I'm using Doc2vec Gensim to train around 10k documents. There are around 10 string type of tags. Each tag consists of a unique word and co...
Schargel asked 27/5, 2019 at 9:34
1
Solved
I don't understand how word vectors are involved at all in the training process with gensim's doc2vec in DBOW mode (dm=0). I know that it's disabled by default with dbow_words=0. But what happens w...
2
Solved
I have a pyspark dataframe with a corpus of ~300k unique rows each with a "doc" that contains a few sentences of text in each.
After processing, I have a 200 dimension vectorized representation of...
Preference asked 2/1, 2018 at 16:20
2
I'm trying to get the text with its punctuation as it is important to consider the latter in my doc2vec model. However, the wikicorpus retrieve only the text. After searching the web I found these ...
2
Is there a pre-trained doc2vec model with a large data set, like Wikipedia or similar?
1
I am trying to find best hyperparameters for my trained doc2vec gensim model which takes a document as an input and create its document embeddings. My train data consists of text documents but it d...
Terylene asked 18/10, 2018 at 14:12
1
Solved
I am trying to understand the epochs parameter in the Doc2Vec function and epochs parameter in the train function.
In the following code snippet, I manually set up a loop of 4000 iterations. Is i...
2
Solved
I've patched the following code from examples I've found over the web:
# gensim modules
from gensim import utils
from gensim.models.doc2vec import LabeledSentence
from gensim.models import Doc2Vec...
Cruces asked 8/9, 2016 at 13:4
2
I am using Doc2vec to get vectors from words.
Please see my below code:
from gensim.models.doc2vec import TaggedDocument
f = open('test.txt','r')
trainings = [TaggedDocument(words = data.strip()....
1
Solved
I used gensim fit a doc2vec model, with tagged document (length>10) as training data. The target is to get doc vectors of all training docs, but only 10 vectors can be found in model.docvecs.
The ...
Com asked 28/2, 2018 at 3:14
1
Solved
I have been using gensim's libraries to train a doc2Vec model. After experimenting with different datasets for training, I am fairly confused about what should be an ideal training data size for do...
Treasurehouse asked 2/1, 2018 at 10:19
1
Solved
I tried to apply doc2vec on 600000 rows of sentences: Code as below:
from gensim import models
model = models.Doc2Vec(alpha=.025, min_alpha=.025, min_count=1, workers = 5)
model.build_vocab(res)
t...
1
Solved
I am using gensim doc2vec as below.
from gensim.models import doc2vec
from collections import namedtuple
import re
my_d = {'recipe__001__1': 'recipe 1 details should come here',
'recipe__001__2'...
1
Solved
My current doc2vec code is as follows.
# Train doc2vec model
model = doc2vec.Doc2Vec(docs, size = 100, window = 300, min_count = 1, workers = 4, iter = 20)
I also have a word2vec code as below.
...
1
Solved
How does doc2vec perform when trained on different sized datasets? There is no mention of dataset size in the original corpus, so I am wondering what is the minimum size required to get good perfor...
1
I'm trying to compare my implementation of Doc2Vec (via tf) and gensims implementation. It seems atleast visually that the gensim ones are performing better.
I ran the following code to train the ...
Incumbency asked 4/10, 2016 at 3:13
1
I'm training a Word2Vec model like:
model = Word2Vec(documents, size=200, window=5, min_count=0, workers=4, iter=5, sg=1)
and Doc2Vec model like:
doc2vec_model = Doc2Vec(size=200, window=5, min...
Supernumerary asked 21/7, 2017 at 9:40
1
Solved
I am trying to run doc2vec library from gensim package. My problem is that when I am training and saving the model the model file is rather large(2.5 GB) I tried using this line :
model.esti...
1
def cosine(vector1,vector2):
cosV12 = np.dot(vector1, vector2) / (linalg.norm(vector1) * linalg.norm(vector2))
return cosV12
model=gensim.models.doc2vec.Doc2Vec.load('Model_D2V_Game')
string='民生 ...
1 Next >
© 2022 - 2024 — McMap. All rights reserved.