doc2vec Questions

1

A few papers on the topics of word and document embeddings (word2vec, doc2vec) mention that they used the Stanford CoreNLP framework to tokenize/lemmatize/POS-tag the input words/sentences: The ...
Evangelical asked 29/5, 2018 at 12:3

10

Solved

I am trying to load my saved model from s3 using joblib: import pandas as pd import numpy as np import json import subprocess import sqlalchemy from sklearn.externals import joblib ENV = 'dev' mod...
Boudicca asked 19/5, 2020 at 14:36

3

Solved

I am using gensim doc2vec. I want to know if there is any efficient way to get the vocabulary size from doc2vec. One crude way is to count the total number of words, but if the data is huge (1GB or mo...
Hiccup asked 12/1, 2017 at 8:7

2

Solved

I am using Word2Vec and using a wiki trained model that gives out the most similar words. I ran this before and it worked but now it gives me this error even after rerunning the whole program. I tr...
Sexpartite asked 6/8, 2021 at 5:41

2

I am trying to follow the official Doc2Vec Gensim tutorial mentioned here - https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb I modified the code in line 10...
Epexegesis asked 21/1, 2018 at 0:31

3

I have two directories from which I want to read their text files and label them, but I don't know how to do this via TaggedDocument. I thought it would work as TaggedDocument([Strings],[Labels]) b...
Cloninger asked 16/7, 2017 at 6:35

1

I'm trying to find out the similarity between 2 documents. I'm using Doc2vec Gensim to train around 10k documents. There are around 10 string type of tags. Each tag consists of a unique word and co...
Schargel asked 27/5, 2019 at 9:34

1

Solved

I don't understand how word vectors are involved at all in the training process with gensim's doc2vec in DBOW mode (dm=0). I know that it's disabled by default with dbow_words=0. But what happens w...
Chickie asked 9/4, 2019 at 11:46

2

Solved

I have a pyspark dataframe with a corpus of ~300k unique rows, each with a "doc" that contains a few sentences of text. After processing, I have a 200-dimensional vectorized representation of...
Preference asked 2/1, 2018 at 16:20

2

I'm trying to get the text with its punctuation, as it is important to consider the latter in my doc2vec model. However, the wikicorpus retrieves only the text. After searching the web I found these ...
Piglet asked 5/6, 2018 at 9:48

2

Is there a pre-trained doc2vec model with a large data set, like Wikipedia or similar?
Impressible asked 2/7, 2018 at 9:25

1

I am trying to find the best hyperparameters for my trained doc2vec gensim model, which takes a document as input and creates its document embeddings. My train data consists of text documents but it d...
Terylene asked 18/10, 2018 at 14:12

1

Solved

I am trying to understand the epochs parameter in the Doc2Vec function and epochs parameter in the train function. In the following code snippet, I manually set up a loop of 4000 iterations. Is i...
Picaresque asked 9/7, 2018 at 12:32

2

Solved

I've patched the following code from examples I've found over the web: # gensim modules from gensim import utils from gensim.models.doc2vec import LabeledSentence from gensim.models import Doc2Vec...
Cruces asked 8/9, 2016 at 13:4

2

I am using Doc2vec to get vectors from words. Please see my below code: from gensim.models.doc2vec import TaggedDocument f = open('test.txt','r') trainings = [TaggedDocument(words = data.strip()....
Retinol asked 16/5, 2018 at 4:32

1

Solved

I used gensim to fit a doc2vec model, with tagged documents (length>10) as training data. The target is to get doc vectors of all training docs, but only 10 vectors can be found in model.docvecs. The ...
Com asked 28/2, 2018 at 3:14

1

Solved

I have been using gensim's libraries to train a doc2Vec model. After experimenting with different datasets for training, I am fairly confused about what should be an ideal training data size for do...
Treasurehouse asked 2/1, 2018 at 10:19

1

Solved

I tried to apply doc2vec to 600000 rows of sentences, with the code below: from gensim import models model = models.Doc2Vec(alpha=.025, min_alpha=.025, min_count=1, workers = 5) model.build_vocab(res) t...
Gisser asked 19/12, 2017 at 15:20

1

Solved

I am using gensim doc2vec as below. from gensim.models import doc2vec from collections import namedtuple import re my_d = {'recipe__001__1': 'recipe 1 details should come here', 'recipe__001__2'...
Michelsen asked 16/11, 2017 at 14:28

1

Solved

My current doc2vec code is as follows. # Train doc2vec model model = doc2vec.Doc2Vec(docs, size = 100, window = 300, min_count = 1, workers = 4, iter = 20) I also have a word2vec code as below. ...
Convertite asked 21/10, 2017 at 4:58

1

Solved

How does doc2vec perform when trained on different sized datasets? There is no mention of dataset size in the original corpus, so I am wondering what is the minimum size required to get good perfor...
Barratry asked 30/8, 2017 at 11:48

1

I'm trying to compare my implementation of Doc2Vec (via tf) and gensim's implementation. It seems, at least visually, that the gensim ones are performing better. I ran the following code to train the ...
Incumbency asked 4/10, 2016 at 3:13

1

I'm training a Word2Vec model like: model = Word2Vec(documents, size=200, window=5, min_count=0, workers=4, iter=5, sg=1) and Doc2Vec model like: doc2vec_model = Doc2Vec(size=200, window=5, min...
Supernumerary asked 21/7, 2017 at 9:40

1

Solved

I am trying to run the doc2vec library from the gensim package. My problem is that when I train and save the model, the model file is rather large (2.5 GB). I tried using this line: model.esti...
Outspan asked 19/7, 2017 at 15:37

1

def cosine(vector1,vector2): cosV12 = np.dot(vector1, vector2) / (linalg.norm(vector1) * linalg.norm(vector2)) return cosV12 model=gensim.models.doc2vec.Doc2Vec.load('Model_D2V_Game') string='民生 ...
Subirrigate asked 9/7, 2017 at 5:19
