word-embedding Questions
3
I am using the SentenceTransformers library (here: https://pypi.org/project/sentence-transformers/#pretrained-models) for creating embeddings of sentences using the pre-trained model bert-base-nli-...
Fandango asked 23/12, 2020 at 5:34
5
Does Embedding make similar words closer to each other? Do I just need to give it all the sentences? Or is it just a lookup table, so that I need to code the model myself?
Autohypnosis asked 7/6, 2018 at 18:29
5
My dataset has 5,000,000 rows, and I would like to add a column called 'embeddings' to it.
dataset = dataset.add_column('embeddings', embeddings)
The variable embeddings is a numpy memmap ...
Confectioner asked 22/11, 2021 at 10:56
3
I trained my unsupervised model using the fasttext.train_unsupervised() function in Python. I want to save it as a vec file, since I will use this file for the pretrainedVectors parameter in fasttext.train_su...
Goodloe asked 11/10, 2019 at 8:46
5
Is it possible to use Google BERT for calculating the similarity between two textual documents? As I understand it, BERT's input is supposed to be sentences of limited size. Some works use BERT for similari...
Somerset asked 11/9, 2019 at 5:3
4
I have trained a Word2Vec model using Gensim 3.8.0. Later I tried to use the pretrained model with Gensim 4.0.0 on GCP. I used the following code:
model = KeyedVectors.load_word2vec_format(wv_path...
Villous asked 30/3, 2021 at 9:28
2
Given a sentence of the type 'Roberta is a heavily optimized version of BERT.', I need to get the embeddings for each of the words in this sentence with RoBERTa. I have tried to look at the sample ...
Acea asked 24/3, 2020 at 3:33
4
Solved
I want to store a large n-dimensional vector (e.g. an embedding vector) in SQL Server as a piece of metadata associated with another row.
In this example, it will be a 384-dimensional vector, for e...
Thill asked 3/5, 2023 at 18:44
1
Solved
I am a brand new user of the Chroma database (and the associated Python libraries).
When I call get on a collection, embeddings is always none, even if embeddings are explicitly set/defined when adding ...
Vassallo asked 15/6, 2023 at 13:27
6
Solved
For ElMo, FastText and Word2Vec, I'm averaging the word embeddings within a sentence and using HDBSCAN/KMeans clustering to group similar sentences.
A good example of the implementation can be see...
Dissension asked 10/4, 2019 at 18:31
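The averaging step described in this question can be sketched as follows. This is a minimal illustration, assuming a hypothetical `word_vectors` dictionary with toy two-dimensional values; it is not tied to ELMo, FastText, or Word2Vec, and skipping out-of-vocabulary tokens is one possible choice among several.

```python
def mean_sentence_vector(tokens, word_vectors):
    """Average the vectors of the tokens found in `word_vectors`."""
    # Tokens without a vector are skipped (an assumption of this sketch).
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None  # no token of the sentence has a vector
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


# Hypothetical toy vectors, for illustration only.
toy_vectors = {"cats": [1.0, 0.0], "purr": [0.0, 1.0]}
sentence_vector = mean_sentence_vector(["cats", "purr", "loudly"], toy_vectors)
print(sentence_vector)  # [0.5, 0.5]
```

The resulting fixed-length vectors are what a clustering algorithm such as KMeans or HDBSCAN would then receive as input.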
1
Solved
I am trying to use CLIP to calculate the similarities between strings. (I know that CLIP is usually used with text and images but it should work with only strings as well.)
I provide a list of simp...
Alejandrinaalejandro asked 3/9, 2022 at 16:13
9
Solved
tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None)
I cannot understand the purpose of this function. Is it like a lookup table, meaning that it returns the parameters correspondi...
Loudspeaker asked 19/1, 2016 at 7:14
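The lookup-table reading of this function can be illustrated without TensorFlow. The sketch below assumes a single, unpartitioned `params` table and uses plain Python lists as a stand-in; `embedding_lookup` here is a hypothetical helper, not the TensorFlow implementation.

```python
def embedding_lookup(params, ids):
    """Return the rows of `params` selected by `ids`, in the order given."""
    return [params[i] for i in ids]


# Toy table: 4 ids, each mapped to a 2-dimensional vector (arbitrary values).
params = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
rows = embedding_lookup(params, [3, 0, 3])
print(rows)  # [[3.0, 3.1], [0.0, 0.1], [3.0, 3.1]]
```

Repeated ids return the same row more than once, which is why a batch of token ids maps directly to a batch of vectors.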
2
Solved
I would like to create a minibatch by encoding multiple sentences using transformers.BertTokenizer. It seems to work for a single sentence. How can I make it work for several sentences?
from transformers...
Lexi asked 1/7, 2020 at 3:32
2
Solved
In the paper describing BERT, there is this paragraph about WordPiece Embeddings.
We use WordPiece embeddings (Wu et al., 2016) with a 30,000 token vocabulary. The first token of every sequen...
Handspike asked 16/9, 2019 at 16:29
4
Solved
I have downloaded the data with wget
!wget http://nlp.stanford.edu/data/glove.6B.zip
- ‘glove.6B.zip’ saved [862182613/862182613]
It is saved as a zip file and I would like to use glove.6B.300d.txt fi...
Solifluction asked 27/4, 2018 at 10:16
6
Solved
The LDA model generates different topics every time I train on the same corpus. By setting np.random.seed(0), the LDA model will always be initialized and trained in exactly the same way.
Is i...
Arnett asked 16/1, 2016 at 20:5
3
Solved
I am using faiss IndexFlatIP to store vectors related to some words. I also use a separate list to store the words (the vector of the nth element in the list is the nth vector in the faiss index). I have two ques...
Weksler asked 26/3, 2022 at 12:10
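The arrangement described here, an inner-product index plus a parallel list of words, can be sketched in plain Python. This is only a stand-in for an exhaustive inner-product search under toy data; `search_inner_product` is a hypothetical helper and does not use or reproduce the faiss API.

```python
def search_inner_product(query, vectors, words, k=1):
    """Return the k (word, score) pairs with the largest inner product."""
    scores = [sum(q * x for q, x in zip(query, v)) for v in vectors]
    # Position i in `vectors` corresponds to position i in `words`.
    order = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    return [(words[i], scores[i]) for i in order[:k]]


# Hypothetical toy data: the nth word owns the nth vector.
words = ["king", "queen", "apple"]
vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
top = search_inner_product([1.0, 0.0], vectors, words, k=2)
print(top)  # [('king', 1.0), ('queen', 0.8)]
```

The mapping only holds while the list and the index are updated together, so any removal or reordering has to be applied to both.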
2
Solved
I have roughly 2 million sentences that I want to turn into vectors using Facebook AI's RoBERTa-large, fine-tuned on NLI and STSB for sentence similarity (using the awesome sentence-transformers pac...
Underpinnings asked 4/5, 2020 at 8:50
3
Solved
In this page, it is said that:
[...] skip-gram inverts contexts and targets, and tries to predict each context word from its target word [...]
However, looking at the training dataset it prod...
Halfhour asked 10/7, 2016 at 1:21
2
Solved
I have seen that NLP models such as BERT use WordPiece for tokenization. In WordPiece, we split tokens such as playing into play and ##ing. It is mentioned that it covers a wider spectrum of Out...
Azriel asked 27/3, 2019 at 16:52
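The split of playing into play and ##ing can be reproduced with a greedy longest-match-first pass over a vocabulary. The sketch below assumes a tiny hypothetical vocabulary and a hypothetical `wordpiece_tokenize` helper; real tokenizers ship a learned vocabulary of tens of thousands of pieces and add normalization steps omitted here.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split `word` into the longest matching pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches, so the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


toy_vocab = {"play", "##ing", "##ed"}  # hypothetical vocabulary
print(wordpiece_tokenize("playing", toy_vocab))  # ['play', '##ing']
```

With this toy vocabulary, played also decomposes into known pieces, while a word with no matching prefix falls back to the unknown token.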
4
Solved
I've only seen a few questions that ask this, and none of them have an answer yet, so I thought I might as well try. I've been using gensim's word2vec model to create some vectors. I exported them ...
Buerger asked 23/5, 2018 at 15:50
1
Solved
I am interested in how to get the similarity between the embeddings of a word in different sentences from a BERT model (that is, cases where the word has different meanings in different contexts).
For example:
sen...
Helmsman asked 21/11, 2021 at 19:39
3
I fully understand the meaning and methods of word embedding (skip-gram, CBOW). I also know that Google has a word2vec API that takes a word and produces its vector.
But my problem is this:...
Kwasi asked 27/6, 2017 at 17:12
5
Solved
I am using spaCy as part of a topic modelling solution and I have a situation where I need to map a derived word vector to the "closest" or "most similar" word in a vocabulary of word vectors.
I s...
Despite asked 15/2, 2019 at 21:43
1
I need to create a 'search engine' experience: from a short query (a few words), I need to find the relevant documents in a corpus of thousands of documents.
After analyzing a few approaches, I got very...
Susanasusanetta asked 23/12, 2019 at 17:6