word-embedding Questions

3

I am using the SentenceTransformers library (here: https://pypi.org/project/sentence-transformers/#pretrained-models) for creating embeddings of sentences using the pre-trained model bert-base-nli-...

5

Does Embedding make similar words closer to each other? And do I just need to give to it all the sentences? Or it is just a lookup table and I need to code the model?
Autohypnosis asked 7/6, 2018 at 18:29

5

In the dataset I have 5000000 rows, I would like to add a column called 'embeddings' to my dataset. dataset = dataset.add_column('embeddings', embeddings) The variable embeddings is a numpy memmap ...
Confectioner asked 22/11, 2021 at 10:56

3

I trained my unsupervised model using fasttext.train_unsupervised() function in python. I want to save it as vec file since I will use this file for pretrainedVectors parameter in fasttext.train_su...
Goodloe asked 11/10, 2019 at 8:46

5

Is it possible to use Google BERT for calculating similarity between two textual documents? As I understand BERT's input is supposed to be a limited size sentences. Some works use BERT for similari...
Somerset asked 11/9, 2019 at 5:3

4

I have trained a Word2Vec model using Gensim 3.8.0. Later I tried to use the pretrained model using Gensim 4.0.o on GCP. I used the following code: model = KeyedVectors.load_word2vec_format(wv_path...
Villous asked 30/3, 2021 at 9:28

2

Given a sentence of the type 'Roberta is a heavily optimized version of BERT.', I need to get the embeddings for each of the words in this sentence with RoBERTa. I have tried to look at the sample ...
Acea asked 24/3, 2020 at 3:33

4

Solved

I want to store a large n-dimensional vector (e.g. an embedding vector) in SQL Server as a piece of metadata associated with another row. In this example, it will be a 384-dimensional vector, for e...
Thill asked 3/5, 2023 at 18:44

1

Solved

I am a brand new user of Chroma database (and the associate python libraries). When I call get on a collection, embeddings is always none, even if embeddings are explicitly set/defined when adding ...
Vassallo asked 15/6, 2023 at 13:27

6

Solved

For ElMo, FastText and Word2Vec, I'm averaging the word embeddings within a sentence and using HDBSCAN/KMeans clustering to group similar sentences. A good example of the implementation can be see...

1

Solved

I am trying to use CLIP to calculate the similarities between strings. (I know that CLIP is usually used with text and images but it should work with only strings as well.) I provide a list of simp...
Alejandrinaalejandro asked 3/9, 2022 at 16:13

9

Solved

tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None) I cannot understand the duty of this function. Is it like a lookup table? Which means to return the parameters correspondi...
Loudspeaker asked 19/1, 2016 at 7:14

2

Solved

I would like to create a minibatch by encoding multiple sentences using transform.BertTokenizer. It seems working for a single sentence. How to make it work for several sentences? from transformers...

2

Solved

In the paper describing BERT, there is this paragraph about WordPiece Embeddings. We use WordPiece embeddings (Wu et al., 2016) with a 30,000 token vocabulary. The first token of every sequen...
Handspike asked 16/9, 2019 at 16:29

4

Solved

I have downloaded the data with wget !wget http://nlp.stanford.edu/data/glove.6B.zip - ‘glove.6B.zip’ saved [862182613/862182613] It is saved as zip and I would like to use glove.6B.300d.txt fi...
Solifluction asked 27/4, 2018 at 10:16

6

Solved

In LDA model generates different topics everytime i train on the same corpus , by setting the np.random.seed(0), the LDA model will always be initialized and trained in exactly the same way. Is i...
Arnett asked 16/1, 2016 at 20:5

3

Solved

I am using faiss indexflatIP to store vectors related to some words. I also use another list to store words (the vector of the nth element in the list is nth vector in faiss index). I have two ques...
Weksler asked 26/3, 2022 at 12:10

2

Solved

I have roughly 2 million sentences that I want to turn into vectors using Facebook AI's RoBERTa-large,fine-tuned on NLI and STSB for sentence similarity (using the awesome sentence-transformers pac...
Underpinnings asked 4/5, 2020 at 8:50

3

Solved

In this page, it is said that: [...] skip-gram inverts contexts and targets, and tries to predict each context word from its target word [...] However, looking at the training dataset it prod...
Halfhour asked 10/7, 2016 at 1:21

2

Solved

I have seen that NLP models such as BERT utilize WordPiece for tokenization. In WordPiece, we split the tokens like playing to play and ##ing. It is mentioned that it covers a wider spectrum of Out...
Azriel asked 27/3, 2019 at 16:52

4

Solved

I've only seen a few questions that ask this, and none of them have an answer yet, so I thought I might as well try. I've been using gensim's word2vec model to create some vectors. I exported them ...
Buerger asked 23/5, 2018 at 15:50

1

Solved

I was interesting in how to get the similarity of word embedding in different sentences from BERT model (actually, that means words have different meanings in different scenarios). For example: sen...
Helmsman asked 21/11, 2021 at 19:39

3

I know the meaning and methods of word embedding(skip-gram, CBOW) completely. And I know, that Google has a word2vector API that by getting the word can produce the vector. but my problem is this:...

5

Solved

I am using spaCy as part of a topic modelling solution and I have a situation where I need to map a derived word vector to the "closest" or "most similar" word in a vocabulary of word vectors. I s...
Despite asked 15/2, 2019 at 21:43

1

I need to create a 'search engine' experience : from a short query (few words), I need to find the relevant documents in a corpus of thousands documents. After analyzing few approaches, I got very...
Susanasusanetta asked 23/12, 2019 at 17:6

© 2022 - 2025 — McMap. All rights reserved.