word-embedding

3

Download pre-trained sentence-transformers model locally

I am using the SentenceTransformers library (here: https://pypi.org/project/sentence-transformers/#pretrained-models) for creating embeddings of sentences using the pre-trained model bert-base-nli-...

word-embedding bert-language-model huggingface-tokenizers sentence-transformers

Fandango asked 23/12, 2020 at 5:34

5

Embedding in pytorch

Does Embedding make similar words closer to each other? And do I just need to give it all the sentences? Or is it just a lookup table, and I need to code the model?

python pytorch word-embedding

Autohypnosis asked 7/6, 2018 at 18:29
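For context: `torch.nn.Embedding` holds a weight matrix of shape `(num_embeddings, embedding_dim)`, and indexing it returns rows of that matrix. It does not by itself place similar words close together. That only happens once the weights are trained as part of some model, or loaded from pre-trained vectors (for example with `nn.Embedding.from_pretrained`). A minimal pure-Python sketch of the lookup, with a hypothetical toy vocabulary, showing that it equals multiplying a one-hot vector by the weight matrix:

```python
# Sketch: an embedding layer is just a (vocab_size x dim) weight matrix,
# and looking up an index returns one row of it.
# The toy vocabulary and weights below are hypothetical.
vocab = {"cat": 0, "dog": 1, "car": 2}
weights = [
    [0.1, 0.2],  # row for "cat"
    [0.3, 0.4],  # row for "dog"
    [0.5, 0.6],  # row for "car"
]


def lookup(ids):
    """Return the weight row for each index, as an embedding layer does."""
    return [weights[i] for i in ids]


def one_hot_matmul(ids):
    """Compute one_hot(id) @ weights for each id, to show it equals the lookup."""
    n_rows, dim = len(weights), len(weights[0])
    result = []
    for i in ids:
        one_hot = [1.0 if j == i else 0.0 for j in range(n_rows)]
        result.append([sum(one_hot[j] * weights[j][k] for j in range(n_rows))
                       for k in range(dim)])
    return result


ids = [vocab["dog"], vocab["cat"]]
print(lookup(ids))          # [[0.3, 0.4], [0.1, 0.2]]
print(one_hot_matmul(ids))  # [[0.3, 0.4], [0.1, 0.2]]
```

Because the lookup is a differentiable row selection, gradients flow only into the rows that were used, which is how training gradually moves related words closer.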

5

Add new column to a HuggingFace dataset

My dataset has 5,000,000 rows, and I would like to add a column called 'embeddings' to it: dataset = dataset.add_column('embeddings', embeddings) The variable embeddings is a numpy memmap ...

python numpy word-embedding pyarrow huggingface-datasets

Confectioner asked 22/11, 2021 at 10:56

3

How to save fasttext model in vec format?

I trained my unsupervised model using the fasttext.train_unsupervised() function in Python. I want to save it as a .vec file, since I will use this file for the pretrainedVectors parameter in fasttext.train_su...

python word-embedding fasttext

Goodloe asked 11/10, 2019 at 8:46
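The `.vec` file is plain text: a header line with the word count and the vector dimension, then one line per word with the word followed by its components. A minimal sketch that writes that layout from a `{word: vector}` dict; building the dict from a trained model with `get_words()` and `get_word_vector()` is assumed usage of the fastText Python bindings, shown only in a comment:

```python
import io


def write_vec(words_to_vectors, out):
    """Write vectors in the plain-text .vec layout:
    a header line "<count> <dim>", then one "word v1 v2 ..." line per word."""
    items = list(words_to_vectors.items())
    dim = len(items[0][1]) if items else 0
    out.write(f"{len(items)} {dim}\n")
    for word, vec in items:
        out.write(word + " " + " ".join(f"{x:.5f}" for x in vec) + "\n")


# Toy example written to memory. With a trained model one might instead use
# (assumed usage):
#   vectors = {w: model.get_word_vector(w) for w in model.get_words()}
#   with open("model.vec", "w", encoding="utf-8") as f:
#       write_vec(vectors, f)
buf = io.StringIO()
write_vec({"hello": [0.1, -0.2], "world": [0.3, 0.4]}, buf)
print(buf.getvalue())
```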

5

Is it possible to use Google BERT to calculate similarity between two textual documents?

Is it possible to use Google BERT for calculating similarity between two textual documents? As I understand it, BERT's input is supposed to be sentences of limited size. Some works use BERT for similari...

python text scikit-learn nlp word-embedding

Somerset asked 11/9, 2019 at 5:3
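BERT-base accepts at most 512 positions, including `[CLS]` and `[SEP]`. One common workaround for longer documents (an assumption here, not the only approach) is to split the token list into overlapping windows, embed each window separately, and pool the window vectors (for example by averaging) before comparing documents with cosine similarity. A sketch of the windowing step; `max_len` and `stride` are illustrative values:

```python
def sliding_windows(tokens, max_len=510, stride=255):
    """Split a token list into overlapping windows of at most max_len tokens.
    510 leaves room for [CLS] and [SEP] within BERT's 512-position limit."""
    if len(tokens) <= max_len:
        return [tokens]
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):  # this window reaches the end
            break
        start += stride
    return windows


# Small numbers make the overlap easy to see.
for w in sliding_windows(list(range(10)), max_len=4, stride=2):
    print(w)
```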

4

Gensim 3.8.0 to Gensim 4.0.0

I have trained a Word2Vec model using Gensim 3.8.0. Later I tried to use the pretrained model with Gensim 4.0.0 on GCP. I used the following code: model = KeyedVectors.load_word2vec_format(wv_path...

python nlp gensim word2vec word-embedding

Villous asked 30/3, 2021 at 9:28

2

How can I get RoBERTa word embeddings?

Given a sentence of the type 'Roberta is a heavily optimized version of BERT.', I need to get the embeddings for each of the words in this sentence with RoBERTa. I have tried to look at the sample ...

encoding nlp word-embedding

Acea asked 24/3, 2020 at 3:33
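RoBERTa's byte-level BPE often splits one word into several tokens, so a per-word vector has to be assembled from token vectors. One common choice (an assumption, not the only option) is to average the hidden states of each word's tokens. Hugging Face fast tokenizers expose a token-to-word mapping via `word_ids()`, with `None` for special tokens; the sketch below assumes input in that shape, with toy vectors standing in for rows of the model's `last_hidden_state`:

```python
def pool_subwords(token_vectors, word_ids):
    """Average the vectors of tokens that belong to the same word.
    word_ids[i] is the word index of token i, or None for special tokens."""
    sums, counts = {}, {}
    for vec, wid in zip(token_vectors, word_ids):
        if wid is None:  # skip <s>, </s> and similar special tokens
            continue
        if wid not in sums:
            sums[wid] = [0.0] * len(vec)
            counts[wid] = 0
        sums[wid] = [a + b for a, b in zip(sums[wid], vec)]
        counts[wid] += 1
    # One averaged vector per word, in word order.
    return [[x / counts[w] for x in sums[w]] for w in sorted(sums)]


# Toy data: <s>, two sub-tokens of word 0, one token of word 1, </s>.
token_vectors = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]
word_ids = [None, 0, 0, 1, None]
print(pool_subwords(token_vectors, word_ids))  # [[2.0, 3.0], [5.0, 6.0]]
```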

4

Solved

How to store n-dimensional vector in Microsoft SQL Server?

I want to store a large n-dimensional vector (e.g. an embedding vector) in SQL Server as a piece of metadata associated with another row. In this example, it will be a 384-dimensional vector, for e...

arrays sql-server vector word-embedding

Thill asked 3/5, 2023 at 18:44
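One option (a design assumption, not the only one) is to serialize the vector as packed little-endian float32 bytes and store it in a `VARBINARY` column, which for 384 dimensions is 384 × 4 = 1,536 bytes. Storing JSON text in an `NVARCHAR(MAX)` column is a more readable alternative. A sketch of the serialization side:

```python
import struct


def pack_vector(vec):
    """Serialize a float vector as little-endian float32 bytes
    (4 bytes per dimension), suitable for a VARBINARY column."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_vector(blob):
    """Inverse of pack_vector. Values come back rounded to float32 precision."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


vec = [0.5] * 384
blob = pack_vector(vec)
print(len(blob))                   # 1536
print(unpack_vector(blob) == vec)  # True, because 0.5 is exact in float32
```

Values that are not exactly representable in float32 (such as 0.1) will come back slightly different, so compare with a tolerance if you round-trip.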

1

Solved

Chroma database embeddings = none when using get()

I am a brand-new user of the Chroma database (and the associated Python libraries). When I call get() on a collection, embeddings is always None, even if embeddings are explicitly set/defined when adding ...

word-embedding langchain chromadb

Vassallo asked 15/6, 2023 at 13:27

6

Solved

How to cluster similar sentences using BERT

For ElMo, FastText and Word2Vec, I'm averaging the word embeddings within a sentence and using HDBSCAN/KMeans clustering to group similar sentences. A good example of the implementation can be see...

python nlp artificial-intelligence word-embedding bert-language-model

Dissension asked 10/4, 2019 at 18:31
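The averaging step described in the question is mean pooling: one sentence vector is the element-wise mean of its word vectors. A minimal sketch with hypothetical 2-d vectors; words without a vector are skipped. The resulting vectors are what would then be passed to KMeans or HDBSCAN:

```python
def sentence_vector(tokens, word_vectors):
    """Average the vectors of the tokens that have one (out-of-vocabulary
    tokens are skipped); returns None if no token has a vector."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


# Hypothetical 2-d word vectors, for illustration only.
wv = {"the": [0.0, 1.0], "cat": [2.0, 3.0], "sat": [4.0, 5.0]}
print(sentence_vector(["the", "cat", "sat", "xyz"], wv))  # [2.0, 3.0]
```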

1

Solved

I am trying to use CLIP to calculate the similarities between strings. (I know that CLIP is usually used with text and images but it should work with only strings as well.) I provide a list of simp...

python word-embedding cosine-similarity

Alejandrinaalejandro asked 3/9, 2022 at 16:13

9

Solved

What does tf.nn.embedding_lookup function do?

tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None) I cannot understand the purpose of this function. Is it like a lookup table? That is, does it return the parameters correspondi...

python tensorflow deep-learning word-embedding nlp

Loudspeaker asked 19/1, 2016 at 7:14
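With a single `params` tensor, `tf.nn.embedding_lookup` behaves like `tf.gather(params, ids)`: it returns the rows of `params` selected by `ids`. `partition_strategy` only matters when `params` is a list of shards. With `'mod'`, id `i` lives in shard `i % n` at row `i // n`. A pure-Python sketch of that rule (no TensorFlow; the shard contents are made up):

```python
def embedding_lookup_mod(params, ids):
    """Sketch of embedding_lookup with partition_strategy='mod'.
    params is a list of n shards; id i is stored in shard i % n at row i // n."""
    n = len(params)
    return [params[i % n][i // n] for i in ids]


# A 5-row table whose row i is [float(i)], split "mod"-style into 2 shards:
# shard 0 holds ids 0, 2, 4 and shard 1 holds ids 1, 3.
shards = [[[0.0], [2.0], [4.0]], [[1.0], [3.0]]]
print(embedding_lookup_mod(shards, [3, 0, 4]))  # [[3.0], [0.0], [4.0]]
```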

2

Solved

How to encode multiple sentences using transformers.BertTokenizer?

I would like to create a minibatch by encoding multiple sentences using transformers.BertTokenizer. It seems to work for a single sentence. How can I make it work for several sentences? from transformers...

word-embedding huggingface-transformers huggingface-tokenizers

Lexi asked 1/7, 2020 at 3:32
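In recent `transformers` versions the tokenizer can be called on a list directly, e.g. `tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")`. `padding=True` pads every sequence to the longest one in the batch and returns an `attention_mask`. The pure-Python sketch below mimics only that padding step on already-tokenized ids. The ids are illustrative; in `bert-base-uncased`, 101, 102 and 0 are `[CLS]`, `[SEP]` and `[PAD]`.

```python
def pad_batch(batch_ids, pad_id=0):
    """Pad token-id lists to the longest length in the batch and build the
    matching attention mask (1 = real token, 0 = padding)."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch_ids]
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids))
                      for ids in batch_ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```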

2

Solved

How are the TokenEmbeddings in BERT created?

In the paper describing BERT, there is this paragraph about WordPiece Embeddings. We use WordPiece embeddings (Wu et al., 2016) with a 30,000 token vocabulary. The first token of every sequen...

machine-learning nlp word-embedding

Handspike asked 16/9, 2019 at 16:29
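In BERT the token embeddings are a learned lookup table with one row per WordPiece vocabulary entry, trained jointly with the rest of the network during pre-training. The paper then forms each input representation by summing the token, segment and (learned) position embeddings. A pure-Python sketch of that sum with made-up tiny tables; real implementations also apply LayerNorm and dropout afterwards, which is omitted here:

```python
def bert_input_embeddings(token_ids, segment_ids,
                          token_table, segment_table, position_table):
    """For each position, add the token, segment and position embedding rows
    element-wise (LayerNorm and dropout omitted)."""
    out = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        t, s, p = token_table[tok], segment_table[seg], position_table[pos]
        out.append([a + b + c for a, b, c in zip(t, s, p)])
    return out


# Hypothetical tables: vocabulary of 3, hidden size 2, max length 3.
token_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
segment_table = [[0.0, 0.0], [10.0, 10.0]]  # sentence A, sentence B
position_table = [[0.5, 0.5], [0.25, 0.25], [0.125, 0.125]]
print(bert_input_embeddings([2, 0, 1], [0, 0, 1],
                            token_table, segment_table, position_table))
# [[1.5, 1.5], [1.25, 0.25], [10.125, 11.125]]
```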

4

Solved

How to use GloVe word-embeddings file on Google colaboratory

I have downloaded the data with wget: !wget http://nlp.stanford.edu/data/glove.6B.zip - ‘glove.6B.zip’ saved [862182613/862182613] It is saved as a zip file, and I would like to use glove.6B.300d.txt fi...

python google-colaboratory word-embedding

Solifluction asked 27/4, 2018 at 10:16
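Each line of a GloVe text file is a word followed by its vector components, separated by spaces. Besides running `!unzip glove.6B.zip` in Colab, the file can be read straight out of the archive with Python's `zipfile`. A minimal sketch:

```python
import io
import zipfile


def load_glove(lines):
    """Parse GloVe text lines ("word v1 v2 ... vN") into a dict of float lists."""
    vectors = {}
    for line in lines:
        if not line.strip():  # tolerate blank lines
            continue
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def load_glove_from_zip(zip_path, member="glove.6B.300d.txt"):
    """Read one member of the downloaded zip without extracting it to disk."""
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as raw:
            return load_glove(io.TextIOWrapper(raw, encoding="utf-8"))


print(load_glove(["the 0.1 0.2\n", "cat 0.3 0.4\n"]))
# In Colab, after the wget above: vectors = load_glove_from_zip("glove.6B.zip")
```

For the full 300-d file, Python float lists use a lot of memory; storing each vector as a NumPy `float32` array instead is considerably lighter.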

6

Solved

Ensure gensim generates the same Word2Vec model for different runs on the same data

An LDA model generates different topics every time I train it on the same corpus; by setting np.random.seed(0), the LDA model will always be initialized and trained in exactly the same way. Is i...

python random gensim word2vec word-embedding

Arnett asked 16/1, 2016 at 20:5

3

Solved

Update an element in faiss index

I am using faiss IndexFlatIP to store vectors related to some words. I also use another list to store the words (the vector of the nth element in the list is the nth vector in the faiss index). I have two ques...

python word-embedding faiss

Weksler asked 26/3, 2022 at 12:10

2

Solved

Speed up embedding of 2M sentences with RoBERTa

I have roughly 2 million sentences that I want to turn into vectors using Facebook AI's RoBERTa-large, fine-tuned on NLI and STSB for sentence similarity (using the awesome sentence-transformers pac...

python nlp word-embedding transformer-model

Underpinnings asked 4/5, 2020 at 8:50
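When batching manually, grouping sentences of similar length reduces the padding each batch carries, which saves compute. A sketch under stated assumptions: character length is used as a cheap proxy for token length, and `encode_batch` is a hypothetical placeholder for the real model call (with sentence-transformers, something like `lambda b: model.encode(b, batch_size=len(b))`). Results are written back to their original positions:

```python
def length_sorted_batches(sentences, batch_size):
    """Group sentence indices into batches of similar (character) length."""
    order = sorted(range(len(sentences)), key=lambda i: len(sentences[i]))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def embed_in_sorted_batches(sentences, batch_size, encode_batch):
    """Encode length-sorted batches, then restore the original order.
    encode_batch is a hypothetical stand-in for the real model call."""
    results = [None] * len(sentences)
    for batch in length_sorted_batches(sentences, batch_size):
        vectors = encode_batch([sentences[i] for i in batch])
        for i, vec in zip(batch, vectors):
            results[i] = vec
    return results


# Toy "encoder" returning each sentence's length, to show order is restored.
print(embed_in_sorted_batches(["aaaa", "a", "aaa", "aa"], 2,
                              lambda b: [len(s) for s in b]))  # [4, 1, 3, 2]
```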

3

Solved

CBOW v.s. skip-gram: why invert context and target words?

In this page, it is said that: [...] skip-gram inverts contexts and targets, and tries to predict each context word from its target word [...] However, looking at the training dataset it prod...

nlp tensorflow deep-learning word2vec word-embedding

Halfhour asked 10/7, 2016 at 1:21
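The difference is easiest to see in the training examples each model is built from. In CBOW, one example is (context words → target word). In skip-gram, each (target, context) pair is its own example, with the target as input and one context word as the label. A sketch generating both from the same sentence with a window of 1:

```python
def skipgram_pairs(tokens, window=1):
    """(target, context) pairs: the target is the input, each context word a label."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def cbow_examples(tokens, window=1):
    """(context words, target) examples: the context predicts the target."""
    examples = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples


sentence = ["the", "quick", "brown", "fox"]
print(skipgram_pairs(sentence))
print(cbow_examples(sentence))
```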

2

Solved

How is WordPiece tokenization helpful to effectively deal with rare words problem in NLP?

I have seen that NLP models such as BERT use WordPiece for tokenization. In WordPiece, a token like playing is split into play and ##ing. It is mentioned that it covers a wider spectrum of Out...

nlp word-embedding

Azriel asked 27/3, 2019 at 16:52
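At tokenization time, BERT's reference WordPiece tokenizer uses greedy longest-match-first: it repeatedly takes the longest vocabulary entry that matches the rest of the word, marking non-initial pieces with `##`. A rare word therefore becomes a few known pieces instead of a single `[UNK]`. A sketch with a hypothetical toy vocabulary:

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece tokenization of one word.
    Pieces after the first carry the "##" continuation prefix."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:  # shrink the candidate until it is in the vocab
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


vocab = {"play", "##ing", "##ed", "un", "##play", "##able"}
print(wordpiece("playing", vocab))     # ['play', '##ing']
print(wordpiece("unplayable", vocab))  # ['un', '##play', '##able']
print(wordpiece("xyz", vocab))         # ['[UNK]']
```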

4

Solved

Visualize Gensim Word2vec Embeddings in Tensorboard Projector

I've only seen a few questions that ask this, and none of them have an answer yet, so I thought I might as well try. I've been using gensim's word2vec model to create some vectors. I exported them ...

python tensorflow gensim tensorboard word-embedding

Buerger asked 23/5, 2018 at 15:50
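The standalone Embedding Projector can load a tab-separated file of vectors plus an optional metadata file with one label per row (single-column metadata takes no header row). A sketch that writes both from parallel lists. Pulling the words from gensim via `model.wv.index_to_key` (gensim 4) or `model.wv.index2word` (gensim 3) is assumed usage, shown only in a comment:

```python
import io


def export_projector_tsv(words, vectors, vec_out, meta_out):
    """Write the two files the Embedding Projector can load: one tab-separated
    row of components per word, and one word label per row."""
    for word, vec in zip(words, vectors):
        vec_out.write("\t".join(str(x) for x in vec) + "\n")
        meta_out.write(word + "\n")


# Assumed gensim 4 usage:
#   words = model.wv.index_to_key
#   with open("vectors.tsv", "w") as v, open("metadata.tsv", "w") as m:
#       export_projector_tsv(words, [model.wv[w] for w in words], v, m)
v, m = io.StringIO(), io.StringIO()
export_projector_tsv(["king", "queen"], [[0.1, 0.2], [0.3, 0.4]], v, m)
print(v.getvalue())
print(m.getvalue())
```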

1

Solved

How to get cosine similarity of word embedding from BERT model

I was interested in how to get the similarity of word embeddings in different sentences from a BERT model (that is, words having different meanings in different contexts). For example: sen...

python bert-language-model word-embedding transformer-model

Helmsman asked 21/11, 2021 at 19:39

3

How to combine word embedding vectors into one vector?

I fully understand the meaning and methods of word embedding (skip-gram, CBOW). I also know that Google has a word2vec API that produces a vector for a given word. But my problem is this:...

nlp information-retrieval word2vec google-api-python-client word-embedding

Kwasi asked 27/6, 2017 at 17:12

5

Solved

Mapping word vector to the most similar/closest word using spaCy

I am using spaCy as part of a topic modelling solution and I have a situation where I need to map a derived word vector to the "closest" or "most similar" word in a vocabulary of word vectors. I s...

nlp spacy word2vec word-embedding

Despite asked 15/2, 2019 at 21:43
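Mapping a derived vector back to a word amounts to a nearest-neighbour search by cosine similarity over the vocabulary's vectors. A brute-force sketch with made-up 2-d vectors; newer spaCy versions also provide `Vectors.most_similar` for this, worth checking in the documentation for your version:

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_word(query, vocab_vectors):
    """Brute-force nearest neighbour: the vocabulary word whose vector has the
    highest cosine similarity to the query vector."""
    return max(vocab_vectors, key=lambda w: cosine(query, vocab_vectors[w]))


# Hypothetical vectors, for illustration only.
vocab_vectors = {"king": [0.9, 0.1], "queen": [0.8, 0.3], "apple": [0.1, 0.9]}
print(most_similar_word([0.9, 0.15], vocab_vectors))  # king
```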

1

Universal sentence encoder for big document similarity

I need to create a 'search engine' experience: from a short query (a few words), I need to find the relevant documents in a corpus of thousands of documents. After analyzing a few approaches, I got very...

machine-learning nlp cosine-similarity word-embedding

Susanasusanetta asked 23/12, 2019 at 17:6