How to store Word vector Embeddings?
I am using BERT word embeddings for a sentence classification task with 3 labels, and I am coding in Google Colab. Generating the embeddings takes a long time, and I have to rerun that step every time I restart the kernel. Is there a way to save the word embeddings once they have been generated?

The code I am using to generate the BERT word embeddings is:

[get_features(text) for text in text_list]

Here, get_features is a function that returns the word embeddings for each item in my list text_list.

I read that converting the embeddings into NumPy arrays and then using np.save can do it, but I don't know how to code it.

Limbo answered 3/7, 2020 at 7:51 Comment(0)

You can save your embeddings to a NumPy file as follows:

import numpy as np

all_embeddings = here_is_your_function_return_all_data()
all_embeddings = np.array(all_embeddings)  # convert the list of embeddings to a NumPy array
np.save('embeddings.npy', all_embeddings)  # write the array to disk

If you are saving the file in Google Colab, you can download it to your local computer. Whenever you need it, just upload it again and load it:

all_embeddings = np.load('embeddings.npy')

That's it.
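To avoid recomputing after every kernel restart, the save and load steps above can be combined into a small caching helper. This is only a minimal sketch: the helper name load_or_compute_embeddings, its parameters, and the default file name are assumptions for illustration, and embed_fn stands in for your own get_features function.

```python
import os
import numpy as np

def load_or_compute_embeddings(texts, embed_fn, cache_path="embeddings.npy"):
    """Load embeddings from cache_path if it exists; otherwise compute and save them.

    embed_fn is assumed to return arrays of the same shape for every text.
    """
    if os.path.exists(cache_path):
        # Cache hit: skip the expensive embedding step entirely.
        return np.load(cache_path)
    # Cache miss: compute once, then store the result for the next session.
    embeddings = np.array([embed_fn(text) for text in texts])
    np.save(cache_path, embeddings)
    return embeddings

# Usage (get_features and text_list come from the question):
# all_embeddings = load_or_compute_embeddings(text_list, get_features)
```

Note that this sketch does not detect changes to the input texts. If text_list changes, delete the cached file so the embeddings are regenerated.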

By the way, you can also save the file directly to Google Drive.
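A rough sketch of the Google Drive route, assuming you are in Colab and have already mounted Drive with drive.mount from the google.colab module. The helper name save_embeddings is hypothetical, and the /content/drive/MyDrive path is an assumption that depends on where you mount Drive and may differ in your session.

```python
import os
import numpy as np

# Assumption: in Colab, Drive has been mounted beforehand with
#   from google.colab import drive
#   drive.mount('/content/drive')
# so that your files appear under the directory below.
DRIVE_DIR = "/content/drive/MyDrive"

def save_embeddings(embeddings, directory, filename="embeddings.npy"):
    """Save embeddings as a .npy file inside directory and return the full path."""
    os.makedirs(directory, exist_ok=True)  # create the folder if it is missing
    path = os.path.join(directory, filename)
    np.save(path, np.asarray(embeddings))
    return path

# Usage in Colab, after mounting Drive:
# path = save_embeddings(all_embeddings, DRIVE_DIR)
# all_embeddings = np.load(path)
```

Because the file lives in Drive rather than in the Colab session's temporary storage, it should still be there after the runtime is reset, once Drive is mounted again.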

Baerl answered 3/7, 2020 at 8:19 Comment(8)
Suppose my all_embeddings is a list of embeddings, since I am more interested in getting embeddings for a list of strings than for a single string. Will np.array still work in that case? Limbo
Yes, it will work, but make sure all the embeddings in the list have the same shape. If they do not, the array has to be stored as an object array (dtype=object), and you will need an extra parameter to load your data: x = np.load('file.npy', allow_pickle=True). Baerl
Sorry for the delayed reply. I ran it and it worked. I have marked the answer as correct. Also, please upvote the question too. Thank you. Limbo
Wouldn't it work if I dumped the embeddings directly to a pickle file? I read that pkl files keep the state as is. Rudderpost
@theProcrastinator Yes, it will work. Pickle can handle most types and objects, including their state. Baerl
@NazmulHasan These days people recommend using a vector DB over NumPy files to store vector embeddings. What are your views on the current scenario? Concertmaster
@Concertmaster It depends entirely on the use case. If you only want to train the model and the embeddings will not be used afterwards, I don't think a vector DB is a good idea. But if you want to use the embeddings for analysis, or to build a recommender system, similarity search, or something similar, a vector DB is the better choice, as it comes with nearest-neighbor (NN) search and some statistics features. Baerl
@NazmulHasan Insightful, thanks for sharing your views. Concertmaster
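For the variable-length case raised in the comments (embeddings whose shapes differ, such as per-token embeddings for sentences of different lengths), here is a minimal sketch of the two options discussed: a NumPy object array loaded with allow_pickle=True, and a plain pickle file. All four function names are hypothetical and chosen only for illustration.

```python
import pickle
import numpy as np

def save_ragged_npy(embeddings, path):
    """Store embeddings of differing shapes in a 1-D NumPy object array."""
    ragged = np.empty(len(embeddings), dtype=object)
    for i, emb in enumerate(embeddings):
        ragged[i] = np.asarray(emb)  # each slot holds one full embedding array
    np.save(path, ragged, allow_pickle=True)

def load_ragged_npy(path):
    """Load an object array; allow_pickle=True is required for dtype=object."""
    return list(np.load(path, allow_pickle=True))

def save_pickle(embeddings, path):
    """Dump the list of embeddings as-is with pickle."""
    with open(path, "wb") as f:
        pickle.dump(embeddings, f)

def load_pickle(path):
    """Load a list of embeddings previously written by save_pickle."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Both routes rely on pickle under the hood, so only load files that you created yourself or otherwise trust. If all embeddings share one shape, the plain np.save and np.load calls from the answer are simpler and do not need allow_pickle.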
