How to store Word vector Embeddings?


Asked 3/7, 2020 at 7:51 Answered 3/7, 2020 at 8:19

Solved python-3.x keras nlp word-embedding bert-language-model

I am using BERT word embeddings for a sentence classification task with 3 labels, and I am coding in Google Colab. Since I have to rerun the embedding step every time I restart the kernel, is there any way to save the word embeddings once they are generated? Generating them takes a lot of time.

The code I am using to generate the BERT word embeddings is:

[get_features(text) for text in text_list]

Here, get_features is a function that returns the word embeddings for each text in my list text_list.

I read that converting the embeddings into NumPy arrays and then using np.save can do it, but I don't know how to code it.

Limbo answered 3/7, 2020 at 7:51 Comment(0)

You can save your embeddings data to a numpy file by following these steps:

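import numpy as np
# Replace this call with your own function that returns all embeddings
all_embeddings = here_is_your_function_return_all_data()
# Convert the list to one array (the embeddings must all share the same shape)
all_embeddings = np.array(all_embeddings)
np.save('embeddings.npy', all_embeddings)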

If you save the file in Google Colab, you can download it to your local computer. Whenever you need it, just upload it and load it:

all_embeddings = np.load('embeddings.npy')

That's it.
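To avoid recomputing after every kernel restart, the save and load steps can be combined into a small caching helper. This is only a sketch: load_or_compute_embeddings, fake_get_features, and the file name demo_embeddings.npy are hypothetical names introduced here, and fake_get_features merely stands in for the slow BERT call from the question.

```python
import os
import numpy as np

def load_or_compute_embeddings(texts, embed_fn, cache_path="embeddings.npy"):
    """Load embeddings from cache_path if it exists, otherwise compute and save them."""
    if os.path.exists(cache_path):
        return np.load(cache_path)
    # Assumes embed_fn returns arrays of one common shape for every text.
    embeddings = np.array([embed_fn(text) for text in texts])
    np.save(cache_path, embeddings)
    return embeddings

# Hypothetical stand-in for the expensive embedding function.
def fake_get_features(text):
    return np.full(4, float(len(text)), dtype=np.float32)

text_list = ["good", "bad", "neutral"]
# The first call computes and saves; the second call only reads the file.
first = load_or_compute_embeddings(text_list, fake_get_features, "demo_embeddings.npy")
second = load_or_compute_embeddings(text_list, fake_get_features, "demo_embeddings.npy")
```

Note that the cache is keyed only by file name, so the file has to be deleted by hand if text_list or the embedding function changes.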

By the way, you can also save the file directly to Google Drive.
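A minimal sketch of the Google Drive route, assuming the code runs in Colab where the google.colab module is available. The folder name embeddings under MyDrive is a hypothetical choice, the random array is placeholder data, and the fallback to a temporary directory exists only so the snippet also runs outside Colab.

```python
import os
import tempfile
import numpy as np

try:
    # Only importable inside Google Colab; mounting prompts for authorization.
    from google.colab import drive
    drive.mount("/content/drive")
    save_dir = "/content/drive/MyDrive/embeddings"  # hypothetical folder name
except ImportError:
    # Outside Colab, fall back to a local temporary directory.
    save_dir = os.path.join(tempfile.gettempdir(), "embeddings")

os.makedirs(save_dir, exist_ok=True)
path = os.path.join(save_dir, "embeddings.npy")

# Placeholder data standing in for the real embeddings.
all_embeddings = np.random.rand(3, 768).astype(np.float32)
np.save(path, all_embeddings)

# In a later session, mount the drive again and load from the same path.
restored = np.load(path)
```

Files written under the mounted folder persist across kernel restarts, so no manual download and upload is needed.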

Baerl answered 3/7, 2020 at 8:19 Comment(8)

Suppose my all_embeddings is a list of embeddings, since I am more interested in getting embeddings for a list of strings rather than a single string. Will np.array still work in that case? – Limbo 3/7, 2020 at 8:23

Yes, it will work, but make sure all the embeddings in the list have the same shape. If they do not, you will have to pass an extra parameter when loading your data: x = np.load('file.npy', allow_pickle=True). – Baerl 3/7, 2020 at 8:26
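When the embeddings differ in shape (for example, one row per token and texts of different lengths), one option is to store them in an object array. The sketch below assumes that situation; the shapes and the file name ragged_embeddings.npy are made up for illustration. The keyword that np.load accepts for this case is allow_pickle.

```python
import numpy as np

# Hypothetical per-text embeddings with different numbers of tokens.
ragged = [np.ones((3, 4), dtype=np.float32), np.zeros((5, 4), dtype=np.float32)]

# Fill an object array element by element so NumPy does not try to
# merge the inner arrays into one rectangular array.
container = np.empty(len(ragged), dtype=object)
for i, emb in enumerate(ragged):
    container[i] = emb

np.save("ragged_embeddings.npy", container)

# Object arrays are stored with pickle, so loading must opt in explicitly.
loaded = np.load("ragged_embeddings.npy", allow_pickle=True)
```

Because loading relies on pickle, allow_pickle=True should only be used on files from a trusted source.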

Sorry for the delayed reply. I ran it and it worked. I have marked the answer as correct. Also, please upvote the question too. Thank you. – Limbo 6/7, 2020 at 10:41

Wouldn't it work if I directly dump the embeddings to a pickle file? I read that pkl files keep the state as is. – Rudderpost 6/1, 2022 at 9:21

@theProcrastinator Yes, it will work. Pickle can handle most types and objects, along with their state. – Baerl 7/1, 2022 at 1:53
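A minimal sketch of the pickle route, assuming the embeddings are kept next to their source texts in a plain dictionary. The dictionary layout and the file name embeddings.pkl are hypothetical.

```python
import pickle
import numpy as np

# Hypothetical container: texts alongside their (possibly ragged) embeddings.
data = {
    "texts": ["good", "bad"],
    "vectors": [np.ones((3, 4), dtype=np.float32), np.zeros((5, 4), dtype=np.float32)],
}

# Write the whole object, including nested arrays, to a binary file.
with open("embeddings.pkl", "wb") as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back in a later session.
with open("embeddings.pkl", "rb") as f:
    restored = pickle.load(f)
```

Unpickling can execute arbitrary code, so pickle.load should only be called on files you created yourself or otherwise trust.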

@NazmulHasan These days people recommend using a vector DB over NumPy to store vector embeddings. What are your views on the current scenario? – Concertmaster 1/11, 2023 at 8:14

@Concertmaster It completely depends on the use case. If you only want to train a model and will not use the embeddings afterwards, I don't think a vector DB is a good idea. But if you want to use the embeddings for analysis, or to build a recommender system, similarity search, or something like that, a vector DB is the better choice, as it comes with nearest-neighbour (NN) search and statistics features. – Baerl 4/11, 2023 at 5:58
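To make the similarity-search use case concrete, here is a brute-force cosine-similarity lookup over an in-memory array. It is not a vector DB API, only a sketch of the kind of query such a system answers (typically with faster, approximate indexes); top_k_cosine and the toy vectors are made up for illustration.

```python
import numpy as np

def top_k_cosine(query, matrix, k=2):
    """Return indices and cosine scores of the k rows of matrix closest to query."""
    # Normalize rows and the query so the dot product equals cosine similarity.
    matrix_unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = matrix_unit @ query_unit
    # Sort by descending score and keep the first k.
    order = np.argsort(-scores)[:k]
    return order, scores[order]

# Toy stored embeddings; with saved data this could come from np.load.
stored = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
])
query = np.array([1.0, 0.0, 0.0])

indices, scores = top_k_cosine(query, stored, k=2)
```

A full scan like this is usually fine for a few thousand vectors; the indexing a vector DB adds matters mainly at much larger scale or when the data changes frequently.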

@NazmulHasan Insightful, thanks for sharing your views – Concertmaster 6/11, 2023 at 6:36
