Update an element in faiss index
W

3

8

I am using a Faiss IndexFlatIP index to store vectors for a set of words. I keep the words in a separate list, so that the vector of the nth word in the list is the nth vector in the Faiss index. I have two questions:

  1. Is there a better way to relate words to their vectors?
  2. Can I update the nth element in the Faiss index?
Weksler answered 26/3, 2022 at 12:10 Comment(0)
L
9

You can do both.

  1. Is there a better way to relate words to their vectors?

Call index.add_with_ids(vectors, ids).

Some index types support the add_with_ids method directly, but flat indexes don't.

If you call the method on a flat index, you will receive the error "add_with_ids not implemented for this type of index".

If you want to use IDs with a flat index, you must wrap it first with index2 = faiss.IndexIDMap(index) and then call add_with_ids on index2.

  2. Can I update the nth element in the Faiss index?

If you want to update some encodings, first remove them, then add them again with add_with_ids.

If you don't remove the original IDs first, the index will contain duplicates and the search results will be unreliable.

To remove an array of IDs, call index.remove_ids(ids_to_replace).

Nota bene: IDs must be of type np.int64.

Lade answered 19/4, 2022 at 15:4 Comment(3)
Are you sure it works with np.int64? As far as I know, there is still a bug. - Vulcanism
I'm explicitly casting it with int64_ids = np.array(ids, dtype=np.int64) and it works fine. - Lade
You use 'np.array'. That might be why it works. I had to work with Python int to avoid type errors, but I do not use 'np.array' for IDs. Thanks. - Vulcanism
H
4

You can use the add_with_ids method to add vectors with integer ID values, and I believe this will let you update a specific vector too. However, you will need to build some sort of vector-ID mapping and management layer outside of Faiss, because it isn't supported otherwise. I've done this before and it isn't very fun.
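
As a rough idea of what such an outside mapping layer might involve, here is a small pure-Python sketch. The class WordIdRegistry and its method names are hypothetical and not part of Faiss; it only handles the bookkeeping between words and integer IDs:

```python
class WordIdRegistry:
    """Hypothetical helper that assigns a stable integer ID to each word."""

    def __init__(self):
        self._word_to_id = {}
        self._id_to_word = {}
        self._next_id = 0

    def id_for(self, word):
        """Return the ID for a word, allocating a new one on first use."""
        if word not in self._word_to_id:
            self._word_to_id[word] = self._next_id
            self._id_to_word[self._next_id] = word
            self._next_id += 1
        return self._word_to_id[word]

    def word_for(self, vector_id):
        """Translate an ID returned by a search back into its word."""
        return self._id_to_word[int(vector_id)]


registry = WordIdRegistry()
apple_id = registry.id_for("apple")
banana_id = registry.id_for("banana")
print(apple_id, banana_id, registry.word_for(banana_id))
```

The IDs handed out this way would then be passed to add_with_ids when a vector is first added, and reused when the same word's vector is replaced later.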

If you're open to Faiss alternatives, I'd recommend looking into Pinecone. It manages everything for you, so you just insert your (id, vector) pairs using their upsert method; to update a vector, you upsert the new vector with the same ID. It takes maybe 5-10 minutes to set up, and this guide explains how.

Hindgut answered 28/3, 2022 at 18:32 Comment(0)
B
0

Faiss is only an ANN (approximate nearest neighbor) algorithm library and is not designed for data persistence and management.

There are some open-source vector databases on GitHub that may be able to help you, such as Milvus and Vespa.

Milvus is the one with the most stars.

https://milvus.io

https://github.com/milvus-io/milvus

Badly answered 28/3, 2022 at 6:23 Comment(0)
