B Tree Index vs Inverted Index?
Asked Answered
A

1

9

Here is mine understanding about both

B Tree index :- It is generally used database column. It keeps the column content as key and row_id as value . It keeps the key in sorted fashion to quickly find the key and row location

Inverted Index :- Generally used in full text search. Here also word in document works as key, stored in sorted fashion along with doucument location/id as value.

So what's the difference b/w B tree index and Inverted index . To me they looks same

Arachne answered 28/11, 2017 at 17:14 Comment(0)
N
20

Short answer:

  • yes, they have the same purpose - finding things fast
  • difference: what are they useful for / particularly good at
  • and btw the naming is just awfully confusing too

Long answer:

The naming

My knowledge comes from practice with SQL world, so for me the data storage used to be equal to "database" and the structure that allows to find things quick - an "index".

The trick is - search engines already call their storage "index", so how do you call that index-of-"index"? "Inverted Index", of course! Why inverted? Because, as I can already see in your question, it just inverts the the primary storage. Storage is like primary key --> values, that helper-structure inverts it to values --> primary key and helps quickly finding documents by values.

Purpose

Your question has a mix of Ideas. "Inverted index" means actually more like "a data structure that helps finding documents that are already in storage" whereas B-Tree is just an implementation of such structure.

An index could be theoretically implemented with any data structure you want. Hashes, Graphs, Trees, Arrays, Bitmaps.. it just depends on your usecase.

The differences

B-Tree is good for data that changes, so it's used e.g. in databases and filesystems. Downside: multiple indices cannot be used together in one query (I guess because this structure is dynamic and thus references back to documents are not sorted) and it's data tends to become scattered, so the IO can become an issue.

"Inverted index" uses Bitmaps/Arrays and everything's sorted (list of values and the list of references to documents). These are good for static data sets. And because of sorted nature, multiple indices can be used together. Downside: updating is not performant (new document means inserting values somewhere in a sorted list), tricks are used like keeping batches of data together as it comes in and merging into bigger batches in a background process.

Nidifugous answered 28/3, 2018 at 22:15 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.