B Tree Index vs Inverted Index?

Short answer:

yes, they have the same purpose - finding things fast
difference: what are they useful for / particularly good at
and btw the naming is just awfully confusing too

Long answer:

The naming

My knowledge comes from practice with SQL world, so for me the data storage used to be equal to "database" and the structure that allows to find things quick - an "index".

The trick is - search engines already call their storage "index", so how do you call that index-of-"index"? "Inverted Index", of course! Why inverted? Because, as I can already see in your question, it just inverts the the primary storage. Storage is like primary key --> values, that helper-structure inverts it to values --> primary key and helps quickly finding documents by values.

Purpose

Your question has a mix of Ideas. "Inverted index" means actually more like "a data structure that helps finding documents that are already in storage" whereas B-Tree is just an implementation of such structure.

An index could be theoretically implemented with any data structure you want. Hashes, Graphs, Trees, Arrays, Bitmaps.. it just depends on your usecase.

The differences

B-Tree is good for data that changes, so it's used e.g. in databases and filesystems. Downside: multiple indices cannot be used together in one query (I guess because this structure is dynamic and thus references back to documents are not sorted) and it's data tends to become scattered, so the IO can become an issue.

"Inverted index" uses Bitmaps/Arrays and everything's sorted (list of values and the list of references to documents). These are good for static data sets. And because of sorted nature, multiple indices can be used together. Downside: updating is not performant (new document means inserting values somewhere in a sorted list), tricks are used like keeping batches of data together as it comes in and merging into bigger batches in a background process.

Recommended topics

Hot tags