Postgres usage of btree indexes vs MySQL B+trees

We are in the process of migrating from MySQL to PGSQL and we have a 100 million row table.

When I was trying to ascertain how much space both systems use, I found much less difference for the tables, but a huge difference for the indexes.

The MySQL indexes occupied more space than the table data itself, whereas Postgres used considerably less.

  • While digging for the reason, I found that MySQL uses B+ trees to store its indexes and Postgres uses B-trees.

  • MySQL's use of indexes is a little different: it stores the row data along with the index (hence the increased size), but Postgres doesn't.

Now the questions:

  • Comparing B-trees and B+ trees in database terms, isn't it better to use B+ trees, since they are better for range queries - O(log N) + O(m), where m is the number of entries in the range and the initial lookup is logarithmic?

    In B-trees a single lookup is logarithmic, but a range query shoots up to O(N), since there is no linked-list structure connecting the leaf nodes. With that said, why does Postgres use B-trees? Does it perform well for range queries (it does, but how does it handle them internally with B-trees)?

  • The above question is from a Postgres point of view; from a MySQL perspective, why does it use more storage than Postgres, and what is the performance benefit of using B+ trees in practice?

I could have missed/misunderstood many things, so please feel free to correct my understanding here.
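
To make the range-query question concrete, here is a toy sketch in plain Python (invented page size and keys - not how either engine actually lays out its trees) of a B+-tree-style scan: one descent to the first leaf, then a walk along the sibling links.

```python
import bisect

# Toy model of B+-tree leaves: small sorted "pages" of keys, linked
# left-to-right. Page size and key range are made up for illustration.
PAGE_SIZE = 4
keys = list(range(100))
leaves = [keys[i:i + PAGE_SIZE] for i in range(0, len(keys), PAGE_SIZE)]
firsts = [leaf[0] for leaf in leaves]  # stand-in for the inner nodes

def bplus_range(lo, hi):
    """One root-to-leaf descent, then follow sibling links.
    Returns (matching_keys, leaf_pages_touched)."""
    page = max(bisect.bisect_right(firsts, lo) - 1, 0)  # the "descent"
    out, touched = [], 0
    while page < len(leaves):
        touched += 1
        for k in leaves[page]:
            if k > hi:
                return out, touched
            if k >= lo:
                out.append(k)
        page += 1  # follow the leaf's next-sibling pointer
    return out, touched

result, pages = bplus_range(10, 29)  # 20 keys found, 6 leaf pages touched
```

Without the sibling links, reaching each next leaf would mean another descent from the root - the O(N)-flavoured worry above; with them, the scan costs one descent plus O(m) leaf walking.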

Edit: answering Rick James's questions

  • I am using InnoDB engine for MySQL
  • I built the index after populating the data - same way I did in postgres
  • The indexes are not UNIQUE indexes, just normal indexes
  • There were no random inserts; I used CSV loading in both Postgres and MySQL, and only after that did I create the indexes.
  • Postgres block size for both indexes and data is 8KB, I am not sure for MySQL, but I didn't change it, so it must be the defaults.
  • I would not call the rows big: around 4 text fields about 200 characters long, 4 decimal fields, and 2 bigint fields (19 digits long).
  • The PK is a bigint column with 19 digits; I am not sure if this is bulky. On what scale should we differentiate bulky vs non-bulky?
  • The MySQL table size was 600 MB and Postgres was around 310 MB, both including indexes - so Postgres is roughly 48% smaller, if my math is right. But is there a way to measure the index size alone in MySQL, excluding the table size? That could give better numbers, I guess.
  • Machine info: I have enough RAM - 256 GB - to fit all the tables and indexes together, but I don't think we need to go down that route at all; I didn't see any noticeable performance difference between the two.

Additional Questions

  • When does fragmentation occur? Is there a way to defragment, so that we can say that beyond this point there is nothing more to be done? I am using CentOS, by the way.
  • Is there a way to measure the index size alone in MySQL, ignoring the primary key (as it is clustered), so that we can actually see which type occupies more space, if any?
Tijuana answered 8/10, 2015 at 7:21 Comment(8)
It's not often I see a MySQL-vs-PostgreSQL question that's relevant, specific, and not mainly a matter of opinion. I'm interested in responses myself, though I think you'll have trouble finding deep expertise in the innards of both DBMSs. – Quotient
Related: B trees, B+ trees difference. – Alenealenson
@CraigRinger : Let's hope that we find answers :) – Tijuana
Something's wrong here. AFAIK PostgreSQL uses B+ trees. At least you may be sure that index leaf nodes are connected with bidirectional links, and there is no performance problem with range queries. – Yangyangtze
@ЕгорРогов : Your comment made me curious, so I went and took a look at the source code, particularly a README file which talks about forward and backward pointers to the btree pages. This enables range queries to use the indexes, I suppose. I would appreciate it if a contributor confirmed this. – Tijuana
How did you measure the index size in MySQL? Everything I have found is grossly pessimistic, apparently due to including pre-allocated chunks ("extents") that are not yet in use. – Seymourseys
@RickJames : I used the technique mentioned here: first I measured with indexes, then dropped them and measured again. – Tijuana
That technique may have rebuilt (defragmented) the table and other indexes, too, so the measurement may be flawed. (I do not have a "good" way to measure index size, especially if there are multiple indexes.) – Seymourseys

First, and foremost, if you are not using InnoDB, close this question, rebuild with InnoDB, then see if you need to re-open the question. MyISAM is not preferred and should not be discussed.

How did you build the indexes in MySQL? There are several ways to explicitly or implicitly build indexes; they lead to better or worse packing.

MySQL: Data and Indexes are stored in B+Trees composed of 16KB blocks.

MySQL: UNIQUE indexes (including the PRIMARY KEY) must be updated as you insert rows. So, a UNIQUE index will necessarily have a lot of block splits, etc.

MySQL: The PRIMARY KEY is clustered with the data, so it effectively takes zero space. If you load the data in PK order, then the block fragmentation is minimal.

Non-UNIQUE secondary keys may be built on the fly, which leads to some fragmentation. Or they can be constructed after the table is loaded; this leads to denser packing.

Secondary keys (UNIQUE or not) implicitly include the PRIMARY KEY in them. If the PK is "large" then the secondary keys are bulky. What is your PK? Is this the 'answer'?
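
As a back-of-envelope sketch of that effect (byte counts, per-entry overhead, and fill factor are illustrative assumptions, not InnoDB's real record layout): since every secondary entry carries the PK, a wide PK inflates every secondary index.

```python
# Sketch: how PK width inflates an InnoDB-style secondary index, since
# each entry implicitly stores the PK. Byte counts are assumptions.
def secondary_index_bytes(rows, key_bytes, pk_bytes,
                          per_entry_overhead=13, fill_factor=0.69):
    entry = key_bytes + pk_bytes + per_entry_overhead
    return rows * entry / fill_factor

rows = 100_000_000
slim = secondary_index_bytes(rows, key_bytes=8, pk_bytes=8)   # BIGINT PK
wide = secondary_index_bytes(rows, key_bytes=8, pk_bytes=40)  # fat PK
ratio = wide / slim  # the same index, roughly twice the size
```

Under these made-up numbers, the 40-byte PK makes the identical secondary index about 2.1x larger - which is why PK size is one of the candidate "answers" here.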

In theory, totally random inserts into a BTree lead to the blocks being about 69% full. Maybe this is the answer. Is MySQL about 45% bigger (1/0.69 ≈ 1.45)?

With 100M rows, probably many operations are I/O-bound because you don't have enough RAM to cache all the data and/or index blocks needed. If everything is cached, then B-Tree versus B+Tree won't make much difference. Let's analyze what needs to happen for a range query when things are not fully cached.

With either type of Tree, the operation starts with a drill-down in the Tree. For MySQL, 100M rows will have a B+Tree of about 4 levels deep. The 3 non-leaf nodes (again 16KB blocks) will be cached (if they weren't already) and be reused. Even for Postgres, this caching probably occurs. (I don't know Postgres.) Then the range scan starts. With MySQL it walks through the rest of the block. (Rule of Thumb: 100 rows in a block.) Ditto for Postgres?

At the end of the block, something different has to happen. For MySQL, there is a link to the next block. That block (with 100 more rows) is fetched from disk (if not cached). For a B-Tree, the non-leaf nodes need to be traversed again. 2, probably 3, levels are still cached, so I would expect another non-leaf node to need fetching from disk only once per 10K rows (10K = 100×100). That is, Postgres might hit the disk about 1% more often than MySQL, even on a "cold" system.

On the other hand, if the rows are so fat that only 1 or 2 can fit in a 16K block, the "100" I kept using is more like "2", and the 1% becomes maybe 50%. That is, if you have big rows this could be the "answer". Is it?
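
The arithmetic in the last two paragraphs can be sketched numerically (assumptions: upper tree levels stay cached, one leaf fetch per block of rows, and a B-Tree re-descent misses cache about once per rows-per-block² rows):

```python
# Sketch of the cold-cache fetch arithmetic above, not a benchmark.
def extra_fetch_fraction(rows_per_block):
    # B+ tree range scan: one leaf fetch per rows_per_block rows,
    # via the sibling links.
    bplus = 1 / rows_per_block
    # B-tree: same leaf fetches, plus an uncached non-leaf node about
    # once per rows_per_block**2 rows (100 * 100 = 10K in the text).
    btree = 1 / rows_per_block + 1 / rows_per_block ** 2
    return btree / bplus - 1  # relative extra disk hits

# 100 rows/block -> ~1% more fetches; 2 rows/block (fat rows) -> ~50%.
```

This reproduces both figures above: with 100 rows per block the B-Tree costs about 1% more fetches, and with very fat rows (2 per block) the penalty grows to about 50%.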

What is the block size in Postgres? Note that many of the computations above depend on the relative size between the block and the data. Could this be an answer?

Conclusion: I've given you 4 possible answers. Would you like to augment the question to confirm or refute that each of these apply? (Existence of secondary indexes, large PK, inefficient building of secondary indexes, large rows, block size, ...)

Addenda about PRIMARY KEY

For InnoDB, another thing to note... It is best to have a PRIMARY KEY in the definition of the table before loading the data. It is also best to sort the data in PK order before LOAD DATA. Without specifying any PRIMARY KEY or UNIQUE key, InnoDB builds a hidden 6-byte PK; this is usually sub-optimal.

Seymourseys answered 31/10, 2015 at 1:18 Comment(4)
I have updated my question with answers to your questions. – Tijuana
I added a note about PRIMARY KEY. It sounds like you don't have a PK? If not, 6 bytes is added to each secondary index. If your PK could be shorter, this is another case of extra space. – Seymourseys
I am new to MySQL, so I've got to re-learn things here as well. But there is one point that seems to be wrong: you said the primary key takes zero space. Well, I created a brand new table, 61.5 MB in size, and after I created the primary key it bumped up to 94 MB. What's wrong here? And this size is even greater than a normal index on that column. – Tijuana
In "adding" your PK, it removed the 6-byte PK and declared your column(s) to be the PK, which saved 6 B/row. But the table was (I assume) not in the order of the new PK, so it needed to be shuffled into PK order. To do this, it simply inserted the rows wherever they were needed. This led to block splits, etc., hence it is not packed as well. 69% is about what you can expect from random inserts into a B-Tree; that's about what you got. – Seymourseys

In databases you often have queries that deliver ranges of data, such as IDs from 100 to 200.
In this case:

  • A B-Tree needs to follow the path from the root to the leaves for every single entry to get the data pointer.
  • A B+-Tree can 'walk' through the leaves and has to follow the path down from the root only for the first entry (i.e. for ID 100).

This is because a B+-Tree stores the data (or data pointers) only in the leaves, and the leaves are linked so that you can perform a rapid in-order traversal.

[image: B+-Tree diagram]

Another point is:
In B+ Trees the inner nodes store only pointers to other nodes, without any data pointers, so there is more room for pointers: you need fewer I/O operations and can store more node pointers per memory page.

So for range queries, B+-Trees are the optimal data structure. For single lookups, B-Trees might be better (because of the depth/size of the tree), since the data pointers are also located inside the tree.
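
The fan-out point can be put in rough numbers (page and entry sizes are illustrative assumptions, not any engine's real layout):

```python
import math

# Rough fan-out estimate for a 16KB page. Inner nodes that hold only
# key + child-pointer pairs pack more entries than nodes that also
# carry a data pointer per key. Byte counts are assumptions.
PAGE = 16 * 1024
KEY, CHILD_PTR, DATA_PTR = 8, 8, 8

fanout_bplus = PAGE // (KEY + CHILD_PTR)             # 1024 entries/page
fanout_btree = PAGE // (KEY + CHILD_PTR + DATA_PTR)  # 682 entries/page

def levels(n, fanout):
    """Tree depth needed to address n entries at the given fan-out."""
    return math.ceil(math.log(n, fanout))

# Either way, 100M keys fit in a tree only a few levels deep, but the
# B+-style inner node addresses ~50% more children per page read.
```

Under these assumptions each B+-style inner page covers about 50% more children, which is exactly the "fewer I/O operations" argument made above.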

Photophobia answered 23/10, 2015 at 11:18 Comment(7)
Yes, I am aware of this. But my questions were: 1) If Postgres is using B-Trees, how does it handle range queries? 2) As a counterargument to the previous point, if Postgres is indeed using some form of modified B-Tree which can do range queries, then why is the MySQL index size bigger in comparison with Postgres? – Tijuana
1) Postgres uses B-Trees and (I suggest) performs range queries with an in-order traversal, because this is the easiest/fastest way to do so. 2) Because B+-Trees store data pointers only in the leaves, and every (upper) node has duplicated keys (see image above). – Photophobia
I don't think that's true, since a Postgres range query uses an index-only scan if the column is indexed, and in-order traversal is O(n). So if it were as you say, Postgres should choose a sequential scan, since that is also O(n) but of course uses more memory while using less disk I/O by avoiding index hits. And please look at my second question closely: if Postgres uses B-trees with a range-query optimization, then why does MySQL use indexes that take more space - can't it just use Postgres-style trees for indexing? – Tijuana
I'm not sure what you are saying/meaning. Sometimes B-Trees are faster than B+-Trees, especially when you search for single entries, because you don't always have to walk to the leaf, since each node has data pointers. So if you hit the key at the root in a B-Tree you're done; in a B+-Tree you have to walk to the leaf. – Photophobia
Yes, you have to walk to the leaf, but that is O(log n), which is really fast, and it is just one more level down to the data nodes. And I don't believe that Postgres just uses plain B-trees, since range queries with B-trees are very costly - O(n) in-order traversal - so something must be happening behind the scenes, hence my question. – Tijuana
I looked into the source code of Postgres, and under 'backend/access/nbtree/nbtsearch.c' you'll find the implementation of the search function; it looks like a standard implementation. See the Postgres source. – Photophobia
Let us continue this discussion in chat. – Tijuana

MySQL and PostgreSQL aren't really comparable here. InnoDB uses an index to store the table data (and secondary indexes just point at the pkey). This is great for single-row pkey lookups, and with B+ trees it does OK with range queries on the pkey field, but it has performance drawbacks for everything else.

PostgreSQL uses heap tables and keeps indexes separate. It supports a number of different indexing algorithms. Depending on your range query, a btree index may not help you, and you may need a GiST index instead. Similarly, GIN indexes work well with membership lookups (for arrays, FTS, etc.).

I think btree is used because it excels at the simple use case: which rows contain the following data? This also becomes a building block of GIN, for example.

But it isn't true that PostgreSQL cannot use B+ trees. GiST is built on B+ Tree indexes in a generalized format. So PostgreSQL gives you the option to use B+ trees where they come in handy.

Linkous answered 1/11, 2015 at 17:2 Comment(0)
