SQL explain plan: what is Materialize?
Asked Answered
B

5

62

I asked PostgreSQL to explain my query. Part of the explanation was:

table_name --> Materialize

What does materialize do? I'm joining two tables, not views or anything like that.

Bencher answered 12/6, 2010 at 19:25 Comment(0)
I
83

A materialize node means the output of whatever is below it in the tree (which can be a scan, or a full set of joins or something like that) is materalized into memory before the upper node is executed. This is usually done when the outer node needs a source that it can re-scan for some reason or other.

So in your case, the planner is determining that the result of a scan on one of your tables will fit in memory, and it till make it possible to choose an upper join operation that requires rescans while still being cheaper.

Impudence answered 13/6, 2010 at 10:14 Comment(4)
I like this answer better - there's more reason to materializing than just not using an index.Dilute
An easy way to explain it is that it stores the intermediate result.Anselma
You can affect the threshold at which this behavior is triggered using the various buffer config vars, but also "set from_collapse_limit=1; set join_collapse_limit=1" Very usefully for testing or for runtime query plan optimization.Extrasystole
Do you have any link to documentation? I couldn't find any.Friction
M
10

It means that it can't use any index (or similar method) to make the join efficient, so as a last resort is materializes the result from one of the tables to have a smaller set to work with when joining against the other table.

Mindymine answered 12/6, 2010 at 19:51 Comment(6)
That's only partially correct. In many (most) cases it will not put a materialize node in just because it can't use an index - it will just work off a sequential scan on each table. Materializing the output of a seqscan doesn't make a huge difference - I find it more common to show up higher up in the tree than just above a scan node.Impudence
@Magnus: Yes, if there is any other method to join the tables, it will use it, but in this case there obviously isn't, and it will put in the materialize as the last option.Mindymine
No, it won't do that as the last option. The last option is to just do the sequential scan and not materialize at all. It does it because it can drive a smarter plan that way, and doesn't have to fall back to the last-resort.Impudence
@Magnus: The materialize is always inserted as the last option: "If the inner input set of a Merge Join operator is not produced by a Seq Scan, an Index Scan, a Sort, or a Materialize operator, the planner/optimizer will insert a Materialize operator into the plan." iphelp.ru/faq/15/ch04lev1sec3.htmlMindymine
@Guffa: That's only true if the join is a mergejoin. Which is not likely to be chosen as a last-resort plan. It can happen as last-resort, but not with a seq-scan (as we have here) - which the text you're quoting actually tells you. BTW, The version of the book you're referring to is for ancient versions of PostgreSQL - I'd check up at least twice on most of the things you find in there.Impudence
@Guffa: My understanding is that a Materialise node is inserted only if (a) there is a cheap subplan that would need to scan its input rows more than once (such as the inner side of a plain ol' nested loop join) and (b) it would be expensive to regenerate those rows (such as if the input comes from a sequential scan of a large table but is expected to produce a small number of rows).Muss
D
7

In merge join and nested loop join, the database will "rescan" the inner loop. Basically like:

for each row in outer table:
    for each row in inner table:
        # do something

The planner will materializes the inner loop table which means load the whole table in an in-memory buffer to avoid the expensive disk IO cost.

A useful link.

Dipietro answered 10/3, 2019 at 22:51 Comment(0)
M
2

From a more empirical point of view, you can do your EXPLAIN ANALYZE and keep those results, then change the flag:

set enable_material=off;

Go and run the same EXPLAIN ANALYZE again to compare the results line by line and you will see exactly what changes, whether the query time is better or worse, etc. Tinker around with the long list of query config flags and observe the effects.

https://www.postgresql.org/docs/current/runtime-config-query.html

Michi answered 11/12, 2019 at 22:42 Comment(0)
B
1

We can say the Materialize command will create a View of a table(just like a virtual table OR a SnapShot of table in Memory)

It is used to enhance visibility of the data by presenting it in a more informational context and to control access to the data by hiding critical or sensitive data from users who don't need to see it/or to whom we don't what to show it.

The main benefit of using Materialize/Snap-Shot is to reduce the cost of query on that table afterwards. In execution plan the same states that how much benefit we may fetch if we use materialize compare to when it is not used!

Explain to check execution plan

->  Materialize  (cost=0.29..8.51 rows=10 width=244)
         ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.29..8.46 rows=10 width=244)
               Index Cond: (unique2 < 10)

I have ignored this --> as there is not such operator in my knowledge, plus the -- will comment the expression afterwards, I have assumed you meant this -> only.

I hope this helps..

Bulletproof answered 12/7, 2013 at 12:10 Comment(1)
Oh! , it makes sense now, :) I am sorry for confusion LinkBulletproof

© 2022 - 2024 — McMap. All rights reserved.