Refreshing materialized view CONCURRENTLY causes table bloat

In PostgreSQL 9.5 I decided to create a materialized view "effects" and scheduled an hourly concurrent refresh, since I wanted it to be always available:

REFRESH MATERIALIZED VIEW CONCURRENTLY effects;

In the beginning everything worked well: the materialized view was refreshing and disk space usage remained more or less constant.


The Issue

After some time, though, disk usage started to grow linearly.

I concluded that the materialized view was the reason for this growth and ran the query from this answer, which produced the following result:

               what                |  bytes/ct   | bytes_pretty | bytes_per_row
-----------------------------------+-------------+--------------+---------------
 core_relation_size                | 32224567296 | 30 GB        |         21140
 visibility_map                    |      991232 | 968 kB       |             0
 free_space_map                    |     7938048 | 7752 kB      |             5
 table_size_incl_toast             | 32233504768 | 30 GB        |         21146
 indexes_size                      | 22975922176 | 21 GB        |         15073
 total_size_incl_toast_and_indexes | 55209426944 | 51 GB        |         36220
 live_rows_in_text_representation  |   316152215 | 302 MB       |           207
 ------------------------------    |             |              |
 row_count                         |     1524278 |              |
 live_tuples                       |      676439 |              |
 dead_tuples                       |     1524208 |              |
(11 rows)
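
For reference, a rough equivalent of that breakdown can be assembled from the pg_*_size functions and pg_stat_user_tables. This is only a minimal sketch, not the exact query from the linked answer:

-- Minimal sketch: approximate size breakdown for one relation (here the "effects" view)
SELECT pg_relation_size('effects')        AS core_relation_size,
       pg_table_size('effects')           AS table_size_incl_toast,
       pg_indexes_size('effects')         AS indexes_size,
       pg_total_relation_size('effects')  AS total_size_incl_toast_and_indexes,
       n_live_tup                         AS live_tuples,
       n_dead_tup                         AS dead_tuples
FROM   pg_stat_user_tables
WHERE  relname = 'effects';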

Then I found that the table had last been autovacuumed two days earlier, by running:

SELECT relname, n_dead_tup, last_vacuum, last_autovacuum FROM pg_stat_user_tables ORDER BY n_dead_tup desc;

I decided to run VACUUM (VERBOSE) effects; manually. It ran for about half an hour and produced the following output:

vacuum (VERBOSE) effects;
INFO:  vacuuming "public.effects"
INFO:  scanned index "effects_idx" to remove 129523454 row versions
DETAIL:  CPU 12.16s/55.76u sec elapsed 119.87 sec

INFO:  scanned index "effects_campaign_created_idx" to remove 129523454 row versions
DETAIL:  CPU 19.11s/154.59u sec elapsed 337.91 sec

INFO:  scanned index "effects_campaign_name_idx" to remove 129523454 row versions
DETAIL:  CPU 28.51s/151.16u sec elapsed 315.51 sec

INFO:  scanned index "effects_campaign_event_type_idx" to remove 129523454 row versions
DETAIL:  CPU 38.60s/373.59u sec elapsed 601.73 sec

INFO:  "effects": removed 129523454 row versions in 3865537 pages
DETAIL:  CPU 59.02s/36.48u sec elapsed 326.43 sec

INFO:  index "effects_idx" now contains 1524208 row versions in 472258 pages
DETAIL:  113679000 index row versions were removed.
463896 index pages have been deleted, 60386 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.

INFO:  index "effects_campaign_created_idx" now contains 1524208 row versions in 664910 pages
DETAIL:  121637488 index row versions were removed.
41014 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

INFO:  index "effects_campaign_name_idx" now contains 1524208 row versions in 711391 pages
DETAIL:  125650677 index row versions were removed.
696221 index pages have been deleted, 28150 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

INFO:  index "effects_campaign_event_type_idx" now contains 1524208 row versions in 956018 pages
DETAIL:  127659042 index row versions were removed.
934288 index pages have been deleted, 32105 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

INFO:  "effects": found 0 removable, 493 nonremovable row versions in 3880239 out of 3933663 pages
DETAIL:  0 dead row versions cannot be removed yet.

There were 666922 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 180.49s/788.60u sec elapsed 1799.42 sec.

INFO:  vacuuming "pg_toast.pg_toast_1371723"
INFO:  index "pg_toast_1371723_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

INFO:  "pg_toast_1371723": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

VACUUM

At this point I thought the problem was resolved and started wondering what could have interfered with autovacuum. To be sure, I ran the space-usage query again and, to my surprise, the result hadn't changed.

Only after I ran REFRESH MATERIALIZED VIEW effects; (without CONCURRENTLY) was the space released. Now the output of the size query was:

               what                | bytes/ct  | bytes_pretty | bytes_per_row
-----------------------------------+-----------+--------------+---------------
 core_relation_size                | 374005760 | 357 MB       |           245
 visibility_map                    |         0 | 0 bytes      |             0
 free_space_map                    |         0 | 0 bytes      |             0
 table_size_incl_toast             | 374013952 | 357 MB       |           245
 indexes_size                      | 213843968 | 204 MB       |           140
 total_size_incl_toast_and_indexes | 587857920 | 561 MB       |           385
 live_rows_in_text_representation  | 316175512 | 302 MB       |           207
 ------------------------------    |           |              |
 row_count                         |   1524385 |              |
 live_tuples                       |    676439 |              |
 dead_tuples                       |   1524208 |              |
(11 rows)

And everything went back to normal...


Questions

The problem is solved, but there is still a fair amount of confusion:

  1. Could anyone please explain what was the issue I experienced?
  2. How could I avoid this in the future?
Eccentric asked 20/9, 2018 at 9:07

First, let's explain the bloat

REFRESH MATERIALIZED VIEW CONCURRENTLY is implemented in src/backend/commands/matview.c, and the comment there is enlightening:

/*
 * refresh_by_match_merge
 *
 * Refresh a materialized view with transactional semantics, while allowing
 * concurrent reads.
 *
 * This is called after a new version of the data has been created in a
 * temporary table.  It performs a full outer join against the old version of
 * the data, producing "diff" results.  This join cannot work if there are any
 * duplicated rows in either the old or new versions, in the sense that every
 * column would compare as equal between the two rows.  It does work correctly
 * in the face of rows which have at least one NULL value, with all non-NULL
 * columns equal.  The behavior of NULLs on equality tests and on UNIQUE
 * indexes turns out to be quite convenient here; the tests we need to make
 * are consistent with default behavior.  If there is at least one UNIQUE
 * index on the materialized view, we have exactly the guarantee we need.
 *
 * The temporary table used to hold the diff results contains just the TID of
 * the old record (if matched) and the ROW from the new table as a single
 * column of complex record type (if matched).
 *
 * Once we have the diff table, we perform set-based DELETE and INSERT
 * operations against the materialized view, and discard both temporary
 * tables.
 *
 * Everything from the generation of the new data to applying the differences
 * takes place under cover of an ExclusiveLock, since it seems as though we
 * would want to prohibit not only concurrent REFRESH operations, but also
 * incremental maintenance.  It also doesn't seem reasonable or safe to allow
 * SELECT FOR UPDATE or SELECT FOR SHARE on rows being updated or deleted by
 * this command.
 */

So the materialized view is refreshed by deleting rows and inserting new ones from a temporary table. This can of course lead to dead tuples and table bloat, which is confirmed by your VACUUM (VERBOSE) output.

In a way, that's the price you pay for CONCURRENTLY.

Second, let's debunk the myth that VACUUM cannot remove the dead tuples

VACUUM will remove the dead rows, but it cannot reduce the bloat (that can be done with VACUUM (FULL), but that would lock the view just like REFRESH MATERIALIZED VIEW without CONCURRENTLY).
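
To make that trade-off concrete, here is a sketch (not from the original answer; the autovacuum settings at the end are only an optional example to make autovacuum revisit the view sooner after each refresh):

-- Reclaims dead tuples so the space can be reused; readers are not blocked:
VACUUM effects;

-- Rewrites the view and returns the space to the operating system, but takes an
-- ACCESS EXCLUSIVE lock, so the view is unavailable while it runs (just like a
-- non-concurrent REFRESH MATERIALIZED VIEW):
VACUUM (FULL) effects;

-- Optional: per-relation autovacuum settings (example values only), so dead tuples
-- from each concurrent refresh are cleaned up promptly and the bloat stays bounded:
ALTER MATERIALIZED VIEW effects
    SET (autovacuum_vacuum_scale_factor = 0.05, autovacuum_vacuum_threshold = 5000);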

I suspect that the query you use to determine the number of dead tuples is just an estimate that gets the number of dead tuples wrong.

An example that demonstrates all that

CREATE TABLE tab AS SELECT id, 'row ' || id AS val FROM generate_series(1, 100000) AS id;

-- make sure autovacuum doesn't spoil our demonstration
CREATE MATERIALIZED VIEW tab_v WITH (autovacuum_enabled = off)
AS SELECT * FROM tab;

-- required for CONCURRENTLY
CREATE UNIQUE INDEX ON tab_v (id);

Use the pgstattuple extension to accurately measure table bloat:

CREATE EXTENSION pgstattuple;

SELECT * FROM pgstattuple('tab_v');
-[ RECORD 1 ]------+--------
table_len          | 4431872
tuple_count        | 100000
tuple_len          | 3788895
tuple_percent      | 85.49
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 16724
free_percent       | 0.38

Now let's delete some rows in the table, refresh and measure again:

DELETE FROM tab WHERE id BETWEEN 40001 AND 80000;

REFRESH MATERIALIZED VIEW CONCURRENTLY tab_v;

SELECT * FROM pgstattuple('tab_v');
-[ RECORD 1 ]------+--------
table_len          | 4431872
tuple_count        | 60000
tuple_len          | 2268895
tuple_percent      | 51.19
dead_tuple_count   | 40000
dead_tuple_len     | 1520000
dead_tuple_percent | 34.3
free_space         | 16724
free_percent       | 0.38

Lots of dead tuples. VACUUM gets rid of these:

VACUUM tab_v;

SELECT * FROM pgstattuple('tab_v');
-[ RECORD 1 ]------+--------
table_len          | 4431872
tuple_count        | 60000
tuple_len          | 2268895
tuple_percent      | 51.19
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 1616724
free_percent       | 36.48

The dead tuples are gone, but now there is a lot of empty space.
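
That matches what the question observed: to actually shrink the view again you have to rewrite it, which locks out readers. Continuing the demo (a sketch, output omitted):

-- A plain (non-concurrent) REFRESH rewrites the view into a fresh file and
-- releases the empty space, but holds an ACCESS EXCLUSIVE lock while it runs:
REFRESH MATERIALIZED VIEW tab_v;

-- Measuring again should show a much smaller table_len and free_percent:
SELECT table_len, free_percent FROM pgstattuple('tab_v');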

Travis answered 20/9, 2018 at 18:24
Thank you for the elaborate answer! "I suspect that the query you use to determine the number of dead tuples is just an estimate that gets the number of dead tuples wrong." So you're saying that my VACUUM (VERBOSE) effects; does in fact remove the dead tuples but doesn't reduce the space the table consumes? "VACUUM will remove the dead rows, but it cannot reduce the bloat" - Is there a way to reduce bloat without locking? Choosing between bloat and locking renders the materialized view rather useless in my case. - Eccentric
Additionally, it's not quite clear why the bloat started happening 'suddenly'. Up to some point the view was refreshing concurrently and the space of deleted rows was released, and only then did it start growing and growing. - Eccentric
You understood me correctly. There are tools like pg_squeeze and pg_repack to reorganize tables without VACUUM (FULL). Even with some bloat, the materialized view should not grow indefinitely, because the empty space after a (normal) VACUUM will be reused. - Travis
From what I saw, in my case it did grow linearly and indefinitely: over 100+ GB instead of 1 GB. Perhaps this is a special case and it is good to be aware of the possibility; perhaps some other condition triggered it. I decided to abandon the materialized view approach because of this, and because the table I'm scanning every time is only ever inserted to - so I can refresh by adding from it incrementally (which saves a lot of workload as well as avoiding the aforementioned issue). - Eccentric
Maybe there is something in your database that blocks autovacuum - long transactions, prepared transactions or replication slots. - Travis
Hmmm, I thought about this as well, but figured that if this were the case I'd have seen bloat in more tables, not only in the concurrently refreshed materialized view. Though I agree it would be most noticeable with a concurrent refresh... - Eccentric
It might also be an issue with replication slots. Maybe something at my DB provider (Aiven) changed, since this issue started to happen seemingly randomly. - Eccentric
Maybe my blog post can help. - Travis
Actually, that was the place where I started. And yes, I did rule out the possibility of a long-running transaction blocking the vacuum. Great post, btw. - Eccentric

I'm adding to @Laurenz Albe's full answer above. There is another possibility for the bloat. Consider the following scenario:

You have a view that mostly does not change between refreshes (1,000,000 records, with only 100 records changing per refresh), and yet you still get 500,000 dead tuples. The reason for that can be NULL in an indexed column.

As described in the answer above, when a materialized view is refreshed concurrently, a new copy of the data is created and compared against the old version. The comparison uses the mandatory unique index, but what about NULLs? NULLs are never equal to each other in SQL. So if your index key allows NULLs, rows that contain NULLs in the indexed columns can never be matched by the diff and will always be deleted and re-inserted, even if they are unchanged, leaving dead tuples behind after every refresh.

In order to fix this and remove the bloat, add an additional column that coalesces the nullable column to some never-used value (-1, to_timestamp(0), ...) and use that column in the unique index instead, as in the sketch below.
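
A minimal sketch of that workaround; the table and column names are hypothetical, since the original view definition isn't shown:

-- Hypothetical source table "events": campaign_id is nullable and is part of the view's key.
CREATE MATERIALIZED VIEW effects AS
SELECT id,
       campaign_id,
       coalesce(campaign_id, -1) AS campaign_id_nn   -- sentinel value never used in real data
FROM events;

-- Build the required unique index on the non-nullable surrogate column, so rows with a
-- NULL campaign_id can be matched during the concurrent refresh's diff instead of being
-- deleted and re-inserted on every refresh:
CREATE UNIQUE INDEX ON effects (id, campaign_id_nn);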

Lindyline answered 3/9, 2020 at 10:46
I think that was actually the cause of the original issue. - Eccentric
"So if you primary key allow nulls ..." PK columns are always NOT NULL. See: https://mcmap.net/q/338712/-why-can-i-create-a-table-with-primary-key-on-a-nullable-column - Unipod
The unique index for a materialized view doesn't have to be a primary key, and, although unique, it can contain NULLs. Fixed the answer. - Lindyline
