Check last refreshed time for materialized view
Asked Answered
F

6

37

I've got a materialized view called price_changes used for some reporting. I've also got a cron job refreshing the materialized view with refresh materialized view price_changes. Everything is working great.

I'd like to give users looking at the report a message "Data is fresh as of X". I could store it somewhere when cron runs, but does postgres already store this metadata somewhere?

Foilsman answered 16/1, 2014 at 15:35 Comment(1)
More detail on this: dba.stackexchange.com/q/58214/104401Install
H
21

I don't think there is anything built in the system that provides this as of 9.3.4. When I need to provide the date of last refresh I add a column called 'last_refresh' to the select query in the materialized view since data in the materialized view won't change until it is refreshed.

I also prefer this for security reasons as you may not want to give the sql user access to the system tables, if the information is being stored there.

Depending if you need the time, you can use either:

  1. CURRENT_DATE
  2. now()

Just date:

CREATE MATERIALIZED VIEW mv_address AS 
SELECT *, CURRENT_DATE AS last_refresh FROM address;

With date and time:

CREATE MATERIALIZED VIEW mv_address AS 
SELECT *, now() AS last_refresh FROM address;

Update 2017-02-17:

PostgreSQL version 9.4+ now includes CONCURRENTLY option. If you use REFRESH MATERIALIZED VIEW CONCURRENTLY option be aware of what @Smudge indicated in the comments. This would really only be an issue for large and frequently updated data sets. If your data set is small or infrequently updated then you should be fine.

Heredes answered 3/4, 2014 at 7:12 Comment(4)
Heads up: This approach will cause REFRESH MATERIALIZED VIEW CONCURRENTLY to update every row, every time. For smaller datasets with infrequent refreshes this won't be a huge issue, but for lots of data and/or frequent updates, the rate of DELETE and INSERT might overtake the autovacuum daemon's ability to VACUUM the table, causing basic query performance to nosedive.Puerile
Speaking from real world experience - the issue Smudge is referring to very easily snowballs until your database is spending 100% of its time dealing with the mountain of dead tuples and cannot recover until you DROP and re-CREATE the materialized view. Use this solution with extreme caution.Sammiesammons
@Puerile Could you explain why this doesn't work with REFRESH MATERIALIZED VIEW CONCURRENTLY? Will CURRENT_DATE cause problems as much as now()?Aeciospore
@Aeciospore The CONCURRENTLY option creates a temporary table for the refreshed view data, which it then compares with the real materialized view in order to produce UPDATE and INSERT statements to efficiently update the view in place, but if every row has a new timestamp, then every row gets updated. Combine this with the fact that UPDATEs actually write new versions of rows and leave dead tuples to be vacuumed, this means that your entire table needs to be vacuumed on every refresh.Puerile
A
18
WITH
        pgdata AS (
                SELECT
                        setting AS path
                FROM
                        pg_settings
                WHERE
                        name = 'data_directory'
        ),
        path AS (
                SELECT
                        CASE
                                WHEN pgdata.separator = '/' THEN '/'    -- UNIX
                                ELSE '\'                                -- WINDOWS
                        END AS separator
                FROM 
                        (SELECT SUBSTR(path, 1, 1) AS separator FROM pgdata) AS pgdata
        )
SELECT
        ns.nspname||'.'||c.relname AS mview,
        (pg_stat_file(pgdata.path||path.separator||pg_relation_filepath(ns.nspname||'.'||c.relname))).modification AS refresh
FROM
        pgdata,
        path,
        pg_class c
JOIN
        pg_namespace ns ON c.relnamespace=ns.oid
WHERE
        c.relkind='m'
;
Aragonite answered 4/4, 2022 at 20:13 Comment(1)
Thanks, works great. I gave it a vote. Do you know if the timestamp gets updated as well when you refresh concurrently. Curiously it seems not on my server.Nosing
S
12

Since materialized views are segments of data stored on disk, they will have a corresponding file in the filesystem. When you call REFRESH MATERIALIZED VIEW, the data on disk gets re-created under a new filename. As such, you could look up the modification/creation timestamp for the view by cross-referencing with relfilenode in pg_class:

[user@server /]# psql -c "create materialized view myview as select aid from pgbench_accounts where aid < 100"
SELECT 99
[user@server /]# psql -c "select relfilenode from pg_class where relname = 'myview'"
 relfilenode 
-------------
       16445
(1 row)

[user@server /]# ls -l /var/lib/edb/as12/data/base/15369/16445
-rw------- 1 enterprisedb enterprisedb 8192 Jun 14 23:28 /var/lib/edb/as12/data/base/15369/16445
[user@server /]# date
Mon Jun 14 23:29:16 UTC 2021
[user@server /]# psql -c "refresh materialized view myview"
REFRESH MATERIALIZED VIEW
[user@server /]# psql -c "select relfilenode from pg_class where relname = 'myview'"
 relfilenode 
-------------
       16449
(1 row)

[user@server /]# ls -l /var/lib/edb/as12/data/base/15369/16449
-rw------- 1 enterprisedb enterprisedb 8192 Jun 14 23:29 /var/lib/edb/as12/data/base/15369/16449
[user@server /]# 
Sheers answered 14/6, 2021 at 23:32 Comment(3)
Wow, what an amazing answer 7 years later!Foilsman
Doesn't dba.stackexchange.com/a/58246/218196 show that this approach is at best very unreliable?Inimical
Good question! That post is regarding a table, not a materialized view. Tables are updated very frequently, and their timestamps vary depending on vacuum and extension activity. For materialized views, the files only change when refreshed. Any indexes on the materialized view are kept in other files, so vacuum activity on a materialized view's indexes will not alter the materialized view's relfilenode creation timestampSheers
Q
5

I think @richyen solution is the best if you have access to the server.

If you don't you have to resort to manually storing last refresh date as metadata somehow. Some options are:

  • As column as in @thames's response, being the main drawback the extra used storage for the column (repeated as many times as rows have the MV) and the refresh problems in big tables stated by @smudge.

  • In a custom table. The main drawback is that you have to ensure that all refreshed include the table update.

-- Create table
CREATE TABLE pg_matviews_last_refreshed (
  matviewowner NAME NOT NULL,
  matviewname NAME NOT NULL,
  schemaname NAME NOT NULL,
  last_refreshed TIMESTAMP NOT NULL,
  CONSTRAINT pk UNIQUE (matviewowner, matviewname, schemaname)
)

CREATE VIEW pg_matviews_extra AS
SELECT * 
FROM pg_matviews
NATURAL JOIN pg_matviews_last_refreshed

-- Set initial values
INSERT INTO pg_matviews_last_refreshed (
  matviewowner,
  matviewname,
  schemaname,
  last_refreshed
)
SELECT 
  matviewowner,
  matviewname,
  schemaname,
  CURRENT_TIMESTAMP
FROM pg_matviews;

-- Consult dates
SELECT * FROM pg_matviews_extra

-- Refresh 
BEGIN;
REFRESH MATERIALIZED VIEW CONCURRENTLY my_materialized_view;

-- Insert update date into last_refreshed table. Of course, if 
-- more complex permissions are into place, the query becomes more
-- complex to discover owner, schema, etc.
INSERT INTO pg_matviews_last_refreshed (matviewname, matviewowner, schemaname, last_refreshed)
SELECT matviewname, matviewowner, schemaname, CURRENT_TIMESTAMP
FROM pg_matviews
WHERE matviewname='my_materialized_view'
ON CONFLICT (matviewname, matviewowner, schemaname)
DO UPDATE SET last_refreshed=EXCLUDED.last_refreshed;
COMMIT;
  • Similar to having a dedicated table to store last refreshed date, you may store that info implicitly in the MV comment:
-- Refresh operation
BEGIN;
REFRESH MATERIALIZED VIEW CONCURRENTLY my_materialized_view;
-- This statement can be wrapped into a function to allow using CURRENT_TIMESTAMP
COMMENT ON MATERIALIZED VIEW my_materialized_view IS '{"last_refreshed": "2021-07-07T09:30:59Z"}'
COMMIT;


-- Retrieve refresh date
SELECT (pg_catalog.obj_description('my_materialized_view'::regclass, 'pg_class')::json->>'last_refreshed')::timestamp;
Quirites answered 7/7, 2021 at 9:39 Comment(1)
PD: I think is is possible to register triggers in materialized views. I wonder if it would be possible to register metadata update operations as triggers when refreshing the MVQuirites
N
4

Thank you @ajaest for your help. We had production environment with many instances. We had to prevent materialized vied, not to be refresh quicker then 50 sec. Its our solution. We used COMMENT ON MATERIALIZED VIEW

-- dodanie funkcji do odświeżania widoku
CREATE OR REPLACE FUNCTION public.refresh_mv_users()
    RETURNS timestamp AS $$
DECLARE
    last_refreshed  TIMESTAMP;
    next_refresh    TIMESTAMP;
    comment_sql     text;
BEGIN

    SELECT ((pg_catalog.obj_description('public.mv_users'::regclass, 'pg_class')::json->>'last_refreshed')::timestamp) into last_refreshed;
    SELECT (last_refreshed + (50 * interval '1 seconds')) into next_refresh;

    IF next_refresh < now() THEN
        REFRESH MATERIALIZED VIEW CONCURRENTLY public.mv_users;
        comment_sql := 'COMMENT ON MATERIALIZED VIEW public.mv_users is '|| quote_literal('{"last_refreshed": "' || now() || '"}');
        execute comment_sql;
        SELECT now() into last_refreshed;
    END IF;

    RETURN last_refreshed;
END;
$$  LANGUAGE plpgsql;
Normalcy answered 1/3, 2022 at 7:16 Comment(6)
I guess that the goal of having the last_refreshed timestamp is to know how updated is the data. Nevertheless, if you use CONCURRENTLY you will only have the timestamp the view started the update process in background right ? You need to remove CONCURRENLTY, i guess.Epididymis
We allow to refresh view only after 50 sec after REFRESH VIEW copleted. In postgres documentation I found : CONCURRENTLY option, PostgreSQL creates a temporary updated version of the materialized view, compares two versions, and performs INSERT and UPDATE only the differences. In this case CONCURRENTLY doesn't means many threads.Normalcy
When you say "We allow to refresh view only after 50 sec after REFRESH VIEW copleted" I guess you mean that you ensure in the software that calls of this procedure. The procedure it self doesn't wait if there is another refresh of the view running. In order to ensure in the SP what you want to do, you need to remove the CONCURRENTLY.Epididymis
Maybe you don't have problems because querying the table public.mv_users take less than 50 seconds. If that table starts to take 100 seconds to update, and you make a second call 60 seconds after the first one... then the second call will give you data outdated by 40 seconds. The CONCURRENTLY clause makes what you describe, but it start updating the view it in the background and return whatever it has in the moment of the query.Epididymis
Only removing the CONCURRENTLY waits until any ongoing refresh end, before giving you the result. And I guess that is you want, because the timestamp saved in the comment is supposed to give you the notion of when the data was last updated, not when the data started the update.Epididymis
Check the behavior of CONCURRENTLY in the docs: postgresql.org/docs/current/sql-refreshmaterializedview.html Refresh the materialized view without locking out concurrent selects on the materialized view. Without this option a refresh which affects a lot of rows will tend to use fewer resources and complete more quickly, but could block other connections which are trying to read from the materialized view. This option may be faster in cases where a small number of rows are affected.Epididymis
K
-2

From CLI

Let's start by declaring some variables for the materialized view we want to check, and the database it belongs to.

materialized_view_name='my_materialized_view'
database_name='my_database'

Next, we need to find the id of the materialized view. We can do this by querying the pg_class table.

In the followingw query, replace <materialized_view_name> with the name of your materialized view

id=$(psql -d $database_name -c "SELECT relfilenode FROM pg_class WHERE relname = '<materialized_view_name>';" -tA)

Now we can find the file path of the materialized view and, therefore, the last time it was updated.

file_path=$(find / -name "${id}" 2>/dev/null)
ls -l $file_path

Bonus: sometimes servers can have a different time zone to your local machine. You can get a reference to the server's time zone by running the following command.

date
Knitting answered 28/9, 2022 at 17:48 Comment(1)
downvoted for basically being a duplicate of https://mcmap.net/q/416012/-check-last-refreshed-time-for-materialized-viewSimple

© 2022 - 2024 — McMap. All rights reserved.