PostgreSQL how to find any changes in the last n-minutes
Asked Answered
S

2

22

I am writing a program that syncs PostgreSQL and MS SQL server databases (and adds some changes in this transition). With multi million records, it takes a long time, and loads server pretty bad with select *; it also takes more resources to parse unchanged records and validate them against MS SQL server.

Are there any logs in PostgreSQL that I can parse to find out the changes that took place in the last n-minutes? This would allow me to select only those records that I need to work; improving performance.

Striper answered 7/12, 2012 at 17:42 Comment(4)
Have you considered xmin?Klopstock
how can I use xmin to get changed records from the table (if the table has 1million records?)Striper
sqlfiddle.com/#!1/58507/2Klopstock
but essential 'with great care'—did you read the notes about wrap-around and frozen xids?Klopstock
L
41

Postgresql, find changes in the last n minutes:

Postgresql does not automatically store the date or time that rows were added/updated/deleted (it would really slow things down to handle timestamps like this if you didn't want it to).

You'll have to do it yourself: Add a timestamp column to the table. When you insert a row into the table, have it update the timestamp column to the current_timestamp. When you are selecting the row, use a select statement that filters down where timestamp is greater than N minutes ago as follows:

Get rows where a timestamp is greater than a date:

SELECT * from yourtable 
WHERE your_timestamp_field > to_date('05 Dec 2000', 'DD Mon YYYY');

Get rows that have been changed in the last n minutes:

SELECT * from yourtable 
WHERE your_timestamp_field > current_timestamp - interval '5 minutes'

Walkthrough example

drop table foo; 
CREATE TABLE foo( 
    msg character varying(20), 
    created_date date, 
    edited_date timestamp 
); 
insert into foo values( 'splog adfarm coins', '2015-01-01', current_timestamp);
insert into foo values( 'execute order 2/3', '2020-03-15', current_timestamp);
insert into foo values( 'deploy wessels', '2038-03-15', current_timestamp);
 
select * from foo where created_date < to_date('2020-05-05', 'YYYY-mm-DD'); 
    ┌────────────────────┬──────────────┬────────────────────────────┐ 
    │        msg         │ created_date │        edited_date         │ 
    ├────────────────────┼──────────────┼────────────────────────────┤ 
    │ splog adfarm coins │ 2015-01-01   │ 2020-12-29 11:46:27.968162 │ 
    │ execute order 2/3  │ 2020-03-15   │ 2020-12-29 11:46:27.96918  │ 
    └────────────────────┴──────────────┴────────────────────────────┘ 
 
select * from foo where edited_date > to_timestamp(
    '2020-12-29 11:42:37.719412', 'YYYY-MM-DD HH24_MI_SS.US'); 
    ┌────────────────────┬──────────────┬────────────────────────────┐ 
    │        msg         │ created_date │        edited_date         │ 
    ├────────────────────┼──────────────┼────────────────────────────┤ 
    │ execute order 2/3  │ 2020-03-15   │ 2020-12-29 11:46:27.96918  │ 
    │ deploy wessels     │ 2038-03-15   │ 2020-12-29 11:46:27.969988 │ 
    └────────────────────┴──────────────┴────────────────────────────┘ 
Lowrance answered 7/12, 2012 at 17:50 Comment(3)
OK, inserts can be taken care of with timestamps, what about edits?Striper
When you edit the table, you could either update that same timestamp, or make an additional column called "date_updated" and when you update the row, mark that column as current time. Watchout for timezone changes, daylight savings times and other shenanigans with the time.Lowrance
Use trigger to update your_timestamp_field automatically when any INSERT/UPDATE is proceeded. You don't more need to remember 'SET your_timestamp' in all queries.Nabokov
C
4

You can use a trigger based approach described here:

http://wiki.postgresql.org/wiki/Audit_trigger

Essentially every table change fires a trigger which can write some info in a log table.

Chaulmoogra answered 7/12, 2012 at 18:27 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.