Postgres LISTEN/NOTIFY - low latency, realtime?
Asked Answered
T

2

3

I am planning to use postgres LISTEN/NOTIFY aproach to get insert time(actual transaction commit time) of records in a table. To achieve this, I plan to do the following. I issue a notification during insert time as shown below.

BEGIN;
  INSERT INTO table_name(id, ...) values (id,....);
  select pg_notify('test_channel', 'id - ' || id || ' trans start time - ' || now() || ' notify start time - ' || clock_timestamp()); 
END;

And then I plan to use https://pythonhosted.org/psycopg2/advanced.html#asynchronous-notifications to receive those notifications.

What I would like to find out is the exact time the transaction commit happens(the record is available to read) down to micro secods

I understand that NOTIFY(pg_notify) actually sends notification right after the commit of the transaction but I couldnt figure out how to find out the exact time when it happens. The clock timestamp value I have in NOTIFY, is not the acutal transaction commit time.

I guess the time I listen to notification will be close to transaction commit time but I am not sure how close it is. First, there is some time between polls in my code while listening(however small it is) and second, I am not sure if there is any lag between NOTIFY/LISTEN communication itself.

Any ideas?

UPDATE(Complete description of problem): We have a reader selecting rows in batches using a “checkpoint” time, where each batch gets the rows after the last timestamp in the previous batch, and we are missing rows. (Reason: The timestamp value is based on the time INSERT happens(00.00.00). At heavy loads, if the transaction takes longer time, it gets inserted let say 10 sec later(00.00.10), the reader will miss this row(row1) if it reads during that 10 seconds and finds a row which had its INSERT time at a later time(00.00.05) than row1. The complete description of the problem is similar to the one written in this blog. http://blog.thefourthparty.com/stopping-time-in-postgresql/)

Tracheotomy answered 2/10, 2015 at 19:12 Comment(4)
Er ... why? What are you trying to achieve with this? What's the underlying problem you're trying to solve with this, the reason you need it?Icosahedron
I updated the problem we are trying to solve in the description.Tracheotomy
So you're just trying to implement a reliable queue with multiple writers and a single reader? Trying to fix out-of-order commit and visibility issues this way will not work. Consider posting a new, separate question on the actual underlying topic, i.e. how to avoid missing rows when a queue reader scans a table then in-progress transactions commit after it reads. It's frustrating that you ignored my question on this earlier.Icosahedron
Posted a separate question here. #32947352Tracheotomy
I
2

What I would like to find out is the exact time the transaction commit happens(the record is available to read) down to micro secods

Handily, PostgreSQL 9.5 just added that, in the form of support for commit timestamps. See commit timestamps. Note that you must have track_commit_timestamp enabled to use this, and that information about commit timestamps isn't kept forever, so fairly old rows will just get a null result.

You can get the transaction ID with txid_current() at any point during the transaction. Perhaps using insert ... returning ... for example. Then you can look up the commit timestamp in a subsequent query, after commit.

For older versions, you should just include clock_timestamp in your insert ... returning ... clause. It will be the time the record was inserted, not the commit time, but that's really the closest it is possible to get.

I guess the time I listen to notification will be close to transaction commit time but I am not sure how close it is.

"fairly". It will depend on network latency, CPU scheduling lag, etc. It sure won't be microsecond-accurate.

For example, on Windows it will be at best accurate to the millisecond, but by default it'll be accurate to the nearest 15-millisecond timer tick.

First, there is some time between polls in my code while listening(however small it is)

Don't poll. select() the socket so you're woken the instant there's data to read. On Linux you'd ideally use the epoll() system call for this.

and second, I am not sure if there is any lag between NOTIFY/LISTEN communication itself.

Some, yes, because a transaction commit takes time. So there's some non-zero time between when you issue the NOTIFY and when the event is sent to listeners.

Icosahedron answered 3/10, 2015 at 10:31 Comment(7)
To summarize the last three responses in your answer.....So if we exclude the time taken for the transaction commit, as notification gets sent right after the transaction commit and if I use select() instead of polling, the only time-lag between a NOTIFY and LISTEN are due to reasons like network latency, CPU scheduling etc. Is my understanding correct? I understand that the clock_timestamp value in the payload of NOTIFY does not have the transaction end time.Tracheotomy
Well, commit isn't instantaneous, it's just atomic as far as visibility is concerned. If you're talking microseconds that starts to matter. Notifications are sent after the commit is set in the clog, during post-commit work. See the source code for details. Also, select() and epoll() aren't necessarily zero latency; in fact, they can easily have milliseconds of latency between data becoming readable and your application being notified, due to latencies within the kernel, system management interrupts, etc. You'd need a realtime process and low-latency kernel to have anything sensible-ish.Icosahedron
It sounds like you're trying to use PostgreSQL as a realtime system, but it just isn't. It's not designed to be. It can have long lock latencies in LWLocks. It can block on a relation extension lock during filesystem operations. It can be delayed by buffer pins during a checkpoint. All sorts of things that mean that even if latencies are usually low, they are not bounded and are not always low. You might get a big delay between commit and processing of a notification if you have lots of load and bad luck. You will never know with your approach. It is not reliable.Icosahedron
Also, psycopg2 and python? For ultra-low latency / realtime? Seriously?Icosahedron
Do you have any other suggestions other than using psyconpg2 and python for this purpose which might do better?Tracheotomy
@Tracheotomy libpq and C. But really, its hard to say when I can't figure out what the real goal here is, the problem you are trying to solve, the why for this how. What you're doing makes no sense to me so it's hard to suggest ways to do it better....Icosahedron
I had updated the description with the problem I am trying to solve.Tracheotomy
M
1

now() will always be less than then the actual time where the transaction is commited and visible to readers (unless you have dirty read as isolation level).

A better approach (one that does not have race-conditions) is call pg_notify() and use clock_timestamp() right after the end of the transaction (and only if the transaction commits).

Manifestation answered 2/10, 2015 at 20:28 Comment(3)
Using NOTIFY vs pg_notify() makes no difference at all. Using clock_timestamp instead of now() (which is the current_timestamp for the transaction) is a good suggestion though. I don't understand what you mean by "right after the end of the transaction". You can't do anything after the end of the tx, unless you do another transaction.Icosahedron
Do you mean setting up a trigger 'after insert' which would have the NOTIFY statement? As per my understanding trigger is part of the current transaction and will add time to the total transation time? How else would I add a NOTIFY right after the commit?Tracheotomy
You are correct that an after insert trigger fires during, and as part of, the transaction. The notification is enqueued by the trigger, then delivered after commit.Icosahedron

© 2022 - 2024 — McMap. All rights reserved.