If we compare multiple types of replication (Single-leader, Multi-leader or Leaderless), Single-leader replication has the possibility to be Linearizable. In my understanding, Linearizability means that once a write completes, all later reads should return that value, or of a later write. Or said in other words, there should be an impression if there is only one database, but no more. So i guess, no stale reads.
PostgreSQL in his streaming replication, has the ability to make all it's replicas synchronous using the synchronous_standby_names
and it also has the ability to fine tune with the synchronous_commit
option, where it can be set to remote_apply
, so the leader waits until the transaction is replayed on the standby (making it visible to queries). In the documentation, in the paragraph where it talks about the remote_apply option, it states that this allows load-balancing in simple cases with causal consistency.
Few pages back, it says this:
,,Some solutions are synchronous, meaning that a data-modifying transaction is not considered committed until all servers have committed the transaction. This guarantees that a failover will not lose any data and that all load-balanced servers will return consistent results no matter which server is queried,,
So i'm struggling to understand what can there be guaranteed, and what anomalies can happen if we load-balance read queries to the read replicas. Can still there be stale reads? Can it happen when i query different replicas to get different results even no write after happend on the leader? My impression is yes, but i'm not really sure. If no, how PostgreSQL prevents stale reads? I did not find anything with more details how it fully works under the hood. Does it use two-phase commit, or some modification of it, or it uses some other algorithm to prevent stale reads?
If it does not provide option of no stale reads, is there a way to accomplish that? I saw, PgPool has to option to load-balance to the replicas that are behind no more than a defined threshold, but i did not understand if it could be defined to load-balance to replicas that are up with the leader.
It's really confusing to me to really understand if there anomalies can happen in a fully synchronous replication in PostgreSQL.
I understand that setup like this has problems with availability, but that is now not a concern.