I remember once finding the BerkeleyDB docs genuinely useful for getting an idea of how these implementations can work, because it is (or was) a fairly low-level database that implemented transactions without the whole relational/query-planning infrastructure.
Not all databases (even just the ones you mention) work in quite the same way. PostgreSQL's bottom-level implementation is quite different from both Oracle's and SQL Server's snapshot implementations, even though they are all based on the same approach (MVCC: multi-version concurrency control).
One way to implement the ACID properties is to write all the changes that you ("you" here is some transaction making changes) make to the database into a "transaction log", as well as locking each row (the unit of atomicity) to ensure no other transaction can mutate it until you have either committed or rolled back. At the end of the transaction, if committing, you simply write a record into the log saying you committed, and release the locks. If you roll back, you need to walk back through the transaction log undoing all your changes, so each change written to the log contains a "before image" of how the data looked originally. (In practice it will also contain an "after image", because transaction logs are replayed for crash recovery too.) Because you lock each row you change, concurrent transactions do not see your changes until you release the locks at the end of the transaction.
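None of that machinery is visible from SQL, but its effect is easy to demonstrate. A minimal sketch in standard SQL (using a toy table t that the later examples reuse):

    CREATE TABLE t (id integer PRIMARY KEY, value text);
    INSERT INTO t VALUES (1, 'FOO');

    BEGIN;
    UPDATE t SET value = 'BAR' WHERE id = 1; -- before image 'FOO' logged, row locked
    ROLLBACK;                                -- log walked backwards, before image restored

    SELECT value FROM t WHERE id = 1;        -- 'FOO' again, as if nothing happened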
MVCC is a method by which concurrent transactions that want to read rows, rather than being blocked by your update, can access the "before image" instead. Every transaction has an identity and a way of determining which transactions' data it can "see" and which it can't; different rules for producing this set are used to implement different isolation levels. To get "repeatable read" semantics, for example, a transaction must find the "before image" for any row that was updated by a transaction that started after it. You could naively implement this by having transactions look back through the transaction log for the before images, but in practice they are stored somewhere else: hence Oracle has separate redo and undo spaces (redo is the transaction log; undo holds before images of blocks for concurrent transactions to use), while SQL Server stores the before images in tempdb. By contrast, PostgreSQL always creates a new copy of a row whenever it is updated, so the before images live in the data blocks themselves. This has some advantages (commit and rollback are both very simple operations, and there is no additional undo space to manage) with tradeoffs (those outdated row versions have to be vacuumed up in the background).
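In PostgreSQL you can actually look at the snapshot a transaction uses to make these visibility decisions; pg_current_snapshot() is the PostgreSQL 13+ name, while older versions spell it txid_current_snapshot():

    BEGIN;
    SELECT pg_current_snapshot();
    -- e.g. 748:752:748,751  (format: xmin:xmax:list of in-progress xids)
    -- Row versions created by xid 752 or later, or by the in-progress
    -- xids 748 and 751, are treated as not visible to this transaction.
    COMMIT;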
In PostgreSQL's case (and this is the DB whose internals I'm most familiar with), each row version on disk has some additional properties that transactions must examine to decide whether that row version is "visible" to them. For simplicity, consider just "xmin" and "xmax": "xmin" is the ID of the transaction that created the row version, and "xmax" is the (optional) ID of the transaction that deleted it (where "deleted" includes creating a newer row version to represent an update). So you start with a row created by txn#20:
xmin xmax id value
  20    -  1 FOO

and then txn#25 performs update t set value = 'BAR' where id = 1:

xmin xmax id value
  20   25  1 FOO
  25    -  1 BAR
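These are not just an internal detail: xmin and xmax are hidden system columns on every PostgreSQL table, so you can inspect them yourself:

    SELECT xmin, xmax, id, value FROM t;
    -- xmax shows as 0 on a live row version; a nonzero xmax means some
    -- transaction has deleted, updated, or locked that version.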
Until txn#25 is finished, new transactions know to regard its changes as not visible. So a transaction scanning this table will take the "FOO" version, since its xmax refers to a transaction whose changes are not visible (the delete hasn't "happened" yet as far as the reader is concerned).
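You can watch this from two concurrent psql sessions (a sketch, assuming the t table from above and the default READ COMMITTED isolation level):

    -- Session A:
    BEGIN;
    UPDATE t SET value = 'BAR' WHERE id = 1; -- writes the new 'BAR' version
    -- ...leave the transaction open...

    -- Session B, meanwhile:
    SELECT xmin, xmax, value FROM t WHERE id = 1;
    -- Returns the 'FOO' version with xmax set to session A's transaction ID:
    -- that transaction is in progress, so the delete it represents is ignored.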
If txn#25 is rolled back, new transactions will not simply skip its row versions; they check whether txn#25 committed or rolled back. (PostgreSQL maintains a "commit status" lookup table to answer this, pg_clog, renamed to pg_xact in version 10.) Since txn#25 rolled back, its changes are not visible, so again the "FOO" version is taken (and the "BAR" version is skipped, since its xmin transaction is not visible).
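Continuing the two-session sketch, the rollback case looks like this from the outside:

    -- Session A:
    ROLLBACK;

    -- Session B:
    SELECT xmin, xmax, value FROM t WHERE id = 1;
    -- Still the 'FOO' version, and its xmax still holds the aborted xid:
    -- nothing is rewritten on rollback; readers simply learn from the
    -- commit status table that the xmax transaction aborted.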
If txn#25 is committed, then the "FOO" row version is no longer taken, since its xmax transaction is visible (that is, the changes made by that transaction are now visible). By contrast, the "BAR" row version is taken, since its xmin transaction is visible (and it has no xmax).
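And the committed case, in the same sketch:

    -- Session A (had it committed instead):
    COMMIT;

    -- Session B:
    SELECT xmin, xmax, value FROM t WHERE id = 1;
    -- Now returns the 'BAR' version: its xmin (session A's xid) committed
    -- and its xmax is 0. The 'FOO' version is dead, awaiting vacuum.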
While txn#25 is still in progress (again, this can be read from pg_clog), any other transaction that wants to update the row will wait until txn#25 completes, by trying to take a shared lock on its transaction ID. This point is worth highlighting: it is why PostgreSQL doesn't usually have "row locks" as such, only transaction locks: there is no in-memory lock for each row changed. (Locking with select ... for update is done by setting xmax plus a flag indicating that xmax represents a lock rather than a deletion.)
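pg_locks makes this visible: when a second transaction blocks on a row, what it is actually queued on is a lock on the first transaction's ID. A sketch (the xid shown is illustrative):

    -- Session A:
    BEGIN;
    SELECT * FROM t WHERE id = 1 FOR UPDATE; -- sets xmax + "lock only" flag

    -- Session B runs the same FOR UPDATE and blocks. From a third session:
    SELECT locktype, transactionid, mode, granted
    FROM pg_locks
    WHERE locktype = 'transactionid';
    --    locktype   | transactionid |     mode      | granted
    -- transactionid |           751 | ExclusiveLock | t   <- held by session A
    -- transactionid |           751 | ShareLock     | f   <- session B waiting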
Oracle... does something somewhat similar, but my knowledge of the details is much hazier. In Oracle each transaction is issued a System Change Number, which is recorded at the top of each block. When a block changes, its original contents are put in the undo space, with the new block pointing at the old one: so you essentially have a linked list of versions of block N, the latest version in the data file and progressively older versions in the undo tablespace. At the top of the block is a list of the "interested transactions", which somehow implements locking (again without an in-memory lock for each row changed); I can't remember the details beyond that.
SQL Server's snapshot isolation mechanism is, I believe, largely similar to Oracle's, using tempdb to hold the before images of changed data rather than a dedicated undo file.
Hope this rambling answer was useful. It's all from memory, so large swathes of misinformation are possible, particularly for the non-PostgreSQL implementations.