As you said the common approach for creating audit trails are application-side libraries like envers or Javers. These ones are hooked into the persistence library, they’d maintain specific column(s) in the data tables ("createdBy", "lastUpdated" etc.), and/or copy earlier record versions into some form of history tables.
There are some disadvantages to this, though:
writing records in history tables as part of OLTP transactions increases the number of executed statements within the transaction -> may cause longer response times of the application
support for bulk updates and deletes
changes done directly in the database cannot be tracked
Change Data Capture in 3 steps
- Capture change data
- Transform change data into a format your destination database supports for upload
- Upload the data to the destination database
Change Data Capture via database triggers
Another technique are database triggers. They won’t miss any operations, no matter whether issued from the application or the database itself. Bulk Statement will be processed too.
Another advantage of the trigger-based CDC is that application stays unaware of the fact that you have added the whole audit layer.
On the downside, there’s still is the problem of increased latency when executing triggers as part of OLTP transactions.
Alternative solution for triggers (Postgresql here) :
Use logical Replication to stream the database changes(decoded WAL messages) from MASTER server to SLAVE server using WAL’s and enable Audit trigger to capture the changes to replicated tables on the Slave server.
Change Data Capture via via a log-based approach
The aforementioned problems don’t exist when leveraging the transaction log
as the source for auditing and using change data capture for retrieving the change information and sending it into a message broker or persistent log-based such as Apache Kafka
. Generally considered the superior approach to change data capture but not the easiest solution to setup.
Running asynchronously, the CDC process can extract the change data without impacting OLTP transactions.
Whenever there’s a data change an entry is added to the transaction log.
There’ll be a log entry for each record updated or deleted in a bulk operation, so a change event for each of them can be produced.
Remains the question of how can CDC access the metadata like the application user that performed a data change, their IP address , a tracing span id or any kind of correlationID.
One approach is to have a separate table where this metadata is stored. The application can store for each transaction a record inside this table with a specific transactionId
. Data change events will contain the transactionID linked to the change so the data change events and the metadata records can so be correlated.