Efficient way for Auditing
I am trying to implement an audit layer for my Spring Boot application. I have tried two approaches so far.

1) Created one audit table with the fields user_name, table_name, column_name, old_value, new_value, uuid, event_type.

Whenever any change is saved, the audit entity is populated and saved as well.
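
For reference, a minimal sketch of the single audit table from approach 1, assuming JPA (the table name audit_log is an assumption; getters and setters omitted):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

// One row per changed column: the service layer fetches the old entity from
// the DB, diffs it against the new state, and saves one AuditEntry per change.
@Entity
@Table(name = "audit_log")
public class AuditEntry {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String userName;
    private String tableName;
    private String columnName;
    private String oldValue;
    private String newValue;
    private String uuid;
    private String eventType;
}
```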

Advantages:

  • quick

  • easy to manage since only 1 audit table

Drawbacks:

  • sometimes it involves too much mapping from the business entity to the audit entity

  • the old value has to be fetched from the DB to populate the audit entity

  • manual creation of the audit entity
  • retrieval can be a pain

2) Used JaVers for auditing
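
For context, basic JaVers usage looks roughly like this (a sketch: the Employee class is illustrative; with the javers-spring-boot-starter you would instead annotate a Spring Data repository with @JaversSpringDataAuditable and commits happen automatically):

```java
import org.javers.core.Javers;
import org.javers.core.JaversBuilder;
import org.javers.core.metamodel.annotation.Id;

public class JaversExample {

    static class Employee {
        @Id String name;
        int salary;
        Employee(String name, int salary) { this.name = name; this.salary = salary; }
    }

    public static void main(String[] args) {
        // Build one Javers instance per application (in-memory repository here;
        // in Spring Boot the starter wires a SQL repository automatically).
        Javers javers = JaversBuilder.javers().build();

        Employee frodo = new Employee("frodo", 10_000);
        // commit() diffs the object against its latest snapshot and persists
        // a new snapshot for each changed object.
        javers.commit("author", frodo);
    }
}
```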

Advantages:

  • automatic creation and updating of the audit entity

  • less number of tables to manage

  • old value retrieval is not needed

Drawbacks:

  • the time taken was too long, since the transaction becomes large

Benchmark

Dealing with 10 rows in a table (entity) with 20 columns (fields):

time taken using approach 1: 24328 ms => 24 s

time taken using approach 2: 311292 ms => 311 s (roughly 13 times slower)


3) Didn't go with Hibernate Envers, since the number of tables created would be high
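
For completeness, the Envers setup that was ruled out is just an annotation; the table-count concern comes from Envers generating one history table per audited entity (a sketch, entity name illustrative):

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.envers.Audited;

// With hibernate-envers on the classpath, @Audited makes Hibernate copy every
// change of this entity into a generated Customer_AUD table, plus one shared
// REVINFO revision table - i.e. one extra table per audited entity.
@Entity
@Audited
public class Customer {

    @Id
    private Long id;

    private String name;
}
```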

Can somebody propose a better approach to auditing, given the above pros and cons? We aim for:

  • fewer tables to manage

  • lower response time

  • a modular audit layer that is more automated than manual

with 10 to 25 columns in each table.

Keener answered 21/4, 2020 at 12:57 Comment(3)
Which DB provider? – Decibel
Have you tried running the auditing in a background thread with low priority? I'd use Spring AOP with annotations and run the updates of the batch in a separate background thread. The datasource of this table should also be separate, so that the auditing does not modify the business logic of the application in any way. – Meredithmeredithe
Using Oracle DB. – Keener

As you said, the common approach for creating audit trails is application-side libraries like Envers or JaVers. These hook into the persistence library, maintain specific column(s) in the data tables ("createdBy", "lastUpdated", etc.), and/or copy earlier record versions into some form of history tables.

There are some disadvantages to this, though:

  • writing records to history tables as part of OLTP transactions increases the number of statements executed within the transaction, which may lengthen the application's response times

  • lack of support for bulk updates and deletes

  • changes done directly in the database cannot be tracked

Change Data Capture in 3 steps

  1. Capture change data
  2. Transform change data into a format your destination database supports for upload
  3. Upload the data to the destination database

Change Data Capture via database triggers

Another technique is database triggers. They won't miss any operations, no matter whether they are issued from the application or from the database itself. Bulk statements will be processed, too. Another advantage of trigger-based CDC is that the application stays unaware of the fact that you have added a whole audit layer. On the downside, there is still the problem of increased latency when executing triggers as part of OLTP transactions.

An alternative to running the audit triggers on the master (PostgreSQL here): use logical replication to stream the database changes (decoded WAL messages) from the MASTER server to a SLAVE server, and enable the audit trigger on the slave to capture the changes to the replicated tables.

Change Data Capture via a log-based approach

The aforementioned problems don't exist when leveraging the transaction log as the source for auditing and using change data capture to retrieve the change information and send it to a message broker or a persistent, log-based store such as Apache Kafka. This is generally considered the superior approach to change data capture, but it is not the easiest solution to set up.

Running asynchronously, the CDC process can extract the change data without impacting OLTP transactions.

Whenever there's a data change, an entry is added to the transaction log.

There’ll be a log entry for each record updated or deleted in a bulk operation, so a change event for each of them can be produced.
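
As a sketch of the consuming side, assuming Debezium is streaming the change events into Kafka (the topic name and JSON payload layout are assumptions following Debezium's defaults):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AuditEventConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "audit-trail-writer");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Debezium publishes one topic per captured table,
            // e.g. "dbserver1.inventory.orders".
            consumer.subscribe(List.of("dbserver1.inventory.orders"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Each value is a JSON change event carrying "before" and
                    // "after" row states; persist it to the audit store here.
                    System.out.println(record.value());
                }
            }
        }
    }
}
```

Because this consumer runs outside the OLTP transaction, the audit write adds no latency to the business request.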

There remains the question of how the CDC process can access metadata such as the application user who performed a data change, their IP address, a tracing span id, or any kind of correlation id.

One approach is to have a separate table where this metadata is stored. For each transaction, the application stores a record in this table with a specific transaction id. Data change events contain the transaction id linked to the change, so the data change events and the metadata records can be correlated.
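
A sketch of that metadata table idea, assuming PostgreSQL (txid_current() returns the current transaction id there; the table and column names are assumptions):

```java
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

// Writes one metadata row per business transaction. Log-based CDC events carry
// the transaction id, so the audit pipeline can join them to this row later.
@Component
public class TransactionMetadataRecorder {

    @PersistenceContext
    private EntityManager em;

    // MANDATORY: must join the caller's transaction, otherwise the metadata row
    // would get its own transaction id and the correlation would break.
    @Transactional(propagation = Propagation.MANDATORY)
    public void record(String userName, String ipAddress, String correlationId) {
        em.createNativeQuery(
                "INSERT INTO transaction_metadata (tx_id, user_name, ip_address, correlation_id) "
              + "VALUES (txid_current(), :userName, :ip, :corr)")
          .setParameter("userName", userName)
          .setParameter("ip", ipAddress)
          .setParameter("corr", correlationId)
          .executeUpdate();
    }
}
```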

Decibel answered 21/4, 2020 at 13:51 Comment(0)

I would adopt this approach:

1) run the auditing in a background thread with low priority

2) use Spring AOP with annotations to separate the business logic from the auditing

3) write the full JSON-serialized object to a NoSQL database document with a UID every time an entity is inserted or updated; storing the old value is not needed, as you can backtrack the changes to an entity by running simple queries (see the sketch below)
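
A sketch of how points 1–3 could fit together with Spring AOP, @Async, and MongoDB (the pointcut package, collection name, and class names are assumptions):

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import org.aspectj.lang.annotation.AfterReturning;
import org.aspectj.lang.annotation.Aspect;
import org.bson.Document;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Component;

// The aspect intercepts repository saves and hands the saved entity to an
// async writer, keeping the audit I/O out of the business transaction.
@Aspect
@Component
class AuditAspect {

    private final AsyncAuditWriter writer;

    AuditAspect(AsyncAuditWriter writer) {
        this.writer = writer;
    }

    // Pointcut is an assumption: adjust the package to match your repositories.
    @AfterReturning(pointcut = "execution(* com.example.repository.*.save(..))",
                    returning = "saved")
    void afterSave(Object saved) {
        writer.audit(saved);
    }
}

@Component
class AsyncAuditWriter {

    private final MongoTemplate mongoTemplate;
    private final ObjectMapper objectMapper = new ObjectMapper();

    AsyncAuditWriter(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    @Async // runs on a background executor; requires @EnableAsync
    public void audit(Object entity) {
        try {
            String json = objectMapper.writeValueAsString(entity);
            mongoTemplate.insert(Document.parse(json), "audit_trail");
        } catch (Exception e) {
            // Swallow on purpose: auditing must never break the business flow.
        }
    }
}
```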

Meredithmeredithe answered 21/4, 2020 at 13:35 Comment(0)

Obviously you are not using JaVers in the right way, so your numbers are not reliable. Maybe you are creating a new JaVers instance for each commit?

JaVers is about as fast as an audit tool can be. JaVers simply creates snapshots of changed objects and inserts them into the DB: one DB record/document per snapshot.

So if you change one object, JaVers typically executes one DB read to get its previous snapshot and one DB insert to write the new snapshot.
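
For retrieval, snapshots can be queried back with JQL, e.g. (a sketch; the entity type and id are whatever you committed):

```java
import java.util.List;

import org.javers.core.Javers;
import org.javers.core.metamodel.object.CdoSnapshot;
import org.javers.repository.jql.QueryBuilder;

class AuditQueries {

    // Returns the audit trail of one entity instance, newest snapshot first.
    // The same by-id lookup is what Javers performs once before each commit.
    static List<CdoSnapshot> historyOf(Javers javers, Object id, Class<?> type) {
        return javers.findSnapshots(QueryBuilder.byInstanceId(id, type).build());
    }
}
```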

Argentous answered 21/4, 2020 at 14:53 Comment(5)
Each time, there will be a difference check to see the changes between the previous snapshot and the current one, right? Is there any way to skip the difference check and just create snapshots? – Keener
No, I used a singleton JaVers instance. – Keener
"JaVers executes one DB read to get its previous snapshot" – is there any way to skip this? – Keener
Not a good idea. Data audit is about tracking changes. Committing without a difference check would mean writing a lot of duplicated snapshots (and a performance hit: in most databases, writes are more costly than reads). – Argentous
Actually, we add a last-updated time to each entity, so a diff would always be there. – Keener
