Syncing Elasticsearch with PostgreSQL in a Spring Boot application

I have PostgreSQL as my primary database and I would like to take advantage of Elasticsearch as a search engine for my Spring Boot application.

Problem: The queries are quite complex, and with millions of rows in each table most of the search queries time out.

Partial solution: I used materialized views in PostgreSQL and have a job running that refreshes them every X minutes. But on systems with huge amounts of data, and with other database transactions (especially writes) in progress, the views take a long time to refresh (about 10 minutes for 5 views). I realized the current views are at their capacity and I cannot add more.
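
For reference, the refresh job is essentially a scheduled REFRESH MATERIALIZED VIEW per view. A minimal sketch, assuming Spring's scheduling support and hypothetical view names (REFRESH ... CONCURRENTLY keeps the view readable during the refresh, but requires a unique index on the view and is generally slower):

    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;

    // Requires @EnableScheduling on a configuration class.
    @Component
    public class SearchViewRefreshJob {

        private final JdbcTemplate jdbcTemplate;

        public SearchViewRefreshJob(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        // Refresh every 10 minutes; CONCURRENTLY avoids blocking readers
        // while the view is being rebuilt.
        @Scheduled(fixedDelayString = "PT10M")
        public void refreshSearchViews() {
            jdbcTemplate.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY search_orders_view");
            jdbcTemplate.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY search_products_view");
        }
    }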

That's when I started exploring other options just for search and landed on Elasticsearch, which works great with the amount of data I have. As a POC I used Logstash's JDBC input plugin, but it doesn't support the DELETE operation (bummer). That leaves soft deletes, an option I cannot take because: A) almost all the tables in the PostgreSQL DB are updated every few minutes, and some of them have constraints on the "name" key, which in this case would remain in place until a clean-up job runs; B) many tables in my PostgreSQL DB are referenced with CASCADE DELETE, and it's not feasible for me to update 220 tables' schemas and JPA queries to check a soft-delete boolean.
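
For context, such a POC pipeline looks roughly like the sketch below (connection details, schedule, and the query are placeholders, not the actual setup). Because the input is just a polled SELECT, rows that have been deleted simply stop appearing in the result set, so nothing ever emits a DELETE for them:

    input {
      jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/appdb"
        jdbc_user => "app"
        jdbc_password => "secret"
        jdbc_driver_class => "org.postgresql.Driver"
        schedule => "*/5 * * * *"   # poll every 5 minutes
        statement => "SELECT id, name, updated_at FROM products WHERE updated_at > :sql_last_value"
        use_column_value => true
        tracking_column => "updated_at"
        tracking_column_type => "timestamp"
      }
    }
    output {
      elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "products"
        document_id => "%{id}"      # upserts by primary key, but never deletes
      }
    }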

The question linked above also suggests PgSync, which periodically syncs PostgreSQL with Elasticsearch. However, I cannot go with that either, since it has an LGPL license, which is forbidden in our organization.

I'm starting to wonder whether anyone else has run into this strange limitation between Elasticsearch and an RDBMS.

I'm open to options other than Elasticsearch to solve my need; I just don't know what the right stack is. Any help here is much appreciated!

Tyne answered 23/2, 2021 at 19:43

Comment from Formant: did you end up using Logstash or PgSync?

On a diagram the flow would look like this: service A (relational database) -> queue/topic -> service B -> Elasticsearch.

In service A, where the relational database is connected, implement an event listener for data changes: inserts, updates, and deletes.

Then, when data changes (for example, a row needs to be deleted or updated), the event listener calls a producer, which puts the data on a queue and sends it to a particular topic, depending on where the data needs to be written.
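
A minimal sketch of the service A side, assuming Spring events and Kafka as the queue (the ProductChangedEvent record, topic name, and JSON serializer setup are illustrative, not part of the original answer):

    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Component;
    import org.springframework.transaction.event.TransactionalEventListener;

    // Hypothetical change event, published by service A's write paths
    // via ApplicationEventPublisher inside the same transaction.
    record ProductChangedEvent(Long id, String operation, String name) {}

    @Component
    class ProductEventForwarder {

        // Assumes Spring Kafka's JsonSerializer is configured for the value type.
        private final KafkaTemplate<String, ProductChangedEvent> kafkaTemplate;

        ProductEventForwarder(KafkaTemplate<String, ProductChangedEvent> kafkaTemplate) {
            this.kafkaTemplate = kafkaTemplate;
        }

        // Runs AFTER_COMMIT by default, so the queue never sees
        // changes that were rolled back.
        @TransactionalEventListener
        public void onProductChanged(ProductChangedEvent event) {
            // Key by entity id so all events for one row land in one
            // partition and stay ordered.
            kafkaTemplate.send("product-events", event.id().toString(), event);
        }
    }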

In service B, a consumer listens on the queue, takes the data it needs, and writes it to Elasticsearch.
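
And a matching sketch of the service B side, under the same assumptions (Spring Kafka with a JSON deserializer for the event record sketched above, and Spring Data Elasticsearch for the writes; ProductDocument is a hypothetical mapped type):

    import org.springframework.data.annotation.Id;
    import org.springframework.data.elasticsearch.annotations.Document;
    import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;

    // Hypothetical document type mapped to the search index.
    @Document(indexName = "products")
    record ProductDocument(@Id String id, String name) {}

    @Component
    class ProductIndexer {

        private final ElasticsearchOperations elasticsearch;

        ProductIndexer(ElasticsearchOperations elasticsearch) {
            this.elasticsearch = elasticsearch;
        }

        // Assumes a JSON deserializer is configured for ProductChangedEvent
        // (the record sketched for service A above).
        @KafkaListener(topics = "product-events", groupId = "search-indexer")
        public void onProductChanged(ProductChangedEvent event) {
            if ("DELETE".equals(event.operation())) {
                elasticsearch.delete(event.id().toString(), ProductDocument.class);
            } else {
                elasticsearch.save(new ProductDocument(event.id().toString(), event.name()));
            }
        }
    }

The key point of this design is that deletes arrive as explicit events, which is exactly what the polling JDBC approach in the question could not provide.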

Redvers answered 10/11, 2023 at 11:22 Comment(0)

I'd use Debezium (https://debezium.io/) to listen to all kinds of events from the PostgreSQL tables.

Debezium converts those events and streams them into Kafka. After that, it's just a matter of creating a consumer for those Kafka topics (or using an existing sink: https://docs.confluent.io/kafka-connectors/elasticsearch/current/overview.html) to insert the data into Elasticsearch.
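
As a sketch, the two Kafka Connect configurations could look roughly like this (hostnames, credentials, and table/topic names are placeholders). The unwrap transform flattens Debezium's change envelope, and behavior.on.null.values=delete in the sink turns the resulting delete tombstones into Elasticsearch deletes:

Debezium source connector:

    {
      "name": "postgres-source",
      "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",
        "database.hostname": "localhost",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "secret",
        "database.dbname": "appdb",
        "topic.prefix": "app",
        "table.include.list": "public.products",
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
        "transforms.unwrap.drop.tombstones": "false"
      }
    }

Elasticsearch sink connector:

    {
      "name": "elasticsearch-sink",
      "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "connection.url": "http://localhost:9200",
        "topics": "app.public.products",
        "key.ignore": "false",
        "behavior.on.null.values": "delete"
      }
    }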

The advantage of using Debezium and an existing sink is that they are very reliable and expose metrics that are really useful for verifying that the solution is working and for debugging.

Sheff answered 3/4 at 6:52 Comment(0)
