Querying / Pagination Across Microservices

P

4

26

Our shop has recently started taking on an SOA approach to application development. We are seeing some great benefits with the separation of concerns, reusability, and other benefits of SOA/microservices.

However, one big item we're stuck on is aggregating, filtering, and paginating results across services. Let me describe the issue with a scenario.

Say we have 3 services:

PersonService - Stores information on people (names, addresses, etc)
ItemService - Stores information on items that are purchasable.
PaymentService - Stores information regarding payments that people have made for different items.

Now, say we want to build a reporting/admin tool that can display / report on multiple services in aggregate. For instance, we want to display a paginated list of Payments, along with the Person and Item that each payment was for. This is pretty straightforward: Grab the list of payments, then query PersonService and ItemService for the respective Person and Item records.

However, the issue comes into play when we want to then filter down that data: For instance, displaying a paginated list of payments made by people with the first name 'Bob', who have purchased the item 'Car'. This makes things much more complicated, because we need to filter results from 3 different services without knowing how many results each service is going to return.

From a performance perspective, querying all of the services over and over again to narrow down the results would be costly, so I've been researching better solutions. However, I cannot find concrete solutions to this problem (or at least a "best practice"). In a monolithic application, we'd simply use SQL joins across the different tables. I'm having a ton of trouble figuring out how/if something similar is possible across services.

My question to the community is: What would your approach be? Things I've considered:

Using some sort of search index (Elasticsearch, Solr) that contains all data for all services (updated via events pushed out by services), and then querying the search index for results.
Attempting to understand how projects like GraphQL and Neo4j may assist us with these issues.

Pru answered 1/12, 2016 at 0:48 Comment(6)

Chris Richardson has some insight on database per service verse shared database between services: microservices.io/patterns/data/database-per-service.html – Icily 2/12, 2016 at 3:54

Hi Mike. How did you resolve this issue? I am also stuck with similar scenario. I have ES as the datastore. – Actomyosin 20/5, 2018 at 13:7

@Actomyosin We really haven't. We still do REST calls to different services to retrieve the data we want. However, we are moving towards an event sourcing pattern, where any number of services can listen for data, and store the data they need. So all updates go through the event bus, and services duplicate the data they need for their business processes. You can look at it similar to a NoSQL approach (look at how Firebase recommends storing objects that are reused in several areas). – Pru 21/5, 2018 at 11:21

@Richa: do you by any chance resolve this issue by now ? I have also run into the same scenario that you describe, and could not find a concrete solution. I posted my question on SOF early, but haven't got any response yet. – Phipps 5/1, 2021 at 7:4

@Mike: can you elaborate more about the event sourcing pattern ? Thanks – Phipps 5/1, 2021 at 7:5

@Hoang Minh, here is a link to find out more about the CQRS pattern that could be using Event Sourcing. learn.microsoft.com/en-us/azure/architecture/patterns/cqrs This is an overview of the Event Sourcing pattern: learn.microsoft.com/en-us/azure/architecture/patterns/… – Choli 1/6, 2022 at 17:24

C

3

The answer to this question is that you need a separate Read Database or Materialized View that aggregates data from multiple databases, and makes it ready for fast retrieval. See the CQRS pattern: https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs

The data in the Materialized View might not be "the most up to date", meaning there might be a small delay between when the change is made by the respective microservice, and when time the "Materialized View" is updated, but this is fine, as retrieving the data fast is more important than if the data is stale for a few seconds or even minutes (there are systems where the Materialized View can take 2-5 minutes to be updated, and yet that might still be acceptable)

The best pattern to implement this Read Database or Materialized View from CQRS, is typically the Event Sourcing pattern, where we can listen to a queue for new updates and update the Read Database immediately. See the Event Sourcing pattern: https://learn.microsoft.com/en-us/azure/architecture/patterns/event-sourcing

Choli answered 1/6, 2022 at 17:48 Comment(0)

L

9

I stick with Sam Newman who says in Chapter 4 "The shared Database" of his book something like:

Remember when we talked about the core principles behind good microservices? Strong cohesion and loose coupling --with database integration, we lose both things. Database integration makes it very easy for services to share data, but does nothing about sharing behaviour. Our internal representation is exposed over the wire to our consumers, and it can be very difficult to avoid making breaking changes, wich inevitably leads to fear of any changes at all. Avoid at (nearly) all costs.

This is the point I make when I curse at Content-Management-Systems.

In my view a microservice is autonomous, what it cannot be if it shares things or consumes shared things. The only exception I make here are Domain-Objects, those represent the shared understanding of the business model and must be used in communication between microservices solely.

It depends on the microservice itself if an ER or AggregationOriented database (divided into document based or graph based) better suits the needs. The funny thing is, by being loosley coupled and by being autonomus you are able to do just that!

If an PaymentService shares the behaviour of "how many payments for Person A" He needs to know Person A in order to fullfill this. But Everything he knows about Person A must origin from the PersonService, maybe at runtime (the PaymentService maybe just stores an id) or event based (the PaymentService stores the data it needs up to the Domain-Object user, what gets updated triggered and supplied by the PersonService). The PaymentService itself does not share users itself.

Leverrier answered 27/9, 2017 at 15:21 Comment(0)

C

3

The answer to this question is that you need a separate Read Database or Materialized View that aggregates data from multiple databases, and makes it ready for fast retrieval. See the CQRS pattern: https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs

The data in the Materialized View might not be "the most up to date", meaning there might be a small delay between when the change is made by the respective microservice, and when time the "Materialized View" is updated, but this is fine, as retrieving the data fast is more important than if the data is stale for a few seconds or even minutes (there are systems where the Materialized View can take 2-5 minutes to be updated, and yet that might still be acceptable)

The best pattern to implement this Read Database or Materialized View from CQRS, is typically the Event Sourcing pattern, where we can listen to a queue for new updates and update the Read Database immediately. See the Event Sourcing pattern: https://learn.microsoft.com/en-us/azure/architecture/patterns/event-sourcing

Choli answered 1/6, 2022 at 17:48 Comment(0)

S

2

Storing this data in elasticsearch/solr/cognitivesearch type service in addition to SQL could help solve some of these problems.

In your given example,

In the search index(elasticsearch/solr/cognitivesearch) person object will have a property called "items" that will contain a list of items that are paid for by that person.

That way, you can filter across objects, get a paginated list that is sorted by any property of the person. You can add similar information on other documents to better suit your business needs.

Using a GraphDatabase would seem to solve your problem from a 10000ft, but you will run into pagination problems when you operate at scale. GraphDatabases do not do pagination well(they will have to visit all the nodes anyway, even when you need a paginated list) and will cause timeouts/performance issues.

Shelby answered 22/9, 2020 at 14:37 Comment(0)

F

0

You can use replication tables. All databases have replication feature If you have personService that has person table and PaymentService that has payment table then create reportService that has person and payment tables, that they filled by replication feature.

Funiculate answered 13/6, 2020 at 13:0 Comment(0)

Recommended topics

Hot tags