Master Data Management Strategies in Microservices paradigm

Asked 31/8, 2019 at 14:20 Answered 4/1, 2023 at 5:29

I am migrating a huge monolithic application to a microservices architecture. Needless to say, identifying the domains, mapping them to different microservices, and orchestrating them has been quite a task. The previous application shared its master data in a single schema, which is difficult for me to manage in the new paradigm. My choices are:

Replicate the same master data in each microservice: Pros: when cached in the application it is fast and needs no lookups; each application acts as its own source of truth. Cons: any update to the master data in one service can cause serious inconsistencies when the services communicate with each other using this data.
Have the master data hosted as a separate microservice: Pros: single source of master data. Cons: a performance hit, since every lookup is a service call over the wire.
Create a distributed cache and expose it to multiple microservices: this would break the "Single Source of Truth" principle of data for microservices, but as a write-through implementation it could ensure performance and consistency.

Any thoughts on the above, or any implementation strategies, would really help.

Vaibhav

Postbox answered 31/8, 2019 at 14:20 Comment(1)

You are building a distributed monolith. Any of your three approaches will lead to failure, because the microservices are tied to each other by shared data. Only if you manage to split your application into separate, independent parts will you truly reap the benefits of microservices. – Suddenly 31/8, 2019 at 18:2

The solution to this dilemma depends on some information about your current architecture.

How do your micro-services communicate with each other? Are you using commands/queries as direct calls, and events over some queue?
How big is your master data? Is it some sort of configuration, or a small amount of cached data used as constants or settings?

If at least one of your communication mechanisms is asynchronous, with events coming from some queue, and you are not dealing with a huge amount of data that changes very frequently, then my recommendation would be to:

1. Create a dedicated master-data-micro-service. This micro-service would be the owner of your master data. It would be the only one allowed to change the entities inside it directly.

2. Publish events to a queue for every entity change in the master-data-micro-service. Whenever someone creates, updates or deletes an entity in the master-data-micro-service, it would publish an event about that change to some queue.

3. Subscribe to the master-data-micro-service events. Every other micro-service that needs master data would subscribe to the events of the entities it uses and save them locally in its own database. This data, or subset of the master data, would be kept as a copy for local use. These local entities may only be changed by those events, that is, when their "source of truth", the master-data-micro-service, publishes that they have changed. Any other kind of change would be forbidden, because it would make the local copy diverge from its source of truth in the master-data-micro-service.
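The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: `InMemoryQueue` is a stand-in for whatever broker you actually use, it delivers events synchronously, and all class, method and entity names (`MasterDataService`, `OrderService`, `"country"`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MasterDataEvent:
    entity_type: str              # e.g. "country" (hypothetical entity)
    entity_id: str
    action: str                   # "upserted" or "deleted"
    payload: Optional[dict] = None


class InMemoryQueue:
    """Stand-in for a real message broker; delivers events synchronously."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[MasterDataEvent], None]]] = defaultdict(list)

    def subscribe(self, entity_type: str, handler: Callable[[MasterDataEvent], None]) -> None:
        self._subscribers[entity_type].append(handler)

    def publish(self, event: MasterDataEvent) -> None:
        for handler in self._subscribers[event.entity_type]:
            handler(event)


class MasterDataService:
    """Step 1 and 2: the only owner of master data; publishes every change."""

    def __init__(self, queue: InMemoryQueue) -> None:
        self._queue = queue
        self._store: Dict[Tuple[str, str], dict] = {}

    def upsert(self, entity_type: str, entity_id: str, payload: dict) -> None:
        self._store[(entity_type, entity_id)] = payload
        self._queue.publish(MasterDataEvent(entity_type, entity_id, "upserted", payload))

    def delete(self, entity_type: str, entity_id: str) -> None:
        self._store.pop((entity_type, entity_id), None)
        self._queue.publish(MasterDataEvent(entity_type, entity_id, "deleted"))


class OrderService:
    """Step 3: a consumer keeping a local copy of only the entities it needs."""

    def __init__(self, queue: InMemoryQueue) -> None:
        self._countries: Dict[str, dict] = {}
        queue.subscribe("country", self._on_country_event)

    def _on_country_event(self, event: MasterDataEvent) -> None:
        # The local copy is changed only here, in reaction to owner events.
        if event.action == "deleted":
            self._countries.pop(event.entity_id, None)
        else:
            self._countries[event.entity_id] = event.payload

    def country_name(self, code: str) -> Optional[str]:
        country = self._countries.get(code)
        return country["name"] if country else None
```

After `master.upsert("country", "DE", {"name": "Germany"})`, the order service can answer `country_name("DE")` from its own copy without calling the owner. Events for entity types it never subscribed to are simply not delivered to it. With a real broker the delivery would be asynchronous, so the copy is eventually consistent.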

Pros:

With this approach you would have only one source of truth for your master data. Every other micro-service would use only the data, or subset of data, it needs from the master-data-micro-service and could simply ignore the rest. Another advantage is that each micro-service would be able to operate on its own, without calling the master-data-micro-service directly to get the data it needs.

Cons

The drawback is that you would need to duplicate data across multiple micro-services. The other problem is that you need to deal with the complexity of a distributed system, but you are already doing this ;)

Some comments on your provided choices:

Replicate the same master data in each microservice: Pros: when cached in the application it is fast and needs no lookups; each application acts as its own source of truth. Cons: any update to the master data in one service can cause serious inconsistencies when the services communicate with each other using this data.

My suggestion above already partly covers this approach, only without the direct calls. The assumption was that you would use a queue. Even if you don't use a queue, you could notify the micro-services that depend on the master-data-micro-service through some notification system, and then, and only then, let them call the master-data-micro-service to get the latest data. They should not make a call for every operation inside the micro-service that requires master data. That would be very inefficient.
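A minimal sketch of that "notify, then fetch" variant, assuming a hypothetical `fetch` callable that wraps the direct call to the master-data-micro-service and a hypothetical `on_change_notification` hook wired to whatever notification system you have:

```python
from typing import Callable, Dict, Set


class NotifiedMasterDataClient:
    """Keeps a local copy and calls the owner only after a change notification."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch               # hypothetical remote call to the owner
        self._local: Dict[str, dict] = {}
        self._stale: Set[str] = set()

    def on_change_notification(self, entity_id: str) -> None:
        # Only mark the entry as stale; the refresh happens on the next read.
        self._stale.add(entity_id)

    def get(self, entity_id: str) -> dict:
        if entity_id not in self._local or entity_id in self._stale:
            self._local[entity_id] = self._fetch(entity_id)
            self._stale.discard(entity_id)
        return self._local[entity_id]
```

Repeated `get` calls for the same entity hit the remote service once, and again only after a notification arrives for that entity.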

Have the master data hosted as a separate microservice: Pros: single source of master data. Cons: a performance hit, since every lookup is a service call over the wire.

My suggested approach above combines this with your first point about replicating data in each micro-service.

Create a distributed cache and expose it to multiple microservices: this would break the "Single Source of Truth" principle of data for microservices, but as a write-through implementation it could ensure performance and consistency.

I would not recommend doing this, for many reasons, some of which you already mentioned. One thing to consider is that you would have a single shared point of failure for multiple micro-services. That goes against one of the main principles of micro-services.

Mover answered 31/8, 2019 at 17:56 Comment(2)

The complexity of the distributed system :) indeed, and I believe this is one of the few ways out in which the system has more control over the data. Eventual consistency is something to look out for, but in my case, since the updates are very infrequent or sometimes even non-existent, this works! – Postbox 10/9, 2019 at 4:0

Great, hope it helped :) – Mover 16/9, 2019 at 14:38

One approach that we followed was something like the below:

Create logical groupings of master data entities, so that we don't end up creating one SUPER BIG MONOLITH microservice.
Provide management (Create / Update / Delete / Read) of each logical group of master entities through a microservice. We had 5 to 6 microservices, each managing a different logical group of master data entities.
Whenever a functional module requested a master data entity, it first looked it up in the Redis cache. If it was not found there, the module invoked the Fetch API of the microservice corresponding to that logical group of reference data.
The Fetch API of the microservice also put the master data entity into the Redis cache. This way, any subsequent request for the same entity was served from the Redis cache for all functional modules.
The Redis cache was updated either when a request came in to the Fetch API or when the value objects were updated.
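The lookup flow described above can be sketched as follows. To keep the example self-contained, `FakeRedis` is a dict-backed stand-in that only mimics `get`/`set`, not a real Redis client, and the names `ReferenceDataService`, `lookup` and the `group:entity_id` key format are assumptions made for illustration.

```python
from typing import Dict, Optional


class FakeRedis:
    """Dict-backed stand-in for a shared cache with get/set semantics."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class ReferenceDataService:
    """Owns one logical group of master data (for example "geo")."""

    def __init__(self, group: str, cache: FakeRedis,
                 initial: Optional[Dict[str, str]] = None) -> None:
        self.group = group
        self._cache = cache
        self._db: Dict[str, str] = dict(initial or {})
        self.fetch_calls = 0  # counts how often the Fetch API was hit

    def _key(self, entity_id: str) -> str:
        return f"{self.group}:{entity_id}"

    def fetch(self, entity_id: str) -> Optional[str]:
        """Fetch API: read from the owning store and populate the cache."""
        self.fetch_calls += 1
        value = self._db.get(entity_id)
        if value is not None:
            self._cache.set(self._key(entity_id), value)
        return value

    def update(self, entity_id: str, value: str) -> None:
        """Updates refresh the cache too, so readers do not see stale values."""
        self._db[entity_id] = value
        self._cache.set(self._key(entity_id), value)


def lookup(cache: FakeRedis, service: ReferenceDataService, entity_id: str) -> Optional[str]:
    """What a functional module does: cache first, Fetch API on a miss."""
    cached = cache.get(f"{service.group}:{entity_id}")
    if cached is not None:
        return cached
    return service.fetch(entity_id)
```

Note that the caller has to know the key format to read the cache directly, which is the drawback listed under Cons below.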

Pros

All value objects were accessed centrally from the Redis cache.
The Redis cache provided faster reads than going through the Fetch API of the master data entity every time.

Cons

The key that identifies an individual master data entity has to be communicated to the other functional modules.

Artful answered 21/5, 2021 at 8:45 Comment(0)

There is no one-size-fits-all way to manage every kind of master data entity. This is what I have set up and followed in my previous projects:

Segregate master data by behavior / usage pattern rather than by entity design. E.g. if users search for master data with a 'type as you go' feature, such as city name or zip code, use a text-based search tool like Redis/Elastic. Both can act as persistent storage.
Don't mix functionality with master data. E.g. if we have created City master data, don't combine it with a business feature such as whether the city is serviceable.
A tip: many times, multiple master data types can be stored in just two entities (this is useful if we are using an RDBMS).
E.g. say we have City, State and ZipCode master data. Create two tables: one for master data types/categories and one for the master data itself. We then need only one query to fetch the records of any type.
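A minimal sketch of the two-table layout, using SQLite from the Python standard library. The table and column names are assumptions, and the `parent_id` column is one possible way to get the single level of hierarchy mentioned under the pros below (cities filtered by state); the answer itself does not specify the schema.

```python
import sqlite3
from typing import List, Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master_data_type (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE              -- e.g. 'state', 'city', 'zip_code'
);
CREATE TABLE master_data (
    id        INTEGER PRIMARY KEY,
    type_id   INTEGER NOT NULL REFERENCES master_data_type(id),
    parent_id INTEGER REFERENCES master_data(id),  -- assumed: one-level hierarchy
    code      TEXT NOT NULL,
    label     TEXT NOT NULL,
    UNIQUE (type_id, code)
);
""")
conn.executemany(
    "INSERT INTO master_data_type (id, name) VALUES (?, ?)",
    [(1, "state"), (2, "city")],
)
conn.executemany(
    "INSERT INTO master_data (id, type_id, parent_id, code, label) VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, None, "MH", "Maharashtra"),
        (2, 1, None, "KA", "Karnataka"),
        (3, 2, 1, "PNQ", "Pune"),
        (4, 2, 1, "BOM", "Mumbai"),
        (5, 2, 2, "BLR", "Bengaluru"),
    ],
)


def fetch_labels(db: sqlite3.Connection, type_name: str,
                 parent_code: Optional[str] = None) -> List[str]:
    """One query serves every master data type, optionally filtered by parent."""
    sql = """
        SELECT m.label
        FROM master_data m
        JOIN master_data_type t ON t.id = m.type_id
        LEFT JOIN master_data p ON p.id = m.parent_id
        WHERE t.name = ? AND (? IS NULL OR p.code = ?)
        ORDER BY m.label
    """
    return [row[0] for row in db.execute(sql, (type_name, parent_code, parent_code))]
```

Adding a new master data type is then a row in `master_data_type` rather than a new table and new code, for example `fetch_labels(conn, "city", "MH")` returns the cities of one state using the same query that lists all states.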

Pros of the above approach

A consistent access pattern for master data based on behavior, which limits the number of services required to access it.
Reduces the need to define multiple entities for different master data by converting them into one consistent master data template. See #3 above.
Reduces the effort of modifying code every time a new entity is added.
Supports 1 level of hierarchy, e.g. filtering cities by the selected state.

Cons:

Only supports 1 level of hierarchy. If a more complex filtering technique is required, I would suggest separating the model based on behavior (#1).

I don't see a benefit in using your options #1 and #3. I also don't see a performance hit, as most of the time these calls can be made in parallel with other service calls. Use choreography rather than orchestration.

Antispasmodic answered 4/1, 2023 at 5:29 Comment(0)
