When any one of the microservice is down, Interaction between services becomes very critical as isolation of failure, resilience and fault tolerance are some of key characteristics for any microservice based architecture.
Totally agreed what @jayant had answered, in your case Implementing proper fallback mechanism makes more sense and you can implement required logic you wanna write based on use case and dependencies between M1, M2 and M3.
you can also raise events in your fallback if needed.
Since you are new to microservice, you need to know below common techniques and architecture patterns for resilience and fault tolerance against the situation which you have raised in your question. And here you are using Spring-Boot, you can easily add Netflix-OSS in your microservices.
Netflix has released Hystrix, a library designed to control points of access to remote systems, services and 3rd party libraries, providing greater tolerance of latency and failure.
It include below important characteristics:
- Importance of Circuit breaker and Fallback Mechanism:
Hystrix implements the circuit breaker pattern which is useful when a
service failure can cause cascading failure all the way up to the user.
When calls to a particular service exceed
circuitBreaker.requestVolumeThreshold
(default: 20 requests) and the
failure percentage is greater than
circuitBreaker.errorThresholdPercentage
(default: >50%) in a rolling
window defined by metrics.rollingStats.timeInMilliseconds
(default: 10
seconds), the circuit opens and further calls are not made.
In cases of error and an open circuit, a fallback can be provided by the
developer. Fallbacks may be chained so that the first fallback makes
some other business call. check out Fallback Implementation of Hystrix
When a request fails, you may want to have the request be retried
automatically. Ribbon does this job for us.
In distributed system, a microservices system retry can trigger multiple
other requests or retries and start a cascading effect
here are some properties to look of Ribbon
sample-client.ribbon.MaxAutoRetries=1
Max number of next servers to retry (excluding the first server)
sample-client.ribbon.MaxAutoRetriesNextServer=1
Whether all operations can be retried for this client
sample-client.ribbon.OkToRetryOnAllOperations=true
Interval to refresh the server list from the source
sample-client.ribbon.ServerListRefreshInterval=2000
More details :- ribbon properties
In general, the goal of the bulkhead pattern is to avoid faults in one
part of a system to take the entire system down. bulkhead pattern
The bulkhead implementation in Hystrix limits the number of concurrent
calls to a component. This way, the number of resources (typically
threads) that is waiting for a reply from the component is limited.
Assume you have a request based, multi threaded application (for example
a typical web application) that uses three different components, M1, M2,
and M3. If requests to component M3 starts to hang, eventually all
request handling threads will hang on waiting for an answer from M3.
This would make the application entirely non-responsive. If requests to
M3 is handled slowly we have a similar problem if the load is high
enough.
Implementation details can be found here
So, These are some factors you need to consider while handling microservice Interaction when one of the microservice is down.