How should I healthcheck an event-driven service?

4

2

Suppose I have a service which, rather than listening for HTTP requests or gRPC procedure calls, only consumes messages from a broker (Kafka, RabbitMQ, Google Pub/Sub, what have you). How should I go about healthchecking the service (e.g. k8s liveness and readiness probes)?

Should the service also listen for HTTP solely for the purpose of healthchecking, or is there some other technique which can be used?

Luing answered 22/7, 2021 at 12:4 Comment(0)
3

Having the service listen for HTTP solely to expose a liveness/readiness check isn't really a problem, and it also opens up the potential to expose diagnostic and control endpoints. Note, though, that in services that pull input from a message broker, readiness isn't necessarily something that a container scheduler like k8s would be concerned with.
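
As a rough illustration of the idea, here is a minimal sketch using only the Python standard library. The `/healthz` path, port 8080 and the `healthy` flag are assumptions made for the example, not something prescribed by Kubernetes or by any broker client; the consumer loop would be expected to clear the flag when it detects a fatal condition.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical flag: the consumer loop clears it when it considers itself broken.
healthy = threading.Event()
healthy.set()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":  # assumed path, pick whatever your probe uses
            status, body = 404, b"not found"
        elif healthy.is_set():
            status, body = 200, b"ok"
        else:
            status, body = 503, b"unhealthy"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe requests out of the service log


def start_health_server(port=8080):
    """Serve the health endpoint from a daemon thread next to the consumer loop."""
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The service would call `start_health_server()` once at startup and then run its normal message loop; an HTTP probe pointed at the assumed path and port would see 200 while the flag is set and 503 otherwise.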

Freedwoman answered 22/7, 2021 at 13:2 Comment(0)
1

Kubernetes supports three different types of probes (see also the Kubernetes docs):

  • Running a command
  • Making an HTTP request
  • Checking a TCP socket

So, in your case you can run a command that fails when your service is unhealthy.
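
One way such a command could work, sketched under assumptions: the consumer touches a heartbeat file after each successful poll, and the probe command fails when the file is stale. The file path, the 60-second threshold and the function names below are hypothetical choices for this example.

```python
import os
import time

HEARTBEAT_FILE = "/tmp/consumer-heartbeat"  # assumed path
MAX_AGE_SECONDS = 60  # assumed threshold; tune to your poll interval


def touch_heartbeat(path=HEARTBEAT_FILE):
    """Called by the consumer loop after each successful poll."""
    with open(path, "a"):
        os.utime(path, None)  # set the modification time to now


def is_alive(path=HEARTBEAT_FILE, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if the heartbeat file was updated within max_age seconds."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) <= max_age
    except OSError:
        return False  # no heartbeat written yet, or the file is unreadable


def main():
    """Exit code for the probe command: 0 means healthy, 1 means unhealthy."""
    return 0 if is_alive() else 1
```

If this were saved as a module named `healthcheck` (again an assumption), the probe command could be something like `python -c "import sys, healthcheck; sys.exit(healthcheck.main())"`, since a command probe treats exit code 0 as success.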

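Also be aware that liveness probes can be dangerous to use: a failed liveness probe makes the kubelet restart the container, so a probe that is too strict or that depends on an external system can cause restart loops.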

Contuse answered 22/7, 2021 at 12:23 Comment(0)
1

I'm going to give my take rather than a proven solution, since I don't see much documentation about this use case in the Kubernetes community.

I think the answers provided so far are a bit misleading, since they do not really focus on the actual challenge. Event-based services do not get any incoming traffic: they do not receive requests as regular web services do, so Kubernetes Ingress Controller semantics do not apply. Rather, these services send outgoing requests to queue systems to poll for the next message to process (pull mode).

In the k8s docs, readiness is defined as "A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services." https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-readiness-probes

So we can see that readiness does not fully cover event-based services, because there is no traffic to receive.

I see 3 possible solutions to this challenge:

  1. Internally design a service status facility with heartbeats to external collaborators, exposing metrics and stopping queue listeners accordingly. This is a completely ad-hoc approach with no K8s involvement. The service may expose its internal heartbeat status via metrics to facilitate monitoring/alerting.
  2. Introduce some sort of message broker, turning your service into a push-based one instead of a pull-based one. Now your service is actually getting incoming traffic, so you can apply the regular K8s approach. This decouples the message processing logic from the message polling logic. It would be a radical redesign of your solution, and it adds an additional layer to be scaled/monitored.
  3. Implement a regular readiness probe based on external resource states. Use some sort of outgoing proxy agent similar to [Envoy proxy][1], where the proxy denies the next queue poll request based on the readiness state. This is finally a k8s-oriented approach with light impact, and the resulting agent can be reused for multiple pull-based services as a cross-cutting concern.
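
To make option 1 a bit more concrete, here is a minimal sketch of a status facility that records heartbeats from external collaborators and pauses polling while any of them is stale. `ServiceStatus`, `consume`, and the `poll`/`handle` callables are hypothetical names for this illustration; a real service would plug in its broker client and its own dependency checks.

```python
import time


class ServiceStatus:
    """Tracks the last successful heartbeat of each external collaborator."""

    def __init__(self, collaborators, max_age=30.0, clock=time.monotonic):
        self._clock = clock
        self._max_age = max_age
        self._last_ok = {name: None for name in collaborators}

    def record_heartbeat(self, name):
        """Call after a successful check of the named collaborator (DB, API, ...)."""
        self._last_ok[name] = self._clock()

    def is_ready(self):
        """Ready only if every collaborator has a sufficiently recent heartbeat."""
        now = self._clock()
        return all(
            ts is not None and now - ts <= self._max_age
            for ts in self._last_ok.values()
        )


def consume(poll, handle, status, max_iterations, idle_sleep=1.0, sleep=time.sleep):
    """Poll for messages only while the service reports itself ready."""
    for _ in range(max_iterations):
        if not status.is_ready():
            sleep(idle_sleep)  # back off instead of pulling a message we cannot process
            continue
        message = poll()  # assumed to return None when the queue is empty
        if message is not None:
            handle(message)
```

The same `is_ready()` result could also be exported as a metric or served from a health endpoint, which is roughly where options 1 and 3 meet.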

My preference is option 3, but the lack of specific tech/tools on the market for each messaging system, depending on the network protocol (Kafka, MQTT, AMQP, CoAP, NATS...), is an issue. For messaging systems supporting an HTTP interface it may be more straightforward, but the solution will not benefit from the specific protocol improvements.

[1]: https://www.envoyproxy.io/

Pochard answered 29/1 at 10:37 Comment(0)
0

For Eclipse/Californium (CoAP/DTLS 1.2 CID) I've implemented such an HTTPS endpoint for k8s support. It's not too big a thing, and the HTTPS endpoint needs no route to external stuff. It reflects the internal diagnostic state, so it's not directly the "CoAP/DTLS 1.2 CID" health, but close enough to be very helpful.

Veiled answered 31/1 at 12:56 Comment(0)
