Our project runs on OpenShift (OKD, to be precise, which we host ourselves). The setup is as follows:
- Routing server (Spring Boot 1.5.8 with Zuul): takes all incoming traffic and routes it to the correct services.
- Multiple services (all Spring Boot): contain all the business logic.
We use SOAP for calling other services in this project.
Currently, when we call the application, the call goes to the routing server, which then routes it to the main business service.
After about one hour of inactivity, our main business service is no longer reachable via external calls. The edge server (the routing server), however, stays available and callable 100% of the time. When we call the business service, we get a 504 Gateway Timeout from the system. We already figured out that this is the route timeout in OpenShift (the haproxy.router.openshift.io/timeout annotation on the route).
The core problem is that OpenShift seems to hibernate the main business service after about one hour of inactivity. After a delay of 15 minutes, however, the calls seem to reach their destination and the data is processed correctly.
How can we turn this behaviour off?
Edit 1:
- We run the same application on normal, "old-fashioned" VMs in production and don't have any problems there.
- We noticed that the services can be "kept alive" if we call them regularly. We built a small service that calls them regularly (every 15 min), and with that it seems to work. But IMO this is not a production-ready workaround.
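A minimal sketch of such a keep-alive caller, assuming plain Java 8+ with no extra dependencies. The class name `KeepAlivePinger` and the target URL are hypothetical placeholders, not our real service; it simply sends a GET request every 15 minutes and logs the HTTP status.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical keep-alive caller: pings a service so it never sits idle for long. */
public class KeepAlivePinger {

    /** Sends one GET request and returns the HTTP status code (4xx/5xx are returned, not thrown). */
    public static int ping(URL url, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect(); // we only need the status code, so release the connection
        }
    }

    /** Calls ping() at a fixed interval on a single background thread. */
    public static ScheduledExecutorService start(URL url, long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                System.out.println("keep-alive " + url + " -> HTTP " + ping(url, 10_000));
            } catch (IOException | RuntimeException e) {
                // Catch everything: an uncaught exception would cancel all later runs
                System.err.println("keep-alive " + url + " failed: " + e);
            }
        }, 0, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default URL: replace with the business service's route or health endpoint
        String target = args.length > 0 ? args[0] : "http://business-service.example/health";
        start(URI.create(target).toURL(), 15); // every 15 minutes, as in our workaround
    }
}
```

The task catches its own exceptions because `scheduleAtFixedRate` suppresses all further executions once a run throws. As said above, this only hides the symptom; it does not explain why the service stops responding.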
Edit 2:
Our pod config (some names are anonymized):
https://gist.github.com/moritzluedtke/6867499b0acbb2d7b5a9a70e49b0d45c
We do not use autoscaler.
Edit 3:
Our deployment configs (some names are anonymized):
https://gist.github.com/moritzluedtke/dc7c1078fe9cc7e4aeb737094849fc1b
OpenShift Master: v3.11.0+1c3e643-87
Kubernetes Master: v1.11.0+d4cacc0
OpenShift Web Console: v3.11.0+ea42280
Edit 4:
It seems that this is not a problem with OpenShift but rather with our tech stack. I will update this question as soon as we have a solution.