Our on-premise Kubernetes/Kubespray cluster has suddenly stopped routing traffic between the nginx-ingress and node port services. All external requests to the ingress endpoint return a "504 - gateway timeout" error.
How do I diagnose what has broken?
I've confirmed that the containers/pods are running, the node application has started and if I exec into the pod then I can run a local curl command and get a response from the app.
I've checked the logs on the ingress pods and traffic is arriving and nginx is trying to forward the traffic on to the service endpoint/node port but it is reporting an error.
I've also tried to curl directly to the node via the node port but I get no response.
I've looked at the ipvs configuration and the settings look valid (e.g. there are rules for the node to forward traffic on the node port the service endpoint address/port)