Some context about the setup:
We're switching from NTLM to Kerberos (Negotiate) for service-to-service authentication between various .NET workloads (e.g. an IIS-hosted web API, or a simple .NET command-line program).
For any call from client to server, there is an API gateway in the middle. We have some custom logic in the gateway for doing the authentication and enforcing Kerberos (rejecting Negotiate headers with an NTLM ticket). The healthy client-server flow looks something like this:
1. Client (C) sends a request to server (S)
2. Gateway (G) intercepts the request
3. (G) returns a 401 challenge with WWW-Authenticate: Negotiate
4. (C) sends the request again, with an Authorization: Negotiate [ticket] header
5. (G) inspects [ticket] and:
   - 5.a If [ticket] is NTLM: "reject" the request (return non-success status code)
   - 5.b If [ticket] is Kerberos: validate ticket and (if valid) pass the request on to (S)
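To illustrate the inspection in step 5, here is a minimal Python sketch of a first-pass check on the `Authorization: Negotiate` value. It is not the gateway's actual code; the function name and return labels are hypothetical. It relies on two facts: an NTLM message carries the 8-byte signature `NTLMSSP\0` (so a raw NTLM token base64-encodes to something starting with `TlRMTVNT`), and a GSS-API initial token (SPNEGO or Kerberos) starts with the byte `0x60`. Searching for the signature anywhere in the blob is a heuristic that also catches NTLM wrapped in SPNEGO; it does not validate anything.

```python
import base64
import binascii

# Every NTLM message starts with this signature.
NTLMSSP_SIGNATURE = b"NTLMSSP\x00"
# A GSS-API initial context token (SPNEGO or raw Kerberos) starts with this tag.
GSSAPI_INITIAL_TOKEN_TAG = 0x60


def classify_negotiate_header(authorization: str) -> str:
    """Return 'ntlm', 'gssapi' or 'unknown' for an Authorization header value.

    Heuristic only: 'gssapi' means "looks like SPNEGO/Kerberos", the token
    still has to be validated properly before the request is forwarded.
    """
    scheme, _, blob = authorization.strip().partition(" ")
    if scheme.lower() != "negotiate" or not blob.strip():
        return "unknown"
    try:
        raw = base64.b64decode(blob.strip(), validate=True)
    except (binascii.Error, ValueError):
        return "unknown"
    if NTLMSSP_SIGNATURE in raw:
        return "ntlm"
    if raw and raw[0] == GSSAPI_INITIAL_TOKEN_TAG:
        return "gssapi"
    return "unknown"
```

With a check like this, case 5.a corresponds to the `'ntlm'` result and case 5.b to `'gssapi'` followed by real ticket validation.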
Now, to avoid a big-bang change, we can configure (in the gateway) which requests this Kerberos check applies to, based on the original destination of the request from (C), which should be roughly the hostname and port of (S).
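As a rough sketch of that per-destination switch (in Python, with made-up hostnames; the real gateway configuration format is not shown here and all names below are hypothetical), the decision boils down to normalizing the destination into a host/port pair and looking it up in an allow-list. The sketch assumes plain hostnames and a default port of 443, and does not handle IPv6 literals.

```python
# Hypothetical set of destinations for which the Kerberos check is enabled.
ENFORCED_DESTINATIONS = {
    ("orders.example.internal", 443),
    ("billing.example.internal", 8443),
}


def parse_destination(host_header: str, default_port: int = 443) -> tuple[str, int]:
    """Split 'host[:port]' into a lowercase host and an integer port."""
    value = host_header.strip().lower()
    host, sep, port = value.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    # No explicit port: fall back to the assumed default.
    return value, default_port


def should_enforce_kerberos(host_header: str) -> bool:
    """True if requests to this destination must present a Kerberos ticket."""
    return parse_destination(host_header) in ENFORCED_DESTINATIONS
```

Enabling the check for another (S) then means adding one more host/port pair to the set.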
This setup works fine, but there is this occasional hard-to-replicate issue:
- Occasionally, for some (S), when we enable the Kerberos check in (G), the client (C) keeps sending NTLM tickets (and therefore gets rejected).
- This happens even though all the prerequisites for (C) to talk Kerberos to (G) are met. For example, it is possible to run `klist get HTTP/spn-of-G` from (C) and receive a proper Kerberos ticket, even when impersonating the exact same user that (C) would normally run as.
- On top of that, there are often other applications on the same server as (C) that go through the same flow just fine.
- A reboot of the Windows Server instance that (C) is running on fixes this, making (C) send proper Kerberos ticket to (G) after the restart
My question is: is there any other way to fix such a situation, without rebooting the server?
Things I already tried without success:
- Restarting the application running on (C). When (C) is an IIS application, I tried restarting the app pool, or `iisreset`. But I've also seen this issue happen when (C) is a C# command-line program that runs to completion every 15 minutes.
- Flushing DNS on the server where (C) runs, with `ipconfig /flushdns`
- Purging all cached Kerberos tickets on the server where (C) runs (with `klist purge` executed for all logon sessions using a PowerShell script)
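For reference, the "purge for all logon sessions" step can be sketched as follows. This is a Python illustration rather than the PowerShell script actually used. It assumes that `klist sessions` prints each session's logon ID in the form `0:0x3e7` and that `klist -li <id> purge` purges the ticket cache of that session; it must run elevated on the Windows server. Function names are hypothetical.

```python
import re
import subprocess

# Assumed shape of a logon ID in `klist sessions` output, e.g. "0:0x3e7".
LUID_PATTERN = re.compile(r"\b0:(0x[0-9a-fA-F]+)\b")


def extract_logon_ids(klist_sessions_output: str) -> list[str]:
    """Return the distinct low-part logon IDs, in order of first appearance."""
    seen: list[str] = []
    for luid in LUID_PATTERN.findall(klist_sessions_output):
        luid = luid.lower()
        if luid not in seen:
            seen.append(luid)
    return seen


def purge_all_sessions() -> None:
    """Run `klist -li <id> purge` for every logon session on this machine."""
    listing = subprocess.run(
        ["klist", "sessions"], capture_output=True, text=True, check=True
    ).stdout
    for luid in extract_logon_ids(listing):
        # A failure for one session should not stop the others.
        subprocess.run(["klist", "-li", luid, "purge"], check=False)
```

Calling `purge_all_sessions()` from an elevated prompt covers service accounts and the SYSTEM session as well as interactive users, which a plain `klist purge` does not.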
[...] `WWW-Authenticate: NTLM`. Then, we flip a switch in the gateway, which then starts sending `WWW-Authenticate: Negotiate` instead of `WWW-Authenticate: NTLM`. At this point, the application (client) starts receiving `WWW-Authenticate: Negotiate` and should start sending a Kerberos ticket, but it doesn't. When the server is restarted, that fixes it. – Adames

[...] `localdatetime` using powershell and `gwmi` and the clocks looked pretty synced to me (with 1s precision at least, not sure whether that's enough). – Adames