Client sometimes keeps negotiating NTLM after Kerberos has been enabled, until the client's server is rebooted. How can the reboot be avoided?

Some context about the setup:

We're switching from NTLM to Kerberos (via Negotiate) for service-to-service authentication between various .NET workloads (e.g. IIS-hosted web APIs or simple .NET command-line programs).

Every call from client to server goes through an API gateway in the middle. Custom logic in the gateway performs the authentication and enforces Kerberos (rejecting Negotiate headers that carry an NTLM ticket). The healthy client-server flow looks something like this:

  1. Client (C) sends request to server (S)
  2. Gateway (G) intercepts the request
  3. (G) returns a 401 challenge with WWW-Authenticate: Negotiate
  4. (C) sends the request again, with an Authorization: Negotiate [ticket] header
  5. (G) inspects [ticket] and:
    5.a If [ticket] is NTLM: "reject" the request (return a non-success status code)
    5.b If [ticket] is Kerberos: validate the ticket and, if it is valid, pass the request on to (S)

To avoid a big-bang change, the gateway lets us configure which requests the Kerberos check applies to, based on the original destination of the request from (C), which is roughly the hostname and port of (S).
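A minimal Python sketch of the per-destination switch, combined with the challenge/reject decision from steps 2 to 5. It assumes the gateway keys the switch on a normalized (host, port) pair and, as one of the comments in this thread describes, offers WWW-Authenticate: NTLM until the switch is flipped and WWW-Authenticate: Negotiate afterwards. GatewayPolicy, decide, and the outcome strings are hypothetical, the NTLM detection is a simple signature heuristic, and the actual Kerberos ticket validation of step 5.b is not shown.

```python
import base64
from typing import Dict, Optional, Set, Tuple

NTLMSSP_SIGNATURE = b"NTLMSSP\x00"


def _carries_ntlm(authorization: str) -> bool:
    """Heuristic: NTLM scheme, or a Negotiate token containing an NTLM message."""
    scheme, _, b64 = authorization.strip().partition(" ")
    if scheme.lower() == "ntlm":
        return True
    try:
        token = base64.b64decode(b64.strip(), validate=True)
    except ValueError:
        return False  # malformed: left to the (not shown) validation step
    return NTLMSSP_SIGNATURE in token


class GatewayPolicy:
    """Hypothetical per-destination Kerberos switch for the gateway (G)."""

    def __init__(self) -> None:
        self._enforced: Set[Tuple[str, int]] = set()

    @staticmethod
    def _key(host: str, port: int) -> Tuple[str, int]:
        # Normalize the original destination so "S.example.com." matches "s.example.com".
        return host.strip().lower().rstrip("."), port

    def enable_kerberos(self, host: str, port: int) -> None:
        self._enforced.add(self._key(host, port))

    def disable_kerberos(self, host: str, port: int) -> None:
        self._enforced.discard(self._key(host, port))

    def decide(self, host: str, port: int,
               authorization: Optional[str]) -> Tuple[str, Dict[str, str]]:
        enforced = self._key(host, port) in self._enforced
        if not authorization:
            # Steps 2-3: challenge with Negotiate when enforced, NTLM otherwise.
            return "challenge", {"WWW-Authenticate": "Negotiate" if enforced else "NTLM"}
        if enforced and _carries_ntlm(authorization):
            return "reject", {}  # step 5.a
        return "authenticate", {}  # step 5.b: validate the ticket, then forward to (S)
```

Keying on the normalized destination keeps the toggle independent of how (C) spells the hostname, which matters when the rollout is done one (S) at a time.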

This setup works fine, but there is an occasional, hard-to-reproduce issue:

  • Occasionally, for some (S), when we enable the Kerberos check in (G), the client (C) keeps sending NTLM tickets (and therefore gets rejected).
  • This happens even though all the prerequisites for (C) to talk Kerberos to (G) are met: e.g. it is possible to run klist get HTTP/spn-of-G on (C) and receive a proper Kerberos ticket, even when impersonating the exact same user that (C) normally runs as.
  • On top of that, there are often other applications on the same server as (C) that go through the same flow just fine.
  • Rebooting the Windows Server instance that (C) runs on fixes this: after the restart, (C) sends a proper Kerberos ticket to (G).

My question is: is there any other way to fix this situation without rebooting the server?

Things I already tried without success:

  • Restarting the application running on (C). Where (C) is an IIS application, I tried restarting the app pool and running iisreset. But I've also seen this issue where (C) is a C# command-line program that runs to completion every 15 minutes.
  • Flushing DNS on the server where (C) runs, with ipconfig /flushdns
  • Purging all cached Kerberos tickets on the server where (C) runs (with klist purge executed for all logon sessions using a PowerShell script)
Adames answered 22/7, 2019 at 9:0

Comments:
Maurer: The only reason Windows will switch from Kerberos to NTLM (when Kerberos already succeeded) is that the client cannot reach the DC to get a ticket. You can potentially enable Netlogon logs to see whether they indicate why it fell back. More importantly, also check that it's actually an NTLM token being sent, not something that just doesn't look like Kerberos. This matters because Negotiate has layers and may wrap a Kerberos ticket differently depending on context.
Adames: OK, a bit of clarification perhaps. In this case Windows is not switching back from Kerberos to NTLM. At first the application does NTLM, because our gateway logic sends it WWW-Authenticate: NTLM. Then we flip a switch in the gateway, which then starts sending WWW-Authenticate: Negotiate instead of WWW-Authenticate: NTLM. At this point the application (client) starts receiving WWW-Authenticate: Negotiate and should start sending a Kerberos ticket, but it doesn't. Restarting the server fixes it.
Adames: Other clients (different applications), even those hosted on the exact same server, switch from sending NTLM tickets to sending Kerberos tickets automatically, without any issues, towards the same gateway.
Maurer: Why send NTLM in the first place and then flip? Just always send Negotiate. Negotiate will do NTLM internally if it needs to.
Adames: It is necessary because of the way the gateway and the downstream web services are set up, so that we can toggle Kerberos on/off without also having to remove the SPN. I don't believe it's relevant to the issue, however. Switching from NTLM to Negotiate works fine 99% of the time. And even then, the issue is that during Negotiate an NTLM ticket is being sent when there is no reason for it not to be Kerberos. I suspect some caching in Windows, since a restart always fixes it. I'm looking for things I could do (other than the ones I listed) that would help me avoid the restart.
Provost: Have you checked the clock on (C)? System clock, time of day. It's possible that it only does an NTP update on reboot and not periodically, and that the hardware clock on (C) drifts. Mismatched timestamps kill Kerberos tickets and are sometimes the cause of intermittent problems.
Adames: @StephanSamuel Thanks for the suggestion. I believe I tried that the last time it happened, checking LocalDateTime using PowerShell and gwmi, and the clocks looked pretty well synced to me (to 1 s precision at least; not sure whether that's enough).
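On whether 1 s precision is enough: Kerberos treats timestamps that differ by more than the domain's "Maximum tolerance for computer clock synchronization" as invalid (KRB_AP_ERR_SKEW), and that policy defaults to 5 minutes, so a comparison at 1 s resolution is ample to rule out harmful drift. The Python sketch below assumes both timestamps (for example from (C) and from a domain controller) were collected as timezone-aware UTC values and that the default tolerance has not been changed; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Default of the "Maximum tolerance for computer clock synchronization"
# Kerberos policy in Active Directory (assumed unchanged here).
DEFAULT_MAX_SKEW = timedelta(minutes=5)


def clock_skew(client_utc: datetime, reference_utc: datetime) -> timedelta:
    """Absolute difference between two timezone-aware timestamps."""
    if client_utc.tzinfo is None or reference_utc.tzinfo is None:
        raise ValueError("use timezone-aware (UTC) timestamps")
    return abs(client_utc - reference_utc)


def within_kerberos_skew(client_utc: datetime, reference_utc: datetime,
                         max_skew: timedelta = DEFAULT_MAX_SKEW) -> bool:
    """True if the two clocks are close enough for Kerberos timestamp checks."""
    return clock_skew(client_utc, reference_utc) <= max_skew


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(within_kerberos_skew(now, now + timedelta(seconds=1)))
```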
Molehill: Have you gained any insight into why the downgrade to NTLM happens? We are facing exactly the same issue. We have a constrained Kerberos delegation setup with only Kerberos authentication allowed. Everything works fine until a client (e.g. Firefox) that isn't configured for Kerberos starts sending NTLM and gets its request rejected. After that, all subsequent requests from every other client also send an NTLM header and get rejected. It appears that once an NTLM authentication is attempted, Kerberos ticket generation breaks somehow.
Adames: @Molehill I don't remember how we solved this one in particular, but to this day we have occasional annoying issues with it, because the whole SSPI implementation in Windows is a total black box with some caching going on. We explicitly reject NTLM tickets at the API gateway level, so the upstream servers basically never get an NTLM request. I'm not sure how to help with your particular issue, though.
