Zookeeper/SASL Checksum failed

How do I fix the problem that generates this error:

WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@1040] - Client failed to SASL authenticate: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
    at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199)
    at org.apache.zookeeper.server.ZooKeeperSaslServer.evaluateResponse(ZooKeeperSaslServer.java:50)

I have set up Zookeeper on an AWS EC2 instance. I have outlined the steps I followed to set up Kerberos and Zookeeper here. Zookeeper seems to be working:

zookeeper@zookeeper-server-01:~/zk/zookeeper-3.4.11$ JVMFLAGS="-Djava.security.auth.login.config=/home/zookeeper/jaas/jaas.conf -Dsun.security.krb5.debug=true" bin/zkServer.sh start-foreground
...
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbAsRep cons in KrbAsReq.getReply zookeeper/zookeeper-server-01
2017-12-22 00:21:52,308 [myid:] - INFO  [main:Login@297] - Server successfully logged in.
2017-12-22 00:21:52,312 [myid:] - INFO  [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2017-12-22 00:21:52,313 [myid:] - INFO  [Thread-1:Login$1@130] - TGT refresh thread started.
2017-12-22 00:21:52,313 [myid:] - INFO  [Thread-1:Login@305] - TGT valid starting at:        Fri Dec 22 00:21:52 UTC 2017
2017-12-22 00:21:52,313 [myid:] - INFO  [Thread-1:Login@306] - TGT expires:                  Fri Dec 22 10:21:52 UTC 2017
2017-12-22 00:21:52,314 [myid:] - INFO  [Thread-1:Login$1@185] - TGT refresh sleeping until: Fri Dec 22 08:25:59 UTC 2017

However, when I try to connect zkCli.sh (running on a different EC2 instance) to it, the server closes the connection and outputs the checksum error above.

The Zookeeper client appears to be able to connect to the Zookeeper server:

JVMFLAGS="-Djava.security.auth.login.config=/home/admin/Downloads/zookeeper-3.4.11/conf/zookeeper-test-client-jaas.conf -Dsun.security.krb5.debug=true" bin/zkCli.sh -server zookeeper-server-01.eigenroute.com:2181
Connecting to zookeeper-server-01.eigenroute.com:2181
2017-12-22 00:27:12,779 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0, built on 11/01/2017 18:06 GMT
...
2017-12-22 00:27:12,788 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/admin/Downloads/zookeeper-3.4.11
2017-12-22 00:27:12,789 [myid:] - INFO  [main:ZooKeeper@441] - Initiating client connection, connectString=zookeeper-server-01.eigenroute.com:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@1de0aca6
Welcome to ZooKeeper!
JLine support is enabled
...
>>> KrbAsReq creating message
[zk: zookeeper-server-01.eigenroute.com:2181(CONNECTING) 0] >>> KrbKdcReq send: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000, number of retries =3, #bytes=166
>>> KDCCommunication: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000,Attempt =1, #bytes=166
>>> KrbKdcReq send: #bytes read=310
>>>Pre-Authentication Data:
...

The client receives an error about needing preauthentication, but then appears to log in successfully (does this mean it authenticated successfully?). Is it logged in to the ZooKeeper server, or to Kerberos?

...
KRBError received: NEEDED_PREAUTH
KrbAsReqBuilder: PREAUTH FAILED/REQ, re-send AS-REQ
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17 16 23.
Looking for keys for: zktestclient/[email protected]
Added key: 17version: 3
Added key: 18version: 3
Looking for keys for: zktestclient/[email protected]
Added key: 17version: 3
Added key: 18version: 3
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17 16 23.
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbAsReq creating message
>>> KrbKdcReq send: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000, number of retries =3, #bytes=253
>>> KDCCommunication: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000,Attempt =1, #bytes=253
>>> KrbKdcReq send: #bytes read=742
>>> KdcAccessibility: remove kerberos-server-01.eigenroute.com
Looking for keys for: zktestclient/[email protected]
Added key: 17version: 3
Added key: 18version: 3
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbAsRep cons in KrbAsReq.getReply zktestclient/eigenroute.com
2017-12-22 00:27:13,286 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):Login@297] - Client successfully logged in.
...

The client then opens a socket connection to the Zookeeper server, and attempts to SASL authenticate to it:

...
2017-12-22 00:27:13,312 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):ClientCnxn$SendThread@1035] - Opening socket connection to server 35.169.37.216/35.169.37.216:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2017-12-22 00:27:13,317 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):ClientCnxn$SendThread@877] - Socket connection established to 35.169.37.216/35.169.37.216:2181, initiating session
2017-12-22 00:27:13,359 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):ClientCnxn$SendThread@1302] - Session establishment complete on server 35.169.37.216/35.169.37.216:2181, sessionid = 0x1000436873a0001, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
Found ticket for zktestclient/[email protected] to go to krbtgt/EIGENROUTE.COM@EIGENROUTE.COM expiring on Fri Dec 22 10:27:13 UTC 2017
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for zktestclient/[email protected] to go to krbtgt/EIGENROUTE.COM@EIGENROUTE.COM expiring on Fri Dec 22 10:27:13 UTC 2017
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbKdcReq send: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000, number of retries =3, #bytes=712
>>> KDCCommunication: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000,Attempt =1, #bytes=712
>>> KrbKdcReq send: #bytes read=678
>>> KdcAccessibility: remove kerberos-server-01.eigenroute.com
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbApReq: APOptions are 00000000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
Krb5Context setting mySeqNumber to: 50687702
Krb5Context setting peerSeqNumber to: 0
Created InitSecContextToken:
0000: 01 00 6E 82 02 6B 30 82   02 67 A0 03 02 01 05 A1  ..n..k0..g......
...
0260: 33 25 94 1F 60 93 E9 CF   7E EF 15 82 F8 6D ED 06  3%..`........m..
0270: 43                                                 C

2017-12-22 00:27:13,405 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):ClientCnxn$SendThread@1161] - Unable to read additional data from server sessionid 0x1000436873a0001, likely server has closed socket, closing socket connection and attempting reconnect

WATCHER::

WatchedEvent state:Disconnected type:None path:null

So the client gets as far as creating and sending its SASL token, but the ZooKeeper server then closes the connection (on account of the checksum failure).

UPDATE #1. In response to T-Heron's comment, the result of nslookup zookeeper-server-01.eigenroute.com on the client machine is:

Server:     172.31.0.2
Address:    172.31.0.2#53

Non-authoritative answer:
Name:   zookeeper-server-01.eigenroute.com
Address: 35.169.37.216

The DNS entry for zookeeper-server-01.eigenroute.com is:

zookeeper-server-01.eigenroute.com  30 minutes  A  35.169.37.216


On the client machine, /etc/hosts contains:

127.0.1.1 ip-172-31-95-211.ec2.internal ip-172-31-95-211
127.0.0.1 localhost
34.239.197.36 kerberos-server-02

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

(kerberos-server-02 is misnamed; it is not a KDC. When I comment this line out, the result is the same.) On the ZooKeeper server, zookeeper-server-01.eigenroute.com, /etc/hosts contains:

127.0.1.1 ip-172-31-88-14.ec2.internal ip-172-31-88-14
127.0.0.1 localhost
34.225.180.212 kerberos-server-01

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

(the entry for kerberos-server-01 doesn't need to be there - when I remove it the result is the same).

Can someone explain how to solve the checksum failure? Thanks!

Atworth answered 22/12, 2017 at 0:43 Comment(0)

My KDC had the following principals:

zookeeper/[email protected]
zookeeper/[email protected]

In the JAAS configuration for the ZooKeeper server, whose host name is zookeeper-server-01.eigenroute.com, I used a keytab that I created for zookeeper/[email protected].

When I instead created a keytab for zookeeper/[email protected] and used this keytab in the JAAS configuration for the ZooKeeper server, everything worked - SASL authentication from the client succeeded.

I would prefer to use the fully qualified domain name (zookeeper-server-01.eigenroute.com) in the name of the Kerberos principal instead of the IP address. If anyone can tell me how to get that working, I'll accept that as the answer. Until then, this will suffice.

UPDATE: I figured it out. The ZooKeeper client takes the FQDN from the -server argument, looks up the IP address of this FQDN, and creates an InetSocketAddress object from it (org.apache.zookeeper.client.StaticHostProvider). Then, to get the host name, it calls .getHostName on that object (org.apache.zookeeper.ClientCnxn.SendThread.startConnect), which results in a reverse DNS lookup on the IP address. On my local machine, this returns:

ec2-35-169-37-216.compute-1.amazonaws.com

and on my client AWS EC2 instance, this returns:

35.169.37.216

when instead I expected it to return the FQDN. This is why on my AWS EC2 client machine, the ZooKeeper client tries to get a ticket for:

zookeeper/[email protected]

and on my local machine, the ZooKeeper client tries to get a ticket for:

zookeeper/[email protected]

So I need AWS to make sure that a reverse DNS lookup on 35.169.37.216 yields zookeeper-server-01.eigenroute.com. The solution I found so far is to ask AWS to set up the mapping for the reverse DNS.

Ideally, ZooKeeper would have an option to skip this reverse DNS lookup and just use the FQDN as the host name (maybe it does and I haven't found it).
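
To check what host name the JVM hands back on a given machine, a minimal Java sketch along the following lines may help. It is not ZooKeeper's code. The class name ReverseDnsPrincipal, its two helper methods, and the service/host@REALM formatting are assumptions made for illustration, and the sketch only imitates the forward lookup followed by getHostName described above.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class ReverseDnsPrincipal {

    // Hypothetical helper: formats a Kerberos service principal as service/host@REALM.
    public static String principalFor(String service, String host, String realm) {
        return service + "/" + host + "@" + realm;
    }

    // Hypothetical helper: forward-resolves the name, then rebuilds the address
    // from its raw bytes so the original host name is forgotten. Calling
    // getHostName() on the result may then trigger a reverse DNS lookup; if that
    // lookup finds nothing, the textual IP address is returned instead.
    public static String reverseResolvedHostName(String fqdn) throws UnknownHostException {
        InetAddress resolved = InetAddress.getByName(fqdn);
        InetAddress ipOnly = InetAddress.getByAddress(resolved.getAddress());
        InetSocketAddress socketAddress = new InetSocketAddress(ipOnly, 2181);
        return socketAddress.getHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        // The host to test is taken from the first argument; "localhost" is a placeholder default.
        String fqdn = args.length > 0 ? args[0] : "localhost";
        String host = reverseResolvedHostName(fqdn);
        System.out.println("Host name after reverse lookup: " + host);
        // The realm below is the one that appears in the debug output of the question.
        System.out.println("Principal a client would likely request: "
                + principalFor("zookeeper", host, "EIGENROUTE.COM"));
    }
}
```

Running it with zookeeper-server-01.eigenroute.com as the argument on each client machine should show whether the reverse lookup yields the FQDN, an AWS-generated name, or the bare IP address. Whatever it prints has to match a principal in the server's keytab for the SASL handshake to have a chance of succeeding.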

Atworth answered 23/12, 2017 at 21:51 Comment(7)
From your client machine, what is the result of "nslookup zookeeper-server-01.eigenroute.com"? If the result is not an A record for 35.169.37.216, then that is the problem. I'm guessing this because the IP address works inside the Kerberos service principal entry but the hostname does not. Secondly, the result should be an A record, not a CNAME. Fix DNS if you need to. If DNS is already correct, then check for a bad hosts file entry for zookeeper-server-01.eigenroute.com on both the client and server machines. The KDC sounds like it is fine, btw. - Fatidic
@T-Heron: Thanks - I have edited the question to add the information you asked for. Indeed, the DNS entry is an A record, not a CNAME. The /etc/hosts files on both the client and the server machines seem fine (or should I add an entry to them?). Does the result of nslookup zookeeper-server-01.eigenroute.com seem OK? - Atworth
You might have a duplicate SPN. Run this command and see if more than one result is returned: setspn -Q zookeeper/zookeeper-server-01.eigenroute.com. If you see more than one result, that's the problem; delete the bad entry. - Fatidic
@T-Heron: For my KDC, I am using MIT Kerberos on a Debian Stretch (Linux) OS, and therefore the setspn command is not available to me. I couldn't find an equivalent command for Linux. Is there a command on Linux (or within kadmin.local or ktutil) that would be equivalent? Thanks. - Atworth
Hmmm... I'm sorry, I don't know the equivalent command to check for duplicate SPNs in MIT Kerberos. A quick Google search didn't find anything. Maybe there is one, but that would take deeper digging to find out. I'm unfamiliar with MIT Kerberos and don't have the time. - Fatidic
This issue should have been addressed in ZooKeeper 3.4.6: issues.apache.org/jira/browse/ZOOKEEPER-1666. Which version of ZooKeeper are you using? - Goethe
It happens with version 3.4.11 (WatchedEvent state:AuthFailed type:None path:null) if reverse DNS does not resolve to the FQDN. The issue you cited is actually different - it avoids a reverse DNS lookup if the host name in the connection string is a literal IP address. I would like an option to avoid a reverse DNS lookup if the host name is not a literal IP address. - Atworth
