Question
On a standalone Flink cluster running on a server, I am developing a Flink streaming job in Scala. The job consumes data from more than one Kafka topic, does some formatting, and writes the results to HDFS.
One of the Kafka topics and HDFS each require their own, separate Kerberos authentication, because they belong to completely different clusters.
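For context, the job is structurally just the following sketch (the topic names, broker address, and HDFS output path are placeholders; the Kerberos setup is exactly what this question is about):

```scala
import java.util.{Arrays, Properties}

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

object KafkaToHdfsJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Consume from more than one Kafka topic (names are placeholders)
    val props = new Properties()
    props.setProperty("bootstrap.servers", "kafka-broker:9092")
    props.setProperty("group.id", "flink-consumer")
    val consumer = new FlinkKafkaConsumer010[String](
      Arrays.asList("topic-a", "topic-b"), new SimpleStringSchema(), props)

    // Do some formatting, then write the results to HDFS via Flink's bucketing HDFS connector
    val formatted: DataStream[String] = env.addSource(consumer).map(_.trim)
    formatted.addSink(new BucketingSink[String]("hdfs://namenode:8020/path/to/output"))

    env.execute("kafka-to-hdfs")
  }
}
```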
My questions are:
- Is it possible (and if so, how?) to use two Kerberos keytabs (one for Kafka, the other for HDFS) from a single Flink job running on a Flink cluster on a server, so that the job can consume from the Kafka topic and write to HDFS at the same time?
- If that is not possible, what is a reasonable workaround for Kafka-Flink-HDFS data streaming when both Kafka and HDFS are Kerberos-protected?
Note
- I am quite new to most of the technologies mentioned here.
- The Flink job can write to HDFS as long as it does not also consume the Kerberos-protected topic. In that case, I put the HDFS credentials into `security.kerberos.login.keytab` and `security.kerberos.login.principal` in `flink-conf.yaml` (see the snippet after this list).
- I am using the HDFS connector provided by Flink to write to HDFS.
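The relevant part of `flink-conf.yaml` looked roughly like this (the keytab path and principal are placeholders; the `security.kerberos.login.contexts` line is only included to show where a Kafka JAAS context would normally be listed, since Flink takes a single keytab/principal pair for all contexts):

```yaml
security.kerberos.login.keytab: /path/to/hdfs.keytab
security.kerberos.login.principal: <HDFS principal>
# A Kafka context could be listed here as well (e.g. Client,KafkaClient),
# but it would still be backed by the single keytab/principal above.
security.kerberos.login.contexts: Client
```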
Manually switching the Kerberos authentication between the two principals was possible. In the `[realms]` section of the `krb5.conf` file, I specified two realms, one for Kafka and the other for HDFS, and then switched with:

```
kinit -kt path/to/hdfs.keytab <HDFS principal>
kinit -kt path/to/kafka.keytab <Kafka principal>
```
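The `[realms]` section was along these lines (the realm names and KDC hosts below are placeholders, not the actual values from my setup):

```
[realms]
  HDFS.REALM = {
    kdc = kdc.hdfs.example.com
  }
  KAFKA.REALM = {
    kdc = kdc.kafka.example.com
  }
```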
Environment
- Flink (v1.4.2) https://ci.apache.org/projects/flink/flink-docs-stable/
- Kafka client (v0.10.X)
- HDFS (Hadoop cluster HDP 2.6.X)
Thanks for your attention and feedback!
Comment
- The HDFS client relies on `UserGroupInformation`, which bypasses part of the standard Java implementation of Kerberos, and in general (...) the Kafka client uses raw JAAS configuration for the standard Java implementation. They don't play well together. Or at least, they did not when I tested HDFS + Hive JDBC in custom Java code a few years ago; maybe Flink bypasses the UGI to avoid side effects. – Entropy
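To make the distinction the comment draws concrete, here is a minimal sketch (the principal, keytab path, and JAAS file path are placeholders I introduced, not part of the original discussion): the Hadoop/HDFS side authenticates through `UserGroupInformation`, while the Kafka client reads a `KafkaClient` entry from the JVM-wide JAAS configuration, so the two logins travel down separate paths.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

object TwoLoginPaths {
  def main(args: Array[String]): Unit = {
    // Hadoop/HDFS side: log in through UGI, which manages its own Kerberos
    // login (and relogin) largely outside the standard JAAS mechanism.
    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(hadoopConf)
    UserGroupInformation.loginUserFromKeytab("hdfs-user@HDFS.REALM", "/path/to/hdfs.keytab")

    // Kafka side: the client picks up the KafkaClient section of the JAAS file
    // passed to the JVM, e.g. -Djava.security.auth.login.config=/path/to/kafka-jaas.conf,
    // independently of the UGI login above.
  }
}
```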