Hadoop Security

I am trying to learn how Kerberos can be implemented in Hadoop. I have gone through this doc, https://issues.apache.org/jira/browse/HADOOP-4487, and also through some basic Kerberos material (https://www.youtube.com/watch?v=KD2Q-2ToloE).

After learning from these resources, I have come to a conclusion, which I represent in a diagram.

Scenario: a user logs on to his computer, gets authenticated by Kerberos, and submits a MapReduce job.

Please read the description of the diagram; it hardly needs 5 minutes of your time. I would like to explain the diagram and ask questions related to a few steps (in bold):

Numbers on a yellow background represent the entire flow (numbers 1 to 19).
DT (red background) represents the Delegation Token.
BAT (green background) represents the Block Access Token.
JT (brown background) represents the Job Token.

Steps 1, 2, 3 and 4 represent: requesting a TGT (Ticket Granting Ticket) and then requesting a service ticket for the NameNode.

Question 1) Where should the KDC be located? Can it be on the machine where my NameNode or JobTracker runs?
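Steps 1 to 4 are the standard Kerberos exchange (an AS request for a TGT, then a TGS request for a service ticket). Below is a toy Python model of that flow, not real Kerberos: tickets are HMAC-signed JSON rather than encrypted blobs, the user's password proof and session keys are omitted, and `ToyKDC`, the key values and the principal name `nn/namenode.example.com` are all hypothetical.

```python
import hashlib
import hmac
import json
import time


def seal(key: bytes, payload: dict) -> dict:
    """Attach an HMAC so that only holders of `key` can check the ticket.

    Toy stand-in for encryption: real Kerberos encrypts tickets so the
    client can neither read nor alter them; here we only detect tampering.
    """
    body = json.dumps(payload, sort_keys=True).encode()
    return {"body": payload, "mac": hmac.new(key, body, hashlib.sha256).hexdigest()}


def unseal(key: bytes, ticket: dict) -> dict:
    """Verify a sealed ticket and return its contents, or raise PermissionError."""
    body = json.dumps(ticket["body"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, ticket["mac"]):
        raise PermissionError("ticket was not sealed with this key")
    if ticket["body"]["expires"] < time.time():
        raise PermissionError("ticket has expired")
    return ticket["body"]


class ToyKDC:
    """Hypothetical KDC that shares one long-term key with each service."""

    def __init__(self, service_keys: dict):
        self.tgs_key = b"hypothetical-krbtgt-key"  # known only to the KDC
        self.service_keys = service_keys

    def request_tgt(self, user: str, lifetime: float = 10 * 3600) -> dict:
        # Steps 1-2: the Authentication Service issues a TGT.
        # (The real AS also checks that the user knows their password; omitted.)
        return seal(self.tgs_key, {"client": user, "expires": time.time() + lifetime})

    def request_service_ticket(self, tgt: dict, service: str) -> dict:
        # Steps 3-4: the Ticket Granting Service validates the TGT, then issues
        # a ticket that only `service` (here, the NameNode) can verify.
        client = unseal(self.tgs_key, tgt)["client"]
        return seal(self.service_keys[service],
                    {"client": client, "service": service, "expires": time.time() + 3600})


# Usage: the NameNode later checks the ticket with its own key (steps 5-6).
nn_key = b"hypothetical-namenode-key"
kdc = ToyKDC({"nn/namenode.example.com": nn_key})
tgt = kdc.request_tgt("rohit")
ticket = kdc.request_service_ticket(tgt, "nn/namenode.example.com")
print(unseal(nn_key, ticket)["client"])  # rohit
```

The property the toy preserves is that the NameNode validates a ticket with only its own long-term key, without contacting the KDC at request time.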

Steps 5, 6, 7, 8 and 9 represent: showing the service ticket to the NameNode and getting an acknowledgement. The NameNode then issues a Delegation Token (red). The user specifies the token renewer (in this case, the JobTracker).
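The delegation-token part of steps 5 to 9 can be sketched as follows. This is a simplified, hypothetical model (`ToyDelegationTokenManager` and `TokenIdentifier` are illustrative names, not Hadoop's classes) of the general idea behind delegation tokens: the token's password is an HMAC of its identifier under a NameNode master key, so the NameNode can re-derive it instead of storing it, and only the renewer named in the identifier may extend the expiry, never past a hard maximum lifetime. The 24-hour renewal interval and 7-day maximum are assumed values.

```python
import hashlib
import hmac
import itertools
import time
from dataclasses import dataclass

DAY = 24 * 3600


@dataclass(frozen=True)
class TokenIdentifier:
    owner: str            # user the token acts on behalf of
    renewer: str          # only this principal may renew (here: the JobTracker)
    issue_date: float
    max_date: float       # hard limit; renewal never goes past it
    sequence_number: int

    def to_bytes(self) -> bytes:
        return repr(self).encode()


class ToyDelegationTokenManager:
    """Hypothetical NameNode-side issuer/validator of delegation tokens."""

    def __init__(self, master_key: bytes, renew_interval=DAY, max_lifetime=7 * DAY):
        self.master_key = master_key
        self.renew_interval = renew_interval
        self.max_lifetime = max_lifetime
        self._seq = itertools.count(1)
        self._expiry = {}  # TokenIdentifier -> current expiry time

    def _password(self, ident: TokenIdentifier) -> bytes:
        # Password = HMAC(master key, identifier); SHA-1 chosen for the toy.
        return hmac.new(self.master_key, ident.to_bytes(), hashlib.sha1).digest()

    def issue(self, owner, renewer, now=None):
        now = time.time() if now is None else now
        ident = TokenIdentifier(owner, renewer, now, now + self.max_lifetime, next(self._seq))
        self._expiry[ident] = now + self.renew_interval
        return ident, self._password(ident)

    def verify(self, ident, password, now=None) -> str:
        now = time.time() if now is None else now
        if not hmac.compare_digest(self._password(ident), password):
            raise PermissionError("invalid token password")
        if self._expiry.get(ident, float("-inf")) < now:
            raise PermissionError("token expired or unknown")
        return ident.owner

    def renew(self, ident, password, caller, now=None):
        now = time.time() if now is None else now
        self.verify(ident, password, now)
        if caller != ident.renewer:
            raise PermissionError(f"{caller} is not the designated renewer")
        self._expiry[ident] = min(now + self.renew_interval, ident.max_date)
        return self._expiry[ident]


mgr = ToyDelegationTokenManager(b"hypothetical-master-key")
ident, password = mgr.issue("rohit", renewer="jobtracker", now=0)
print(mgr.verify(ident, password, now=3600))               # rohit
print(mgr.renew(ident, password, "jobtracker", now=3600))  # 90000
```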

Question 2) The user submits this Delegation Token along with the job to the JobTracker. Will the Delegation Token be shared with the TaskTrackers?

Steps 10, 11, 12, 13 and 14 represent: asking the KDC for a service ticket for the JobTracker and receiving it, showing this ticket to the JobTracker and getting an ACK from it, and then submitting the job plus the Delegation Token to the JobTracker.
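After step 14 the JobTracker holds the Delegation Token and, as its designated renewer, has to keep it valid while the job runs. A minimal sketch of that bookkeeping, assuming a 24-hour renewal interval, a 7-day maximum lifetime, and a hypothetical policy of renewing once 90% of the current validity window has passed; `renewal_times` is an illustrative helper, not a Hadoop API.

```python
DAY = 24 * 3600


def renewal_times(issue_time, job_end, renew_interval=DAY, max_lifetime=7 * DAY, safety=0.9):
    """Return the times at which the renewer calls renew() so that the token
    stays valid until `job_end`.

    Hypothetical policy: renew once `safety` of the current validity window
    has elapsed. A token can never be renewed past issue_time + max_lifetime,
    so a job that would run longer than that is rejected up front.
    """
    max_date = issue_time + max_lifetime
    if job_end > max_date:
        raise ValueError("job outlives the token's maximum lifetime")
    times = []
    last, expiry = issue_time, min(issue_time + renew_interval, max_date)
    while expiry < job_end:
        last = last + safety * (expiry - last)  # renew shortly before expiry
        times.append(last)
        expiry = min(last + renew_interval, max_date)
    return times


print(renewal_times(0, job_end=12 * 3600))     # [] (fits in the first 24 hours)
print(len(renewal_times(0, job_end=2 * DAY)))  # 2
```

The maximum lifetime is what bounds the damage if a token leaks: even the renewer cannot keep it alive indefinitely.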

Steps 15, 16 and 17 represent: the NameNode generates a Block Access Token and spreads it across all DataNodes, then sends the block ID and Block Access Token to the JobTracker, which passes them on to the TaskTracker.
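Steps 15 to 17 rely on a DataNode being able to check a Block Access Token without a round trip to the NameNode, because both sides hold the same secret key. A toy sketch of that check, assuming one randomly generated shared key (real HDFS rotates its block keys) and a made-up identifier format; `ToyNameNode`, `ToyDataNode` and `BlockToken` are hypothetical names.

```python
import hashlib
import hmac
import secrets
import time
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BlockToken:
    user: str
    block_id: int
    modes: frozenset          # e.g. frozenset({"READ"})
    expiry: float
    key_id: int
    password: bytes = b""

    def ident_bytes(self) -> bytes:
        # Every field except the password is covered by the HMAC.
        modes = ",".join(sorted(self.modes))
        return f"{self.user}|{self.block_id}|{modes}|{self.expiry}|{self.key_id}".encode()


def _mac(key: bytes, token: BlockToken) -> bytes:
    return hmac.new(key, token.ident_bytes(), hashlib.sha1).digest()


class ToyNameNode:
    def __init__(self):
        # Hypothetical: a single random key, shared with every DataNode.
        self.keys = {1: secrets.token_bytes(20)}
        self.current_key_id = 1

    def issue_block_token(self, user, block_id, modes, lifetime=600, now=None):
        now = time.time() if now is None else now
        unsigned = BlockToken(user, block_id, frozenset(modes), now + lifetime, self.current_key_id)
        return replace(unsigned, password=_mac(self.keys[self.current_key_id], unsigned))


class ToyDataNode:
    def __init__(self, shared_keys):
        self.keys = dict(shared_keys)  # received from the NameNode ahead of time

    def check(self, token, block_id, mode, now=None):
        """Validate locally; the DataNode never calls the NameNode here."""
        now = time.time() if now is None else now
        key = self.keys.get(token.key_id)
        if key is None or not hmac.compare_digest(_mac(key, token), token.password):
            raise PermissionError("forged or unknown block token")
        if token.expiry < now:
            raise PermissionError("block token expired")
        if token.block_id != block_id or mode not in token.modes:
            raise PermissionError(f"token does not allow {mode} on block {block_id}")


nn = ToyNameNode()
dn = ToyDataNode(nn.keys)
tok = nn.issue_block_token("rohit", block_id=42, modes={"READ"})
dn.check(tok, 42, "READ")  # passes silently
# Upgrading the modes breaks the HMAC, so dn.check(forged, 42, "WRITE") raises.
forged = replace(tok, modes=frozenset({"READ", "WRITE"}))
```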

Question 3) Who asks the NameNode for the Block Access Token and block ID: the JobTracker or the TaskTracker?

Sorry, I skipped number 18 by mistake. Step 19 represents: the JobTracker generates a Job Token (brown) and passes it to the TaskTrackers.
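Step 19 gives the same per-job secret to every TaskTracker, which lets tasks of that job authenticate one another without going back to Kerberos. A toy sketch of how such a shared secret can protect a map-output fetch with an HMAC in both directions; it is only loosely modeled on the idea of MapReduce shuffle verification, and `ToyTaskTracker`, the job id and the URL are hypothetical.

```python
import base64
import hashlib
import hmac
import secrets


def sign(secret: bytes, message: str) -> str:
    """Base64 HMAC of `message` under a job's token secret."""
    digest = hmac.new(secret, message.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


class ToyTaskTracker:
    def __init__(self):
        self.job_secrets = {}  # job id -> job token secret

    def localize_job(self, job_id: str, secret: bytes) -> None:
        # Step 19: the JobTracker hands the job token to each TaskTracker.
        self.job_secrets[job_id] = secret

    def serve_map_output(self, job_id: str, url: str, url_hash: str) -> str:
        secret = self.job_secrets.get(job_id)
        if secret is None or not hmac.compare_digest(sign(secret, url), url_hash):
            raise PermissionError("request not signed with this job's token")
        # Reply with a hash of the caller's hash, so the reducer can check
        # that it is talking to a TaskTracker that also knows the secret.
        return sign(secret, url_hash)


# The JobTracker creates one random secret per job.
job_secret = secrets.token_bytes(20)
tt = ToyTaskTracker()
tt.localize_job("job_0001", job_secret)

# A reduce task of the same job fetches map output.
url = "http://tt1.example.com/mapOutput?job=job_0001&map=m_000001&reduce=0"
reply = tt.serve_map_output("job_0001", url, sign(job_secret, url))
assert reply == sign(job_secret, sign(job_secret, url))
```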

Question 4) Can I conclude that there will be one Delegation Token per user, distributed throughout the cluster, and one Job Token per job? In other words, a user will have only one Delegation Token and many Job Tokens (one per job he submits).

Please tell me if I missed something or went wrong at some point in my explanation.

Garnishee answered 28/2, 2013 at 16:15
Rohit, good analysis! Did you find a solution for securing Hadoop? If so, can you share the answer in this post? – Cumbrance
Please update the answer if you found it, @Rohit Sarewar – Cay

Steps to follow to secure a Hadoop cluster with Kerberos:

  1. Install the Kerberos server packages on a server that is reachable from all cluster nodes:

    $ yum install krb5-server
    $ yum install krb5-workstation
    $ yum install krb5-libs

  2. Edit the KDC server configuration file /var/kerberos/krb5kdc/kdc.conf to set up the ACL file and the admin keytab file for the host.

  3. Edit /etc/krb5.conf to set the KDC host and the admin server.

  4. Create the Kerberos database on the KDC host (-r sets the realm name, -s creates a stash file for the master key):

    $ kdb5_util create -r host_name -s

  5. Add administrators to the ACL file:

    1. Open the ACL file configured in kdc.conf (by default /var/kerberos/krb5kdc/kadm5.acl), e.g. vi /var/kerberos/krb5kdc/kadm5.acl
    2. Add the admin principal ‘admin/admin@host_name’ to that file, followed by the permissions to grant (e.g. * for all)
  6. Add the admin principal:

    $ kadmin.local -q "addprinc admin/admin@host_name"
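Steps 3 and 5 above come down to small text files. The Python sketch below renders a minimal /etc/krb5.conf and a one-line ACL; it assumes a single KDC that is also the admin server, and uses the hypothetical realm EXAMPLE.COM and host kdc.example.com in place of the `host_name` placeholder. Only the standard `default_realm`, `kdc`, `admin_server` and `[domain_realm]` settings are used; `render_krb5_conf` and `render_kadm5_acl` are illustrative helper names.

```python
def render_krb5_conf(realm: str, kdc_host: str, domain: str) -> str:
    """Minimal /etc/krb5.conf pointing every node at one KDC (step 3)."""
    return f"""[libdefaults]
  default_realm = {realm}
  dns_lookup_realm = false
  dns_lookup_kdc = false

[realms]
  {realm} = {{
    kdc = {kdc_host}
    admin_server = {kdc_host}
  }}

[domain_realm]
  .{domain} = {realm}
  {domain} = {realm}
"""


def render_kadm5_acl(realm: str) -> str:
    """ACL line granting every */admin principal all kadmin privileges (step 5)."""
    return f"*/admin@{realm} *\n"


if __name__ == "__main__":
    # Hypothetical names; realms are conventionally upper case.
    print(render_krb5_conf("EXAMPLE.COM", "kdc.example.com", "example.com"))
    print(render_kadm5_acl("EXAMPLE.COM"), end="")
```

The same krb5.conf is what gets copied to every cluster node, so generating it once from a few parameters keeps the nodes consistent.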

Install the Kerberos client packages on all cluster nodes:

    $ yum install krb5-workstation

Copy /etc/krb5.conf from the KDC host to all cluster nodes.

Make sure to enable secure mode in Hadoop by setting the required configuration properties: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html
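As an illustration of those settings, the sketch below renders a few of the secure-mode properties in Hadoop's *-site.xml format. It is only a subset (the other daemons need their own principal and keytab settings too), and the realm, the keytab path and the helper names `hadoop_site_xml` and `hdfs_site` are hypothetical.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(props: dict) -> str:
    """Render properties in the <configuration><property>... format of *-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# core-site.xml: switch from "simple" to Kerberos authentication.
CORE_SITE = {
    "hadoop.security.authentication": "kerberos",
    "hadoop.security.authorization": "true",
}


def hdfs_site(realm: str, keytab: str) -> dict:
    # _HOST is replaced by each daemon with its own hostname at runtime.
    return {
        "dfs.block.access.token.enable": "true",
        "dfs.namenode.kerberos.principal": f"nn/_HOST@{realm}",
        "dfs.namenode.keytab.file": keytab,
    }


if __name__ == "__main__":
    print(hadoop_site_xml(CORE_SITE))
    print(hadoop_site_xml(hdfs_site("EXAMPLE.COM", "/etc/security/keytab/nn.service.keytab")))
```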

Verify:

  • Log in as a normal user to the cluster gateway or to a node where the user keytabs are deployed
  • Run “kinit -k -t /location/of/keytab_file username@host_name”
  • Then run HDFS commands or MapReduce jobs to verify that the cluster is secured

These are the basic steps to make sure Kerberos is enabled in your cluster.
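The verification steps can be scripted. A minimal sketch, assuming the MIT Kerberos client tools and the `hdfs` CLI are on the PATH; `verify_kerberized_hdfs`, `access_denied_without_ticket` and `kinit_command` are hypothetical helper names, and the keytab path and principal are placeholders. The second check is the more telling one: on a secured cluster, listing HDFS should fail once the ticket cache is empty.

```python
import shutil
import subprocess


def kinit_command(keytab: str, principal: str) -> list:
    # -k: authenticate with a keytab instead of a password; -t: keytab path.
    return ["kinit", "-k", "-t", keytab, principal]


def verify_kerberized_hdfs(keytab: str, principal: str, path: str = "/") -> None:
    """Run the Verify checks; raises CalledProcessError if any command fails."""
    for tool in ("kinit", "klist", "hdfs"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    subprocess.run(kinit_command(keytab, principal), check=True)
    subprocess.run(["klist"], check=True)  # shows the cached TGT
    subprocess.run(["hdfs", "dfs", "-ls", path], check=True)


def access_denied_without_ticket(path: str = "/") -> bool:
    """True if HDFS access fails once the ticket cache is empty.

    Warning: kdestroy discards the current user's Kerberos tickets.
    """
    subprocess.run(["kdestroy"], check=False)
    result = subprocess.run(["hdfs", "dfs", "-ls", path], capture_output=True)
    return result.returncode != 0
```

For example, `verify_kerberized_hdfs("/location/of/keytab_file", "username@host_name")` followed by `access_denied_without_ticket()`, which should return True on a secured cluster.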

Beriosova answered 17/4, 2015 at 20:14

Hadoop security mostly relies on Kerberos for authentication and on Apache Sentry or Apache Ranger for authorization; gateways such as Apache Knox are used for perimeter security. http://commandstech.com/latest-hadoop-admin-interview-questions/

Buhrstone answered 21/1, 2019 at 5:13

© 2022 - 2024 — McMap. All rights reserved.