I am trying to learn how Kerberos can be implemented in Hadoop. I have gone through this design document: https://issues.apache.org/jira/browse/HADOOP-4487. I have also gone through the basics of Kerberos (https://www.youtube.com/watch?v=KD2Q-2ToloE).
Based on these resources, I have come to a conclusion, which I represent in a diagram.

Scenario: a user logs on to his computer, gets authenticated by Kerberos and submits a MapReduce job. (Please read the description of the diagram; it should take only about 5 minutes of your time.) I would like to explain the diagram and ask questions about a few steps (in bold).

- Numbers on a yellow background represent the entire flow (steps 1 to 19).
- DT (red background) represents the Delegation Token.
- BAT (green background) represents the Block Access Token.
- JT (brown background) represents the Job Token.
Steps 1, 2, 3 and 4 represent:
- Requesting a TGT (Ticket Granting Ticket).
- Requesting a service ticket for the NameNode.

Question 1) Where should the KDC be located? Can it be on the machine where my NameNode or JobTracker runs?
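To make steps 1 to 4 concrete, here is a toy model of the two Kerberos exchanges (TGT, then service ticket). It is a minimal sketch under loud assumptions, not MIT Kerberos or Hadoop code: `ToyKDC`, `seal`, `unseal` and the principal name in the usage note are hypothetical, and tickets are only HMAC-authenticated here, whereas real Kerberos encrypts them and also uses session keys and authenticators, which are omitted.

```python
import hashlib
import hmac
import json
import time


def seal(key: bytes, claims: dict) -> dict:
    """Bind claims to a key with an HMAC. Real Kerberos *encrypts* tickets;
    this toy only authenticates them so the flow stays readable."""
    body = json.dumps(claims, sort_keys=True)
    mac = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "mac": mac}


def unseal(key: bytes, blob: dict) -> dict:
    """Return the claims if the blob was sealed with this key, else raise."""
    expected = hmac.new(key, blob["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, blob["mac"]):
        raise PermissionError("ticket was not sealed with this key")
    return json.loads(blob["body"])


class ToyKDC:
    """Authentication Service and Ticket Granting Service in one object."""

    def __init__(self, tgs_key: bytes, service_keys: dict):
        self.tgs_key = tgs_key            # known only to the KDC
        self.service_keys = service_keys  # service principal -> long-term key

    def request_tgt(self, user: str, lifetime: float = 10 * 3600) -> dict:
        # Steps 1-2: the user gets a TGT that only the KDC itself can verify.
        return seal(self.tgs_key, {"user": user, "expires": time.time() + lifetime})

    def request_service_ticket(self, tgt: dict, service: str) -> dict:
        # Steps 3-4: the KDC checks its own TGT, then issues a ticket sealed
        # with the target service's key (here, the NameNode's).
        claims = unseal(self.tgs_key, tgt)
        if claims["expires"] < time.time():
            raise PermissionError("TGT expired")
        return seal(self.service_keys[service], {"user": claims["user"], "service": service})
```

With a hypothetical NameNode principal such as `nn/namenode.example.com@EXAMPLE.COM` registered in `service_keys`, only the KDC can read the TGT, and only the holder of the NameNode's key can verify the resulting service ticket.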
Steps 5, 6, 7, 8 and 9 represent:
- Showing the service ticket to the NameNode and getting an acknowledgement.
- The NameNode issuing a Delegation Token (red).
- The user specifying the token renewer (in this case, the JobTracker).
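The delegation token steps can be sketched as a toy token manager on the NameNode side. This is a hedged illustration, not Hadoop's `AbstractDelegationTokenSecretManager`: the identifier loosely mirrors fields Hadoop's delegation token identifier carries (owner, renewer, issue date, max date, sequence number; the real one also has a real user and master key ID), the password is an HMAC of the identifier (SHA-256 is this sketch's choice), and the 1-day renew interval and 7-day maximum lifetime are assumptions meant to resemble common HDFS defaults. `ToyNameNodeTokenManager` and `DelegationTokenId` are hypothetical names.

```python
import hashlib
import hmac
from dataclasses import dataclass

DAY = 24 * 3600


@dataclass(frozen=True)
class DelegationTokenId:
    owner: str            # user the token acts for
    renewer: str          # only this principal may renew (here: the JobTracker)
    issue_date: float
    max_date: float       # hard limit: no renewal can push expiry past this
    sequence_number: int

    def to_bytes(self) -> bytes:
        return (f"{self.owner}|{self.renewer}|{self.issue_date}|"
                f"{self.max_date}|{self.sequence_number}").encode()


class ToyNameNodeTokenManager:
    def __init__(self, master_key: bytes, renew_interval: float = DAY,
                 max_lifetime: float = 7 * DAY):
        self._key = master_key
        self.renew_interval = renew_interval
        self.max_lifetime = max_lifetime
        self._seq = 0
        self._expiry = {}  # sequence number -> current expiry time

    def _password(self, ident: DelegationTokenId) -> bytes:
        return hmac.new(self._key, ident.to_bytes(), hashlib.sha256).digest()

    def issue(self, owner: str, renewer: str, now: float):
        # Called only after the owner authenticated with a Kerberos service ticket.
        self._seq += 1
        ident = DelegationTokenId(owner, renewer, now, now + self.max_lifetime, self._seq)
        self._expiry[ident.sequence_number] = min(now + self.renew_interval, ident.max_date)
        return ident, self._password(ident)

    def verify(self, ident: DelegationTokenId, password: bytes, now: float) -> bool:
        # Whoever holds (ident, password) can authenticate as ident.owner until expiry.
        good_mac = hmac.compare_digest(self._password(ident), password)
        return good_mac and now <= self._expiry.get(ident.sequence_number, float("-inf"))

    def renew(self, ident: DelegationTokenId, password: bytes, caller: str, now: float) -> float:
        if caller != ident.renewer:
            raise PermissionError(f"{caller} is not the designated renewer")
        if not hmac.compare_digest(self._password(ident), password) or now > ident.max_date:
            raise PermissionError("invalid or permanently expired token")
        new_expiry = min(now + self.renew_interval, ident.max_date)
        self._expiry[ident.sequence_number] = new_expiry
        return new_expiry
```

The design point the sketch tries to show is why the renewer matters: the token expires after one renew interval unless the named renewer keeps extending it, and nobody can extend it past `max_date`.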
Question 2) The user submits this Delegation Token along with the job to the JobTracker. Will the Delegation Token be shared with the TaskTracker?
Steps 10, 11, 12, 13 and 14 represent:
- Asking the KDC for a service ticket for the JobTracker and getting it.
- Showing this ticket to the JobTracker and getting an ACK from it.
- Submitting the job plus the Delegation Token to the JobTracker.
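Steps 10 to 14 could be modelled as a JobTracker that first checks the caller's service ticket and only then accepts the job together with its tokens. This is a hypothetical sketch, not the JobTracker's real RPC interface: `make_ticket` stands in for a KDC-issued ticket (HMAC only, no encryption), and `ToyJobTracker`, `JobSubmission`, the token alias and the principal name are all made up for illustration.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field


def make_ticket(service_key: bytes, user: str, service: str) -> tuple:
    """Stand-in for a KDC-issued service ticket: claims plus an HMAC under the service key."""
    body = json.dumps({"user": user, "service": service}, sort_keys=True).encode()
    return body, hmac.new(service_key, body, hashlib.sha256).digest()


@dataclass
class JobSubmission:
    user: str
    conf: dict
    tokens: dict = field(default_factory=dict)  # alias -> token, e.g. the NameNode delegation token


class ToyJobTracker:
    def __init__(self, principal: str, key: bytes):
        self.principal = principal
        self._key = key
        self.jobs = {}
        self.tokens_to_renew = []  # (job id, token alias) pairs this renewer must keep alive
        self._next_id = 0

    def authenticate(self, ticket: tuple) -> str:
        # Steps 12-13: check the MAC first, then that the ticket names this service.
        body, mac = ticket
        expected = hmac.new(self._key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(mac, expected):
            raise PermissionError("service ticket not sealed with the JobTracker's key")
        claims = json.loads(body)
        if claims["service"] != self.principal:
            raise PermissionError("ticket is for a different service")
        return claims["user"]

    def submit(self, ticket: tuple, job: JobSubmission) -> str:
        user = self.authenticate(ticket)
        if user != job.user:
            raise PermissionError("job owner does not match authenticated user")
        self._next_id += 1
        job_id = f"job_{self._next_id:04d}"  # hypothetical ID format
        self.jobs[job_id] = job              # step 14: job and delegation token stored together
        self.tokens_to_renew.extend((job_id, alias) for alias in job.tokens)
        return job_id
```

Keeping the tokens attached to the job record, rather than to the user, is just this sketch's choice; it makes the renewer's bookkeeping per job.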
Steps 15, 16 and 17 represent:
- Generating the Block Access Token and spreading it across all DataNodes.
- Sending the block ID and Block Access Token to the JobTracker, which passes them on to the TaskTracker.
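A Block Access Token check can be sketched with a key shared between the NameNode and the DataNodes. This assumes the HADOOP-4487 style design in which the NameNode distributes secret block keys to the DataNodes, so each DataNode can verify a token locally without contacting the NameNode. The field names, `issue_block_token` and `datanode_accepts` are hypothetical and simplified, not Hadoop's actual `BlockTokenIdentifier`.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockTokenId:
    user: str
    block_id: int
    modes: frozenset   # e.g. frozenset({"READ"})
    expiry: float
    key_id: int        # tells the DataNode which shared key to use

    def to_bytes(self) -> bytes:
        modes = ",".join(sorted(self.modes))
        return f"{self.user}|{self.block_id}|{modes}|{self.expiry}|{self.key_id}".encode()


def issue_block_token(nn_keys: dict, key_id: int, user: str, block_id: int,
                      modes, expiry: float):
    """NameNode side: no per-token state, just an HMAC under a shared key."""
    ident = BlockTokenId(user, block_id, frozenset(modes), expiry, key_id)
    password = hmac.new(nn_keys[key_id], ident.to_bytes(), hashlib.sha256).digest()
    return ident, password


def datanode_accepts(dn_keys: dict, ident: BlockTokenId, password: bytes,
                     block_id: int, mode: str, now: float) -> bool:
    """DataNode side: verify locally with its copy of the shared keys."""
    secret = dn_keys.get(ident.key_id)
    if secret is None:
        return False
    expected = hmac.new(secret, ident.to_bytes(), hashlib.sha256).digest()
    return (hmac.compare_digest(expected, password)
            and ident.block_id == block_id
            and mode in ident.modes
            and now <= ident.expiry)
```

In this model, what has to reach every DataNode ahead of time is the key table, while individual tokens travel with whoever reads or writes the block.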
Question 3) Who asks the NameNode for the Block Access Token and block ID: the JobTracker or the TaskTracker?
Sorry, I skipped number 18 by mistake. Step 19 represents: the JobTracker generates the Job Token (brown) and passes it to the TaskTrackers.
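Step 19 can be sketched as a per-job secret derived by the JobTracker and then used to MAC requests between parties of the same job. This is a hedged toy, assuming the job token's password is an HMAC of the job ID under a JobTracker-private master key; `ToyJobTokenManager`, `sign`, `verify`, the job IDs and the message string are hypothetical and only illustrate the idea of job-scoped authentication.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class ToyJobTokenManager:
    """JobTracker side: derives one job token per job from a private master key."""

    def __init__(self, master_key: Optional[bytes] = None):
        self._master = master_key if master_key is not None else secrets.token_bytes(32)

    def create(self, job_id: str):
        # Identifier = job ID, password = HMAC(master key, job ID).
        password = hmac.new(self._master, job_id.encode(), hashlib.sha256).digest()
        return job_id, password


def sign(password: bytes, message: str) -> str:
    """A task or TaskTracker proves it belongs to the job by MAC-ing a request."""
    return hmac.new(password, message.encode(), hashlib.sha256).hexdigest()


def verify(password: bytes, message: str, signature: str) -> bool:
    return hmac.compare_digest(sign(password, message), signature)
```

Because each job gets its own password, a signature made with one job's token does not verify under another job's token.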
Question 4) Can I conclude that there is one Delegation Token per user, distributed throughout the cluster, and one Job Token per job? In other words, a user would have only one Delegation Token and many Job Tokens (one for each job he submits).
Please tell me if I have missed something or got anything wrong in my explanation.