Connect to HDFS with Kerberos Authentication using Python

I am trying to connect to an HDFS cluster protected with Kerberos authentication. I have the following details but don't know how to proceed:

User
Password
Realm
HttpFs Url

I tried the code below, but I get an authentication error:

from hdfs.ext.kerberos import KerberosClient
import requests
import logging

logging.basicConfig(level=logging.DEBUG)

session = requests.Session()
session.verify = False

client = KerberosClient(url='http://x.x.x.x:abcd', session=session,
                        mutual_auth='REQUIRED', principal='abcdef@LMNOPQ')

print(client.list('/'))

Error

INFO:hdfs.client:Instantiated <KerberosClient(url=http://x.x.x.x:abcd)>.
INFO:hdfs.client:Listing '/'.
DEBUG:hdfs.client:Resolved path '/' to '/'.
DEBUG:hdfs.client:Resolved path '/' to '/'.
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 
DEBUG:urllib3.connectionpool:http://x.x.x.x:abcd "GET /webhdfs/v1/?op=LISTSTATUS HTTP/1.1" 401 997
DEBUG:requests_kerberos.kerberos_:handle_401(): Handling: 401
ERROR:requests_kerberos.kerberos_:generate_request_header(): authGSSClientInit() failed:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests_kerberos/kerberos_.py", line 213, in generate_request_header
gssflags=gssflags, principal=self.principal)
kerberos.GSSError: ((' No credentials were supplied, or the credentials were unavailable or inaccessible.', 458752), ('unknown mech-code 0 for mech unknown', 0))
ERROR:requests_kerberos.kerberos_:((' No credentials were supplied, or the credentials were unavailable or inaccessible.', 458752), ('unknown mech-code 0 for mech unknown', 0))
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests_kerberos/kerberos_.py", line 213, in generate_request_header
gssflags=gssflags, principal=self.principal)
kerberos.GSSError: ((' No credentials were supplied, or the credentials were unavailable or inaccessible.', 458752), ('unknown mech-code 0 for mech unknown', 0))
DEBUG:requests_kerberos.kerberos_:handle_401(): returning <Response [401]>
DEBUG:requests_kerberos.kerberos_:handle_response(): returning <Response [401]>

I also have the password, but I don't know where to provide it.

Banana answered 15/7, 2019 at 4:52 Comment(1)
Were you able to solve this? I am facing the same issue. Lycaon

From my understanding, you first have to run the kinit command to obtain a Kerberos ticket, and then run the code you attached.
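The kinit step can also be driven from Python before creating the client. The following is a minimal sketch, not a definitive implementation: `build_kinit_command` and `obtain_ticket` are hypothetical helper names, and feeding the password through stdin is an assumption that holds for MIT Kerberos `kinit` when stdin is not a terminal; other implementations may need a different option. The `run` parameter exists only so the call can be replaced in tests.

```python
import subprocess


def build_kinit_command(principal, keytab=None):
    """Return the argv list for kinit (keytab login if a keytab is given)."""
    if keytab:
        # -k: use a keytab, -t: path to the keytab file
        return ["kinit", "-k", "-t", keytab, principal]
    return ["kinit", principal]


def obtain_ticket(principal, password=None, keytab=None, run=subprocess.run):
    """Run kinit and return True if it exited successfully.

    Assumption: without a keytab, kinit reads the password from stdin.
    """
    if keytab is None and password is None:
        raise ValueError("either a password or a keytab is required")
    cmd = build_kinit_command(principal, keytab)
    # Only send the password when no keytab is used.
    stdin_data = None if keytab else (password + "\n").encode()
    result = run(cmd, input=stdin_data,
                 stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode == 0
```

For example, `obtain_ticket('abcdef@LMNOPQ', password='...')` would be called once before `KerberosClient(...)`. If it returns True, a ticket should be in the credential cache, which is what the "No credentials were supplied" error in the question indicates was missing.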

Contemplation answered 12/3, 2020 at 3:48 Comment(1)
How to run that? Kayleigh

Let's say your principal is hdfs/[email protected], your keytab file is /var/run/cloudera-scm-agent/process/39-hdfs-NAMENODE/hdfs.keytab, and you want to read an HDFS CSV file located at /hadoop_test_data/filecount.csv. The following code authenticates with the keytab and gives you a pandas DataFrame with the contents of filecount.csv.

I used Python 3.7.6 here.

from csv import reader
import subprocess

import pandas as pd
from krbcontext import krbcontext

try:
    # Obtain a Kerberos ticket from the keytab for the duration of the block.
    with krbcontext(using_keytab=True,
                    principal='hdfs/[email protected]',
                    keytab_file='/var/run/cloudera-scm-agent/process/39-hdfs-NAMENODE/hdfs.keytab') as krb:
        print(krb)
        print('kerberos authentication successful')
        # Read the file through the hadoop CLI, which uses the ticket obtained above.
        output = subprocess.Popen(["hadoop", "fs", "-cat", "/hadoop_test_data/filecount.csv"],
                                  stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout, stderr = output.communicate()
        # The first line is the header; the remaining lines are data rows.
        data = str(stdout, 'utf-8').splitlines()
        df = pd.DataFrame(list(reader(data[1:])), columns=data[0].split(','))
        print(df.shape)
        print(df)

except Exception as e:
    # Any failure ends up here, not only authentication errors.
    print("Kerberos authentication or HDFS read unsuccessful")
    print("Detailed error is : " + str(e))
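The parsing step can be separated from the hadoop call, which makes it possible to check it without a cluster. Below is a minimal sketch using only the standard library; `parse_csv_output` is a hypothetical helper name, and it assumes the first non-empty line of the output is the header row.

```python
import csv
import io


def parse_csv_output(raw_bytes, encoding="utf-8"):
    """Split raw CSV bytes (e.g. stdout of `hadoop fs -cat`) into header and rows."""
    text = raw_bytes.decode(encoding)
    # newline="" leaves line endings untouched so csv.reader can handle
    # both "\n" and "\r\n" as well as quoted fields.
    rows = [row for row in csv.reader(io.StringIO(text, newline="")) if row]
    if not rows:
        return [], []
    return rows[0], rows[1:]
```

With this helper, the DataFrame could be built as `header, rows = parse_csv_output(stdout)` followed by `pd.DataFrame(rows, columns=header)`. Unlike splitting the header on commas by hand, this also handles quoted fields that contain commas.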

Let me know if you wish to know more about it.

Levator answered 13/5, 2020 at 9:36 Comment(0)
