How to specify username when putting files on HDFS from a remote machine?

I have a Hadoop cluster set up and working under a common default username "user1". I want to put files into Hadoop from a remote machine which is not part of the Hadoop cluster. I configured the Hadoop files on the remote machine so that when

hadoop dfs -put file1 ...

is called from the remote machine, it puts file1 on the Hadoop cluster.

The only problem is that I am logged in as "user2" on the remote machine, and that doesn't give me the result I expect. In fact, the above command can only be executed on the remote machine as:

hadoop dfs -put file1 /user/user2/testFolder

However, what I really want is to be able to store the file as:

hadoop dfs -put file1 /user/user1/testFolder

If I try to run the last command, Hadoop throws an error because of access permissions. Is there any way I can specify the username within the hadoop dfs command?

I am looking for something like:

hadoop dfs -username user1 -put file1 /user/user1/testFolder
Christachristabel answered 7/7, 2012 at 0:5 Comment(2)
I think stackoverflow.com/questions/11041253 answers this perfectly. Kevel
I think you need to change the accepted answer to the HADOOP_USER_NAME variant with the most upvotes. The whoami hack is not the right thing to do when you can set an env variable. Imaginary

By default, authentication and authorization are turned off in Hadoop. According to Hadoop: The Definitive Guide (btw, nice book - I would recommend buying it):

The user identity that Hadoop uses for permissions in HDFS is determined by running the whoami command on the client system. Similarly, the group names are derived from the output of running groups.

So, you can create a new whoami command which returns the required username and put it in the PATH appropriately, so that the created whoami is found before the actual whoami that comes with Linux. Similarly, you can play with the groups command.

This is a hack and won't work once authentication and authorization have been turned on.
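
A minimal sketch of the hack, assuming the fake command lives in ~/fakebin and the target HDFS user is user1 (both are just placeholders for your own path and username):

  mkdir -p ~/fakebin
  printf '#!/bin/sh\necho user1\n' > ~/fakebin/whoami
  chmod +x ~/fakebin/whoami
  export PATH=~/fakebin:$PATH    # fake whoami now shadows /usr/bin/whoami
  whoami                         # should print user1
  hadoop dfs -put file1 /user/user1/testFolder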

Unreality answered 7/7, 2012 at 1:30 Comment(9)
Yes - I read somewhere that Hadoop was initially used among a small set of trusted users and security was not really a concern; later, as usage grew, security was added on top of Hadoop. Actually, security should be a concern in software design from the ground up and not an afterthought. Just my 2c. Unreality
Thanks. Could you please elaborate on how I should create a new "whoami" command and put it in the path? Maybe with an example. Thanks. Christachristabel
Create a text file whoami containing echo yourname and give it executable permissions. Add the folder containing that whoami as the first entry in the PATH variable in the .bashrc file. Unreality
Nice hack, but it doesn't work. I created the whoami file and updated my path. Now when I run whoami it returns user1. But when I try to put files into Hadoop using "hadoop dfs -put file1 /user/user1/testFolder", it throws an error due to permissions and reports the username as user2 :( Christachristabel
For some reason Hadoop is not picking up the whoami which you created. Set the path properly and it should work. Unreality
Could you please elaborate on how I should set the path properly? I've set the path through ~/.profile, and when executing whoami it works as expected. Any idea why Hadoop is not picking up the whoami? Christachristabel
Post another query on SO and someone will help you. Unreality
I wonder, in this case, who the actual client calling "whoami" is. I believe it's in the Hadoop Shell wrapper class. That wrapper is probably called either by the data node which is attempting to create a file or by the client itself. Discontinue
I think there's a caveat with the idea that it's running 'groups' to get the group to use for the HDFS file. I'm climbing the learning curve, but here's an example: right now my regular account does not belong to any Hadoop-related groups (e.g., hdfs, hive, or hadoop). When I -put a file with myuser:mygroup owner:group into HDFS, it shows up with myuser:myuser there. Any thoughts? Haematogenesis

If you use the HADOOP_USER_NAME env variable you can tell HDFS which user name to operate with. Note that this only works if your cluster isn't using security features (e.g. Kerberos). For example:

HADOOP_USER_NAME=hdfs hadoop dfs -put ...
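
Applied to the question's setup (user1, file1, and /user/user1/testFolder are taken from the question), this would look roughly like:

  HADOOP_USER_NAME=user1 hadoop dfs -put file1 /user/user1/testFolder
  HADOOP_USER_NAME=user1 hadoop dfs -ls /user/user1/testFolder    # the file should now be listed as owned by user1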
Carnap answered 1/10, 2013 at 20:6 Comment(1)
Is there an env variable to set the HDFS group? Sepalous

This may not matter to anybody, but I am using a small hack for this.

I'm exporting HADOOP_USER_NAME in .bash_profile, so that every time I log in, the user is set.

Just add the following line of code to .bash_profile:

export HADOOP_USER_NAME=<your hdfs user>
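
For example, assuming the desired HDFS user is user1 (substitute your own), something along these lines should do it; log in again or source the file for it to take effect:

  echo 'export HADOOP_USER_NAME=user1' >> ~/.bash_profile
  source ~/.bash_profile
  hadoop dfs -put file1 /user/user1/testFolder    # now runs as user1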
Izak answered 27/1, 2016 at 16:17 Comment(0)

Shell/Command way:

Set the HADOOP_USER_NAME variable, and execute the hdfs commands:

  export HADOOP_USER_NAME=manjunath
  hdfs dfs -put <source>  <destination>

Pythonic way:

  import os 
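  # set this before launching the hdfs subprocess or creating the HDFS client connection, so it picks up the value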
  os.environ["HADOOP_USER_NAME"] = "manjunath"
Question answered 14/8, 2021 at 3:21 Comment(0)

There's another post with something similar to this that could provide a workaround for you using streaming via ssh:

cat file.txt | ssh user1@clusternode "hadoop fs -put - /path/in/hdfs/file.txt"

See "putting a remote file into hadoop without copying it to local disk" for more information.
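
Adapted to the question's setup (file1 and /user/user1/testFolder come from the question; clusternode is a placeholder for whichever cluster host user1 can ssh into), that would be something like:

  cat file1 | ssh user1@clusternode "hadoop dfs -put - /user/user1/testFolder/file1"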

Graniteware answered 7/7, 2012 at 16:42 Comment(1)
Thanks, but that is my own post too. After trying that, I noticed that not using piping is faster. In fact, copying files to one of the Hadoop machines using "scp" and then using "ssh" to copy the files from the local drive into Hadoop turned out to be faster. I am not sure about the reason, but it probably has to do with limitations on the amount of available buffer. Anyway, I am trying to skip both of these steps and just use "hadoop" directly from a remote machine. It works in terms of copying files, but I am facing the issue of having files under two different usernames. Christachristabel
