Set hadoop system user for client embedded in Java webapp

I would like to submit MapReduce jobs from a Java web application to a remote Hadoop cluster, but I am unable to specify which user the job should be submitted as. I would like to configure and use a system user that is used for all MapReduce jobs.

Currently I am unable to specify any user: no matter what, the Hadoop job runs under the username of the user currently logged in on the client system. This causes an error with the message

Permission denied: user=alice, access=WRITE, inode="staging":hduser:supergroup:rwxr-xr-x

... where "alice" is the local, logged in user on the client machine.

I have tried

  1. various combinations of creating UserGroupInformation instances (both proxy and normal users), and
  2. setting the Java system property with -Duser.name=hduser, changing the USER environment variable, and hard-coding a System.setProperty("user.name", "hduser") call.

... to no avail. Regarding 1), I admit to having no clue how these classes are supposed to be used. Also, please note that changing the Java system property is obviously not a real solution for use in a web application.

Does anybody know how to specify which user Hadoop uses to connect to a remote system?

P.S. Hadoop is using the default configuration, meaning that no authentication is used when connecting to the cluster and that Kerberos is not used in communicating with the remote machines.

Areaway answered 14/6, 2012 at 20:59 Comment(0)

Finally I stumbled on the constant

static final String HADOOP_USER_NAME = "HADOOP_USER_NAME";

in the UserGroupInformation class.

Setting this either as an environment variable, as a Java system property on startup (using -D), or programmatically with System.setProperty("HADOOP_USER_NAME", "hduser"); makes Hadoop use whatever username you want when connecting to the remote Hadoop cluster.
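A minimal, Hadoop-free sketch of the programmatic variant (the class name here is illustrative, not from the answer): the property just has to be set before the first Hadoop Configuration or FileSystem object is created, since UserGroupInformation resolves and caches the login user on first use.

```java
public class HadoopUserExample {
    public static void main(String[] args) {
        // Set before any Hadoop client object is created; in simple-auth
        // (non-Kerberos) mode UserGroupInformation picks this up as the
        // effective remote user instead of the local OS login name.
        System.setProperty("HADOOP_USER_NAME", "hduser");

        // Any FileSystem/Job created after this point connects as "hduser".
        System.out.println(System.getProperty("HADOOP_USER_NAME")); // prints "hduser"
    }
}
```

In a web application, a natural place for this call is a ServletContextListener (or equivalent startup hook), so it runs once before any Hadoop client code.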

Areaway answered 16/6, 2012 at 10:8 Comment(3)
While trying to resolve the issue I discovered how UserGroupInformation should be used. It may be of interest that it is possible to run Hadoop jobs as any user via a common system user. This is called impersonation in Hadoop parlance. Note that this requires additional configuration of the Hadoop cluster. Also note that I have not yet managed to get this to work... :-) – Areaway
You can also just set the environment variable HADOOP_USER_NAME. That is also sufficient :) – Gombroon
It worked for me today and saved me many hours of work. Thanks buddy – Grafton

The code below works for me and has the same effect as System.setProperty("HADOOP_USER_NAME", "hduser"):

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.util.ToolRunner;

// Run the tool as "hduser" regardless of the local OS user
UserGroupInformation ugi = UserGroupInformation.createRemoteUser("hduser");
ugi.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        Configuration configuration = new Configuration();
        configuration.set("hadoop.job.ugi", "hduser");
        ToolRunner.run(configuration, new YourTool(), args);
        return null;
    }
});
Streetman answered 21/3, 2013 at 11:57 Comment(0)

I was able to resolve a similar issue by using the secure impersonation feature: http://hadoop.apache.org/docs/stable1/Secure_Impersonation.html

The following is a code snippet:

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;
    import org.apache.hadoop.util.ToolRunner;

    // Impersonate "hduser" from the locally logged-in (login) user
    UserGroupInformation ugi = UserGroupInformation.createProxyUser("hduser",
            UserGroupInformation.getLoginUser());

    ugi.doAs(new PrivilegedExceptionAction<Void>() {
        public Void run() throws Exception {
            Configuration jobconf = new Configuration();
            jobconf.set("fs.default.name", "hdfs://server:hdfsport");
            jobconf.set("hadoop.job.ugi", "hduser");
            jobconf.set("mapred.job.tracker", "server:jobtracker port");
            String[] args = new String[] { "data/input", "data/output" };
            ToolRunner.run(jobconf, WordCount.class.newInstance(), args);
            return null;
        }
    });

The remote login user id (the Windows desktop user in my case) must be added to core-site.xml on the cluster, as described at the URL above.
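The cluster-side change referred to is the proxy-user configuration in core-site.xml. A sketch, assuming the client login user is "alice" (as in the question's error message); the host and group values are placeholders to adjust for your environment:

```xml
<!-- core-site.xml on the cluster: allow "alice" to impersonate other users -->
<property>
  <name>hadoop.proxyuser.alice.hosts</name>
  <value>client-host.example.com</value> <!-- hosts alice may connect from -->
</property>
<property>
  <name>hadoop.proxyuser.alice.groups</name>
  <value>users</value> <!-- groups whose members alice may impersonate -->
</property>
```

The NameNode and JobTracker must be restarted (or have their configuration refreshed) for these properties to take effect.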

Oxidation answered 26/6, 2012 at 17:30 Comment(0)
