I would like to submit MapReduce jobs from a Java web application to a remote Hadoop cluster, but I am unable to specify which user the job should be submitted as. I would like to configure and use a single system user for all MapReduce jobs.
Currently I cannot specify any user: no matter what I do, the Hadoop job runs under the username of the user currently logged in on the client system. This causes an error with the message
Permission denied: user=alice, access=WRITE, inode="staging":hduser:supergroup:rwxr-xr-x
... where "alice" is the local, logged-in user on the client machine.
I have tried

1. various combinations of creating `UserGroupInformation` instances (both proxy and normal users), and
2. setting the Java system property with `-Duser.name=hduser`, changing the `USER` environment variable, and a hard-coded `System.setProperty("user.name", "hduser")` call.

... all to no avail. Regarding 1), I admit to having no clue how these classes are supposeded to be used; my attempts look roughly like the sketch below. Also please note that changing the Java system property is obviously not a real solution for use in the web application.
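For reference, a minimal sketch of the `UserGroupInformation` route I have been attempting (the NameNode address is a placeholder, not our actual cluster):

```java
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class SubmitAsHduser {
    public static void main(String[] args) throws Exception {
        // UGI for the fixed system user; with simple (non-Kerberos) auth the
        // cluster trusts whatever user name the client presents.
        UserGroupInformation ugi = UserGroupInformation.createRemoteUser("hduser");

        ugi.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                Configuration conf = new Configuration();
                // Placeholder NameNode address -- substitute the real one.
                conf.set("fs.default.name", "hdfs://namenode:9000");
                FileSystem fs = FileSystem.get(conf);
                // Everything inside run() executes as "hduser", so staging
                // directories are created and checked against that user.
                fs.mkdirs(new Path("/user/hduser/test"));
                return null;
            }
        });
    }
}
```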
Does anybody know how to specify which user Hadoop uses to connect to a remote system?
PS: Hadoop is using the default configuration, meaning that no authentication is used when connecting to the cluster and that Kerberos is not used in communicating with the remote machines.
`UserGroupInformation` should be used. It could be of interest that it is possible to run Hadoop jobs as any user via a common system user. This is called impersonation in Hadoop parlance. Note that this requires additional configuration of the Hadoop cluster. Also note that I have not yet managed to get this to work... :-) – Areaway
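A sketch of that impersonation route, assuming the cluster's core-site.xml whitelists the system user as a proxy user via Hadoop's `hadoop.proxyuser.hduser.hosts` and `hadoop.proxyuser.hduser.groups` properties (the user names here are illustrative):

```java
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.security.UserGroupInformation;

public class ImpersonationSketch {
    public static void main(String[] args) throws Exception {
        // The common system user that is allowed to impersonate others.
        // Requires hadoop.proxyuser.hduser.hosts and
        // hadoop.proxyuser.hduser.groups to be set in the cluster's
        // core-site.xml.
        UserGroupInformation realUser = UserGroupInformation.createRemoteUser("hduser");
        // Proxy UGI: actions run as "alice" on behalf of "hduser".
        UserGroupInformation proxy = UserGroupInformation.createProxyUser("alice", realUser);

        proxy.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                // Configure and submit the MapReduce job here;
                // it runs as "alice".
                return null;
            }
        });
    }
}
```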