Apache Pig permissions issue

I'm attempting to get Apache Pig up and running on my Hadoop cluster, and I'm encountering a permissions problem. Pig itself launches and connects to the cluster just fine; from within the Pig shell, I can ls through and around my HDFS directories. However, when I try to actually load data and run Pig commands, I run into permissions-related errors:

grunt> A = load 'all_annotated.txt' USING PigStorage() AS (id:long, text:chararray, lang:chararray);
grunt> DUMP A;
2011-08-24 18:11:40,961 [main] ERROR org.apache.pig.tools.grunt.Grunt - You don't have permission to perform the operation. Error from the server: org.apache.hadoop.security.AccessControlException: Permission denied: user=steven, access=WRITE, inode="":hadoop:supergroup:r-xr-xr-x
2011-08-24 18:11:40,977 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A
Details at logfile: /Users/steven/Desktop/Hacking/hadoop/pig/pig-0.9.0/pig_1314230681326.log
grunt> 

In this case, all_annotated.txt is a file in my HDFS home directory that I created, and most definitely have permissions to; the same problem occurs no matter what file I try to load. However, I don't think the file itself is the problem, as the error indicates Pig is trying to write somewhere. Googling around, I found a few mailing list posts suggesting that certain Pig Latin statements (order, etc.) need write access to a temporary directory on HDFS whose location is controlled by the hadoop.tmp.dir property in core-site.xml. I don't think load falls into that category, but just to be sure, I changed hadoop.tmp.dir to point to a directory within my HDFS home directory, and the problem persisted.
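
For reference, the inode="":hadoop:supergroup:r-xr-xr-x in the error describes the directory Pig was denied write access to: owned by hadoop, group supergroup, with no write permission for anyone. A quick way to inspect HDFS permissions from the shell (nothing Pig-specific here, just the stock HDFS client):

# Lists the top-level HDFS directories along with their owner, group, and mode.
hadoop fs -ls /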

So, anybody out there have any ideas as to what might be going on?

Maziar answered 25/8, 2011 at 16:38
For people who found this post while looking for ERROR 1066: Unable to open iterator for alias, here is a generic solution. – Shutter
Probably your pig.temp.dir setting. It defaults to /tmp on HDFS, and Pig writes its temporary results there; if you don't have permission to /tmp, Pig will complain. Try overriding it with -Dpig.temp.dir.
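
For example, something like this (untested sketch; substitute any HDFS directory your user owns):

# Create a scratch directory you can write to, then point Pig at it.
# The -D option has to come before any other pig arguments.
hadoop fs -mkdir /user/steven/pig_tmp
pig -Dpig.temp.dir=/user/steven/pig_tmp

The same property can also be set once in conf/pig.properties instead of on the command line.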

Postage answered 26/8, 2011 at 6:51
Yup, that did it! I didn't realize that Pig had its own tmp directory. Thanks very much! – Maziar
A problem might be that hadoop.tmp.dir is a directory on your local filesystem, not HDFS. Try setting that property to a local directory you know you have write access to. I've run into the same error using regular MapReduce in Hadoop.
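
If you want to check what the cluster's client configuration actually sets it to, hadoop.tmp.dir normally lives in core-site.xml (the path below assumes a Hadoop 0.20-style layout under $HADOOP_HOME/conf):

# Prints the <name>/<value> lines for hadoop.tmp.dir, if the property is set.
grep -A 1 'hadoop.tmp.dir' $HADOOP_HOME/conf/core-site.xml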

Kuban answered 25/8, 2011 at 16:53
Huh. Well, in that case, the error makes even less sense. I definitely have write access to /tmp on my local filesystem. Just to be sure, I changed it back, and the problem still occurs. I really think that whatever's going on is HDFS-related somehow. Thanks for the suggestion, though... – Maziar
inode="":hadoop:supergroup:r-xr-xr-x means that the directory being written to, in this case the HDFS root /, is owned by user hadoop and group supergroup with mode r-xr-xr-x, so nobody has write permission. Try hadoop fs -chmod 755 /, which will add write permission for the hadoop user. You may need 775 instead if you are not executing as hadoop but are in the supergroup group. – Kuban
Thanks for the reply! I don't actually have permissions to "/"; I'm not the administrator of the cluster I'm using, so I don't think I'll be able to chmod anything at that level of the file system. Do you happen to know why Pig would be trying to write to the HDFS root? – Maziar
As per Daniel's answer, it looks like Pig was trying to create the directory /tmp in HDFS, so it needed write access to / to create that directory. – Kuban
