mrjob: Invalid bootstrap action path, must be a location in Amazon S3

I am on Windows 7. I installed mrjob, and the example word_count file from the website runs fine on my local machine. However, I get the error in the title when I try to run it on Amazon EMR. I also tested connecting to Amazon S3 directly with boto, and that works.

mrjob.conf file

runners:
  emr:
    aws_access_key_id: xxxxxxxxxxxxx
    aws_region: us-east-1
    aws_secret_access_key: xxxxxxxx
    ec2_key_pair: bzy
    ec2_key_pair_file: C:\aa.pem
    ec2_instance_type: m1.small
    num_ec2_instances: 3
    s3_log_uri: s3://myunique/
    s3_scratch_uri: s3://myunique/

Running the following in cmd:

python word_count.py -c mrjob.conf -r emr mytext.txt

It produces:

[Screenshot of the error output; image not available]

Following suggestions that this was a Windows path issue, I double-checked parse.py in the mrjob source, and it appears to have the relevant check for Windows file paths:

# Used to check if the candidate candidate uri is actually a local windows path.
WINPATH_RE = re.compile(r"^[aA-zZ]:\\")


def is_windows_path(uri):
    """Return True if *uri* is a windows path."""
    if WINPATH_RE.match(uri):
        return True
    else:
        return False


def is_uri(uri):
    """Return True if *uri* is any sort of URI."""
    if is_windows_path(uri):
        return False

    return bool(urlparse(uri).scheme)
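To see what these two helpers actually accept, they can be run in isolation. The following is a minimal sketch, not mrjob itself: it copies the functions as quoted here, imports `urlparse` from Python 3's `urllib.parse` (an assumption; the original source targeted Python 2), and the sample inputs are taken from the config and command line in this question, plus a hypothetical forward-slash variant of the key path.

```python
import re
from urllib.parse import urlparse  # assumption: Python 3 import path

# Copied from the quoted parse.py excerpt.
WINPATH_RE = re.compile(r"^[aA-zZ]:\\")


def is_windows_path(uri):
    """Return True if *uri* starts with a drive letter, colon and backslash."""
    return bool(WINPATH_RE.match(uri))


def is_uri(uri):
    """Return True if *uri* has a URL scheme and is not a Windows path."""
    if is_windows_path(uri):
        return False
    return bool(urlparse(uri).scheme)


# Inputs from the question, plus a hypothetical forward-slash variant.
samples = [r"C:\aa.pem", "C:/aa.pem", "s3://myunique/", "mytext.txt"]
results = {s: is_uri(s) for s in samples}

for sample, flag in results.items():
    print(f"{sample!r:20} is_uri -> {flag}")
```

With these inputs, only the backslash form of the drive path is recognised as a local Windows path. The forward-slash form `C:/aa.pem` parses with the scheme `c`, so this check would treat it as a URI.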

What I don't understand is why I am still getting the error even with this updated code, and I'm not sure how to move forward.

Unharness answered 22/4, 2014 at 7:24 Comment(4)
I wish I could help you, but I don't work on Windows and currently don't have easy access to AWS/EMR. One thing I suggest, though, is to look at the error logs. The ones Hadoop spews out are still quite cryptic, but they often give you enough clues as to what is going wrong. - Adan
Please re-run with -v and post the whole thing to paste.pound-python.org, after redacting the keys, of course. Do you not have bootstrap-action configured? - Mastoidectomy
@Mastoidectomy paste.pound-python.org/show/rL6lwzD3tsASsQMXeq13 - Unharness
@KJW: it says your config YAML is malformed. - Mastoidectomy

The problem you are experiencing is caused by the Windows path containing \ (backslash), which is treated as an escape character. Double it up and the problem should go away.

Change your mrjob.conf file to:

runners:
  emr:
    aws_access_key_id: xxxxxxxxxxxxx
    aws_region: us-east-1
    aws_secret_access_key: xxxxxxxx
    ec2_key_pair: bzy
    ec2_key_pair_file: C:\\aa.pem
    ec2_instance_type: m1.small
    num_ec2_instances: 3
    s3_log_uri: s3://myunique/
    s3_scratch_uri: s3://myunique/

For more information, see: http://yaml.org/spec/1.2/spec.html#id2770814

Zerline answered 9/5, 2014 at 12:31 Comment(0)

I had a similar problem and found that the cause was code in my job that pulled in other files by local file path. If that is the case, the same error will occur.

Merrifield answered 2/6, 2014 at 18:17 Comment(2)
Not sure I understand this completely. How did you manage to sort it out? - Unharness
If you have a Python script that tries to access a local file, such as a helper function in a different file or data stored in that file, the file that works locally does not exist on the remote machine, so the job fails with that error. I added this answer in case another searcher finds this question but is getting the error for my reason instead of yours. - Merrifield
