How does one specify the input file for a runner from Python?

I am writing an external script to run a MapReduce job via the Python mrjob module on my laptop (not on Amazon Elastic Compute Cloud or any large cluster).

I read in the mrjob documentation that I should use MRJob.make_runner() to run a MapReduce job from a separate Python script, as follows.

mr_job = MRYourJob(args=['-r', 'emr'])
with mr_job.make_runner() as runner:
    ...

However, how do I specify which input file to use? I want to use the file "datalines.txt", which is in the same directory as my MapReduce script and the other Python script that runs the job. Furthermore, how do I specify where the output goes?

I could not find a function in the mrjob documentation that allows me to specify these parameters.

Dredger answered 24/9, 2012 at 16:38

The mrjob getting started guide says that input is read from stdin or from files supplied on the command line, so you can pass the file name as a positional argument in args:

mr_job = MRYourJob(args=["datalines.txt"])
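To cover the output half of the question, here is a minimal sketch of how the pieces might fit together. It assumes mrjob 0.6 or later (where runners expose cat_output() and jobs expose parse_output()), the inline runner for running on a laptop, and the --output-dir option for choosing where results are written. The helper names build_args and run_job are made up for illustration and are not part of mrjob.

```python
def build_args(input_path, output_dir=None, runner="inline"):
    """Build the argument list that an MRJob subclass parses like a command line."""
    args = ["-r", runner]  # "inline" runs the job in this process, no cluster needed
    if output_dir is not None:
        # Assumption: --output-dir tells the runner where to write its output files.
        args += ["--output-dir", output_dir]
    args.append(input_path)  # positional arguments are treated as input files
    return args


def run_job(job_cls, input_path, output_dir=None, runner="inline"):
    """Run job_cls on input_path and return the decoded (key, value) pairs."""
    mr_job = job_cls(args=build_args(input_path, output_dir, runner))
    with mr_job.make_runner() as job_runner:
        job_runner.run()
        # Assumes mrjob >= 0.6; older releases used runner.stream_output()
        # together with mr_job.parse_output_line() instead.
        return list(mr_job.parse_output(job_runner.cat_output()))


# Usage sketch (mr_your_job is a hypothetical module that defines MRYourJob):
# from mr_your_job import MRYourJob
# for key, value in run_job(MRYourJob, "datalines.txt", output_dir="out"):
#     print(key, value)
```

Keeping the job class in its own module and importing it into the driver script, as in the commented usage, matches the setup described in the question, where the job and the script that runs it are separate files.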
Smew answered 24/9, 2012 at 16:52
