Hadoop command line -D options not working

I am trying to pass a variable (not a property) using the -D command line option in Hadoop, like -Dmapred.mapper.mystring=somexyz. I am able to set a conf property in the Driver program and read it back in the mapper, so I could pass my string as an additional parameter and set it in the Driver. But I want to see if the -D option can be used to do the same.

My command is:

$HADOOP_HOME/bin/hadoop jar  /home/hduser/Hadoop_learning_path/toolgrep.jar /home/hduser/hadoopData/inputdir/ /home/hduser/hadoopData/grepoutput -Dmapred.mapper.mystring=somexyz

Driver program

String s_ptrn = conf.get("mapred.mapper.regex");
System.out.println("debug: in Tool Class mapred.mapper.regex " + s_ptrn + "\n"); // prints null

But this works:

conf.set("DUMMYVAL","100000000000000000000000000000000000000");

Set in the driver, this value is read properly in the mapper with the get method.

My question is: if all of the Internet says I can use the -D option, then why can't I? Is it that -D cannot be used for arbitrary arguments, only for properties, which I would have to put in a file, read in the driver program, and then use?

Something like

Configuration conf = new Configuration();
conf.addResource("~/conf.xml"); 

in the driver program, and is that the only way?

Cardiomegaly answered 8/7, 2014 at 12:39 Comment(0)

As Thomas wrote, you are missing the space. You are also passing the variable mapred.mapper.mystring on the command line, but in the code you are trying to get mapred.mapper.regex. If you want to use the -D parameter, you should be using the Tool interface. More about it is here: Hadoop: Implementing the Tool interface for MapReduce driver.

Or you can parse your CLI arguments like this:

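@Override
public int run(String[] args) throws Exception {
    Configuration conf = this.getConf();

    // GenericOptionsParser consumes generic options such as -D and returns the rest
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    String yourVariable = null;
    int i = 0;
    while (i < otherArgs.length) {
        if (otherArgs[i].equals("-x") && i + 1 < otherArgs.length) {
            // Save your CLI argument (the value that follows -x)
            yourVariable = otherArgs[++i];
        }
        i++;
    }
    // then save yourVariable into conf for use in the map phase,
    // configure and submit the job, and return its exit status
    return 0; // placeholder
}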

Then your command can look like this:

$HADOOP_HOME/bin/hadoop jar /home/hduser/Hadoop_learning_path/toolgrep.jar /home/hduser/hadoopData/inputdir/ /home/hduser/hadoopData/grepoutput -x yourVariable

Hope it helps

Portfolio answered 9/7, 2014 at 5:36 Comment(4)
Radek, I am using the Tool interface; that's not the problem. I have implemented something similar with the CLI to pass values around, but I was wondering about the significance of -D. Also, I was trying to give sample code, hence the different names on the command line and in the code parsing it, but I took care of that in my code. Good catch. - Cardiomegaly
Also one thing: you should use -D just after your jar, in your case: $HADOOP_HOME/bin/hadoop jar /home/hduser/Hadoop_learning_path/toolgrep.jar -D mapred.mapper.mystring=something /home/hduser/hadoopData/inputdir/ /home/hduser/hadoopData/grepoutput - Sartor
I solved the problem thanks to another post here; it was embedded in a user's response and not the selected answer: -D property=value needs to be the FIRST args to the MR job. Boy, not sure why such requirements are enforced. I also learned the importance of -D: with -D, main sees 4 args, but the Tool runner's run method is passed only the args without -D, so we can access those with args[n] and the -D values via getters. Any letter other than -D we have to handle as a regular CLI parameter. Lesson learned! - Cardiomegaly
I saw your comment right after my post, but that was it: the placement of -D in the command was the problem. Thanks, you get my vote. - Cardiomegaly

To use the -D option correctly with the hadoop jar command, the following syntax should be used:

hadoop jar {hadoop-jar-file-path} {job-main-class} -D {generic options} {input-directory} {output-directory}

Hence the -D option should be placed after the job's main class name, i.e. in third position. When we issue the hadoop jar command, the hadoop script invokes the main() method of the RunJar class. This main() uses the first argument to put the job jar on the classpath and the second argument to invoke the job class's main(). (If the jar's manifest already names a Main-Class, as appears to be the case in the question's command, the class name is omitted and -D goes directly after the jar path.)

Once the job class's main() is called, control is transferred to GenericOptionsParser, which first parses the generic command line arguments (if any) and sets them in the job's configuration object, and then calls the job class's run() with the remaining arguments (i.e. the input and output paths).
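
To illustrate why the position of -D matters, here is a minimal plain-Java sketch of the behavior described above. It is not Hadoop code: the class GenericOptionsSketch and its parse method are hypothetical, and the sketch assumes that the parser stops at the first argument that is not a generic option, which matches what the comments in this thread observed. Only -D is modeled; the other generic options are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of generic-option parsing; not Hadoop's implementation. */
class GenericOptionsSketch {

    /** Holds the properties collected from -D and the arguments left over for run(). */
    static final class Result {
        final Map<String, String> conf = new HashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    /** Consumes leading -D options; everything from the first non-option onward is left untouched. */
    static Result parse(String[] args) {
        Result result = new Result();
        int i = 0;
        while (i < args.length) {
            String arg = args[i];
            String pair;
            if (arg.equals("-D") && i + 1 < args.length) {
                pair = args[i + 1];          // form: -D key=value
                i += 2;
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                pair = arg.substring(2);     // form: -Dkey=value
                i += 1;
            } else {
                break;                       // assumption: parsing stops at the first non-option
            }
            int eq = pair.indexOf('=');
            if (eq > 0) {
                result.conf.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        result.remaining.addAll(Arrays.asList(args).subList(i, args.length));
        return result;
    }

    public static void main(String[] args) {
        // -D placed before the paths: the property is picked up
        Result first = parse(new String[] {
            "-D", "mapred.mapper.mystring=somexyz", "inputdir", "grepoutput"});
        System.out.println(first.conf + " " + first.remaining);
        // prints: {mapred.mapper.mystring=somexyz} [inputdir, grepoutput]

        // -D placed after the paths, as in the question: it is treated as an ordinary argument
        Result last = parse(new String[] {
            "inputdir", "grepoutput", "-Dmapred.mapper.mystring=somexyz"});
        System.out.println(last.conf + " " + last.remaining);
        // prints: {} [inputdir, grepoutput, -Dmapred.mapper.mystring=somexyz]
    }
}
```

In the second call the property never reaches the configuration, which is consistent with conf.get(...) returning null in the question.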

Fortune answered 9/6, 2021 at 13:49 Comment(0)
