You can use log4j, which is the default logging framework that Hadoop uses. From your MapReduce application you could do something like this:
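import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Logger;

public class SampleMapper extends Mapper<LongWritable, Text, Text, Text> {
    // One logger per class, named after the mapper class
    private static final Logger logger = Logger.getLogger(SampleMapper.class);

    @Override
    protected void setup(Context context) {
        logger.info("Initializing NoSQL Connection.");
        try {
            // logic for connecting to NoSQL - omitted
        } catch (Exception ex) {
            // Pass the exception itself so the stack trace is logged too
            logger.error("Failed to initialize NoSQL connection", ex);
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // mapper code omitted
    }
}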
This sample code uses a log4j logger to log events from the mapper. Each log event is written to the log of the task that produced it. You can view the task logs from the JobTracker (MRv1) or ResourceManager (MRv2) web page.
If you are using YARN, you can access the application logs from the command line with the following command:
yarn logs -applicationId <application_id>
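If you would rather pull those logs from Java than from a shell, the command above can be wrapped in a small helper. This is only a rough sketch: the class name YarnLogFetcher and its methods are made up for illustration, it assumes the yarn executable is on the PATH, and the filtering step assumes your log layout prints the logger name (for example SampleMapper) on each line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: runs `yarn logs` and filters its output.
public class YarnLogFetcher {

    // Builds the same command line shown above for the given application ID.
    public static List<String> buildCommand(String applicationId) {
        return Arrays.asList("yarn", "logs", "-applicationId", applicationId);
    }

    // Keeps only the lines that contain the given text, e.g. a logger's class name.
    public static List<String> filterLines(List<String> lines, String needle) {
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            if (line.contains(needle)) {
                matches.add(line);
            }
        }
        return matches;
    }

    // Runs the command and returns its output lines (stderr merged into stdout).
    // Assumption: the `yarn` executable is on the PATH of this process.
    public static List<String> fetch(String applicationId)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(applicationId))
                .redirectErrorStream(true)
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        process.waitFor();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: YarnLogFetcher <application_id> <text>");
            return;
        }
        // args[0]: application ID, args[1]: text to look for, e.g. "SampleMapper"
        for (String line : filterLines(fetch(args[0]), args[1])) {
            System.out.println(line);
        }
    }
}
```

Running it as java YarnLogFetcher <application_id> SampleMapper would print only the lines that mention the mapper's logger, which can be handy when the full application log is large.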
If you are using MapReduce v1, there is no single point of access from the command line, so you have to log in to each TaskTracker and look in the configured log path, ${hadoop.log.dir}/userlogs. This is generally /var/log/hadoop/userlogs/attempt_<job_id>/syslog; the syslog file contains the log4j output.