I am trying to enable skipping of bad input records in my Amazon Elastic MapReduce (EMR) jobs. I am following the wonderful recipe described here:
http://devblog.factual.com/practical-hadoop-streaming-dealing-with-brittle-code
The post says that I need to set the following configuration parameters on an EMR job:
mapred.skip.mode.enabled=true
mapred.skip.map.max.skip.records=1
mapred.skip.attempts.to.start.skipping=2
mapred.map.tasks=1000
mapred.map.max.attempts=10
How do I set these (and other) mapred.XXX parameters on a JobFlow using Boto?
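In case it helps to show what I am after, here is a minimal sketch of how I imagine the settings would need to be flattened into command-line style arguments. The helper name `to_hadoop_args` is my own invention. I am assuming (not verified) that the resulting list could be handed to Boto either as `step_args` on a `StreamingStep` in the `-D key=value` form, or as arguments to the `configure-hadoop` bootstrap action in the `-m key=value` form.

```python
# The settings from the blog post, kept as a plain dict.
SKIP_SETTINGS = {
    "mapred.skip.mode.enabled": "true",
    "mapred.skip.map.max.skip.records": "1",
    "mapred.skip.attempts.to.start.skipping": "2",
    "mapred.map.tasks": "1000",
    "mapred.map.max.attempts": "10",
}


def to_hadoop_args(settings, flag="-D"):
    """Flatten {key: value} into [flag, 'key=value', ...], sorted by key.

    Hypothetical helper: it only builds the argument list and does not
    talk to EMR or Boto at all.
    """
    args = []
    for key in sorted(settings):
        args.extend([flag, "{}={}".format(key, settings[key])])
    return args


# Assumption: passed per step, e.g. StreamingStep(..., step_args=step_args)
step_args = to_hadoop_args(SKIP_SETTINGS)

# Assumption: passed cluster-wide via the configure-hadoop bootstrap action
bootstrap_args = to_hadoop_args(SKIP_SETTINGS, flag="-m")
```

Is one of these the right way to get the parameters onto the JobFlow, or is there a more direct mechanism?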