I can control the number of reducers by using PARALLEL clause in the statements which result in reducers.
I want to control the number of mappers. The data source is already created, and I can not reduce the number of parts in the data source. Is it possible to control the number of maps spawned by my pig statements? Can I keep a lower and upper cap on the number of maps spawned? Is it a good idea to control this?
I tried using pig.maxCombinedSplitSize, mapred.min.split.size, mapred.tasktracker.map.tasks.maximum etc, but they seem to not help.
Can someone please help me understand how to control the number of maps and possibly share a working example?