I am working on a Hadoop project, and after many visits to various blogs and reading the documentation, I realized I need to use the secondary sort feature provided by the Hadoop framework.
My input records have the form: DESC (String), Price (Integer), followed by some other text.
I want the values in the reducer to arrive in descending order of Price.
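For reference, a secondary sort on exact keys orders the composite key by DESC first and then by Price in descending order. The plain-Java sketch below only illustrates that ordering: it assumes a hypothetical Key class in place of VendorKey and uses java.util.Comparator rather than Hadoop's WritableComparator, so none of the Hadoop wiring is shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SecondarySortOrder {
    // Hypothetical stand-in for the composite key: natural key (DESC) plus price.
    static final class Key {
        final String desc;
        final int price;

        Key(String desc, int price) {
            this.desc = desc;
            this.price = price;
        }

        @Override
        public String toString() {
            return desc + ":" + price;
        }
    }

    // Sort order: DESC ascending first, then price descending within the same DESC.
    static final Comparator<Key> SORT_ORDER =
        Comparator.comparing((Key k) -> k.desc)
                  .thenComparing(Comparator.comparingInt((Key k) -> k.price).reversed());

    // Returns a sorted copy so the input list is left untouched.
    public static List<Key> sorted(List<Key> keys) {
        List<Key> copy = new ArrayList<>(keys);
        copy.sort(SORT_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Key> keys = List.of(new Key("b", 5), new Key("a", 1), new Key("a", 9), new Key("b", 7));
        System.out.println(sorted(keys)); // [a:9, a:1, b:7, b:5]
    }
}
```

This works because string equality is a proper equivalence, so every key with the same DESC ends up contiguous after sorting.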
Also, when comparing DESC values, I have a method that takes two strings and a percentage; if the similarity between the two strings is equal to or greater than that percentage, I should treat them as equal.
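Note that "equal if similarity is at least the threshold" is not transitive: A can be similar to B and B similar to C while A is not similar to C. The sketch below demonstrates this with a hypothetical Jaccard overlap of the ":"-separated tokens, used only as a stand-in because the real similarity method (ClusterUtil.compare) is not shown.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SimilarityTransitivity {
    // Hypothetical similarity: Jaccard overlap of the ":"-separated tokens.
    // This is NOT ClusterUtil.compare, whose implementation is not shown.
    static double similarity(String a, String b) {
        Set<String> sa = new HashSet<>(Arrays.asList(a.split(":")));
        Set<String> sb = new HashSet<>(Arrays.asList(b.split(":")));
        Set<String> inter = new HashSet<>(sa);
        inter.retainAll(sb);
        Set<String> union = new HashSet<>(sa);
        union.addAll(sb);
        return union.isEmpty() ? 1.0 : (double) inter.size() / union.size();
    }

    // "Equal" in the question's sense: similarity at or above the threshold.
    static boolean similar(String a, String b, double threshold) {
        return similarity(a, b) >= threshold;
    }

    public static void main(String[] args) {
        String x = "red:big:box", y = "red:big:bag", z = "red:small:bag";
        double t = 0.5;
        System.out.println(similar(x, y, t)); // true  (2 shared of 4 tokens)
        System.out.println(similar(y, z, t)); // true  (2 shared of 4 tokens)
        System.out.println(similar(x, z, t)); // false (1 shared of 5 tokens)
    }
}
```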
The problem is that after the reduce job finishes, I can see some DESC values that are similar to other ones and yet ended up in different groups.
Here is the compareTo method of my composite key:
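    public int compareTo(VendorKey o) {
        // 0 when the tokens are "similar enough", otherwise always 1
        int result = compare(token, o.token, ":") >= percentage ? 0 : 1;
        if (result == 0) {
            // similar tokens: order by pid, descending
            return pid > o.pid ? -1 : pid < o.pid ? 1 : 0;
        }
        return result;
    }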
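The Comparable contract requires sgn(x.compareTo(y)) == -sgn(y.compareTo(x)). Because the method above returns 1 for any dissimilar pair regardless of argument order, it can answer 1 in both directions. The standalone check below copies that shape, with String::equals as a hypothetical stand-in for the similarity test.

```java
import java.util.function.BiPredicate;

public class AntisymmetryCheck {
    // Mirrors the compareTo above: secondary (descending pid) order when "similar", otherwise always 1.
    static int questionStyleCompare(String tokA, int pidA, String tokB, int pidB,
                                    BiPredicate<String, String> similar) {
        int result = similar.test(tokA, tokB) ? 0 : 1;
        if (result == 0) {
            return Integer.compare(pidB, pidA); // descending by pid
        }
        return result;
    }

    // Comparable contract: sgn(compare(a, b)) must equal -sgn(compare(b, a)).
    static boolean antisymmetric(int ab, int ba) {
        return Integer.signum(ab) == -Integer.signum(ba);
    }

    public static void main(String[] args) {
        BiPredicate<String, String> exact = String::equals; // hypothetical similarity stand-in
        int ab = questionStyleCompare("apple", 10, "banana", 20, exact);
        int ba = questionStyleCompare("banana", 20, "apple", 10, exact);
        System.out.println(ab + " " + ba);         // 1 1
        System.out.println(antisymmetric(ab, ba)); // false
    }
}
```

With a comparator like this, the sort phase has no consistent order to follow, so where two dissimilar keys land relative to each other depends on the order in which they happen to be compared.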
And here is the compare method of the grouping comparator:
    public int compare(WritableComparable a, WritableComparable b) {
        VendorKey one = (VendorKey) a;
        VendorKey two = (VendorKey) b;
        // 0 (same group) when the tokens are similar enough, otherwise 1
        int result = ClusterUtil.compare(one.getToken(), two.getToken(), ":") >= one.getPercentage() ? 0 : 1;
        // if (result != 0)
        //     return two.getToken().compareTo(one.getToken());
        return result;
    }
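As far as I understand Hadoop's reduce side, the grouping comparator is only applied to consecutive keys in the already sorted stream, so two similar keys share a group only if the sort placed them next to each other. The simplified model below uses plain Java with no Hadoop classes, and its first-character similarity is hypothetical; it shows a similar pair split by an unrelated key in between.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AdjacentGrouping {
    // Simplified model of reduce-side grouping: keys arrive in the order the sort produced,
    // and a new group starts whenever the grouping comparator says a key differs from the previous one.
    static <K> List<List<K>> group(List<K> sortedKeys, Comparator<K> grouping) {
        List<List<K>> groups = new ArrayList<>();
        K previous = null;
        for (K key : sortedKeys) {
            if (groups.isEmpty() || grouping.compare(previous, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical similarity: two tokens "match" if they share the first character.
        Comparator<String> grouping = (a, b) -> a.charAt(0) == b.charAt(0) ? 0 : 1;
        // "apple" and "avocado" match, but the sort happened to place "banana" between them.
        List<String> emitted = List.of("apple", "banana", "avocado");
        System.out.println(group(emitted, grouping)); // [[apple], [banana], [avocado]]
    }
}
```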