Java Comparator for byte array (lexicographic)

I have a hashmap with byte[] keys. I'd like to sort it through a TreeMap.

What is the most effective way to implement the comparator for lexicographic order?

Cravat asked 24/2, 2011 at 17:19

Using Guava, you can use either of:

UnsignedBytes.lexicographicalComparator()
SignedBytes.lexicographicalComparator()

The UnsignedBytes comparator appears to have an optimized form using sun.misc.Unsafe, which it uses when Unsafe is available. Comments in the code indicate that it may be at least twice as fast as a plain Java implementation.
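The two Guava options differ only in how a single byte is ranked: signed (-128..127) or unsigned (0..255). As a plain-JDK illustration of that difference (a swapped-in technique, not Guava's API), Java 9 and later ship Arrays.compare(byte[], byte[]) for signed order and Arrays.compareUnsigned(byte[], byte[]) for unsigned order. The class name below is hypothetical; this is a minimal sketch assuming Java 9+.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical demo class contrasting signed and unsigned lexicographic order.
final class SignedVsUnsigned {
    // Signed lexicographic order: each byte compared as -128..127 (Java 9+).
    static final Comparator<byte[]> SIGNED = Arrays::compare;
    // Unsigned lexicographic order: each byte compared as 0..255 (Java 9+).
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    public static void main(String[] args) {
        byte[] low = {0x00};
        byte[] high = {(byte) 0x80}; // -128 when signed, 128 when unsigned
        System.out.println(SIGNED.compare(high, low) < 0);   // true
        System.out.println(UNSIGNED.compare(high, low) > 0); // true
    }
}
```

For keys such as encoded strings or big-endian numbers, the unsigned variant is usually the one that matches the intuitive "lexicographic" order.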

Nowhere answered 24/2, 2011 at 17:49
Do we have the solution in plain Java? If so, please post a working example. – Resin
As ColinD says in the comment on my answer, my solution is the same as the non-optimized one in Guava. So you can use mine directly, which is a working example, or follow ColinD's links. – Cravat

Found this nice piece of code in Apache HBase:

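    import java.util.Comparator;

    // Lexicographic order over byte[], treating each byte as unsigned (0..255).
    public class ByteArrayComparator implements Comparator<byte[]> {
        @Override
        public int compare(byte[] left, byte[] right) {
            for (int i = 0, j = 0; i < left.length && j < right.length; i++, j++) {
                int a = (left[i] & 0xff); // & 0xff maps -128..-1 to 128..255
                int b = (right[j] & 0xff);
                if (a != b) {
                    return a - b; // both in 0..255, so no overflow
                }
            }
            // Equal common prefix: the shorter array sorts first.
            return left.length - right.length;
        }
    }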
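To use a comparator like this for the original question, pass it to the TreeMap constructor and copy the HashMap in with putAll. The sketch below assumes Java 9+ and substitutes the JDK's Arrays.compareUnsigned, which yields the same ordering as the loop above (unsigned bytes, shorter common prefix first); the class and method names are hypothetical. Note that a TreeMap decides key equality with the comparator, so two distinct arrays with equal contents become one key, unlike in a HashMap, where byte[] keys are compared by identity.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: copy a HashMap with byte[] keys into a sorted TreeMap.
final class SortByteKeys {
    static <V> TreeMap<byte[], V> sorted(Map<byte[], V> source) {
        // Unsigned lexicographic order (Java 9+); same ordering as the HBase loop.
        Comparator<byte[]> order = Arrays::compareUnsigned;
        TreeMap<byte[], V> tree = new TreeMap<>(order);
        tree.putAll(source);
        return tree;
    }

    public static void main(String[] args) {
        Map<byte[], String> map = new HashMap<>();
        map.put(new byte[] {(byte) 0xFF}, "ff");
        map.put(new byte[] {0x01, 0x02}, "0102");
        map.put(new byte[] {0x01}, "01");
        // Prints 01, 0102, ff on separate lines.
        for (String value : sorted(map).values()) {
            System.out.println(value);
        }
    }
}
```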
Cravat answered 24/2, 2011 at 18:12
This is basically what the non-optimized version of Guava's UnsignedBytes.lexicographicalComparator() does. – Nowhere
Hmm, why did they use i and j when one variable would have been sufficient? Also, storing int length = Math.min(left.length, right.length) and comparing i < length would improve this for large arrays. – Preach
I would expect reading the array's length field to cost about the same. – Cravat
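The variant suggested in the comments (one index, loop bound computed once with Math.min) can be sketched as follows. The class name is hypothetical, and whether it is measurably faster depends on the JIT, which often optimizes array-length checks on its own; the ordering is identical to the HBase version.

```java
import java.util.Comparator;

// Hypothetical single-index version of the unsigned lexicographic comparator.
final class SingleIndexComparator implements Comparator<byte[]> {
    @Override
    public int compare(byte[] left, byte[] right) {
        int length = Math.min(left.length, right.length); // hoisted loop bound
        for (int i = 0; i < length; i++) {
            int a = left[i] & 0xff; // unsigned value 0..255
            int b = right[i] & 0xff;
            if (a != b) {
                return a - b; // cannot overflow: both operands are in 0..255
            }
        }
        return left.length - right.length; // lengths are non-negative, no overflow
    }

    public static void main(String[] args) {
        SingleIndexComparator cmp = new SingleIndexComparator();
        // Equal prefix, so the longer array sorts after the shorter one.
        System.out.println(cmp.compare(new byte[] {1, 2}, new byte[] {1}) > 0); // true
    }
}
```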

I'm assuming the problem is just with the "byte vs. byte" comparison. Dealing with the arrays is straightforward, so I won't cover it. With respect to byte vs. byte, my first thought is to do this:

import java.util.Comparator;

public class ByteComparator implements Comparator<Byte> {
  public int compare(Byte b1, Byte b2) {
    return b1.compareTo(b2); // signed comparison: -128 sorts first, 127 last
  }
}

But that won't give the usual lexicographic order: (byte) 0xFF, which is -1 as a signed byte, will be considered smaller than 0x00, even though as an unsigned byte (255) it is bigger. I think this should do the trick:

import java.util.Comparator;

public class ByteComparator implements Comparator<Byte> {
  public int compare(Byte b1, Byte b2) {
    // convert to unsigned values (0 to 255) before comparing them.
    int i1 = b1 < 0 ? 256 + b1 : b1;
    int i2 = b2 < 0 ? 256 + b2 : b2;
    return i1 - i2; // ascending; both values are in 0..255, so no overflow
  }
}
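The array part that the answer leaves out can be sketched by lifting any element comparator to byte[]: the first differing position decides, and if one array is a prefix of the other, the shorter one sorts first. The class, field, and method names are hypothetical, and boxing each byte to Byte makes this slower than the primitive loops elsewhere on this page; it only illustrates the structure.

```java
import java.util.Comparator;

// Hypothetical helper that turns a per-byte comparator into a byte[] comparator.
final class LexicographicLift {
    // Compares two bytes as unsigned values 0..255 (Byte.toUnsignedInt is Java 8+).
    static final Comparator<Byte> UNSIGNED_BYTE =
        (b1, b2) -> Integer.compare(Byte.toUnsignedInt(b1), Byte.toUnsignedInt(b2));

    static Comparator<byte[]> lexicographic(Comparator<Byte> element) {
        return (left, right) -> {
            int n = Math.min(left.length, right.length);
            for (int i = 0; i < n; i++) {
                int c = element.compare(left[i], right[i]); // bytes are boxed here
                if (c != 0) {
                    return c; // first differing byte decides
                }
            }
            return Integer.compare(left.length, right.length); // prefix sorts first
        };
    }

    public static void main(String[] args) {
        Comparator<byte[]> cmp = lexicographic(UNSIGNED_BYTE);
        System.out.println(cmp.compare(new byte[] {(byte) 0xFF}, new byte[] {0x00, 0x01}) > 0); // true
    }
}
```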

Probably there is something in Apache's commons-lang or commons-math libraries that does this, but I don't know it offhand.

Verminous answered 24/2, 2011 at 17:40
There is Byte.comparator built into Java. No need to implement this. – Bulk
@Bulk Java's Byte.comparator is not lexicographic in the way the OP probably wants: Java's built-in Byte.comparator considers byte value 255 (0xFF, which is -1 when signed) to be smaller than byte value 0. – Verminous
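The JDK methods closest to what this exchange describes are the static Byte.compare(byte, byte), which uses signed order, and, since Java 9, Byte.compareUnsigned(byte, byte), which uses unsigned order. A small sketch, with a hypothetical class name, showing both on 0xFF versus 0:

```java
// Hypothetical demo: signed vs unsigned comparison of single bytes in the JDK.
final class ByteCompareDemo {
    public static void main(String[] args) {
        byte ff = (byte) 0xFF; // -1 as a signed byte, 255 as unsigned
        byte zero = 0;
        System.out.println(Byte.compare(ff, zero) < 0);         // true: signed order
        System.out.println(Byte.compareUnsigned(ff, zero) > 0); // true: unsigned order (Java 9+)
        System.out.println(Byte.toUnsignedInt(ff));             // 255 (Java 8+)
    }
}
```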
