I've written a small benchmark to estimate the performance of the:
- NOP method (to get an idea of the baseline iteration speed);
- Original method, as provided by the OP ;
- RegExp;
- Compiled Regexp;
- The version provided by @maraca (w/o toLowerCase and substring);
- "fastIsHex" version (switch-based), I've added just for fun.
The test machine configuration is as follows:
- JVM: Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
- CPU: Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz
And here are the results I got for the original test string "0x123fa" and 10.000.000 iterations:
Method "NOP" => #10000000 iterations in 9ms
Method "isHexadecimal (OP)" => #10000000 iterations in 300ms
Method "RegExp" => #10000000 iterations in 4270ms
Method "RegExp (Compiled)" => #10000000 iterations in 1025ms
Method "isHexadecimal (maraca)" => #10000000 iterations in 135ms
Method "fastIsHex" => #10000000 iterations in 107ms
as you can see even the original method by the OP is faster than the RegExp method (at least when using JDK-provided RegExp implementation).
(for your reference)
Benchmark code:
public static void main(String[] argv) throws Exception {
//Number of ITERATIONS
final int ITERATIONS = 10000000;
//NOP
benchmark(ITERATIONS,"NOP",() -> nop(longHexText));
//isHexadecimal
benchmark(ITERATIONS,"isHexadecimal (OP)",() -> isHexadecimal(longHexText));
//Un-compiled regexp
benchmark(ITERATIONS,"RegExp",() -> longHexText.matches("0x[0-9a-fA-F]+"));
//Pre-compiled regexp
final Pattern pattern = Pattern.compile("0x[0-9a-fA-F]+");
benchmark(ITERATIONS,"RegExp (Compiled)", () -> {
pattern.matcher(longHexText).matches();
});
//isHexadecimal (maraca)
benchmark(ITERATIONS,"isHexadecimal (maraca)",() -> isHexadecimalMaraca(longHexText));
//FastIsHex
benchmark(ITERATIONS,"fastIsHex",() -> fastIsHex(longHexText));
}
public static void benchmark(int iterations,String name,Runnable block) {
//Start Time
long stime = System.currentTimeMillis();
//Benchmark
for(int i = 0; i < iterations; i++) {
block.run();
}
//Done
System.out.println(
String.format("Method \"%s\" => #%d iterations in %dms",name,iterations,(System.currentTimeMillis()-stime))
);
}
NOP method:
public static boolean nop(String value) { return true; }
fastIsHex method:
public static boolean fastIsHex(String value) {
//Value must be at least 4 characters long (0x00)
if(value.length() < 4) {
return false;
}
//Compute where the data starts
int start = ((value.charAt(0) == '-') ? 1 : 0) + 2;
//Check prefix
if(value.charAt(start-2) != '0' || value.charAt(start-1) != 'x') {
return false;
}
//Verify data
for(int i = start; i < value.length(); i++) {
switch(value.charAt(i)) {
case '0':case '1':case '2':case '3':case '4':case '5':case '6':case '7':case '8':case '9':
case 'a':case 'b':case 'c':case 'd':case 'e':case 'f':
case 'A':case 'B':case 'C':case 'D':case 'E':case 'F':
continue;
default:
return false;
}
}
return true;
}
So, the answer is no, for short-strings and the task at hand, RegExp is not faster.
When it comes to a longer strings, the balance is quite different,
below are results for the 8192 long hex string, I've generated with:
hexdump -n 8196 -v -e '/1 "%02X"' /dev/urandom
and 10.000 iterations:
Method "NOP" => #10000 iterations in 2ms
Method "isHexadecimal (OP)" => #10000 iterations in 1512ms
Method "RegExp" => #10000 iterations in 1303ms
Method "RegExp (Compiled)" => #10000 iterations in 1263ms
Method "isHexadecimal (maraca)" => #10000 iterations in 553ms
Method "fastIsHex" => #10000 iterations in 530ms
As you can see, hand-written methods (the one by macara and my fastIsHex), still beat the RegExp, but original method does not,
(due to substring() and toLowerCase()).
Sidenote:
This benchmark is very simple indeed and only tests the "worst case" scenario (i.e. a fully valid string), a real life results, with the mixed data lengths and a non-0 valid-invalid ratio, might be quite different.
Update:
I also gave a try to the char[] array version:
char[] chars = value.toCharArray();
for (idx += 2; idx < chars.length; idx++) { ... }
and it was even a bit slower than getCharAt(i) version:
Method "isHexadecimal (maraca) char[] array version" => #10000000 iterations in 194ms
Method "fastIsHex, char[] array version" => #10000000 iterations in 164ms
my guess is that is due to array copy inside toCharArray.
Update (#2):
I've run an additional 8k/100.000 iterations test to see if there is any real difference in speed between the "maraca" and "fastIsHex" methods, and have also normalized them to use exactly the same precondition code:
Run #1
Method "isHexadecimal (maraca) *normalized" => #100000 iterations in 5341ms
Method "fastIsHex" => #100000 iterations in 5313ms
Run #2
Method "isHexadecimal (maraca) *normalized" => #100000 iterations in 5313ms
Method "fastIsHex" => #100000 iterations in 5334ms
I.e. the speed difference between these two methods is marginal at best, and is probably due to a measurement error (as I'm running this on my workstation and not a specially setup clean test environment).
/^[range]*$/
does. – OchlocracyisHexadecimal
method would be/-?0x[0-9a-f]*/i
(though I believe that you actually want/-0x[0-9a-fA-F]*/
). – OchlocracyRegex.match(/0x[0-9a-f]+/, "x0x123fa")
or fromRegex.match(/0x[0-9a-f]+/, "0x123fag")
? – Clue