How do I implement a string comparison in Java that takes the same amount of time no matter whether they match or where a mismatch (if any) occurs?
Asked Answered
P

6

12

I want to implement a String comparison function that doesn't take a different amount of time depending on the number of characters that match or the position of the first mismatch. I assume there must be a library out there somewhere that provides this, but I was unable to find it via a quick search.

So far, the best idea I've got is to sum the XOR of each character and return whether or not the sum is 0. However, I'm pretty sure this wouldn't work so well with Unicode. I also have a vague concern that HotSpot would do some optimizations that would change my constant-time property, but I can't think of a specific optimization that would do this off the top of my head.

Thanks.

UPDATE: Sorry, I don't believe I was clear. I'm not looking for O(1), I'm looking for something that won't leak timing information. This would be used to compare hashed password values, and if the time it took to compare was different based on where the first mismatch occurred, that would be leaking information to an attacker.

Pilloff answered 25/8, 2011 at 13:22 Comment(6)
The time it takes to compare 2 strings would always be a function of the length of the string. You have to do some sort of comparison one way or another - and the more characters you have, the longer this comparison would take.Hollis
How would the complexity of XORing each character not depend on the number of characters?Luger
Am I correct in assuming you're asking for a String comparison function (returning true of false for equal or non-equal) that is entirely independent from String length? I don't actually think that's theoretically possible unless you have a number of processors/cores equal to the maximum String length. The best I can think of is optimizing strongly through comparing hash codes and length first, but even the hash code calculation depends on String length.Dulcle
you might be able to use the hash field, it might help to determine 2 strings are not equal, if the hash was already computed, but I think that's about it.Tollman
Can you tell us why you want to do this? Given the answers telling you that this is either impossible or impractical, knowing what you are trying to achieve would help us give you a good answer.Lilias
Sorry for the original wording, which didn't express my intent. I've attempted to clarify.Pilloff
H
3

I see two immediate possibilities for not leaking password-related information in timing:

1/ Pad both the password string and candidate string out to 1K, with a known, fixed character (like A). Then run the following (pseudo-code):

match = true
for i = 0 to 1023:
    if password[i] != candidate[i]:
        match = false

That way, you're always taking the same amount of loops to do the comparison regardless of where it matches.

There's no need to muck about with xor since you can still do a simple comparison, but without exiting the loop early.

Just set the match flag to false if a mismatch is found and keep going. Once the loop exits (taking the same time regardless of size or content of password and candidate), then check whether it matched.

2/ Just add a large (relative to the normal comparison time) but slightly random delay at the end of the comparison. For example, a random value between 0.9 and 1.1 seconds. The time taken for the comparison should be swamped by the delay and the randomness should fully mask any information leakage (unless your randomness algorithm leaks information, of course).

That also has the added advantage of preventing brute force attacks since a password check takes at least about a second.

Hagride answered 25/8, 2011 at 13:27 Comment(8)
My concern is that the strings might not be normalized, so the function might return false even if both values are español (to use a completely made up and perhaps not even valid example). Also, I'm aware that Unicode is not an encoding, but as I understand it, normalization is a problem no matter what encoding is in use.Pilloff
@Hank, now your requirement makes a little more sense - I've actually had to do this sort of stuff before, hence my wonder answer at #712839 :-) I've updated this answer with a couple of possibilities for you.Hagride
The padding will have the nice benefit of not even revealing the length of the password, thanks. Do you have any thoughts on the normalization issue?Pilloff
@Hank: yes, normalise them before comparison. All your stored passwords should be normalised in whatever DB/file you store them in (this should be done whenever they're created or updated). Normalise the user input before comparison. This will take a variable length of time based on the size of the user input but the attacker already knows that, so you're revealing nothing.Hagride
You may want to change the comparison in 1/ to match |= password[i] ^ candidate[i] (Note that this also inverts the value of match, so it really should be no_match and initialized to 0). The different branches of the if statement can stil leak timing information. (There's an extra jump if the characters match.)Phrygia
@Rich, good point, this answer was from a time when I didn't like removing info but it makes more sense to remove it so I have. As to the delay, I've also modded that a bit to introduce randomness rather than a fixed delay.Hagride
I think your point (2) is still bad advice. Adding a random delay will not defeat a timing attack. An external attacker can still estimate X from (X + random(0.9,1.1)) by statistical methods. They'll be averaging out random delays already to take account of the network delays, so this doesn't even make it harder. See codahale.com/a-lesson-in-timing-attacks for some commentary. (I've removed my previous comment, as you've obsoleted it, thanks.)Convent
Rich, the statistical methods will require a rather large number of samples to be taken, something made much harder with the fact each sample now takes a full second. In any case, there are now two independent variables that need to estimated (network delay and arbitrary random delay) which makes the required sample space MUCH larger.Hagride
L
5

I want to implement a String comparison function that doesn't take a different amount of time depending on the number of characters that match or the position of the first mismatch. I assume there must be a library out there somewhere that provides this, but I was unable to find it via a quick search.

So far, the best idea I've got is to sum the XOR of each character

Do you see the contradiction?

update:

To the updated, and therefore different question:

Can you gain information once, how much time is spent for comparing 2 Strings, in terms of constant amount and time, depending on the length of the two strings?

a + b*s(1).length + c*s(2).length + d*f(s(1), s(2))? 

Is there an upper bound of characters for String 1 and 2?

If the time is, depending on a factor for the machine, for example for the longest strings you expect 0.01ms. You measure the time to encode the string, and stay idle until you reach that time, maybe + a factor of rand(10%) of the time.

If the length of the input is not limited, you could calculate the timing in a way, that will fit for 99%, 99.9% or 99.99% of typical input, depending on your security needs, and the speed of the machine. If the program is interacting with the user, a delay up to 0.2s is normally experienced as instant reaction, so it wouldn't annoy the user, if your code sleeps for 0.19994s, while doing real calculations for 0.00006s.

Lianneliao answered 25/8, 2011 at 13:32 Comment(3)
The randomness doesn't necessarily help since the attacker can just take more samples to exclude it completely.Phrygia
@MichaelMior That's surely true, but the comparison takes some nanoseconds only and adding a random milliseconds delay may make the number of needed samples damn high.Admiral
The question asked about not leaking info about "the number of characters that match". By excluding "that match" from your bold text, you have completely changed his meaning. This answer is wrong.Convent
C
5

The following code is one common way to do a constant-time byte[] comparison in Java and avoid leaking password info via the time taken:

public static boolean isEqual(byte[] a, byte[] b) {
    if (a.length != b.length) {
        return false;
    }

    int result = 0;
    for (int i = 0; i < a.length; i++) {
      result |= a[i] ^ b[i];
    }
    return result == 0;
}

See http://codahale.com/a-lesson-in-timing-attacks/ for more discussion of this issue.

Current implementation in openjdk as an inspiration is available in isEqual method.

(This assumes that the length of the secret is not sensitive, for example if it is a hash. You should pad both sides to the same length if that is not true.)

This is essentially the same as your first suggestion:

So far, the best idea I've got is to sum the XOR of each character and return whether or not the sum is 0.

You asked:

However, I'm pretty sure this wouldn't work so well with Unicode.

This is a valid concern, but you need to clarify what you will accept as "equal" for a solution to be proposed. Luckily, you also say "This would be used to compare hashed password values", so I don't think that any of the unicode concerns will be in play.

I also have a vague concern that HotSpot would do some optimizations that would change my constant-time property,

Hopefully that's not true. I expect that the literature on how to avoid timing attacks in Java would address this if it were true, but I can't offer you any citations to back this up :-)

Convent answered 22/6, 2015 at 13:38 Comment(2)
Rich, you probably need to look at the early exit on different password lengths. Even the length of the password is information that can be leaked.Hagride
Thanks, I've updated my answer to note this. I think in most cases the secret will be a hash, so the length is not sensitive.Convent
H
3

I see two immediate possibilities for not leaking password-related information in timing:

1/ Pad both the password string and candidate string out to 1K, with a known, fixed character (like A). Then run the following (pseudo-code):

match = true
for i = 0 to 1023:
    if password[i] != candidate[i]:
        match = false

That way, you're always taking the same amount of loops to do the comparison regardless of where it matches.

There's no need to muck about with xor since you can still do a simple comparison, but without exiting the loop early.

Just set the match flag to false if a mismatch is found and keep going. Once the loop exits (taking the same time regardless of size or content of password and candidate), then check whether it matched.

2/ Just add a large (relative to the normal comparison time) but slightly random delay at the end of the comparison. For example, a random value between 0.9 and 1.1 seconds. The time taken for the comparison should be swamped by the delay and the randomness should fully mask any information leakage (unless your randomness algorithm leaks information, of course).

That also has the added advantage of preventing brute force attacks since a password check takes at least about a second.

Hagride answered 25/8, 2011 at 13:27 Comment(8)
My concern is that the strings might not be normalized, so the function might return false even if both values are español (to use a completely made up and perhaps not even valid example). Also, I'm aware that Unicode is not an encoding, but as I understand it, normalization is a problem no matter what encoding is in use.Pilloff
@Hank, now your requirement makes a little more sense - I've actually had to do this sort of stuff before, hence my wonder answer at #712839 :-) I've updated this answer with a couple of possibilities for you.Hagride
The padding will have the nice benefit of not even revealing the length of the password, thanks. Do you have any thoughts on the normalization issue?Pilloff
@Hank: yes, normalise them before comparison. All your stored passwords should be normalised in whatever DB/file you store them in (this should be done whenever they're created or updated). Normalise the user input before comparison. This will take a variable length of time based on the size of the user input but the attacker already knows that, so you're revealing nothing.Hagride
You may want to change the comparison in 1/ to match |= password[i] ^ candidate[i] (Note that this also inverts the value of match, so it really should be no_match and initialized to 0). The different branches of the if statement can stil leak timing information. (There's an extra jump if the characters match.)Phrygia
@Rich, good point, this answer was from a time when I didn't like removing info but it makes more sense to remove it so I have. As to the delay, I've also modded that a bit to introduce randomness rather than a fixed delay.Hagride
I think your point (2) is still bad advice. Adding a random delay will not defeat a timing attack. An external attacker can still estimate X from (X + random(0.9,1.1)) by statistical methods. They'll be averaging out random delays already to take account of the network delays, so this doesn't even make it harder. See codahale.com/a-lesson-in-timing-attacks for some commentary. (I've removed my previous comment, as you've obsoleted it, thanks.)Convent
Rich, the statistical methods will require a rather large number of samples to be taken, something made much harder with the fact each sample now takes a full second. In any case, there are now two independent variables that need to estimated (network delay and arbitrary random delay) which makes the required sample space MUCH larger.Hagride
A
3

This should take approximately the same time for any matching length Strings. It's constant-time with a big constant.

public static boolean areEqualConstantTime(String a, String b) {
    if ( a.length != b.length ) {
        return false;
    }

    boolean equal = true;
    for ( long i = 0; i < (Long)Integer.MAX_INT; i++ ) {
        if ( a.charAt((int)(i % aChars.length)) != b.charAt((int)(i % bChars.length))) {
            equal = false;
        }
    }
    return equal;
}

Edit

Wow, if you're just trying to avoid leaking timing information this facetious answer got pretty close to the mark! We can start with a naive approach like this:

public static boolean arePasswordsEqual(String a, String b) {
    boolean equal = true;
    if ( a.length != b.length ) {
       equal = false;
    }

    for ( int i = 0; i < MAX_PASSWORD_LENGTH; i++ ) {
        if ( a.charAt(i%a.length()) != b.charAt(i%b.length()) ) {
            equal = false;
        }
    }
    return equal;
 }

We need the MAX_PASSWORD_LENGTH constant because we can't simply use either the max or the min of the two input lengths as that would also leak timing information. An attacker could start with a very small guess and see how long the function takes. When the function time plateaus, he would know his password has the right length which eliminates much of the range of values he needs to try.

Annexation answered 25/8, 2011 at 13:50 Comment(9)
@Hank: No problem. Just as an aside, the popular cryptographically secure hashes have constant length outputs and due to the snowball principle leaking timing information probably wouldn't be as big of a deal. Are you comparing passwords in clear text?Annexation
No, I'm looking at the source to a Java impl of bcrypt. It's certainly not that big of a deal, but w/ security I always opt for safer than I think is remotely necessary. I'm thinking about patching it to use this comparison instead of the built-in Java comparison.Pilloff
@Hank: Cool. And by "snowball principle" I meant "avalanche property"Annexation
Wouldn't it be simpler to use the length of the user-controlled string instead of MAX_PASSWORD_LENGTH? Since this length is already known, nothing would be leaked.Adelladella
@Qerub: Yep, you could do that. As I said earlier, this is pretty irrelevant if you're hashing passwords to a fixed-length hash.Annexation
Note that this answer definitely still leaks timing information: you can repeatedly call arePasswordsEqual with different lengths of "test password" and it will return right away if the length of the test and the length of the real password are not the same length. So you just keep trying with various lengths until you get a call that takes longer than the others. Now you know the actual length.Hysteric
I know you specifically said that most hash algorithms have equal-length outputs, but that's an implementation detail that was out-of-scope of the original question.Hysteric
@ChristopherSchultz: Look again. I don't return right away if the lengths don't match. The result will just never be true. That said, I'm sure there would still be ways to glean information from branch prediction etc, so I'll give the standard advice here: never roll your own crypto.Annexation
@MarkPeters I must have been having a bad day. You are absolutely right. Apologies for the noise. I'm happy to delete the comments to clean things up.Hysteric
E
-1

You can do constant time string comparisons if you intern your strings. Then you can compare them with == operator resulting in a constant time equality check.

Read this for further detials

Epiphany answered 25/8, 2011 at 13:36 Comment(4)
Interning is not an O(1) operation though!Annexation
Yes thats the tradeoff. It depends on whether you have more unique strings or duplicates in your data set.Epiphany
The problem is making the comparison of a user-supplied string and a secret to a constant time operation. The secret can be interned in advance, but the time for the interning of the user-supplied string may be observed. It's unclear whether and how it depends on the secret string, but I'd call it risky.Admiral
operator == does answer question, if given strings are same instances... Intern could make deduplication and make == same result as .equals, but timing can leak from calling intern method...Bradeord
O
-1

The usual solution is to set a timer, and not return the result until the timer has expired.

The time taken to compare strings or hashes is not important, just set the time-out value sufficiently large.

Optics answered 25/8, 2011 at 18:17 Comment(1)
I don't think that's the "usual" solution, as it's very slow. Most languages and frameworks do something like "OR"ing every byte. See e.g. codahale.com/a-lesson-in-timing-attacks or php.net/manual/en/function.hash-equals.phpConvent

© 2022 - 2024 — McMap. All rights reserved.