How do I implement a string comparison in Java that takes the same amount of time no matter whether they match or where a mismatch (if any) occurs?

P

6

12

I want to implement a String comparison function that doesn't take a different amount of time depending on the number of characters that match or the position of the first mismatch. I assume there must be a library out there somewhere that provides this, but I was unable to find it via a quick search.

So far, the best idea I've got is to sum the XOR of each character and return whether or not the sum is 0. However, I'm pretty sure this wouldn't work so well with Unicode. I also have a vague concern that HotSpot would do some optimizations that would change my constant-time property, but I can't think of a specific optimization that would do this off the top of my head.

Thanks.

UPDATE: Sorry, I don't believe I was clear. I'm not looking for O(1), I'm looking for something that won't leak timing information. This would be used to compare hashed password values, and if the time it took to compare was different based on where the first mismatch occurred, that would be leaking information to an attacker.

Pilloff answered 25/8, 2011 at 13:22 Comment(6)

The time it takes to compare 2 strings would always be a function of the length of the string. You have to do some sort of comparison one way or another - and the more characters you have, the longer this comparison would take. – Hollis 25/8, 2011 at 13:25

How would the complexity of XORing each character not depend on the number of characters? – Luger 25/8, 2011 at 13:26

Am I correct in assuming you're asking for a String comparison function (returning true of false for equal or non-equal) that is entirely independent from String length? I don't actually think that's theoretically possible unless you have a number of processors/cores equal to the maximum String length. The best I can think of is optimizing strongly through comparing hash codes and length first, but even the hash code calculation depends on String length. – Dulcle 25/8, 2011 at 13:27

you might be able to use the hash field, it might help to determine 2 strings are not equal, if the hash was already computed, but I think that's about it. – Tollman 25/8, 2011 at 13:28

Can you tell us why you want to do this? Given the answers telling you that this is either impossible or impractical, knowing what you are trying to achieve would help us give you a good answer. – Lilias 25/8, 2011 at 13:43

Sorry for the original wording, which didn't express my intent. I've attempted to clarify. – Pilloff 25/8, 2011 at 14:10

H

3

I see two immediate possibilities for not leaking password-related information in timing:

1/ Pad both the password string and candidate string out to 1K, with a known, fixed character (like A). Then run the following (pseudo-code):

match = true
for i = 0 to 1023:
    if password[i] != candidate[i]:
        match = false

That way, you're always taking the same amount of loops to do the comparison regardless of where it matches.

There's no need to muck about with xor since you can still do a simple comparison, but without exiting the loop early.

Just set the match flag to false if a mismatch is found and keep going. Once the loop exits (taking the same time regardless of size or content of password and candidate), then check whether it matched.

2/ Just add a large (relative to the normal comparison time) but slightly random delay at the end of the comparison. For example, a random value between 0.9 and 1.1 seconds. The time taken for the comparison should be swamped by the delay and the randomness should fully mask any information leakage (unless your randomness algorithm leaks information, of course).

That also has the added advantage of preventing brute force attacks since a password check takes at least about a second.

Hagride answered 25/8, 2011 at 13:27 Comment(8)

My concern is that the strings might not be normalized, so the function might return false even if both values are español (to use a completely made up and perhaps not even valid example). Also, I'm aware that Unicode is not an encoding, but as I understand it, normalization is a problem no matter what encoding is in use. – Pilloff 25/8, 2011 at 14:4

@Hank, now your requirement makes a little more sense - I've actually had to do this sort of stuff before, hence my wonder answer at #712839 :-) I've updated this answer with a couple of possibilities for you. – Hagride 25/8, 2011 at 14:15

The padding will have the nice benefit of not even revealing the length of the password, thanks. Do you have any thoughts on the normalization issue? – Pilloff 25/8, 2011 at 14:41

@Hank: yes, normalise them before comparison. All your stored passwords should be normalised in whatever DB/file you store them in (this should be done whenever they're created or updated). Normalise the user input before comparison. This will take a variable length of time based on the size of the user input but the attacker already knows that, so you're revealing nothing. – Hagride 25/8, 2011 at 14:43

You may want to change the comparison in 1/ to match |= password[i] ^ candidate[i] (Note that this also inverts the value of match, so it really should be no_match and initialized to 0). The different branches of the if statement can stil leak timing information. (There's an extra jump if the characters match.) – Phrygia 12/9, 2012 at 18:21

@Rich, good point, this answer was from a time when I didn't like removing info but it makes more sense to remove it so I have. As to the delay, I've also modded that a bit to introduce randomness rather than a fixed delay. – Hagride 23/6, 2015 at 7:8

I think your point (2) is still bad advice. Adding a random delay will not defeat a timing attack. An external attacker can still estimate X from (X + random(0.9,1.1)) by statistical methods. They'll be averaging out random delays already to take account of the network delays, so this doesn't even make it harder. See codahale.com/a-lesson-in-timing-attacks for some commentary. (I've removed my previous comment, as you've obsoleted it, thanks.) – Convent 23/6, 2015 at 9:21

Rich, the statistical methods will require a rather large number of samples to be taken, something made much harder with the fact each sample now takes a full second. In any case, there are now two independent variables that need to estimated (network delay and arbitrary random delay) which makes the required sample space MUCH larger. – Hagride 23/6, 2015 at 14:28

L

5

I want to implement a String comparison function that doesn't take a different amount of time depending on the number of characters that match or the position of the first mismatch. I assume there must be a library out there somewhere that provides this, but I was unable to find it via a quick search.

So far, the best idea I've got is to sum the XOR of each character

Do you see the contradiction?

update:

To the updated, and therefore different question:

Can you gain information once, how much time is spent for comparing 2 Strings, in terms of constant amount and time, depending on the length of the two strings?

a + b*s(1).length + c*s(2).length + d*f(s(1), s(2))?

Is there an upper bound of characters for String 1 and 2?

If the time is, depending on a factor for the machine, for example for the longest strings you expect 0.01ms. You measure the time to encode the string, and stay idle until you reach that time, maybe + a factor of rand(10%) of the time.

If the length of the input is not limited, you could calculate the timing in a way, that will fit for 99%, 99.9% or 99.99% of typical input, depending on your security needs, and the speed of the machine. If the program is interacting with the user, a delay up to 0.2s is normally experienced as instant reaction, so it wouldn't annoy the user, if your code sleeps for 0.19994s, while doing real calculations for 0.00006s.

Lianneliao answered 25/8, 2011 at 13:32 Comment(3)

The randomness doesn't necessarily help since the attacker can just take more samples to exclude it completely. – Phrygia 12/9, 2012 at 18:23

@MichaelMior That's surely true, but the comparison takes some nanoseconds only and adding a random milliseconds delay may make the number of needed samples damn high. – Admiral 13/4, 2015 at 12:34

The question asked about not leaking info about "the number of characters that match". By excluding "that match" from your bold text, you have completely changed his meaning. This answer is wrong. – Convent 22/6, 2015 at 13:31

C

5

The following code is one common way to do a constant-time byte[] comparison in Java and avoid leaking password info via the time taken:

public static boolean isEqual(byte[] a, byte[] b) {
    if (a.length != b.length) {
        return false;
    }

    int result = 0;
    for (int i = 0; i < a.length; i++) {
      result |= a[i] ^ b[i];
    }
    return result == 0;
}

See http://codahale.com/a-lesson-in-timing-attacks/ for more discussion of this issue.

Current implementation in openjdk as an inspiration is available in isEqual method.

(This assumes that the length of the secret is not sensitive, for example if it is a hash. You should pad both sides to the same length if that is not true.)

This is essentially the same as your first suggestion:

So far, the best idea I've got is to sum the XOR of each character and return whether or not the sum is 0.

You asked:

However, I'm pretty sure this wouldn't work so well with Unicode.

This is a valid concern, but you need to clarify what you will accept as "equal" for a solution to be proposed. Luckily, you also say "This would be used to compare hashed password values", so I don't think that any of the unicode concerns will be in play.

I also have a vague concern that HotSpot would do some optimizations that would change my constant-time property,

Hopefully that's not true. I expect that the literature on how to avoid timing attacks in Java would address this if it were true, but I can't offer you any citations to back this up :-)

Convent answered 22/6, 2015 at 13:38 Comment(2)

Rich, you probably need to look at the early exit on different password lengths. Even the length of the password is information that can be leaked. – Hagride 23/6, 2015 at 7:36

Thanks, I've updated my answer to note this. I think in most cases the secret will be a hash, so the length is not sensitive. – Convent 23/6, 2015 at 9:24

H

3

I see two immediate possibilities for not leaking password-related information in timing:

1/ Pad both the password string and candidate string out to 1K, with a known, fixed character (like A). Then run the following (pseudo-code):

match = true
for i = 0 to 1023:
    if password[i] != candidate[i]:
        match = false