How do I write a reliable content-independent implementation of memcmp()?
Asked Answered
A

1

6

A naive implementation of memcmp() goes something like this (from this answer):

int memcmp_test(const char *cs_in, const char *ct_in, size_t n)
{
   size_t i;  
   const unsigned char * cs = (const unsigned char*) cs_in;
   const unsigned char * ct = (const unsigned char*) ct_in;

   for (i = 0; i < n; i++, cs++, ct++)
   {
       if (*cs < *ct)
       {
          return -1;
       }
       else if (*cs > *ct)
       {
          return 1;
       }
   }
   return 0;
}

Here the blocks traversal is stopped once the first mismatching byte is found. This can be no good for cryptographic applications because it makes the execution time dependent on the blocks content and this could allow for timing attacks. So OpenSSL uses this (taken from here):

int CRYPTO_memcmp(const void *in_a, const void *in_b, size_t len)
{
    size_t i;
    const unsigned char *a = in_a;
    const unsigned char *b = in_b;
    unsigned char x = 0;

    for (i = 0; i < len; i++)
         x |= a[i] ^ b[i];

    return x;
}

There're no breaks or returns in the middle, so this code will have to traverse the entire length of blocks. At least this is the intent.

Now here's one usage example (from here):

 static int des_ede3_unwrap(EVP_CIPHER_CTX *ctx,
     unsigned char *out, const unsigned char *in, size_t inl)
 {
      unsigned char icv[8], iv[8], sha1tmp[SHA_DIGEST_LENGTH];
      //whatever, unrelated then...
      if (!CRYPTO_memcmp(sha1tmp, icv, 8))
         rv = inl - 16;
      //whatever, unrelated
 }

Now with link-time code generation (Visual C++ LTCG) or link-time optimization (gcc LTO) the compiler is able to see both CRYPTO_memcmp() implementation and the invocation site (even if they are in different translation units). It can see that the invocation site does not use the actual value, it just compares it to null. So it is free to transform CRYPTO_memcmp() such that it return immediately once it find the first mismatching pair of bytes and the "secure" version of memcmp() is no longer secure.

How can memcmp() be implemented such that the a standard compliant compiler will not transform it into version that helps timing attacks?

Adlib answered 8/4, 2014 at 11:21 Comment(15)
By telling the compiler not to optimize. Either by using flags or #pragma. It's obviously compiler dependent.Ster
@stefan: This can make the code damn slow.Adlib
Well what do you want? Optimized code or safe code? With pragmas, used only rarely, this won't get too slow. There are also gcc attributes that prohibit inlining. You really have to ask your compiler not to do so or just leave out LTO (I never had any significant speed-up anyway. Only Compile time went up). As always: measure & profile.Ster
@stefan: This is all not really reliable - code will be reused elsewhere, where LTO or something will be on. A much better way would be to have some standard-compliant code that just traverses the entire blocks.Adlib
The standard explicitly wants optimizations to happen and encourages it. Any optimization which does not affect the observable outcome is OK, basically. You have to rely on the compiler not optimizing. You can do so on the flag level or the compiler specific macros / attributes level. End of story.Ster
@stefan: Well, maybe a volatile counter would help?Adlib
This falls under the category of "making it hard for the compiler to optimize", but it isn't in the category of "the compiler doesn't optimize". Maybe your compiler can't optimize with a volatile somewhere. That doesn't mean that it is impossible and a future version / different vendor might do it. Work with the compiler (using the tools [pragma...] it provides), not against it. The compiler is your friend who rewrites your code in a more efficient way. If you insist on your code being the way you want it, you have to tell him.Ster
Note, that the caller code is also subject to timing attack due to if checks.Bombast
@Adlib 1) Are you looking for an int memcmp() like function that returns <0, 0 >0 on less then, equal or greater or something different like bool memeq() that returns true/false on match/mis-match? 2) Must data be const or can it be manipulated and returned to the same state on function exit?Perimeter
@chux: Let's pretend it should be const and the result should be <0, 0, or >0 as with usual memcmp().Adlib
In CRYPTO_memcmp(), could not the variable unsigned char x be made volatile unsigned char x to prevent the complier from transforming CRYPTO_memcmp()?Perimeter
@chux: I guess this could solve the problem.Adlib
I'm curious what the solution was, can you post an answer telling us what you did? (and accept it?)Deidradeidre
@MikeOunsworth: There're two possible solutions. The first one absolutely portable and up to the Standard - declare x volatile - but that makes code about two times slower. The second one is to cast the pointers into volatile unsigned char* and access through them which is just as fast but is not fully Standard compliant (see this https://mcmap.net/q/477518/-is-it-possible-to-guarantee-code-doing-memory-writes-is-not-optimized-away-in-c/57428).Adlib
See also #25374267, which is essentially the same question.Gurgitation
A
4

There're two possible solutions.

The first one is absolutely portable and up to the Standard - declare x volatile which basically tells the compiler that it must preserve the sequence of updating x and so it must read both data arrays fully. It doesn't prevent the compiler from making those reads in larger portions (such as read several bytes at a time and then use them in the right sequence), but it's no big deal here - the compiler will have to emit a number of reads proportional to the number of bytes in the data arrays. The problem with this approach is that it can make this code slower - some benchmarks I ran show about 50 percent slowdown on a specific processor and specific toolset. YMMV.

The second possible solution is to cast the pointers into volatile unsigned char* and access through them

const volatile unsigned char * cs = (const volatile unsigned char*) cs_in;
const volatile unsigned char * ct = (const volatile unsigned char*) ct_in;
// the rest of the code is the same

which is just as fast but is not fully Standard compliant (see this). Many compiler treat such cast as a hint that these reads should not be altered but the Standard doesn't really make any guarantees for that and so it is possible that a compiler breaks this code on purpose. Therefore this solution is not portable.

Adlib answered 13/11, 2014 at 13:11 Comment(2)
If the function were to check whether x is greater than an exported variable whose value always happens to be zero, rather than checking whether x is non-zero, the only way a compiler could generate efficient early-exit code would be to have test the value of that exported variable before starting the loop, and then run one loop (with an early-exit feature) if that exported variable happens to be zero and another if it doesn't. If someplace in the code sets the value of that exported variable based on some some volatile that's also always zero, a compiler would...Jeanene
...have no realistic basis for expecting such an "optimization" to be helpful, and the cost would be one volatile load and ordinary store, sometime during program execution, plus one non-qualified load and compare for each complete call of the memory-compare function. BTW, I'd call the function something like mem_notequal to make clear that the return value does not follow the memcmp pattern.Jeanene

© 2022 - 2024 — McMap. All rights reserved.