GCM Multiplication Implementation
Asked Answered
J

6

5

I am puting up a C code for the Multiplication of block (Alogrithm 1) in the GCM SP-800-38D document here. Page 11-12.

Having completed the code, I want to see if there are any way I can test the code. You can find attached below the code I have put up. Note that instead of the 128 bit block, I used a 24 bit block just for testing purposed. I will appreciate any suggestions where necessary.

void BLK_MUL (u8 *val_1,u8 *val_2, u8 *out_val)
{ 

    u8 xdata R_val = 0xE1;     
    u8 xdata Z_val[3],V_val[3];
    u8 mask_b   = 0x80;        
    u16 i;  u8 j;           
    bit rnd;                

    for(j=0;j<3;j++,++val_2)    
    {
        Z_val[j]=0x00;      
        V_val[j]=*val_2;    
    }       


    for(i=0;i<24;i++)
    { 
        if (*val_1 & mask_b)
        {
            for(j=0;j<3;j++)    
                Z_val[j]^=V_val[j]; 
        }
        if (!(V_val[2] & 0x01))
        {//if LSB of V_val is 0
            for(j=0;j<3;j++)
            { //V_val = rightshift(V_val)
                if (j!=0)
                    if (V_val[2-j] & 0x01)
                        V_val[3-j] |= 0x80;
                V_val[2-j]>>=1;          
            }
        }
        else
        {//if LSB of V_val is 1
            for(j=0;j<3;j++)
            {//V_val = rightshift(V_val)
                if (j!=0)
                    if (V_val[2-j] & 0x01)
                        V_val[3-j] |= 0x80;
                V_val[2-j]>>=1;          
            }
            V_val[0]^=R_val; //V_val = rightshift(V_val) ^ R                
        }
        if(mask_b & 0x01)   { val_1++; rnd=1;}  
        mask_b >>= 1;                       
        if (rnd)    { mask_b=0x80; rnd=0; } 
    }

    STR_CPY(out_val,Z_val,3);   

    return ;        
}


void main()
{

    code unsigned char val_1[3] ={  0x2b,0x7e,0x15  };
    code unsigned char val_2[3] ={  0x39,0x25,0x84  };
    unsigned char out[3];

    BLK_MUL (val_1,val_2,out);

    return;
}
Jolie answered 18/5, 2012 at 15:0 Comment(4)
In the if (!(V_val[2] & 0x01)) clause and in its else clause there are identical for-loops. Is there a good reason for this?Scruple
@WilliamMorris I added some comments in the code above (pls check). I just followed the code steps plainly. Any better option pls? :)Jolie
Yes but the duplication is still redundant. I added some code review comments as an answer belowScruple
Does this code compute GF(2^{128}) multiplication?Lactiferous
R
9

At some point you will of course have to check your code against test vectors. But there are quite a few tests that you can perform without having to know or compute any test vectors at all.

First the multiplication in GF(2^128) is commutative. Hence you can just compute BLK_MUL(val_1, val_2, out1) and BLK_MUL(val_2, val_1, out2) with any input and you should get the same result. Since your code uses val_1 and val_2 differently this is already quite a good test.

Then you can use that multiplication is distributive, I.e. you can test that (x+y)*z = (x*z)+(y*z), (where the addition in GF(2^128) is computed by xoring corresponding bytes of the two values together).

Finally, once you have implemented the whole field GF(2^128) you can also exploit that its order is 2^128-1. I.e. if you start with a value x then square it 128 times, then you should get x back.


A few additional comments:

The advantage of using equations for testing (over only using test vectors) is that you can easily run a large number of tests. Because it is rather easy to add tests this way I frequently do some simple tests with sparse inputs (e.g. just single bits set in the input) first. If something is wrong then this helps to identify the bugs fast.

Your current code uses temporary variables for the result. This is indeed a good idea, since it ensures copy safety. I think a good unit test should also cover this case. I.e. you might want to compute the same result twice: once with input and output pointing to different memory location, and once with the output being the same memory as the input.

Furthermore, at least one of the other answers talks about optimizations. I think if you refactor the code then you should look for meaningful components to reuse, rather than blindly looking for look-a-like code snippets. Since GF(2^128) is a field, of course addition and multiplication in the field are meaningful components. Another meaningful component is the multiplication by the polynomial x (which is something that is quite frequently used in crypto).

Rosalindrosalinda answered 18/5, 2012 at 15:32 Comment(1)
Are there any test vectors that I can use as well?Jolie
D
5

Test vectors for GCM mode can be found here and there's a bunch of NIST test vectors here.

Dira answered 18/5, 2012 at 22:56 Comment(0)
E
4

A test example for GF(2^128) can be found at: PCLMULQDQ CPU instruction, page 78

Evacuation answered 9/4, 2013 at 13:9 Comment(0)
J
3

Thanks @jack and @emboss. Following suggestions by jack, i performed the tests on the function, and it proved correct. Hope this is useful to someone else, but still will appreciate any suggestion and correction. :) See MAIN code below:

void main()
{

u8 j;
u8 val_1[3] ={  0x2b,0x7e,0x15  };
u8 val_2[3] ={  0x39,0x25,0x84  };
u8 val_3[3] ={  0x23,0x71,0x25  };
u8 val_3_2[3]={ 0x23,0x71,0x25  };
u8 val_4[3] ={  0x33,0x35,0x44  };
u8 val_5[3] ={  0x2e,0x77,0x11  };

u8 out[3];
u8 out_1[3];
u8 out_2[3];
u8 out_3[3];
u8 out_4[3];
u8 out_5[3];

     //PROOF X*Y = Y*X
BLK_MUL (val_1,val_2,out);      //X*Y
BLK_MUL (val_2,val_1,out_1);    //Y*X
            //N.B: out == out_1
     //PROOF (X+Y)*Z = (X*Z)+(Y*Z) 
for(j=0;j<3;j++)    
    val_3[j]^=val_4[j];         //X = X+Y
BLK_MUL (val_3,val_5,out_2);    //(X+Y)*Z
BLK_MUL (val_5,val_3,out_3);    //Z*(X+Y)
            //N.B: out_2 == out_3

BLK_MUL (val_3_2,val_5,out_4);      //X*Z
BLK_MUL (val_4,val_5,out_5);        //Y*Z
for(j=0;j<3;j++)    
    out_4[j]^=out_5[j];         //(X*Z)+(Y*Z)
            //N.B: out_3=out_2=out_4
return;
}
Jolie answered 19/5, 2012 at 3:12 Comment(0)
S
2

I won't comment on whether it works or how to test it, but I can give you some coding hints:

  • function names are usually lower case
  • the input arrays should be const...
  • or if you don't mind destroying the 2nd input array then V_val is redundant.
  • your variables should be renamed. Call the inputs x and y and the output z; R_val should be simply a constant, R, mask_b simply mask, v_val simply v. Make these changes and your code will suddenly look readable.
  • take out the comments you added, they are redundant
  • type u8 should be uint8_t (from stdint.h) or simply unsigned char.
  • Z_val is redundant; just use output parameter z (out_val)
  • I don't know what 'xdata' means
  • i and j should be int
  • rnd is redundant
  • 0x01 is the same as 1 - the latter being more readable.

  • the if (!(V_val[2] & 0x01)) clauses can be simplified to:

    v[2]>>=1;  /* EDIT --- missed this out before */
    for (j=1; j<3; j++)
    {
        if (v[2-j] & 1)
            v[3-j] |= 0x80;
        v[2-j] >>= 1;
    }
    if (v[2] & 1)
        v[0] ^= R;
    
  • this stuff:

    if(mask_b & 0x01)   { val_1++; rnd=1;}
    mask_b >>= 1;
    if (rnd)    { mask_b=0x80; rnd=0; }
    

    can be simplified to (after renaming)

    if ((mask >>= 1) == 0)
    {
        x++;
        mask = 0x80;
    }
    
    • the STR_CPY call is redundant as you output directly to z. Also it looks by its name like a string copy, which is definitely not what would be needed.

EDIT2 (1st edit was adding missing line before for-loop above)

You have two copies of the same loop, one in the if (!(v[2] & 1)) clause and one in the else clause. The latter (else clause) also has a v[0] ^= R;. One copy can be omitted. Also the loop treats j==0 specially, which is why I extracted it from the loop (but omitted it in the post). the loops is then over just 2 items, j==1 and 2.

You could and probably should unwind the remaining loop to get:

    v[2] >>= 1;

    if (v[1] & 1) v[2] |= 0x80;
    v[1] >>= 1;

    if (v[0] & 1) v[1] |= 0x80;
    v[0] >>= 1;

    if (v[2] & 1) v[0] ^= R;
Scruple answered 19/5, 2012 at 16:20 Comment(2)
Thanks Williams, for the if (!(V_val[2] & 0x01)) clause you wrote above, It will not work because you first have to mask the LSB of V before you right-shift once. Also what happens the j=0 because you only begin with j=1.Jolie
Yes, I missed out the j==0 bit - now added before the loop. Also unwound the loops, which arguably is better.Scruple
I
2

You also have a math consideration trying to use GF(2^128) code for GF(2^24)

The constant "R_val=0xE1" is specific to GF(2^128) and reflects the lower order indicies of the irreducible polynomial

    x^128 + x^7 + x^2 + x^1 + 1 

which is the generator for the popular isomorphism of GF(2^128) often used cryptographically.

This is equivalent to the bit vector (1 || 120"0" || 10000111). (Where each bit reflects a power of "x" appearing in the polynomial.)

The low order bits (the high order bit 128 is consumed balancing the bit that overflows) are written x87 in big-endian fashion or xE1 when expressed in little-endian fashion.

However for GF(2^24), it seems unlikely, a priori, that

    x^24 + x^7 + x^2 + x^1 + 1 

is irreducible (but I have not tested it).
So what is known to be a good, valid irreducible polynomial for use in G(2^24)?
According to the table http://www.hpl.hp.com/techreports/98/HPL-98-135.pdf the lowest / simplest irreducible polynomial for GF(2^24) seems to be

    x^24 + x^4 + x^3 + x^1 + 1

which corresponds to the bit vector (1 || 16 "0" || 00011011).

Therefore, again because the high order bit is consumed balancing the overflow, the proper correction factor, your R_val, written in litle-endian fashion would be R_val = 0xD8

Hope you find this useful.

Induplicate answered 17/9, 2013 at 6:23 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.