I am currenty converting the nacl library to risc-v. I already have poly1305 working. I am trying to do this using the risc-v core instruction set, so I don't have a multiplier. The algorithm for Pol1305 is using at the moment ceil(m/16)*17*17 8-bit multiplications where m is the message length in bytes(multiplying two 2^130 integers in base 2^8 modulo 2^130-5). So I want to use a fast multiplication algorithm to keep it fast.
At the moment I have the shift-and-add algorithm working for the multiplication. However that takes 63 cycles for 8-bit values, since I need to avoid branching (timing side-channel) and thus some masking is involved which takes a few more cycles.
andi t2, t0, 1 //t0 is the multiplier
sub t2, zero, t2 //creating a mask
and t3, t1, t2 //applying the mask to the multiplicand
add a0, a0, t3 //doing the add
srli t0, t0, 1 //shifting the multiplier
slli t1, t1, 1 //shifting the multiplicand
This is giving me valid results at 63 cycles per multiplication. The issue is that the total execution time of the program is 175219 cycles for a message of 131 bytes. Of this time 9*17*17*63 = 163863 cycles are used for multiplication. Which I would like to improve.
add same,same
– Periwigy = array[x]
" where an attacker can determine which cache line you accessed (and infer the value ofx
) from measuring the time differences between "cache hit or miss" of other (possibly only related by cache aliasing) accesses. – Nonobjective