Does anyone have (or can easily write) an optimal inline assembly function for the ARM Cortex-M0+ processor in Thumb mode that multiplies two 32-bit numbers and returns the 64-bit product?
As the M0+ has no long-multiply instruction, the product has to be assembled from 32x32=32 multiplies. The compiler currently does this by calling __aeabi_lmul,
which performs a 64x64=64 multiplication in 34 instructions. I'm hoping a significantly faster algorithm exists, given that the inputs are only 32 bits.
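
For reference, here is a minimal portable C sketch of the approach I have in mind, assuming unsigned inputs: split each operand into 16-bit halves, form four 16x16=32 partial products (each fits in a single 32-bit multiply, which the M0+ does have), and combine them with explicit carry handling. The function name umul32x32_64 is hypothetical, and this is plain C rather than the inline assembly I'm asking for; it only pins down the arithmetic a hand-written version would need to reproduce.

```c
#include <stdint.h>

/* Hypothetical helper: 32x32 -> 64 unsigned multiply using only
 * 32-bit multiplies, additions and shifts. */
static inline uint64_t umul32x32_64(uint32_t a, uint32_t b)
{
    /* Split both operands into 16-bit halves. */
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    /* Four partial products; each is at most 0xFFFE0001, so none overflows. */
    uint32_t ll = a_lo * b_lo;   /* weight 2^0  */
    uint32_t lh = a_lo * b_hi;   /* weight 2^16 */
    uint32_t hl = a_hi * b_lo;   /* weight 2^16 */
    uint32_t hh = a_hi * b_hi;   /* weight 2^32 */

    /* Sum of the two middle terms can exceed 32 bits: record the carry. */
    uint32_t mid = lh + hl;
    uint32_t mid_carry = (mid < lh);        /* worth 2^48 overall */

    /* Low word: ll plus the low half of mid shifted up by 16. */
    uint32_t lo = ll + (mid << 16);
    uint32_t lo_carry = (lo < ll);          /* worth 2^32 overall */

    /* High word: hh, the high half of mid, and both carries. */
    uint32_t hi = hh + (mid >> 16) + (mid_carry << 16) + lo_carry;

    return ((uint64_t)hi << 32) | lo;
}
```

A call such as umul32x32_64(x, y) should match (uint64_t)x * y for all inputs. Whether a compiler turns this into something competitive with hand-scheduled Thumb code is not something the sketch tries to show.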