I'm writing a very computationally intensive procedure for a mobile device, and I'm limited to 32-bit CPUs. In essence, I'm performing dot products of huge sets of data (>12k signed 16-bit integers). Floating-point operations are just too slow, so I've been looking for a way to perform the same computation with integer types. I stumbled upon something called Block Floating Point arithmetic (page 17 in the linked paper). It does a pretty good job, but now I'm faced with the problem that 32 bits are just not enough to store the output of my calculation with enough precision.
Just to clarify: the reason it isn't enough precision is that I would have to drastically reduce the precision of each of my arrays' elements to end up with a number that fits into a 32-bit integer. It's the summation of ~16000 terms that makes my result so huge.
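As a rough worst-case estimate: the product of two signed 16-bit values needs up to 31 bits, and summing ~16000 ≈ 2^14 of them can add another 14 bits, so an exact accumulator would need around 45 bits, far more than a 32-bit integer can hold.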
Is there a way (I'd love a reference to an article or a tutorial) to use two 32-bit integers as the most significant and least significant words and define arithmetic on them (+, -, *, /) to process the data efficiently? Are there perhaps better ways of doing this, and is there a problem with this approach? I'm fairly flexible about the programming language I use; I would prefer C/C++, but Java works as well. I'm sure someone has done this before.
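To make the two-word idea concrete, here is a minimal sketch in C++ of the part I think I understand (the type and function names are just placeholders I made up): adding one signed 32-bit product into a 64-bit accumulator held as two 32-bit words.

    #include <cstdint>

    // Two 32-bit words acting as one 64-bit two's-complement accumulator.
    struct acc64 {
        uint32_t hi;  // most significant word
        uint32_t lo;  // least significant word
    };

    // Add one signed 32-bit product (e.g. from int16_t * int16_t) into the accumulator.
    inline void acc64_add(acc64& a, int32_t p)
    {
        uint32_t v = static_cast<uint32_t>(p);
        uint32_t old_lo = a.lo;
        a.lo += v;
        uint32_t carry = (a.lo < old_lo) ? 1u : 0u;   // unsigned wrap-around means a carry occurred
        a.hi += (p < 0 ? 0xFFFFFFFFu : 0u) + carry;   // sign-extend p into the high word, then add the carry
    }

Addition and subtraction look manageable this way; a full multi-word multiply and especially divide seem much messier, which is partly why I'm asking.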
long is typically 32 bits on a 32-bit system (although this isn't required), while long long, if available, is the 64-bit integral type. – Exsiccate
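If long long is available on the target compiler, the whole multi-word question may reduce to a minimal sketch like this (the compiler lowers the 64-bit additions to pairs of 32-bit instructions on a 32-bit CPU):

    #include <cstddef>

    // Dot product of two signed 16-bit arrays, accumulated in a 64-bit integer.
    // Each short * short product already fits in a 32-bit int; only the running
    // sum needs the wider type.
    long long dot_product(const short* a, const short* b, size_t n)
    {
        long long acc = 0;
        for (size_t i = 0; i < n; ++i) {
            acc += static_cast<int>(a[i]) * static_cast<int>(b[i]);
        }
        return acc;
    }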