I have two int matrices A and B, each with more than 1000 rows and 10K columns. Multiplying them as floats instead of ints gives a speedup of 4x or more, so I routinely convert them to float matrices first.
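For concreteness, here is a minimal timing sketch of the comparison I mean (the sizes and value range are placeholders, not my exact data; with values this small the float64 round trip is exact, since every dot product stays far below 2**53):

```python
import time
import numpy as np

# Placeholder sizes and value range, roughly matching the question.
n, k = 1000, 10000
rng = np.random.default_rng(0)
A = rng.integers(0, 100, size=(n, k), dtype=np.int64)
B = rng.integers(0, 100, size=(k, n), dtype=np.int64)

def timed(f):
    t0 = time.perf_counter()
    f()
    return time.perf_counter() - t0

# Plain integer matmul (NumPy does not dispatch this to BLAS).
t_int = timed(lambda: A @ B)

# Round trip through float64: exact here, because each dot product
# is at most 100 * 100 * 10000 = 1e8, well below 2**53.
t_float = timed(
    lambda: (A.astype(np.float64) @ B.astype(np.float64)).astype(np.int64)
)

print(f"int64 matmul:           {t_int:.2f} s")
print(f"via float64 round trip: {t_float:.2f} s")
```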
I'm wondering why this is the case. I realize that float matrix multiplication benefits from a lot of optimization and vectorization, such as AVX. But there are also instruction sets for integers, such as AVX2 (if I'm not mistaken). Can't SSE and AVX be used for integers as well?
Why isn't there a heuristic underneath matrix algebra libraries such as NumPy or Eigen to catch this case and perform integer matrix multiplication faster, the way they do for floats?
About the accepted answer: while @sascha's answer is very informative and relevant, @chatz's answer gives the actual reason why int-by-int multiplication is slow, irrespective of whether BLAS integer matrix operations exist.