Basically, the expression `0.4 * a` is consistently (and surprisingly) significantly faster than `a * 0.4`, where `a` is an integer. And I have no idea why.
I speculated that it is a case of the `LOAD_CONST` `LOAD_FAST` bytecode pair being "more specialized" than the `LOAD_FAST` `LOAD_CONST` pair, and I would be entirely satisfied with this explanation, except that this quirk seems to apply only to multiplications where the types of the multiplied operands differ. (By the way, I can no longer find the link to the "bytecode instruction pair popularity ranking" I once found on GitHub; does anyone have a link?)
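For reference, `dis` shows that the two expressions compile to the same `BINARY_MULTIPLY` on CPython 3.10 and differ only in the order of the two loads (the function names here are just for illustration):

```python
import dis

def slow(a):
    return a * 0.4   # LOAD_FAST a; LOAD_CONST 0.4; BINARY_MULTIPLY

def fast(a):
    return 0.4 * a   # LOAD_CONST 0.4; LOAD_FAST a; BINARY_MULTIPLY

dis.dis(slow)
dis.dis(fast)
```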
Anyway, here are the microbenchmarks:
$ python3.10 -m pyperf timeit -s"a = 9" "a * 0.4"
Mean +- std dev: 34.2 ns +- 0.2 ns
$ python3.10 -m pyperf timeit -s"a = 9" "0.4 * a"
Mean +- std dev: 30.8 ns +- 0.1 ns
$ python3.10 -m pyperf timeit -s"a = 0.4" "a * 9"
Mean +- std dev: 30.3 ns +- 0.3 ns
$ python3.10 -m pyperf timeit -s"a = 0.4" "9 * a"
Mean +- std dev: 33.6 ns +- 0.3 ns
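For completeness, the same-type controls behind the "only when types differ" observation can be run the same way (timings omitted here; these command lines are mine, mirroring the setup above):

$ python3.10 -m pyperf timeit -s"a = 9" "a * 4"
$ python3.10 -m pyperf timeit -s"a = 9" "4 * a"
$ python3.10 -m pyperf timeit -s"a = 0.4" "a * 0.9"
$ python3.10 -m pyperf timeit -s"a = 0.4" "0.9 * a"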
As you can see, the runs where the float comes first (the 2nd and 3rd) are faster.
So my question is: where does this behavior come from? I'm 90% sure it is an implementation detail of CPython, but I'm not familiar enough with low-level instructions to say for sure.
`float.__add__` immediately converts the integer to a float, whereas `int.__add__` returns `NotImplemented`, forcing `float.__radd__` to be called. – Lithium
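A minimal illustration of that dispatch, using multiplication to match the benchmarks:

```python
# int.__mul__ does not know how to handle a float operand, so it
# returns NotImplemented and CPython falls back to float.__rmul__.
print((9).__mul__(0.4))    # NotImplemented
print((0.4).__rmul__(9))   # 3.6 (the fallback path taken for 9 * 0.4)

# With the float on the left, float.__mul__ converts the int itself,
# so no fallback is needed.
print((0.4).__mul__(9))    # 3.6
```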