What is integer division heavily used for?

An analysis at https://ridiculousfish.com/blog/posts/benchmarking-libdivide-m1-avx512.html finds that Apple's new CPU spends a lot of hardware resources on making integer division massively faster.

This is a surprising thing to do. In my experience, integer division is rarely used, except for division by a compile-time constant, which the compiler can replace with a shift or a multiply.
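
To make that concrete, here is a minimal sketch (my own illustration, not taken from the linked analysis) of the reciprocal-multiply trick a compiler uses when the divisor is a compile-time constant; the magic constant 0xCCCCCCCD is ceil(2^35 / 10):

    #include <stdint.h>
    #include <assert.h>

    /* Sketch: divide a 32-bit unsigned value by the constant 10 without a
       hardware divide, using the multiply-and-shift a compiler would emit.
       0xCCCCCCCD = ceil(2^35 / 10). */
    static uint32_t div10(uint32_t x) {
        return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
    }

    int main(void) {
        /* Spot-check against the plain division. */
        for (uint32_t x = 0; x < 1000000; x++)
            assert(div10(x) == x / 10);
        return 0;
    }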

Even more surprising, in the discussion at https://news.ycombinator.com/item?id=27133804, someone said:

When I've been micro-optimizing performance-critical code, integer division shows up as a hot spot regularly.

Now I'm really curious: just what are people doing that makes integer division a bottleneck? I'm trying to think of where it could be used. Cases I've seen:

  • Floating-point emulation. But these days the only CPUs without hardware floating-point are tiny microcontrollers that also wouldn't have hardware integer division anyway.

  • Hash tables with a prime number of buckets, for a little bit of extra randomness (see the sketch after this list). But it's long been known that this is not the best way to do things; if you don't trust your hash function to provide enough randomness, get a better hash function.

  • Early 3D, like the PlayStation 1, which used fixed-point coordinates. But everyone nowadays does floating-point 3D.
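
For the hash-table case, here is a minimal sketch (my own illustration, with made-up bucket counts) of why the lookup costs a hardware divide when the table size is only known at run time, versus a single AND when it is a power of two:

    #include <stdint.h>
    #include <stddef.h>

    /* Sketch: nbuckets is a runtime value (tables grow by rehashing), so the
       compiler cannot turn the modulo into a multiply/shift. */
    static size_t bucket_prime(uint64_t hash, size_t nbuckets /* e.g. 1021 */) {
        return (size_t)(hash % nbuckets);        /* hardware divide */
    }

    /* Power-of-two table: the wrap is a mask, no division at all. */
    static size_t bucket_pow2(uint64_t hash, size_t nbuckets /* e.g. 1024 */) {
        return (size_t)(hash & (nbuckets - 1));  /* single AND */
    }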

So what exactly is all this integer division being used for?

Colyer asked 13/5, 2021 at 23:14

Comments (10):
"But it's long been known that this is not the best way to do things" - in general, that's not the same as "code in the wild doesn't contain this". IDK, it's not something that's come up in the stuff I've looked at optimizing, but I certainly see code on SO that uses / with a variable when it could have avoided it. And since, according to some people, it's "better" to make everything a variable, the divisor isn't always visible as a compile-time constant. – Annabelleannabergite
Integer division doesn't have to be used a ton for it to be a bottleneck. A DIV instruction can take many times more clock cycles (10x-20x) than the equivalent MUL instruction, so you only need to use it a little for it to have an impact. – Briefing
Re "Floating-point emulation": Keep in mind that one of the major features of Apple's M1 machines is x86 emulation, which includes floating-point: and x87 emulation can't be done in hardware, as ARM64 has no native support for 80-bit long doubles. Also, the long double in ARM64 C is 128-bit quad precision, for which there's also no hardware support AFAIK, so that needs to be done in software too.Neslund
I've spent decades programming professionally, and rarely use anything except integers. What makes you think integer division is not much used? I'd agree that addition and subtraction feature much more heavily, but division is still a relatively useful operation. – Skillless
@NateEldredge: As another data point, Intel further upgraded integer division in Ice Lake: it no longer shares an execution unit with the FP div/sqrt unit, and both the latency and the number of front-end uops went down (from 10 to 4 for 32-bit; see uops.info). There was no throughput change for 32-bit, but 64-bit operand-size got dramatically faster, so I think the HW is wider now. So anyway, apparently there are reasons to spend HW on integer division outside of things specific to M1 / AArch64. – Annabelleannabergite
Not sure about everyone else, but I use a huge number of integer divisions in code that generates prime numbers for RSA encryption keys (e.g. using hundreds of division instructions to find the modulo of an "up to 8192-bit big integer" divided by a "product of up to 3 primes that fit in a single integer", and repeating that several hundred times using values/divisors from a table). Even though division is slow, this "thousands of divisions per prime-number candidate" approach is faster than not doing it and letting more non-prime candidates reach the even more expensive primality tests. – Uncritical
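
A minimal sketch of the kind of reduction that comment describes (my own illustration, assuming GCC/Clang's unsigned __int128 and a little-endian limb array, not the commenter's actual code): reduce the big integer modulo a small divisor with one divide per limb, then test the small remainder against each prime in the product.

    #include <stdint.h>
    #include <stddef.h>

    /* Sketch: big integer as n 64-bit limbs, least significant first.
       Reduce it modulo a small divisor d (e.g. a product of a few small
       primes) by Horner's rule: one 128-by-64-bit division per limb. */
    static uint64_t bigint_mod_small(const uint64_t *limbs, size_t n, uint64_t d) {
        unsigned __int128 r = 0;
        for (size_t i = n; i-- > 0; ) {   /* most significant limb first */
            r = (r << 64) | limbs[i];
            r %= d;                       /* the division that shows up hot */
        }
        return (uint64_t)r;
    }
    /* The remainder can then be checked against each prime p in the product
       (r % p == 0 means the candidate is composite) with cheap divides. */
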
The increased attention to division may be the result of a plateau in the improvement of microprocessor tech. When you cannot increase the clock, and the other micro-architectural (e.g. superscalar width) or architectural (e.g. cache) factors have reached their maximum ROI, you start optimizing less frequent use cases. That said, divisions occur heavily in statistical, simulation, and physics SW. This also includes algorithms that share their strategy with a physical process (e.g. scheduling) but, of course, with less impact. ... – Irrelievable
... People tend to optimize data-intensive processes (e.g. compression, codecs, statistics), and data processing usually means equations/formulas and similar math, with its good share of integer divisions. – Irrelievable
One aspect that may not be immediately obvious: in high-performance implementations of algorithms or data structures, implementers often choose power-of-2 sizes for their arrays or buckets precisely because arbitrary integer division in hardware is very slow, while division by a power of 2 is very fast (sketched below). The downside is higher memory usage, because you have to round up to the next power of 2. If arbitrary integer division becomes much faster, the need for power-of-2 sizes decreases, which is beneficial because it can lead to better CPU cache usage. – Viand
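
A minimal illustration of that trade-off (my own sketch, with made-up capacities, assuming the size is only known at run time): wrapping an index into a ring buffer.

    #include <stdint.h>

    /* Sketch: wrapping an index when the capacity is a runtime value.
       An arbitrary capacity needs a modulo (hardware divide); a
       power-of-two capacity needs only a mask (but may waste memory). */
    static uint32_t wrap_any(uint32_t i, uint32_t cap)  { /* e.g. cap = 1000 */
        return i % cap;                                   /* divide */
    }
    static uint32_t wrap_pow2(uint32_t i, uint32_t cap) { /* e.g. cap = 1024 */
        return i & (cap - 1);                             /* single AND */
    }
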
Here is your answer: cryptography and memory management (forum.dlang.org/post/[email protected]). It would be interesting to trace how much time is spent in divq and in 32- and 64-bit idiv. Also, the problem is that all compilers remove division as much as possible. – Pharmacopsychosis
