How can I strength reduce division by 2^n + 1?
Asked Answered
U

2

17

I need to perform some integer divisions in the hot path of my code. I've already determined via profiling and cycle counting that the integer divisions are costing me. I'm hoping there's something I can do to strength reduce the divisions into something cheaper.

In this path, I am dividing by 2^n+1, where n is variable. Essentially I want to optimize this function to remove the division operator:

unsigned long compute(unsigned long a, unsigned int n)
{
    return a / ((1 << n) + 1);
}

If I were dividing by 2^n, I would just replace the div with a shift-right by n. If I were dividing by a constant, I would let the compiler strength reduce that specific division, likely turning it into a multiply and some shifts.

Is there a similar optimization that applies to 2^n+1?

Edit: a here can be an arbitrary 64-bit integer. n takes only a few values between 10 and, say, 25. I can certainly precompute some values for each n, but not for a.

Unfruitful answered 25/10, 2010 at 17:32 Comment(4)
Are there any constraints on the values of a and n?Vyner
What's the context in which you're calling the function?Lonee
@JR, I can't see the shift being the issue - it's one instruction. It's the divusion that will take the timeVyner
If a is a 64 bit integer you should declare it as such. unsigned long is only guaranteed to have 32 bit. Use uint64_t or uint_least64_t for it. For the 1 << n if your n might ever go to 31 you may have undefined behavior. Use UINT64_C(1) << n instead.Indefectible
A
13

Since you can only shift an int so many places, you can put all those cases into a choice of one of several divisions by a constant:

unsigned long compute(unsigned long a, unsigned int n)
{
    // assuming a 32-bit architecture (making this work for 64-bits 
    // is left as an exercise for the reader):
    switch (n) {
        case  0: return a / ((1 << 0) + 1);
        case  1: return a / ((1 << 1) + 1);
        case  2: return a / ((1 << 2) + 1);

            // cases 3 through 30...

        case 31: return a / ((1 << 31) + 1);
    }
}

So now each division is by a constant, which compilers will typically reduce to a series of multiply/shift/add instructions (as the question mentioned). See Does a c/c++ compiler optimize constant divisions by power-of-two value into shifts? for deatils.

Argolis answered 25/10, 2010 at 17:39 Comment(15)
Interesting idea. Perhaps I can write this code, then extract the strength reduction parameters into a table.Unfruitful
Worth a try, but you're trading off an unpredictable jump against the difference between shift+division instructions and the equivalent strength-reduced division. Any ideas when that's a good trade-off?Noelyn
+1,makes sense; although you would probably want to benchmark this to confirm that the indirection and/or conditional branches required to implement the switch() are actually faster than the integer divide...Stork
You can even put a default to fall back on general cases.Lonee
@J.S.: it doesn't matter what a can be - since n is limited to less than the number of bits in an unsigned int you have a limited number of divisors.Argolis
I agree that branching might make this not speed things up, so a benchmark comparison is in order. If that doesn't work out, there's another possible solution I'll try to post later (unfortunately, today's moving day...).Argolis
@Paul: I've added a reference to division by constants can be optimized by the compiler.Argolis
@MB, the hackersdelight pdf is enligtening. Those techniques should work nicely here...Vyner
You are doing the << on int. This is bad, since on a platform with 32 bit int this will lead to undefined behavior. Use 1UL << something instead of 1 << something.Indefectible
@JG, we're told n is 10 to 25 or so, so not an issue here.Vyner
Also, a little help would be writing (1 << n) + 1 as (1ul << n) | 1ul since all the values of the expression end with 1Dereism
@Jens: The << operation was in the original specification - it's not undefined unless the n value is greater than or equal to the width of the promoted left operand. Since it's in the original function, I assumed that the value of n would be appropriate.Argolis
@Ed.C: I'm not sure what help that would be.Argolis
Probably nothing, but it's a rule of thumb to use binary operators whenever possible instead of arithmetic operators.Dereism
This switch is almost surely a lot slower than the division on modern hardware. Instead you should make a table of the multiply-shift-add operations the compiler generates for each and then use n as an index into this table and apply them yourself.Tympanic
S
9

You can replace integer division by a constant, by multiplication (modulo wordsize) with a magic number and a shift.

The magic numbers can be pre-calculated for known constants.

Since n can't take many values e.g. 0..31 it is "easy" to pre-calculate these magic numbers for all n and store it in a table with 32 elements.

Javascript Page for calculating the magic numbers

A good compiler can compute the magic numbers and replace integer division by multiplication and shift if the divisor is constant at compile time. Depending on how the rest of the code is structured around the performance critical code you could use macro or inline tricks to unroll for all possible values of n and let the compiler do the work of finding the magic numbers (similar to the answer with the switch, but I would put more code in the constant region otherwise it might be a tradeof not worth it -- branching can cost you performance also)

Detailed description together with code for calculating the magic numbers can be fund in the Book "Hackers Delight" by Henry S. Warren, Jr. (highly recommended must have book!) pp. 180ff.

Link to Google Books for the relevant chapter:

Chapter 10-9 Unsigned Division by Divisors >= 1

Sonics answered 25/10, 2010 at 18:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.