Are there any numbers that enable fast modulo calculation on floats?

4

6

I know that for unsigned integers, I can replace the modulo operation with a bitmask, if the divisor is a power of two. Do any numbers have a similar property for floats? That is, are there any numbers n for which f mod n can be calculated more efficiently than in the general case, not necessarily using a bitmask?
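
For illustration, the integer trick I mean (the divisor 8 here is arbitrary):

unsigned fast_mod8(unsigned x)
{
    return x & (8u - 1u);   // same result as x % 8u, because 8 is a power of two
}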

Other than, of course, one. (Brain failure.)

Edit: to clarify, f is any floating point number (determined at runtime), n is any compile-time constant in any format, and I expect the result to be a float.

Almeta answered 6/3, 2018 at 20:17 Comment(14)
My intuition says "No". And the language lawyer says "it might depend on the floating point representation, which is implementation defined"Indopacific
I suppose there is the case where n == 0...Sentimentalism
"Other than, of course, one." <- Why "of course"?Unfasten
Use fmod(x,y). Good compilers may detect optimizations available for select x,y and emit fast code.Highway
I think OP means that if n == 1 the result is obviously 0.Pippy
@Jean-FrançoisFabre: Ah, I was assuming that f was a general floating-point number rather than an integral floating-point number.Unfasten
you were assuming right... My remark is moot.Pippy
@Jean-FrançoisFabre Apart from the issue that with floats you can never be sure that n==1 is even possible...Indopacific
1 can be represented with just 1 bit IIRC so it's exact, even as float (like all powers of 2)Pippy
@MarkDickinson Because f%1 == fAlmeta
@ego, Do you expect f to be a float with values spanning all float (fractions, whole numbers, negatives, INF, NaN, signed 0), or interested in a subset of float?Highway
@Almeta f%1 == f what?! f%1 = 0 is a better candidate, yet even that has issues.Highway
Need more clarity: "mod" with unsigned is unambiguous. "mod" with int can be thought of differently depending on which "mod" you want. In C, % is the remainder, perhaps not the "mod" you want. This applies to float also. Either define or post examples of expected results with various float "modded" by a constant.Highway
Which of the multiple available operations in C/C++ does this question refer to: fmod(), remainder(), remquo()? The difference between the first two is a difference in rounding mode for the quotient.Inpatient
5

If n == 1.0 or n == -1.0, then you can do:

r = f - trunc(f);

On x86_64, trunc will typically use the ROUNDSD instruction, so this will be pretty fast.

If n is a power of 2 with magnitude greater than or equal to 1, and your platform has an fma function that is native (for Intel, this means Haswell or newer), then you could do

r = fma(-trunc(f / n), n, f);

Any reasonable compiler should switch the division to a multiplication, and fold the negation into the appropriate FMA (or the constant), resulting in a multiplication, a truncation and an FMA.

This can also work for powers of 2 with magnitude smaller than 1, as long as f / n doesn't overflow (which is also why the compiler wouldn't be free to make this substitution on its own).

Whether any compilers will actually do this is another matter. Floating point remainder functions aren't used much, and don't get much attention from compiler writers, e.g. https://bugs.llvm.org/show_bug.cgi?id=3359
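
For concreteness, a self-contained sketch of the fma variant, assuming n is a compile-time power-of-two constant with magnitude at least 1 (the name mod_pow2 and the choice n = 4 are just for illustration):

#include <math.h>

//  Sketch: f mod n for a constant power-of-two n >= 1.
//  fma(-trunc(f / n), n, f) computes f - trunc(f / n) * n; for such n the
//  division and truncation are exact, so the result matches fmod(f, n).
static double mod_pow2(double f)
{
    const double n = 4.0;             //  compile-time constant divisor
    return fma(-trunc(f / n), n, f);
}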

Scalf answered 6/3, 2018 at 21:26 Comment(2)
You say “Not generally” and then describe cases for which there are solutions, which is what the question asks for: “Do any numbers have a similar property for floats?” So the answer is yes.Ola
Good point, I'll fix it (I had a different solution in mind when I started the answer)Scalf
6

The mathematics works the same for floating-point types as it does for integer types: If n is a power of the radix (two for binary), then f modulo n can be computed by zeroing digits representing values n or greater (also known as the high bits or high digits).

So, for a binary integer with bits b15 b14 b13 b12 b11 b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0, we can compute the residue modulo four simply by setting b15 to b2 to zero, leaving only b1 b0.

Similarly, if the radix of the floating-point format is two, we can compute the residue modulo four by removing all digits whose value is four or greater. This does not require a division, but it does require examining the bits representing the value. A simple bit mask alone will not suffice.

The C standard characterizes a floating-point type as a sign (±1), a base b, an exponent, and some number of base b digits. Thus, if we know the format a particular C implementation uses to represent a floating-point type (the way that the sign, exponent, and digits are encoded into bits), an algorithm for calculating f modulo n, where n is a power of b, is:

  • Let y = f.
  • Use the difference between the exponent of y and the exponent of n to decide which digits in y have position values less than n.
  • Change those digits to zero.
  • Return f − y.

Some notes:

  • The algorithm has to handle infinities, NaNs, subnormals, and other special cases.
  • The purpose of zeroing low digits in the copy y and subtracting from f rather than zeroing high digits directly in f is to avoid the need to zero the implicit bit in the IEEE 754 format. (In the algorithm as stated, if the implicit bit in y needs to be zeroed, then all of y is zeroed, so it is easy.)
  • Although no division is used, the manipulations are not as simple as a bit mask and are unlikely to be useful generally. However, there are special situations, often with a known divisor and a limited range of dividends, where such bit manipulations of floating-point representations are useful.

Sample code:

//  This code assumes double is IEEE 754 basic 64-bit binary floating-point.

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>


//  Return the bits representing the double x.
static uint64_t Bits(double x)
    { return (union { double d; uint64_t u; }) { x } .u; }


//  Return the double represented by bits x.
static double Double(uint64_t x)
    { return (union { uint64_t u; double d; }) { x } .d; }


//  Return x modulo 2**E.
static double Mod(double x, int E)
{
    uint64_t b = Bits(x);
    int      e = b >> 52 & 0x7ff;

    //  If x is a NaN, return it.
    if (x != x) return x;

    //  If x is infinite, return a NaN.
    if (!isfinite(x)) return NAN;

    //  If x is subnormal, adjust its exponent.
    if (e == 0) e = 1;

    //  Remove the encoding bias from e and get the difference in exponents.
    e = (e-1023) - E;

    //  Calculate number of bits to keep.  (Could be consolidated above, kept for illustration.)
    e = 52 - e;

    if (e <= 0) return 0;
    if (53 <= e) return x;

    //  Remove the low e bits (temporarily).
    b = b >> e << e;

    /*  Convert b to a double and subtract the bits we left in it from the
        original number, thus leaving the bits that were removed from b.
    */
    return x - Double(b);
}


static void Try(double x, int E)
{
    double expected = fmod(x, scalb(1, E));
    double observed = Mod(x, E);

    if (expected == observed)
        printf("Mod(%a, %d) = %a.\n", x, E, observed);
    else
    {
        printf("Error, Mod(%g, %d) = %g, but expected %g.\n",
            x, E, observed, expected);
        exit(EXIT_FAILURE);
    }
}


int main(void)
{
    double n = 4;

    //  Calculate the base-two logarithm of n.
    int E;
    frexp(n, &E);
    E -= 1;

    Try(7, E);
    Try(0x1p53 + 2, E);
    Try(0x1p53 + 6, E);
    Try(3.75, E);
    Try(-7, E);
    Try(0x1p-1049, E);
}
Ola answered 6/3, 2018 at 21:18 Comment(14)
The operations involve a bitmask in the mantissa, which reduces to a left bit shift of n bits on it, followed by a renormalization of the number (another shift to put the most significant bit first) and the addition to the exponent field of the number of shifts finally involved.Grekin
In reality, it's even simpler, as the operation is exactly the same as for normalization, but forcing the left bits to zero while dropping them over the left side of the mantissa. Normalization searches for the first 1 bit while left shifting the mantissa and decrementing the biased exponent. This is the same, but forcing it to do n shifts, before searching the first 1 bit.Grekin
@LuisColorado: If you write the code, I think you will find it is not simpler to clear the left bits of the significand. (“Significand” is the preferred term; “mantissa” is a legacy term that properly refers to the fractional part of a logarithm.) First, the leading bit is not present in the encoded significand, so it cannot be used for normalization until it is added. Second, to shift the bits of the significand, you have to separate them from the exponent and sign fields. Third, C does not provide an operator for normalizing.Ola
@LuisColorado: So, the necessary code is more complicated than the code I supplied, which instead clears the low (“right”) bits and then subtracts. Additionally, I updated the code, replacing the creation of a bit mask to clear the low bits with a simpler >> e << e. This makes the code to adjust the significand (versus the code to handle special cases and calculate the exponent difference) just three operations (right shift, left shift, floating-point subtract), each of which typically maps to a single instruction.Ola
@LuisColorado: Additionally, I added code to handle subnormals. With the technique of removing low bits and subtracting, the extra code to handle subnormals is merely if (e == 0) e = 1;. If we tried to handle subnormals while manipulating the high bit of the significand, additional code would be required (because a subnormal significand cannot be normalized, so it must be handled specially).Ola
In the case of subnormals you don't normalize after left shifting... but that's quite strange, as for that to happen the modulus has to be subnormal also, and that's not only very strange (using the subnormal range) but also lets you switch to fixed point arithmetic and forget floating point.Grekin
Subnormals are handled as fixed point arithmetic. The only case that has to be handled specially is when a non-subnormal modulus makes the original number become subnormal... in that case, when you reach the subnormal range e == 0, you simply stop normalizing the number and store it as is... no normalizing with subnormals, as they are fixed point arithmetic.Grekin
Eric, probably you have a better answer than mine, please, write it. Don't come here to correct me telling if significand is more exact than mantissa, as both terms are clearly defined for the problem. I lack the english education you have, and am grateful and merciful for the correction, but it is not needed here because both terms apply.Grekin
Eric, C probably doesn't have an operator to do normalization, but it is an operation that you have to implement to do simple arithmetic with floating points in ieee-754 if you want to operate them in software. In hardware it is fully implemented, and even more.... the modulo operator is optimized and deals automatically with powers of two moduli, so this discussion is clearly out of scope.Grekin
@LuisColorado: (a) There is no need to normalize after left shifting. The code shifts both right and left, in b >> e << e, so the net shift is zero. The effect is only to clear certain low bits. The implicit high bit is unaffected, so there is no need for renormalization. The code works for both normal and subnormal values. (I tested it with each combination of a bit at each position in the subnormal range and a few bits into the normal range, for one such bit in the dividend and one such in the divisor. If you believe it is incorrect, show a counterexample.)Ola
@LuisColorado: (b) The fmod function is not optimized for special cases that the OP asks about. (Certainly not in macOS and I doubt in implementations generally.)Ola
@LuisColorado: (c) My code does not require any operation to do normalization not already present in it. I have presented working code. If you believe it is incorrect, then show specific numbers for which the code fails.Ola
I prepared a version of Mod for float instead of double and tested it (by comparing to fmod) with E from −150 to −99 (so far) with every non-negative value of x representable in float. No errors have been reported, so the code is handling both normals and subnormals correctly.Ola
@Eric Postpischil Minor nitpick: results of negative zero currently are not handled correctly (from the C standard, incorporated into C++ standard by reference: " ... if y is nonzero, the result has the same sign as x ..."). This is easily addressed by copying the sign of the dividend x to the result (two instances in this code).Inpatient
2

In general, no. However, given the "fuzzy" nature of floating point (at least IEEE 754), there are tricks (some would call them abuses) that can radically speed up certain calculations by approximation, at the cost of memory, precision, portability, maintainability, or a combination of these.

The simplest approach is to precalculate the operation and store the result into a lookup table before runtime. The larger you make the table, the more memory you use, but you also gain more precision.
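
A minimal sketch of the lookup-table idea, assuming the inputs are non-negative and bounded and that quantizing f to a table index is acceptable for your precision needs (the bound, table size, and names here are arbitrary):

#include <math.h>

#define TABLE_SIZE 4096
#define INPUT_MAX  64.0               //  assumed upper bound on f

static double mod_table[TABLE_SIZE];

//  Precompute f mod n for TABLE_SIZE evenly spaced sample points in [0, INPUT_MAX).
static void init_mod_table(double n)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        mod_table[i] = fmod(i * (INPUT_MAX / TABLE_SIZE), n);
}

//  Approximate f mod n by truncating f down to the nearest sample point below it.
static double mod_lookup(double f)
{
    int i = (int)(f * (TABLE_SIZE / INPUT_MAX));
    if (i >= TABLE_SIZE) i = TABLE_SIZE - 1;   //  clamp out-of-range inputs
    return mod_table[i];
}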

A more unusual approach manipulates the floating point representation in memory. One of the more famous floating point hacks is the fast inverse square root. The general idea is that you can build an approximation function for just about anything you want, not just inverse square roots. If you know your input range and your tolerance for error, you can construct an algorithm tuned precisely for your purposes.
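
For reference, the classic fast inverse square root (written here with memcpy for the type punning) is a compact example of this style of bit-level approximation:

#include <stdint.h>
#include <string.h>

//  Approximate 1/sqrt(x) for positive normal x: derive a first guess by
//  manipulating the bit pattern, then refine with one Newton-Raphson step.
static float fast_rsqrt(float x)
{
    float    y  = x;
    float    x2 = x * 0.5f;
    uint32_t i;

    memcpy(&i, &y, sizeof i);      //  reinterpret the float's bits as an integer
    i = 0x5f3759df - (i >> 1);     //  magic-constant initial guess
    memcpy(&y, &i, sizeof y);

    y = y * (1.5f - x2 * y * y);   //  one Newton-Raphson refinement
    return y;
}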

And it would be very remiss of me not to stress the importance of benchmarking! If you want to apply any of these techniques to speed up your program, benchmark first, and be certain that you are optimizing the right place!

Jeannettejeannie answered 6/3, 2018 at 21:25 Comment(0)
0

Yes: when the divisor is a power of two, there's also the possibility of doing the modulo operation with a pseudo-mask, but in this case we have to consider the fact that floating point numbers are formatted following the IEEE-754 standard.

Let's assume we do the same operation, but this time, since the values are real numbers, the power-of-two modulus is a 1 bit followed by an infinite number of 0s.

1000000000000000000000000... * 2^exp

To get the mask, we do the same thing as with integers: change the 1 to a 0, and all the bits following that digit change to 1s.

0111111111111111111111111... * 2^exp  =
1111111111111111111111111... * 2^(exp-1)
(THIS NUMBER IS (1.0 - FLT_EPSILON) * 2^(exp_of_module - 1))

But this is an all-ones mask, so the mantissa is never touched except for the bits that lie above the modulus (which go to zero). When we zero all these bits, we don't need to do anything but shift them out to the left and let them drop into the waste basket, because they are masked out. So masking the mantissa is always a left shift (filling in more digits on the right of the number --- oops, we don't have them, so fill with zeros/random bits/ones...) followed by normalizing the number (that is, shifting the number until the first significant one reaches the first place, adjusting the exponent bias accordingly).

Let's see an example: we have 2.0 as the modulus, and M_PI (3.141592...) as the number to operate on:

3.141592... = 40 49 0f db
            = 0100 0000 0100 1001 0000 1111 1101 1011...
; which represents
            = 0 (positive)
               100 0000 0 (exponent biased 128 ==> 1)
                     (1.)100 1001 0000 1111 1101 1011...
                      ^ Always 1, so IEEE-754 doesn't store it; we have to add it back to operate.
                       11.00 1001 0000 1111 1101 1011...
                       01.11 1111 1111 1111 1111 1111...
             MASKED == 01.00 1001 0000 1111 1101 1011...
                       / // //// //// //// //// ////,-- introduced to complete format
       RENORMALIZED == 1.001 0010 0001 1111 1011 011X...
; result    = 0 (positive)
            =  011 1111 1 (new exponent after norm. 127 ==> 0)
                       1.001 0010 0001 1111 1011 011X
            = 0011 1111 1001 0010 0001 1111 1011 011X

which is 1.141592... as expected.

As you see, we have to check the difference between the biased exponents and left shift the mantissa as many places as that difference indicates. At the same time, we need to subtract that difference from the biased exponent (and check for underflow or subnormal cases) and renormalize the number (left shift the mantissa, and decrement the exponent, until the first one in the mantissa reaches the hidden-bit place).

NOTE

I assumed the biased exponent of the modulus is lower than that of the number being operated on; in the other case, the mask is all ones and the number is not affected by it.
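
A rough sketch of this scheme for IEEE-754 binary32, assuming f is positive, finite, and normal, the modulus is 2**E, and the result does not land in the subnormal range (the function name mod_shift and the use of memcpy for bit access are my own choices for illustration):

#include <stdint.h>
#include <string.h>

//  Sketch only: f mod 2**E by masking the high significand bits and renormalizing.
//  Assumes f > 0, finite, normal, and a result that stays in the normal range.
static float mod_shift(float f, int E)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    int exp  = (int)((bits >> 23) & 0xFF) - 127;  //  unbiased exponent of f
    int drop = exp - E;                           //  leading significand bits worth >= 2**E
    if (drop < 0)  return f;                      //  |f| < modulus: nothing to remove
    if (drop > 22) return 0.0f;                   //  every significand bit is a multiple of 2**E

    //  Keep only the explicit mantissa bits whose place value is below 2**E
    //  (the implicit leading 1 is always dropped when drop >= 0).
    uint32_t sig = bits & ((1u << (23 - drop)) - 1u);
    if (sig == 0) return 0.0f;

    //  Renormalize: shift the leading 1 back up to the implicit-bit position,
    //  lowering the exponent by the same amount.
    int p = 22;
    while (!((sig >> p) & 1u)) p--;
    int newexp = exp - 23 + p;                    //  assumed >= -126 (normal result)

    uint32_t out = ((uint32_t)(newexp + 127) << 23)
                 | ((sig << (23 - p)) & 0x007FFFFFu);
    memcpy(&f, &out, sizeof f);
    return f;
}

For example, mod_shift(3.14159265f, 1) yields about 1.1415927f, matching the worked example above.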

Grekin answered 8/3, 2018 at 10:44 Comment(0)
