Next higher/lower IEEE double precision number

I am doing high precision scientific computations. In looking for the best representation of various effects, I keep coming up with reasons to want to get the next higher (or lower) double precision number available. Essentially, what I want to do is add one to the least significant bit in the internal representation of a double.

The difficulty is that the IEEE format is not totally uniform. If one were to use low-level code and actually add one to the least significant bit, the resulting format might not be the next available double. It might, for instance, be a special case number such as PositiveInfinity or NaN. There are also the sub-normal values, which I don't claim to understand, but which seem to have specific bit patterns different from the "normal" pattern.
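To check which of these worries are real, you can add one to the raw 64-bit pattern and see what comes out. This is a minimal Java sketch. `rawPlusOne` is a hypothetical helper, not a library function, and `Math.nextUp` is Java's built-in "next higher double" used for comparison. It shows that the subnormal-to-normal carry and the overflow to infinity both come out right. A plain increment goes wrong only for +Infinity (the result is a NaN) and for negative numbers, which store a magnitude, so adding one moves them away from zero.

```java
class RawIncrement {
    // Hypothetical helper: add one to the IEEE 754 bit pattern of d,
    // with no special-case handling at all.
    static double rawPlusOne(double d) {
        return Double.longBitsToDouble(Double.doubleToRawLongBits(d) + 1);
    }

    public static void main(String[] args) {
        // Largest subnormal -> smallest normal: the carry into the exponent is correct.
        double largestSubnormal = Double.longBitsToDouble(0x000fffffffffffffL);
        System.out.println(rawPlusOne(largestSubnormal) == Double.MIN_NORMAL);        // true

        // Largest finite value -> +Infinity, which is also what nextUp returns.
        System.out.println(rawPlusOne(Double.MAX_VALUE) == Double.POSITIVE_INFINITY); // true

        // +Infinity -> a NaN bit pattern: wrong.
        System.out.println(Double.isNaN(rawPlusOne(Double.POSITIVE_INFINITY)));       // true

        // Negative numbers store a magnitude, so +1 moves away from zero (down).
        System.out.println(rawPlusOne(-1.0) < -1.0);                                  // true
        System.out.println(Math.nextUp(-1.0) > -1.0);                                 // true
    }
}
```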

An "epsilon" value is available, but I have never understood its definition. Since double values are not evenly spaced, no single value can be added to a double to result in the next higher value.
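Here is a small Java sketch of that uneven spacing. `Math.ulp(x)` returns the gap between x and the next double of larger magnitude. At 1.0 that gap is 2^-52, which is the value C calls DBL_EPSILON. .NET's double.Epsilon is something different: the smallest positive subnormal, 2^-1074. Adding the fixed constant 2^-52 gives the next double at 1.0. At 2.0 it does nothing, and at 0.5 it skips a value.

```java
class FixedEpsilon {
    public static void main(String[] args) {
        double eps = Math.ulp(1.0);                                      // 2^-52, C's DBL_EPSILON
        System.out.println(eps == 0x1.0p-52);                            // true

        // At 1.0 the constant is exactly one step.
        System.out.println(1.0 + eps == Math.nextUp(1.0));               // true
        // At 2.0 the gap is 2^-51; adding half a gap rounds back to 2.0 (ties to even).
        System.out.println(2.0 + eps == 2.0);                            // true
        // At 0.5 the gap is 2^-53; adding 2^-52 jumps two steps.
        System.out.println(0.5 + eps == Math.nextUp(Math.nextUp(0.5)));  // true
    }
}
```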

I really don't understand why IEEE hasn't specified a function to get the next higher or lower value. I can't be the only one who needs it.

Is there a way to get the next value without some sort of loop that tries adding smaller and smaller values?

Westfalen answered 7/8, 2009 at 16:59 Comment(1)
IEEE-754 has specified such functions: nextUp and nextDown are required in section 5.3.1 of the revised (2008) standard, and the earlier nextafter function was recommended by the original (1985) standard and is required in C99. – Wayless

As of .NET Core 3.0, C# has the System.Math.BitIncrement and System.Math.BitDecrement functions.

According to Microsoft's notes:

[These correspond] to the nextUp and nextDown IEEE operations. They return the smallest floating-point number that compares greater or lesser than the input (respectively). For example, Math.BitIncrement(0.0) would return double.Epsilon.

Sclar answered 4/11, 2021 at 14:50 Comment(2)
Okay, after 12 years there is finally an assured (and easy) way to do it. Thanks, Matt. – Westfalen
Yeah, sometimes it seems Microsoft likes being in their own little world. By the way, note my comment on Fx's answer: if they edit their answer as I recommended, I'd recommend you give it the green checkmark, since I'll delete this answer. – Sclar

There are functions available for doing exactly that, but which ones you can use depends on your language. Two examples:

  • if you have access to a decent C99 math library, you can use nextafter (and its float and long double variants, nextafterf and nextafterl), or the nexttoward family (which takes a long double as its second argument);

  • if you write Fortran, you have the nearest intrinsic available.

If you can't access these directly from your language, you can also look at how they're implemented in freely available libraries, such as this one.
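For a feel of what such an implementation does, here is a minimal Java sketch. It assumes Java 8+ for `Math.nextDown` and reduces nextafter-style semantics to a single step up or down. `nextAfterSketch` is a hypothetical name. Java's own `Math.nextAfter` is used only to check it, comparing bit patterns so that 0.0 and -0.0 are told apart.

```java
class NextAfterSketch {
    // Hypothetical re-implementation of nextafter(start, direction) semantics.
    static double nextAfterSketch(double start, double direction) {
        if (Double.isNaN(start) || Double.isNaN(direction)) {
            return Double.NaN;                 // NaN in, NaN out
        }
        if (start == direction) {
            return direction;                  // equal (including 0.0 vs -0.0): return direction
        }
        // Otherwise take exactly one step toward direction.
        return start < direction ? Math.nextUp(start) : Math.nextDown(start);
    }

    public static void main(String[] args) {
        double[][] cases = { {1.0, 2.0}, {1.0, 0.0}, {0.0, -1.0}, {-0.0, 1.0},
                             {Double.MAX_VALUE, Double.POSITIVE_INFINITY},
                             {Double.NEGATIVE_INFINITY, 0.0}, {0.0, -0.0} };
        for (double[] c : cases) {
            double mine = nextAfterSketch(c[0], c[1]);
            double lib = Math.nextAfter(c[0], c[1]);
            // Compare bit patterns so that 0.0 and -0.0 are told apart.
            System.out.println(Double.doubleToLongBits(mine) == Double.doubleToLongBits(lib));
        }
    }
}
```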

Warrin answered 9/8, 2009 at 17:17 Comment(1)
Feel free to add .NET Core 3.0+'s built-in functions to your list. Let me know when you do so I can delete my answer and this comment. Thanks! – Sclar

Most languages have intrinsic or library functions for acquiring the next or previous single-precision (32-bit) and/or double-precision (64-bit) number.

For users of 32-bit and 64-bit floating point arithmetic, a sound understanding of the basic constructs is very useful for avoiding some of their hazards. The IEEE standard applies uniformly, but still leaves a number of details up to implementers. Hence, a platform-universal solution based on bit manipulation of the machine word representations may be problematic and may depend on issues such as endianness. While understanding all the gory details of how it could or should work at the bit level may demonstrate intellectual prowess, it is still better to use an intrinsic or library solution that is tailored for each platform and has a universal API across supported platforms.

I noticed solutions for C# and C++. Here are some for Java:

Math.nextUp:


public static double nextUp(double d)

Returns the floating-point value adjacent to d in the direction of positive infinity. This method is semantically equivalent to nextAfter(d, Double.POSITIVE_INFINITY); however, a nextUp implementation may run faster than its equivalent nextAfter call.

Special Cases:

  • If the argument is NaN, the result is NaN.
  • If the argument is positive infinity, the result is positive infinity.
  • If the argument is zero, the result is Double.MIN_VALUE

Parameters:

d - starting floating-point value

Returns:

The adjacent floating-point value closer to positive infinity.


public static float nextUp(float f)

Returns the floating-point value adjacent to f in the direction of positive infinity. This method is semantically equivalent to nextAfter(f, Float.POSITIVE_INFINITY); however, a nextUp implementation may run faster than its equivalent nextAfter call.

Special Cases:

  • If the argument is NaN, the result is NaN.
  • If the argument is positive infinity, the result is positive infinity.
  • If the argument is zero, the result is Float.MIN_VALUE

Parameters:

f - starting floating-point value

Returns:

The adjacent floating-point value closer to positive infinity.
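Here is a short usage sketch of the special cases listed above. It also calls `Math.nextDown`, the downward counterpart that Java added in version 8, which is not part of the documentation quoted here.

```java
class NextUpDemo {
    public static void main(String[] args) {
        System.out.println(Math.nextUp(1.0));                        // 1.0000000000000002
        System.out.println(Math.nextUp(1.0f));                       // 1.0000001 (float overload)
        System.out.println(Math.nextUp(0.0) == Double.MIN_VALUE);    // true: smallest subnormal
        System.out.println(Math.nextUp(Double.MAX_VALUE));           // Infinity
        System.out.println(Math.nextUp(Double.NaN));                 // NaN
        // Java 8+: one step toward negative infinity
        System.out.println(Math.nextDown(1.0) == 1.0 - 0x1.0p-53);   // true
        System.out.println(Math.nextDown(0.0) == -Double.MIN_VALUE); // true
    }
}
```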


The next two are a bit more complex to use. However, stepping towards zero or towards either positive or negative infinity seems the most likely and useful use. Another use is to check whether an intermediate value exists between two values, and one can count how many exist between two values with a loop and a counter. They, along with the nextUp methods, might also be useful for incrementing or decrementing in for loops.
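Here is a minimal Java sketch of those uses. All three method names are hypothetical. `countByLoop` is the loop-and-counter approach. `countByBits` is a shortcut that assumes both values are finite, with a being +0.0 or greater: for such doubles the raw bit patterns are ordered integers, so subtracting them counts the steps without a loop.

```java
class BetweenDemo {
    // True if at least one double lies strictly between a and b (assumes a < b).
    static boolean hasIntermediate(double a, double b) {
        return Math.nextUp(a) < b;
    }

    // Loop and counter: step up with nextUp until b is reached (assumes a < b).
    static long countByLoop(double a, double b) {
        long n = 0;
        for (double x = Math.nextUp(a); x < b; x = Math.nextUp(x)) {
            n++;
        }
        return n;
    }

    // Shortcut for finite a, b with +0.0 <= a < b: difference of the bit patterns.
    static long countByBits(double a, double b) {
        return Double.doubleToRawLongBits(b) - Double.doubleToRawLongBits(a) - 1;
    }

    public static void main(String[] args) {
        double b = 1.0 + 1000 * Math.ulp(1.0);
        System.out.println(countByLoop(1.0, b));                    // 999
        System.out.println(countByBits(1.0, b));                    // 999
        System.out.println(countByBits(1.0, 2.0));                  // 4503599627370495 (2^52 - 1)
        System.out.println(hasIntermediate(1.0, Math.nextUp(1.0))); // false
    }
}
```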

Math.nextAfter:


public static double nextAfter(double start,
                               double direction)

Returns the floating-point number adjacent to the first argument in the direction of the second argument. If both arguments compare as equal the second argument is returned.

Special cases:

  • If either argument is a NaN, then NaN is returned.
  • If both arguments are signed zeros, direction is returned unchanged (as implied by the requirement of returning the second argument if the arguments compare as equal).
  • If start is ±Double.MIN_VALUE and direction has a value such that the result should have a smaller magnitude, then a zero with the same sign as start is returned.
  • If start is infinite and direction has a value such that the result should have a smaller magnitude, Double.MAX_VALUE with the same sign as start is returned.
  • If start is equal to ±Double.MAX_VALUE and direction has a value such that the result should have a larger magnitude, an infinity with same sign as start is returned.

Parameters:

start - starting floating-point value
direction - value indicating which of start's neighbors or start should be returned

Returns:

The floating-point number adjacent to start in the direction of direction.


public static float nextAfter(float start,
                              double direction)

Returns the floating-point number adjacent to the first argument in the direction of the second argument. If both arguments compare as equal a value equivalent to the second argument is returned.

Special cases:

  • If either argument is a NaN, then NaN is returned.
  • If both arguments are signed zeros, a value equivalent to direction is returned.
  • If start is ±Float.MIN_VALUE and direction has a value such that the result should have a smaller magnitude, then a zero with the same sign as start is returned.
  • If start is infinite and direction has a value such that the result should have a smaller magnitude, Float.MAX_VALUE with the same sign as start is returned.
  • If start is equal to ±Float.MAX_VALUE and direction has a value such that the result should have a larger magnitude, an infinity with same sign as start is returned.

Parameters:

start - starting floating-point value
direction - value indicating which of start's neighbors or start should be returned

Returns:

The floating-point number adjacent to start in the direction of direction.
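Here is a short Java sketch exercising several of the special cases above. It steps toward zero for both signs, moves from the smallest subnormal to a signed zero, and moves from infinity to MAX_VALUE. It also uses the float overload, whose direction argument is a double.

```java
class NextAfterDemo {
    public static void main(String[] args) {
        // One step toward zero shrinks the magnitude for either sign.
        System.out.println(Math.nextAfter(1.0, 0.0) < 1.0);    // true
        System.out.println(Math.nextAfter(-1.0, 0.0) > -1.0);  // true

        // Smallest subnormal toward zero gives a zero with the sign of start.
        double negZero = Math.nextAfter(-Double.MIN_VALUE, 0.0);
        System.out.println(Double.doubleToRawLongBits(negZero) == Long.MIN_VALUE); // true: -0.0

        // Infinity toward a finite value gives MAX_VALUE with the same sign.
        System.out.println(Math.nextAfter(Double.POSITIVE_INFINITY, 0.0) == Double.MAX_VALUE); // true

        // Float overload: the direction is a double, the step is one float ulp.
        System.out.println(Math.nextAfter(1.0f, 2.0) == Math.nextUp(1.0f)); // true
    }
}
```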

Combes answered 10/7, 2012 at 21:10 Comment(0)

As Thorsten S. says, this can be done with the BitConverter class, but there is a simpler way to use it than working on the exponent field. The long returned by DoubleToInt64Bits is the double's IEEE 754 bit pattern, and because the exponent field sits above the significand, for non-negative doubles that integer is also the number of representable doubles between 0 and yours. I.e. the smallest positive double is represented by 1, the next larger double by 2, and so on. Negative numbers have the sign bit set: they start at long.MinValue (which is -0) and go away from 0d as the integer increases.

So you can do something like this:

public static double NextDouble(double value) {

    // Get the long representation of value:
    var longRep = BitConverter.DoubleToInt64Bits(value);

    long nextLong;
    if (longRep >= 0) // number is positive, so increment to go "up"
        nextLong = longRep + 1L;
    else if (longRep == long.MinValue) // number is -0
        nextLong = 1L;
    else  // number is negative, so decrement to go "up"
        nextLong = longRep - 1L;

    return BitConverter.Int64BitsToDouble(nextLong);
}

This doesn't deal with Infinity and NaN, but you can check for those and deal with them however you like, if you're worried about it.
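Here is a hypothetical Java port of the method above that also handles the two cases it leaves out: NaN, and +Infinity, whose next bit pattern would be a NaN. `Double.doubleToRawLongBits` plays the role of `BitConverter.DoubleToInt64Bits`, and `nextDouble` is an illustrative name. It is checked against Java's built-in `Math.nextUp`.

```java
class NextDoubleDemo {
    // Hypothetical Java port of NextDouble: one step toward +Infinity.
    static double nextDouble(double value) {
        // NaN stays NaN; +Infinity has no larger neighbor (its next bit pattern is a NaN).
        if (Double.isNaN(value) || value == Double.POSITIVE_INFINITY) {
            return value;
        }
        long bits = Double.doubleToRawLongBits(value);
        long next;
        if (bits >= 0) {
            next = bits + 1;        // +0.0 and positive values: increment to go up
        } else if (bits == Long.MIN_VALUE) {
            next = 1;               // -0.0: next value up is the smallest positive subnormal
        } else {
            next = bits - 1;        // negative values (including -Infinity): decrement to go up
        }
        return Double.longBitsToDouble(next);
    }

    public static void main(String[] args) {
        double[] xs = {-0.0, 1.0, -1.0, Double.MAX_VALUE, Double.NEGATIVE_INFINITY};
        for (double x : xs) {
            // Compare raw bit patterns so that 0.0 and -0.0 would be told apart.
            boolean same = Double.doubleToRawLongBits(nextDouble(x))
                        == Double.doubleToRawLongBits(Math.nextUp(x));
            System.out.println(x + " -> " + nextDouble(x) + ", matches nextUp: " + same);
        }
    }
}
```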

Maseru answered 17/2, 2010 at 19:2 Comment(2)
I see that you are using my code, because the argument is value but BitConverter.DoubleToInt64Bits gets "d" as its argument. I had reservations about simply adding one because the IEEE format separates exponent and significand, but because it has a hidden bit, your function is in fact OK as far as I can see. – Forelady
Dear reader, hello from the future! Toward the end of 2019 Microsoft will give you .NET Core 3.0, which has built-in functions. – Sclar

Yes, there is a way. In C#:

public static double getInc (double d)
{
    // Check for special values
    if (double.IsPositiveInfinity(d) || double.IsNegativeInfinity(d))
        return d;
    if (double.IsNaN(d))
        return d;

    // Translate the double into binary representation
    ulong bits = (ulong)BitConverter.DoubleToInt64Bits(d);
    // Keep only the exponent field (drop sign and mantissa), so the increment is positive
    ulong exponentBits = bits & 0x7ff0000000000000UL;
    ulong biasedExponent = exponentBits >> 52;
    // The increment is 2^52 times smaller than the leading power of two of d,
    // so subtract 52 from the exponent if the result is still a normal number.
    if (biasedExponent > 52)
        bits = exponentBits - (52UL << 52);
    else // otherwise it is subnormal: 2^(E-1) units of double.Epsilon (1 unit when E is 0)
        bits = 1UL << (int)(biasedExponent == 0 ? 0 : biasedExponent - 1);
    return BitConverter.Int64BitsToDouble((long)bits);
}

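The returned increase is the spacing of doubles at d's magnitude, so it can be added to or subtracted from d. One caveat: when |d| is an exact power of two, the spacing on the side toward zero can be only half as large, so that step may skip a value.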
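In Java this increment is available directly as `Math.ulp(d)`. Below is a hypothetical Java port of the same exponent-field idea, written so that the exponent drops by exactly 52 and tiny inputs get a subnormal spacing. It is checked against `Math.ulp`, and it also demonstrates the power-of-two caveat (`Math.nextDown` needs Java 8+). `getInc` is kept as a name only for comparison with the C# code.

```java
class UlpDemo {
    // Hypothetical port: spacing of doubles at |d|, always returned as a positive value.
    static double getInc(double d) {
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            return d;                                        // special values unchanged
        }
        long exponentBits = Double.doubleToRawLongBits(d) & 0x7ff0000000000000L;
        int biased = (int) (exponentBits >>> 52);            // biased exponent E
        if (biased > 52) {
            // 2^(E-1075) is still a normal number: lower the exponent by 52.
            return Double.longBitsToDouble(exponentBits - (52L << 52));
        }
        // Otherwise the spacing is subnormal: 2^(E-1) units of Double.MIN_VALUE
        // (one unit for zero and subnormal inputs, where E == 0).
        return Double.longBitsToDouble(1L << Math.max(0, biased - 1));
    }

    public static void main(String[] args) {
        double[] xs = {1.0, 2.0, -7.25, 1e300, Double.MAX_VALUE,
                       Double.MIN_NORMAL, 4 * Double.MIN_NORMAL, 1e-310, 0.0};
        for (double x : xs) {
            System.out.println(getInc(x) == Math.ulp(x));             // true for each
        }
        // Adding the increment steps up, but at a power of two the gap below
        // is only half as big, so subtracting skips a value.
        System.out.println(2.0 + getInc(2.0) == Math.nextUp(2.0));   // true
        System.out.println(2.0 - getInc(2.0) == Math.nextDown(2.0)); // false
    }
}
```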

Forelady answered 8/12, 2009 at 0:4 Comment(3)
This doesn't compile, and I don't think that you are using the BitConverter.DoubleToInt64Bits method properly anyway. If you want to get the byte representation of a number, you should use BitConverter.GetBytes (but then you need to make sure you increment or decrement the exponent, if needed). – Maseru
It does not compile because C# does not allow mixing ulong/long constants and variables (which is stupid for bit operators). And you thought wrong: the BitConverter method does indeed return the internal byte structure of the double in the IEEE format. – Forelady
Dear reader, hello from the future! Toward the end of 2019 Microsoft will give you .NET Core 3.0, which has built-in functions. – Sclar

I'm not sure I'm following your problem. Surely the IEEE standard is totally uniform? For example, look at this excerpt from the wikipedia article for double precision numbers.

3ff0 0000 0000 0000   = 1
3ff0 0000 0000 0001   = 1.0000000000000002, the next higher number > 1
3ff0 0000 0000 0002   = 1.0000000000000004

What's wrong with just incrementing the least significant bit, in a binary or hex representation?
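The table can be checked in Java. So can the question in the first comment below, about what happens when the exponent has to increase (`Math.nextDown` needs Java 8+). This holds for positive finite values. Negative values and +Infinity are where a plain increment breaks down.

```java
class HexTable {
    public static void main(String[] args) {
        double one = Double.longBitsToDouble(0x3ff0000000000000L);
        double next = Double.longBitsToDouble(0x3ff0000000000001L);
        double next2 = Double.longBitsToDouble(0x3ff0000000000002L);
        // Consecutive bit patterns are consecutive doubles.
        System.out.println(one == 1.0);                                  // true
        System.out.println(next == Math.nextUp(1.0));                    // true
        System.out.println(next2 == Math.nextUp(next));                  // true

        // Exponent carry: the largest double below 1.0, plus one in the last bit,
        // overflows the all-ones significand into the exponent and lands on 1.0.
        long below1 = Double.doubleToRawLongBits(Math.nextDown(1.0));
        System.out.println(Long.toHexString(below1));                    // 3fefffffffffffff
        System.out.println(Double.longBitsToDouble(below1 + 1) == 1.0);  // true
    }
}
```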

As far as the special numbers go (infinity, NaN, etc.), they're well defined, and there aren't very many of them. The limits are similarly defined.

Since you've obviously looked into this, I expect I've got the wrong end of the stick. If this isn't sufficient for your problem, could you try and clarify what you're wanting to achieve? What is your aim here?

Eskil answered 7/8, 2009 at 17:37 Comment(4)
Would that work in cases where the exponent would have to increase? – Delibes
My aim is to do this cleanly, preferably from C#, but I'll stoop to bit level if I have to. The problem is that the IEEE standard is not in the public domain, and I can't afford to purchase it. The standard defines the bit patterns for the case you show, but also for all the unusual numbers (such as the sub-normals). One shouldn't have to know the full details of all the number formats to do this task. But if you flip bits yourself, you would have to. What if the 'next' number is a subnormal? Unless you know all the rules, you CAN'T get there! – Westfalen
@Mark T: OK, I understand your problem now. I hadn't realised the standard wasn't freely available (amazing)! Here are implementations of a number of functions, including dnxtaft.f, which returns the next floating point value in the direction of x. Perhaps this will help? math.utah.edu/~beebe/software/ieee – Eskil
@Mark T: I also note there is the fp_class() function, which tells you what type of floating point number you are dealing with: intel.com/software/products/compilers/docs/flin/main_for/…, example usage here: johndcook.com/IEEE_exceptions_in_cpp.html – Eskil

Regarding epsilon: it is an estimate of how far the binary double can be from the decimal value it approximates. Many decimal values map to the same binary representation as a double, so the conversion is usually not exact. Try some very, very large or very, very small decimal numbers, create doubles from them, and then transform them back to decimal numbers. You will find that you do not get the same decimal number back, but the one the double is closest to instead.

The size of that gap depends on magnitude. For values near 1 or -1, the gap between adjacent doubles is very small (2^-52 at 1.0). As values progressively head towards + or - infinity, the absolute gap grows in proportion to the magnitude, because the available binary representations in those ranges are very sparse. Towards zero, the absolute gap shrinks instead; only in the subnormal range, extremely close to zero, does the gap become large relative to the value itself.
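Here is a small Java sketch to see this, using `Math.ulp(x)` as the gap to the next double. The absolute gap grows with magnitude and shrinks toward zero. Relative to x, it stays between 2^-53 and 2^-52 for normal numbers, but it grows in the subnormal range (1e-310 is below Double.MIN_NORMAL, about 2.2e-308). The last line shows that even 0.1 is not stored exactly.

```java
import java.math.BigDecimal;

class SpacingDemo {
    public static void main(String[] args) {
        double[] xs = {1e-310, 1e-300, 1.0, 1e300};
        for (double x : xs) {
            // Absolute gap to the next double, and that gap relative to x.
            System.out.println(x + ": ulp = " + Math.ulp(x) + ", relative = " + Math.ulp(x) / x);
        }
        // Exact decimal value of the double nearest to 0.1:
        System.out.println(new BigDecimal(0.1));
        // 0.1000000000000000055511151231257827021181583404541015625
    }
}
```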

Combes answered 11/7, 2012 at 9:0 Comment(0)

If you use .NET, then since .NET 7 you can also use Double.BitIncrement/Double.BitDecrement in addition to the older Math.BitIncrement/Math.BitDecrement.

There are also the generic versions, IFloatingPointIeee754<TSelf>.BitDecrement() and IFloatingPointIeee754<TSelf>.BitIncrement(), as Rob commented.

Loella answered 15/10, 2022 at 5:6 Comment(1)
And generic math adds link and link, which are the same functions you pointed out, but more generic. ;) – Pippo
