How to simulate Single precision rounding with Doubles?

i had a problem where i was trying to reconstruct the formula used in an existing system; a fairly simple formula of one input and one output:

y = f(x)

After a lot of puzzling, we managed to figure out the formula that fit our observed data points:

[image: the fitted formula]

And as you can see our theoretical model fit observed data very well:

[image: theoretical model plotted against observed data]

Except when we plot the residual errors (i.e. residual = f(x) - actualY), we see some lines appear in the residuals:

[image: residual errors showing linear banding]

It was obvious that these lines were the result of applying some intermediate rounding in our formula, but it was not obvious where. Eventually it was realized that the original system (the one we're trying to reverse engineer) is storing values in an intermediate Decimal data type:

  • with 8-bit precision of the fraction
  • using the 0.5 round-up rounding model

We could simulate this 8-bit precision in the fraction by (sketched in code below):

  • multiplying by 256 (i.e. 2^8)
  • applying the round
  • dividing by 256 (i.e. 2^8)
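
In code, those three steps collapse to a single expression. A minimal sketch in Lua (one of the environments involved), assuming plain double arithmetic; the +0.5/floor pair reproduces the 0.5 round-up model described above:

-- keep 8 fractional bits of x, rounding halves up (a sketch, not the original system's code)
local function roundFraction8(x)
  return math.floor(x * 256 + 0.5) / 256   -- 256 = 2^8
end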

Changing our equation above into:

[image: the formula with the intermediate rounding step added]

This reduces the residual errors significantly:

[image: residual errors after adding the rounding step]

Now, all of that above has no relevance to my question except:

  1. To show that simulating the numerical representation in the computer can help the model
  2. To get people's attention with pretty pictures and colors
  3. To silence critics who would refuse to contribute until i explain why i'm asking my question

Now i want to simulate Single precision floating point numbers inside programming languages (and Excel) that use Double precision floating point numbers. i want to do this because i think it is what's needed.

In the above example i thought the original system was using a Decimal data type with fixed 8-bit fractional precision using 0.5 round-up rules. i then had to find a way to simulate that computation model with Double math. Now i think the original system is using Single precision math, which i want to simulate using Doubles.

How do i simulate single-precision rounding using doubles?

In my current model, i once again have residuals that fall into regular linear patterns, the tell-tale sign of rounding:

[image: residuals from the current model, again showing linear patterns]

The problem is that the error becomes larger (and only becomes visible) as my input variables become larger. i realized this is likely because all floating point numbers are normalized into IEEE 754 "scientific notation".

And even if i'm wrong, i still want to try it.

And even if i don't want to try it, i'm still asking the question:

How do i simulate Single precision rounding using Doubles?


It seems to me i could still apply the concept of "rounding after 8 fractional bits" (although with 24 significand bits for Single precision floating point), as long as i can first "normalize" the value. E.g.

1234567898.76543

needs to be converted into (something similar to):

1.23456789876543E+09

Then i could apply my "round to the 24th bit" (i.e. 2^24 = 16,777,216) to the normalized mantissa:

floor(1.23456789876543 * 16777216 + 0.5) / 16777216;

The problem, then, is what combination of sign, abs, ln, exp (or other functions) can i possibly apply so that i can "normalize" my value, round it to the n-th binary place, then "denormalize" it?

Note: i realize the IEEE representation keeps an implicit leading 1 as the most significant bit of the significand. i might not need to duplicate that behavior in order to get correct results. So it's not a deal-breaker, nor is it cause to suggest that the entire approach is a failure.


Mcphail answered 23/9, 2012 at 14:28 Comment(4)
Are you looking for generic answers or will language-specific answers do?Pentagrid
@Pentagrid Ideally it will work in Excel and in Lua. As a bonus it will work in C# Double, Delphi Real, and Javascript number.Mcphail
I'm not smart enough to develop a working solution but couldn't you solve this by treating the double as a binary value and setting the trailing (52 - 23 = 29) bits of the mantissa to zero? Rounding would probably be more complicated, but doable.Carlos
@Carlos Lua (like Excel) has no ability to cast a floating point memory into a structure in order to provide direct bit access.Mcphail

You want to use the library functions frexp and ldexp, which are standard C99 functions, and are available in Lua.

frexp takes a floating point number and separates the mantissa from the exponent. The resulting mantissa is either 0 or in one of the ranges [0.5, 1.0) or (-1.0, -0.5]. You can then remove any extra bits in the obvious way (floor(mantissa * 2^k)/2^k for non-negative values, for example). (Edited to add:) It would be better to subtract k from the exponent in the call to ldexp than to do the divide as shown, because I'm pretty sure that Lua doesn't guarantee that 2^k is precise.

ldexp is the inverse of frexp; you can use that to put the truncated number back together again.
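
For example, a minimal Lua sketch of that recipe (assuming Lua 5.1/5.2, where math.frexp and math.ldexp are available, and non-negative inputs as noted above) might look like:

-- keep 24 significand bits of x, then reassemble via ldexp (avoids dividing by 2^24)
local function toSingle_frexp(x)
  local m, e = math.frexp(x)   -- x = m * 2^e, with m in [0.5, 1) or m == 0
  m = math.floor(m * 2^24)     -- truncate to 24 bits; add 0.5 before the floor for round-half-up
  return math.ldexp(m, e - 24) -- put it back together with the adjusted exponent
end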

I have no idea how to do this in Excel. Check the manual :) (Edited to add:) I suppose you could get roughly the same effect by dividing the number by 2 to the power of the ceiling of the log 2 of the number, and then doing the binary round as indicated above, and then reversing the process to recreate the original exponent. But I suspect the results would occasionally run into Excel's peculiar ideas about arithmetic.
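
If frexp/ldexp are not available (as in Excel), the same idea can be written with logarithms, along the lines just described. A sketch, again in Lua for brevity, assuming x > 0 and subject to the caveat that the log-2 computation itself can be slightly off near exact powers of two; Excel's LOG, FLOOR and exponentiation would play the same roles:

-- normalize with log2, truncate the mantissa to 24 bits, denormalize
local function toSingle_log2(x)
  local e = math.ceil(math.log(x) / math.log(2))  -- exponent such that x / 2^e is in (0.5, 1]
  local m = x / 2^e                               -- normalized mantissa
  m = math.floor(m * 2^24) / 2^24                 -- truncate to 24 bits (add 0.5 inside the floor for round-half-up)
  return m * 2^e                                  -- restore the original scale
end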

Alatea answered 23/9, 2012 at 16:31 Comment(1)
Nice. Simple. Pragmatic. Thank you so much for the insightful answer.Hokanson

You can get most of the effect of rounding to single precision using:

y = x + x * 0x1p29 - x * 0x1p29;

In most cases, this produces the same result in y as if x had been rounded to float (32-bit binary IEEE 754) and then converted back to double (64-bit). It works by adding a value (x * 0x1p29) that “pushes” some bits of x out of the significand, causing rounding at bit 23, and then subtracting the value that was added. (0x1p29 is hexadecimal floating-point for 2^29, i.e. 536870912.)
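
Translated to the asker's setting, a sketch in Lua (assuming the default double-precision numbers and that the expression is evaluated as written; the rare-case caveats described below still apply):

-- push the low 29 bits out of the significand, then pull them back
local C = 2^29                  -- 536870912, exact as a double
local function toSingle_addsub(x)
  return x + x * C - x * C
end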

In rare cases, it produces a slightly different result. If you merely want to reduce noise in a model, these rare cases might be negligible. If you want to eliminate them, then, instead of adding and subtracting 2^29 times x, you could find the largest power of 2 not greater than x and add and subtract 2^29 times that instead. (To find the power of 2, you can take the base-two logarithm and take the floor of that. However, there are still rounding issues that might require compensation. Additionally, if the input might be zero or negative, you must avoid the error that occurs when taking its logarithm.)
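
A sketch of that refinement (Lua again; assumes x > 0 and, as noted, the base-two logarithm itself may need compensation near exact powers of two):

-- add/subtract 2^29 times the largest power of two not greater than x
local function toSingle_scaled(x)
  local p = 2^math.floor(math.log(x) / math.log(2))  -- largest power of 2 <= x (up to log rounding)
  local c = p * 2^29
  return (x + c) - c
end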

Additionally, this does not reproduce the behavior for numbers that are subnormal in single-precision or that overflow in single-precision.

Finally, there are rare cases where computing a double-precision result and then rounding to single precision produces a result slightly different from computing a single-precision result originally, and no method of rounding the double-precision result will fix this.

Fulminant answered 24/9, 2012 at 17:1 Comment(8)
What's constantly in the back of my mind is that the original system can consistently reproduce these values. It's not some random magic going on - it's an algorithm in a computer. At some level the laws of computing say i should be able to reproduce the math.Mcphail
@IanBoyd, you can, see my answer. (ldexp and frexp just twiddle bits, they don't do complicated computations in the background.) So they give you precisely the result you're looking for. (At least if you're using some kind of C variant or Lua. Or, I just checked, Python.)Alatea
@IanBoyd: Of course the computation can be replicated. However, there is no reason that should be easy given the operations available in Excel.Fulminant
i'm going to give it to rici, but you have a very good solution also. i noticed that x+x*2^29-x*2^29 gives the same results as my original floor(x*c+0.5)/c routine. i don't know why, but details of floating point operations are never obvious.Mcphail
To your last point, that's not quite true. If the original double-precision result were rounded with the "round inexact to odd" rule, then you can round that to single without introducing a double rounding. We're way off in the weeds of floating-point arithmetic (and way off of what there is hardware or language support for) at that point, however.Pentimento
@StephenCanon Probably my only next bet is to dig out the assembly language debugger.Mcphail
@StephenCanon: “No method of rounding the double-precision result will fix this” means there is no function of the already obtained double-precision result that always produces what would have been obtained in single-precision. That is, it is a statement about what we can do after obtaining the double, not about how we can round it before it is final. (And we are in no danger of Excel providing “round inexact to odd.”)Fulminant
If the formula uses only basic IEEE 754 operations (as the OP's does), another relevant reference may be “When is double rounding innocuous?” by Samuel A. Figueroa. I did not read it, I only had it summarized to me by an expert, but if I do not mis-remember, there is never a double rounding issue when rounding to single-precision the result of a double-precision basic operation on single-precision operands.Homogony

Use code like this (C):

double x, y;
/* ... y gets a double value somewhere ... */
x = (double)(float)y;

After that, x (a double) will hold the value that results from rounding y to single precision float.

Edda answered 13/2, 2015 at 14:36 Comment(0)

You can create a custom function using a module in Excel VBA:

Function SINGLEFLO(Eingang As Double)
    Dim MySingle As Single
    MySingle = CSng(Eingang)
    SINGLEFLO = CDbl(MySingle)
End Function

Then you can use SINGLEFLO() in any Excel cell where you need to limit precision to Single float, e.g. =SINGLEFLO(A1).

Cathepsin answered 9/11, 2020 at 13:44 Comment(0)
