i had a problem where i was trying to reconstruct the formula used in an existing system, a fairly simple formula of one input and one output:
y = f(x)
After a lot of puzzling, we managed to figure out the formula that fit our observed data points:
And as you can see, our theoretical model fit the observed data very well:
Except when we plot the residual errors (i.e. error = f(x) - actualY), we see some lines appear in the residuals:
It was obvious that these lines were the result of applying some intermediate rounding in our formula, but it was not obvious where. Eventually we realized that the original system (the one we're trying to reverse engineer) stores values in an intermediate Decimal data type:
- with 8-bit precision of the fraction
- using the 0.5 round-up rounding model:
We could simulate this 8-bit fractional precision with the steps below (sketched in code after the list):

- multiply by 256 (i.e. 2^8)
- apply the rounding
- divide by 256 (i.e. 2^8)
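For illustration, a minimal sketch of that step (Python here purely as an example language; the 256 scale factor and the floor(x + 0.5) round-half-up rule are the assumptions listed above):

```python
import math

def round_to_8bit_fraction(value: float) -> float:
    """Simulate storing a value with 8 fractional bits, using the 0.5 round-up rule."""
    scaled = value * 256.0               # shift 8 fractional bits into the integer part (2^8)
    rounded = math.floor(scaled + 0.5)   # 0.5 round-up rule
    return rounded / 256.0               # shift back down

print(round_to_8bit_fraction(3.14159))   # 3.140625, i.e. 804/256
```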
Changing our equation above into:
This reduces the residual errors significantly:
Now, all of that above has no relevance to my question except:
- To show that simulating the numerical representation in the computer can help the model
- To get people's attention with pretty pictures and colors
- To silence critics who would refuse to contribute until i explain why i'm asking my question
Now i want to simulate Single Precision floating point numbers inside a programming language (and Excel) that both use Double Precision floating point numbers. i want to do this because i think it is what's needed.
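To make "simulating Single precision" concrete, a ground-truth check that any Double-only formula can be compared against is to round-trip the value through an actual 32-bit float (sketched here in Python with the struct module; the choice of Python is just for illustration):

```python
import struct

def to_single(value: float) -> float:
    """Round a Double to the nearest Single by packing it as a 32-bit
    IEEE 754 float and unpacking it back into a Double."""
    return struct.unpack('<f', struct.pack('<f', value))[0]

print(to_single(1234567898.76543))   # 1234567936.0 - the low-order digits are gone
```

Note that this round-trip typically uses IEEE round-to-nearest-even, which may differ from a 0.5 round-up rule on exact halfway cases.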
In the above example i thought the original system was using a Decimal data type with fixed 8-bit fractional precision and 0.5 round-up rules. i then had to find a way to simulate that computation model with Double math. Now i think the original system is using Single precision math, which i want to simulate using Double math.
How do i simulate single-precision rounding using doubles?
In my current model, i once again have residuals that fall into the regular linear patterns that are a tell-tale sign of rounding:
The problem is that the error becomes larger, and only becomes visible, as my input variables become larger. i realized this is likely caused by the fact that all floating point numbers are normalized into IEEE 754 "scientific notation".
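If that hypothesis is right, the effect is easy to quantify: the spacing between adjacent Single values (the size of one rounding step) is proportional to the magnitude of the value. A rough sketch of that spacing, assuming 24 significant bits:

```python
import math

def single_spacing(value: float) -> float:
    """Approximate gap between adjacent Single-precision values near `value`.
    With 24 significant bits the gap is 2^(exponent - 24), where
    value == mantissa * 2**exponent and mantissa is in [0.5, 1)."""
    _, exponent = math.frexp(abs(value))
    return math.ldexp(1.0, exponent - 24)

for x in (1.0, 1000.0, 1234567898.76543):
    print(x, single_spacing(x))   # spacing grows roughly in proportion to x
```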
And even if i'm wrong, i still want to try it. And even if i don't want to try it, i'm still asking the question:
How do i simulate Single precision rounding using Doubles?
It seems to me i could still apply the concept of "rounding after 8 fractional bits" (although with 24 bits for Single precision floating point), as long as i can first "normalize" the value. e.g.
1234567898.76543
needs to be converted into (something similar to):
1.23456789876543E+09
Then i could apply my "round to the 24th bit" (i.e. 2^24 = 16,777,216) to the normalized mantissa:
floor(1.23456789876543 * 16777216 + 0.5) / 16777216;
The problem, then, is what combination of sign, abs, ln, exp (or other functions) can i possibly apply so that i can "normalize" my value, round it to the n-th binary place, then "denormalize" it?
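For what it's worth, the normalize / round / denormalize pipeline i have in mind would look something like the sketch below (Python again, purely for illustration; frexp/ldexp stand in for whatever exponent-splitting functions are available, and the 24 bits and 0.5 round-up rule are my assumptions, not a confirmed answer):

```python
import math

def round_to_n_binary_places(value: float, bits: int = 24) -> float:
    """Normalize, round the mantissa to `bits` binary places with the
    0.5 round-up rule, then denormalize."""
    if value == 0.0:
        return 0.0
    sign = -1.0 if value < 0.0 else 1.0
    mantissa, exponent = math.frexp(abs(value))   # abs(value) == mantissa * 2**exponent, mantissa in [0.5, 1)
    scale = float(1 << bits)                      # 2^24 = 16,777,216
    mantissa = math.floor(mantissa * scale + 0.5) / scale   # round to the n-th binary place
    return sign * math.ldexp(mantissa, exponent)  # denormalize

print(round_to_n_binary_places(1234567898.76543))   # 1234567936.0
```

In Excel the same split could presumably be built from SIGN, ABS, LOG(ABS(x), 2), INT and POWER, though recovering the exponent from a logarithm needs care around exact powers of two.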
Note: i realize the IEEE representation keeps an implicit binary 1 as the most significant bit of the mantissa. i might not need to duplicate that behavior in order to get correct results, so it's not a deal-breaker, nor is it cause to suggest that the entire approach is a failure.