How to find the multiplier that produces a smaller output for every double values?

Asked 22/8, 2018 at 15:55 Answered 22/8, 2018 at 17:52

Solved algorithm floating-point double multiplication ieee-754

How would you find the greatest and positive IEEE-754 binary-64 value C such that every IEEE-754 product of a positive, normalized binary-64 value A with C is smaller than A?

I know it must be close to 0.999999... but I'd like to find exactly the greatest one.

Suppose round-to-nearest, ties to even.

Tagmeme answered 22/8, 2018 at 15:55 Comment(12)

what about binary search on mantissa bits ? – Theravada 22/8, 2018 at 16:6

Is this something dynamic for a specific number on a specific platform, or a general closest number. If it's the latter, then it would be a sign of 0, an exponent of all ones, except for the first and last bits, and a mantissa of all 1. – Disario 22/8, 2018 at 16:6

Section 5.2.4.2.2 of the C11 standard has a log of information on the properties of floating point types. Maybe some combination of things in there will be helpful? – Renown 22/8, 2018 at 16:6

Depending on rounding mode it could be as high as 1-DBL_EPSILON, and if you don't assume IEEE-754 then could be literally anything. Are you willing to assume the "standard" floating point system (IEEE-754) and "standard" rounding mode (round to nearest, ties to even)? – Professoriate 22/8, 2018 at 16:6

Exactly, I din't specify that, need to add it to the post. – Tagmeme 22/8, 2018 at 16:18

I see your edit, but do you mean to also imply IEEE-754 binary-64 value representation and IEEE-754 multiplication? C itself does not require any of these things, and some (mostly historic) implementations have used different ones. – Latrell 22/8, 2018 at 16:28

I deleted my answer because I thought that the OP was asking about C#. – Metallurgy 22/8, 2018 at 16:29

@ John Bollinger, yep, should I also require "modern" FPU in the post? Any suggestion how to specify that? – Tagmeme 22/8, 2018 at 16:34

@Tau the problem isn't the language. It can be easily converted to C. However your constant isn't correct – Snooperscope 22/8, 2018 at 16:44

@Anonymous, no, unless you actually mean to be FPU-specific. It is the semantics of the arithmetic that were not fully specified. I have edited, including to specify the particular one the two round-to-nearest modes that I suppose you intended. – Latrell 22/8, 2018 at 17:4

Thank you @JohnBollinger – Tagmeme 22/8, 2018 at 17:9

Presumably, the fact that A is normalized excludes infinity? Otherwise, no such C exists. – Calamander 22/8, 2018 at 20:9

There've been a couple of experimental approaches; here's a proof that C = 1 - ε, where ε is machine epsilon (that is, the distance between 1 and the smallest representable number greater than 1.)

We know that C < 1, of course, so it makes sense to try C = 1 - ε/2 because it's the next representable number smaller than 1. (The ε/2 is because C is in the [0.5, 1) bucket of representable numbers.) Let's see if it works for all A.

I'm going to assume in this paragraph that 1 <= A < 2. If both A and AC are in the "normal" region then it doesn't really matter what the exponent is, the situation will be the same with the exponent 2^0. Now, that choice of C obviously works for A=1, so we are left with the region 1 < A < 2. Looking at A = 1 + ε, we see that AC (the exact value, not the rounded result) is already greater than 1; and for A = 2 - ε we see that it's less than 2. That's important, because if AC is between 1 and 2, we know that the distance between AC and round(AC) (that is, rounding it to the nearest representable value) is at most ε/2. Now, if A - AC < ε/2, then round(AC) = A which we don't want. (If A - AC = ε/2 then it might round to A given the "ties to even" part of the normal FP rounding rules, but let's see if we can do better.) Since we've chosen C = 1 - ε/2, we can see that A - AC = A - A(1 - ε/2) = A * ε/2. Since that's greater than ε/2 (remember, A>1), it's far enough away from A to round away from it.

BUT! The one other value of A we have to check is the minimum representable normal value, since there AC is not in the normal range and so our "relative distance to nearest" rule doesn't apply. And what we find is that in that case A-AC is exactly half of machine epsilon in the region. "Round to nearest, ties to even" kicks in and the product rounds back up to equal A. Drat.

Going through the same thing with C = 1 - ε, we see that round(AC) < A, and that nothing else even comes close to rounding towards A (we end up asking whether A * ε > ε/2, which of course it is). So the punchline is that C = 1-ε/2 almost works but the boundary between normals and denormals screws us up, and C = 1-ε gets us into the end zone.

Professoriate answered 22/8, 2018 at 17:52 Comment(1)

Thank you so much, I'm 100% convinced now. – Tagmeme 22/8, 2018 at 18:12

Due to the nature of floating-point types, C will vary depending on how big the value of A is. You can use nextafter to get the largest value less than 1 which will be the rough value for C

However it's possible that if A is too large or too small, A*C will be the same as A. I'm not able to mathematically prove that nextafter(1.0, 0) will work for all possible A's, therefore I'm suggesting a solution like this

double largestCfor(double A)
{
    double C = nextafter(1.0, 0);
    while (C*A >= A)
        C = nextafter(C, 0);
    return C;
}

If you want a C value that works for any A, even if C*A might not be the largest possible value then you'll need to check for every exponents that the type can represent

double C = 1;
for (double A = 0x1p-1022; isfinite(A); A *= 2) // loop through all possible exponents
{
    double c = largestCfor(A);
    if (c < C) C = c;
}

I've tried running on Ideone and got the result

C                 = 0.999999999999999777955395074969
nextafter(1.0, 0) = 0.999999999999999888977697537484

Edit:

0.999999999999999777955395074969 is 0x1.ffffffffffffep-1 which is also 1 - DBL_EPSILON. That aligns with Sneftel's proof above

Snooperscope answered 22/8, 2018 at 16:26 Comment(6)

isn't it missing outer loop for any A? – Tagmeme 22/8, 2018 at 16:31

@Tagmeme no. Why do you need to loop anything with A? It's value is already fixed in the function. The first line will calculate the value right before 1.0 and if that's still large it'll decrease the value until the largest appropriate C has been found – Snooperscope 22/8, 2018 at 16:33

@Snooperscope I think he means that A is itself not a constant, and so you need to loop over possible values of A. – Renown 22/8, 2018 at 16:34

@ChristianGibbons that's not possible. There's no single constant C that works for all As. Each C only works for a range of values – Snooperscope 22/8, 2018 at 16:35

I'm lookin for universal constant for all A values (almost 2^63 samples), the "outer loop" question was a bit sarcastic, my apologies :) – Tagmeme 22/8, 2018 at 16:37

hey, is it too hard to leave a comment if something was wrong with this before downvoting? – Snooperscope 23/8, 2018 at 1:8

Edit:

Recommended topics

Hot tags