I found Stevens Computing Services – K & R Exercise 2-1 to be a very thorough answer to K&R Exercise 2-1. This slice of the full code computes the maximum value of the float type in the C programming language.

Unfortunately my theoretical understanding of float values is quite limited. I know they are composed of a significand (mantissa) and an exponent that scales the value by a power of 2.
#include <stdio.h>
#include <float.h>

int main(void)
{
    float flt_a, flt_b, flt_c, flt_m;

    /* FLOAT */
    printf("\nFLOAT MAX\n");
    printf("<float.h>  %E ", FLT_MAX);

    flt_a = 2.0;
    flt_b = 1.0;
    while (flt_a != flt_b) {
        flt_m = flt_b;               /* MAX POWER OF 2 IN MANTISSA */
        flt_a = flt_b = flt_b * 2.0;
        flt_a = flt_a + 1.0;
    }
    flt_m = flt_m + (flt_m - 1);     /* MAX VALUE OF MANTISSA */

    flt_a = flt_b = flt_c = flt_m;
    while (flt_b == flt_c) {
        flt_c = flt_a;
        flt_a = flt_a * 2.0;
        flt_b = flt_a / 2.0;
    }
    printf("COMPUTED   %E\n", flt_c);

    return 0;
}
I understand that the latter part basically finds the largest power of 2 by which the significand can be scaled, using a three-variable algorithm. What about the first part?
I can see that a progression of powers of 2 should eventually determine the value of the significand, but when I traced a few small numbers by hand to check how it should work, I failed to find the right values...
======================================================================
What concepts is this program based on, and does it get more precise when longer, non-integer numbers have to be found?
The flt_m computation is perfectly appropriate. It computes one less than twice flt_m, with care to avoid loss of precision. And there's nothing wrong with the style, given that it's written in K&R C. – Syncarpous

flt_m + flt_m - 1 would be the nearest float to 2 * flt_m - 1 for all values of flt_m. Instead, the assignment is written this way so as to be self-documenting ("MAX VALUE OF MANTISSA"). – Prosser

… (flt_m) that has worse than unit precision. It is plausible that, having such a value as one input, an adder might convert the other to the same scale before computing their sum. An implementation that did so (without increasing the precision) would compute a result different from the desired one. – Syncarpous

y + y is exact unless it overflows (in which case y + y - 1.0 is still the best approximation of the mathematical value, unless the exponent range is non-standard). By contrast, y + (y - 1.0) gets some results horribly wrong in the binade in which the ULP is 2 (though that's not the case in this program). – Prosser

flt_m + flt_m is exact even if computed in higher precision, so surely flt_m + (flt_m - 1) is not written this way "with care to avoid loss of precision". Unless you are claiming that all bets are off with respect to floating-point arithmetic, but then what makes you think that flt_m + (flt_m - 1) works any better than any alternative? – Prosser

flt_m + (flt_m - 1) is not guaranteed to avoid precision loss -- it protects only against one plausible precision-loss scenario for non-IEEE arithmetic. It nevertheless does protect against precision loss in that scenario, so the code presented works correctly in more environments than code that uses 2.0 * flt_m - 1.0, even if it isn't guaranteed to work in every environment. That's still the best explanation I've considered for the author's intent in choosing that expression. I certainly don't buy self-documentation, but I'd consider alternatives. – Syncarpous