TL;DR
Use
f2 = f2*np.max(f1)-np.min(f1)*(f2-1) # f2 is now between min(f1) and max(f1)
and make sure you're using double precision; compare floating-point numbers by looking at absolute or relative differences; avoid rounding for adjusting (or comparing) floating-point numbers; and don't set the underlying components of floating-point numbers manually.
Details
This isn't a very easy error to reproduce, as you have discovered. However, working with floating-point numbers is subject to error. E.g., adding together 1000000000 + 0.0000000001 gives 1000000000.0000000001, but this is too many significant figures even for double precision (which supports around 15 significant figures), so the trailing decimal is dropped. Moreover, some "short" numbers can't be represented exactly, as noted in @Kevin's answer. See, e.g., here, for more. (Search for something like "floating point truncation roundoff error" for even more.)
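For instance, here is a minimal illustration in plain Python (the printed results assume standard IEEE 754 double precision):
x = 1_000_000_000 + 0.000_000_000_1
print(x == 1_000_000_000)  # True: the tiny addend is absorbed completely
print(0.1 + 0.2)           # 0.30000000000000004: neither 0.1 nor 0.2 is exact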
Here's an example which does demonstrate a problem:
import numpy as np
np.set_printoptions(precision=16)
dtype = np.float32
f1 = np.linspace(-1000, 0.001, 3, dtype=dtype)
f2 = np.linspace(0, 1, 3, dtype=dtype)
f2 = (f2-np.min(f2))/(np.max(f2)-np.min(f2)) # f2 is now between 0.0 and 1.0
f2 = f2*(np.max(f1)-np.min(f1)) + np.min(f1) # f2 is now between min(f1) and max(f1)
print(f1)
print(f2)
Output
[ -1.0000000000000000e+03 -4.9999951171875000e+02 1.0000000474974513e-03]
[ -1.0000000000000000e+03 -4.9999951171875000e+02 9.7656250000000000e-04]
Following @Mark Dickinson's comment, I have used 32-bit floating point. This is consistent with the error you reported: a relative error of around 10^-7, i.e. in the 7th significant figure.
In: (5.0230602 - 5.0230593) / 5.0230593
Out: 1.791736760621852e-07
Going to `dtype=np.float64` makes things better, but it still isn't perfect. The program above then gives
[ -1.0000000000000000e+03 -4.9999950000000001e+02 1.0000000000000000e-03]
[ -1.0000000000000000e+03 -4.9999950000000001e+02 9.9999999997635314e-04]
This isn't perfect, but it is generally close enough. When comparing floating-point numbers you almost never want to use strict equality, because of the possibility of small errors as noted above. Instead, subtract one number from the other and check that the absolute difference is less than some tolerance, and/or look at the relative error. See, e.g., `numpy.isclose`.
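For example (a small sketch; the tolerances are arbitrary and should be chosen to suit your data):
import numpy as np

a = 5.0230593
b = 5.0230602                       # differs in the 7th significant figure
print(a == b)                       # False: strict equality is brittle
print(abs(a - b) < 1e-6)            # True: absolute-difference check
print(np.isclose(a, b, rtol=1e-6))  # True: relative-tolerance check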
Returning to your problem, it seems like it should be possible to do better. After all, `f2` has the range 0 to 1, so you should be able to replicate the maximum in `f1`. The problem comes in the line
f2 = f2*(np.max(f1)-np.min(f1)) + np.min(f1) # f2 is now between min(f1) and max(f1)
because when an element of `f2` is 1 you're doing a lot more to it than just multiplying 1 by the max of `f1`, leading to the possibility of floating-point arithmetic errors occurring. Notice that you can multiply out the brackets `f2*(np.max(f1)-np.min(f1))` to `f2*np.max(f1) - f2*np.min(f1)`, and then factorize the resulting `- f2*np.min(f1) + np.min(f1)` to `-np.min(f1)*(f2-1)`, giving
f2 = f2*np.max(f1)-np.min(f1)*(f2-1) # f2 is now between min(f1) and max(f1)
So when an element of `f2` is 1, we have `1*np.max(f1) - np.min(f1)*0`. Conversely, when an element of `f2` is 0, we have `0*np.max(f1) - np.min(f1)*(-1)`. The numbers 1, 0, and -1 can all be exactly represented, so there should be no errors.
The modified program outputs
[ -1.0000000000000000e+03 -4.9999950000000001e+02 1.0000000000000000e-03]
[ -1.0000000000000000e+03 -4.9999950000000001e+02 1.0000000000000000e-03]
i.e. as desired.
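As a quick sanity check (a small sketch along the lines of the program above, using the default float64):
import numpy as np

f1 = np.linspace(-1000, 0.001, 3)
f2 = np.linspace(0, 1, 3)
f2 = f2*np.max(f1) - np.min(f1)*(f2-1)
print(np.max(f1) == np.max(f2))  # True: the endpoints now match exactly
print(np.min(f1) == np.min(f2))  # True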
Nevertheless, I would still strongly recommend using only inexact floating-point comparison (with tight tolerances if you need them) unless you have a very good reason not to. There are all sorts of subtle errors that can occur in floating-point arithmetic, and the easiest way to avoid them is never to use exact comparison.
An alternative approach to the one given above, which might be preferable, would be to rescale both arrays to between 0 and 1. This might be the most suitable form to use within the program. (And both arrays could be multiplied by a scaling factor, such as the original range of `f1`, if necessary.)
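A minimal sketch of that idea (the array values here are just placeholders):
import numpy as np

f1 = np.linspace(-1000, 0.001, 3)
f2 = np.linspace(5, 7, 3)

f1_range = np.max(f1) - np.min(f1)  # keep this if the original scale is needed later
f1n = (f1 - np.min(f1)) / f1_range  # f1n is now between 0.0 and 1.0
f2n = (f2 - np.min(f2)) / (np.max(f2) - np.min(f2))
print(np.max(f1n) == np.max(f2n))   # True: x/x is exactly 1.0 in IEEE arithmetic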
Re using rounding to solve your problem, I would not recommend it. The problem with rounding -- apart from the fact that it unnecessarily reduces the accuracy of your data -- is that numbers that are very close can round in different directions. E.g.
import numpy as np

f1 = np.array([1.000049])
f2 = np.array([1.000051])
print(f1)
print(f2)
# round to 5 significant figures, based on the magnitude of f1's maximum
scale = 10**(-np.floor(np.log10(np.max(f1))) + 4)
f2 = np.round(f2*scale)/scale
f1 = np.round(f1*scale)/scale
print(f1)
print(f2)
Output
[ 1.000049]
[ 1.000051]
[ 1.]
[ 1.0001]
This is related to the fact that, although it's common to talk about numbers matching to so many significant figures, people don't actually compare them this way in the computer. You calculate the difference and then divide by the correct number (for a relative error).
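In code, that check is a one-liner (a trivial sketch; `rel_err` is just a made-up helper name):
def rel_err(approx, exact):
    # relative error: |approx - exact| / |exact|
    return abs(approx - exact) / abs(exact)

print(rel_err(5.0230602, 5.0230593))  # ~1.8e-07, i.e. agreement to about 7 figures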
Re mantissas and exponents, see `math.frexp` and `math.ldexp`, documented here. I would not recommend setting these yourself, however (consider two numbers that are very close but have different exponents, for example -- do you really want to set the mantissa?). It is much better to just directly set the maximum of `f2` explicitly to the maximum of `f1`, if you want to ensure the numbers are exactly the same (and similarly for the minimum).
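For reference, a short sketch of what those two functions do:
import math

m, e = math.frexp(5.0230593)          # x = m * 2**e, with 0.5 <= m < 1
print(m, e)                           # m is about 0.6278824125, e is 3
print(math.ldexp(m, e) == 5.0230593)  # True: ldexp inverts frexp exactly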
Comments
"With some dummy values I get `np.log10(1.2312412) = 0.090343139501527295` and `np.log10(12.312412) = 1.0903431395015273`, but `np.log10(.12312412) = -0.90965686049847261`." – Corr
"`10**(np.log10(1.2312412)) = 1.2312411999999999`, `10**(np.log10(1.2312412)+2) = 123.12412000000005`, and `10**(np.log10(1.2312412)-2) = 0.012312411999999998`, meaning the error in the mantissa is pushed to the last digit." – Corr
"Using `log10` would probably have been a smarter idea than incrementing `exp` -- that's a good point. Could modify it to `scale = 10**(np.floor(np.log10(mm))+4)`. Definitely another workaround. Still curious if I could make `np.max(f1) == np.max(f2)` without doing any rounding and losing precision." – Monophagous
"… `f1.max()/f2.max()` to be a power of 10 (which is what the normal usage of mantissa would suggest)." – Goodard
"… `f1.max() == f2.max()` without having to round to some number of significant digits. Mathematically speaking, they should be identical due to the equation. However, floating-point numbers gunk that up." – Monophagous
"You have `f2 *= (np.max(f1)-np.min(f1)) + np.min(f1)`. Should this not be `f2 = f2*(np.max(f1)-np.min(f1)) + np.min(f1)`? I.e. you want to rescale by the range of `f1`, and then add the minimum of `f1`. It looks to me that as it stands you are multiplying your 0 to 1 range by the whole of the term on the right of the `*=`." – Bondholder
"With `*=`, does the `# 5.0230602 but I need 5.0230593` still hold?" – Bondholder