Unexpected uint64 behaviour 0xFFFF'FFFF'FFFF'FFFF - 1 = 0?

Consider the following brief numpy session showcasing uint64 data type

import numpy as np
 
a = np.zeros(1,np.uint64)
 
a
# array([0], dtype=uint64)
 
a[0] -= 1
a
# array([18446744073709551615], dtype=uint64)
# this is 0xffff ffff ffff ffff, as expected

a[0] -= 1
a
# array([0], dtype=uint64)
# what the heck?

I'm utterly confused by this last output.

I would expect 0xFFFF'FFFF'FFFF'FFFE.

What exactly is going on here?

My setup:

>>> sys.platform
'linux'
>>> sys.version
'3.10.5 (main, Jul 20 2022, 08:58:47) [GCC 7.5.0]'
>>> np.version.version
'1.23.1'
Acierate answered 30/4, 2023 at 16:50 Comment(11)
Interestingly I get an OverflowError: long too big to convert for the second decrement. What version of numpy is this, and in what environment?Nkrumah
Does it matter if you make the 1 an np array of uint64 type also?Serapis
Same as Dan, I get OverflowError: Python int too large to convert to C longDiscountenance
b = np.zeros(1,np.uint64); b[0] = 1; a - b gives, as one may expect, array([18446744073709551614], dtype=uint64). Similarly a[0] -= np.uint64(1) also works. I suspect the -= is allowing a conversion to the right-hand-side value's typeDiscountenance
I'm seeing the same issue as OP. Running 3.11 on MacOS. a[0] -= 1 leaves the value at zero, while a -= 1 sets the value to FF..FF.Spoilsman
I got some strange result as well, but not the same one: 0, 0xffffffffffffffff (good so far), then 0x8000000000000000 (???)Blende
@Serapis see comment by njzk2. numpy scalars like one = np.uint64(1) also work as expected, a[0] -= one even gives an overflow warning at the first subtraction.Acierate
Even a[0] -= 0 does the same for me.Condyloma
I got the same result as @harold. After a little testing, I found that a[0] = 1152921504606847360; a[0] -= 1 results in 1152921504606847488 ( it is increasing). Comparing the bit representations of these numbers, it appears that they are truncated at the upper 52 bits after subtraction. Since this is the same length as the float64 fractional bits, this is probably what is happening.Hydrobomb
@harold: Now that there are answers explaining that floating point conversion is involved, it makes me wonder if you're seeing hardware behaviour. Out-of-range FP to signed-int conversion on x86 produces 0x8000... (INT_MIN / LONG_MIN), the "indefinite integer" value (felixcloutier.com/x86/cvtsd2si). AVX-512 FP to unsigned int conversion produces 0xFFFF... for out-of-range inputs. (felixcloutier.com/x86/vcvtsd2usi).Pharmaceutical
@harold: IDK if either result could be explained that way, or if Python range-checks would come into play. Or if not actual AVX-512 hardware, then perhaps a Python compiled with MSVC's FP -> uint64 conversion routines which define the result of out-of-range conversions to match what AVX-512 hardware does: devblogs.microsoft.com/cppblog/… . The conversion that gives 0xffff... is a float with value -1 which is indeed out-of-range for uint64_t, but the 2nd conversion is also out-of-range for uint64_t.Pharmaceutical
Score: 41

By default, NumPy converts Python int objects to numpy.int_, a signed integer dtype corresponding to C long. (This decision was made back in the early days when Python int also corresponded to C long.)

There is no integer dtype big enough to hold all values of numpy.uint64 dtype and numpy.int_ dtype, so operations between numpy.uint64 scalars and Python int objects produce float64 results instead of integer results. (Operations between uint64 arrays and Python ints may behave differently, as the int is converted to a dtype based on its value in such operations, but a[0] is a scalar.)
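
A minimal sketch of that array-vs-scalar difference (the scalar behaviour is as observed on NumPy 1.x; NumPy 2.0's NEP 50 changes scalar promotion, so the scalar result there stays uint64):

```python
import numpy as np

a = np.zeros(1, np.uint64)

# Array op: the Python int is cast based on its value, so the
# subtraction stays in uint64 and wraps around modulo 2**64.
b = a - 1
print(b, b.dtype)   # [18446744073709551615] uint64

# Scalar op: a[0] is a numpy.uint64 scalar, and mixing it with a
# Python int promotes to float64 on NumPy 1.x.
s = a[0] - 1
print(s, type(s).__name__)
```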

Your first subtraction produces a float64 with value -1, and your second subtraction produces a float64 with value 2**64 (since float64 doesn't have enough precision to perform the subtraction exactly). Both of these values are out of range for uint64 dtype, so converting back to uint64 for the assignment to a[0] produces undefined behavior (inherited from C - NumPy just uses a C cast).

On your machine, this happened to produce wraparound behavior, so -1 wrapped around to 18446744073709551615 and 2**64 wrapped around to 0, but that's not a guarantee. You might see different behavior on other setups. People in the comments did see different behavior.
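
For comparison, a sketch of the safe path: if the subtrahend is already a NumPy uint64 scalar, no float64 round-trip happens and the wraparound is NumPy's ordinary, well-defined modular arithmetic.

```python
import numpy as np

a = np.zeros(1, np.uint64)
one = np.uint64(1)   # keep both operands uint64

a[0] -= one   # may emit a RuntimeWarning about overflow, but wraps predictably
a[0] -= one
print(a)      # [18446744073709551614]
```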

Rearm answered 1/5, 2023 at 2:1 Comment(13)
I am curious, as someone who only uses numpy occasionally, does the community feel this is a defect? This is incredibly surprising since most languages with explicitly-specified fixed-width types have their behaviour defined similarly or identically to C.Pawsner
@Chuu: I dunno about the general community, but I'd personally prefer if mixing uint64 and signed integers produced uint64 output instead of float64 output. (C handles this specific case better, but it's got its own issues - for example, integer arithmetic on two unsigned operands can produce signed overflow and undefined behavior in C, due to how unsigned types smaller than int get promoted to signed int. I'm not sure if NumPy has any protections in place to avoid this issue.)Rearm
Isn’t inheriting UB from C rather worrying, given that the C standard allows a program that invokes UB to do literally anything? Admittedly, most (all?) of the fun stuff that has surfaced lately, like not generating any code at all for an execution path that always invokes UB, hinges on the UB being detected at compile time, which this won’t be, but still...Pantywaist
@TurePålsson: "Isn’t inheriting UB from C rather worrying" - absolutely! The behavior we're seeing here already sucks, but it could be arbitrarily worse some day, and if it does get worse, there's no telling how long the problem might pass unnoticed.Rearm
@TurePålsson: Yup. Fortunately real-world C implementations will produce a value for out-of-range FP to int conversions (at least when the value being converted is a run-time variable that might be in range). Some compilers even go as far as officially defining the behaviour, such as MSVC: devblogs.microsoft.com/cppblog/… (article re: changing the out-of-range conversion result for FP-to-unsigned to match AVX-512 instructions which produce all-ones, 0xfff... in that case.)Pharmaceutical
@TurePålsson: A C implementation would be allowed to implement FP to integer conversions in a way that crashes on out-of-bounds floats, but I think most users would consider that undesirable. OTOH, current implementations for current CPUs would have to go out of their way to crash; the thing that's most efficient for well-defined cases will just produce some value for out-of-range inputs. Maybe not always the same value for every out-of-range input, especially not inputs that are in-range for int64_t vs. ones that aren't. (e.g. modern x86 without AVX-512 can only do FP to int64_t natively)Pharmaceutical
@PeterCordes: By the same argument, I think most users would view as undesirable code generation which would cause a function like unsigned mul_mod_65536(unsigned short x, unsigned short y) { return (x*y) & 0xFFFF; } to arbitrarily corrupt memory if x exceeds INT_MAX/y, but that doesn't mean gcc isn't designed to occasionally behave in such fashion anyhow.Alamode
The third paragraph implies that float64 doesn't have enough precision to subtract one from negative one (-1-1). Any floating point system can do this calculation exactly. Please clarify.Goblet
@Joooeey: float64 does not have enough precision to exactly subtract 1 from 2**64-1. We are not subtracting 1 from -1.Rearm
That doesn't clarify anything. Where does 2**64-1 come from and when is 1 subtracted from that number? Please edit your answer to make it more clear.Goblet
Isn't the float64 with value -1 converted to uint64 before the second subtraction?Goblet
Yes, and the conversion to uint64 produces undefined behavior, which in this case manifested as a result of 2**64-1. Then we try to subtract 1 from that.Rearm
Now I see that this is in the last paragraph. I see that the rationale for organizing your paragraphs is to separate numpy implementation details from environment implementation details but that makes your answer hard to follow.Goblet
Score: 13

a[0] - 1 is 1.8446744073709552e+19, a numpy.float64. That can't retain all the precision, so its value is 18446744073709551616 = 2**64, which, when written back into a with dtype np.uint64, becomes 0.
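
The rounding can be checked with plain Python floats, which use the same IEEE 754 binary64 format:

```python
x = float(2**64 - 1)   # nearest binary64 value is exactly 2**64
print(x == 2.0**64)    # True
print(x - 1 == x)      # True: subtracting 1 is lost in rounding at this magnitude
```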

Condyloma answered 30/4, 2023 at 18:17 Comment(7)
NumPy floating->integer conversion with a floating point value out of the integer type's range is undefined behavior (inherited from C), so there's no guarantee the result is actually 0 - it could be anything.Rearm
@Rearm Yeah I had actually written a second paragraph about that, also noting harold's different result, but then decided to keep this brief, just explaining the observed behavior of the OP.Condyloma
Doing what OP did produces a problem due to implicit type conversions. If you do a[0] -= np.uint64(1) you get a RuntimeWarning: overflow encountered in scalar subtract. Doing a[0] = a[0] - np.uint64(1) produces the expected result.Hydrostat
@Hydrostat No, I get no warning/error with that, and the number becomes 18446744073709551614. Demo.Condyloma
@KellyBundy Interesting. What version of Python and Numpy? I tried this using Python 3.11.0 and numpy 1.24.3.Hydrostat
@Hydrostat You can try it there yourself. 3.10.6 and 1.24.1.Condyloma
FWIW, SageMathCell gives the same result, both in Python & Sage modes sagecell.sagemath.org/…Supergalaxy
Score: 1

All the existing answers are correct. I just want to add that on Windows 10 I got a different result, namely 9223372036854775808.

Steps to reproduce:

Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.13.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: a = np.zeros(1,np.uint64)

In [3]: a
Out[3]: array([0], dtype=uint64)

In [4]: a[0] -= 1

In [5]: a
Out[5]: array([18446744073709551615], dtype=uint64)

In [6]: a[0] - 1
Out[6]: 1.8446744073709552e+19

In [7]: a[0] - 1 == 2**64
Out[7]: True

In [8]: a[0] -= 1
<ipython-input-8-9ab639258820>:1: RuntimeWarning: invalid value encountered in cast
  a[0] -= 1

In [9]: a
Out[9]: array([9223372036854775808], dtype=uint64)

In [10]: f'{a[0]:b}'
Out[10]: '1000000000000000000000000000000000000000000000000000000000000000'

In [11]: len(_)
Out[11]: 64

In [12]: a[0] == 2**63
Out[12]: True

In [13]: a[0] - 1
Out[13]: 9.223372036854776e+18

In [14]: a[0] - 1 == 2 ** 63
Out[14]: True

In [15]: a[0] -= 1

In [16]: a[0]
Out[16]: 9223372036854775808

In [17]: np.version.version
Out[17]: '1.24.2'

In binary, incrementing by one flips the last bit from zero to one or from one to zero; a flip from one to zero carries into the next bit to the left, and the carry keeps propagating until some bit flips from zero to one.

In uint64, if you want to subtract one from zero, the number can't get any smaller, so zero effectively wraps around to 2^64; subtracting one from that gives 2^64 - 1, which in binary is '1'*64 and in decimal 18446744073709551615.

In [6]: a[0] - 1
Out[6]: 1.8446744073709552e+19

In [7]: a[0] - 1 == 2**64
Out[7]: True

Then, when the uint64 value is combined with a Python int, it is converted to the float 1.8446744073709552e+19, which, because of the limited precision of the format, is actually exactly 2^64.

In [8]: a[0] -= 1
<ipython-input-8-9ab639258820>:1: RuntimeWarning: invalid value encountered in cast
  a[0] -= 1

In [9]: a
Out[9]: array([9223372036854775808], dtype=uint64)

Now this gets interesting. The maximum value uint64 can hold is 2^64 - 1, because 2^64 is a one followed by 64 zeros in binary and so cannot be represented in uint64. Casting the float value 2^64 back to uint64 is therefore an out-of-range conversion, and that is why there is a warning.

On the Linux setups above, the cast happened to keep the low 64 bits of 2^64, which are all zeros, giving 0. On my Windows setup, the cast produced 0x8000000000000000 instead, i.e. 2^63, which is 9223372036854775808 in decimal. As pointed out in the comments under the question, that is exactly the "integer indefinite" value that x86's float-to-signed-int64 conversion instruction (cvttsd2si) returns for out-of-range inputs, so the out-of-range cast apparently went through a signed int64 conversion here before the bits were reinterpreted as uint64.

Then the number stays there no matter how many decrements I do: each a[0] -= 1 converts the uint64 value to a float64 equal to exactly 2^63, subtracting one is lost in rounding, and 2^63 is in range for uint64, so it converts back unchanged.
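
That last point is easy to verify with plain Python floats (the same IEEE 754 binary64 format): 2^63 is exactly representable, but neighbouring float64 values are about a thousand apart at that magnitude, so subtracting 1 rounds straight back.

```python
x = float(2**63)    # exactly representable in binary64
print(x - 1 == x)   # True: the decrement is lost in rounding
print(int(x - 1))   # 9223372036854775808
```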

Epideictic answered 3/5, 2023 at 10:58 Comment(1)
You mention 2^65 a couple of times. You mean 2^64.Nobody
Score: 0

Possible workarounds

1. Explicit cast

a[0] -= np.uint64(1)

Pros:

  • clean
  • fast

Cons:

  • cumbersome

2. Fancy indexing

a[[0]] -= 1

Pros:

  • easy to type

Cons:

  • slow

3. Slice indexing

a[0:1] -= 1

Cons:

  • mildly cumbersome
  • not the fastest
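
All three workarounds keep the subtraction in uint64, so they give the same well-defined result (quick check):

```python
import numpy as np

def dec_cast(a):
    a[0] -= np.uint64(1)   # 1. explicit cast

def dec_fancy(a):
    a[[0]] -= 1            # 2. fancy indexing

def dec_slice(a):
    a[0:1] -= 1            # 3. slice indexing

for dec in (dec_cast, dec_fancy, dec_slice):
    a = np.zeros(1, np.uint64)
    dec(a)                 # 0 -> 2**64 - 1 (modular wrap, may warn)
    dec(a)                 # 2**64 - 1 -> 2**64 - 2
    print(dec.__name__, a[0])   # 18446744073709551614 each time
```
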
Acierate answered 30/4, 2023 at 16:50 Comment(0)
Score: -2

The behavior you are seeing is due to how unsigned integer arithmetic works in numpy. When an unsigned integer is decremented, if the result is negative, it "wraps around" to the maximum value of the data type.

In your example, a[0] starts at the value 0xFFFFFFFFFFFFFFFF, which is the maximum value for a 64-bit unsigned integer. When you subtract 1 from it, the result is 0xFFFFFFFFFFFFFFFE, as you expected. However, when you subtract 1 from it again, the result is -1 (which is represented as 0xFFFFFFFFFFFFFFFF in binary). Since this value is negative, it wraps around to the maximum value of the data type, which is 0.

So, the behavior you are seeing is expected due to the properties of unsigned integer arithmetic. If you want to avoid this behavior, you can use a signed integer data type instead.

Nipa answered 5/5, 2023 at 10:34 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Fulgurating

© 2022 - 2024 — McMap. All rights reserved.