Stocking large numbers into numpy array [duplicate]
I have a dataset to which I'm trying to apply an arithmetic method. The problem is that it produces relatively large numbers, and when I compute them with numpy they are stored as 0.

The weird thing is, when I compute the numbers individually they have an int value; they only become zeros when I compute them using numpy.

x = np.array([18,30,31,31,15])
10*150**x[0]/x[0]
Out[1]: 36298069767006890

vector = 10*150**x/x
vector
Out[2]: array([0, 0, 0, 0, 0])

I have of course checked their types:

type(10*150**x[0]/x[0]) == type(vector[0])
Out[3]: True

How can I compute these large numbers using numpy without seeing them turned into zeros?

Note that if we remove the factor of 10 at the beginning, the problem changes slightly (but I think the reason is similar):

x = np.array([18,30,31,31,15])
150**x[0]/x[0]
Out[4]: 311075541538526549

vector = 150**x/x
vector
Out[5]: array([-329406144173384851, -230584300921369396, 224960293581823801,
   -224960293581823801, -368934881474191033])

The negative numbers indicate that the largest value of the int64 type in Python has been exceeded, don't they?

Brosine answered 17/5, 2016 at 9:3 Comment(2)
Could you use floating point numbers np.array([18.0, 30, 31, 31, 15]) instead of int? – Benedict
No, do not use float values. They may appear to work, but their precision will be horrible at those value ranges: your computations will run, but the results will be wrong (and you won't notice). – Terena

As Nils Werner already mentioned, numpy's native fixed-width data types cannot hold numbers that large, but Python itself can, since its int objects use an arbitrary-precision implementation. So you can tell numpy not to convert the numbers to its native types but to use the Python objects instead. This will be slower, but it will work.

In [14]: x = np.array([18,30,31,31,15], dtype=object)

In [15]: 150**x
Out[15]: 
array([1477891880035400390625000000000000000000L,
       191751059232884086668491363525390625000000000000000000000000000000L,
       28762658884932613000273704528808593750000000000000000000000000000000L,
       28762658884932613000273704528808593750000000000000000000000000000000L,
       437893890380859375000000000000000L], dtype=object)

In this case the numpy array will not store the numbers themselves but references to the corresponding int objects. When you perform arithmetic operations, they won't be performed on the numpy array itself but on the objects behind the references.
I think you can still use most numpy functions with this workaround, but they will definitely be a lot slower than usual.

But that's what you get when you're dealing with numbers that large :D
Maybe somewhere out there is a library that can deal with this issue a little better.
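
To connect this back to the question's computation, here is a short sketch (my own, not from the original answer) running the full expression with dtype=object; floor division (//) is used so the results stay exact integers rather than becoming floats under Python 3's true division:

```python
import numpy as np

# Use Python's arbitrary-precision ints instead of fixed-width int64.
x = np.array([18, 30, 31, 31, 15], dtype=object)

# With dtype=object every element is a Python int, so nothing overflows.
# Floor division (//) keeps the results as exact integers.
vector = 10 * 150**x // x

print(vector[0])  # exact value of 10 * 150**18 // 18
```

All five results come out as large positive integers, with no wrap-around to zero or to negative values.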

Just for completeness, if precision is not an issue, you can also use floats:

In [19]: x = np.array([18,30,31,31,15], dtype=np.float64)

In [20]: 150**x
Out[20]: 
array([  1.47789188e+39,   1.91751059e+65,   2.87626589e+67,
         2.87626589e+67,   4.37893890e+32])
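
To make the precision caveat concrete: float64 has a 53-bit mantissa (roughly 15–16 significant decimal digits), so a 40-digit value like 150**18 cannot be represented exactly. A small sketch (mine, for illustration) comparing the exact integer with the float approximation:

```python
import numpy as np

exact = 150 ** 18                  # arbitrary-precision Python int
approx = np.float64(150.0) ** 18   # float64 approximation, about 1.48e39

# float64 carries roughly 15-16 decimal digits of precision, so the
# trailing digits of a 40-digit number are simply lost:
print(exact)
print(int(approx))
print(f"absolute error: {abs(int(approx) - exact):.3e}")
```

The relative error is tiny, but the absolute error is astronomically large, which is exactly the trade-off the comment above warns about.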
Faretheewell answered 17/5, 2016 at 9:50 Comment(3)
Interesting approach to use a numpy.array(dtype=object). Will keep that in mind. – Terena
The dtype=object option seems like a good solution in general. In my case it might be a little more difficult, since I then have to apply scipy.special functions such as psi (the digamma function), which work on a numpy.array but not with the dtype=object option. – Brosine
In general you can't count on numpy math operations to work with dtype=object. The fast operations use compiled code that works with the standard numeric dtypes. With object, the array actually contains pointers to objects elsewhere in memory; in effect such an array is a glorified list (or a debased one?). – Cosmos

150 ** 28 is way beyond what an int64 variable can represent (it's in the ballpark of 8e60 while the maximum possible value of an unsigned int64 is roughly 18e18).

Python may be using an arbitrary length integer implementation, but NumPy doesn't.

As you deduced correctly, the negative numbers are a symptom of an integer overflow.
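
The bound is easy to check with np.iinfo; the following sketch (mine, added for illustration) also reproduces the silent wrap-around inside an int64 array:

```python
import numpy as np

info = np.iinfo(np.int64)
print(info.max)                # 9223372036854775807, about 9.2e18

# 150**28 in exact (Python int) arithmetic vs. the int64 bound:
print(150 ** 28 > info.max)    # True

# Inside an int64 array the same computation silently wraps modulo
# 2**64, which is where the negative values in the question come from.
x = np.array([18, 30, 31, 31, 15], dtype=np.int64)
print(150 ** x)                # wrapped values, no longer meaningful
```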

Terena answered 17/5, 2016 at 9:12 Comment(2)
Then is there a way to give numpy another integer implementation? I could compute the numbers one after another using vanilla Python, but that would be very slow, and I'd really rather avoid it. – Brosine
And it seems weird that the type of 150**x[0]/x[0] is shown as numpy.int64 if vanilla Python doesn't use the same integer implementation. Does that mean it performs the computation in one type and then stores the result in another? – Brosine
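
Regarding the second comment: because x[0] is a numpy.int64, the whole scalar expression is evaluated in int64 arithmetic, so it overflows as well; it just happens to wrap to a positive value rather than a negative one. A small sketch (mine) contrasting the two behaviours; recent NumPy versions may also emit a RuntimeWarning on the overflow:

```python
import numpy as np

x0 = np.int64(18)

# Mixing a Python int with a numpy.int64 promotes the whole
# expression to int64, so the power wraps around (possibly with a
# RuntimeWarning, depending on the NumPy version).
wrapped = 150 ** x0        # numpy.int64 result, already overflowed

# Pure Python ints are arbitrary precision and stay exact.
exact = 150 ** 18

print(type(wrapped))           # <class 'numpy.int64'>
print(int(wrapped) == exact)   # False: the int64 value wrapped
```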

© 2022 - 2024 — McMap. All rights reserved.