Accessing the highest digits of large numbers from Python long
Asked Answered
U

3

4

I'm working with numbers with tens of thousands of digits in python. The long type works beautifully in performing math on these numbers, however I'm unable to access the highest digits of these numbers in a sufficiently fast way. Note that I don't know exactly how many digits the number contains. The "highest digits" refers to the digits in the most significant place, the lowest digits can be accessed quickly using modulus.

I can think of two ways to access these digits in python but they're both too slow for my purposes. I have tried converting to a string and accessing digits through array methods, however type conversions are slow when you have 10,000+ digits. Alternatively I could simply mask out bits and truncate, but this requires that I know how many digits are in the long. Finding the number of digits in the long would require a loop over a counter and a mask test, this will surely be slower than string conversion.

From the description here it seems that the long type does in fact contain a bignum array. Is there some way I can access the underlying data structure that stores the long, or possibly check how many digits the long has from the base type?

If people are interested I can provide an example with benchmarks.

Uxmal answered 25/11, 2012 at 0:53 Comment(8)
If you know the order of magnitude, you could just divide by 10**(orderMag-1). An integer division will give you the most significant digitDistrustful
The number of digits is not known.Uxmal
Maybe access it in C using PyLong_AsVoidPtr, interesting thread gossamer-threads.com/lists/python/dev/…Chadwick
Do you need the decimal digits or just the bits?Charinile
Thanks for the link soulseekah, I'll look into it. DSM, bits is fine since its straightforward to convert bits to int.Uxmal
divide the number by a number with as many digits as it has. 1000....000Oneiric
@AdamCadien: then couldn't you use .bit_length() (Python 2.7+) to get the number of bits without looping and then rightshift downward?Charinile
More hardcore stuff: there's also _PyLong_AsByteArray it seems, github.com/schmir/python/blob/2.7/Include/longobject.h#L95, implementation here github.com/schmir/python/blob/2.7/Objects/longobject.c#L616 looks intriguingChadwick
O
9

A simple approach without digging on low level implementation of the long type:

>>> n = 17**987273 # 1.2 million digits number

>>> digits = int(math.log10(n))

>>> k = digits - 24 # i.e. first 24 digits

>>> n / (10 ** k)
9953043281569299242668853L

Runs quite fast on my machine. I tried to get the string representation of this number and it takes a huge time.

For Python 3.x, use n // (10 ** k)

Some timings with this big number (It is 140 times faster):

%timeit s = str(n)[:24]
1 loops, best of 3: 57.7 s per loop

%timeit n/10**(int(math.log10(n))-24)
1 loops, best of 3: 412 ms per loop


# With a 200K digits number (51x faster)

%timeit s = str(n)[:24]
1 loops, best of 3: 532 ms per loop

%timeit n/10**(int(math.log10(n))-24)
100 loops, best of 3: 10.4 ms per loop


# With a 20K digits number (19x faster)

%timeit s = str(n)[:24]
100 loops, best of 3: 5.4 ms per loop

%timeit n/10**(int(math.log10(n))-24)
1000 loops, best of 3: 272 us per loop
Oneiric answered 25/11, 2012 at 1:16 Comment(3)
A slightly modified solution bignum/10**(int(math.log10(bignum))-ndig). Its a great idea using log10 but I'm finding this is only slightly faster than str conversion, I still thinking accessing the bignum array would be fastest.Uxmal
@AdamCadien It is much faster on my timings. I added to the answer... BTW accessing bignum data will probably make your code not portable.Oneiric
I found inefficiencies in my code else-where that were causing the unexpected slow down. log10 solution works great!Uxmal
D
3

Python 2.7 has the bit_length() method on integers.

Droopy answered 25/11, 2012 at 1:19 Comment(0)
G
1

Here is a very ugly one liner that will extract the first few decimal digits:

(x >> (x.bit_length()-50))*(10**(math.fmod((x.bit_length()-50)*math.log(2)/math.log(10), 1)))

If your value for x is around 10,000 decimal digits long, you should get an answer accurate to around 12 digits or so. As x gets larger, your accuracy will decrease.

If you are willing to use external modules, I would look at gmpy2. The gmpy2 library provides access to the GMP (or MPIR) library for multiple-precision integer and fractional arithmetic, the MPFR library for multiple-precision floating point arithmetic, and the MPC library for multiple-precision complex arithmetic. gmpy2 integers are faster than Python's native longs and you can convert a long integer into a floating point number to extract just the leading digits. The above one liner just becomes:

gmpy2.mpfr(x).digits()[0]

The gmpy2 approach will retain accuracy even as the numbers become larger.

Disclaimer: I maintain gmpy2.

Goldwin answered 25/11, 2012 at 2:44 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.