How do languages such as Python overcome C's integral data limits?

While doing some random experimentation with a factorial program in C, Python and Scheme, I came across this fact:

In C, using the 'unsigned long long' data type, the largest factorial I could print appeared to be that of 65, which is '9223372036854775808', 19 digits as specified here.

In Python, I can find the factorial of a number as large as 999 which consists of a large number of digits, much more than 19.

How does CPython achieve this? Does it use a data type like an 'octaword'?

I might be missing some fundamental facts here, so I would appreciate some insights and/or references to read. Thanks!

UPDATE: Thank you all for the explanation. Does that mean CPython uses the GNU Multiple Precision library (or some other similar library)?

UPDATE 2: I was looking for Python's 'bignum' implementation in the sources. Where exactly is it? It's here: http://svn.python.org/view/python/trunk/Objects/longobject.c?view=markup. Thanks, Baishampayan.

Digamma answered 15/5, 2009 at 7:26 Comment(3)
I guess a better question might be: how does Python use bignums without killing performance? Does it use 32-bit ints and promote them to bignums when needed? Does this mean a check on every integer operation?Wentzel
CPython 2.x has int and long: int is backed by a C long, long is a bignum, and int results are promoted to long when needed. Since CPython 3.x, the two have been merged into a single type, with slower performance.Dumanian
@kcwu, just because they're one data type instead of two, it doesn't mean the performance is any worse.Excurved

It's called Arbitrary Precision Arithmetic. There's more here: http://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic

Osbourne answered 15/5, 2009 at 7:33 Comment(4)
Just to be clear, it is perfectly possible to do arbitrary precision arithmetic in pure C using libraries like GMP (gmplib.org). It's fairly simple to write a factorial program using it.Forklift
When did I say it wasn't possible?Osbourne
I didn't mean to imply you had said that.Forklift
And you actually didn't either! ;-)Stemma

Looking at the Python source code, it seems the long type (at least in pre-Python 3 code) is defined in longintrepr.h as follows:

/* Long integer representation.
   The absolute value of a number is equal to
    SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
   Negative numbers are represented with ob_size < 0;
   zero is represented by ob_size == 0.
   In a normalized number, ob_digit[abs(ob_size)-1] (the most significant
   digit) is never zero.  Also, in all cases, for all valid i,
    0 <= ob_digit[i] <= MASK.
   The allocation function takes care of allocating extra memory
   so that ob_digit[0] ... ob_digit[abs(ob_size)-1] are actually available.

   CAUTION:  Generic code manipulating subtypes of PyVarObject has to
   aware that longs abuse  ob_size's sign bit.
*/

struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};

The actual usable interface of the long type is then defined in longobject.h, which introduces the type name PyLongObject like this:

typedef struct _longobject PyLongObject;

And so on.

There is more going on inside longobject.c; take a look there for the details.

Frankiefrankincense answered 15/5, 2009 at 19:32 Comment(1)
Thanks a lot! longobject.c it is :) I had only looked in the Python/ subdirectory of the sources, missing the fact that it's all Objects in Python!Digamma

Data types such as int in C are directly mapped (more or less) to the data types supported by the processor. So the limits on C's int are essentially the limits imposed by the processor hardware.

But one can implement one's own int data type entirely in software. For example, you can use an array of digits as the underlying representation, maybe like this:

class MyInt {
    // Proof of concept: one decimal digit per array element.
    private final int[] digits;

    public MyInt(int noOfDigits) {
        digits = new int[noOfDigits];
    }
}

Once you do that, you can use this class to store integers with as many digits as you want, as long as you don't run out of memory.

Perhaps Python is doing something like this inside its virtual machine. You may want to read this article on Arbitrary Precision Arithmetic to get the details.

Armet answered 15/5, 2009 at 7:34 Comment(2)
Frederick's right, but this code should be seen as a proof of concept, not a recommended design. Real arbitrary precision libraries like GMP (gmplib.org) or BigInteger operate more efficiently, and don't rely unnecessarily on base 10.Forklift
As an aside: on processors which have native support for BCD arithmetic, it is (or used to be) common to do bignums in base 10 to simplify the human interaction part of the code. 'Course, this requires assembly or a compiler that supports BCD types...Ananna

Not an octaword. It implements a bignum structure to store arbitrary-precision numbers.

Dumanian answered 15/5, 2009 at 7:32 Comment(0)

Python assigns to long integers (all ints in Python 3) just as much space as they need -- an array of "digits" (base being a power of 2) allocated as needed.

Bromism answered 15/5, 2009 at 7:30 Comment(0)
