J's x-type variables: how are they stored internally?

I'm writing some J bindings in Python (https://gist.github.com/Synthetica9/73def2ec09d6ac491c98). However, I've run into a problem handling arbitrary-precision integers: the output doesn't make any sense. It's something different every time (but of the same general magnitude). The relevant piece of code:

import ctypes as ct

def JTypes(desc, master):
    # desc holds pointers to the type, rank, shape and data address
    newdesc = [item.contents.value for item in desc]
    type = newdesc[0]
    if debug: print type
    rank = newdesc[1]
    shape = ct.c_int.from_address(newdesc[2]).value
    address = newdesc[3]
    #string
    if type == 2:
        charlist = (ct.c_char.from_address(address+i) for i in range(shape))
        return "".join((i.value for i in charlist))
    #integer
    if type == 4:
        return ct.c_int.from_address(address).value
    #arbitrary-precision int (this branch returns the nonsensical values)
    if type == 64:
        return ct.c_int.from_address(address).value

and

class J(object):
    def __init__(self):
        self.JDll = ct.cdll.LoadLibrary(os.path.join(jDir, "j.dll"))
        self.JProc = self.JDll.JInit()

    def __call__(self, code):
        #Exec code, I suppose.
        self.JDll.JDo(self.JProc, "tmp=:"+code)
        return JTypes(self.deepvar("tmp"),self)

Any help would be appreciated.

Palecek answered 11/6, 2014 at 9:44

Short answer: J's extended precision integers are stored in base 10,000.

More specifically: A single extended integer is stored as an array of machine integers, each in the range [0,1e4). Thus, an array of extended integers is stored as a recursive data structure. The array of extended integers has type=64 ("extended integer"), and its elements, each itself (a pointer to) an array, have type=4 ("integer").

So, conceptually (using J notation), the array of large integers:

123456 7890123 456789012x

is stored as a nested array of machine integers, each less than 10,000:

   1e4 #.^:_1&.> 123456 7890123 456789012x
+-------+-------+-----------+
|12 3456|789 123|4 5678 9012|
+-------+-------+-----------+
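
For readers following along in Python rather than J, the decomposition above can be sketched as below. This is only an illustration of the arithmetic, not J's actual implementation; the function name `to_base_10000` is made up for this example, and it assumes Python 3 and non-negative inputs.

```python
def to_base_10000(n):
    """Split a non-negative integer into base-10,000 digits,
    most significant digit first (hypothetical helper)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    digits = []
    while True:
        # Peel off the least significant base-10,000 digit.
        n, digit = divmod(n, 10000)
        digits.append(digit)
        if n == 0:
            break
    digits.reverse()  # digits were collected least significant first
    return digits


for value in (123456, 7890123, 456789012):
    print(value, to_base_10000(value))
```

The three lists printed correspond to the three boxes in the J output above.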

So, to recover the original large numbers, you'd have to interpret these digits¹ in base 10,000:

   10000x #.&> 12 3456 ; 789 123 ; 4 5678 9012   
123456 7890123 456789012
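
On the Python side, the same recombination might look like the sketch below. It assumes the machine integers have already been read out of memory into a plain list; the name `from_base_10000` and the `least_significant_first` flag are hypothetical, the latter added to cover the storage-order caveat in the footnote.

```python
def from_base_10000(digits, least_significant_first=False):
    """Combine base-10,000 digits into one Python integer
    (hypothetical helper, Python 3)."""
    if least_significant_first:
        digits = reversed(digits)
    value = 0
    for digit in digits:
        if not 0 <= digit < 10000:
            raise ValueError("not a base-10,000 digit: %r" % (digit,))
        # Horner's rule: shift what we have by one digit, then add the next.
        value = value * 10000 + digit
    return value


print(from_base_10000([12, 3456]))
print(from_base_10000([789, 123]))
print(from_base_10000([9012, 5678, 4], least_significant_first=True))
```

Python integers are arbitrary precision already, so no extra bignum library is needed for the result.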

The only other 'x-type variables' in J are rational numbers, which, unsurprisingly, are stored as pairs of extended precision integers (one for the numerator, the other for the denominator). So if you have an array whose header indicates type='rational' and count=3, its data segment will have 6 elements (2*3). Take these pairwise and you have your array of ratios.
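
As a rough sketch of the pairwise step in Python: the code below assumes each element of the data segment has already been decoded into a list of base-10,000 digits (most significant first) and that each numerator comes directly before its denominator. Both are assumptions made for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction


def digits_to_int(digits):
    """Combine base-10,000 digits (most significant first) into an int."""
    value = 0
    for digit in digits:
        value = value * 10000 + digit
    return value


def pairs_to_fractions(elements):
    """Take decoded extended integers two at a time and build ratios.

    Assumes the order numerator, denominator, numerator, denominator, ...
    """
    if len(elements) % 2 != 0:
        raise ValueError("expected an even number of extended integers")
    return [
        Fraction(digits_to_int(elements[i]), digits_to_int(elements[i + 1]))
        for i in range(0, len(elements), 2)
    ]


# count=3 rationals means 6 extended integers in the data segment.
data = [[1], [3], [12, 3456], [7], [5], [1]]
print(pairs_to_fractions(data))
```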

If you're trying to build a complete J-Python interface, you'll also have to handle boxed and sparse arrays, which are similarly nested. You can learn a lot by inspecting the binary and hexadecimal representations of J nouns using the tools built in to J.

Oh, and in case you're wondering why J stores bignums in base 10,000: it's because 10,000 is big enough to keep the nested arrays compact, and a power-of-10 representation makes it easy to format numbers in decimal.
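
To illustrate the formatting point: with a power-of-10 base, producing the decimal string needs no division at all, only zero-padding. The sketch below is a Python illustration of that idea, not J's own formatting code; `format_decimal` is a made-up name and the digits are assumed to be most significant first.

```python
def format_decimal(digits):
    """Render base-10,000 digits (most significant first) as a decimal string."""
    if not digits:
        raise ValueError("need at least one digit")
    head, tail = digits[0], digits[1:]
    # The leading digit is printed as-is; every later digit contributes
    # exactly four decimal places, so it is zero-padded to width 4.
    return str(head) + "".join("%04d" % d for d in tail)


print(format_decimal([789, 123]))
print(format_decimal([4, 5678, 9012]))
```

Note the padding on the second digit of the first example: 123 has to be rendered as 0123, otherwise the result would be off by a factor of ten.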


¹ Take care to adjust for storage order: the base-10,000 digits may be laid out least significant first (e.g. 4 5678 9012 may be represented in memory as 9012 5678 4).

Babette answered 12/6, 2014 at 2:40
