What's the best way to do base36 arithmetic in Perl?

3

6

What's the best way to do base36 arithmetic in Perl?

To be more specific, I need to be able to do the following:

  • Operate on positive N-digit numbers in base 36 (i.e. digits are 0-9 and A-Z)

    N is finite, say 9

  • Provide basic arithmetic, at the very least the following 3:

    • Addition (A+B)

    • Subtraction (A-B)

    • Whole division, e.g. floor(A/B).

    • Strictly speaking, I don't really need base10 conversion: the numbers will be in base36 100% of the time. So I'm quite OK if the solution does NOT implement conversion between base36 and base10.

I don't much care whether the solution is brute-force "convert to base 10 and back" or converting to binary, or some more elegant approach "natively" performing baseN operations (as stated above, to/from base10 conversion is not a requirement). My only 3 considerations are:

  1. It fits the minimum specifications above

  2. It's "standard". Currently we're using an old homegrown module, based on hand-written base10 conversion, that is buggy and sucks.

    I'd much rather replace that with some commonly used CPAN solution instead of re-writing my own bicycle from scratch, but I'm perfectly capable of building it if no better standard possibility exists.

  3. It must be fast-ish (though not lightning fast). Something that takes 1 second to add two 9-digit base36 numbers is worse than anything I can roll on my own :)
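For comparison, the "native" option mentioned above (working on the base 36 digits directly, with no conversion of the whole number) can be sketched as schoolbook addition with a carry. This is only an illustration: the name b36_add_native is made up for this example, and it assumes both operands are upper-case strings of the same width.

```perl
use strict;
use warnings;

my @DIGITS = (0 .. 9, 'A' .. 'Z');
my %VALUE  = map { $DIGITS[$_] => $_ } 0 .. $#DIGITS;   # digit character => value

# Hypothetical helper: add two equal-width base 36 strings digit by digit,
# right to left, carrying into the next position.
sub b36_add_native {
    my ($x, $y) = @_;
    for my $s ($x, $y) {
        die "not a base 36 number: '$s'\n" unless $s =~ /\A[0-9A-Z]+\z/;
    }
    die "operands must have the same width\n" unless length($x) == length($y);

    my ($sum, $carry) = ('', 0);
    for my $i (reverse 0 .. length($x) - 1) {
        my $t = $VALUE{ substr($x, $i, 1) } + $VALUE{ substr($y, $i, 1) } + $carry;
        $sum   = $DIGITS[$t % 36] . $sum;
        $carry = $t >= 36 ? 1 : 0;
    }
    die "overflow: result needs more than " . length($x) . " digits\n" if $carry;
    return $sum;
}

print b36_add_native('00000000Z', '000000001'), "\n";   # 000000010
```

Addition and subtraction are easy to write this way, but digit-wise floor division is considerably more work, which is one reason the convert-and-compute approach tends to be simpler.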

P.S. Just to provide some context in case people decide to solve my XY problem for me in addition to answering the technical question above :)

We have a fairly large tree (stored in DB as a bunch of edges), and we need to superimpose order on a subset of that tree. The tree dimensions are big both depth- and breadth-wise. The tree is VERY actively updated (inserts and deletes and branch moves).

This is currently done by having a second table with 3 columns: parent_vertex, child_vertex, local_order, where local_order is a 9-character string built of A-Z0-9 (i.e. a base 36 number).

Additional considerations:

  • It is required that the local order is unique per child (and obviously unique per parent),

  • Any complete re-ordering of a parent is somewhat expensive, so the implementation tries to assign, for a parent with X children, orders that are roughly evenly distributed between 0 and 36**9-1 (the range of a 9-character key), so that almost no tree inserts result in a full re-ordering.
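The even-spacing scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the existing implementation: keys are exactly 9 upper-case base 36 digits (so the key space is 0 to 36**9-1), the perl has 64-bit integers, and the helper names key_to_int, int_to_key, spread_keys and key_between are made up for this example.

```perl
use strict;
use warnings;

my @DIGITS = (0 .. 9, 'A' .. 'Z');
my %VALUE  = map { $DIGITS[$_] => $_ } 0 .. $#DIGITS;
my $WIDTH  = 9;             # key length, as in the local_order column
my $SPACE  = 36**$WIDTH;    # number of distinct keys

# Decode a fixed-width key into a native integer.
sub key_to_int {
    my ($key) = @_;
    die "bad key '$key'\n" unless $key =~ /\A[0-9A-Z]{$WIDTH}\z/;
    my $n = 0;
    $n = $n * 36 + $VALUE{$_} for split //, $key;
    return $n;
}

# Encode an integer as a zero-padded key of $WIDTH digits.
sub int_to_key {
    my ($n) = @_;
    die "out of range: $n\n" if $n < 0 || $n >= $SPACE;
    my $key = '';
    for (1 .. $WIDTH) {
        my $digit = $n % 36;
        $key = $DIGITS[$digit] . $key;
        $n   = ($n - $digit) / 36;    # exact integer division
    }
    return $key;
}

# Evenly spaced keys for $count children, leaving a gap at both ends.
sub spread_keys {
    my ($count) = @_;
    my $step = int($SPACE / ($count + 1));
    die "too many children for the key space\n" if $step < 1;
    return map { int_to_key($_ * $step) } 1 .. $count;
}

# Key halfway between two neighbours, or undef when they are adjacent
# (the caller would then have to re-order the parent).
sub key_between {
    my ($lo_key, $hi_key) = @_;
    my ($lo, $hi) = (key_to_int($lo_key), key_to_int($hi_key));
    die "keys out of order\n" if $lo >= $hi;
    my $mid = $lo + int(($hi - $lo) / 2);
    return $mid == $lo ? undef : int_to_key($mid);
}

my @keys = spread_keys(3);
print "@keys\n";                                         # 900000000 I00000000 R00000000
print key_between('100000000', '200000000'), "\n";       # 1I0000000
```

An insert between two siblings only needs key_between; a full re-ordering (spread_keys) is needed only when it returns undef because the neighbours are already adjacent.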

Endotoxin answered 19/4, 2010 at 21:3 Comment(4)
By the way, in case you'll tell me "But you can easily do this in SQL, why are you asking this as a Perl question", the answer is: I'd love an SQL-only solution!!! I just don't think it can be done in pure SQL with any degree of efficiency, and efficiency is important when dealing with an SQL server used as a singleton resource by the entire company :( - Endotoxin
Also, I know about Math::BaseCnv and Math::BaseCalc - I don't know how stable/fast they are, hence my asking the SO community what the best practices are. We don't have either of these installed, and installing a new CPAN module is a big deal, with the Software Engineering team requiring a good business justification AND a sign that the module is stable. - Endotoxin
Technically "convert to base 10 and back" is the same as converting to binary -- it's not like internally, the machine converts all integers to base-10 strings to perform math and then back again. Base 10 is only a concept used for rendering numbers for display, and nothing more. - Victualage
Would a nested set representation (dev.mysql.com/tech-resources/articles/hierarchical-data.html) be practical? The insertions and grafts may mean the answer is "no", but this would shift the burden to the SQL server and simplify other operations that require using the order. - Operose
12

What about Math::Base36?

Hindermost answered 19/4, 2010 at 21:11 Comment(2)
Thanks - I was so sure that "base36" is a weird thing never used for anything practical I didn't even consider searching for that term! Duh. - Endotoxin
Always search first, ask later. - Maffa
9

I am assuming that Perl core modules are OK?

How about doing the arithmetic in native (binary) integers, converting the base 36 strings with POSIX::strtol() and converting the result back to base 36 at the end?
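A minimal sketch of that idea, covering the three operations the question asks for, might look like this. The sub names (b36_decode, b36_encode, b36_add, b36_sub, b36_div) are made up for this example, and it assumes a perl with 64-bit integers on a platform where the C long behind POSIX::strtol is also 64 bits; with a 32-bit long, 9-digit values would not fit.

```perl
use strict;
use warnings;
use POSIX ();

my @DIGITS = (0 .. 9, 'A' .. 'Z');
my $WIDTH  = 9;    # fixed number of digits, taken from the question

# Decode a base 36 string into a native integer using POSIX::strtol.
sub b36_decode {
    my ($str) = @_;
    die "not a base 36 number: '$str'\n" unless $str =~ /\A[0-9A-Za-z]+\z/;
    my ($num) = POSIX::strtol($str, 36);   # list context: (value, unparsed length)
    return $num;
}

# Encode a non-negative integer as a zero-padded, upper-case base 36 string.
sub b36_encode {
    my ($n) = @_;
    die "negative result\n" if $n < 0;
    my $str = '';
    do {
        my $digit = $n % 36;
        $str = $DIGITS[$digit] . $str;
        $n   = ($n - $digit) / 36;         # exact integer division
    } while ($n > 0);
    die "result does not fit in $WIDTH digits\n" if length($str) > $WIDTH;
    return ('0' x ($WIDTH - length $str)) . $str;
}

sub b36_add { b36_encode(b36_decode($_[0]) + b36_decode($_[1])) }
sub b36_sub { b36_encode(b36_decode($_[0]) - b36_decode($_[1])) }

# Floor division: subtract the remainder first so the division is exact.
sub b36_div {
    my ($x, $y) = (b36_decode($_[0]), b36_decode($_[1]));
    die "division by zero\n" if $y == 0;
    return b36_encode(($x - $x % $y) / $y);
}

print b36_add('00000000Z', '000000001'), "\n";   # 000000010
print b36_sub('000000010', '000000001'), "\n";   # 00000000Z
print b36_div('0000000ZZ', '000000002'), "\n";   # 0000000HZ
```

The largest 9-digit value, 36**9-1, is about 1.0e14, so every intermediate result here stays well inside the 64-bit integer range.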

There is HUGE variability in speed between the different methods of converting to/from base 36. For example, strtol is 80x faster than Math::Base36::decode_base36, and the conversion subs in the listing below are 2 to 4x faster than Math::Base36. They also support any integer base up to 62 (easily extended by adding characters to the @nums array).

Here is a quick benchmark:

#!/usr/bin/perl
use POSIX;
use Math::BaseCnv;
use Math::Base36 ':all';
use Benchmark;

{
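    # Digit alphabet: 0-9, a-z, A-Z gives 62 symbols, so bases 2..62 are possible.
    my @nums = (0..9,'a'..'z','A'..'Z');
    my $chr = join('',@nums);                       # all valid digits, used in the regexes below
    my %nums = map { $nums[$_] => $_ } 0..$#nums;   # digit character => numeric value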

    sub to_base
    {
        my ($base, $n) = @_;
        return $nums[0] if $n == 0;
        return $nums[0] if $base < 2 || $base > @nums;   # unsupported base (valid range: 2..62)
        my $str = ''; 
        while( $n > 0 )
        {
            $str = $nums[$n % $base] . $str;
            $n = int( $n / $base );
        }
        return $str;
    }

    sub fr_base
    {
        my ($base,$str) = @_;
        my $n = 0;

        return 0 if $str=~/[^$chr]/;

        foreach ($str =~ /[$chr]/g)
        {
            $n *= $base;
            $n += $nums{$_};
        }
        return $n;
    }
}

$base=36;   
$term=fr_base($base,"zzz");

for(0..$term) { push @numlist, to_base($base,$_); }

timethese(-10, {
        'to_base' => sub { for(0..$#numlist){ to_base($base,$_); }  },
        'encode_base36' => sub { for(0..$#numlist){ encode_base36($_); }  },
        'cnv->to 36' => sub { for(0..$#numlist){ cnv($_,10,$base); }  },   # explicit source and target bases, as in 'cnv->to decimal'
        'decode_base36' => sub { foreach(@numlist){ decode_base36($_); }  }, 
        'fr_base' => sub { foreach(@numlist){ fr_base($base,$_); }  },
        'cnv->to decimal' => sub { foreach(@numlist){ cnv($_,$base,10); }  },
        'POSIX' => sub { foreach(@numlist){ POSIX::strtol($_,$base);}},
} );
Wreckful answered 19/4, 2010 at 21:44 Comment(1)
Now that you mention it, I remember perusing the standard library docs for C many moons ago and being pleasantly surprised that strtol() handled base 36 numbers. Even so, I didn't think of looking in the POSIX library. Since POSIX is just a thin wrapper around the C functions, it's not surprising that it is so fast. Good call. - Hindermost
1

I would bet my money on converting to base10 and back.

If you don't have to do this very often and the numbers are not very large, that is the easiest (and thus least complex => least number of bugs) way to do it.

Of course, another way to do it is to also store the base10 number, for computation purposes only. However, I'm not sure if this is possible or has any advantage in your case.

Despot answered 19/4, 2010 at 21:9 Comment(3)
Computers prefer binary or hex, but I think the point stands with that caveat. Convert to a native number, do your computation, then switch it back. - Olvera
Hex is for humans? I prefer counting in decimal. Hex is just for a compact representation of decimals. Moreover, hex (=16) is a power of two, and power of two is not for humans in general. 2 hex nibbles = 1 byte, that is not by coincidence. ;) - Despot
Hex is an easier way for humans to see groups of 8 bits. The computer doesn't care about hex at all, but humans don't read binary very well. I read hexdumps quite frequently. It's a lot easier to deal with characters (even Unicode) by looking at their hex representation rather than their decimal representation. - Maffa
