Base10 to base64 url shortening
Asked Answered
F

8

5

I'm coding an url shortener function for a project in which I'm learning php, here is the code (btw I suppose that global here is not a good thing to do :P):

$alphabet = array(1 => "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
                "A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
                "0","1","2","3","4","5","6","7","8","9","_","-");

function shorten($id){
    global $alphabet;
    $shortenedId = "";
    while($id>0){
        $remainder = $id % 64;
        $id = $id / 64;     
        $shortenedId = $alphabet[$remainder].$shortenedId;
    }
    return $shortenedId;
}

The code is taken from this Wikipedia article and adapted to php. My problem is that when I pass a multiple of 64 to the function I get a wrong (for my purpose) result, for instance 128 returns b which is not correct, it should have been aaa, but that's too long for a 3-digit number.

Also I'm starting to think that there's something wrong in this code, if I pass 1'000'000'000'000 as $id I get nItOq... I feel it's wrong because a url shortening service like bit.ly returns a 6 number id if I use it, and I don't think that this algorithm is better than theirs.

So, two questions:

  • do you spot any bug in the above code?
  • how to manage 64-multiple ids? Do I have to just ignore them and pass to the next one?
Finnell answered 7/7, 2010 at 23:59 Comment(5)
You don't need to use global there (I don't see any reason for that?). In fact, it's not been recommended to use global variables in PHP for years (at least since first release of PHP5). Use dependency injection instead.Whitehurst
@Richard Knop: Without that the variable $alphabet was not accessible.Finnell
Oh well, I see now, the scope has changed because you're inside a function. But why not just pass the $alphabet as second parameter to the function? Globals are not recommended in PHP, really.Whitehurst
Mh, ok, I think I'll just put it in the function as nathan did.Finnell
At the time that this question was asked, we did not have a code review site. We do now, if anyone wants their code reviewedBicephalous
K
14

Just a couple of little tweaks needed, the main two were to make the the alphabet zero indexed rather than one-indexed, and to subtract the remainder from the id before dividing

function shorten($id)
{
    $alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-';
    $shortenedId = '';
    while($id>0) {
        $remainder = $id % 64;
        $id = ($id-$remainder) / 64;     
        $shortenedId = $alphabet{$remainder} . $shortenedId;
    };
    return $shortenedId;
}

and here's a further modified version which... well I just like

function shorten($id, $alphabet='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-')
{
    $base = strlen($alphabet);
    $short = '';
    while($id) {
        $id = ($id-($r=$id%$base))/$base;     
        $short = $alphabet{$r} . $short;
    };
    return $short;
}

EDIT: sorted concatenation to be the same as the OPs

Kicker answered 8/7, 2010 at 1:44 Comment(5)
Is doing this $shortenedId .= $alphabet{$remainder}; the same as doing $shortenedId = $alphabet[$remainder].$shortenedId;? In the second I'm sure that the new digit is appended before, is this also the case for the code you've written?Finnell
@Montecristo $shortenedId .= $alphabet{$remainder}; is the same thing as $shortenedId = $shortenedId . $alphabet{$remainder};Whitehurst
@Richard Knop, @Montecristo : thanks for the catch I've edited the functions to concatenate as per the originalKicker
Is it possible to unshort?Dalt
Yes, it is possible to unshort, with the answer that is here https://mcmap.net/q/373877/-base10-to-base64-url-shorteningHonourable
S
5

In case you're looking for the opposite function to take a base64 number and convert to base10, here's some PHP based off of the JavaScript in this answer: How to convert base64 to base10 in PHP?

function lengthen($id) {
    $alphabet='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-';

    $number=0;
    foreach(str_split($id) as $letter) {
        $number=($number*64) + strpos($alphabet,$letter);
    }
    return $number;
}
Salient answered 15/8, 2015 at 1:4 Comment(0)
S
2

How about this:

function shorten_int($id){
    $hex = base_convert(id, 10, 16);
    $base64 = base64_encode(pack('H*', $hex));
    //$base64 = str_replace("/", "_", $base64); // remove unsafe url chars
    //$base64 = str_replace("+", "-", $base64);
    //$base64 = rtrim($base64, '='); // Remove the padding "=="
    $replacePairs = array('/' => '_',
                          '+' => '-',
                          '=' => '');
    $base64 = strtr($base64, $replacePairs); // optimisation
    return $base64;
}
Sphenoid answered 7/8, 2015 at 11:21 Comment(1)
Thanks, i writed a little fix: https://mcmap.net/q/373877/-base10-to-base64-url-shorteningMandibular
W
1

By the way, check out the base_convert() function (http://php.net/manual/en/function.base-convert.php):

echo base_convert(1000000000, 10, 36);

36 is the longest base it can convert to, though. But in the comments section I found this:

function dec2any( $num, $base, $index=false ) {
    if (! $base ) {
        $base = strlen( $index );
    } else if (! $index ) {
        $index = substr( "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" ,0 ,$base );
    }
    $out = "";
    for ( $t = floor( log10( $num ) / log10( $base ) ); $t >= 0; $t-- ) {
        $a = floor( $num / pow( $base, $t ) );
        $out = $out . substr( $index, $a, 1 );
        $num = $num - ( $a * pow( $base, $t ) );
    }
    return $out;
}

echo dec2any(1000000000, 64, "_-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");

Maybe it will help?

Whitehurst answered 8/7, 2010 at 10:17 Comment(1)
Thanks for posting this, yes I did find it, though I failed at understanding it... so I used another approach, I dislike having code I don't understand in my projects. :DFinnell
T
1

Paul Greg created some PHP code that converts from Base-10 to another base. This can be tested and the code downloaded here:

http://www.pgregg.com/projects/php/base_conversion/base_conversion.php

I'm using this approach to convert the database row id's to Base-64. Once these numbers have been shortened they can be used in the URL. [details]

Tumbleweed answered 30/3, 2012 at 21:47 Comment(0)
M
1

These two functions are very convenient, thanks to @malhal:

function shorten_int($id)
{
    $id=dechex($id);
    $id=strlen($id)%2===0?hex2bin($id):hex2bin('0'.$id);
    $id=base64_encode($id);
    $id=strtr($id, array('/'=>'_', '+'=>'-', '='=>''));
    return $id;
}

function unshorten_int($id)
{
    $id=strtr($id, array('-'=>'+', '_'=>'/'));
    $id=base64_decode($id);
    $id=bin2hex($id);
    return base_convert($id, 16, 10);
}

echo shorten_int(43121111)."\n";
echo unshorten_int(shorten_int(43121111))."\n";
Mandibular answered 31/10, 2018 at 11:38 Comment(0)
D
0

This is a variation of Nathans code to handle large integers greater than PHP_INT_MAX.

This uses the BC Maths Functions that should be built-in on Windows servers, but this needs to be enabled as an optional extension on Unix servers. This solution also requires a couple of custom BC functions to handle floor and round functions that I copied from the post by Alix Axel.

function shorten($value, $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-') {
    $base = strlen($alphabet);
    $result = '';
    while ($value) {
        $mod = bcmod($value, $base);
        $value = bcfloor(bcdiv($value, $base));
        $result = $alphabet[$mod] . $result;
    }
    return $result;
  }

function lengthen($value, $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-') {
    $base= strlen($alphabet);
    $result = '';
    for($i = 0, $limit = strlen($value); $i < $limit; $i++) {
        $result = bcadd(bcmul($base, $result), strpos($alphabet, $value[$i]));
    }
    return $result;
}

function bcceil($number) {
    if (strpos($number, '.') !== false) {
        if (preg_match("~\.[0]+$~", $number)) return bcround($number, 0);
        if ($number[0] != '-') return bcadd($number, 1, 0);
        return bcsub($number, 0, 0);
    }
    return $number;
}

function bcfloor($number) {
    if (strpos($number, '.') !== false) {
        if (preg_match("~\.[0]+$~", $number)) return bcround($number, 0);
        if ($number[0] != '-') return bcadd($number, 0, 0);
        return bcsub($number, 1, 0);
    }
    return $number;
}

function bcround($number, $precision = 0) {
    if (strpos($number, '.') !== false) {
        if ($number[0] != '-') return bcadd($number, '0.' . str_repeat('0', $precision) . '5', $precision);
        return bcsub($number, '0.' . str_repeat('0', $precision) . '5', $precision);
    }
    return $number;
}

Examples running PHP 5.6 on Windows (32 bit)

foreach ([0, 1, 9, 10, 115617, bcsub(PHP_INT_MAX, 1), PHP_INT_MAX, bcadd(PHP_INT_MAX, 1234567890)] as $value) {
    $short = shorten($value);
    $reversed = lengthen($short);
    print shorten($value) . " ($value)<br>";
    if ("$value" !== $reversed) {
        print 'ERROR REVERSING VALUE<br>';
    }
}

Outputs

0 (0)
1 (1)
9 (9)
a (10)
sex (115617)
1----_ (2147483646)
1----- (2147483647)
39Bwbh (3382051537)

If the ID is public, avoid using vowels in the string (115617 is shortened to sex for example). This would be the base 54 version that should provide safe words.

$alphabet = '0123456789bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ_-';
Dispeople answered 15/8, 2020 at 3:54 Comment(0)
H
-1

You can use the pack.

$int = 1129717211140920362;

$byte = pack('J*', $int);    
echo base64_encode($byte); //= D62P0WqzFCo=

It will result in D62P0WqzFCo=, it is correct, because the $int is an int64 and uses 64 bits. The Base64 uses 6 bits for each character, so they need ~11 characters.

To decode use:

$base64 = 'D62P0WqzFCo=';

$byte = base64_decode($base64);
echo unpack('J*',  $byte)[1]; //= 1129717211140920362

It will return 1129717211140920362. ;)


It was based in the answer on the Stackoverflow in Portuguese.

Hong answered 1/1, 2018 at 4:43 Comment(1)
@RafaelLima, why it's wrong? The "string" is a pack, which convert the integer (base10) to a string form. The base64 isn't enconding as string directly, it's actually encode as int64. Suppose you have a 1 as string you end up with MQ== instead of AAAAAAAAAAE=, which is the int64 in big ending order, so it takes a number and convert it correctly, in int64.Hong

© 2022 - 2024 — McMap. All rights reserved.