Encoding a numeric string into a shortened alphanumeric string, and back again

4

12

Quick question. I'm trying to find or write an encoder in Python to shorten a string of numbers by using upper and lower case letters. The numeric strings look something like this:

20120425161608678259146181504021022591461815040210220120425161608667

The length is always the same.

My initial thought was to write some simple encoder to utilize upper and lower case letters and numbers to shorten this string into something that looks more like this:

a26Dkd38JK

That was completely arbitrary, just trying to be as clear as possible. I'm certain that there is a really slick way to do this, probably already built in. Maybe this is an embarrassing question to even be asking.

Also, I need to be able to take the shortened string and convert it back to the longer numeric value. Should I write something and post the code, or is this a one-line built-in function in Python that I should already know about?

Thanks!

Beset answered 26/4, 2012 at 1:32 Comment(9)
base64 - Kaleena
JBernardo - nailed it. Post it as an answer :)!! - Harmonist
It's a good thing you're offering to write the code yourself and post it here. - Shirashirah
(Actually, just using base64 on its own lengthens the string) - Harmonist
JBernardo - Thanks for the suggestion, but as David points out, using base64 is not a solution to my problem, as it actually lengthens the string significantly. The objective here is to shorten it. - Beset
base64 provides a fast tool to convert binary strings. You just have to adapt your numbers (like joining each 2 digits into a byte). You will reduce the string size by about 30%. - Kaleena
(Yes, that's what nightcracker does in his answer below) - Harmonist
Possible duplicate of: How to convert an integer to the shortest url-safe string in Python? - Tarrasa
https://mcmap.net/q/334601/-python-base-36-encoding - Saucy
11

This gives pretty good compression (Python 2):

import base64

def num_to_alpha(num):
    num = hex(num)[2:].rstrip("L")

    if len(num) % 2:
        num = "0" + num

    return base64.b64encode(num.decode('hex'))

It first turns the integer into a bytestring and then base64 encodes it. Here's the decoder:

def alpha_to_num(alpha):
    num_bytes = base64.b64decode(alpha)
    return int(num_bytes.encode('hex'), 16)

Example:

>>> num_to_alpha(20120425161608678259146181504021022591461815040210220120425161608667)
'vw4LUVm4Ea3fMnoTkHzNOlP6Z7eUAkHNdZjN2w=='
>>> alpha_to_num('vw4LUVm4Ea3fMnoTkHzNOlP6Z7eUAkHNdZjN2w==')
20120425161608678259146181504021022591461815040210220120425161608667
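
The functions above rely on Python 2's str.decode('hex') and str.encode('hex'), which do not exist in Python 3. A minimal Python 3 sketch of the same idea, assuming int.to_bytes and int.from_bytes in place of the hex round trip (the _py3 names are hypothetical, chosen only for this illustration):

```python
import base64


def num_to_alpha_py3(num):
    """Encode a non-negative int as base64 text (Python 3 sketch)."""
    # Smallest number of bytes that holds the value; at least one so 0 works.
    length = max(1, (num.bit_length() + 7) // 8)
    raw = num.to_bytes(length, "big")
    return base64.b64encode(raw).decode("ascii")


def alpha_to_num_py3(alpha):
    """Decode text produced by num_to_alpha_py3 back into an int."""
    return int.from_bytes(base64.b64decode(alpha), "big")


n = 20120425161608678259146181504021022591461815040210220120425161608667
encoded = num_to_alpha_py3(n)
assert alpha_to_num_py3(encoded) == n
```

As with the original, the input has to be an int, so leading zeros in a digit string would not survive the round trip. Since the strings in the question always have the same length, they could be restored with str.zfill.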
Granuloma answered 26/4, 2012 at 1:38 Comment(6)
This looks like it works great. Exactly what I was looking for, thanks. RE: int vs. string: Passing a string to this function actually doesn't work. It does need to be an int. Good work! - Beset
This requires arbitrary-precision integers, which fortunately Python has. - Giovannagiovanni
@ninjagecko: I don't see why this should require arbitrary-precision integers. It works with them - sure - but there is no part of the given functions that relies on arbitrary-precision integers. - Granuloma
@nightcracker: Yes there is. The fact that you take an integer as input is reason enough to require arbitrary-precision integers. You can test this yourself by trying to do this in another language without arbitrary-precision integers, such as JavaScript. Sufficiently large inputs will be meaningless. This is not to say this makes the answer unreasonable in other languages; the answer will work for any input which does not overflow. Not usually a concern, but the OP was using a string of digits which would overflow in most non-Python languages. - Giovannagiovanni
@ninjagecko: ah now I see what you mean. - Granuloma
I know I'm late. Still, I like the answer, and with Python 3 it's easier to use base64.b64encode(n.to_bytes(N)), where N must be big enough. - Walcoff
10

Here are two custom functions (not based on base64) that produce shorter output:

chrs = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = len(chrs)

def int_to_cust(i):
    result = ''
    while i:
        result = chrs[i % l] + result
        i = i // l
    if not result:
        result = chrs[0]
    return result

def cust_to_int(s):
    result = 0
    for char in s:
        result = result * l + chrs.find(char)
    return result

And the results are:

>>> int_to_cust(20120425161608678259146181504021022591461815040210220120425161608667)
'9F9mFGkji7k6QFRACqLwuonnoj9SqPrs3G3fRx'
>>> cust_to_int('9F9mFGkji7k6QFRACqLwuonnoj9SqPrs3G3fRx')
20120425161608678259146181504021022591461815040210220120425161608667L

You can also shorten the generated string, if you add other characters to the chrs variable.
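
To get a feel for how much a larger alphabet helps, the following sketch counts how many characters an integer needs in a given base. encoded_length is a hypothetical helper written for this illustration, not part of the functions above.

```python
def encoded_length(i, base):
    """Count the digits needed to write non-negative int i in the given base."""
    if i == 0:
        return 1
    count = 0
    while i:
        i //= base
        count += 1
    return count


n = 20120425161608678259146181504021022591461815040210220120425161608667
for base in (10, 62, 85):
    # Print the alphabet size next to the resulting string length.
    print(base, encoded_length(n, base))
```

Each character of a 62-symbol alphabet carries just under 6 bits, so adding a handful of extra symbols only shortens the result slightly.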

Tarrasa answered 26/4, 2012 at 2:4 Comment(3)
I guess you're doing the base 64 encoding yourself instead of using the lib. - Venezuela
@PaulHoang: I guess your guess is wrong. I presented a function that works in a similar manner, but 1) without the need for padding (try to remove the =s from nightcracker's answer), 2) with the ability to define your own characters used for representing the converted value. The way it works is very similar to base64, but it is not base64. There is probably a library to do something like that, but I did not find which one it is. - Tarrasa
I really like this solution. In playing around with the various suggested solutions, I like this the best because I can restrict it to just letters and numbers and have control over adding more characters in the future. Nicely done. - Beset
2

The same approach, wrapped in a class:

VALID_CHRS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
BASE = len(VALID_CHRS)
# Map each character to its index (its digit value in this base).
MAP_CHRS = {char: index for index, char in enumerate(VALID_CHRS)}


class TinyNum:
    """Compact number representation in alphanumeric characters."""

    def __init__(self, n):
        result = ''
        while n:
            result = VALID_CHRS[n % BASE] + result
            n //= BASE
        if not result:
            result = VALID_CHRS[0]
        self.num = result

    def to_int(self):
        """Return the number as an int."""
        result = 0
        for char in self.num:
            result = result * BASE + MAP_CHRS[char]
        return result

Sample usage:

>>> n = 4590823745
>>> tn = TinyNum(n)
>>> print(n)
4590823745
>>> print(tn.num)
50GCYh
>>> print(tn.to_int())
4590823745

(Based on Tadeck's answer.)

Beeck answered 18/11, 2019 at 15:42 Comment(0)
-1
>>> s="20120425161608678259146181504021022591461815040210220120425161608667"
>>> import base64, zlib
>>> base64.b64encode(zlib.compress(s))
'eJxly8ENACAMA7GVclGblv0X4434WrKFVW5CtJl1HyosrZKRf3hL5gLVZA2b'
>>> zlib.decompress(base64.b64decode(_))
'20120425161608678259146181504021022591461815040210220120425161608667'

so zlib isn't real smart at compressing strings of digits :(
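
The session above is Python 2; on Python 3, zlib.compress needs bytes rather than str. A minimal sketch of the same comparison, assuming the digit string is ASCII-encoded first:

```python
import base64
import zlib

s = "20120425161608678259146181504021022591461815040210220120425161608667"

# zlib and base64 both operate on bytes in Python 3.
packed = base64.b64encode(zlib.compress(s.encode("ascii")))
restored = zlib.decompress(base64.b64decode(packed)).decode("ascii")

assert restored == s
print(len(s), len(packed))  # compare original and encoded lengths
```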

Fluvial answered 26/4, 2012 at 1:55 Comment(0)
