Python base 36 encoding
How can I encode an integer with base 36 in Python and then decode it again?

Neveda answered 25/7, 2009 at 11:32 Comment(1)
possible duplicate of How to convert an integer to the shortest url-safe string in Python?Philanthropy
J
53

Have you tried Wikipedia's sample code?

def base36encode(number, alphabet='0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    """Converts an integer to a base36 string."""
    if not isinstance(number, int):
        raise TypeError('number must be an integer')
 
    base36 = ''
    sign = ''
 
    if number < 0:
        sign = '-'
        number = -number
 
    if 0 <= number < len(alphabet):
        return sign + alphabet[number]
 
    while number != 0:
        number, i = divmod(number, len(alphabet))
        base36 = alphabet[i] + base36
 
    return sign + base36
 
def base36decode(number):
    return int(number, 36)
 
print(base36encode(1412823931503067241))
print(base36decode('AQF8AA0006EH'))
Julenejulep answered 25/7, 2009 at 11:35 Comment(7)
Christ if they can do str->int in any base, you'd think they'd let you do int->str in any base with a builtin...Parliamentarian
to make it even more pythonic, add import of string and replace alphabet value with string.digits+string.lowercaseAquila
interface between base36encode and base36decode is broken, the latter will fail (possibly silently) to decode anything encoded with custom alphabet argumentHemimorphite
The encoding function allows the user to specify an alphabet, while the decoding function does not, therefore the decoding function is not a true inverse of the encoding function as it relies on the default alphabet.Stein
For Python3 this solution will encounter errors as it uses the type long which is not supported in Python3. You can simply remove the long type from the function call above or see @André C. Andersen solution below.Hypothesis
If this sample code came from wikipedia please provide a link to the articleGilford
It seems that in the 12 years that have passed, the example has been removed from Wikipedia and I cannot provide a link to it. That is why the answer was edited and the URL removed. The Wayback Machine has stored a variant of it, though: web.archive.org/web/20090805171144/http://en.wikipedia.org:80/…Julenejulep
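Following up on the comments about the custom alphabet: below is a minimal sketch of a decoder that takes the same alphabet argument as base36encode, so the pair stays inverse for any alphabet. The name base36decode_alphabet is hypothetical and not part of the original answer, and the sketch assumes the alphabet has unique characters and does not contain '-'.

```python
def base36decode_alphabet(text, alphabet='0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    """Decode a string produced by base36encode() with the same alphabet.

    Hypothetical helper, not from the original answer.
    """
    if not text or text == '-':
        raise ValueError('nothing to decode')

    sign = 1
    if text[0] == '-':
        sign = -1
        text = text[1:]

    base = len(alphabet)
    # Map each character to its digit value once, instead of calling
    # alphabet.index() for every character.
    values = {char: value for value, char in enumerate(alphabet)}

    number = 0
    for char in text:
        if char not in values:
            raise ValueError('invalid character: %r' % char)
        # Shift the digits seen so far one place left, then add the new digit.
        number = number * base + values[char]
    return sign * number
```

With the default alphabet this agrees with int(text, 36) for uppercase input, but unlike int() it is case-sensitive, because the alphabet decides which characters are valid.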
N
39

I wish I had read this before. Here is the answer:

def base36encode(number):
    if not isinstance(number, int):  # Python 3 has no separate long type
        raise TypeError('number must be an integer')
    is_negative = number < 0
    number = abs(number)

    alphabet, base36 = ['0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ', '']

    while number:
        number, i = divmod(number, 36)
        base36 = alphabet[i] + base36
    if is_negative:
        base36 = '-' + base36

    return base36 or alphabet[0]


def base36decode(number):
    return int(number, 36)

print(base36encode(1412823931503067241))
print(base36decode('AQF8AA0006EH'))
assert(base36decode(base36encode(-9223372036721928027)) == -9223372036721928027)
Neveda answered 25/7, 2009 at 11:36 Comment(3)
For including lowercase alphabet see How to convert an integer to the shortest url-safe string in Python?Sweetening
@Tadeck: Because then you have to reverse base36 before you return it.Bb
@JohnY: My mistake, that would not be the same.Knowledgeable
B
38
from numpy import base_repr

num = 2017  # example input
encoded = base_repr(num, 36)  # '1K1'
decoded = int(encoded, 36)  # 2017

Here is information about numpy.base_repr.

Brenza answered 21/10, 2015 at 23:15 Comment(1)
I attempted to improve this answer by making it runnable with example data, and adding more documentation resources. The edit was for some reason rejected with the explanation that it was more suitable as its own answer. Thus I've added my improved version of your answer below: https://mcmap.net/q/334601/-python-base-36-encoding/…Orvah
E
28

You can use numpy's base_repr(...) for this.

import numpy as np

num = 2017

num = np.base_repr(num, 36)
print(num)  # 1K1

num = int(num, 36)
print(num)  # 2017

Here is some information about numpy, int(x, base=10), and np.base_repr(number, base=2, padding=0).

(This answer was originally submitted as an edit to @christopher-beland's answer, but was rejected in favor of its own answer.)

Esquivel answered 19/2, 2017 at 19:11 Comment(0)
N
16

You could use https://github.com/tonyseek/python-base36.

$ pip install base36

and then

>>> import base36
>>> assert base36.dumps(19930503) == 'bv6h3'
>>> assert base36.loads('bv6h3') == 19930503
Neufer answered 15/7, 2016 at 5:21 Comment(4)
This is the right answer. I don't know why everyone else wants to reinvent the wheel.Insurmountable
@MichaelScheper Because dependencies are hard. See leftpad. Copying and pasting a trivial function that does what you want into a file is sometimes better than adding a new external dependency.Overpowering
@Overpowering You could vendor third-party dependencies into your repository or host them on a private PyPI mirror (as Golang projects do). That may be better than copy-pasting code snippets, because the dependency keeps its own test coverage and release plan.Neufer
@MichaelScheper maybe because python-base36 package didn't exist before the end of 2014?Dagney
M
11

Terrible answer, but I was just playing around with this and thought I'd share.

import string, math

int2base = lambda a, b: ''.join(
    [(string.digits +
      string.ascii_lowercase +
      string.ascii_uppercase)[(a // b ** i) % b]
     for i in range(int(math.log(a, b)), -1, -1)]
)

num = 1412823931503067241
test = int2base(num, 36)
test2 = int(test, 36)
print(test2 == num)
Moises answered 10/7, 2012 at 4:43 Comment(3)
I like this quite a bit, but perhaps I just have a weakness for shorter code.Semmes
math.log returns a limited-precision float, so round to 14 digits before truncating the fractional part. This avoids turning 5.999999999999999 into 5.0, for instance.Semmes
math.log() fails when a==0 and using it sucks anyway.Pink
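The two comments above point at the same weakness: math.log is floating point and undefined for zero. As a sketch of one way around it, the digit count does not have to be computed up front; repeated divmod yields the digits directly. The name int2base_exact is hypothetical, and the sketch is limited to bases 2 to 36 so that the result can still be read back with int().

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # 36 symbols, enough for bases 2..36


def int2base_exact(a, b):
    """Hypothetical float-free variant of int2base for a >= 0 and 2 <= b <= 36."""
    if a < 0:
        raise ValueError('a must be non-negative')
    if not 2 <= b <= len(DIGITS):
        raise ValueError('b must be between 2 and 36')
    if a == 0:
        return DIGITS[0]

    out = []
    while a:
        a, r = divmod(a, b)
        out.append(DIGITS[r])  # least significant digit first
    return ''.join(reversed(out))
```

Because it only uses integer arithmetic, exact powers of the base (where a rounded logarithm is most likely to be off by one) need no special handling.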
D
8

I benchmarked the example encoders provided in the answers to this question. On my Ubuntu 18.10 laptop, using Python 3.7 in Jupyter with the %%timeit magic command and the integer 4242424242424242 as input, I got these results:

  • Wikipedia's sample code: 4.87 µs ± 300 ns per loop
  • @mistero's base36encode(): 3.62 µs ± 44.2 ns per loop
  • @user1036542's int2base: 10 µs ± 400 ns per loop (after fixing py37 compatibility)
  • @mbarkhau's int_to_base36(): 3.83 µs ± 28.8 ns per loop

All timings were mean ± std. dev. of 7 runs, 100000 loops each.
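For anyone who wants to repeat a measurement like this outside Jupyter, here is a rough sketch using the standard-library timeit module. The helper time_encoder is a hypothetical name, and it reports the fastest run rather than the mean ± std. dev. that %%timeit prints, so the numbers will not be directly comparable to the ones above.

```python
import timeit


def int_to_base36(num):
    """Encoder under test, copied from the int_to_base36() answer in this thread."""
    digits = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    res = ''
    while not res or num > 0:
        num, i = divmod(num, 36)
        res = digits[i] + res
    return res


def time_encoder(func, value=4242424242424242, repeat=7, number=100000):
    """Return the best per-call time in microseconds over `repeat` runs."""
    # Each run calls func(value) `number` times and yields the total seconds.
    runs = timeit.repeat(lambda: func(value), repeat=repeat, number=number)
    return min(runs) / number * 1e6


if __name__ == '__main__':
    # Smaller counts than the defaults keep this demonstration quick.
    best = time_encoder(int_to_base36, repeat=3, number=10000)
    print('int_to_base36: %.2f µs per call' % best)
```

The lambda wrapper adds a small constant call overhead to every encoder, so this is better suited to comparing encoders with each other than to quoting absolute timings.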

Update on 2023-04-14:

I wanted to try out perfpy.com, and here are my results for https://perfpy.com/288: [image: benchmark results]

Dyeline answered 27/4, 2019 at 16:53 Comment(2)
I love how this answer is both not an answer to this question, but an answer to this question :DProficient
You can use perfpy so that we can see what you see.Whitethroat
R
6

If you are feeling functional

def b36_encode(i):
    if i < 0: return "-" + b36_encode(-i)
    if i < 36: return "0123456789abcdefghijklmnopqrstuvwxyz"[i]
    return b36_encode(i // 36) + b36_encode(i % 36)    

test

n = -919283471029384701938478
s = "-45p3wubacgd6s0fi"
assert int(s, base=36) == n
assert b36_encode(n) == s
Reputable answered 2/3, 2020 at 23:24 Comment(1)
If you want a one-liner: b36 = lambda n: "-" + b36(-n) if n < 0 else "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"[n] if n < 36 else b36(n // 36) + b36(n % 36) then use e.g. b36(-12345)Enwomb
O
5

This works if you only care about non-negative integers.

def int_to_base36(num):
    """Converts a non-negative integer into a base36 string."""
    assert num >= 0
    digits = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'

    res = ''
    while not res or num > 0:
        num, i = divmod(num, 36)
        res = digits[i] + res
    return res

To convert back to int, just use int(num, 36). For a conversion of arbitrary bases see https://gist.github.com/mbarkhau/1b918cb3b4a2bdaf841c
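A short illustration of what decoding with the built-in int() accepts, since that determines which encoder outputs round-trip. The sample value 2017 is only an example.

```python
# int() with base 36 accepts upper- and lowercase digits and an optional sign.
assert int('1K1', 36) == 2017
assert int('1k1', 36) == 2017
assert int('-1K1', 36) == -2017

# Characters outside 0-9 and A-Z raise ValueError rather than being skipped.
try:
    int('1K!', 36)
except ValueError:
    print('not a base-36 string')
```

So an encoder can emit either case, and a leading '-' from a sign-aware encoder decodes correctly without extra work.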

Overpowering answered 31/7, 2015 at 13:2 Comment(0)
R
0

Class that can encode and decode using an arbitrary alphabet (might be useful to someone):

class BaseAlphabet:
    alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'

    def __init__(self, alphabet=None) -> None:
        if alphabet:
            self.alphabet = alphabet.upper()
        self.len = len(self.alphabet)

    def encode(self, number):
        if not isinstance(number, int):
            raise TypeError('number must be an integer')

        result = []
        sign = ''

        if number < 0:
            sign = '-'
            number = -number

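        while number:
            number, i = divmod(number, self.len)
            result.append(self.alphabet[i])

        # Zero produces no digits in the loop above, so emit the zero digit explicitly.
        if not result:
            result.append(self.alphabet[0])

        result.reverse()
        return f'{sign}{"".join(result)}'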

    def decode(self, value):
        sign = 1
        if value[0] == '-':
            value = value[1:]
            sign = -1

        number = 0
        for n, i in enumerate(value[::-1]):
            number = number + self.alphabet.index(i) * (self.len ** n)

        return number * sign

test:

b = BaseAlphabet('CBA')


def test(n):
    c = b.encode(n)
    print(n, c, b.decode(c))

test(100000)
test(111111)
test(-100000)
test(999999)
test(-93756210)

>> 100000 BACCACBBACB 100000
>> 111111 BABAABCACAC 111111
>> -100000 -BACCACBBACB -100000
>> 999999 BABAABCACACCC 999999
>> -93756210 -ACBBABCACAABCCCAC -93756210
Rik answered 11/3 at 12:20 Comment(0)
