Convert int to ASCII and back in Python
Asked Answered
J

6

191

I'm working on making a URL shortener for my site, and my current plan (I'm open to suggestions) is to use a node ID to generate the shortened URL. So, in theory, node 26 might be short.com/z, node 1 might be short.com/a, node 52 might be short.com/Z, and node 104 might be short.com/ZZ. When a user goes to that URL, I need to reverse the process (obviously).

I can think of some kludgy ways to go about this, but I'm guessing there are better ones. Any suggestions?

Joanniejoao answered 9/9, 2010 at 2:47 Comment(0)
P
333

ASCII to int:

ord('a')

gives 97

And back to a string:

  • in Python2: str(unichr(97))
  • in Python3: chr(97)

gives 'a'

Program answered 9/9, 2010 at 2:51 Comment(4)
and just chr() in python3!Wellread
chr words in the range of the ascii characters (0 - 255), however, unichr works for the unicode character set.Daybreak
Documented here.Gebler
"ord" stands for "ordinal" as explained here.Gebler
I
124
>>> ord("a")
97
>>> chr(97)
'a'
Indubitability answered 28/6, 2015 at 5:3 Comment(0)
M
14

If multiple characters are bound inside a single integer/long, as was my issue:

s = '0123456789'
nchars = len(s)
# string to int or long. Type depends on nchars
x = sum(ord(s[byte])<<8*(nchars-byte-1) for byte in range(nchars))
# int or long to string
''.join(chr((x>>8*(nchars-byte-1))&0xFF) for byte in range(nchars))

Yields '0123456789' and x = 227581098929683594426425L

Montez answered 28/2, 2017 at 0:38 Comment(1)
Thanks for asking. I'll grant it's slightly off of the use case in the OP, given that base64 or base58 encoding would be most applicable. I had arrived on this question based on the title, literally converting an integer to ascii text as if the integer has ascii encoded data embedded in its bytes. I posted this answer in the event others arrived here with the same desired outcome.Montez
T
8

What about BASE58 encoding the URL? Like for example flickr does.

# note the missing lowercase L and the zero etc.
BASE58 = '123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ' 
url = ''
while node_id >= 58:
    div, mod = divmod(node_id, 58)
    url = BASE58[mod] + url
    node_id = int(div)

return 'http://short.com/%s' % BASE58[node_id] + url

Turning that back into a number isn't a big deal either.

Tillis answered 9/9, 2010 at 3:53 Comment(1)
This is great. I ended up finding another (more complete) answer here on SO though: #1120222Joanniejoao
K
0

Use hex(id)[2:] and int(urlpart, 16). There are other options. base32 encoding your id could work as well, but I don't know that there's any library that does base32 encoding built into Python.

Apparently a base32 encoder was introduced in Python 2.4 with the base64 module. You might try using b32encode and b32decode. You should give True for both the casefold and map01 options to b32decode in case people write down your shortened URLs.

Actually, I take that back. I still think base32 encoding is a good idea, but that module is not useful for the case of URL shortening. You could look at the implementation in the module and make your own for this specific case. :-)

Karlykarlyn answered 9/9, 2010 at 2:50 Comment(0)
L
0

apparently I'm late to the party, jus like to share a snippet I use very often.

/**
 * 62 = 26 + 26 +10
 *
 * @param id
 * @return
 */
public String base62(long id) {
    StringBuilder sb = new StringBuilder();
    while (id >= 62) {
        int remainer = (int) (id % 62);
        id = id / 62;
        sb.append(index2char(remainer));
    }
    sb.append(index2char(id));

    return sb.reverse().toString();
}

public long reverseBase62(String s) {
    long r = 0;
    for (int i = 0; i < s.length(); i++) {
        r = r * 62;
        int index = char2index(s.charAt(i));
        if (index == -1) {
            throw new IllegalArgumentException(
                String.format("[%s] is in malformation, should only contain 0~9, a~z, A~Z", s));
        }
        r += index;
    }

    return r;
}
private char index2char(long index) {
    if (index < 10) {
        return (char) ('0' + index);
    }
    if (index < 36) {
        return (char) ('a' + index - 10);
    }
    return (char) ('A' + index - 36);
}



private int char2index(char c) {
    if ('0' <= c && c <= '9') {
        return c - '0';
    }
    if ('a' <= c && c <= 'z') {
        return c - 'a' + 10;
    }
    if ('A' <= c && c <= 'Z') {
        return c - 'A' + 36;
    }
    return -1;
}
Laird answered 26/8, 2022 at 7:29 Comment(2)
That's an interesting snippet you got there. What language is that? Because it's clearly not Python... :-DBaal
oh hi Daniil, it's in Java, sorry for the language gap but I guess the adaptation would be no hard work?Laird

© 2022 - 2024 — McMap. All rights reserved.