How to make unique short URL with Python?
Asked Answered
D

12

46

How can I make unique URL in Python a la https://i.stack.imgur.com/turb6.jpg or http://tumblr.com/xzh3bi25y When using uuid from python I get a very large one. I want something shorter for URLs.

Diner answered 30/9, 2009 at 11:16 Comment(0)
C
25

Edit: Here, I wrote a module for you. Use it. http://code.activestate.com/recipes/576918/


Counting up from 1 will guarantee short, unique URLS. /1, /2, /3 ... etc.

Adding uppercase and lowercase letters to your alphabet will give URLs like those in your question. And you're just counting in base-62 instead of base-10.

Now the only problem is that the URLs come consecutively. To fix that, read my answer to this question here:

Map incrementing integer range to six-digit base 26 max, but unpredictably

Basically the approach is to simply swap bits around in the incrementing value to give the appearance of randomness while maintaining determinism and guaranteeing that you don't have any collisions.

Centi answered 30/9, 2009 at 11:32 Comment(0)
S
17

I'm not sure most URL shorteners use a random string. My impression is they write the URL to a database, then use the integer ID of the new record as the short URL, encoded base 36 or 62 (letters+digits).

Python code to convert an int to a string in arbitrary bases is here.

Salient answered 30/9, 2009 at 11:24 Comment(0)
G
9

Python's short_url is awesome.

Here is an example:

import short_url

id = 20  # your object id
domain = 'mytiny.domain' 

shortened_url = "http://{}/{}".format(
                                     domain,
                                     short_url.encode_url(id)
                               )

And to decode the code:

decoded_id = short_url.decode_url(param)

That's it :)

Hope this will help.

Girgenti answered 9/11, 2017 at 9:41 Comment(0)
N
6

Hashids is an awesome tool for this.

Edit:

Here's how to use Hashids to generate a unique short URL with Python:

from hashids import Hashids

pk = 123 # Your object's id
domain = 'imgur.com' # Your domain

hashids = Hashids(salt='this is my salt', min_length=6)
link_id = hashids.encode(pk)
url = 'http://{domain}/{link_id}'.format(domain=domain, link_id=link_id)
Nonna answered 6/7, 2015 at 16:27 Comment(0)
I
5

This module will do what you want, guaranteeing that the string is globally unique (it is a UUID):

http://pypi.python.org/pypi/shortuuid/0.1

If you need something shorter, you should be able to truncate it to the desired length and still get something that will reasonably probably avoid clashes.

Invariant answered 9/1, 2011 at 17:41 Comment(0)
M
5

This answer comes pretty late but I stumbled upon this question when I was planning to create an URL shortener project. Now that I have implemented a fully functional URL shortener(source code at amitt001/pygmy) I am adding an answer here for others.

The basic principle behind any URL shortener is to get an int from long URL then use base62(base32, etc) encoding to convert this int to a more readable short URL.

How is this int generated?

Most of the URL shortener uses some auto-incrementing datastore to add URL to datastore and use the autoincrement id to get base62 encoding of int.

The sample base62 encoding from string program:

# Base-62 hash

import string
import time

_BASE = 62


class HashDigest:
    """Base base 62 hash library."""

    def __init__(self):
        self.base = string.ascii_letters + string.digits
        self.short_str = ''

    def encode(self, j):
        """Returns the repeated div mod of the number.
        :param j: int
        :return: list
        """
        if j == 0:
            return [j]
        r = []
        dividend = j
        while dividend > 0:
            dividend, remainder = divmod(dividend, _BASE)
            r.append(remainder)
        r = list(reversed(r))
        return r

    def shorten(self, i):
        """
        :param i:
        :return: str
        """
        self.short_str = ""
        encoded_list = self.encode(i)
        for val in encoded_list:
            self.short_str += self.base[val]
        return self.short_str

This is just the partial code showing base62 encoding. Check out the complete base62 encoding/decoding code at core/hashdigest.py

All the link in this answer are shortened from the project I created

Manchukuo answered 11/12, 2017 at 19:58 Comment(0)
S
2

The reason UUIDs are long is because they contain lots of information so that they can be guaranteed to be globally unique.

If you want something shorter, then you'll need to do something like generate a random string, checking whether it is in the universe of already generated strings, and repeating until you get an unused string. You'll also need to watch out for concurrency here (what if the same string gets generated by a separate process before you inserted into the set of strings?).

If you need some help generating random strings in Python, this other question might help.

Stormi answered 30/9, 2009 at 11:18 Comment(0)
C
1

It doesn't really matter that this is Python, but you just need a hash function that maps to the length you want. For example, maybe use MD5 and then take just the first n characters. You'll have to watch out for collisions in that case, though, so you might want to pick something a little more robust in terms of collision detection (like using primes to cycle through the space of hash strings).

Crapulous answered 30/9, 2009 at 11:21 Comment(0)
K
1

I don't know if you can use this, but we generate content objects in Zope that get unique numeric ids based on current time strings, in millis (eg, 1254298969501)

Maybe you can guess the rest. Using the recipe described here: How to convert an integer to the shortest url-safe string in Python?, we encode and decode the real id on the fly, with no need for storage. A 13-digit integer is reduced to 7 alphanumeric chars in base 62, for example.

To complete the implementation, we registered a short (xxx.yy) domain name, that decodes and does a 301 redirect for "not found" URLs,

If I was starting over, I would subtract the "starting-over" time (in millis) from the numeric id prior to encoding, then re-add it when decoding. Or else when generating the objects. Whatever. That would be way shorter..

Kowtow answered 30/9, 2009 at 12:9 Comment(0)
C
1

You can generate a N random string:

import string
import random

def short_random_string(N:int) -> str:

    return ''.join(random.SystemRandom().choice(
        string.ascii_letters + \
        string.digits) for _ in range(N)
    )

so,

print (short_random_string(10) )
#'G1ZRbouk2U'

all lowercase

print (short_random_string(10).lower() )
#'pljh6kp328'
Catholic answered 12/7, 2019 at 11:57 Comment(0)
S
0

Try this http://code.google.com/p/tiny4py/ ... It's still under development, but very useful!!

Supercilious answered 31/3, 2011 at 0:43 Comment(0)
T
0

My Goal: Generate a unique identifier of a specified fixed length consisting of the characters 0-9 and a-z. For example:

zcgst5od
9x2zgn0l
qa44sp0z
61vv1nl5
umpprkbt
ylg4lmcy
dec0lu1t
38mhd8i5
rx00yf0e
kc2qdc07

Here's my solution. (Adapted from this answer by kmkaplan.)

import random

class IDGenerator(object):
    ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

    def __init__(self, length=8):
        self._alphabet_length = len(self.ALPHABET)
        self._id_length = length

    def _encode_int(self, n):
        # Adapted from:
        #   Source: https://mcmap.net/q/100621/-how-to-convert-an-integer-to-the-shortest-url-safe-string-in-python
        #   Author: https://stackoverflow.com/users/50902/kmkaplan

        encoded = ''
        while n > 0:
            n, r = divmod(n, self._alphabet_length)
            encoded = self.ALPHABET[r] + encoded
        return encoded

    def generate_id(self):
        """Generate an ID without leading zeros.

        For example, for an ID that is eight characters in length, the
        returned values will range from '10000000' to 'zzzzzzzz'.
        """

        start = self._alphabet_length**(self._id_length - 1)
        end = self._alphabet_length**self._id_length - 1
        return self._encode_int(random.randint(start, end))

if __name__ == "__main__":
    # Sample usage: Generate ten IDs each eight characters in length.
    idgen = IDGenerator(8)

    for i in range(10):
        print idgen.generate_id()
Taeniacide answered 21/1, 2015 at 20:0 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.