How does a URL Shortener work? [closed]
Asked Answered
C

3

113

I wonder how a URL Shortener works, like how they extract the text from address bar and map it to correct URL, later redirect it. What programming language do they use? How do they maintain the history of the mapping? How do they ensure the uniqueness of the shortened url? How can a lay man unmap it without visiting the URL?

Chillon answered 31/12, 2010 at 23:51 Comment(2)
This was the exact question I had and I'm very disappointed to see that it's closed. It's not ambiguous or incomplete--he's very specific in what he's asking.Reaganreagen
@bmargulies I think question should not be closed , this was very helpful and even answers, also think 43 upvotes and you decided it to be closed..why??Holy
R
50

Wiki Is Your Friend

Basically, a website with a shorter name is used as a place holder, such as bit.ly.

Then, bit.ly generates a key for the user to provide, which is randomly generated to not repeat. With 35 character options and 8 or so values, do the math. That's a lot of possible keys. If a URL is equal to a previously existing key, I remember reading somewhere that they reuse keys as well.

They don't really use a specific programming language, they just use a simple URL redirect, which can be done with HTTP response status code 301, 302, 307 or 308, depending.

Replica answered 1/1, 2011 at 0:6 Comment(2)
The redirect is NOT done with HTML, it's done with HTTP Headers. (Status code 301 or 302, depending).Outrageous
They must be using Location header to redirect.Sociopath
F
33

URL shortners just generate a shortcode, map the target URL to the shortcode, and provide a new URL. Visiting the URL performs a database lookup with the shortcode as a key, and redirects you to the target URL. There is no algorithmic association between a shortened URL and a destination URL, so you can't "unmap" it without going through the URL shortener's systems.

You can do it with any programming language and data store. Code generation is trivial to ensure uniqueness as well; if you had an incrementing primary integer key, you could simply encode the key as base62 and serve that. Since codes are incremental in nature, you'll never have a conflict.

Fiorenza answered 1/1, 2011 at 0:1 Comment(0)
G
21

The process is pretty simple actually: There a script that asks for the URL, generates a random string (and verifies that this string isn't already used), and puts the two in some kind of database. When you request a url, another script looks in the database for the random string, and if its found redirects you to the site.

This is of course more complicated in production due to needed features like abuse prevention, URL filtering, spam prevention, URL verification, etc. But those are pretty simple to implement.


The language is irrelevant, mostly any one will do.

Garvy answered 1/1, 2011 at 0:1 Comment(7)
"and verifies that this string isn't already used" .. HOW ? This is the biggest questionMescaline
@Stewie: SELECT * FROM mappings WHERE key = stringToCheck, and check if any rows returned? Or any similar thing in your database language of choice. Seems like the simplest part of the entire problem to be honest.Almond
@Mescaline Or they can use a HashMap to map <key, url>Allieallied
@DavidLiu Wouldn't one need to keep regenerating and issuing queries until it outputs no results ? As your data size grows, the time to check grows. In worst case scenario, the number of queries to your DB will be n-1 where n is the number of "strings" ; what happens when you have 100M strings ?Mescaline
@Mescaline That's a different question entirely, a problem with the principle of the solution itself. You're right that the regeneration problem would become an issue eventually, but again, that's why I said "verifying the string isn't already used" is the easy part of the solution. There are plenty of alternative solutions to do random without replacement.Almond
" is the easy part of the solution. " it may be "easy" to code, but its not salable at all. So, "verifying the string" remains as a the largest problem to solve. " There are plenty of alternative solutions to do random without replacement." -> please share examples of coupleMescaline
They don't need to issue an additional SQL select to the database. There is probably a unique key enabled on the database field. Additionally, you have to control the uniqueness of keys to make things faster zelark.github.io/nano-id-cc, ~2 days needed, in order to have a 1% probability of at least one collision.Alvinalvina

© 2022 - 2024 — McMap. All rights reserved.