How do short URL services work?

4

119

How do services like TinyURL or Metamark work?
Do they simply associate the tiny URL key with a [virtual?] web page which merely provides an "HTTP redirect" to the original URL? Or is there more "magic" to it?

[original wording] I often use URL shortening services like TinyURL, Metamark, and others, but every time I do, I wonder how these services work. Do they create a new file that will redirect to another page or do they use subdomains?

Extensible answered 13/10, 2009 at 19:20 Comment(4)
To rephrase the [non-]question: "How do services like TinyURL work? Do they simply associate the tiny URL key with a [virtual?] web page which merely provides an "HTTP redirect" to the original URL?" Is this what you are asking?Feisty
Do the shortened URLs ever expire? (i.e. the database entries for those URLs are removed from the servers)Coarsegrained
@thd: yes, but it can depend on (daily) hits and on the policy of the short URL service provider. They may also allow links that never expire; some require a membership for that.Sisneros
Possible duplicate of How does a URL Shortener work?Tropical
S
241

No, they don't use files. When you click on a link like that, an HTTP request is sent to their server with the full URL, like http://bit.ly/duSk8wK (links to this question). They read the path part (here duSk8wK), which maps to a record in their database. In that record, they find a description (sometimes), your name (sometimes) and the real URL. Then they issue a redirect, which is an HTTP 302 response with the target URL in the Location header.
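As a rough illustration of that lookup-then-redirect step, here is a minimal sketch using only Python's standard library. The in-memory SHORT_LINKS dict, the target URL and the names resolve, RedirectHandler and serve are all assumptions made for the example; a real service would query its database and is certainly more involved.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the service's database: short key -> real URL.
SHORT_LINKS = {"duSk8wK": "https://example.com/some/long/page"}


def resolve(path, table):
    """Map a request path such as '/duSk8wK' to (status, headers)."""
    key = path.lstrip("/").split("?", 1)[0]  # keep only the path part
    target = table.get(key)
    if target is None:
        return 404, {}
    # The redirect carries no HTML body, only a status and a Location header.
    return 302, {"Location": target}


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers = resolve(self.path, SHORT_LINKS)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()


def serve(port=8000):
    """Run the demo server locally; call this manually to try it out."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling serve() and requesting http://127.0.0.1:8000/duSk8wK should answer with a 302 and the Location header, which the browser then follows on its own.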

This direct redirect is important. If you were to use files, or first load HTML and then redirect, the browser would add TinyUrl to the history, which is not what you want. Also, the site that is redirected to will see the referrer (the site that you originally came from) as being the site the TinyUrl link is on (i.e., twitter.com, your own site, wherever the link is). This is just as important, so that site owners can see where people are coming from. This, too, would not work if a page gets loaded that then redirects.

PS: there are more types of redirect. HTTP 301 means: moved permanently. If that were used, the browser could cache the redirect and would not request the bit.ly or TinyUrl site anymore, and those sites want to count the hits. That's why HTTP 302 is used, which is a temporary redirect. The browser will ask TinyUrl.com or bit.ly again each time, which makes it possible to count the hits for you (some tiny URL services offer this).
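To make the hit-counting argument concrete, here is a toy simulation. Shortener and Browser are hypothetical classes invented for this sketch, and the browser model is deliberately simplified: it assumes every 301 is remembered forever and no 302 is ever cached, whereas real browsers decide based on caching headers as well.

```python
class Shortener:
    """Toy shortening service that counts every request it receives."""

    def __init__(self, status):
        self.status = status  # 301 or 302
        self.hits = 0

    def handle(self, key):
        self.hits += 1
        return self.status, "https://example.com/target"


class Browser:
    """Toy browser that remembers permanent redirects and skips the server."""

    def __init__(self):
        self.permanent = {}

    def visit(self, server, key):
        if key in self.permanent:
            # Cached 301: the shortener never sees this click.
            return self.permanent[key]
        status, location = server.handle(key)
        if status == 301:
            self.permanent[key] = location
        return location


def count_hits(status, clicks=3):
    """Click the same short link several times and report the server's count."""
    server, browser = Shortener(status), Browser()
    for _ in range(clicks):
        browser.visit(server, "duSk8wK")
    return server.hits
```

Under these assumptions, three clicks through a 302 are counted three times, while three clicks through a 301 are counted once.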

Sisneros answered 13/10, 2009 at 19:50 Comment(11)
Considering it's just a map, could you shed a little light on the lifetime of each shortened URL?Dash
Actually, I think Bit.ly uses HTTP 301 instead of 302 (the last I heard).Temper
Since bit.ly won't let you change where one of their URLs points to, 301 makes sense. No need to remember the bit.ly version and recheck it.Kibe
@KennyCason / @Joost Schuur: it is indeed HTTP 301 that is used, however, with a timestamp. This turns it into a "Moved" rather than a "Moved Permanently". This is a subtle difference. Because of the timestamp, the browser considers that it should check whether the resource has changed once this timeout is reached. Others, like is.gd, use a normal 301 Moved Permanently and the browser doesn't need to re-check (but often will). Finally, services like url4.eu do not redirect at all, but show you an advertisement first. With the 301, the services can still count unique visitors, but not all hits.Sisneros
@abel do these services check for duplicates and assign the id to a URL that has already been added? So if two people add google.com, should it give back the id of abc? Or is that just a smart feature?Nelrsa
@Steve: I've seen services where you can have an account to see which URLs you shortened, how many hits you have, etc. There you can also see that you can have multiple ids / short URLs for the same link. That is not to say that other services couldn't use a smarter algorithm and reuse the same id for the same URL. That would remove the possibility of monitoring hits per user, though.Sisneros
The example bitly URL is now a real one and actually redirects back to this question ;-) See bitly.com/duSk8wK+ for the info page.Musculature
I clicked the link from three different browsers and it consistently showed the same number of clicks. How does bit.ly know that these shouldn't be new 'click' counts?Euton
@Costa: there are many ways of counting clicks, it is possible that it keeps track of your network card ID, which is a way of tracking whether a request comes from the same computer. It is possible to fake or change that ID, it is not foolproof. Also, it is possible that it tracks or uses third-party cookies that have been set earlier and to the same user on different browsers, which you can check by clearing all session data and using an anonymous browser.Sisneros
If bitly uses 301 which is permanent redirection then how would it keep track of number of hits?Weese
@mad, see my comment above, it explains exactly that and also that 301 is not always permanent foreverSisneros
K
114

Others have answered how the redirects work, but you should also know how they generate their tiny URLs. You'll often hear, mistakenly, that they create a hash of the URL in order to generate that unique code for the shortened URL. This is incorrect in most cases: they aren't using a hashing algorithm (where you could potentially have collisions).

Most of the popular URL shortening services simply take the ID in the database of the URL and then convert it to either Base 36 [a-z0-9] (case insensitive) or Base 62 [a-zA-Z0-9] (case sensitive).
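The conversion itself is ordinary base conversion. A minimal sketch follows, assuming the digit order 0-9, then a-z, then A-Z; the function names and that ordering are my own choices for illustration, and a real service may well order or shuffle its alphabet differently.

```python
import string

# Assumed digit order: 0-9, a-z, then A-Z for the case-sensitive variant.
BASE36 = string.digits + string.ascii_lowercase
BASE62 = BASE36 + string.ascii_uppercase


def encode(number, alphabet):
    """Convert a non-negative database ID into a short key."""
    if number < 0:
        raise ValueError("IDs must be non-negative")
    if number == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while number:
        number, remainder = divmod(number, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


def decode(key, alphabet):
    """Convert a short key back into the database ID."""
    base = len(alphabet)
    value = 0
    for char in key:
        value = value * base + alphabet.index(char)  # ValueError on bad chars
    return value
```

With this alphabet, encode(20103, BASE36) gives "fif" and encode(20103, BASE62) gives "5ef", so the same ID gets shorter as the alphabet grows.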

A simplified example of a TinyURL Database Table:

ID       URL                           VisitCount
 1       www.google.com                        26
 2       www.stackoverflow.com               2048
 3       www.reddit.com                        64
...
 20103   www.digg.com                         201
 20104   www.4chan.com                         20

Web frameworks that allow flexible routing make handling the incoming URLs really easy (Ruby, ASP.NET MVC, etc).

So, on your webserver you might have a route action that looks like (pseudo code):

Route: www.mytinyurl.com/{UrlID}
Route Action: RouteURL(UrlID);

This routes any incoming request to your server that has any text after your domain www.mytinyurl.com to your associated method, RouteURL. It supplies the text that is passed in after the forward slash in your URL to that method.

So, let's say you requested: www.mytinyurl.com/fif

"fif" would then be passed to your method, RouteURL(String UrlID). RouteURL would then convert "fif" from base 36 to its base-10 equivalent, 20103, and a database request would be made to look up whatever URL is stored under the ID 20103 (in this case, www.digg.com). You would also increase the visit count for Digg by one before redirecting to that URL.

This is a really simplified example but you should be able to get the general idea.
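For what it's worth, the RouteURL step could be sketched like this in Python with an in-memory SQLite table. The table layout mirrors the simplified example above; make_demo_db, route_url, the column names and the key-length cap are assumptions for the sketch, not how any particular service does it.

```python
import sqlite3

MAX_KEY_LENGTH = 12  # 36**12 still fits in SQLite's signed 64-bit integer


def make_demo_db():
    """Build an in-memory table holding a few rows from the example."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE urls ("
        "id INTEGER PRIMARY KEY, url TEXT NOT NULL, "
        "visit_count INTEGER NOT NULL DEFAULT 0)"
    )
    db.executemany(
        "INSERT INTO urls (id, url, visit_count) VALUES (?, ?, ?)",
        [(1, "www.google.com", 26), (20103, "www.digg.com", 201),
         (20104, "www.4chan.com", 20)],
    )
    return db


def route_url(db, url_id):
    """Decode the base-36 key, bump the visit count, return the target URL."""
    if not (url_id.isascii() and url_id.isalnum()) or len(url_id) > MAX_KEY_LENGTH:
        return None
    row_id = int(url_id, 36)  # "fif" -> 20103
    row = db.execute("SELECT url FROM urls WHERE id = ?", (row_id,)).fetchone()
    if row is None:
        return None  # unknown key; a real handler would answer 404
    db.execute(
        "UPDATE urls SET visit_count = visit_count + 1 WHERE id = ?", (row_id,)
    )
    db.commit()
    return row[0]  # the caller would send this in a redirect response
```

Calling route_url(make_demo_db(), "fif") returns www.digg.com and raises its visit count from 201 to 202.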

Kibler answered 13/10, 2009 at 20:40 Comment(3)
Thanks for the nice explanation. So what happens when someone tries to create a short URL for an already existing long URL? Do they perform a full text search on the database? I do not think so, as it would be too time-consuming. A hash or message digest based approach looks more practical.Deservedly
@PiyushKansal you could use the hash internally to do a O(1) lookup to find duplicates; and then route the existing tiny URL for that, or could choose to generate a new one. As far as I can tell, goo.gl reuses the tiny urls for the same URL; try this on your end for this page: Do you get this >> goo.gl/8gVb8X ?Snead
How do they handle url parameters? For example www.digg.com?filter=123Scrope
J
7

As an extension to @A Salcedo's answer:

Some URL shortening services (Tinyarro.ws) go to the extreme of using Unicode (UTF-8) characters in the shortened URL, which allows a higher number of websites before an additional symbol has to be added. Since most of UTF-8 is accepted for use (IRI, RFC 3987, handled by most browsers), that bumps the count from 62 sites per symbol to ~1,112,064.

To put this in perspective, one can encode 1.2366863e+12 sites with 2 symbols (1,112,064*1,112,064). In November 2009, shortened links on bit.ly were accessed 2.1 billion times (around that time, bit.ly and TinyURL were the most widely used URL-shortening services), which is ~600 times less than what fits in just 2 symbols. So, over the full lifetime of all URL shortening services, it should last at least another 20 years before a third symbol has to be added.
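The arithmetic is easy to check. The sketch below takes 1,112,064 as the alphabet size, which is the number of Unicode code points outside the surrogate range; treating every one of them as usable in a URL is an optimistic assumption, and symbols_needed is a helper name made up for the comparison with Base 62.

```python
# All Unicode code points (0x110000) minus the 2,048 surrogates.
CODE_POINTS = 0x110000 - 0x800

two_symbols = CODE_POINTS ** 2      # keys available with two symbols
ratio = two_symbols / 2.1e9         # versus 2.1 billion accesses


def symbols_needed(capacity, alphabet_size):
    """Smallest key length whose key space covers `capacity` entries."""
    length, total = 0, 1
    while total < capacity:
        total *= alphabet_size
        length += 1
    return length


print(CODE_POINTS)                      # 1112064
print(two_symbols)                      # 1236686340096
print(round(ratio))                     # 589
print(symbols_needed(two_symbols, 62))  # 7
```

So two Unicode symbols cover roughly what seven Base 62 characters would, and the "~600 times" figure is 589 when computed exactly.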

Johanajohanan answered 13/6, 2016 at 15:45 Comment(0)
C
7

In simple words, a URL shortener maps an arbitrarily long sequence of characters (the original, long, crappy URL) into a short and slick sequence of characters. This is nothing but hashing, which is most commonly used to create lookup tables, HashMaps, MD5 hashes for cryptographic purposes, etc.
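A hash-based variant could look roughly like the sketch below. It is not taken from the demo project; the shorten function, the dict used as storage, the SHA-256 choice and the key length of 7 are all assumptions. Because a truncated hash can collide, the sketch retries with a counter mixed into the input until it finds a free key.

```python
import hashlib


def shorten(url, table, length=7):
    """Derive a short key from a hash of the URL, retrying on collisions."""
    attempt = 0
    while True:
        seed = f"{url}#{attempt}".encode("utf-8")
        key = hashlib.sha256(seed).hexdigest()[:length]
        existing = table.get(key)
        if existing is None:
            table[key] = url  # free slot: claim it
            return key
        if existing == url:
            return key  # same URL shortened before: reuse its key
        attempt += 1  # a different URL owns this key: try another seed
```

Shortening the same URL twice returns the same key, and two different URLs never share one, at the cost of an extra lookup per collision.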

To understand the URL-Shortening process I have created a demo project on GitHub and also a blog post. Do refer to this and let me know if it was helpful.

Blog Post : URL Shortening

Curt answered 23/10, 2016 at 4:33 Comment(1)
With Hashmap, won't we face potential collisions?Gehring
