URL shortener with no database
A

4

9

I'd like to write a URL shortener that doesn't have to use a database. Instead, to have as few moving parts as possible, the script would just create a unique hash for my URL based on an algorithm (like md5, except an md5 would be too long). I'm not really sure how I'd go about doing this. Any advice?

If it matters, I'd prefer to write this in Ruby.

Ananias answered 27/1, 2011 at 15:40 Comment(3)
Does not compute. A hash can't be resolved to the original content. :-) Additionally, trying to do this without a database or (better still) a key/value store is folly. - Burgwell
My end goal is to have the simplest possible URL shortener. A key/value store sounds good. How would you recommend I go about doing that part of it? - Ananias
I've added an answer with some links that will hopefully prove useful. (I don't really do Ruby, so I'm not sure what bindings there are out there for use with such things.) Incidentally, I'd love to know why someone's voted to close this question. It's perfectly legitimate. - Burgwell
B
1

Ultimately if you're after a short link, you simply need to generate a suitably legible unique code (try to omit similar letters/numbers such as zero and 'o', in case some poor bugger actually has to type it in) and associate that code with the original URL in some form of store.

Whilst I can understand why you don't want to use a database, in many ways it's the perfect form of storage, especially if you look at one of the dedicated key/value stores such as Cassandra, Redis, MongoDB, etc. (That said, a simple "traditional" SQL database may be an easy first step if you're in unfamiliar territory.)
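As a rough illustration of the "legible unique code plus a store" idea, here is a minimal Ruby sketch. The method names (`code_alphabet`, `random_code`, `shorten`, `resolve`), the 6-character code length and the in-memory Hash are all assumptions made for the example; a real service would swap the Hash for a persistent key/value store.

```ruby
require "securerandom"

# Letters and digits with look-alikes removed (0/O/o and 1/l/I).
# The exact exclusion list is an assumption for illustration.
def code_alphabet
  (("a".."z").to_a + ("A".."Z").to_a + ("2".."9").to_a) - %w[l I O o]
end

# Build a random code of the given length from the legible alphabet.
def random_code(length = 6)
  chars = code_alphabet
  Array.new(length) { chars[SecureRandom.random_number(chars.size)] }.join
end

# Associate a fresh code with the URL; retry if the code is already taken.
# `store` is any Hash-like object (hypothetical stand-in for a real store).
def shorten(store, url)
  loop do
    code = random_code
    next if store.key?(code)
    store[code] = url
    return code
  end
end

# Look the code up again; returns nil for an unknown code.
def resolve(store, code)
  store[code]
end

store = {}
code = shorten(store, "https://example.com/some/very/long/path")
resolve(store, code) # returns the original URL
```

The lookup table is what makes the short code resolvable; the code itself carries no information about the URL.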

Burgwell answered 27/1, 2011 at 15:55 Comment(0)
F
3

What you need is a way to compress and decompress a string, where the compressed result is itself a string. This is nearly impossible, because a URL is already very short. Encoding and lossless compression always add some overhead, so for most URLs the result is longer than the original.

For very long URLs, however, it may work.

So, in the end, you will almost always need a lookup-table in storage (database).

Base64 is the most logical encoding. On its own, however, Base64 returns strings that are longer than the original: it emits four characters for every three bytes of input, plus padding. So we'll also try zlib, to compress the string first.

require "uri"
require "base64"
require "zlib"

shortner_url = URI.parse("https://s.to")
long = "https://mcmap.net/q/1216267/-url-shortener-with-no-database"
url = URI.parse(long) 
stripped = url.host + url.path
stripped.length #=> 66

# Let's see that Base64 on its own does not shorten the url.
encoded = Base64.encode64(stripped)
encoded.length #=> 90

# So, let's use zlib to compress the string before encoding it.
compressed = Zlib::Deflate.deflate(stripped)
encoded = Base64.encode64(compressed)
encoded.length #=> 94 
# It became worse.

# Now, with a long URL (they can be much longer still), as a one-liner; to keep it simple, we omit the stripping part:
long = "http://www.thelongestlistofthelongeststuffatthelongestdomainnameatlonglast.com/wearejustdoingthistobestupidnowsincethiscangoonforeverandeverandeverbutitstilllookskindaneatinthebrowsereventhoughitsabigwasteoftimeandenergyandhasnorealpointbutwehadtodoitanyways.html"
long.length #=> 263
Base64.encode64(Zlib::Deflate.deflate(long)).length #=> 228

# In order to turn this into a valid short URL, however, we need `urlsafe_encode64()`

shortner_url.path = "/" + Base64.urlsafe_encode64(Zlib::Deflate.deflate(long))
shortner_url.to_s #=> "https://s.to/eJxNjkEWwyAIRG-U7HsbElFpEPIE68vti6t2BcwbZn51v1_7PufcvCKrFDRnMtf8u81HzuA_IWkDEoGG4EtiMN9ObftE6Pgey0FSvK6gIx7GTUl0GsmJSz1Biqpk7fjBDpL-xjGcopKYWfWyiySBRBFJABw9UnB9xaWj1LDCQWUGAQYzBVLECPbyxFLBJDqA7-DxSJ5YIbkGnoM8Ex7bqjf-AiodbYM="
shortner_url.to_s.length #=> 237 WE SAVED 26 characters!

Note on stripping: you can remove the 'https://' prefix. A real implementation would then need to add a marker to the string to record the scheme: '1' + result for https, '0' + result for http. Another "hack" would be to make the URL-shortening service use http for http URLs and https for https URLs.

If you always have the same domain, you can discard the domain part too.

If the URL has a lot of slashes, or other repeated characters such as dashes, the compression works better.
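For completeness, here is a sketch of the full round trip, including the decoding step and the scheme marker from the note on stripping. The helper names `pack_url` and `unpack_url` are made up for this example, and it assumes the input always starts with `http://` or `https://`.

```ruby
require "base64"
require "zlib"

# Hypothetical helper: strip the scheme, compress, and encode URL-safely.
# The first character of the token records the scheme: "1" = https, "0" = http.
def pack_url(url)
  scheme, rest = url.split("://", 2)
  flag = scheme == "https" ? "1" : "0"
  flag + Base64.urlsafe_encode64(Zlib::Deflate.deflate(rest, Zlib::BEST_COMPRESSION))
end

# Hypothetical helper: reverse pack_url without any lookup table.
def unpack_url(token)
  scheme = token[0] == "1" ? "https" : "http"
  rest = Zlib::Inflate.inflate(Base64.urlsafe_decode64(token[1..-1]))
  "#{scheme}://#{rest.force_encoding("UTF-8")}"
end

token = pack_url("https://example.com/a/b/c?x=1&y=2")
unpack_url(token) # returns the original URL
```

This needs no storage at all, but as shown above the token only becomes shorter than the input for unusually long URLs.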

Fishing answered 31/7, 2019 at 18:27 Comment(0)
A
2

You could do this with several of the available string-manipulation tools, transforming a URL into something obscured. However, as you noted in your question, the URLs you get from doing this would be longer than is typical for a URL shortener.

URLs don't compress very well.

Aleedis answered 27/1, 2011 at 15:51 Comment(0)
S
0

You won't be able to resolve the original URL from a hash code without looking it up in some kind of database.

About the only thing you can do without a database is compress the URL and then decompress it when you resolve the URL.

Strictly speaking, I guess you could just hash the URL. But of what possible value would that be if you are not able to resolve it back to the original URL?
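To make the point concrete, here is a small sketch of what hashing does and does not give you. The `hash_code` and `remember` helpers, the 8-character truncation and the plain Hash used as a lookup table are all assumptions for illustration.

```ruby
require "digest"

# Derive a short, deterministic code by truncating a SHA-256 hex digest.
# The code cannot be turned back into the URL on its own.
def hash_code(url, length = 8)
  Digest::SHA256.hexdigest(url)[0, length]
end

# Store the URL under its hash code so it can be resolved later.
# `table` is a Hash standing in for whatever lookup storage is used.
def remember(table, url)
  code = hash_code(url)
  existing = table[code]
  # Truncated digests can collide; a real service would need a fallback here.
  raise "collision for #{code}" if existing && existing != url
  table[code] = url
  code
end

table = {}
code = remember(table, "https://example.com/some/very/long/path")
table[code] # the only way back to the original URL is this lookup
```

The same URL always yields the same code, which is convenient, but resolving the code still depends entirely on the table.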

Sutphin answered 27/1, 2011 at 15:48 Comment(2)
That's what I was going for but didn't say right: I want to compress and decompress the URL. - Ananias
I think that's a bad idea. You can check out the many compression algorithms, but with such a short input many would produce output longer than the original URL. Also, many produce characters that are not valid within a URL. There's a very good reason all URL shorteners out there use a database. - Sutphin
