What's the best way to generate a short hash string from a longer string
Asked Answered
P

2

6

I'm trying to create short non-colliding strings from longer strings in Ruby. What's the best way to do this? Base64 encode a MD5 hash?

This is the use case:

loop do
  key = short_hash("#{user_id}-#{timestamp}")
  break if $redis.setnx(key, "0")
end

I don't want key to be too long.

Petersburg answered 17/2, 2011 at 0:42 Comment(4)
There's a bunch of questions on this site about similar topics. Try searching for hashing topics. Here's one: https://mcmap.net/q/965341/-developing-a-url-shortener/…Banger
@Sugerman: That question is in Python.Burrton
What you might gather from the response in that (and other) threads if you read them is that the "best way" to do this is language agnostic. Choose your hashing algorithm first, and then worry about the language-specific implementation.Banger
Understood -- I looked through the site and couldn't find a recommendation for a hashing algorithm that produces "short" (less than, say, 12 characters) strings. I completely understand that collisions are possible with a short hash, but my loop would ensure that collisions get tried again.Petersburg
M
5

I often use a SHA has for this similar to the example you have. It's not guaranteed to be unique, but is usually good enough for most purposes:

require 'digest/sha1'
Digest::SHA1.hexdigest("#{user_id}-#{Time.now.to_i}-#{rand}")

The ruby UUID gem is another option.

But in your specific case since you're using redis, why not just use the redis INCR command? Then you can guarantee the uniqueness at least within your database. For example:

unique_key = $redis.incr('users:next')
Mullin answered 17/2, 2011 at 3:11 Comment(2)
Hmm I was thinking about using incr, but I need to store a value for the unique_key... I guess I could do uid = $r.incr('uids'); $r.set(uid, value)Petersburg
So I ended up going with incr -- but as for my original question, I was hoping to have a shorter hash than a Digest::SHA1.hexdigest. I guess I could use base64 encoding...Petersburg
M
4

You can use a hash function to create shorter strings that aren't likely to collide. However, the Pigeonhole principle guarantees that you will be able to find two longer strings that will hash to the same value.

To generate truly unique values, you may have to assign a sequential identification number. But this would also require that you keep track of which identification number you have associated with which input string.

Madriene answered 17/2, 2011 at 0:49 Comment(1)
Sorry, I forgot to mention that I will be checking for collisions and retrying. I just want to avoid "retries" as much as possible.Petersburg

© 2022 - 2024 — McMap. All rights reserved.