Algorithm for Gift Card Codes
Asked Answered
M

13

22

I recently posted this question about codes for a gift-card-like voucher that users can redeem online. I wanted to find the best tradeoff between large keyspace, low guessability, and human readability. Now that I'm into implementation I realize I've got another problem altogether, more of an algorithmic challenge.

Let's assume I adopt some code format - say 10 characters from A to Z for simplicity, and I start generating vouchers. What is the correct algorithm to do this?!

My first approach is to number all possible codes from 0 to 308,915,776, then start generating random numbers in that range. This obviously has a big problem though - I have to check my random number against all previously generated voucher codes and if it collides with an existing one I'll have to discard the code and try another. As the system accumulates more data it will slow down. At the extreme when there is only one code left it will be nearly impossible for the system to guess it correctly.

I could pre-generate all codes and shuffle them, then consume them in order. But this means I have to store many codes, and in fact my keyspace is bigger than the one i described, so we're talking about a very large amount of data. So that's also not too desirable.

So this leaves me with using the codes sequentially. I do not want guessable voucher codes though. The user who buys voucher "AAAAAAAAAY" should not have a good chance of getting another valid code if they type in "AAAAAAAAAZ".

I can shuffle my alphabet and my positions so that instead of

'ABCDEFGHIJKLMNOPQRSTUVWXYZ' i use

'LYFZTGKBNDRAPWEOXQHVJSUMIC'

and so that instead of positions

9 8 7 6 5 4 3 2 1 0 the positions are

1 8 0 7 5 4 3 9 2 6

Using this logic, given the code

LNWHDTECMA

the next code would be

LNEHDTECMA

This is definitely way less guessable. But they're still only one character off from each other, and given just two of these vouchers you would know which position is incrementing, and you would have a 90% chance of getting the next code in 24 guesses or less.

My "escape hatch" is to ditch all this and go with GUIDs. They have more characters than I wanted my users to have to type in, and contain similar characters like I/1 and O/0, but they magically make all of the above headaches go away. Still, I'm having fun thinking about this, maybe you are too. I'd love to hear some alternate suggestions. What have you got?

Thanks!

Mara answered 18/1, 2010 at 15:19 Comment(7)
"As the system accumulates more data it will slow down." Well, sort of. It might slow down with log N. If so, a check that took 10 msec when you had 1,000 cards will take 30 msec when you have a billion — hardly worth worrying about. Keep in mind that every time the gift card is used, you're going to incur exactly the same kind of lookup, and there's absolutely nothing you can do to avoid that.Tani
GUID's absolutely do not "magically" make collisions impossible - they are just long enough that collisions become so unlikely that it doesn't matter for nearly all purposes - and there are different types of GUID, some of which are quite predictable.Salts
M. Borgwardt, I was not suggesting that GUIDs actually use magic to make collisions impossible. Everybody knows they use pixie dust.Mara
Hmm which answer do I pick, the most technically correct, most creative, or the one I'm really going to use? Everybody gets upvotes anyhow! Thanks.Mara
Note that in your shuffled alphabet example, the two generated codes are only 1 letter different, and the two letters ("W" and "E") are right next to each other on a standard US keyboard! What I'm saying is, you might want to watch out for codes that could be mis-typed by a single character. Perhaps ensure that each code generated is at least 2 characters different from every other code. It will take time, but computers are pretty fast these days...Stepdame
I agree with M. Borgwardt, and thanks for bringing my attention to the probabilities involved with my keyspace. My real-life project, which uses a 12-character string with a 32-character alphabet, will allow me to generate a sufficiently large number of vouchers before the possibility of too many collisions becomes a problem. Thanks everybody for all the good ideas!Mara
Amazon says to recover unreadable gift card code: Solve the issue here: If the claim code on your Amazon.com Gift Card is unreadable, please contact us. You’ll need to provide 3 consecutive characters from anywhere in the claim code, other than the first 2 or last 4 characters. They also ask for the serial card #, looks like there are several layers to recovery although you don't need the card number to redeem just claim code.Visitation
S
15

The likelihood of two randomly generated code colliding is basically the same as a user guessing a valid code - and you cannot prevent users from guessing. So you must have a key space so much larger than the number of actually used codes that random collisions are extremely unlikely as well (though, thanks to the birthday paradox, probably not unlikely enough to ignore them completely, at least if you want your codes to be reasonably short), and checking against existing codes and re-generating in case of a collision is a perfectly viable strategy.

Salts answered 18/1, 2010 at 15:33 Comment(3)
+1 for storing and regenerating on collision. To get to a 50% chance of regenerating an individual key with just 10 uppercase letters, you'd have to have (26^10)/2 keys stored which is around 70.5 trillion, so the likelihood of having to regenerate so many times that it becomes infeasible is negligible. 90% of the time (made up number) you're not going to have to regenerate.Maxson
If you had to regenerate even 10% of the time, it would mean that a random guess has a 10% chance of hitting a valid code - you want that likelihood to be much lower. I'd say that making the code space 1000 times larger than the number of codes you could theoretically expect to need should be OK.Salts
That will be fun code to test. Once executed in a lifetime and then probably broken.Marrano
S
11

Use an N-bit serial number R, combined with an M-bit hash H of the concatenated pair (R, S) where S is some secret "salt" S which you do NOT publish. Then encode the pair (R,H) alphanumerically in any reversible way you like. If you like algorithms like MD5* or SHA, but the bit count is too high, then just take the M least significant bits of a standard hash algorithm.

You can verify easily: decode the alphanumeric encoding so you can see R and H. Then compute H' = hash(R+S) and verify that H = H'.

edit: R can be an incrementing serial number or random or whatever, just make sure you use each value not more than once.

*before someone says "MD5 is broken", let me remind you that the known weaknesses for MD5 are collision attacks, and not preimage attacks. Also, by using an unpublished, secret salt value, you deny an attacker the ability to test your security mechanism, unless he/she can guess the salt value. If you feel paranoid, pick two salt values Sprefix and Ssuffix, and calculate the hash of the concatenated triple (Sprefix,R,Ssuffix).

Scuppernong answered 18/1, 2010 at 17:19 Comment(2)
But what advantage does a cryptographic hash have over random numbers?Tani
simplicity! pseudorandom numbers used for this purpose may be unguesssable, but they would still need to be unique and verifiable.Scuppernong
I
5

Some random number generators have an interesting property: Used right they do not generate duplicate numbers in a long time. They produce something called a full cycle. Use one of the algorithms described there, seed it, and you will have many unique numbers,

Add a smart way to map digits to characters and you got your codes.

Illuse answered 18/1, 2010 at 15:27 Comment(12)
Be aware that this - in some specific circumstances - can pose a serious security risk. If an attacker manages to get his hands on many vouchers that he knows have been generated subsequentially he can obtain the codes for all the vouchers by "reverse engineering" (probably not the technical term for this) the pseudo-RNG.Uxorial
@Andreas - And if you use a salt value: voucher(previousVoucher + salt)?Marrano
t. jung - could you salt the value without losing the "full cycle" property of the algorithm?Mara
@Barry - It would not mess with the RNG if: voucher(previousVoucher, salt) = rng(previousVoucher) + salt. The question is if this makes the approach safer. If you use only the RNG and it and one voucher is known it is broken.Marrano
@Thomas is that any better than just adding some random junk to the end of the code and then stripping those digits off before checking it?Teresita
Martin: Yes. Checking the random junk is the important part because that's the part that an attacker, however clever, can't predict.Tani
@Martin - I would say yes. Adding and removing junk can be seen as part of the RNG algorithm. The approach has to work even when the RNG and vouchers are known. So you have to add a secret that is not known and will not be communicated. I would like to hear from someone if the salt approach is dumb idea.Marrano
Just use a cryptographically secure PRNG (en.wikipedia.org/wiki/…); don't listen to this salt nonsense. An example of a good generator would be SHA1-HMAC(LongSecretPassphrase, counter). Even our best super-computers have yet to find two SHA1 values that hash to the same thing, so you are unlikely to do it by accident generating a few hundred-thousand gift card codes.Kilovolt
See #1650685 for in implementation of the RNG idea I did a while back.Instruct
@BlueRaja that avoids the collision by having a hash space much larger than the gift-card number space. Which is what we are trying to avoid. Yes it's difficult to calculate what to add to a known file to generate a SHA1 collision, but if you simply start hashing consecutive numbers - you are back to the birthday paradoxTeresita
@Martin: As stated by Michael above, you must have a keyspace much larger than your gift-card space. As for the birthday paradox, that is a problem with or without a "salt": The birthday paradox states that you need to hash 2^80 SHA1 values before you have a 50% probability of having a collision. That's 1 hash every nanosecond for 33 million years. That's exactly the reason no one has even found two values that hash to the same thing.Kilovolt
But Sha1 means you need a 160bit result to code for a 3-4 digit gift card number - the OP wanted a way of having a series of much smaller (easier to type) codes without a collision. Thats why the RNG is a good solution.Teresita
M
4

I would say to use a "perfect hash" - http://en.wikipedia.org/wiki/Perfect_hash_function combined with a 4-digit random number...

So just increment your voucher code each time, then hash it, add a 4 digit random number and I would also add a check digit to the end (as Alix Axel suggested).

This would be very secure with no clashes - for example if someone worked out your hashing algorithm, they would also have to guess the 4-digit code at the end...

Machination answered 18/1, 2010 at 15:51 Comment(0)
E
4

Programming Pearls has several examples of algorithms to generate sets of random numbers, you should read it if you're interested in this kind of problem.

The book shows that if you generate m random numbers with value less than n, the simple approach of generating numbers and throwing out duplicates will generate no more than 2m random numbers if m < n / 2. Here it is, in C++:

void gensets(int m, int n)
{
    set<int> S;
    set<int>::iterator i;
    while (S.size() < m) {
        int t = bigrand() % n;
        S.insert(t);
    }
    for (i = S.begin(); i != S.end(); ++i)
        cout << *i << "\n";
}

Obviously, if you're worried about people guessing values, you will want m to be much less than n / 2.

There's even a set-based algorithm to generate m random numbers less than n with each value being equally likely, no duplicates, and a guarantee not to generate more than m random numbers:

void genfloyd(int m, int n)
{
    set<int> S;
    set<int>::iterator i;
    for (int j = n-m; j < n; j++) {
        int t = bigrand() % (j+1);
        if (S.find(t) == S.end())
            S.insert(t); // t not in S
        else
            S.insert(j); // t in S
    }
    for (i = S.begin(); i != S.end(); ++i)
        cout << *i << "\n";
}

The order of the numbers isn't random, though, so this is probably not a good choice for you.

Espino answered 18/1, 2010 at 15:57 Comment(0)
S
3

I read the whole comment and I found out something many people in other to protect use very clever and sophisticated means. the chances of getting a guess on my algorithm is 1/2600000 all you have to do is to change the salt prefix salt suffix after each generation

  • I chose a salt prefix of 4 numbers
  • and suffix of 4 numbers
  • then the main code is 9 numbers interchangeable
  • then using this format sprefix +random_numbers+ssuffix
  • I'll hash the format storing it into database immediately
  • the query can help to remove similar codes
  • and the suffix and prefix should be changed once you very printed 9! (362880) times.
Shortly answered 7/6, 2014 at 10:55 Comment(1)
Please format your answer to be readable; it's not bad, but currently all lumped in one wall of text.Frech
U
2

I answered the other question too :P

The best way is to generate one alphanumeric character at a time, randomly, until you have 8 of them. This will then be your voucher.

Ideally the best way would be to choose a sequence long enough so that you can safely assume if there will be any duplicates. Do note that, perhaps counter-intuitively, this happens more often than you think because of the Birthday problem.

For example, with 8 characters you have 1785793904896 possible combinations, but if you generate only 1,573,415 vouchers you will have a 50% chance to have a duplicate.

So, it all depends on how many you want to generate, and the maximum length of the code you're comfortable with. If you are generating many and you want to keep it short, you should save the ones you previously generated, and check against the database for duplicates.

Uxorial answered 18/1, 2010 at 15:24 Comment(6)
Hello again Andreas! I agree that for a sufficiently-sized keyspace the odds start to look pretty good. I guess I just have my computer scientist hat on right now and want a general purpose approach that is guaranteed to work - not just something that is likely to work given the current inputs.Mara
That's not completely true about GUIDs at least for v1 algorithm, unless you generate more than (I think) 4 per ms and assuming the machine that's generating it has a device with a valid MAC address you're guaranteed uniqueness due to the timestamp component of the GUID as well as the MAC address component (and then a few random things thrown in). Other versions may not have a strict guarantee and may indeed be close enough to guaranteed by the sheer size of the GUID pool.Maxson
If there are 1785793904896 possible combinations, then after generating about 1,573,415 at random, without testing for duplicates, the probability of collision reaches 0.5. See en.wikipedia.org/wiki/Birthday_problem .Tani
-1: The birthday paradox means that the probability of a collision increases dramatically with the number of used codes and starts approaching 50% somewhere around 1.3 million codes. GUIDs get around this by being just ridiculously long (too long to be practical for this purpose)Salts
@Davy8: Well, still, take a look here: en.wikipedia.org/wiki/…Uxorial
Jason/Micheal: you're right! I'll edit my question to reflect thisUxorial
T
2

This is a summary of the best bits of all the other answers. :)

You need to generate gift card numbers that are:

  • unique
  • unguessable

Random numbers are unguessable but not necessarily unique. The numbers produced by various algorithms are unique but guessable (the algorithm can be reverse-engineered). I don't know of a single algorithm that gives both properties, and because of the need to defy reverse engineering, it falls in the domain of cryptography. Non-experts, of course, shouldn't try to design cryptosystems.

Fortunately you don't have to get both properties from the same algorithm. Your gift card codes can consist of two parts: a part that is unique (generated using a linear congruential generator, perhaps, or modulo arithmetic, or even just an integer that you increment each time) and a part that is unguessable (just random numbers).

Tani answered 18/1, 2010 at 22:23 Comment(2)
You've noted "unique" and "unguessable" but you're missing the property of verifiability. Your solution would work, although it would require a database storage to map each unique ID to a single valid random number.Scuppernong
Jason S: You have to store the codes in a database anyway, to record the amount of the gift and the amount already spent.Tani
C
1

I think the best way to go is that suggested by Andreas. But my answer is about an interesting related discussion.

You want to generate a sequence of numbers that together form a permutation of S = {1, ..., MAX}. One way to do this is to take the elements of a cyclic group over S. For example, the numbers R = {x modulo p, x^2 modulo p, x^3 modulo p, ..., x^(p-1) modulo p} form a cyclic group over {1, ..., p-1}, provided p is a prime and x is coprime to p. So if you choose MAX as a prime number you do use this sequence.

You want a "tough-to-crack" sequence. A generator for the sufficiently-tough-to-crack sequence is called a pseudorandom generator (ofcourse you probably don't need that tough-to-crack). An example is the last digit of elements in R above, provided p is kept secret (am I correct?). But the answer by Andreas already uses a source of (pseudo-) random numbers, so cannot be called a pseudorandom generator.

If you are interested in pseudorandom generators, they are discussed in detail in volume 2 of Knuth's well-known book.

Coition answered 18/1, 2010 at 15:38 Comment(0)
S
1

Based on Jason Orendoff's answer, I put together an algorithm to generate gift card codes. Basically, it has two 40-bit numbers: one of them to assure it is unique and another to assure it is hard to guess.

  • the 40-bit random number part is enough for 1 in 2^40 chances of guessing;
  • the 40-bit sequential number part is enough for 34.8 years of uniqueness (assuming we generate one gift card per ms.)

The total 80-bit sequence is then converted to a 16-character string using Base32.

import java.security.SecureRandom;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.commons.codec.binary.Base32;

public class GiftCardUtil {

    private AtomicLong sequence;
    private Random random;

    public GiftCardUtil() {
        // 1325383200000L == 1 Jan 2012
        sequence = new AtomicLong(System.currentTimeMillis() - 1325383200000L);
        random = new SecureRandom();
    }

    public String generateCode() {
        System.out.println(sequence.get());
        byte[] id = new byte[10];
        longTo5ByteArray(sequence.incrementAndGet(), id);
        byte[] rnd = new byte[5];
        random.nextBytes(rnd);
        System.arraycopy(rnd, 0, id, 5, 5);
        return new Base32().encodeAsString(id);
    }

    private void longTo5ByteArray(long l, byte[] b) {
        b[0] = (byte) (l >>> 32);
        b[1] = (byte) (l >>> 24);
        b[2] = (byte) (l >>> 16);
        b[3] = (byte) (l >>> 8);
        b[4] = (byte) (l >>> 0);
    }
}
Salep answered 27/4, 2012 at 22:53 Comment(0)
B
1

What may work effectively is simply using the time of creation to your advantage. Say, last two digits of the year, the two digit month, the two digit day, the two digit hour, two digit minutes, two digit seconds, then carry the seconds out to, say, the microsecond. If further obfuscation is desired, have them prescrambled (e.g. MYmdshhdMmYs instead of YYMMddhmmss). Then change the base (to pentadecimal, perhaps) to turn away any guessing attempts further. This caries two major benefits: 1-Using the date, including the year, will destroy any duplication as the same time will not pass twice. Only after a hundred years is there any risk. The only concern is potentially having two created at the same microsecond, for which it would be a simple task to disallow the creation of more than one at a time. A millisecond delay would fix the problem.

2-Guessing will be very difficult. Not only is figuring out what base and order the numbers (and letters!) are in going to be a daunting task, but going out to the microsecond makes sequence largely irrelevant. Not mention how hard it would be for a customer to figure how what microsecond they bought at and how their clock matches up with yours.

The objection may be "Wait! That is 17 digits (YYMMDDhhmmss.sssss) but brought out to a larger base afterwards would diminish it. Going to base 36, using 10 numbers and 26 letters, means a 11 digit code would cover every possibility. If uppercase and lowercase are not interchangeable, the data can be compressed to the goal of 10 digits with zero problems.

Barretter answered 25/3, 2013 at 5:13 Comment(1)
Nichi, this is an interesting answer. My program will be generating vouchers in batches, so I might run off 1000 of them at once and their creation times will be very close to each other (probably at least a microsecond apart though!) If I ran the timecode through a Perfect Hash this could be a reasonable solution.Mara
J
0

Here is a though:

  • ID = each voucher has an unique (auto-incremented?) ID
  • CHECKSUM = apply N iterations of the Verhoeff or Luhn algorithm on the ID
  • VOUCHER = base convert the generated CHECKSUM from base 10 to base 36

See also this related SO question: Ideas to create a small (<10 digits), not (very) secure “hash”.


One simple way to make this method more secure is to use a non auto-incremented ID value, one option might be to use ID as the last 6 or 7 digits of a UNIX timestamp and calculate the checksum.

Jernigan answered 18/1, 2010 at 15:39 Comment(5)
I'm not familiar with those algorithms but i'll go read about them, thanks.Mara
So basically you depend on people trying to guess codes to not know N and which algorithm you're using? This is called "security through obscurity", and not good.Salts
@Michael: Indeed, but he stated he wanted "low guessability" not "no guessability at all".Jernigan
If you can figure out how to generate valid codes at will, it's not even guessing anymore - the security is simply broken. And all it takes is one person to figure it out (or know it, e.g. a disgruntled employee), and a few days later 10,000 people have downloaded a voucher generator...Salts
Using timestamps is even worse - no less predictable, but now you're generating duplicates all the time.Salts
T
0

I second the use of a cryptographic hash—taking bits from MD5 is very simple. To make things readable, I hit on the following idea: take a list of words, and use bits of the key to index a list of words. My word list is about 100 000 words, so about 16 bits per word, which for four words gives a 64-bit keyspace. The results are usually quite readable.

For example, the cryptographic signature of the preceding paragraph is

kamikaze's freshet mansion's expectorating

(My word list is tilted toward a larger keyspace; if you want shorter phrases, you have fewer words.)

If you have an MD5 library handy, this strategy is very easy to implement—I do it in about 40 lines of Lua.

Temptation answered 22/1, 2010 at 20:2 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.