How can I generate an order number with similar results as Amazon when they do it?
Asked Answered
R

2

8

Note: I have already read through older questions like What is the best format for a customer number, order number? , however my question is a little more specific.

Generating pseudo-random numbers encounter the "birthday problem" before long. For example, if I am using a 27-bit field for my order number, after 15000 entries, the chances of collision increase to 50%.

I am wondering whether large ecommerce businesses like Amazon generates its order number in any other way - for example :

  • pre-generate the entire set and pick from them randomly (a few hundred GB of database)
  • Use lexicographical "next_permutation" starting from a particular seed number
  • MD5 or SHA-1 hash of the date, user-id, etc parameters, truncated to 14 digits
  • etc

All I want is a non-repeating integer (doesnt need to be very random except to obfuscate total number of orders) of a certain width. Any ideas on how this can be achieved ?

Renny answered 22/8, 2012 at 4:54 Comment(11)
Are you asking how Amazon does it or how you can do it?Excommunication
for something this big you need a mathematician and/or an engineer, you can't expect to solve a problem like this in a Q&A section.Uhl
"if I am using a 27-bit field for my order number, after 15000 entries, the chances of collision increase to 50%." Use a longer number then ... 27 bits are not even five characters.Shadow
@Excommunication - I'm asking how I can do it. Amazon was the closest example I saw. I'm not looking to reverse engineer their algo. Thilo - well, I used it to illustrate that one needs to be a bit more careful with random numbers. I just want to understand how one can be more rigorous about the generation methodology.Renny
@Sandeep: If you query the database before inserting a new order number, your chances of a collision are zero. If your task is to reduce the number of database operations, you can create a pool of order numbers within your database and pop one out of the queue when you need one.Excommunication
@Excommunication - I know a DB will ensure primary key uniqueness. But remember that this is for a transactional system. I would strongly prefer to have a deterministic latency for order creation. If I have even 10% database insert fails because of collisions, it would be suboptimalRenny
10% is a very high estimate. If your hashing algorithm outputs 14 essentially random alphanumeric characters, the chance of a collision is 1/36^14, which is definitely smaller than 0.1. The only way for you to make an algorithm like this is either to query the database to check for a collision every time you generate a new order number, or make a secret bijection that maps order numbers to the actual transaction IDs.Excommunication
Blender - that is incorrect. Please check the "birthday problem" or the "birthday attack". The percentage of collision is much higher.Renny
To avoid collisions, I'd imagine that their internal order # (1,2,..., 25534435,...) is mapped 1-to-1 to a 17 digit order number that they display to customers.Calefacient
Did you get any solution for this question ? I am also curious to know.Seaborg
@stom not really - would appreciate it if you end up finding something and posting here.Renny
H
5

Suggest starting with the date in reverse format then starting at 1, followed by a check (or random) digit. If you are likely to never exceed 100 orders per day you need add two digits plus a check/random digit.

The year need include only the final two digits, possibly only the final digit, depending on how long you keep records of orders: 7 years or so is usually enough, meaning the records from 2009 (beginning with 9) could be deleted during 2018 in preparation to use the order numbers again in 2019. You could use mmdd for the next 4 digits, or simply number the days through the year and use just 3 digits - it depends how human-friendly you want the number to be. It's also possible just to omit the day of the month and restart the sequential numbers at the start of each month, rather than every day.

Today is 2 Nov 2017, let's suppose this is order no 16 today, your order no would be 71102168 (where the 8 is a check digit or random digit). If you're likely to have up to, but not exceeding a thousand, you'll need an extra digit, thus: 711020168. To avoid limiting yourself the number of digits, you might prefer to use a hyphen: 71102-168 … you could include another hyphen before the check/random digit if you wish: 71102-16-8.

If you have several areas dealing with orders, you may wish to include a depot number, perhaps at the beginning or after the date, allowing you to use the sequence numbers at each depot - eg depot 5 might be: 5-71102-168, 71102-5-168 or 711025168. Again, if you don't use hyphens, you'll need to assess whether you need up to ten, a hundred or a thousand (etc) possible depot numbers. I hope this helps!

Hyperbaric answered 2/11, 2017 at 11:59 Comment(0)
S
0

This problem has been solved, why not use the UUID. See RFC 4122. These are close enough to globally unique you can easily combine many systems and never ever have a duplicate just because the number space is so massive.

Structure answered 7/6, 2022 at 22:6 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Jeth

© 2022 - 2024 — McMap. All rights reserved.