Engineer at the Heroku API team here: we went with the simplest approach to generate app names, which is basically what you suggested: keep arrays of adjectives and nouns in memory, pick an element from each at random and combine it with a random number from 1000 to 9999.
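For anyone who wants to see it spelled out, here is a minimal Ruby sketch of that approach. The word lists are shortened, made-up stand-ins, and the hyphen separator is an assumption for illustration, not our actual code.

```ruby
# Hypothetical, shortened word lists kept in memory (assumed for illustration).
ADJECTIVES = %w[quiet shrouded vast blooming].freeze
NOUNS      = %w[river meadow harbor summit].freeze

# Pick one adjective and one noun at random and append a number from 1000 to 9999.
# The "-" separator is an assumption.
def random_name
  "#{ADJECTIVES.sample}-#{NOUNS.sample}-#{rand(1000..9999)}"
end

puts random_name
```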
Not the most thrilling code I've written, but it's interesting to see what we had to do in order to scale this:
At first we were picking a name, trying to INSERT and then rescuing the uniqueness constraint error to pick a different name. This worked fine while we had a large pool of names (and a not-so-large set of apps using them), but at a certain scale we started to notice a lot of collisions during name generation.
To make it more resilient we decided to pick several names and check which ones are still available with a single query. We obviously still need to check for errors and retry because of race conditions, but with so many apps in the table this is clearly more effective.
It also has the added benefit of providing an easy hook for us to get an alert if our name pool is running low (e.g. if 1/3 of the random names are taken, send an alert).
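A rough sketch of the batch check, under loud assumptions: `first_available` and `alert_ratio` are hypothetical names, and the `taken` Set stands in for the single availability query against the apps table. The real thing would still need to rescue the uniqueness error on INSERT and retry.

```ruby
require "set"

# `taken` is an in-memory stand-in for the result of one query that asks
# the database which of the candidate names already exist (assumption).
def first_available(candidates, taken, alert_ratio: 1.0 / 3)
  used = candidates.select { |name| taken.include?(name) }

  # Alert hook: warn when the share of taken candidates reaches the threshold.
  if used.size.fdiv(candidates.size) >= alert_ratio
    warn "name pool may be running low: #{used.size}/#{candidates.size} candidates taken"
  end

  # Returns nil when every candidate is taken, so the caller can try a new batch.
  (candidates - used).first
end

taken = Set["quiet-river-1234"]
puts first_available(%w[quiet-river-1234 vast-summit-5678 blooming-meadow-9012], taken)
```

Returning nil instead of raising keeps the retry decision with the caller, which has to loop anyway because of the race between the check and the INSERT.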
The first time we had issues with collisions we just radically increased the size of our name pool by going from 2 digits to 4. With 61 adjectives and 74 nouns this took us from ~400k to ~40 million names (61 * 74 * 9000).
But by the time we were running 2 million apps we started receiving collision alerts again, and at a much higher rate than expected: about half of the names were colliding, which made no sense considering our pool size and the number of apps running.
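To put numbers on why that looked wrong, here is a back-of-the-envelope calculation in Ruby. It assumes a uniform generator, that the 2-digit suffix ran from 10 to 99, and that every one of the 2 million apps holds a generated name, so the result is an upper bound.

```ruby
adjectives = 61
nouns      = 74

# Pool sizes: word combinations times the count of possible number suffixes.
two_digit_pool  = adjectives * nouns * (10..99).size     # 406_260 (assumes suffixes 10..99)
four_digit_pool = adjectives * nouns * (1000..9999).size # 40_626_000

# With a uniform generator, the chance that a fresh name is already taken
# is roughly the fraction of the pool in use.
apps = 2_000_000
expected_collision_rate = apps.fdiv(four_digit_pool)

puts two_digit_pool
puts four_digit_pool
puts format("%.1f%%", expected_collision_rate * 100) # prints "4.9%"
```

So roughly 1 in 20 names should have collided, not 1 in 2.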
The culprit, as you might have guessed, is that rand is a pretty bad pseudorandom number generator. Picking random elements and numbers with SecureRandom instead radically lowered the number of collisions, making it match what we expected in the first place.
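The swap looks roughly like this. Again a sketch with made-up word lists and an assumed hyphen separator; only `SecureRandom.random_number` is the real standard-library call.

```ruby
require "securerandom"

# Hypothetical, shortened word lists (assumed for illustration).
ADJECTIVES = %w[quiet shrouded vast blooming].freeze
NOUNS      = %w[river meadow harbor summit].freeze

def secure_random_name
  # SecureRandom.random_number(n) returns an integer in 0...n.
  adjective = ADJECTIVES[SecureRandom.random_number(ADJECTIVES.size)]
  noun      = NOUNS[SecureRandom.random_number(NOUNS.size)]
  number    = 1000 + SecureRandom.random_number(9000) # 1000..9999
  "#{adjective}-#{noun}-#{number}"
end

puts secure_random_name
```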
With so much work going into scaling this approach we had to ask whether there's a better way to generate names in the first place. Some of the ideas discussed were:
Make the name generation a function of the application id. This would be much faster and avoid the issue with collisions entirely, but on the downside it would waste a lot of names on deleted apps (and damn, we have A LOT of apps being created and deleted shortly after as part of different integration tests).
Another option to make name generation deterministic is to have the pool of available names in the database. This would make it easy to do things like only reusing a name 2 weeks after the app was deleted.
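As a purely hypothetical illustration of the first idea (we haven't built this), a name can be derived from an id by treating the id as a mixed-radix number. This assumes small sequential integer ids, made-up word lists and a hyphen separator; `name_for_id` is an invented helper.

```ruby
# Hypothetical, shortened word lists (assumed for illustration).
ADJECTIVES = %w[quiet shrouded vast blooming].freeze
NOUNS      = %w[river meadow harbor summit].freeze
NUMBERS    = 9000 # count of suffixes in 1000..9999

# Map an integer id to exactly one name; distinct ids inside the pool
# can never collide because the mapping is one-to-one.
def name_for_id(id)
  pool = ADJECTIVES.size * NOUNS.size * NUMBERS
  raise ArgumentError, "id #{id} is outside the name pool" unless (0...pool).cover?(id)

  rest, number_index          = id.divmod(NUMBERS)
  adjective_index, noun_index = rest.divmod(NOUNS.size)
  "#{ADJECTIVES[adjective_index]}-#{NOUNS[noun_index]}-#{1000 + number_index}"
end

puts name_for_id(0)    # prints "quiet-river-1000"
puts name_for_id(9000) # prints "quiet-meadow-1000"
```

Consecutive ids produce consecutive-looking names here, so a real version would probably want to scramble the id first, and as noted above every deleted app burns its name for good.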
Excited to see what we'll do next time the collision alert triggers!
Hope this helps anyone working on friendly name generation out there.