How to generate random SHA1 hash to use as ID in node.js?
Asked Answered
A

6

157

I am using this line to generate a sha1 id for node.js:

crypto.createHash('sha1').digest('hex');

The problem is that it's returning the same id every time.

Is it possible to have it generate a random id each time so I can use it as a database document id?

Argot answered 23/2, 2012 at 5:51 Comment(4)
Don't use sha1. It is no longer considered secure (collision-resistant). This is why naomik's answer is better.Langue
@Niels Abildgaard it is trivial to check for duplicates after generation, and risk of collision is not necessarily relevant, you need to check for duplicates no matter what you use as the size of your database increases. "Secure" has nothing to do with cardinality.Tuckie
@Tuckie None of that is necessarily true: you have no idea what the IDs are used for, how items with IDs are created, which assumptions apply to them in the concrete application (and therefore if collision resistance is significant). And you don't need to check for duplicates anyway, that's the entire point of collision-resistant IDs, and why they are especially used in large and decentralized applications.Langue
@ Niels Abildgaard, OP just wants a balanced synthetic key. From thin air. Normally we would reverse a timestamp or hash the current count of rows. It only has to be well distributed and unique in the table, which the key will enforce. Another go will then be required. Collision resistance is good, but never absolute.Tuckie
B
69

Have a look here: How do I use node.js Crypto to create a HMAC-SHA1 hash? I'd create a hash of the current timestamp + a random number to ensure hash uniqueness:

var current_date = (new Date()).valueOf().toString();
var random = Math.random().toString();
crypto.createHash('sha1').update(current_date + random).digest('hex');
Brilliantine answered 23/2, 2012 at 6:28 Comment(3)
For a much better approach, see @naomik's answer below.Brilliantine
This was also a great answer Gabi, and just a tiny bit faster, about 15%. Great job both! I actually like to see a Date() in the salt, it gives a developer easy confidence that this will be unique value in all but the most insane parallel computing situations. I know its silly and randomBytes(20) is going to be unique, but its just a confidence we can have because we may not be familiar with internals of random generation of another library.Dramaturgy
@Dmitri R117 What is the benefit of salting a primary key? Unique values are not required to be guaranteed as you can detect duplicates upon insert. Some shops pre-allocate blocks. It is more important that the distribution be homogenous for a flatter index or equally sized partitions.Tuckie
G
707

243,583,606,221,817,150,598,111,409x more entropy

I'd recommend using crypto.randomBytes. It's not sha1, but for id purposes, it's quicker, and just as "random".

var id = crypto.randomBytes(20).toString('hex');
//=> f26d60305dae929ef8640a75e70dd78ab809cfe9

The resulting string will be twice as long as the random bytes you generate; each byte encoded to hex is 2 characters. 20 bytes will be 40 characters of hex.

Using 20 bytes, we have 256^20 or 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976 unique output values. This is identical to SHA1's 160-bit (20-byte) possible outputs.

Knowing this, it's not really meaningful for us to shasum our random bytes. It's like rolling a die twice but only accepting the second roll; no matter what, you have 6 possible outcomes each roll, so the first roll is sufficient.


Why is this better?

To understand why this is better, we first have to understand how hashing functions work. Hashing functions (including SHA1) will always generate the same output if the same input is given.

Say we want to generate IDs but our random input is generated by a coin toss. We have "heads" or "tails"

% echo -n "heads" | shasum
c25dda249cdece9d908cc33adcd16aa05e20290f  -

% echo -n "tails" | shasum
71ac9eed6a76a285ae035fe84a251d56ae9485a4  -

If "heads" comes up again, the SHA1 output will be the same as it was the first time

% echo -n "heads" | shasum
c25dda249cdece9d908cc33adcd16aa05e20290f  -

Ok, so a coin toss is not a great random ID generator because we only have 2 possible outputs.

If we use a standard 6-sided die, we have 6 possible inputs. Guess how many possible SHA1 outputs? 6!

input => (sha1) => output
1 => 356a192b7913b04c54574d18c28d46e6395428ab
2 => da4b9237bacccdf19c0760cab7aec4a8359010b0
3 => 77de68daecd823babbb58edb1c8e14d7106e83bb
4 => 1b6453892473a467d07372d45eb05abc2031647a
5 => ac3478d69a3c81fa62e60f5c3696165a4e5e6ac4
6 => c1dfd96eea8cc2b62785275bca38ac261256e278

It's easy to delude ourselves by thinking just because the output of our function looks very random, that it is very random.

We both agree that a coin toss or a 6-sided die would make a bad random id generator, because our possible SHA1 results (the value we use for the ID) are very few. But what if we use something that has a lot more outputs? Like a timestamp with milliseconds? Or JavaScript's Math.random? Or even a combination of those two?!

Let's compute just how many unique ids we would get ...


The uniqueness of a timestamp with milliseconds

When using (new Date()).valueOf().toString(), you're getting a 13-character number (e.g., 1375369309741). However, since this a sequentially updating number (once per millisecond), the outputs are almost always the same. Let's take a look

for (var i=0; i<10; i++) {
  console.log((new Date()).valueOf().toString());
}
console.log("OMG so not random");

// 1375369431838
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431839
// 1375369431840
// 1375369431840
// OMG so not random

To be fair, for comparison purposes, in a given minute (a generous operation execution time), you will have 60*1000 or 60000 uniques.


The uniqueness of Math.random

Now, when using Math.random, because of the way JavaScript represents 64-bit floating point numbers, you'll get a number with length anywhere between 13 and 24 characters long. A longer result means more digits which means more entropy. First, we need to find out which is the most probable length.

The script below will determine which length is most probable. We do this by generating 1 million random numbers and incrementing a counter based on the .length of each number.

// get distribution
var counts = [], rand, len;
for (var i=0; i<1000000; i++) {
  rand = Math.random();
  len  = String(rand).length;
  if (counts[len] === undefined) counts[len] = 0;
  counts[len] += 1;
}

// calculate % frequency
var freq = counts.map(function(n) { return n/1000000 *100 });

By dividing each counter by 1 million, we get the probability of the length of number returned from Math.random.

len   frequency(%)
------------------
13    0.0004  
14    0.0066  
15    0.0654  
16    0.6768  
17    6.6703  
18    61.133  <- highest probability
19    28.089  <- second highest probability
20    3.0287  
21    0.2989  
22    0.0262
23    0.0040
24    0.0004

So, even though it's not entirely true, let's be generous and say you get a 19-character-long random output; 0.1234567890123456789. The first characters will always be 0 and ., so really we're only getting 17 random characters. This leaves us with 10^17 +1 (for possible 0; see notes below) or 100,000,000,000,000,001 uniques.


So how many random inputs can we generate?

Ok, we calculated the number of results for a millisecond timestamp and Math.random

      100,000,000,000,000,001 (Math.random)
*                      60,000 (timestamp)
-----------------------------
6,000,000,000,000,000,060,000

That's a single 6,000,000,000,000,000,060,000-sided die. Or, to make this number more humanly digestible, this is roughly the same number as

input                                            outputs
------------------------------------------------------------------------------
( 1×) 6,000,000,000,000,000,060,000-sided die    6,000,000,000,000,000,060,000
(28×) 6-sided die                                6,140,942,214,464,815,497,21
(72×) 2-sided coins                              4,722,366,482,869,645,213,696

Sounds pretty good, right ? Well, let's find out ...

SHA1 produces a 20-byte value, with a possible 256^20 outcomes. So we're really not using SHA1 to it's full potential. Well how much are we using?

node> 6000000000000000060000 / Math.pow(256,20) * 100

A millisecond timestamp and Math.random uses only 4.11e-27 percent of SHA1's 160-bit potential!

generator               sha1 potential used
-----------------------------------------------------------------------------
crypto.randomBytes(20)  100%
Date() + Math.random()    0.00000000000000000000000000411%
6-sided die               0.000000000000000000000000000000000000000000000411%
A coin                    0.000000000000000000000000000000000000000000000137%

Holy cats, man! Look at all those zeroes. So how much better is crypto.randomBytes(20)? 243,583,606,221,817,150,598,111,409 times better.


Notes about the +1 and frequency of zeroes

If you're wondering about the +1, it's possible for Math.random to return a 0 which means there's 1 more possible unique result we have to account for.

Based on the discussion that happened below, I was curious about the frequency a 0 would come up. Here's a little script, random_zero.js, I made to get some data

#!/usr/bin/env node
var count = 0;
while (Math.random() !== 0) count++;
console.log(count);

Then, I ran it in 4 threads (I have a 4-core processor), appending the output to a file

$ yes | xargs -n 1 -P 4 node random_zero.js >> zeroes.txt

So it turns out that a 0 is not that hard to get. After 100 values were recorded, the average was

1 in 3,164,854,823 randoms is a 0

Cool! More research would be required to know if that number is on-par with a uniform distribution of v8's Math.random implementation

Grassy answered 14/2, 2013 at 7:25 Comment(2)
crypto.randomBytes is definitely the way to go ^^Grassy
Wouldn't A millisecond timestamp and Math.random uses only 4.11e-27 percent of SHA1's 160-bit potential! actually be 4.11e-25 percent? 6e21 / 256**20 * 100 is 4.1053665947016126e-25. That would also mean that 0.00000000000000000000000000411% should actually be 0.000000000000000000000000411%, etc.Quarrelsome
B
69

Have a look here: How do I use node.js Crypto to create a HMAC-SHA1 hash? I'd create a hash of the current timestamp + a random number to ensure hash uniqueness:

var current_date = (new Date()).valueOf().toString();
var random = Math.random().toString();
crypto.createHash('sha1').update(current_date + random).digest('hex');
Brilliantine answered 23/2, 2012 at 6:28 Comment(3)
For a much better approach, see @naomik's answer below.Brilliantine
This was also a great answer Gabi, and just a tiny bit faster, about 15%. Great job both! I actually like to see a Date() in the salt, it gives a developer easy confidence that this will be unique value in all but the most insane parallel computing situations. I know its silly and randomBytes(20) is going to be unique, but its just a confidence we can have because we may not be familiar with internals of random generation of another library.Dramaturgy
@Dmitri R117 What is the benefit of salting a primary key? Unique values are not required to be guaranteed as you can detect duplicates upon insert. Some shops pre-allocate blocks. It is more important that the distribution be homogenous for a flatter index or equally sized partitions.Tuckie
G
33

Do it in the browser, too !

EDIT: this didn't really fit into the flow of my previous answer. I'm leaving it here as a second answer for people that might be looking to do this in the browser.

You can do this client side in modern browsers, if you'd like

// str byteToHex(uint8 byte)
//   converts a single byte to a hex string 
function byteToHex(byte) {
  return ('0' + byte.toString(16)).slice(-2);
}

// str generateId(int len);
//   len - must be an even number (default: 40)
function generateId(len = 40) {
  var arr = new Uint8Array(len / 2);
  window.crypto.getRandomValues(arr);
  return Array.from(arr, byteToHex).join("");
}

console.log(generateId())
// "1e6ef8d5c851a3b5c5ad78f96dd086e4a77da800"

console.log(generateId(20))
// "d2180620d8f781178840"

Browser requirements

Browser    Minimum Version
--------------------------
Chrome     11.0
Firefox    21.0
IE         11.0
Opera      15.0
Safari     5.1
Grassy answered 16/10, 2014 at 17:10 Comment(4)
Number.toString(radix) doesn't always guarantee a 2 digit value (ex: (5).toString(16) = "5", not "05"). This doesn't matter unless you are depending on your final output to be exactly len characters long. In this case you can use return ('0'+n.toString(16)).slice(-2); inside of your map function.Lai
Great code, thanks. Just wanted to add: if you're going to use it for the value of an id attribute, make sure the ID starts with a letter: [A-Za-z].Ratiocinate
Great answer (and comments) - really appreciated that you included the browser requirements in the answer, as well!Pood
The browser requirements are incorrect. Array.from() is not supported in IE11.Isolt
G
2

If Want To Get Unique Identifiers, You should use UUID (Universally Unique Identifier) / GUID (Globally Unique Identifier).

A Hash is Supposed to be Deterministic & Unique & of Fixed Length For Input of any size. So no matter how many times you run the hash function, the output will be the same if you use the same input.

UUIDs Are Unique & Randomly Generated! There Is A Package called 'uuid' you can install it via npm by

npm install uuid

& In your code import the module by

const { v4:uuidv4} = require('uuid');

// Call The Method uuidv4 or whatever you name it while importing & log it or store it or assign it. The method return a UUID in the form of a string.

console.log(uuidv4()); // Example Output : '59594fc8-6a35-4f50-a966-4d735d8402ea'

Here is the npm link (if you need it) : https://www.npmjs.com/package/uuid

Gotten answered 15/9, 2021 at 13:7 Comment(0)
F
0

Using crypto is a good approach cause it's native and stable module, but there are cases where you can use bcrypt if you want to create a really strong and secure hash. I use it for passwords it has a lot of techniques for hashing, creating salt and comparing passwords.

Technique 1 (generate a salt and hash on separate function calls)

const salt = bcrypt.genSaltSync(saltRounds);
const hash = bcrypt.hashSync(myPlaintextPassword, salt);

Technique 2 (auto-gen a salt and hash):

const hash = bcrypt.hashSync(myPlaintextPassword, saltRounds);

For more examples you can check here: https://www.npmjs.com/package/bcrypt

Fowling answered 26/8, 2020 at 19:22 Comment(0)
C
0

Concise method that works in the browser:

// Returns a 256 bit string, or the equivalent Uint8Array if `false` is
// passed in.

function get256RandomBits(returnAsString = true) {
  const uint8Array = new Uint8Array(32); // 32 bytes = 256 bits
  const rng = crypto.getRandomValues(uint8Array);
  if (returnAsString) {
    return Array.from(rng).map(b => b.toString(16).padStart(2, '0')).join('');
  }
  else {
    return rng;
  }
}

Example outputs:
- c2fec24e465658aad1208d0a3f863585aa2e5fd30f4e0712e2c74239419700d3
- 8f9ff25c2948b8ee39f77303e0678b6eae1382e3d2517e15a1c4de6840f5673b
- d212ff805630edb41d22c9b6fdd8db6e7f910ea25483bad8b598249a6c73d950

I'm using 256 bits instead of 160 seeing as this question was asked over 10 years ago, but the idea is the same; just modify the second line as needed.

Cartan answered 28/11, 2023 at 9:0 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.