You are after a 16-byte, universally unique, chronologically sortable unique identifier.
SQL Server already has that: uniqueidentifier
.
Many people are under the mistaken impression that a uniqueidentifer
is a random value, and that its use as a clusting key is bad because it causes I/O to be scattered all over the place, torn pages, and poor temporaral locality.
That is only true for "random" uuids - known as a "type 4" uuids.
Sequential sortable UUIDs
But there are other types of UUIDs:
- Type 1: stuffs MAC address+datetime into 128 bits
- Type 3: stuffs an MD5 hash into 128 bits
- Type 4: stuffs random data into 128 bits
- Type 5: stuffs an SHA1 hash into 128 bits
- Type 6: sequential sortable UUIDs
Rather than being built from random data, or an MD5 hash, or an SHA1 hash, we can use a hash based on a timestamp , with a resolution of 100 ns
(i.e. 0.0001 ms
).
And similarly to how you can default a uniqueidentifier colunm to a "type 4 random uuid":
CustomerUUID uniqueidentifier DEFAULT newid()
SQL Server also supports sequential type-1 UUIDs:
CustomerUUID uniqueidentifier DEFAULT newsequentialid()
And those uuids sort chronologically - as long as they're created on the same machine
NodeID, MAC address, and Multiple Machines
There is the issue that the type-1 UUID contains a 6-byte nodeID
, which is the client's MAC address.
00112233-4455-6677-8899-AABBCCDDEEFF
Timestamp-High |
Timestamp-Mid |
Timestamp-Low |
Sequence |
NodeID |
00112233 |
4455 |
6677 |
8899 |
AABBCCDDEEFF |
And SQL Server, for whatever reason, decided to sort UUIDs by the nodeID
first.
SQL Server sorts guids first by f, then e, then d, c, b, a, 9, ..., 2, 1, 0:
{00112233-4455-6677-9988-ffeeddccbbaa}
In other words, it sorts by:
Node[0]
Node[1]
Node[2]
Node[3]
Node[4]
Node[5]
UInt16 Sequence
(little endian)
UInt16 TimestampLow
(big endian)
UInt16 TimestampMid
(big endian)
UInt32 TimestampHigh
(big endian)
Which means:
- if all the UUIDs are generated by the same machine (whether it be the database server itself, or a web web-server) then the UUIDs will be sequential based on the timestamp
- but if the UUIDs are generated on different clients, with different
nodeIDs
(i.e. MAC address) then you'll have them sorted first by node, and then by timestamp
That may or may not be an issue for you. But if it is, the solution is to shuffle around the bytes of the uniqueidentifier/UUID so that:
- if SQL Server is gonna sort the 6-byte NodeID first
- then i'll just put the timestamp there
- and move the
nodeID
to where the timestamp was
In C# we use the following class to generate an "SQL Server sortable UUID". Other databases may sort UUIDs/uniqueidentifiers differently.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Web;
/// <summary>
/// Summary description for SortableGuid
/// </summary>
public class SortableGuid
{
//UUID timestamps are 100ns ticks starting from October 15, 1582 (date we switched to Gregorian calendar)
//Windows FILETIME is 100ns ticks starting from January 1, 1601 (the date of the start of the first 400-year cycle of the Gregorian calendar)
private static readonly Int64 guidEpochOffset = -5748192000000000; //6653 days --> 159672 hours --> 9580320 minutes --> 574819200 seconds -> 574819200000 ms -> 574819200000000 us --> 5748192000000000 ticks
private static Int64 _lastTick = DateTime.UtcNow.Ticks + guidEpochOffset;
/// <summary>
///
/// </summary>
/// <param name="location">The destination, whose value is compared with comparand and possibly replaced.</param>
/// <param name="comparison">The value that is compared to the value at location1.</param>
/// <param name="newValue">The value that replaces the destination value if the comparison results in equality.</param>
/// <returns>true if comparand was greater than location, and location was updated to newValue.
/// Otherwise false.</returns>
public static Boolean InterlockedExchangeIfGreaterThan(ref Int64 location, Int64 newValue, Int64 comparand)
{
//Thank you Raymond Chen: https://mcmap.net/q/467377/-interlocked-compareexchange-lt-int-gt-using-greaterthan-or-lessthan-instead-of-equality
Int64 currentValue;
do
{
currentValue = Interlocked.Read(ref location); //a read of a 64-bit location is not atomic on x86. If location was 32-bit you could get away with just "currentValue = location;"
if (currentValue >= comparand) return false;
}
while (System.Threading.Interlocked.CompareExchange(ref location, newValue, currentValue) != currentValue);
return true;
}
/// <summary>
/// Returns a new sortable guid. These guid's are suitable as clustering keys in SQL Server.
/// </summary>
/// <returns></returns>
public static Guid NewGuid()
{
/*
Blatently borrowed from Entity Framework.
https://github.com/dotnet/efcore/blob/master/src/EFCore/ValueGeneration/SequentialGuidValueGenerator.cs
Except with two differences:
- they start with an initial timetime, generated statically once - and keep increasing that.
That causes the "timestamp" to drift further and further from reality.
We generate a timestamp each time, and only rely on incrementing if we're not greater than the last timestamp.
A CPU is capable of generating ~200k GUIDs per 100ns tick - so it's not like we can ignore the duplciate ticks problem.
- UUID timestamp ticks start at October 15, 1582 (date of gregorian calendar).
Windows, and DateTime.Ticks returns number of ticks since January 1, 1601 (the date of the first 400 year Gregorian cycle).
We'll offset the timestamp to match the UUID spec.
- We place the version number type-7: Sortable by SQL Server with a timestamp.
SQL Server sorts first by the NodeID (i.e. the final 6-bytes):
16-byte guid Microsoft clsid string representation
===========--=====--=====--=====--================= ======================================
33 22 11 00 55 44 77 66 99 88 ff ee dd cc bb aa ==> {00112233-4455-6677-9988-ffeeddccbbaa}
\_______________________/ \___/ \_______________/
Timestamp and Version ^ Clk Node ID
The goal is to create a sequential uuid (e.g. UuidCreateSequential), but without having to rely on a MAC address.
(i.e. Does an Azure web-site even **have** a MAC address?
We certainly can't P/Invoke out to UuidCreateSequental when we're not running on Windows)
So we conceptually move the 8-byte timestamp to it's new location in NodeID+ClockSequence
And what used to be Timestamp+Version becomes random.
And, like type-4 Uuids being called Type-4 to help reduce the chances of collisions between types,
we call this new version type-7 (sortable by SQL Server with a timestamp).
*/
//Start from a new random (type 4) uuid.
Byte[] guidBytes = Guid.NewGuid().ToByteArray();
//Generate 8-byte timestamp
Int64 currentTicks = DateTime.UtcNow.Ticks + guidEpochOffset;
//Make sure that currentTicks is greater than the last tick count (in case they're generating guids faster than 100ns apart)
Int64 last;
do
{
last = Interlocked.Read(ref _lastTick); //a read of a 64-bit value isn't atomic; so it has to be interlocked.
if (currentTicks <= last)
currentTicks = last + 1;
} while (Interlocked.CompareExchange(ref _lastTick, currentTicks, last) != last);
Byte[] counterBytes = BitConverter.GetBytes(currentTicks);
if (!BitConverter.IsLittleEndian)
{
Array.Reverse(counterBytes);
}
//This Guid started it's life as a Type 4 (Random) guid, but the final 8 bytes were changed to contain a sequential counter.
//I want to tag this as a different kind of Uuid algorithm. In Delphi we already called this a Type 7 uuid.
guidBytes[07] = (Byte)(0x70 | (guidBytes[07] & 0x0f));
//Put the timestamp in place - in the proper SQL Server sorting order.
guidBytes[08] = counterBytes[1];
guidBytes[09] = counterBytes[0];
guidBytes[10] = counterBytes[7];
guidBytes[11] = counterBytes[6];
guidBytes[12] = counterBytes[5];
guidBytes[13] = counterBytes[4];
guidBytes[14] = counterBytes[3];
guidBytes[15] = counterBytes[2];
Guid result = new Guid(guidBytes);
return result;
}
}
identity
specification, which I guess is what you meant by mentioning "basic incremental value"? If yes, tell us these reasons so we can help you. If no, go for a bigint identity. – Heptateuchidentity
is quite simple. Security. I want accountId be an unique specified which client will have inJWT
(between API and client app). And fromidentity
you can 1) find out how many users were already registered by creating new account and checking that value 2) do iteration attacks (depends on implementation), so thats why I want to use some kind ofUUID
instead ofint
/bigint
identity
and through thoseUUID
docs I came toULID
. I useidentity
on other columns that are not exposed to client side. – Billposter