It's important to note that the UUIDs generated by UuidCreateSequential will not be sequential when ordered by SQL Server.
- SQL Server follows the RFC when it comes to sorting UUIDs
- the RFC got it wrong
UuidCreateSequential
did it right
- but
UuidCreateSequential
creates something different from what SQL Server expects
Background
The Type 1 UUIDs created by UuidCreateSequential don't sort in SQL Server.
SQL Server's NewSequentialID uses UuidCreateSequential, with some byte shuffling applied. From the Books Online:
NEWSEQUENTIALID (Transact-SQL)
NEWSEQUENTIALID is a wrapper over the Windows UuidCreateSequential function, with some byte shuffling applied
which then references an MSDN blog post:
How to Generate Sequential GUIDs for SQL Server in .NET (archive)
public static Guid NewSequentialId()
{
Guid guid;
UuidCreateSequential(out guid);
var s = guid.ToByteArray();
var t = new byte[16];
t[3] = s[0];
t[2] = s[1];
t[1] = s[2];
t[0] = s[3];
t[5] = s[4];
t[4] = s[5];
t[7] = s[6];
t[6] = s[7];
t[8] = s[8];
t[9] = s[9];
t[10] = s[10];
t[11] = s[11];
t[12] = s[12];
t[13] = s[13];
t[14] = s[14];
t[15] = s[15];
return new Guid(t);
}
It all starts with the number of ticks since 1582-10-15 00:00:00
(October 15, 1592, the date of
Gregorian reform to the Christian calendar). Ticks is the number of 100 ns intervals.
For example:
- 12/6/2017 4:09:39 PM UTC
- = 137,318,693,794,503,714 ticks
- =
0x01E7DA9FDCA45C22
ticks
The RFC says that we should split this value into three chunks:
- UInt32 low (4 bytes)
- Uint16 mid (2 bytes)
- UInt32 hi (2 bytes)
So we split it up:
0x01E7DA9FDCA45C22
| Hi | Mid | Low |
|--------|--------|------------|
| 0x01E7 | 0xDA9F | 0xDCA45C22 |
And then the RFC says that these three integers should be written out in the order of:
- Low: 0xDCA45C22
- Mid: 0xDA9F
- High: 0x01E7
If you follow the RFC, these values must be written in big-endian (aka "network byte order"):
DC A4 5C 22 DA 9F x1 E7 xx xx xx xx xx xx xx xx
This was a bad design, because you cannot take the first 8 bytes of the UUID and treat them either as a big-endian UInt64, nor as a little-endian UInt64. It's a totally dumb encoding.
UuidCreateSequential gets it right
Microsoft followed all the same rules so far:
- Low: 0xDCA45C22
- Mid: 0xDA9F
- High: 0x1E7
But they write it out in Intel little-endian order:
22 5C A4 DC 9F DA E7 x1 xx xx xx xx xx xx xx xx
If you look at that, you've just written out a little-endian Int64
:
225CA4DC9FDAE701
Meaning:
- if you wanted to extract the timestamp
- or sort by the timestamp
it's trivial; just treat the first 8 bytes as a UInt64.
With the RFC, you have no choice but to perform all kinds of bit fiddling. Even on big-endian machines, you can't treat the 64-bit timestamp as a 64-bit timestamp.
How to reverse it
Given a little-endian guid from UuidCreateSequential
:
DCA45C22-DA9F-11E7-DDDD-FFFFFFFFFFFF
with the raw bytes of:
22 5C A4 DC 9F DA E7 11 DD DD FF FF FF FF FF FF
This decodes into:
Low Mid Version High
-------- ---- ------- ---- -----------------
DCA45C22-DA9F-1 1E7 -DDDD-FFFFFFFFFFFF
- Low: 0xDCA45C22
- Mid: 0xDA9F
- High: 0x1E7
- Version: 1 (type 1)
We can write this back out in RFC big-endian order:
DC A4 5C 22 DA 9F 11 E7 DD DD FF FF FF FF FF FF
Short version
| Swap | Swap | Swap | Copy as-is
Start index | 0 1 2 3 | 4 5 | 6 7 |
End index | 3 2 1 0 | 5 4 | 7 6 |
---------------|-------------|-------|-------|------------------------
Little-endian: | 22 5C A4 DC | 9F DA | E7 11 | DD DD FF FF FF FF FF FF
Big-endian: | DC A4 5C 22 | DA 9F | 11 E7 | DD DD FF FF FF FF FF FF