Why is SQL Server Big Endian?

From what I've read, all Windows versions and .NET are little endian. So why the departure from the Microsoft norm for SQL Server?

What I mean by "SQL Server is big endian" is this:

SELECT CONVERT(VARBINARY, 255);

gives:

0x000000FF

and not

0xFF000000

the way something like .NET's BitConverter.GetBytes() does. I guess SQL Server could be storing the number internally as little endian and then CONVERT is just switching it around for some reason. But either way, why?

Edit:

Just noticed this...

DECLARE @q UNIQUEIDENTIFIER = '01234567-89ab-cdef-0123-456789abcdef';
SELECT @q;
SELECT CONVERT(VARBINARY, @q);

gave me:

01234567-89AB-CDEF-0123-456789ABCDEF

0x67452301AB89EFCD0123456789ABCDEF

What the heck?

Secession answered 15/2, 2014 at 4:19 Comment(6)
Probably for compatibility. Back in ancient times, MS didn't write the original SQL Server - Sybase did. And they supported it on various OSes and hardware. So, probably for compatibility, they picked an endianness and stuck with it. – Subtype
That sounds… pretty plausible, actually. But then what is up with the GUID thing I added? They didn't have GUIDs till way after they'd already bought it, I believe… – Secession
See this answer on DBA.SE for a detailed explanation. – Gunning
@Secession I was wondering if you ever had a chance to review my answer. Thoughts? Thanks :). – Robby
@srutzky I thought you didn't answer the question "why?". The rest of MS's stuff acts like it's one way, and this acts like the other; your answer does not address this. – Secession
@Secession Thanks for that reply. I re-read my answer and I actually had addressed the "why": there is no direct answer, because you are not seeing what you think you are seeing in the test that you did. I have updated my answer to specifically address BitConverter.GetBytes() and why its output is misleading and not analogous to CONVERT(VARBINARY,. Please read my answer again, and if you still have questions, please let me know. Thanks. – Robby

Yes: Windows and .NET are Little Endian.

So why is SQL Server Big Endian? Easy: it's not ;-). The MSDN page for Collation and Unicode Support (within SQL Server) even states:

Because the Intel platform is a little endian architecture, Unicode code characters are always stored byte-swapped.

So why do you get a Big Endian binary value when converting the Int value of 255? Here is where the confusion is. This question is flawed because it is based on a false premise: that you should see the endianness of hardware and/or software reflected in the converted value. But why would you? Endianness affects the internal representation of a value, how it is stored. But it doesn't change the thing itself. You can convert a DATETIME to an INT and you will see an Integer. But if you save that Integer in an INT field, it will be stored as 4 bytes in reverse order since this is a Little Endian system. But that has nothing to do with what you see when you request that value back from the system and it is displayed to you.
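
To make this concrete, here is a small sketch of the DATETIME example just mentioned (the date used is only an illustration): regardless of the machine's byte order, these conversions return the same values.

DECLARE @d DATETIME = '20140215';
SELECT CONVERT(INT, @d);          -- 41683 (days since 1900-01-01)
SELECT CONVERT(VARBINARY(8), @d); -- 0x0000A2D300000000 (0xA2D3 = 41683, then 4 bytes of time ticks)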

For example, run the following to see that converting the INT value of 301 to a BINARY(2) results in 0x012D, because 0x012D = 301, just in hexadecimal. And so converting 0x012D back to INT returns 301, as expected. If the original Int to Binary conversion gave you 0x2D01, well, that does not equate to 301.

SELECT CONVERT(BINARY(2), 301), CONVERT(INT, 0x012D)
-- 0x012D,  301

HOWEVER, if you create a table with an INT column, and insert a value of "301" into that column, and use DBCC PAGE to view the data page as it exists on disk, you will see the following hex digits in the order shown:

2D 01 00 00
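
Here is a rough sketch of that check. The table name, database name, and page number below are placeholders, and DBCC PAGE and sys.fn_PhysLocFormatter are undocumented, so treat this as illustrative only:

CREATE TABLE dbo.EndianTest (Val INT NOT NULL);
INSERT INTO dbo.EndianTest (Val) VALUES (301);

-- Locate the data page holding the row (output format is (File:Page:Slot))
SELECT sys.fn_PhysLocFormatter(%%physloc%%) AS [Location] FROM dbo.EndianTest;

DBCC TRACEON(3604);                    -- route DBCC output to the client
DBCC PAGE('YourDatabase', 1, 312, 3);  -- plug in the file and page numbers found above
-- In the dumped row data, the column value appears as: 2D 01 00 00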

Also, to address some of the evidence supporting the premise of the question:

Yes, doing BitConverter.ToString(BitConverter.GetBytes(255)) in .NET will return:

FF-00-00-00

BUT, that is not a conversion: GetBytes() is not converting the "value" but is instead intended to show the internal system representation, which changes depending on whether the system is Little Endian or Big Endian. If you look at the MSDN page for BitConverter.GetBytes, it should be clearer what it is actually doing.

When converting actual values, the results won't (and can't) be different across different systems. An integer value of 256 will always be 0x0100 across all systems (even calculators) because Endianness has nothing to do with how you convert values between base 10, base 2, base 16, etc.

In .NET, if you want to do this conversion, you can use String.Format("{0:X8}", 255) which will return:

000000FF

which is the same as what SELECT CONVERT(BINARY(4), 255); returns since they are both converting the value. This result isn't being shown as Big Endian, but is being shown as it truly is, which just happens to match the byte ordering of Big Endian.

In other words, when starting with a bit sequence of 100000000, that can be represented in decimal form as 256, or in hexadecimal form (known as BINARY / VARBINARY within SQL Server) as 0x0100. Endianness has nothing to do with this as these are merely different ways of representing the same underlying value.
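
As a quick round-trip sketch tying this back to the question's original example (just restating the conversions above in one query):

SELECT CONVERT(VARBINARY(4), 255),  -- 0x000000FF (the hexadecimal form of the value 255)
       CONVERT(INT, 0x000000FF),    -- 255 (the same value, back in decimal form)
       CONVERT(INT, 0xFF000000);    -- -16777216 (reversing the bytes produces a different value, not 255)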

Further evidence of SQL Server being Little Endian can be seen when converting between VARBINARY and NVARCHAR. Since NVARCHAR uses a 16-bit (i.e. 2-byte) code unit encoding, converting it to binary exposes the byte ordering directly: a character has no numeric equivalent to display (unlike the 256 -> 0x0100 example), so there is nothing else the conversion could show (displaying Code Point values is not an option due to Supplementary Characters).

As you can see below, a Latin capital A, which has a Code Point of U+0041 (which is numerically the same as 65) converts to a VARBINARY value of 0x4100, because that is the UTF-16 Little Endian encoded value of that character:

SELECT CONVERT(VARBINARY(10), N'A'), -- 0x4100
       CONVERT(NVARCHAR(5), 0x4100), -- A
       CONVERT(INT, 0x4100),         -- 16640
       UNICODE(N'A'),                -- 65
       CONVERT(VARBINARY(8), 65);    -- 0x00000041

SELECT CONVERT(VARBINARY(10), N'ᄀ'), -- 0x0011
       CONVERT(NVARCHAR(5), 0x0011),  -- ᄀ
       CONVERT(INT, 0x0011),          -- 17
       UNICODE(N'ᄀ'),                -- 4352
       CONVERT(VARBINARY(8), 4352);   -- 0x00001100

Also, the "Pile of Poo" emoji (Code Point U+01F4A9) can be seen using the Surrogate Pair "D83D + DCA9" (which the NCHAR function allows for), or you can inject the UTF-16 Little Endian byte sequence:

SELECT NCHAR(0xD83D) + NCHAR(0xDCA9) AS [SurrogatePair],
       CONVERT(NVARCHAR(5), 0x3DD8A9DC) AS [UTF-16LE];
-- 💩   💩

UNIQUEIDENTIFIER is similar in that "what it is" and "how it is stored" are two different things and they do not need to match. Keep in mind that UUID / GUID is not a basic datatype like int or char, but is more of an entity that has a defined format, just like JPG or MP3 files. There is more discussion about UNIQUEIDENTIFIERs in my answer to a related question on DBA.StackExchange (including why it is represented by a combination of Big Endian and Little Endian).
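
For illustration, here is the question's GUID again with its byte groups annotated (a sketch; Data1 through Data4 refer to the field names of the Windows GUID structure, and the spacing in the comments is added only for readability):

DECLARE @g UNIQUEIDENTIFIER = '01234567-89AB-CDEF-0123-456789ABCDEF';
SELECT CONVERT(VARBINARY(16), @g);
-- 0x 67452301  AB89   EFCD   0123456789ABCDEF
--    Data1     Data2  Data3  Data4 (8 bytes)
-- Data1 (a 4-byte integer), Data2 and Data3 (2-byte integers) are stored Little Endian,
-- while Data4 is a plain byte array and is stored in the order written.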

Robby answered 24/11, 2015 at 6:7 Comment(7)
This still doesn't make sense. See individual notes below. – Secession
First, you seem to be saying that 0x0100 is a value (as in a number) and not a series of bytes, which is not correct when it comes to a conversion to BINARY, as the type is explicitly intended to be a series of bytes, not a number. The textual representation 0x0100 looks like and may be interpreted as a number, but it's only a visualization of what BINARY types are doing. – Secession
Second, I'm not really interested in internal representations or underlying architectures; I'm only interested in the phenomenon as a black box. If it did 256 -> 0x0100 and 0x0100 -> 256 then I would call it big endian, and if it did 256 -> 0x0001 and 0x0001 -> 256 then I would call it little endian, regardless of anything else. – Secession
Third, when I ask "why", I don't mean what the mechanics of it are. I mean what was Microsoft's justification/intention/goal when designing the system this way. – Secession
@Secession Your point #1 above is incorrect. Converting the integer types to [VAR]BINARY in SQL Server is not intended to show the underlying encoded byte sequence. Converting from TINYINT, SMALLINT, INT, BIGINT to [VAR]BINARY shows the hexadecimal equivalent, same as String.Format("{0:X8}", intVal) in .NET. BitConverter, on the other hand, does show the underlying encoding. If you are asking "why" Microsoft decided not to expose the underlying encoding when converting INT -> VARBINARY, I would say because nobody would ever expect (or want) it to, since 256 is a value, not a byte sequence. – Robby
@Secession I have updated my answer again with more examples, this time using NVARCHAR as it clearly shows Little Endian encoding. – Robby
The "hexadecimal equivalent" is a string of characters, shown as a convenience to the user of SSMS or what-have-you; a BINARY type, on the other hand, is a sequence of bytes, regardless of how it's displayed. In choosing what to convert the numerical value 256 to, you have to choose to put the MSB either first or last. That choice decides whether you are implementing the conversion as little endian or as big endian. To say that one way is natural and expected is merely to take a side in a big endian/little endian war. – Secession
