Why does BitConverter.ToInt32 read one byte at a time if the data is not aligned at the given offset?

Sorry for the confusing title, but I can't think of a better way to explain it.

While browsing the source code of BitConverter recently, I came across a strange segment of code:

public static unsafe int ToInt32(byte[] value, int startIndex)
{
    fixed (byte* pbyte = &value[startIndex])
    {
        if (startIndex % 4 == 0) // data is aligned
            return *((int*)pbyte);
        else
        {
            if (IsLittleEndian)
            {
                return (*pbyte) | (*(pbyte + 1) << 8) | (*(pbyte + 2) << 16) | (*(pbyte + 3) << 24);
            }
            else
            {
                return (*pbyte << 24) | (*(pbyte + 1) << 16) | (*(pbyte + 2) << 8) | (*(pbyte + 3));
            }
        }
    }
}

How can casting pbyte to an int* (line 6) violate data alignment in this scenario? I left the argument validation out for brevity, but the real code has it, so I'm pretty sure this can't be about a memory access violation. Does the value lose precision when cast?

In other words, why can't the code be simplified to:

public static unsafe int ToInt32(byte[] value, int startIndex)
{
    fixed (byte* pbyte = &value[startIndex])
    {
        return *(int*)pbyte;
    }
}

Edit: Here is the section of code in question.

Pandybat answered 26/8, 2015 at 1:7 Comment(3)
Operations on data aligned to its size boundary are faster, and on some CPUs accessing non-aligned words/dwords/floats/doubles is an access violation... (posting as a comment because I don't have a good link handy). - Donndonna
@AlexeiLevenkov Still, you create two additional branches, multiple bitwise operations, and dereference pbyte + 1, pbyte + 2, etc. (3 of which are not going to be "aligned") just so you can avoid that non-alignment. Seems like overkill to me. - Pandybat
Not sure about your comment: there are CPUs on which you flat out can't access an unaligned int (#1238463), so what alternative implementation do you suggest compared to reading byte by byte? - Donndonna

I'd bet that this has to do with this part of §18.4 in version 5.0 of the C# specification (emphasis mine):

When one pointer type is converted to another, if the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined if the result is dereferenced.

Bytewise copying in the "unaligned" case is done to avoid relying on explicitly undefined behavior.
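
To make the bytewise path concrete, here is a minimal sketch of the same idea written without any pointer cast, so the alignment rule quoted above never comes into play. It assumes the bytes are stored in little-endian order; `AlignmentSketch` and `ReadInt32LittleEndian` are hypothetical names used only for illustration and are not part of BitConverter.

```csharp
using System;

static class AlignmentSketch
{
    // Hypothetical helper: builds an int from four little-endian bytes
    // using only byte reads, so no int* is ever dereferenced.
    public static int ReadInt32LittleEndian(byte[] value, int startIndex)
    {
        if (value == null)
            throw new ArgumentNullException(nameof(value));
        if (startIndex < 0 || startIndex > value.Length - 4)
            throw new ArgumentOutOfRangeException(nameof(startIndex));

        // Each byte is promoted to int before the shift, then OR-ed together.
        return value[startIndex]
             | (value[startIndex + 1] << 8)
             | (value[startIndex + 2] << 16)
             | (value[startIndex + 3] << 24);
    }
}
```

For example, calling `AlignmentSketch.ReadInt32LittleEndian(new byte[] { 0x00, 0x78, 0x56, 0x34, 0x12 }, 1)` reads from an offset that is not a multiple of 4 and returns 0x12345678. Every access is a single-byte read, and a single byte is always correctly aligned.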

Osmo answered 29/12, 2015 at 13:2 Comment(0)
