Faster way to swap endianness in C# with 16 bit words
Asked Answered
T

5

10

There's got to be a faster and better way to swap bytes of 16bit words then this.:

public static void Swap(byte[] data)
{
    for (int i = 0; i < data.Length; i += 2)
    {
        byte b = data[i];
        data[i] = data[i + 1];
        data[i + 1] = b;
    }
}

Does anyone have an idea?

Tauten answered 23/10, 2009 at 0:50 Comment(7)
Your solution seems like a good one except that IF YOUR DATA IS EVER BE ODD-LENGTHED, your code will throw array out of bound exception.Pippa
If he's swapping 16 bit words, then his data will never have odd length.Chantey
Yes, this will be a private method and it will be guaranteed to have 16bit words.Tauten
Why are you looking for faster and better? Is there some metric you are aiming for?Grip
this is an O(n) solution, with no new memory allocations. Other then declaring b outside the loop so it won't be allocated each time, why do you want to improve this?Gerek
@initialZero, the only thing i can think of for improving this, is minimizing the opcodes on an MSIL level inside the loop. but you have to iterate throuh the entire array. Another approach would be to declare your own byte array type, which can return both little and big endian by overriding the [] operator. that way you won't need to swap the memory.Gerek
I think bit shifts would make this faster, but fwiw, i agree you shouldn't be checking for even-length arrays. At this level in a framework for private methods it's more than reasonable to make this assumption and rely on the out-of-bound exception.Espinal
B
7

In my attempt to apply for the Uberhacker award, I submit the following. For my testing, I used a Source array of 8,192 bytes and called SwapX2 100,000 times:

public static unsafe void SwapX2(Byte[] source)  
{  
    fixed (Byte* pSource = &source[0])  
    {  
        Byte* bp = pSource;  
        Byte* bp_stop = bp + source.Length;  

        while (bp < bp_stop)  
        {
            *(UInt16*)bp = (UInt16)(*bp << 8 | *(bp + 1));  
            bp += 2;  
        }  
    }  
}

My benchmarking indicates that this version is over 1.8 times faster than the code submitted in the original question.

Brentonbrentt answered 16/12, 2011 at 22:35 Comment(1)
This code has a bug that it reads and writes outside the buffer when the length is odd. This might lead to memory corruption or a page fault. Change "bp + source.Length" into "bp + (source.Length & 0xFFFFFFFE)". Or throw an exception when the buffer size is odd.Snug
R
6

This way appears to be slightly faster than the method in the original question:

private static byte[] _temp = new byte[0];
public static void Swap(byte[] data)
{
    if (data.Length > _temp.Length)
    {
        _temp = new byte[data.Length];
    }
    Buffer.BlockCopy(data, 1, _temp, 0, data.Length - 1);
    for (int i = 0; i < data.Length; i += 2)
    {
        _temp[i + 1] = data[i];
    }
    Buffer.BlockCopy(_temp, 0, data, 0, data.Length);
}

My benchmarking assumed that the method is called repeatedly, so that the resizing of the _temp array isn't a factor. This method relies on the fact that half of the byte-swapping can be done with the initial Buffer.BlockCopy(...) call (with the source position offset by 1).

Please benchmark this yourselves, in case I've completely lost my mind. In my tests, this method takes approximately 70% as long as the original method (which I modified to declare the byte b outside of the loop).

Rosco answered 23/10, 2009 at 1:57 Comment(5)
Uberhacker award! I would expect a large resize of temp to be more expensive than an in place swap. And it's sorta a memory-leak just having temp sit around.Tauten
@initialZero: in my benchmark, I initially sized it to the same size as my test array, so there was no cost there. Temp variables like this in general buy speed at the expense of memory, which may or may not be a good decision.Rosco
@Rosco you could always set temp = null at the end of swap to fix the leak. It's definitely a cool solution. Thanks.Tauten
@FengYuan: not every method needs to be thread-safe; it entirely depends upon how it's being used.Rosco
You don't want to have to know that a swap method is not thread safe. This goes to the heart of what good engineering is.Espinal
E
5

I always liked this:

public static Int64 SwapByteOrder(Int64 value) 
{
  var uvalue = (UInt64)value;
  UInt64 swapped = 
       ( (0x00000000000000FF) & (uvalue >> 56)
       | (0x000000000000FF00) & (uvalue >> 40)
       | (0x0000000000FF0000) & (uvalue >> 24)
       | (0x00000000FF000000) & (uvalue >> 8)
       | (0x000000FF00000000) & (uvalue << 8)
       | (0x0000FF0000000000) & (uvalue << 24)
       | (0x00FF000000000000) & (uvalue << 40)
       | (0xFF00000000000000) & (uvalue << 56));
  return (Int64)swapped;
}

I believe you'll find this is the fastest method as well a being fairly readable and safe. Obviously this applies to 64-bit values but the same technique could be used for 32- or 16-.

Espinal answered 16/12, 2011 at 19:51 Comment(0)
S
2

Next method, in my test, almost 3 times faster as the accepted answer. (Always faster on more than 3 characters or six bytes, a bit slower on less or equal to three characters or six bytes.) (Note that the accepted answer can read/write outside the bounds of the array.)

(Update While having a pointer there's no need to call the property to get the length. Using that pointer is a bit faster, but requires either a runtime check or, as in next example, a project configuration to build for each platform. Define X86 and X64 under each configuration.)

static unsafe void SwapV2(byte[] source)
{
    fixed (byte* psource = source)
    {
#if X86
        var length = *((uint*)(psource - 4)) & 0xFFFFFFFEU;
#elif X64
        var length = *((uint*)(psource - 8)) & 0xFFFFFFFEU;
#else
        var length = (source.Length & 0xFFFFFFFE);
#endif
        while (length > 7)
        {
            length -= 8;
            ulong* pulong = (ulong*)(psource + length);
            *pulong = ( ((*pulong >> 8) & 0x00FF00FF00FF00FFUL)
                      | ((*pulong << 8) & 0xFF00FF00FF00FF00UL));
        }
        if(length > 3)
        {
            length -= 4;
            uint* puint = (uint*)(psource + length);
            *puint = ( ((*puint >> 8) & 0x00FF00FFU)
                     | ((*puint << 8) & 0xFF00FF00U));
        }
        if(length > 1)
        {
            ushort* pushort = (ushort*)psource;
            *pushort = (ushort) ( (*pushort >> 8)
                                | (*pushort << 8));
        }
    }
}

Five tests with 300.000 times 8192 bytes

  • SwapV2: 1055, 1051, 1043, 1041, 1044
  • SwapX2: 2802, 2803, 2803, 2805, 2805

Five tests with 50.000.000 times 6 bytes

  • SwapV2: 1092, 1085, 1086, 1087, 1086
  • SwapX2: 1018, 1019, 1015, 1017, 1018

But if the data is large and performance really matters, you could use SSE or AVX. (13 times faster.) https://pastebin.com/WaFk275U

Test 5 times, 100000 loops with 8192 bytes or 4096 chars

  • SwapX2 : 226, 223, 225, 226, 227 Min: 223
  • SwapV2 : 113, 111, 112, 114, 112 Min: 111
  • SwapA2 : 17, 17, 17, 17, 16 Min: 16
Snug answered 12/3, 2018 at 0:36 Comment(0)
S
1

Well, you could use the XOR swapping trick, to avoid an intermediate byte. It won't be any faster, though, and I wouldn't be surprised if the IL is exactly the same.

for (int i = 0; i < data.Length; i += 2)
{
    data[i] ^= data[i + 1];
    data[i + 1] ^= data[i];
    data[i] ^= data[i + 1];
}
Swound answered 23/10, 2009 at 1:18 Comment(2)
Good one. But this is somewhat more confusing to read. And wikipedia claims "On modern (desktop) CPUs, the XOR technique is considerably slower than using a temporary variable to do swapping." Oh well. Thanks for the sexy solution.Tauten
Yeah, I think all we can say about it is that it's neat. I just benchmarked it, and got results identical to your solution. So, one of the forms is likely optimized to the other.Swound

© 2022 - 2024 — McMap. All rights reserved.