Improve performance of Bitconverter.ToInt16
Asked Answered
U

2

6

I am collecting data from a USB device and this data has to go to an audio output component. At the moment I am not delivering the data fast enough to avoid clicks in the output signal. So every millisecond counts.

At the moment I am collecting the data which is delivered in a byte array of 65536 bytes.The first two bytes represent 16 bits of data in little endian format. These two bytes must be placed in the first element of a double array. The second two bytes, must be placed in the first element of a different double array. This is then repeated for all the bytes in the 65536 buffer so that you end up with 2 double[] arrays of size 16384.

I am currently using BitConverter.ToInt16 as shown in the code. It takes around 0.3ms to run this but it has to be done 10 times to get a packet to send off to the audio output. So the overhead is 3ms which is just enough for some packets to not be delivered on time eventually.

Code

byte[] buffer = new byte[65536];
double[] bufferA = new double[16384];
double[] bufferB = new double[16384]

for(int i= 0; i < 65536; i +=4)
{
    bufferA[i/4] = BitConverter.ToInt16(buffer, i);
    bufferB[i/4] = BitConverter.ToInt16(buffer, i+2);
}

How can I improve this? Is it possible to copy the values with unsafe code? I have no experience in that. Thanks

Urrutia answered 22/10, 2018 at 2:57 Comment(3)
Take a look at the BitConverter.ToInt16 source code, then remove all the checks you don't need and lift the fixed statement out of the loop.Unaware
@MichaelLiu i didn't know that C# has become open source now.Demonstration
Short history lesson: The reference source for .NET Framework has been available in one form or another since 2007. The reference source website Michael linked to had a major improvement overhaul in 2014. .NET Core is the fully open-sourced version of .NET, first released in 2016. The C# is language itself has been an open standard (ECMA-334) since 2001. So now you know. :)Forthright
C
4

This gets me about triple the speed in release, using Pointers and unsafe. There maybe other micro-optimisations, however I'll leave those details up to the masses

Updated

My original algorithm had a bug, and could have been improved

Modified Code

public unsafe (double[], double[]) Test2(byte[] input, int scale)
{
   var bufferA = new double[input.Length / 4];
   var bufferB = new double[input.Length / 4];

   fixed (byte* pSource = input)
      fixed (double* pBufferA = bufferA, pBufferB = bufferB)
      {
         var pLen = pSource + input.Length;
         double* pA = pBufferA, pB = pBufferB;

         for (var pS = pSource; pS < pLen; pS += 4, pA++, pB++)
         {
            *pA = *(short*)pS;
            *pB = *(short*)(pS + 2);
         }
      }

   return (bufferA, bufferB);
}

Benchmarks

Each test is run 1000 times, garbage collected before each run, and scaled to various array lengths. All results are checked against the original OP version

Test Environment

----------------------------------------------------------------------------
Mode             : Release (64Bit)
Test Framework   : .NET Framework 4.7.1 (CLR 4.0.30319.42000)
----------------------------------------------------------------------------
Operating System : Microsoft Windows 10 Pro
Version          : 10.0.17134
----------------------------------------------------------------------------
CPU Name         : Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
Description      : Intel64 Family 6 Model 58 Stepping 9
Cores (Threads)  : 4 (8)      : Architecture  : x64
Clock Speed      : 3901 MHz   : Bus Speed     : 100 MHz
L2Cache          : 1 MB       : L3Cache       : 8 MB
----------------------------------------------------------------------------

Results

--- Random Set of byte ------------------------------------------------------
| Value    |    Average |    Fastest |    Cycles | Garbage | Test |    Gain |
--- Scale 16,384 -------------------------------------------- Time 13.727 ---
| Unsafe   |  19.487 µs |  14.029 µs |  71.479 K | 0.000 B | Pass | 59.02 % |
| Original |  47.556 µs |  34.781 µs | 169.580 K | 0.000 B | Base |  0.00 % |
--- Scale 32,768 -------------------------------------------- Time 14.809 ---
| Unsafe   |  40.398 µs |  31.274 µs | 145.024 K | 0.000 B | Pass | 56.62 % |
| Original |  93.127 µs |  79.501 µs | 329.320 K | 0.000 B | Base |  0.00 % |
--- Scale 65,536 -------------------------------------------- Time 18.984 ---
| Unsafe   |  68.318 µs |  43.550 µs | 245.083 K | 0.000 B | Pass | 68.34 % |
| Original | 215.758 µs | 160.171 µs | 758.955 K | 0.000 B | Base |  0.00 % |
--- Scale 131,072 ------------------------------------------- Time 22.620 ---
| Unsafe   | 120.764 µs |  79.208 µs | 428.626 K | 0.000 B | Pass | 71.24 % |
| Original | 419.889 µs | 322.388 µs |   1.461 M | 0.000 B | Base |  0.00 % |
-----------------------------------------------------------------------------
Commix answered 22/10, 2018 at 3:43 Comment(15)
The benchmarks will be interesting. I'm not sure that my use of the StopWatch class has merit in my code.Urrutia
Excellent. Thanks very muchUrrutia
@Urrutia no problems, if you need the endian code its just shifting on the first variable ((*pbyte << 8) | (*(pbyte + 1)) alos youll get a small gain from or'ing instead of plusing *pS | (*(pS+1) << 8);Commix
Also I think it's worth noting, a similar solution to this can be implemented without unsafe, by first copying the bytes into new byte[16384 * sizeof(double)] and then using Buffer.BlockCopy to copy those bytes into the double array(s) at the end.Steinman
@Steinman No block copy wont work, as he needs to stagger the items between 2 arrays.Commix
Yes, you need to stagger the bytes manually. Basically just explicitly handling the pointer logic which happens behind the scenes in the unsafe solution (aka moving the pointer in sizeof(double) steps for each cycle).Steinman
@Steinman ahh ok i see what you mean, i can benchmark that as well, its a worthy noteCommix
The unsafe will be a little faster, this is just a managed workaround if unsafe isn't desired for whatever reason.Steinman
@KookieMonster: BlockCopy won't work because the bit representation of double differs from the bit representation of int. There's an implicit int to double conversion in the OP's code.Unaware
Urgh ofc you do. Hmmm... I mean it's not like you can't fix it manually, but then it's just getting overcomplicated. Yeah my bad lmao :sSteinman
Hi, now that I tried this, it doesn't work. The resulting spectrum that is derived from this it not at all what it should be. Perhaps I got the endian part wrong. I put in *pA = (*pS++ << 8) + *pS++ ; *pB = (*pS++ << 8) + *pS++ ; as well and it didn't work. The destination arrays are doubles. I don't see any logic for that?Urrutia
@Urrutia ok I'll have a look at this when I'm at work, honestly I didn't test itCommix
Sorry, I tried a number of things...I still can't wrap my head around pointers lol...Urrutia
Working great thank you very much. Unfortunately, there are still bottle necks in other parts of the code that I cannot trace down.Urrutia
Since this did not help the real time performance of my application, I think I may have found where my issue is. I posted this here: #52941697Urrutia
L
-3

"So every millisecond counts." If that is the case, you are dealing with Realtime Programming here. And for all it's power, the .NET Runtime is not ideal for Realtime Programming.

Garbage Collection Memory Management alone is usually a disqualifier for Realtime Programming.

Now you can change .NET from GC memory management to direct management. And squeeze a bit of performance out by going to unsafe code and using naked pointers. But that is pretty much the point where you removed every selling point of .NET. And it would have been better two write the whole thing/that part in native C++ to begin with.

Lueluebke answered 22/10, 2018 at 4:21 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.