What is the fastest way to convert a float[] to a byte[]?

I would like to get a byte[] from a float[] as quickly as possible, without looping through the whole array (via a cast, probably). Unsafe code is fine. Thanks!

I am looking for a byte array 4 times longer than the float array, since each float is composed of 4 bytes. I'll pass this to a BinaryWriter.

EDIT: To those critics screaming "premature optimization": I benchmarked this with the ANTS profiler before I optimized. There was a significant speed increase because the file has a write-through cache and the float array is sized to exactly match the sector size of the disk. The BinaryWriter wraps a file handle created with the P/Invoked Win32 API, so the speedup comes from making fewer function calls.

And, with regard to memory, this application creates massive caches which use plenty of memory. I can allocate the byte buffer once and reuse it many times; the doubled memory usage in this particular instance amounts to a rounding error in the app's overall memory consumption.

So I guess the lesson here is not to make premature assumptions ;)

Octave answered 6/3, 2009 at 14:32 Comment(7)
What do you actually want? Every float cast to a byte, or an array four times longer containing the byte representation of the floats? – Peder
What does "four times longer" mean? – Tattletale
An array of floats into an array of bytes? So 2 floats would take 8 bytes? Or have I misunderstood? – Verein
It would help to know what you plan to use the bytes for afterwards. The answer you accepted is not optimal in several situations if you are willing to use unsafe code... – Patiencepatient
He says in the question... "I'll pass this to a BinaryWriter". – Tryma
That's not what he wants, that's how he is trying to achieve what he wants. If this is going into a stream he can do better than BinaryWriter... – Patiencepatient
Nick, check out my answer below. It'll do the job: no iteration, no memory allocations. If you can live with the "hackiness" of it, then go for it. – Keyes

If you do not want any conversion to happen, I would suggest Buffer.BlockCopy().

public static void BlockCopy(
    Array src,
    int srcOffset,
    Array dst,
    int dstOffset,
    int count
)

For example:

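float[] floatArray = new float[1000];
// sizeof(float) == 4, so the byte array is four times longer.
byte[] byteArray = new byte[floatArray.Length * sizeof(float)];

// The count argument is in bytes, so byteArray.Length is already the right value.
Buffer.BlockCopy(floatArray, 0, byteArray, 0, byteArray.Length);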
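Because every offset and count passed to BlockCopy is measured in bytes, Buffer.ByteLength (suggested in the comments) gives the right count without hard-coding 4. The sketch below assumes the goal from the question: hand the whole buffer to a BinaryWriter in one call, reusing a buffer allocated once as the asker describes. The class and method names (FloatBytes, WriteFloats) are hypothetical.

```csharp
using System;
using System.IO;

public static class FloatBytes
{
    // Copies the raw bytes of 'floats' into 'buffer' (reusing it when it is large enough)
    // and writes them with a single BinaryWriter.Write(byte[], int, int) call.
    public static byte[] WriteFloats(BinaryWriter writer, float[] floats, byte[] buffer)
    {
        // Buffer.ByteLength returns the size of a primitive array in bytes (floats.Length * 4).
        int byteCount = Buffer.ByteLength(floats);

        // Allocate only when the caller's buffer is missing or too small.
        if (buffer == null || buffer.Length < byteCount)
            buffer = new byte[byteCount];

        // srcOffset, dstOffset and count are all byte values.
        Buffer.BlockCopy(floats, 0, buffer, 0, byteCount);
        writer.Write(buffer, 0, byteCount);

        // Return the (possibly new) buffer so the caller can pass it in next time.
        return buffer;
    }
}
```

Passing the returned buffer back in on the next call keeps the extra allocation to a single up-front cost, which is the trade-off the asker accepted in the question.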
Inessential answered 6/3, 2009 at 15:5 Comment(8)
This will double the amount of memory allocated, in addition to iterating over your two arrays (once to copy, once to write). Very inefficient both speed-wise and memory-wise. Not recommended. – Bunn
Doesn't the last parameter need to be multiplied by sizeof(float)? – Tryma
Actually, you should probably just use Buffer.ByteLength: msdn.microsoft.com/en-us/library/system.buffer.bytelength.aspx – Tryma
You are better off just iterating over the float[] array and calling Write for each float. This solution is highly inefficient. – Bunn
Didn't know about that method, thanks! As for efficiency, whenever I have used BlockCopy, I had a byte[] and needed a float[], so there was no unneeded duplication. Plus, if you stick with BlockCopy, you do not need unsafe code, which can be advantageous. Pick the best method for your needs. – Inessential
@Jeremy: I didn't either, until 5 seconds before that comment :) @Vlad: Please just rate it up or down. No need to repeatedly post the same comment (while advertising your answer). Let the asker and the users decide what is helpful. That's why the rating system exists. – Tryma
Posted answer which confirms @Vlad's suspicions – Tetrode
@rstevens: you would have to use Marshal.SizeOf(typeof(float)), but the CLI standard says sizeof(float) should be 32 bits. – Tetrode

There is a dirty but fast way of doing this (without unsafe code):

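using System;
using System.Runtime.InteropServices;

// Both array references sit at offset 0, so they alias the same array object (a C-style union).
[StructLayout(LayoutKind.Explicit)]
struct BytetoDoubleConverter
{
    [FieldOffset(0)]
    public Byte[] Bytes;

    [FieldOffset(0)]
    public Double[] Doubles;
}
//...
static Double Sum(byte[] data)
{
    BytetoDoubleConverter convert = new BytetoDoubleConverter { Bytes = data };
    Double result = 0;
    // Doubles.Length still reports the original byte count, hence the division.
    for (int i = 0; i < convert.Doubles.Length / sizeof(Double); i++)
    {
        result += convert.Doubles[i];
    }
    return result;
}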

This will work, but I'm not sure about support on Mono or newer versions of the CLR. The only strange thing is that Doubles.Length returns the length in bytes. That is because Length is read from the length stored with the array object, and since this array was created as a byte array, that value is still a byte count. The indexer does account for each Double being eight bytes, so no offset calculation is needed there.

I've looked into it some more, and it's actually described on MSDN in How to: Create a C/C++ Union by Using Attributes (C# and Visual Basic), so chances are this will be supported in future versions. I am not sure about Mono, though.

Jacie answered 6/3, 2009 at 15:46 Comment(5)
+1 Nice technique! Is this reference aliasing safe from the garbage collector's perspective? – Originality
Just to avoid anybody getting the wrong idea, a System.Double (or in C# simply double) is 8 bytes (64 bits), not 4 bytes (32 bits). – Ziegler
This can also let you access uninitialized or other memory. Buffer overflow exploits go! – Cardoso
"sizeof" is unsafe. – Acerate
Maybe Length, the heap address, etc. are stored in the CLI array, so Bytes.Length and Doubles.Length read the same address and therefore the same value. That is not safe when used outside the function. – Provencal

Premature optimization is the root of all evil! @Vlad's suggestion to iterate over each float is a much more reasonable answer than switching to a byte[]. Take the following table of runtimes for increasing numbers of elements (average of 50 runs):

Elements      BinaryWriter(float)      BinaryWriter(byte[])
-----------------------------------------------------------
10               8.72ms                    8.76ms
100              8.94ms                    8.82ms
1000            10.32ms                    9.06ms
10000           32.56ms                   10.34ms
100000         213.28ms                  739.90ms
1000000       1955.92ms                10668.56ms

There is little difference between the two for small numbers of elements. Once the element count gets very large, the time spent copying from the float[] to the byte[] far outweighs the benefits.

So go with what is simple:

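float[] data = new float[...];
// 'writer' is the existing BinaryWriter for the output file.
foreach (float value in data)
{
    writer.Write(value); // Write(float) emits 4 bytes per value
}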
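Both columns of the table produce the same bytes on disk, so the choice is only about speed. The sketch below writes the same data both ways into MemoryStreams and compares the results. It assumes a little-endian machine: BinaryWriter.Write(float) always emits little-endian bytes, while Buffer.BlockCopy copies the in-memory representation. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;

public static class WriteComparison
{
    // Approach 1: one BinaryWriter.Write(float) call per element.
    public static byte[] PerElement(float[] data)
    {
        var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream))
        {
            foreach (float value in data)
                writer.Write(value);
        }
        // MemoryStream.ToArray still works after the writer has closed the stream.
        return stream.ToArray();
    }

    // Approach 2: copy everything into a byte[] first, then a single Write(byte[]) call.
    public static byte[] BlockCopied(float[] data)
    {
        var bytes = new byte[Buffer.ByteLength(data)];
        Buffer.BlockCopy(data, 0, bytes, 0, bytes.Length);

        var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(bytes);
        }
        return stream.ToArray();
    }

    // True when both approaches produce identical output (expected on little-endian hardware).
    public static bool SameOutput(float[] data) =>
        PerElement(data).SequenceEqual(BlockCopied(data));
}
```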
Tetrode answered 6/3, 2009 at 15:37 Comment(4)
I benchmarked this with the ANTS profiler before I optimized. There was a significant speed increase because the file has a write-through cache and the float array is sized to exactly match the sector size of the disk. The BinaryWriter wraps a file handle created with the Win32 API. ;) – Octave
Good, good, but I would add that unless you're writing millions of floats or executing this thousands of times, ~200ms is an unimportant number in the grand scheme of program execution. – Tetrode
There is a sweet spot at 10,000, 3 times faster (or is it a typo? Should it be 30.34 ms?). How do you explain that? – Repetitious
Are you comparing this against foreach (byte b in bytedata) { writer.Write(b); }? Because that's a fairly silly comparison; the whole reason you want this as bytes is so you can use writer.Write(bytedata) directly, skipping the massive overhead per Write call. Writing 1MB to disk should not take 2 seconds; that's just plain absurd. You'd need a week to write a full PC backup this way. – Hyperesthesia

There is a way which avoids memory copying and iteration.

You can use a really ugly hack to temporarily change your array to another type using (unsafe) memory manipulation.

I tested this hack on both 32-bit and 64-bit OSes, so it should be portable.

The source and sample usage are maintained at https://gist.github.com/1050703, but for your convenience I'll paste them here as well:

using System;
using System.Runtime.InteropServices;

public static unsafe class FastArraySerializer
{
    [StructLayout(LayoutKind.Explicit)]
    private struct Union
    {
        [FieldOffset(0)] public byte[] bytes;
        [FieldOffset(0)] public float[] floats;
    }

    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    private struct ArrayHeader
    {
        public UIntPtr type;
        public UIntPtr length;
    }

    private static readonly UIntPtr BYTE_ARRAY_TYPE;
    private static readonly UIntPtr FLOAT_ARRAY_TYPE;

    static FastArraySerializer()
    {
        fixed (void* pBytes = new byte[1])
        fixed (void* pFloats = new float[1])
        {
            BYTE_ARRAY_TYPE = getHeader(pBytes)->type;
            FLOAT_ARRAY_TYPE = getHeader(pFloats)->type;
        }
    }

    public static void AsByteArray(this float[] floats, Action<byte[]> action)
    {
        if (floats.handleNullOrEmptyArray(action)) 
            return;

        var union = new Union {floats = floats};
        union.floats.toByteArray();
        try
        {
            action(union.bytes);
        }
        finally
        {
            union.bytes.toFloatArray();
        }
    }

    public static void AsFloatArray(this byte[] bytes, Action<float[]> action)
    {
        if (bytes.handleNullOrEmptyArray(action)) 
            return;

        var union = new Union {bytes = bytes};
        union.bytes.toFloatArray();
        try
        {
            action(union.floats);
        }
        finally
        {
            union.floats.toByteArray();
        }
    }

    public static bool handleNullOrEmptyArray<TSrc,TDst>(this TSrc[] array, Action<TDst[]> action)
    {
        if (array == null)
        {
            action(null);
            return true;
        }

        if (array.Length == 0)
        {
            action(new TDst[0]);
            return true;
        }

        return false;
    }

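    // Assumes the runtime stores the type pointer and the length immediately before
    // the first element (a reverse-engineered layout, not a documented contract).
    private static ArrayHeader* getHeader(void* pBytes)
    {
        return (ArrayHeader*)pBytes - 1;
    }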

    private static void toFloatArray(this byte[] bytes)
    {
        fixed (void* pArray = bytes)
        {
            var pHeader = getHeader(pArray);

            pHeader->type = FLOAT_ARRAY_TYPE;
            pHeader->length = (UIntPtr)(bytes.Length / sizeof(float));
        }
    }

    private static void toByteArray(this float[] floats)
    {
        fixed(void* pArray = floats)
        {
            var pHeader = getHeader(pArray);

            pHeader->type = BYTE_ARRAY_TYPE;
            pHeader->length = (UIntPtr)(floats.Length * sizeof(float));
        }
    }
}

And the usage is:

var floats = new float[] {0, 1, 0, 1};
floats.AsByteArray(bytes =>
{
    foreach (var b in bytes)
    {
        Console.WriteLine(b);
    }
});
Keyes answered 26/8, 2010 at 16:30 Comment(10)
-1 for being completely non-portable. Have you even tried this on a 64-bit machine? – Terramycin
Nope, it's a hack. If and when I get access to a 64-bit machine, I might check it out and perhaps adapt it. It is also not future-proof: in CLR vNext it might be completely broken. There is a trade-off here: you can use a more robust solution and pay in performance, or use the fastest way I can think of and live on the edge :-) – Keyes
I got a chance to use this on a 64-bit machine, so I made the code portable. – Keyes
+1 :-) Thanks for this! I use this method with custom structures, and it is indeed hellza helpful. – Width
+1 Pretty rad. I must ask, did you find any documentation on the memory layout for the type and length "fields" (for lack of a better word) of the arrays? I mean, how did you come up with this: FLOAT_ARRAY = *(UIntPtr*)(((byte*) pFloats) - 2*PTR_SIZE); ? – Ia
Note to self and others: This article gets to the deeper end of the pool regarding internal type representation for .NET 2.0. codeproject.com/Articles/20481/… – Ia
Thanks. I deduced the array header metadata fields using "reverse engineering" and some trial and error: I opened a memory window in Visual Studio, tinkered with the values, and deduced the layout. I updated the code to make it a little clearer. – Keyes
This hack is corrupting the internal garbage collector data structures. It will cause intermittent crashes, data corruption, and security bugs of the same class as use-after-free in C++. Hacking internal garbage collector data structures like this is absolutely not supported by the .NET runtime. github.com/HelloKitty/Reinterpret.Net/issues/1 has a long discussion about the crashes that this hack will lead to. – Mick
@JanKotas thanks for the discussion link. Very interesting! I guess I could pin the array for the entire scope of the As{Float,Byte}Array() functions to prevent such corruption. What do you think? – Keyes
@OmerMor, I think you are right because (a) the garbage collector won't move it while pinned, and (b) the garbage collector won't traverse it because it is an array of simple values. – Resolvable

You're better off letting the BinaryWriter do this for you. There's going to be iteration over your entire set of data regardless of which method you use, so there's no point in playing with bytes.

Daffy answered 6/3, 2009 at 15:23 Comment(0)

Although you can obtain a byte* pointer using unsafe and fixed, you cannot convert the byte* to a byte[] (the type the writer accepts as a parameter) without copying the data. You do not want to do that, as it will double your memory footprint and add an extra iteration on top of the inevitable iteration needed to output the data to disk.

Instead, you are still better off iterating over the array of floats and writing each float to the writer individually, using the Write(float) method. It will still be fast because of buffering inside the writer. See sixlettervariables's numbers.

Bunn answered 6/3, 2009 at 14:44 Comment(7)
Not sure what you mean. I just want byte-level indexing into the floating-point array (actually, I'm passing the array to a Writer). – Octave
@Vlad: What is this supposed to mean? How can a datatype not be representable as bytes? See my answer. – Tattletale
It means that the binary representation of (float)0 and that of (byte)0 are not the same (for one thing, they don't have the same size). – Bunn
Doesn't seem to work: error CS1503: Argument '1': cannot convert from 'byte*' to 'byte[]' – Octave
Vlad is correct: you cannot fake the bits in memory that constitute a float[] as a byte[]. You CAN get a byte* to the front of the array, which is likely sufficient for your needs, but a byte* cannot be magicked into a byte[]. – Patiencepatient
Please see my edit, which explains why, in my specific case, Jeremy's answer does indeed speed up execution, as confirmed by a profiler. – Octave
Actually you CAN fake the bits in memory to represent a byte[]. Check out my answer to see how it's done. – Keyes

Using the new Span<T> in .NET Core 2.1 or later:

byte[] byteArray2 = MemoryMarshal.Cast<float, byte>(floatArray).ToArray();

Or, if a Span<byte> will do instead of a byte[], the memory can be reinterpreted directly (very fast, zero copying):

Span<byte> byteSpan = MemoryMarshal.Cast<float, byte>(floatArray);
// with a span we can get a byte, set a byte, iterate, and more.
byte someByte = byteSpan[2];
byteSpan[2] = 33; // writes through to the underlying floatArray
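Since the question's end goal is a BinaryWriter, the span never needs to become a byte[]: on .NET Core 2.1 and later, BinaryWriter has a Write(ReadOnlySpan<byte>) overload. A minimal sketch, assuming .NET Core 2.1+ and, if the file must match what Write(float) would produce, a little-endian machine. It uses MemoryMarshal.AsBytes, the helper suggested in the comment on this answer; SpanWrite and WriteAll are hypothetical names.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class SpanWrite
{
    // Reinterprets the float[] as bytes (no copy) and hands the whole block to the writer.
    public static void WriteAll(BinaryWriter writer, float[] floats)
    {
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(new ReadOnlySpan<float>(floats));
        writer.Write(bytes); // BinaryWriter.Write(ReadOnlySpan<byte>) overload
    }
}
```

This avoids both the per-element Write calls and the extra byte[] allocation discussed in the other answers.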

I did some crude benchmarks. The time taken for each is in the comments. [release/no debugger/x64]

using System;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;

float[] floatArray = new float[100];
for (int i = 0; i < 100; i++) floatArray[i] = i *  7.7777f;
Stopwatch start = Stopwatch.StartNew();
for (int j = 0; j < 100; j++)
{
    start.Restart();
    for (int k = 0; k < 1000; k++)
    {
        Span<byte> byteSpan = MemoryMarshal.Cast<float, byte>(floatArray);
    }
    long timeTaken1 = start.ElapsedTicks; ////// 0 ticks  //////

    start.Restart();
    for (int k = 0; k < 1000; k++)
    {
        byte[] byteArray2 = MemoryMarshal.Cast<float, byte>(floatArray).ToArray();
    }
    long timeTaken2 = start.ElapsedTicks; //////  26 ticks  //////

    start.Restart();
    for (int k = 0; k < 1000; k++)
    {
        byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
        for (int i = 0; i < floatArray.Length; i++)
            BitConverter.GetBytes(floatArray[i]).CopyTo(byteArray, i * sizeof(float));
    }
    long timeTaken3 = start.ElapsedTicks;  //////  1310  ticks //////

    start.Restart();
    for (int k = 0; k < 1000; k++)
    {
        byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
        Buffer.BlockCopy(floatArray, 0, byteArray, 0, byteArray.Length);
    }
    long timeTaken4 = start.ElapsedTicks;  ////// 33 ticks  //////

    start.Restart();
    for (int k = 0; k < 1000; k++)
    {
        byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
        MemoryStream memStream = new MemoryStream();
        BinaryWriter writer = new BinaryWriter(memStream);
        foreach (float value in floatArray)
            writer.Write(value);
        writer.Close();
    }
    long timeTaken5 = start.ElapsedTicks;   ////// 1080 ticks   //////

    Console.WriteLine($"{timeTaken1/10,6} {timeTaken2 / 10,6} {timeTaken3 / 10,6} {timeTaken4 / 10,6} {timeTaken5 / 10,6} ");
}
Sharpie answered 8/5, 2022 at 23:24 Comment(1)
Neat, hadn't seen that one yet. Though you could also use .AsBytes(), which might have slightly lower overhead since it doesn't need to validate the destination type and span lengths. – Thracophrygian

We have a class called LudicrousSpeedSerialization and it contains the following unsafe method:

    public static byte[] ConvertFloatsToBytes(float[] data)
    {
        int n = data.Length;
        byte[] ret = new byte[n * sizeof(float)];
        if (n == 0) return ret; // &ret[0] below would throw on an empty array

        unsafe // requires compiling with unsafe code enabled
        {
            // Pin the byte array and view its memory as floats.
            fixed (byte* pByteArray = &ret[0])
            {
                float* pFloatArray = (float*)pByteArray;
                for (int i = 0; i < n; i++)
                {
                    pFloatArray[i] = data[i];
                }
            }
        }

        return ret;
    }
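The loop above still copies one float at a time. If a single bulk copy is preferred in unsafe code, Buffer.MemoryCopy (available since .NET Framework 4.6) can replace it. This is a swapped-in technique, not the answerer's LudicrousSpeedSerialization class, and the name UnsafeBlockCopy is hypothetical.

```csharp
using System;

public static class UnsafeBlockCopy
{
    // Requires compiling with unsafe code enabled (AllowUnsafeBlocks).
    public static unsafe byte[] ConvertFloatsToBytes(float[] data)
    {
        byte[] result = new byte[data.Length * sizeof(float)];
        if (data.Length == 0) return result;

        // Pin both arrays so the garbage collector cannot move them during the copy.
        fixed (float* src = data)
        fixed (byte* dst = result)
        {
            // One bulk copy of all bytes instead of a per-element loop.
            Buffer.MemoryCopy(src, dst, result.Length, result.Length);
        }
        return result;
    }
}
```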
Docket answered 6/3, 2009 at 16:39 Comment(0)

Although it basically does a for loop behind the scenes, it does the job in one line:

using System.Collections.Generic;
using System.Linq;

byte[] byteArray = floatArray.Select(
                    f=>System.BitConverter.GetBytes(f)).Aggregate(
                    (bytes, f) => {List<byte> temp = bytes.ToList(); temp.AddRange(f); return temp.ToArray(); });
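The Aggregate version rebuilds the whole accumulated list for every element, so its cost grows quadratically with the array size, and Aggregate without a seed throws InvalidOperationException on an empty array. A one-line alternative with the same output is SelectMany, which walks the input once; sketched here under the hypothetical class name LinqBytes.

```csharp
using System;
using System.Linq;

public static class LinqBytes
{
    // Flattens each float's 4-byte representation into one array in a single pass.
    // Still allocates a small byte[] per element via BitConverter.GetBytes.
    public static byte[] ToBytes(float[] floats) =>
        floats.SelectMany(f => BitConverter.GetBytes(f)).ToArray();
}
```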
Counterstamp answered 6/3, 2009 at 15:22 Comment(0)
