Fast string to byte[] conversion
Currently I am using this code for converting string to byte array:

var tempByte = System.Text.Encoding.UTF8.GetBytes(tempText);

I call this line very often in my application, and I would really like a faster alternative. How can I convert a string to a byte array faster than the default GetBytes method? Maybe with unsafe code?

Schopenhauer answered 28/11, 2013 at 19:27 Comment(9)
Are you a) actually running into performance problems and b) sure it is this part that is causing those problems?Ablaut
I like to optimize the code, and according to the profiler this line is the most time-critical one.Schopenhauer
Why would unsafe code help? What makes you think this code is a bottleneck? What makes you think it can be improved? What are your performance requirements?Horseradish
GetBytes does use unsafe code already.Smog
First, why do you want to optimize it? Is it actually problematic as it is? And second, have you considered optimizing the calling code, instead of trying to make the most-called function faster? Perhaps you can do other things, like loop unrolling or a better algorithm that calls this method less often. Use caching, dynamic programming, etc. More often than not, trying to optimize a built-in function is not the way to go.Ablaut
If you need to be using UTF8 a lot, it might be faster to simply work with byte arrays rather than convert from Unicode to UTF8 all the time.Smog
I don't know if this could be improved; that is why I asked the question. A lot of built-in functions can be outrun by a faster implementation, like the GDI or the Crypto ones.Schopenhauer
Peter Ritchie just gave me an idea, thank you, it can be a huge improvement!Schopenhauer
How about that approach: #473406?Hepcat
F
14

If you don't care too much about using a specific encoding and your code is performance-critical (for instance, it's some kind of DB serializer that needs to run millions of times per second), try:

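// Requires an unsafe context (compile with /unsafe).
int len = tempText.Length * sizeof(char); // byte count: 2 bytes per UTF-16 code unit
byte[] tempByte = new byte[len];
fixed (void* ptr = tempText)
{
    // Copy the string's raw UTF-16 memory into the byte array.
    System.Runtime.InteropServices.Marshal.Copy(new IntPtr(ptr), tempByte, 0, len);
}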

Edit: Marshal.Copy was around ten times faster than UTF8.GetBytes and gets you UTF-16 encoding. For converting it back to string you can use:

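// offset: byte index in tempByte where the string data starts (0 if the array holds only this string).
// len: number of bytes to read, so len / 2 is the number of UTF-16 chars.
fixed (byte* bptr = tempByte)
{
    char* cptr = (char*)(bptr + offset);
    tempText = new string(cptr, 0, len / 2);
}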
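
A minimal sketch of how the two snippets might be wrapped into methods, assuming the project is compiled with unsafe code enabled. The class name Utf16Bytes and the method names ToBytes and FromBytes are hypothetical, chosen only for illustration; len is taken to be a byte count, as in the snippets above.

```csharp
using System;
using System.Runtime.InteropServices;

static class Utf16Bytes
{
    // Hypothetical helper: copies the string's in-memory UTF-16 code units into a new byte[].
    public static unsafe byte[] ToBytes(string text)
    {
        int len = text.Length * sizeof(char); // 2 bytes per char
        byte[] bytes = new byte[len];
        fixed (char* ptr = text)
        {
            Marshal.Copy(new IntPtr(ptr), bytes, 0, len);
        }
        return bytes;
    }

    // Hypothetical helper: rebuilds a string from len bytes starting at byte index offset.
    public static unsafe string FromBytes(byte[] bytes, int offset, int len)
    {
        if (len == 0) return string.Empty; // avoid dereferencing a pointer into an empty array
        fixed (byte* bptr = bytes)
        {
            char* cptr = (char*)(bptr + offset);
            return new string(cptr, 0, len / 2);
        }
    }
}
```

For example, `Utf16Bytes.FromBytes(Utf16Bytes.ToBytes(s), 0, s.Length * 2)` should return a string equal to `s` when both calls run on the same machine.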
Forwards answered 28/11, 2013 at 21:8 Comment(7)
This is utterly bizarre. Optimise converting to UTF8 by, er, what exactly?Horseradish
By using UTF-16 instead of UTF-8 and exploiting the fact that the internal memory representation of a .NET string is already in that format, so all you need to do is copy a memory block instead of converting the string character by character to the desired encoding.Forwards
I just cannot see how it relates to the question which clearly and deliberately converts to UTF8. If you want a UTF16 representation then the code in your answer is just as pointless. Just take a copy of the string reference! Why even bother with byte[]. And the use of unsafe code here seems pointless also.Horseradish
I had a very similar problem to Wheeler, and for my project speed was much more important than the particular encoding used (as long as there was a fast way to decode it as well), so I shared my opinion on this topic. Wheeler wrote that he needs to convert a string to a byte array, and my code snippets do just that. If you disagree with my answer, you are free to downvote it and provide yours.Forwards
I'm coming at this from the perspective of answering the question that was asked rather than solving the problem of the question asker.Horseradish
@Forwards "If you don't care too much about using specific encoding". My comment will be "you have to". The problem with this approach is endianness. This code is dangerous if you want to use it on different machines. Maybe it works in many situations, but it is contrary to the standards. It probably causes problems when you want to scale. You should care about encoding after all. To solve performance problems you'd better deal with binary arrays instead.Tiptop
How do I use this? Is that a method? Also, len was undefined.Larrainelarrie
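
On the endianness concern raised in the comments above, a minimal sketch of a portable alternative, offered as an assumption about what a cross-machine version could look like rather than as the answerer's method. It swaps the raw memory copy for Encoding.Unicode, which is defined as little-endian UTF-16 regardless of the CPU's byte order. The class and member names are hypothetical.

```csharp
using System;
using System.Text;

static class PortableUtf16
{
    // Encoding.Unicode always emits UTF-16 little-endian, independent of the machine's byte order.
    public static byte[] ToBytes(string text) => Encoding.Unicode.GetBytes(text);

    // Decodes len bytes of little-endian UTF-16 starting at byte index offset.
    public static string FromBytes(byte[] bytes, int offset, int len) =>
        Encoding.Unicode.GetString(bytes, offset, len);

    // True when a raw memory copy of a string would produce the same byte order as ToBytes.
    public static bool RawCopyMatches => BitConverter.IsLittleEndian;
}
```

This trades some of the raw copy's speed for a byte layout that is the same on every machine; whether that trade is worthwhile depends on where the bytes end up.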
