Getting null terminated string from System.Text.Encoding.Unicode.GetString

I have an array of bytes that I receive from an external entity. It is a fixed size. The bytes contain a Unicode (UTF-16) string, with 0 values padding out the rest of the buffer.

So the bytes might be:

H \0 E \0 L \0 L \0 O \0 \0 \0 \0 \0 ... etc

I'm getting that buffer and converting it to a string like so:

byte[] buffer = new byte[buffSize];
m_dataStream.Read(buffer, 0, buffSize);
String cmd = System.Text.Encoding.Unicode.GetString(buffer);

What I get back is a string that looks like this:

"HELLO\0\0\0\0\0\0\0\0..."

How can I tell GetString to terminate the string at the first Unicode null (i.e. so I just get back "HELLO")?

Thanks for any input.

Cashbox answered 14/5, 2009 at 16:7 Comment(0)
C
18

If you're sure the rest is all \0, this would work:

cmd = cmd.TrimEnd('\0');

Otherwise, if you just want to get everything before the first null:

int index = cmd.IndexOf('\0');
if (index >= 0)
   cmd = cmd.Remove(index);

Note that Encoding.Unicode.GetString decodes each two-byte null (00 00) into a single '\0' character, so in the decoded string you only need to look for a single \0.
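
The difference between the two approaches shows up when the buffer holds stale data after the terminator. The following is a minimal sketch, not part of the original answer; the class and method names (NullTrimDemo, TrimTrailingNulls, CutAtFirstNull) are made up for illustration.

```csharp
using System;
using System.Text;

public static class NullTrimDemo
{
    // Removes only the '\0' characters at the very end of the decoded string.
    // Anything that follows an earlier '\0' is kept.
    public static string TrimTrailingNulls(byte[] buffer)
    {
        return Encoding.Unicode.GetString(buffer).TrimEnd('\0');
    }

    // Keeps only the text before the first '\0' in the decoded string.
    public static string CutAtFirstNull(byte[] buffer)
    {
        string decoded = Encoding.Unicode.GetString(buffer);
        int index = decoded.IndexOf('\0');
        return index >= 0 ? decoded.Substring(0, index) : decoded;
    }
}
```

For a buffer that decodes to "HI\0XY\0", TrimTrailingNulls keeps the embedded null and the leftover "XY", while CutAtFirstNull returns just "HI".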

Cither answered 14/5, 2009 at 16:11 Comment(0)
F
5

For UTF-8/ASCII encodings you can achieve this without reprocessing the string by looking for the first null byte in the buffer (using System.Array.IndexOf). You can then use the System.Text.Encoding.UTF8.GetString overload that takes an index and a byte count to decode only the bytes before the terminator.

The example below also caters for a buffer containing no null bytes:

byte[] buffer = new byte[buffSize];
m_dataStream.Read(buffer, 0, buffSize);
// Index of the first null byte, or -1 if the buffer contains none.
var size = System.Array.IndexOf(buffer, (byte)0);
// Decode only the bytes before the terminator (or the whole buffer).
String cmd = System.Text.Encoding.UTF8.GetString(buffer, 0, size < 0 ? buffSize : size);

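For UTF-16 you could use a similar approach with a for loop that steps through the buffer two bytes at a time, looking for the first code unit whose two bytes are both zero, e.g. if (buffer[i] == (byte)0 && buffer[i + 1] == (byte)0). Stepping by two matters: the zero high byte of one character followed by a zero low byte of the next is not a terminator.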
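
A minimal sketch of that UTF-16 scan, assuming little-endian data as produced by Encoding.Unicode. The class and method names (Utf16Buffer, LengthBeforeNull, Decode) are hypothetical and not from the original answer.

```csharp
using System;
using System.Text;

public static class Utf16Buffer
{
    // Returns the number of bytes before the first UTF-16 null code unit.
    // If there is no terminator, returns the largest even length that fits.
    public static int LengthBeforeNull(byte[] buffer)
    {
        // Step by two so that only whole 16-bit code units are tested.
        for (int i = 0; i + 1 < buffer.Length; i += 2)
        {
            if (buffer[i] == 0 && buffer[i + 1] == 0)
                return i;
        }
        return buffer.Length - (buffer.Length % 2);
    }

    // Decodes only the bytes before the terminator, so no padded
    // temporary string is created.
    public static string Decode(byte[] buffer)
    {
        return Encoding.Unicode.GetString(buffer, 0, LengthBeforeNull(buffer));
    }
}
```

Because the loop only looks at even offsets, a byte sequence such as 41 00 00 01 ("A" followed by U+0100) is decoded in full rather than being cut at the two adjacent zero bytes in the middle.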

If creating temporary strings is of no concern then the accepted answer is the best solution.

Floozy answered 23/3, 2018 at 10:7 Comment(2)
WARNING: this does not work with multibyte character encodings (like Unicode/UTF-16 from your example), where 'abc' would be encoded as {97, 0, 98, 0, 99, 0} (plus an optional two-byte null terminator). You also cannot search for the last non-null byte, because the preceding character would no longer be encoded properly (for 'a', you would have the invalid {97} instead of {97, 0}). - Annulation
@Annulation - fixed - Floozy
O
2

The easiest way would be to trim the string after conversion, as already suggested.

If you know the character count in advance, you could use the GetString overload that takes a start index and a count of bytes, in order to get the correct string without trimming. Note that the count is in bytes, so for UTF-16 it is twice the number of characters (code units).
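
As a rough sketch of the known-length case, assuming the count is given in UTF-16 code units (C# char values). The names KnownLength and DecodeChars are hypothetical.

```csharp
using System;
using System.Text;

public static class KnownLength
{
    // Decodes the first charCount UTF-16 code units from the buffer.
    // GetString expects a byte count, so the character count is
    // multiplied by sizeof(char), which is 2 in C#.
    public static string DecodeChars(byte[] buffer, int charCount)
    {
        int byteCount = Math.Min(charCount * sizeof(char), buffer.Length);
        return Encoding.Unicode.GetString(buffer, 0, byteCount);
    }
}
```

Calling KnownLength.DecodeChars(buffer, 5) on the padded "HELLO" buffer from the question would decode only the first 10 bytes.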

If you do not know the character count in advance, and would like to avoid trimming the string afterwards, you need to trim the byte array first, so you only pass the bytes you are interested in. For Unicode (UTF-16), this means dropping the first pair of zero bytes that starts at an even offset, along with every byte after it.

Ommatidium answered 14/5, 2009 at 16:15 Comment(0)
D
0

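You can get the length from Stream.Read(), which returns the number of bytes actually read. If the sender transmits only the string bytes, with no padding, the unused \0 bytes in the buffer are not counted: for a UTF-8 "HELLO" you get a length of 5. You can then pass that length to Encoding.UTF8.GetString so that only those bytes are decoded.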

int length = peerStream.Read(buffer, 0, buffer.Length);
receive = Encoding.UTF8.GetString(buffer, 0, length);
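
A small sketch that combines the Read length with a cut at the first null, in case the sender does pad the message. It assumes the whole message arrives in a single Read call, which a network stream does not guarantee; the names StreamText and ReadText are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamText
{
    // Reads once into the (possibly reused) buffer and decodes only the
    // bytes that Read reported, then cuts at the first '\0' if present.
    public static string ReadText(Stream stream, byte[] buffer, Encoding encoding)
    {
        // Assumption: one Read call delivers the complete message.
        int length = stream.Read(buffer, 0, buffer.Length);
        string text = encoding.GetString(buffer, 0, length);
        int terminator = text.IndexOf('\0');
        return terminator >= 0 ? text.Substring(0, terminator) : text;
    }
}
```

Limiting the decode to the reported length means leftover bytes from an earlier, longer message in the same buffer are never decoded.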
Deceit answered 25/5, 2021 at 12:2 Comment(2)
I believe the answer from Class Skeleton covers your suggestion as well. You could perhaps explain your code better. - Papacy
This is so typical for StackOverflow. This is the actual correct answer, the others are unnecessarily convoluted and use additional steps for no good reason. This should be the accepted answer. Why would you do an expensive array search, or a trim which may fail when the buffer has been used before with longer values, or a string search? The amount of data that has been read is already available and GetString has the ability to limit the length to that amount. This website drives me crazy. - Patron
