Getting a string from an unsafe byte pointer to a fixed char array
I'm trying to understand how to get a string from an unsafe byte pointer in the following struct. SDL_TEXTINPUTEVENT_TEXT_SIZE is 32.

[StructLayout(LayoutKind.Sequential)]
public unsafe struct SDL_TextInputEvent
{
    public SDL_EventType type;
    public UInt32 timestamp;
    public UInt32 windowID;
    public fixed byte text[SDL_TEXTINPUTEVENT_TEXT_SIZE];
}

I've tried:

byte[] rawBytes = new byte[SDL_TEXTINPUTEVENT_TEXT_SIZE];

unsafe
{
    Marshal.Copy((IntPtr)rawEvent.text.text, rawBytes, 0, SDL_TEXTINPUTEVENT_TEXT_SIZE);
}

string text = System.Text.Encoding.UTF8.GetString(rawBytes);

Which kind of works, but gives me a string with a lot of extra bytes beyond the character that was actually entered. Should I parse the byte array and search for a 0-terminating character to avoid the excess?

Am I totally misunderstanding something?

For reference, the original C struct that is being marshaled into the .NET runtime is:

typedef struct SDL_TextInputEvent
{
    Uint32 type;
    Uint32 timestamp;
    Uint32 windowID;
    char text[SDL_TEXTINPUTEVENT_TEXT_SIZE];
} SDL_TextInputEvent;
Atalayah answered 9/1, 2014 at 0:35 Comment(2)
Do you need to actually use byte in the .NET struct? You should be able to keep the char array signature, I would think, and then turn that into a trimmed string when needed. - Donaghue
@Ty char is 16 bits in C# but 8 bits in the unmanaged code. - Aeroneurosis

You do need to locate the null terminator, and Marshal.Copy will not do that for you. You could use Marshal.PtrToStringAnsi if your text were ANSI encoded, but the .NET Framework has no equivalent for UTF-8. So you need to iterate over the array looking for a zero byte. Once you find it, you know the actual length of the string, and you can modify your existing code to pass that length to GetString rather than the maximum possible length.
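As a minimal sketch of that approach, the following looks for the first zero byte in the copied array and decodes only the bytes before it. The Utf8Buffer class and its Decode method are hypothetical names made up for illustration; they are not part of SDL or of any binding library.

```csharp
using System;
using System.Text;

// Hypothetical helper, named for illustration only.
static class Utf8Buffer
{
    // Decodes a zero-terminated UTF-8 string stored in a fixed-size byte array.
    public static string Decode(byte[] rawBytes)
    {
        // Index of the first zero byte, or -1 if the buffer is completely full.
        int length = Array.IndexOf(rawBytes, (byte)0);
        if (length < 0)
        {
            // No terminator found: fall back to the whole buffer.
            length = rawBytes.Length;
        }

        // Decode only the bytes that precede the terminator.
        return Encoding.UTF8.GetString(rawBytes, 0, length);
    }
}
```

With the code from the question, the final line would then become something like string text = Utf8Buffer.Decode(rawBytes);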

Aeroneurosis answered 9/1, 2014 at 2:40 Comment(1)
If it is any consolation, Marshal.PtrToStringUTF8 exists in .NET Core and I've just used it for exactly the same purpose as the OP whilst converting a Lazy Foo tutorial from C++ to C#. - Impetigo

I just encountered the same issue with .NET Core. Fortunately, since .NET Core 1.1 / .NET Standard 2.1, there is a method Marshal.PtrToStringUTF8, which offers conversion of native UTF-8 strings.

Given this struct:

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct NativeType
{
    public int SomeNumber;
    public unsafe fixed byte SomeString[16];
}

We can decode binary data as ASCII and UTF-8 as follows:

var byteArrayAscii = new byte[] { 0x78, 0x56, 0x34, 0x12, 0x41, 0x53, 0x43, 0x49, 0x49, 0x21, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
var byteArrayUtf8 = new byte[] { 0xef, 0xcd, 0xab, 0x89, 0x45, 0x6d, 0x6f, 0x6a, 0x69, 0x3a, 0x20, 0xf0, 0x9f, 0x91, 0x8d, 0x21, 0x00, 0x00, 0x00, 0x00 };

using var outputStream = File.OpenWrite("output.txt");
using var outputWriter = new StreamWriter(outputStream);

unsafe
{
    var decoded1 = MemoryMarshal.Read<NativeType>(byteArrayAscii);
    outputWriter.WriteLine($"Number 1: {decoded1.SomeNumber:x8}");
    outputWriter.WriteLine($"String 1: {Marshal.PtrToStringAnsi(new IntPtr(decoded1.SomeString))}");
}

unsafe
{
    var decoded2 = MemoryMarshal.Read<NativeType>(byteArrayUtf8);
    outputWriter.WriteLine($"Number 2: {decoded2.SomeNumber:x8}");
    outputWriter.WriteLine($"String 2: {Marshal.PtrToStringUTF8(new IntPtr(decoded2.SomeString))}");
}

Output:

Number 1: 12345678
String 1: ASCII!
Number 2: 89abcdef
String 2: Emoji: 👍!

(contains "thumbsup" emoji, may be rendered incorrectly by some browsers)

Notes:

  • The native string must be 0-terminated.
  • Using char for native strings does not work for ASCII or UTF-8 encoded data, since in C# char always has a size of 16 bits (UTF-16):

    Fixed size char buffers always take two bytes per character, regardless of the encoding. This is true even when char buffers are marshaled to API methods or structs with CharSet = CharSet.Auto or CharSet = CharSet.Ansi.
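If the native side might fill the whole buffer without writing a terminator, one option is to bound the scan by the buffer size and pass the measured length to the Marshal.PtrToStringUTF8(IntPtr, int) overload. The following is only a sketch under that assumption; the NativeTypeText class and its ReadSomeString method are hypothetical names, and NativeType is repeated from above so the block stands alone.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
unsafe struct NativeType
{
    public int SomeNumber;
    public fixed byte SomeString[16];
}

// Hypothetical helper, named for illustration only.
static class NativeTypeText
{
    private const int SomeStringSize = 16;

    public static unsafe string ReadSomeString(ref NativeType value)
    {
        // The struct is passed by reference, so the buffer must be pinned first.
        fixed (byte* p = value.SomeString)
        {
            // Stop at the first zero byte, but never read past the buffer.
            int length = 0;
            while (length < SomeStringSize && p[length] != 0)
            {
                length++;
            }

            // Decode exactly the measured number of bytes as UTF-8.
            return Marshal.PtrToStringUTF8((IntPtr)p, length);
        }
    }
}
```

This needs the same unsafe compilation setting as the rest of the code here, and it relies on a runtime that provides Marshal.PtrToStringUTF8.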

Altitude answered 1/12, 2020 at 9:40 Comment(0)