Unicode RTF text in RichEdit

Asked 23/11, 2009 at 11:0 Answered 19/10, 2011 at 15:4

I'm having trouble getting a RichEdit control to display Unicode RTF text. My application is Unicode, so all strings are wchar_t strings.
If I create the control as "RichEdit20A" I can use e.g. SetWindowText, and the text is displayed with the proper formatting. If I create the control as "RichEdit20W", then SetWindowText shows the text verbatim, i.e. all the RTF code is displayed. The same happens if I use the EM_SETTEXTEX message with codepage 1200, which MSDN tells me is used to indicate Unicode.
I've tried using the StreamIn function, but this only seems to work if I stream in ASCII text. If I stream in wide characters, I get empty text in the control. I use the SF_RTF|SF_UNICODE flags, and MSDN hints that this combination may not be allowed.

So what to do? Is there any way to get widechars into a RichEdit without losing RTF interpretation, or do I need to encode it? I've thought about trying UTF-8, or perhaps use the encoding facilities in RTF, but am unsure what the best choice is.

Martita answered 23/11, 2009 at 11:0 Comment(0)

I had to do this recently, and made the same observations you're describing.

It seems that, despite what MSDN implies, the RTF parser only works with 8-bit encodings. So I ended up using UTF-8, which is an 8-bit encoding that can still represent the full range of Unicode characters. You can get UTF-8 from a PWSTR via WideCharToMultiByte():

PWSTR WideString = /* Some string... */;
DWORD WideLength = (DWORD)wcslen(WideString) + 1; // includes the terminator
PSTR Utf8;
DWORD Length;
INT ReturnedLength;

// UTF-8 needs at most 3 bytes per UTF-16 code unit, so 4 times the
// UTF-16 length is a safe upper bound.
Length = WideLength * 4;
Utf8 = malloc(Length);
if (!Utf8) { /* TODO: handle failure */ }

ReturnedLength = WideCharToMultiByte(CP_UTF8,
                                     0,
                                     WideString,
                                     WideLength-1,
                                     Utf8,
                                     Length-1,
                                     NULL,
                                     NULL);
if (ReturnedLength)
{
   // Need to zero terminate...
   Utf8[ReturnedLength] = 0;
}
else { /* TODO: handle failure */ }

Once you have it in UTF-8, you can do:

SETTEXTEX TextInfo = {0};

TextInfo.flags = ST_SELECTION;
TextInfo.codepage = CP_UTF8;

SendMessage(hRichText, EM_SETTEXTEX, (WPARAM)&TextInfo, (LPARAM)Utf8);

And of course (I left this out originally, but while I'm being explicit...):

free(Utf8);

Inconformity answered 23/11, 2009 at 11:9 Comment(0)

RTF is ASCII; any character outside ASCII is encoded using an escape sequence. See the RTF 1.9.1 specification (March 2008).

Dreher answered 23/11, 2009 at 17:14 Comment(1)

The file format spec may say that, but you can get the RichEdit control to load RTF with any multi-byte encoding, including CP_UTF8 as shown in my answer. – Inconformity 23/11, 2009 at 18:35

Take a look at the \uN control word in the RTF specification: you have to convert your wide string to a string of Unicode escapes such as \u902?\u300?\u888? (see http://www.biblioscape.com/rtf15_spec.htm#Heading9). Each number is the character's decimal code, and the question mark is the fallback character shown in place of the Unicode character if RichEdit does not support Unicode (RichEdit v1.0).

For example, for the Unicode string L"TIME" the RTF data will be "\u84?\u73?\u77?\u69?"
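
As a rough sketch of that conversion in C: the function below copies ASCII through unchanged (so RTF markup already in the string survives) and rewrites every other character as \uN?. The name WideToRtfEscapes is made up for illustration and is not part of any Windows API. It assumes the input is already valid RTF apart from the non-ASCII characters, and it writes code units above 32767 as negative numbers because \uN takes a signed 16-bit value.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Appends one UTF-16 code unit as "\uN?" and returns the new write
   position. \uN is a signed 16-bit decimal, so units above 32767 are
   written as negative numbers. */
static char *AppendUnit(char *out, unsigned long unit)
{
    long n = unit > 32767UL ? (long)unit - 65536L : (long)unit;
    return out + sprintf(out, "\\u%ld?", n);
}

/* Hypothetical helper, not a Windows API. Returns a malloc'd 7-bit
   string that the caller must free, or NULL if allocation fails. */
char *WideToRtfEscapes(const wchar_t *wide)
{
    size_t len, i;
    char *rtf, *out;

    assert(wide != NULL);
    len = wcslen(wide);

    /* Worst case: one 32-bit wchar_t becomes two escapes of up to
       9 characters each ("\u-32768?"). */
    rtf = malloc(len * 18 + 1);
    if (!rtf)
        return NULL;
    out = rtf;

    for (i = 0; i < len; i++) {
        unsigned long c = (unsigned long)wide[i];

        if (c > 0x10FFFFUL)
            c = 0xFFFD;                 /* invalid value: replacement char */

        if (c < 0x80) {
            *out++ = (char)c;           /* ASCII, including RTF markup */
        } else if (c > 0xFFFF) {
            /* Only reachable where wchar_t is 32 bits wide:
               split the code point into a surrogate pair. */
            c -= 0x10000;
            out = AppendUnit(out, 0xD800 + (c >> 10));
            out = AppendUnit(out, 0xDC00 + (c & 0x3FF));
        } else {
            out = AppendUnit(out, c);
        }
    }
    *out = '\0';
    return rtf;
}
```

Calling WideToRtfEscapes(L"{\\rtf1 bla bla \u03A3}") should give "{\\rtf1 bla bla \\u931?}" (written as C string literals), which is plain 7-bit RTF and so should be accepted wherever the control already handles ASCII RTF.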

Unclose answered 19/10, 2011 at 15:4 Comment(1)

It's one of the solutions, as I have successfully tested, but not a very convenient one. In this case an RTF string such as "{\\rtf1 bla bla \\u931?}" (backslashes are escaped as in C++ source) can be encoded in UTF-16 and will be displayed correctly. When the character is written directly in UTF-16 (without the \uN control word), i.e. "{\\rtf1 bla bla Σ}", the Greek letter sigma (code 0x03A3) is displayed as the pound sign £ (code 0xA3). – Niehaus 24/5, 2012 at 8:46
