Are resource files compiled as UNICODE or ANSI code-page?

First, my apologies if this has been answered a hundred times over! D'oh!

But my search-fu apparently sucks, as I'm having no luck answering this basic question:

How are resources stored in the EXE/DLL? As UNICODE (UCS-2, Windows native internal character format), or as multibyte characters using the code-page of the resources block?

  • How does one embed UNICODE strings into one's resources (.rc)?
  • Can UNICODE (UCS-2) text be inserted into the language strings from within VS 2012?
  • Is Windows still using UCS-2, or is it using UTF-16 internally?

I'm just looking for general answers, or links to details, rather than a detailed how-to for putting a UNICODE string into an .rc string table. Thanks!

Blabbermouth asked 2/10, 2012 at 14:56
This isn't directly related to your question, but since Windows 2000 the internal character format of Windows has been UTF-16. The differences between UTF-16 and UCS-2 are few, but they exist. – State
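Here is a small sketch of that UTF-16 vs UCS-2 difference, written in Python purely for illustration; the function names are hypothetical. A character in the Basic Multilingual Plane takes one 16-bit code unit in both encodings. A character above U+FFFF needs a surrogate pair in UTF-16 and cannot be expressed in UCS-2 at all.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the 16-bit code units of `text` in UTF-16 (little-endian order)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def fits_in_ucs2(text: str) -> bool:
    """UCS-2 has no surrogate pairs, so only code points up to U+FFFF fit."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print(utf16_code_units("A"))           # [65]
print(utf16_code_units("\U0001F600"))  # [55357, 56832], i.e. 0xD83D 0xDE00
print(fits_in_ucs2("שלום"), fits_in_ucs2("\U0001F600"))  # True False
```

For Hebrew, Arabic, accented Latin and other BMP text, the two encodings produce identical bytes.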

All resource strings in WIN32 are compiled as Unicode. See here for more info. The .rc script itself can be ANSI (using the local codepage) or UCS-2 with the appropriate BOM (reference).
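Here is a rough Python model of the two halves of that answer. It rests on a few assumptions:

- The only Unicode marker it considers is the UTF-16LE BOM (FF FE).
- `cp1252` stands in for whatever CP_ACP is on the build machine.
- It models only the characters of a string, not the length-prefixed string-table layout in the binary.

The function names are hypothetical, not part of any rc.exe interface.

```python
import codecs


def rc_source_text(rc_bytes: bytes, ansi_codepage: str = "cp1252") -> str:
    # Assumed rule: a UTF-16LE BOM marks a Unicode script; anything else is
    # decoded with the ANSI code page ("cp1252" is a stand-in for CP_ACP).
    if rc_bytes.startswith(codecs.BOM_UTF16_LE):
        return rc_bytes[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    return rc_bytes.decode(ansi_codepage)


def compiled_string(rc_bytes: bytes, ansi_codepage: str = "cp1252") -> bytes:
    # Whatever the script's encoding, the stored string text is UTF-16LE.
    return rc_source_text(rc_bytes, ansi_codepage).encode("utf-16-le")


ansi_src = "café".encode("cp1252")  # b"caf\xe9"
unicode_src = codecs.BOM_UTF16_LE + "café".encode("utf-16-le")
assert compiled_string(ansi_src) == compiled_string(unicode_src)
```

Both sources end up as the same UTF-16LE bytes. The script's encoding decides how the source is read, not what gets stored.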

Dodge answered 2/10, 2012 at 15:02
The RC script can be UCS-2 as well. – Geoid
Thanks for the edit, Deanna. I was just about to paste that in. – Dodge
So, bottom line, it doesn't matter whether the EXE/DLL project is set to MBCS or UNICODE: the binary resources within the EXE/DLL are in UNICODE for things like dialogs, string tables, and so on? – Blabbermouth
The RC script supports Unicode strings, but does the RC editor in Visual Studio? Can I enter a string with both Arabic and Hebrew characters from the RC editor? – Mikiso
Quoting from the resource compiler docs, "The Win32 resource compiler can process files encoded in Unicode, but you would need to create such a file using a Unicode-enabled editor." The localization team at my work usually uses its own editors, which support both the language being translated and setting the byte order mark (BOM) at the head of the file. FWIW, Notepad supports this as well (and that feature is often a source of great frustration for developers whose code reads a text file that someone saved as Unicode in notepad.exe). – Dodge
@Mikiso Yes, it does, both when editing the RC file (UCS-2 with BOM) as code and when using the resource editor. Note that it doesn't seem to create files that way, so you'll need to use Notepad, or edit as code, to save as Unicode first. The code page is specified in the RC itself. – Geoid
To clarify my earlier comment: the code page is only saved out and used when the file is in ANSI format. It's not needed at all when dealing with UCS-2 files. – Geoid
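Since the editor apparently won't create a Unicode .rc for you, one option is to generate the string table yourself and save it as UTF-16LE with an FF FE BOM (the "UCS-2 with BOM" form discussed above). This is a minimal Python sketch with several assumptions:

- It uses a plain `STRINGTABLE`/`BEGIN`/`END` block with CRLF line endings.
- It assumes rc.exe reads an embedded `""` as one literal quote.
- The IDs and strings are made up for illustration.

```python
import codecs

# Hypothetical IDs and strings; Hebrew and Arabic together do not fit in
# any single ANSI code page, which is why a Unicode script is needed.
STRINGS = {
    101: "Hello",
    102: "שלום",
    103: "مرحبا",
}


def make_rc_stringtable(strings: dict[int, str]) -> str:
    lines = ["STRINGTABLE", "BEGIN"]
    for ident, text in sorted(strings.items()):
        escaped = text.replace('"', '""')  # assumed: rc reads "" as one quote
        lines.append(f'    {ident} "{escaped}"')
    lines.append("END")
    return "\r\n".join(lines) + "\r\n"


def encode_rc_utf16(script: str) -> bytes:
    # FF FE byte order mark followed by UTF-16LE text.
    return codecs.BOM_UTF16_LE + script.encode("utf-16-le")
```

The resulting bytes can be written with something like `open("strings.rc", "wb").write(encode_rc_utf16(make_rc_stringtable(STRINGS)))`, where the file name is hypothetical.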

If in doubt, take a look at the hex. Here is the start of the version-information resource compiled into notepad.exe, in UTF-16:

0002ed60  01 00 53 00 74 00 72 00  69 00 6e 00 67 00 46 00  |..S.t.r.i.n.g.F.|
0002ed70  69 00 6c 00 65 00 49 00  6e 00 66 00 6f 00 00 00  |i.l.e.I.n.f.o...|
0002ed80  a6 02 00 00 01 00 30 00  34 00 30 00 39 00 30 00  |......0.4.0.9.0.|
0002ed90  34 00 42 00 30 00 00 00  4c 00 16 00 01 00 43 00  |4.B.0...L.....C.|
0002eda0  6f 00 6d 00 70 00 61 00  6e 00 79 00 4e 00 61 00  |o.m.p.a.n.y.N.a.|
0002edb0  6d 00 65 00 00 00 00 00  4d 00 69 00 63 00 72 00  |m.e.....M.i.c.r.|
0002edc0  6f 00 73 00 6f 00 66 00  74 00 20 00 43 00 6f 00  |o.s.o.f.t. .C.o.|
0002edd0  72 00 70 00 6f 00 72 00  61 00 74 00 69 00 6f 00  |r.p.o.r.a.t.i.o.|
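Following the "look at the hex" advice, this Python sketch parses the first two lines of the dump (hex columns only) and decodes them as little-endian 16-bit text. It skips the leading `01 00`, which presumably is the last field of the version-information block header rather than part of the name.

As pointed out below, ASCII text like this looks identical in UCS-2 and UTF-16. The sketch therefore shows 16-bit storage, not which of the two encodings is in use.

```python
# The first two lines of the dump above: offset followed by 16 hex bytes.
DUMP = """\
0002ed60  01 00 53 00 74 00 72 00  69 00 6e 00 67 00 46 00
0002ed70  69 00 6c 00 65 00 49 00  6e 00 66 00 6f 00 00 00
"""


def dump_to_bytes(dump: str) -> bytes:
    out = bytearray()
    for line in dump.splitlines():
        fields = line.split()
        # fields[0] is the file offset; the rest are hex byte values.
        out.extend(int(field, 16) for field in fields[1:])
    return bytes(out)


def read_utf16z(data: bytes, start: int) -> str:
    # Advance over little-endian 16-bit code units until a 00 00 terminator.
    end = start
    while end + 1 < len(data) and (data[end], data[end + 1]) != (0, 0):
        end += 2
    return data[start:end].decode("utf-16-le")


raw = dump_to_bytes(DUMP)
print(read_utf16z(raw, 2))  # StringFileInfo
```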
Josuejosy answered 2/10, 2012 at 15:00
This does not answer the question. The question is whether Unicode characters can be placed in a .rc file. You cannot answer that by looking at the output of the resource compiler. Besides, nothing in the hex dump identifies UTF-16; this could just as well be UCS-2. – Cynosure

There is a good write-up of the issue in Raymond Chen's post "The Resource Compiler defaults to CP_ACP, even in the face of subtle hints that the file is UTF-8":

https://devblogs.microsoft.com/oldnewthing/20190607-00/?p=102569
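To illustrate what "defaults to CP_ACP" means in practice, here is a Python sketch that saves a string as UTF-8 with a BOM. It then decodes those bytes the way a compiler reading with the ANSI code page would. `cp1252` is an assumption, standing in for CP_ACP on a Western European Windows machine, and the function name is hypothetical.

```python
import codecs


def misread_utf8_as_ansi(text: str, ansi_codepage: str = "cp1252") -> str:
    # What the editor wrote: a UTF-8 BOM followed by UTF-8 text.
    saved = codecs.BOM_UTF8 + text.encode("utf-8")
    # What a reader using the ANSI code page sees; bytes the code page
    # leaves undefined become U+FFFD instead of raising an error.
    return saved.decode(ansi_codepage, errors="replace")


print(misread_utf8_as_ansi("café"))  # ï»¿cafÃ©
```

The BOM survives as three junk characters at the front, and the two UTF-8 bytes of "é" turn into two separate characters.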

Blackmail answered 16/2, 2022 at 7:38
Please read "How to Answer" and "Link-only answers" – Misleading
