About the "Character set" option in Visual Studio

I have a question about the "Character set" option in Visual Studio. The available options are:

  • Not Set
  • Use Unicode Character Set
  • Use Multi-Byte Character Set

What is the difference between these three options?

Also, will my choice affect support for languages other than English (such as RTL languages)?

Headpin answered 19/2, 2012 at 12:58 Comment(0)

It is a compatibility setting, intended for legacy code written for old versions of Windows that were not Unicode-enabled: the Windows 9x family, of which Windows ME was the last and a widely ignored release. With "Not Set" or "Use Multi-Byte Character Set" selected, the Windows API functions that take a string argument resolve to their "A" (ANSI) variants, small compatibility helpers that translate char* strings to wchar_t* strings, the API's native string type.
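
To make the redirection concrete, here is a minimal portable sketch of the pattern. ShowTextA, ShowTextW, ShowText and TEXT_LITERAL are hypothetical names invented for this illustration; they only mimic the way the Windows headers pair an "A" and a "W" function behind one generic name, and they are not real API calls.

```cpp
#include <string>

// Hypothetical "W" function: stands in for the native wide-string entry point.
std::wstring ShowTextW(const wchar_t* text) {
    return std::wstring(L"W:") + text;
}

// Hypothetical "A" function: widens the 8-bit string, then forwards to the
// "W" version. Real Windows converts through the system code page; this
// sketch simply widens each byte, which is only correct for ASCII.
std::wstring ShowTextA(const char* text) {
    std::wstring wide;
    for (const char* p = text; *p != '\0'; ++p) {
        wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*p)));
    }
    return L"A->" + ShowTextW(wide.c_str());
}

// The project setting decides which function the generic name refers to.
#ifdef UNICODE
#define ShowText ShowTextW
#define TEXT_LITERAL(s) L##s
#else
#define ShowText ShowTextA
#define TEXT_LITERAL(s) s
#endif

// Same source line, different function depending on the setting.
std::wstring demo() {
    return ShowText(TEXT_LITERAL("hello"));
}
```

Built without UNICODE defined, demo() goes through the "A" helper first; built with it, the call lands directly on the "W" function and no conversion takes place.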

Such code critically depends on the default system code page setting. The code page maps 8-bit characters to Unicode code points, which in turn select the font glyph. Your program will only produce correct text when the machine that runs your code has the correct code page. Characters with a value >= 128 will be rendered incorrectly if the code page doesn't match.
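
As a rough illustration of that dependence, the sketch below decodes the same byte under three Windows code pages. The function and enum names are made up for this example, and only the byte range 0xE0..0xFA is covered, where all three mappings are simple offsets.

```cpp
#include <cstdint>

// Illustrative subset of code pages: Western European, Cyrillic, Hebrew.
enum class CodePage { Windows1252, Windows1251, Windows1255 };

// Decode one byte in the range 0xE0..0xFA to a Unicode code point.
// Returns 0 for bytes outside the range this sketch covers.
char32_t decode_byte(std::uint8_t byte, CodePage cp) {
    if (byte < 0xE0 || byte > 0xFA) {
        return 0;
    }
    const char32_t offset = static_cast<char32_t>(byte - 0xE0);
    switch (cp) {
        case CodePage::Windows1252:
            return 0x00E0 + offset;  // Latin letters with diacritics
        case CodePage::Windows1251:
            return 0x0430 + offset;  // Cyrillic small letters
        case CodePage::Windows1255:
            return 0x05D0 + offset;  // Hebrew letters
    }
    return 0;
}
```

The single byte 0xE9 comes out as U+00E9 (é) under Windows-1252, U+0439 (й) under Windows-1251, and U+05D9 (י) under Windows-1255, which is why text written on one machine can show up as the wrong letters on another.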

Always select "Use Unicode Character Set" for modern code, especially when you want to support languages with a right-to-left layout and you don't have an Arabic or Hebrew code page selected on your dev machine. Use std::wstring or wchar_t[] in your code. Getting right-to-left reading order requires turning on the WS_EX_RTLREADING extended style flag in the CreateWindowEx() call; mirroring the window layout itself is controlled by the separate WS_EX_LAYOUTRTL flag.

Proof answered 19/2, 2012 at 15:4

Hans has already answered the question, but I found these settings to have curious names. (What exactly is not being set, and why do the other two options sound so similar?) Regarding that:

  • "Unicode" here is Microsoft-speak for UTF-16 encoding in particular (originally UCS-2; surrogate pairs have been supported since Windows 2000). This is the recommended, non-codepage-dependent option described by Hans. There is a corresponding C++ #define flag called _UNICODE (Visual Studio also defines UNICODE, which the Windows headers check).
  • "Multi-Byte Character Set" (aka MBCS) here is the official Microsoft phrase for describing their former international text-encoding scheme. As Hans described, there are different MBCS codepages describing different languages. The encodings are "multi-byte" in that some or all characters may be represented by multiple bytes. (Some codepages use a variable-length encoding akin to UTF-8.) Your typical codepage will still represent all the ASCII characters as one byte each. There is a corresponding C++ #define flag called _MBCS.
  • "Not Set" apparently refers to compiling with neither _UNICODE nor _MBCS being #defined. In this case Windows works with a strict one-byte-per-character encoding. (Once again, there are several different codepages available in this case.)
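
If you want to check which of the three modes a given build actually ended up in, a small preprocessor test along these lines should do. The function name is made up for this sketch, and checking the Unicode macros before _MBCS is an assumption about precedence rather than something the project setting guarantees.

```cpp
#include <string>

// Report which character-set mode this translation unit was compiled in,
// based purely on the macros the project setting is expected to define.
std::string character_set_mode() {
#if defined(_UNICODE) || defined(UNICODE)
    return "Unicode";      // "Use Unicode Character Set"
#elif defined(_MBCS)
    return "Multi-Byte";   // "Use Multi-Byte Character Set"
#else
    return "Not Set";      // neither macro defined
#endif
}
```

Printing the result once at startup is a quick way to confirm that a project setting change really reached the compiler command line.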

The question "Difference between MBCS and UTF-8 on Windows" goes into these issues in a lot more detail.

Diplodocus answered 12/5, 2013 at 1:11 Comment(2)
Too bad Microsoft refuses to add support for UTF-8. - Epiphora
@Epiphora Things changed somewhat in 2019. - Pectin
