I'm no expert on encodings, but here's what I think I know (though it may be wrong):
- ASCII is a 7-bit, fixed-length encoding, with the characters you can find in ASCII charts.
- UTF-8 is an 8-bit, variable-length encoding. All characters can be written in UTF-8.
- UCS-2 LE/BE are fixed-length, 16-bit encodings that support most common characters.
- UTF-16 is a 16-bit, variable-length encoding. All characters can be written in UTF-16.
Are all of the above correct?
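To make sure I'm using "variable-length" the way everyone else does, here is a minimal sketch of how I understand UTF-8 and UTF-16 to encode a single code point. It is plain C with no Windows calls, the names `utf8_encode` and `utf16_encode` are ones I made up for illustration, and it reflects my own reading of the encodings rather than anything from Microsoft's documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8.
   `out` must have room for 4 bytes. Returns the number of bytes
   written (1 to 4), or 0 if `cp` is a surrogate or out of range. */
size_t utf8_encode(uint32_t cp, uint8_t *out)
{
    assert(out != NULL);
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* reserved for UTF-16 */
    if (cp < 0x80) {                              /* ASCII range: 1 byte */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                             /* 2 bytes */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                           /* 3 bytes */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                         /* 4 bytes */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: encode one code point as UTF-16.
   `out` must have room for 2 units. Returns the number of 16-bit
   units written (1 or 2), or 0 if `cp` is not encodable. */
size_t utf16_encode(uint32_t cp, uint16_t *out)
{
    assert(out != NULL);
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000) {                           /* one unit, same value as UCS-2 */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {                         /* surrogate pair */
        cp -= 0x10000;
        out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
        return 2;
    }
    return 0;
}
```

By this sketch, U+0041 is one byte in UTF-8 and one 16-bit unit in UTF-16, while U+1F600 takes four bytes in UTF-8 (F0 9F 98 80) and two 16-bit units in UTF-16 (D83D DE00). As far as I understand, UCS-2 has no way to represent that second case, which is where my confusion about the "W" functions comes from.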
Now, for the questions:
- Do the Windows "A" functions (like `SetWindowTextA`) take in ASCII strings? Or "multi-byte strings" (more questions on this below)?
- Do the Windows "W" functions take in UTF-16 strings or UCS-2 strings? I thought they take in UCS-2, but the names confuse me.
- In `WideCharToMultiByte`, Microsoft uses the term "wide-character string" to mean UTF-16. In that context, what is considered a "multi-byte string"? UTF-8?
- Is `LPWSTR` a "wide-character string"? I would say it is, but then, wouldn't that mean it's UTF-16? And wouldn't that mean that it could be used to display, say, 4-byte characters? If not, then... is displaying 4-byte characters impossible? (Windows doesn't seem to have APIs for those.)
- Is the functionality of `WideCharToMultiByte` a superset of that of `wcstombs`, and do they both work on the same type of string? Or does one, say, work on UTF-16 while the other works on UCS-2?
- Are file paths in UTF-16 or UCS-2? I know Windows treats it as an "opaque array of characters" from Microsoft's documentation, but per the C standard for functions like `fwprintf`, is there any standardized encoding?
- What is "ANSI" encoding? Is that even a correct term? And how does it relate to ASCII?
- (I had more questions, but this is enough... I forgot some of them anyway...)
These are a lot of questions, so any links to explanations about how all these connect (aside from reading the Unicode standard, which won't help with the Windows API anyway) would also be greatly appreciated.
Thank you!