Windows API ANSI functions and UTF-8

Is it possible to use Windows API ANSI functions with UTF-8 strings?

For example, say I have a path encoded in UTF-8. Can I call CreateDirectoryA or CreateFileA and use a UTF-8 path, or do I have to perform some conversion before calling the functions?

Miley answered 12/1, 2012 at 6:47 Comment(5)

Yikes. Why would anyone want that? I think we're way past Windows ME now (which was the last Windows version ever to need the ANSI APIs). They should die out already, especially for newly-developed applications. – Leid 12/1, 2012 at 7:22

From where are you obtaining UTF-8 strings? It's much easier to convert your application to work entirely with UTF-16 strings, as the so-called wide versions of the Windows API functions require. And as Joey says, always call the wide versions (with the W suffix), not the ANSI versions. Those have been obsolete for decades. – Stenograph 12/1, 2012 at 11:27

@Joey: Because an awful lot of C(++) libraries (including the standard library!) prefer to work with char-based strings rather than wchar_t-based strings. If Windows fully supported UTF-8, then you could just use UTF-8 throughout your program instead of having to convert between UTF-8 and UTF-16 all the time. – Adne 12/1, 2012 at 16:39

@dan04: UTF-16 is the best Unicode encoding for processing (UTF-8 is OK for storage), see this interesting article: unicode.org/notes/tn12 (note also that both C# and Java use UTF-16 encoding for their string classes). – Grizzled 14/1, 2012 at 22:22

@Grizzled UTF-16-processing code is no less complex than UTF-8-processing code. UTF-32-processing code is much simpler. – Silvanus 14/4, 2014 at 6:48

No. Use MultiByteToWideChar to convert UTF-8 to UTF-16 and then call the wide character APIs such as CreateDirectoryW or CreateFileW.

Chemiluminescence answered 12/1, 2012 at 6:52 Comment(4)

I would also add that since Windows uses UTF-16 exclusively, it might be best for you to follow suit and work with UTF-16 for the most part, and only do the conversion to UTF-8 when you need to read/write from external sources. – Chemiluminescence 12/1, 2012 at 7:9

@casablanca: Another approach that's been advocated is to use UTF-8 for the most part and convert to and from UTF-16 only when talking to the Windows interface. – Breault 14/8, 2014 at 15:54

@Chemiluminescence That will unfortunately cause some serious headaches with the C++ standard library: things like exception messages are hard-coded to be char, not wchar_t. Some people suggest not putting Unicode in exception messages, but this is not very practical: if you need to communicate something like "Cannot open file 바위처럼 단단한.txt" or "Record with name 바위처럼 단단한 does not exist" in an exception, you won't be able to do it easily. Saying "exceptions don't need Unicode" really means "your whole codebase uses Unicode only for display purposes". – Docile 2/4, 2021 at 14:24

This answer is out of date; please refer to the updated answer below. – Sprue 18/3, 2022 at 21:45

The accepted answer is no longer correct (as of Windows Version 1903 (May 2019 Update)).

An application can now set the active code page of the process to UTF-8. This allows the ...A functions (and CP_ACP) to work with UTF-8. A manifest that does this looks like the following:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
  <application>
    <windowsSettings>
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>

Source and additional information: Use the Windows UTF-8 code page

Longshoreman answered 14/9, 2021 at 16:29 Comment(0)

An easier approach than calling the raw Win32 API MultiByteToWideChar is to use the ATL conversion helpers, such as CA2W or CA2CW. You can pass CP_UTF8 as the code page (the second constructor parameter) to convert from UTF-8 to UTF-16:

CreateDirectoryW(
  CA2W( utf8Name, CP_UTF8 ), // convert from UTF-8 to UTF-16
  ... // other stuff
);

Note that in Unicode builds (which should be the default ones these days), CreateDirectory just expands to CreateDirectoryW, so I would just drop the ending "W" and use the (IMHO, more readable) CreateDirectory:

CreateDirectory(
  CA2W( utf8Name, CP_UTF8 ), // convert from UTF-8 to UTF-16
  ... // other stuff
);

Grizzled answered 14/1, 2012 at 18:18 Comment(0)
