Convert char* to wchar_t* in C
A

5

21

I would like to convert a char* string to a wchar_t* string in C.

I have found many answers, but most of them are for C++. Could you help me?

Thanks.

Anticatalyst answered 28/1, 2011 at 8:19 Comment(7)
What is the original encoding of your char*? UTF-8? ANSI? What is sizeof(wchar_t) on your system, and what encoding does it use? UCS-2 (16-bit)? UCS-4 (32-bit)?Amusing
@Benoit: Whoa... I thought sizeof(wchar) was always 2, no?Hibben
@Mehrdad: It is not necessarily 2. It is implementation-defined. On Windows, it is two bytes wide and holds UTF-16, with two wchar_t units for each surrogate pair.Amusing
@Benoit: o__O I did not know it's implementation-defined... interesting, thanks for the info.Hibben
It's on a Unix system, so I guess it doesn't matter, no?Anticatalyst
I forget which system (Linux, maybe?) uses a 4-byte wchar_t encoded as UTF-32.Uno
Yes, 4 bytes: printf("Size of wchar_t : %zu", sizeof(wchar_t)); -> "Size of wchar_t : 4". So, how can I convert a string into Unicode?Anticatalyst
V
30

Try swprintf with the %hs conversion specifier.

Example:

wchar_t  ws[100];
swprintf(ws, 100, L"%hs", "ansi string");
Vocable answered 28/1, 2011 at 8:36 Comment(10)
I will try this this evening; for now I don't have access to a shell. Thanks.Anticatalyst
@NickDandoulakis I think this answer could be very useful, but I found out that swprintf can have two possible interfaces. Could you please take a look at this question? https://mcmap.net/q/659985/-understanding-which-swprintf-will-be-used-or-again-convert-a-char-string-to-wchar_t/2436175Photomicroscope
@Photomicroscope the interface that requires the buffer length is the portable one.Vocable
@NickDandoulakis It won't compile on MinGW 4.5.2, for example, so unfortunately it is not general!Photomicroscope
This is a good solution when cross-compiling to MinGW on a Linux platform.Income
Note that this appears to work with any of the printf functions, fortunately.Ailbert
Can you explain why %hs is the correct flag? I tried %ls but it doesn't work (at least on Windows). According to this article, %hs is used for a "narrow" string while %ls is used for a "wide" string. I'm using wchar_t, so I thought the correct choice was %ls for a "wide" string, but I was wrong.Metalware
@Bemipefe, the %hs specifies the type of the argument, e.g. "ansi string" which is a narrow string.Vocable
@NickDandoulakis Thanks. I feel a little bit stupid. You are right. The whole printed string will be a wide string but you have to specify the original (input) format.Metalware
@Bemipefe, no problem. There's no such thing as a stupid question.Vocable
H
5

setlocale() followed by mbstowcs().

Hibben answered 28/1, 2011 at 8:23 Comment(6)
This is OK as long as the input is an ANSI string.Amusing
@Benoit: Yeah, there's obviously more to string conversion than calling just a single function. But I didn't give any details since I think this is all the OP's looking for...Hibben
The input comes from LdapDirectory, so I guess it's UTF-8?Anticatalyst
@Benoit: There's no such thing as an "ANSI string". This will work if the original string is in the multibyte format corresponding to the currently set locale.Ethben
I had already found this function, but I can't use it correctly. I just want to encode a string as Unicode to send in a mail subject header. Thank you.Anticatalyst
@Crupuk: What format is the source string in? If it's just ASCII, and you want to use it in a UTF-8 mail header, then no transformation is needed.Ethben
T
4

What you're looking for is

mbstowcs

It works much like a string copy from char* to char*, except that the destination is a wchar_t* and the length limit is counted in wide characters.

Trypsin answered 18/11, 2012 at 9:35 Comment(0)
F
0

If you happen to have the Windows API available, the conversion function MultiByteToWideChar offers configurable string conversion from different encodings to UTF-16. That might be more appropriate if you don't care too much about portability and don't want to figure out exactly how different locale settings affect the string conversion.

Fruma answered 28/1, 2011 at 8:49 Comment(0)
D
-4

If you currently have ANSI chars, just insert a 0 ('\0') before each char and cast the result to wchar_t*.

Dual answered 8/4, 2016 at 6:39 Comment(0)
