I am trying to write code to compare two strings. On Windows I can use strcmp, but I want to handle multibyte character strings so that the code is compatible with all other platforms. Can I use memcmp? If not, is there another API I can use, or do I need to write my own?
You have to be careful. I'm not an expert on Unicode or multibyte encodings, but I know that with diacritics two strings can sometimes be considered equal even when their bytes are not exactly the same. It's recommended to use pre-tested APIs, because string encodings can get pretty messy.
See The Old New Thing on case mapping. I can't think of a reference for the diacritics, but if I find one I'll post it.
In simple cases, memcmp will work. For 100% correctness, and especially if Unicode in any form is involved, memcmp will not work. Even simple characters like é can be represented in more than one way: either as é (one Unicode character) or as e combined with ´ (two Unicode characters, e followed by the combining acute accent U+0301). Most of the time these don't get mixed and matched, so you might not see any problems at first, but eventually it will bite you. – Striation

In Turkish, the upper case of "i" is not "I", it's "İ" (I with a dot above it), and the lower case of "I" isn't "i", it's "ı" (dotless i), so you need to know the language in which a word is written. :) – Intercom

If the two strings use the same encoding, you can use memcmp. If they are UTF-8 and don't contain the NUL character (U+0000), you could even use strcmp, since the zero byte appears in UTF-8 only as the encoding of U+0000 itself. Another option is to convert your strings to wide characters using mbstowcs.
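As a rough illustration of the byte-wise approach, here is a minimal sketch in C. It assumes both inputs are NUL-terminated and already in the same encoding (UTF-8 here); the helper name utf8_bytes_equal and the two sample constants are made up for this example. It also shows the diacritics pitfall: the two spellings of é differ byte for byte, so a byte-wise comparison reports them as different.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: byte-wise equality of two NUL-terminated strings
 * that are assumed to use the same encoding. No normalization is done. */
bool utf8_bytes_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    /* Check the lengths first so memcmp never reads past either string. */
    return len_a == len_b && memcmp(a, b, len_a) == 0;
}

/* "é" as a single code point, U+00E9: UTF-8 bytes C3 A9. */
const char E_PRECOMPOSED[] = "\xC3\xA9";

/* "e" followed by U+0301 COMBINING ACUTE ACCENT: UTF-8 bytes 65 CC 81. */
const char E_DECOMPOSED[] = "e\xCC\x81";
```

With these definitions, utf8_bytes_equal(E_PRECOMPOSED, E_DECOMPOSED) is false even though both render as é. Treating them as equal would require normalizing both strings first, which the C standard library does not provide.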
If both strings use the same encoding, memcmp will work fine. Keep in mind, however, that wide characters are different sizes on different platforms: wchar_t is 16 bits on Windows but typically 32 bits on Linux and macOS.
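For the wide-character route, a sketch along the following lines might work. The function name compare_as_wide is hypothetical, and the conversion depends on the current locale: for non-ASCII input the program would first need something like setlocale(LC_ALL, "") with a locale whose encoding matches the strings. The buffers are sized from strlen, relying on the fact that a multibyte string never converts to more wide characters than it has bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts both multibyte strings to wide strings
 * using the current locale, then compares them with wcscmp.
 * Returns 0 and stores the wcscmp result in *result on success,
 * or -1 if allocation or conversion fails. */
int compare_as_wide(const char *a, const char *b, int *result)
{
    assert(a != NULL && b != NULL && result != NULL);

    /* One wide character per input byte, plus the terminator, is enough. */
    size_t cap_a = strlen(a) + 1;
    size_t cap_b = strlen(b) + 1;
    wchar_t *wa = malloc(cap_a * sizeof *wa);
    wchar_t *wb = malloc(cap_b * sizeof *wb);
    int status = -1;

    if (wa != NULL && wb != NULL) {
        /* mbstowcs returns (size_t)-1 on an invalid multibyte sequence. */
        size_t n_a = mbstowcs(wa, a, cap_a);
        size_t n_b = mbstowcs(wb, b, cap_b);
        if (n_a != (size_t)-1 && n_b != (size_t)-1) {
            *result = wcscmp(wa, wb);
            status = 0;
        }
    }

    free(wa);
    free(wb);
    return status;
}
```

Because the wide strings are only compared with each other inside one process, the differing size of wchar_t across platforms does not matter here. It would matter as soon as the wide data were written to a file or sent to another machine.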
If the strings use different encodings, you will need a library such as ICU to deal with it.