Strings in C# are stored as sequences of char, that is to say, UTF-16 code units. ToCharArray() just returns a copy of those code units as an array. And it sometimes takes multiple code units to form a single "symbol".
Would char.GetUnicodeCategory(char) be of any help? Maybe you could split that array on OtherLetter or something (I'm not familiar with Hebrew)?
const string word = "כֶּלֶב";
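// Number of UTF-16 code units, not the number of visible letters
Console.WriteLine(word.Length);
// Numeric value of each code unit (Select needs using System.Linq)
Console.WriteLine(string.Join(" ", word.ToCharArray().Select(x => (int)x)));
// Unicode category of each code unit
Console.WriteLine(string.Join(" ", word.ToCharArray().Select(char.GetUnicodeCategory)));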
Output:
6
1499 1468 1462 1500 1462 1489
OtherLetter NonSpacingMark NonSpacingMark OtherLetter NonSpacingMark OtherLetter
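To make the "split on category" idea concrete, here is a minimal sketch that starts a new group at every char that is not a combining mark, so each letter stays together with the marks that follow it. The class and method names (LetterSplitter, SplitByCategory, SplitByTextElement) are made up for illustration. The category-based version assumes the text has no surrogate pairs or joiner characters; the second method uses the built-in StringInfo.GetTextElementEnumerator, which is probably the safer choice in real code.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class LetterSplitter
{
    // True for the categories that attach to a preceding base character.
    private static bool IsCombiningMark(char c)
    {
        UnicodeCategory cat = char.GetUnicodeCategory(c);
        return cat == UnicodeCategory.NonSpacingMark
            || cat == UnicodeCategory.SpacingCombiningMark
            || cat == UnicodeCategory.EnclosingMark;
    }

    // Hypothetical helper: starts a new group at every char that is not a
    // combining mark. Assumption: no surrogate pairs or joiners in the input.
    public static List<string> SplitByCategory(string s)
    {
        var groups = new List<string>();
        var current = new StringBuilder();
        foreach (char c in s)
        {
            // A non-mark char closes the group collected so far.
            if (current.Length > 0 && !IsCombiningMark(c))
            {
                groups.Add(current.ToString());
                current.Clear();
            }
            current.Append(c);
        }
        if (current.Length > 0)
        {
            groups.Add(current.ToString());
        }
        return groups;
    }

    // The same kind of split using the framework's text element enumeration.
    public static List<string> SplitByTextElement(string s)
    {
        var groups = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
        {
            groups.Add(e.GetTextElement());
        }
        return groups;
    }
}
```

For the word above (code units 1499 1468 1462 1500 1462 1489), SplitByCategory yields three groups of 3, 2 and 1 code units, one per letter.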
word.Length gives you the correct value? – Gustavogustavus
char .. try writing char y = 'לֶ'; or char x = 'כֶּ'; -> No, you simply cannot force a symbol that is a combination of 2 or 3 char values into a single char ... nevertheless, an interesting case though ;) – Avaria
Trim is for whitespace, so I don't see how that's relevant. Yes, they are different letters, but the font will specify how to produce the glyph overlapping the previous letter. So you need to draw the string, then scroll the result – Guard