In Ruby, JavaScript and Java (I haven't tried others), the Cyrillic characters Я̆ Я̄ Я̈ have length 2. When I check the length of a string containing these characters, I get the wrong value.
"Я̈".mb_chars.length
#=> 2 #should be 1 (ruby on rails)
"Я̆".length
#=> 2 #should be 1 (ruby, javascript)
"Ӭ".length
#=> 1 #correct (ruby, javascript)
Please note that the strings are encoded in UTF-8 and each of these characters behaves as a single character.
My question is: why does this happen, and how can I get the correct length of a string containing these characters?
"Я̈ " which has a space in it, same with the second example, but not the third. Check with "Я̈ ".chars, which gives ["Я", "̈", " "] for me, with the accent as a separate char. – Mingo
2.4.2 :002 > 'Я̆'.length => 2 – Ignescent
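A minimal sketch of what the comments point at, assuming the strings are built from Я followed by a combining mark (here U+0308 COMBINING DIAERESIS), whereas Ӭ is the single precomposed code point U+04EC. `length` counts code points, so the combining mark is counted separately. Counting grapheme clusters with the regex `\X` gives the visible length instead. The helper name `visible_length` is hypothetical and used only for illustration.

```ruby
# Я (U+042F) followed by U+0308 COMBINING DIAERESIS:
# two code points that render as one visible character.
decomposed = "\u042F\u0308"

# Ӭ is a single precomposed code point, U+04EC.
precomposed = "\u04EC"

# Hypothetical helper: counts grapheme clusters rather than code points.
# \X matches one base character together with its combining marks.
def visible_length(str)
  str.scan(/\X/).length
end

p decomposed.length                                        # code point count
p decomposed.codepoints.map { |cp| format("U+%04X", cp) }  # the two code points
p visible_length(decomposed)                               # grapheme cluster count
p visible_length(precomposed)
```

On Ruby 2.5 and later, `str.grapheme_clusters.length` should give the same count without a regex. Normalizing to NFC is unlikely to help for these particular letters, because Unicode has no precomposed code point for Я with a diaeresis.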