In my JavaScript I am using substring() to cut text, which generally works but unfortunately decapitates emoji.
usaText = "A๐บ๐ธZ"
splitText = usaText.substring(0,2) //"A๏ฟฝ"
splitText = usaText.substring(0,3) //"A๐บ"
splitText = usaText.substring(0,4) //"A๐บ๏ฟฝ"
splitText = usaText.substring(0,5) //"A๐บ๐ธ"
Is there a way to use substring without breaking emoji? In my production code I cut at about 40 characters, and I wouldn't mind if it was 35 or 45. I have thought about simply checking whether the 40th character is a digit or a letter a-z, but that wouldn't work for a text full of emoji. I could check by pattern matching whether the last character is one that "ends" an emoji, but this also seems a bit odd performance-wise.
Am I missing something? With all the bloat that JavaScript carries, is there no built-in way to count that treats an emoji as one character?
Regarding the linked question "Split JavaScript string into array of codepoints? (taking into account "surrogate pairs" but not "grapheme clusters")":
chrs = Array.from( usaText )
(4) ["A", "🇺", "🇸", "Z"]
0: "A"
1: "🇺"
2: "🇸"
3: "Z"
length: 4
That's one too many unfortunately.
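What I am after is a count by grapheme cluster rather than by code point. For comparison, here is a minimal sketch using `Intl.Segmenter` from the ECMAScript Internationalization API, assuming the runtime ships it (availability varies between browsers and Node versions). The helper names `countGraphemes` and `truncateGraphemes` are made up for illustration.

```javascript
// Assumes Intl.Segmenter exists in the runtime; it does not in older engines.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: number of user-perceived characters in the text.
function countGraphemes(text) {
  let count = 0;
  for (const _ of graphemeSegmenter.segment(text)) {
    count += 1;
  }
  return count;
}

// Hypothetical helper: keep at most maxGraphemes whole grapheme clusters.
function truncateGraphemes(text, maxGraphemes) {
  let out = "";
  let kept = 0;
  for (const { segment } of graphemeSegmenter.segment(text)) {
    if (kept >= maxGraphemes) break;
    out += segment; // each segment is a complete cluster, e.g. a whole flag
    kept += 1;
  }
  return out;
}

const usaText = "A\uD83C\uDDFA\uD83C\uDDF8Z"; // "A🇺🇸Z"
console.log(countGraphemes(usaText));       // 3
console.log(truncateGraphemes(usaText, 2)); // "A🇺🇸"
```

Here the flag counts as one, so a cut can only land before or after it.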
Array.from(yourstring), which will split your string into individual Unicode characters without breaking them between bytes. – Swanherd