I want to support Unicode and as many characters as possible in my PowerShell script, using UTF-8 as the encoding. For testing purposes I simply type this line and press Enter:
[char]0x02A7
And it successfully shows the character ʧ.
But when I try to display a Unicode character whose code point is above 0xFFFF:
[char]0x01F600
It throws an error saying that the value 128512 cannot be converted to System.Char. Instead, it should show the smiley 😀.
What is wrong here?
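For context, [char] maps to .NET's System.Char, so its range can be checked directly (a minimal sketch; the output values are shown as comments):

[int][char]::MaxValue   # 65535 (0xFFFF), the largest value a [char] can hold
0x01F600                # 128512, which is outside that range, hence the cast error
[char]0xFFFF            # still works: a single 16-bit UTF-16 code unit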
Edit:
As Jeroen Mostert stated in the comments, I have to use a different method for Unicode characters with code points above 0xFFFF. So I wrote this script:
$s = [Char]::ConvertFromUtf32(0x01F600)
Write-Host $s
In the PowerShell ISE I get a beautiful smiley 😀. But when I run the script standalone (in its own window) I don't get the smiley; instead it shows two strange characters.
What is wrong here?
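One commonly suggested fix is to switch the console output encoding to UTF-8 before writing (a sketch; whether the glyph actually renders also depends on the console font):

# Sketch: force UTF-8 console output, then print the emoji string.
# Assumes the console font can render U+1F600; otherwise a placeholder appears.
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$s = [Char]::ConvertFromUtf32(0x01F600)
Write-Host $s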
char is a 16-bit type and only holds a single 16-bit UTF-16 code unit, not the full range of Unicode characters. Characters with code points outside this range have to be represented as a full String ([Char]::ConvertFromUtf32(0x01F600)); this string will be made up of two surrogate characters. Note that there is no such thing as a "3-byte Unicode character", and you have to be careful with terminology here lest you confuse yourself: Unicode characters have (numeric) code points, which are represented in different ways, with the number of bytes required depending on the encoding used. – Porcupine