Display 3-byte unicode character in Windows PowerShell

I want my PowerShell script to support Unicode and as many characters as possible, using UTF-8 as the encoding. So for testing purposes I simply type this line and press Enter:

[char]0x02A7

And it successfully shows the character ʧ.

But when I try to display a Unicode character with a code point above 0xFFFF:

[char]0x01F600  

It throws an error saying that the value 128512 cannot be converted to System.Char. Instead, it should show the smiley 😀.

What is wrong here?

Edit:

As Jeroen Mostert stated in the comments, I have to use a different method for Unicode characters with code points above 0xFFFF. So I wrote this script:

$s = [Char]::ConvertFromUtf32(0x01F600)
Write-Host $s

In the PowerShell IDE I get a beautiful smiley 😀. But when I run the script standalone (in its own window), I don't get the smiley. Instead, it shows two strange characters.

What is wrong here?

Mensuration answered 19/11, 2021 at 8:49 Comment(6)
In .NET, which PowerShell is based on, characters are 16-bit. You will have to figure out how to encode that symbol as two characters. – Kokura
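A quick way to see the 16-bit limit for yourself (a minimal sketch; `$max`, `$cp` and the other variable names are just illustrative): `[char]::MaxValue` is 0xFFFF, so U+02A7 fits in a single `[char]` while U+1F600 does not, and the cast from the question fails at run time.

```powershell
# [char] is System.Char: one 16-bit UTF-16 code unit (0x0000..0xFFFF)
$max = [int]([char]::MaxValue)
'Largest [char] value: 0x{0:X4}' -f $max

# U+02A7 fits in 16 bits, U+1F600 does not
$fitsSmall = 0x02A7 -le $max     # True
$fitsEmoji = 0x1F600 -le $max    # False

# The cast from the question; a variable keeps it a run-time conversion error
$cp = 0x1F600
try {
    $null = [char]$cp
}
catch {
    "Cast failed: $($_.Exception.Message)"
}
```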
char is a 16-bit type and only holds 16-bit UTF-16 code units, not the full range of Unicode characters. Characters with code points outside this range have to be represented as a full String ([Char]::ConvertFromUtf32(0x01F600)); this string will be made up of two surrogate characters. Note that there is no such thing as a "3-byte Unicode character", and you have to be careful with terminology here lest you confuse yourself. Unicode characters have (numeric) code points, which are represented in different ways, with the number of bytes required depending on the encoding used. – Porcupine
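To illustrate the difference between code points and their encoded size, here is a sketch that uses the .NET `System.Text.Encoding` classes to count UTF-16 code units and encoded bytes for the two characters from the question (`$rows` is an illustrative name; the source is kept ASCII-only so Windows PowerShell reads it correctly regardless of file encoding):

```powershell
# Code points from the question: U+02A7 and U+1F600
$codePoints = 0x02A7, 0x1F600

$rows = foreach ($cp in $codePoints) {
    $text = [Char]::ConvertFromUtf32($cp)
    [pscustomobject]@{
        CodePoint  = 'U+{0:X4}' -f $cp
        Utf16Units = $text.Length                                  # number of [char]s
        Utf8Bytes  = [Text.Encoding]::UTF8.GetByteCount($text)
        Utf16Bytes = [Text.Encoding]::Unicode.GetByteCount($text)  # UTF-16 LE
        Utf32Bytes = [Text.Encoding]::UTF32.GetByteCount($text)
    }
}
$rows | Format-Table -AutoSize
```

U+02A7 takes one UTF-16 code unit and 2 bytes in UTF-8, while U+1F600 takes two code units and 4 bytes in UTF-8, UTF-16 and UTF-32 alike, so neither is a "3-byte" character in any of these encodings.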
@JeroenMostert thank you for this knowledge. I now get a beautiful smiley in the IDE. But if I run the script in a PowerShell terminal window (Win+X) it shows two strange characters. Do you know why? (Also see my edit) – Mensuration
@somega The answer is likely that the fonts that ship with the Console Host (the default terminal host in Windows) don't support smileys and other wide-chars :) – Lungan
Encoding issues are a whole different kettle of fish. See this answer for a ton of gory details. You should see only one character, but that will probably still be a replacement character, because your console won't support emojis. You can verify that by trying to copy-paste the smiley directly to the prompt: it'll show up as a replacement character there too. Emoji support requires something like Windows Terminal; launching PS from there gives you emoji support by default. – Porcupine
Emojis are 2 characters (UTF-16 code units) long. You'd have to do some surrogate math to make them yourself. https://mcmap.net/q/267200/-spliting-an-emoji-sequence-in-powershell – Peppermint
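The "2 characters" are a UTF-16 surrogate pair. A minimal sketch that inspects the string returned by `[Char]::ConvertFromUtf32` and recombines it with `[Char]::ConvertToUtf32`:

```powershell
# ConvertFromUtf32 returns a [string], not a [char]
$s = [Char]::ConvertFromUtf32(0x1F600)

$s.Length                                        # 2 UTF-16 code units
'{0:X4} {1:X4}' -f [int]($s[0]), [int]($s[1])    # D83D DE00
[Char]::IsHighSurrogate($s[0])                   # True
[Char]::IsLowSurrogate($s[1])                    # True

# ConvertToUtf32 combines the pair at index 0 back into one code point
'0x{0:X}' -f [Char]::ConvertToUtf32($s, 0)       # 0x1F600
```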

Aside from [Char]::ConvertFromUtf32(), here's a way to calculate the surrogate pair by hand for code points above 0xFFFF, i.e. those that don't fit in 16 bits (http://www.russellcottrell.com/greek/utilities/surrogatepaircalculator.htm):

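$S = 0x1F600                                                  # code point to encode
# High (lead) surrogate: 0xD800 + top 10 bits of ($S - 0x10000)
[int]$H = [Math]::Truncate(($S - 0x10000) / 0x400) + 0xD800
# Low (trail) surrogate: 0xDC00 + bottom 10 bits of ($S - 0x10000)
[int]$L = ($S - 0x10000) % 0x400 + 0xDC00
[char]$H + [char]$L                                           # output the two code units together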

😀
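As a sanity check, the hand calculation can be compared with `[Char]::ConvertFromUtf32`. This sketch is not the answer's code: it swaps `[Math]::Truncate` and `%` for the bitwise operators `-shr` and `-band` (equivalent here because the offset is never negative; `-shr` needs PowerShell 3.0 or later) and also reverses the math to recover the code point.

```powershell
$S = 0x1F600
$offset = $S - 0x10000                   # 20-bit value to split into two halves

# Same surrogate formulas as above, written with bitwise operators
$H = 0xD800 + ($offset -shr 10)          # high surrogate: top 10 bits
$L = 0xDC00 + ($offset -band 0x3FF)      # low surrogate: bottom 10 bits

# Build a string from the two code units and compare it with the .NET helper
$manual  = [string][char]$H + [char]$L
$builtin = [Char]::ConvertFromUtf32($S)
$manual -ceq $builtin                    # True

# Reverse direction: recombine the pair into the original code point
$back = ($H - 0xD800) * 0x400 + ($L - 0xDC00) + 0x10000
'0x{0:X}' -f $back                       # 0x1F600
```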
Peppermint answered 21/11, 2021 at 18:0 Comment(1)
This is the same as UnicodeHexHTML emoji decoding. Thank you. I was too lazy to dig through the archives about high and low bytes and all that other stuff. – Topical

© 2022-2024 McMap. All rights reserved.