How do I encode Unicode character codes in a PowerShell string literal?
C

7

76

How can I encode the Unicode character U+0048 (H), say, in a PowerShell string?

In C# I would just do this: "\u0048", but that doesn't appear to work in PowerShell.

Castorena answered 29/6, 2009 at 5:32 Comment(2)
What's your output encoding set to? ($OutputEncoding)Anthropophagi
It's us-ascii. But U+0048 should be encodable in that. I'm actually trying to encode an escape character (U+001B).Castorena
E
98

Replace '\u' with '0x' and cast it to System.Char:

PS > [char]0x0048
H

You can also use the "$()" syntax to embed a Unicode character into a string:

PS > "Acme$([char]0x2122) Company"
AcmeT Company

Where T is how the console displays the trademark sign ™ (U+2122, used for non-registered trademarks) when it cannot render it. The string itself still holds the correct character.

Note: this method works only for characters in Plane 0, the BMP (Basic Multilingual Plane), chars < U+10000.
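
For the escape character (U+001B) the asker mentions in the comments, the same cast applies. A minimal sketch, assuming the terminal understands ANSI colour sequences (the variable name `$esc` is only illustrative):

```powershell
# U+001B (ESC) is in the BMP, so a plain cast is enough.
$esc = [char]0x1B

# ${esc} keeps the variable name clearly separated from the "[" that follows it.
# Whether the colour codes are honoured depends on the terminal (an assumption here).
"${esc}[31mred text${esc}[0m"
```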

Expound answered 29/6, 2009 at 7:26 Comment(5)
You can even write a little function: function C($n) {[char][int]"0x$n"}. Which you can use in a string as follows: "$(C 48)ello World." Not ideal but probably a little closer to the \u escape.Hawaiian
This also works when you want to pass a unicode [char] to a function. Thanks for the help.Washboard
I know this topic is 2.5 years old, but following up on @Joey's comment, you can even make a function called \u. It's identical to Joey's, just with a different name. So the function is function \u($n) {[char][int]"0x$n"}. The way you call it is just like C# except that you need a space between the function name and the number. So \u 0048 returns H.Inflect
This only works for characters in BMP, else it triggers an error. Eg. [char]0x1D400: InvalidArgument: Cannot convert value "119808" to type "System.Char". Error: "Value was either too large or too small for a character."Robinetta
@Robinetta The reason this only works for characters in the BMP is that .NET’s char type represents UTF-16 code units, and for BMP characters, 1 character = 1 code unit, but for non-BMP characters, 1 character = 2 code units. /// @Inflect the \u function could be extended to work with non-BMP characters.Sideslip
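
Following up on the last comment, here is one way such a helper could be extended beyond the BMP. This is only a sketch: the function name `U` is hypothetical, and it leans on [char]::ConvertFromUtf32, which returns a string of one or two UTF-16 code units.

```powershell
# Hypothetical helper: takes the hex digits of a code point as a string
# and returns the matching character as a string, including code points above U+FFFF.
function U([string]$hex) {
    # Parse the hex digits, then let .NET build the UTF-16 string.
    [char]::ConvertFromUtf32([Convert]::ToInt32($hex, 16))
}

# Quoting the argument avoids any surprises from numeric-literal parsing.
"$(U '48')ello $(U '1F30E')"
```

Because the result is a string rather than a [char], it cannot be passed where a single [char] is required.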
F
32

According to the documentation, PowerShell Core 6.0 adds support for this escape sequence:

PS> "`u{0048}"
H

See https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_special_characters?view=powershell-6#unicode-character-ux
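
One detail worth illustrating: like other backtick escapes, this one is only processed in double-quoted strings. A small sketch, assuming PowerShell 6 or newer (the variable names are just for illustration):

```powershell
# Requires PowerShell 6 or newer; Windows PowerShell 5.1 does not recognise `u{...}.
$expanded = "`u{48}ello"   # double quotes: the escape becomes the character H
$literal  = '`u{48}ello'   # single quotes: no escape processing, text kept verbatim

$expanded
$literal
```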

Floatstone answered 20/3, 2018 at 10:34 Comment(0)
I
17

Maybe this isn't the PowerShell way, but this is what I do. I find it to be cleaner.

[regex]::Unescape("\u0048") # Prints H
[regex]::Unescape("\u0048ello") # Prints Hello
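
A hedged addition for characters above U+FFFF: \u takes exactly four hex digits, so such a character has to be written as its two UTF-16 surrogates. Keep in mind that Unescape also interprets other backslash sequences such as \n and \t, so it is not a good fit for strings containing Windows paths. The variable name below is illustrative only.

```powershell
# Two \u escapes, one per UTF-16 code unit, together form U+1F30E.
$globe = [regex]::Unescape('\uD83C\uDF0E')

# Cross-check against the .NET code point conversion.
$globe -eq [char]::ConvertFromUtf32(0x1F30E)
```
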
Ivories answered 8/9, 2015 at 17:18 Comment(0)
F
5

For those of us still on 5.1 and wanting to use the higher-order Unicode charset (for which none of these answers work) I made this function so you can simply build strings like so:

'this is my favourite park ',0x1F3DE,'. It is pretty sweet ',0x1F60A | Unicode

#takes in a stream of strings and integers,
#where integers are unicode codepoints,
#and concatenates these into valid UTF16
Function Unicode {
    Begin {
        $output=[System.Text.StringBuilder]::new()
    }
    Process {
        $output.Append($(
            if ($_ -is [int]) { [char]::ConvertFromUtf32($_) }
            else { [string]$_ }
        )) | Out-Null
    }
    End { $output.ToString() }
}

Note that getting these to display in your console is a whole other problem, but if you're outputting to an Outlook email or a GridView it will just work (as UTF-16 is native for .NET interfaces).

This also means you can output plain control (not necessarily Unicode) characters pretty easily if you're more comfortable with decimal, since you don't actually need the 0x (hex) syntax to make the integers. 'hello',160,'there' | Unicode would put a non-breaking space between the two words, the same as if you had used 0xA0 instead.

Frazier answered 24/3, 2020 at 5:26 Comment(4)
[char]::ConvertFromUtf32 has been available since .NET Framework 2.0, so you don't need such a complex functionGeriatrics
oh neat. The function is still necessary, I'm not writing [char]blahblahblah whenever I want a "`u{}", but it does simplify the ifFrazier
Besides, $_ -shr 10 should be used instead of [int][math]::Floor($_ / 0x400), and ($_ -band 0x3FF) -bor 0xDC00 instead of [char]($_ % 0x400 + 0xDC00)Geriatrics
I s'pose that's obvious since it was a nice even hex number, oh well. Doesn't matter now that .NET can handle the overarching problemFrazier
G
4

To make it work for characters outside the BMP you need to use Char.ConvertFromUtf32()

'this is my favourite park ' + [char]::ConvertFromUtf32(0x1F3DE) + 
'. It is pretty sweet ' + [char]::ConvertFromUtf32(0x1F60A)

In PowerShell 6.0 or newer you can also use `u{}

"this is my favourite park `u{1F3DE}. It is pretty sweet `u{1F60A}"

This special character was added in PowerShell 6.0.

The Unicode escape sequence (`u{x}) allows you to specify any Unicode character by the hexadecimal representation of its code point. This includes Unicode characters above the Basic Multilingual Plane (> 0xFFFF) which includes emoji characters such as the thumbs up (`u{1F44D}) character. The Unicode escape sequence requires at least one hexadecimal digit and supports up to six hexadecimal digits. The maximum hexadecimal value for the sequence is 10FFFF.

Unicode character (`u{x})

Geriatrics answered 24/3, 2020 at 5:38 Comment(1)
Yep, with emoji in PowerShell you would need two surrogate characters in a row: https://mcmap.net/q/266875/-display-3-byte-unicode-character-in-windows-powershell.Abscission
H
3

Another way using PowerShell.

$Heart = $([char]0x2665)
$Diamond = $([char]0x2666)
$Club = $([char]0x2663)
$Spade = $([char]0x2660)
Write-Host $Heart -BackgroundColor Yellow -ForegroundColor Magenta

Use the command help Write-Host -Full to read all about it.

Hamon answered 29/9, 2020 at 21:28 Comment(1)
Shay Levy's answer above already showed how to use [char]0x2665. In fact this is less efficient, because you wrap each value in an unnecessary $() subexpression instead of assigning directly: $Heart = [char]0x2665Geriatrics
L
0

Note that some characters like 🌎 might need a "double rune" (a UTF-16 surrogate pair) to be printed:

PS> "C:\foo\bar\$([char]0xd83c)$([char]0xdf0e)something.txt"

Will print:

C:\foo\bar\🌎something.txt

You can find these "runes" here, in the "unicode escape" row: https://dencode.com/string

Lebar answered 28/5, 2021 at 6:42 Comment(1)
No need for such a complex manual lookup method. My answer already shows simpler solutions: "C:\foo\bar\`u{1F30E}something.txt" or "C:\foo\bar\" + [char]::ConvertFromUtf32(0x1F30E) + "something.txt" will workGeriatrics
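
If you do want the two surrogate values without a lookup site, they can be computed from the code point. A rough sketch of the standard UTF-16 split, using the -shr and -band operators (PowerShell 3.0 or newer); the function name Get-SurrogatePair is hypothetical:

```powershell
# Hypothetical helper: splits a code point above U+FFFF into its two UTF-16 code units.
function Get-SurrogatePair([int]$CodePoint) {
    $offset = $CodePoint - 0x10000                 # 20 bits remain after removing the BMP range
    $high = 0xD800 -bor ($offset -shr 10)          # top 10 bits go into the high surrogate
    $low  = 0xDC00 -bor ($offset -band 0x3FF)      # bottom 10 bits go into the low surrogate
    '{0:X4} {1:X4}' -f $high, $low
}

Get-SurrogatePair 0x1F30E   # → D83C DF0E
```

Those are the same two values used in the answer above for 🌎.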
