UTF8 Script in PowerShell outputs incorrect characters

I've created a UTF-8 script for PowerShell that contains non-ASCII characters.

characters.ps1:

Write-Host "ç â ã á à"

When the script is run in the PowerShell console, it outputs the wrong characters.

However, if I type the characters directly into the console, they are shown as expected.

Does anyone know what causes this behavior?

The problem arose from a script I wrote that has hardcoded paths containing non-ASCII characters. When I try to pass such a path as an argument to cmdlets (in my case I was going to robocopy a folder), the command fails because it cannot find the path (which is also displayed incorrectly on the screen).

Ingather answered 23/1, 2013 at 14:38 Comment(1)
With the character "Ä" (capital ä) it's even worse: as soon as you write it between double quotation marks, it produces an error if the file is encoded as UTF-8 without BOM. - Percale

Changing the encoding of the script to UTF-8 with BOM solved the issue.

I was using SublimeText with the EncodingHelper plugin to control the character-set of the script. It was set correctly to UTF8.

I changed the encoding of the script in SublimeText to "UTF-8 with BOM" and the output was shown correctly.

I created the same script with Notepad++, which defaults to "UTF-8 with BOM", and the string was shown correctly in the console.

I changed the encoding of the script in Notepad++ to "UTF-8 without BOM" and it was shown incorrectly.

It seems PowerShell cannot correctly guess the encoding of UTF-8 files that have no BOM.
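If your editor has no "UTF-8 with BOM" option, the same re-save can be done from PowerShell itself. This is only a minimal sketch, assuming the file is currently valid UTF-8 (a file in some other encoding would be damaged by it). `Add-Utf8Bom` is a made-up helper name, not a built-in cmdlet, and `characters.ps1` is just the script from the question.

```powershell
# Add-Utf8Bom is a hypothetical helper name, not a built-in cmdlet.
function Add-Utf8Bom {
    param([Parameter(Mandatory)][string]$Path)

    # Resolve to a full path, because .NET file methods do not follow
    # PowerShell's current location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # Read the file explicitly as UTF-8 (an existing BOM, if any, is dropped here).
    $text = [System.IO.File]::ReadAllText($fullPath, [System.Text.Encoding]::UTF8)

    # UTF8Encoding($true) writes the EF BB BF byte order mark at the start of the file.
    $utf8WithBom = New-Object System.Text.UTF8Encoding $true
    [System.IO.File]::WriteAllText($fullPath, $text, $utf8WithBom)
}

# Usage (assumes the script is in the current directory):
#   Add-Utf8Bom -Path .\characters.ps1
```

After running it, the first three bytes of the file should be EF BB BF, which is what lets PowerShell recognize the file as UTF-8.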

Ingather answered 23/1, 2013 at 14:56 Comment(5)
This is pathetic. Especially considering how pointless and useless UTF-8 BOM is. +1 for enlightening information though. - Exegetic
Tried like 10 commands with no result, and it was just that ... Thanks bro - Charger
I would guess that in the absence of the BOM, Windows assumes Windows-1252 encoding for legacy reasons, unlike Linux which assumes UTF-8. - Cribbage
This happened to me too. I created a PowerShell script with VS Code that created an Azure AD group with accented characters in the group description. Something was mangling the description, and it looks like that something was PowerShell. VS Code created the script as UTF-8 with no BOM, but I used Notepad++ to add the BOM and that fixed it. - Lechery
In VS Code, my file was UTF-8 by default. I saved it with the encoding "UTF-8 with BOM" and it worked like magic. (Click on the encoding in the bottom right part of the editor.) - Kattie

In my case the problem was caused by creating a new PowerShell script with Visual Studio Code, which has a default encoding of UTF-8 without BOM. Setting the encoding to "Windows 1252" solved the problem.

It seems that PowerShell can't handle UTF-8 without BOM; it needs the "Windows 1252" or "UTF-8 with BOM" encoding.
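The garbling is what you would expect if the UTF-8 bytes of the script were decoded one byte at a time as Windows-1252. Here is a rough sketch of that effect, not a look inside PowerShell itself. It assumes code page 1252 is available in the session and stands in for the system's ANSI code page; the variable names are only for illustration.

```powershell
# Build the test string from code points so that this snippet is pure ASCII
# and cannot itself be mis-decoded. The result is the string from characters.ps1.
$text = -join ([char[]](0xE7, 0x20, 0xE2, 0x20, 0xE3, 0x20, 0xE1, 0x20, 0xE0))

# The bytes an editor writes when saving as UTF-8: two bytes per accented letter.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($text)

# Decode the same bytes as Windows-1252, which maps every byte to one character.
$misread = [System.Text.Encoding]::GetEncoding(1252).GetString($utf8Bytes)

$utf8Bytes.Length   # 14 (5 letters x 2 bytes + 4 spaces)
$misread.Length     # 14 characters instead of the original 9
$misread            # each letter now shows up as a two-character pair

# The capital A-umlaut from the comment above: its UTF-8 bytes are C3 84.
$umlautBytes = [System.Text.Encoding]::UTF8.GetBytes([string][char]0xC4)
$umlautMisread = [System.Text.Encoding]::GetEncoding(1252).GetString($umlautBytes)
'{0:X4}' -f [int]$umlautMisread[1]   # 201E, a low double quotation mark
```

The last line may explain the error reported for "Ä": byte 84 decodes to U+201E, and as far as I know PowerShell's parser accepts that character as a double quote, which would end the string early.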

Circumfuse answered 3/5, 2017 at 19:39 Comment(0)

There is a reliable way to detect UTF-8 without BOM (https://unicodebook.readthedocs.io/guess_encoding.html). Like a lot of other little things, this seems to work better in PS 6. Even my beloved Emacs 25 for Windows gets the encoding wrong.

PS C:\users\admin> pwsh
PowerShell 6.1.0
Copyright (c) Microsoft Corporation. All rights reserved.

https://aka.ms/pscore6-docs
Type 'help' to get help.

PS C:\users\admin> "write-host 'ç â ã á à'" | set-content -Encoding utf8NoBOM accent.ps1
PS C:\users\admin> .\accent
ç â ã á à
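The detection idea mentioned above can be sketched in PowerShell: if a file has no BOM and every byte sequence in it decodes as strict UTF-8, it is very likely UTF-8. This is a heuristic sketch, not how PowerShell itself decides. `Test-Utf8NoBom` is a made-up helper name, and note that a pure ASCII file also passes, since ASCII is valid UTF-8.

```powershell
# Test-Utf8NoBom is a hypothetical helper name, not a built-in cmdlet.
function Test-Utf8NoBom {
    param([Parameter(Mandatory)][string]$Path)

    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    # A file that starts with EF BB BF already carries a UTF-8 BOM.
    if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and
        $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) {
        return $false
    }

    # UTF8Encoding($false, $true): no BOM, and throw on invalid byte sequences
    # instead of silently substituting a replacement character.
    $strictUtf8 = New-Object System.Text.UTF8Encoding $false, $true
    try {
        [void]$strictUtf8.GetString($bytes)
        return $true     # all bytes form valid UTF-8
    }
    catch {
        return $false    # invalid sequence, so probably a legacy code page
    }
}

# Usage (assumes accent.ps1 from the transcript above exists):
#   Test-Utf8NoBom -Path .\accent.ps1
```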
Uttica answered 8/11, 2018 at 18:5 Comment(0)

Check which version you are running by executing $PSVersionTable. If PSVersion is 5.1, you're running Windows PowerShell (you can also tell from $PSHOME pointing into system32), which is an extremely dated version.

If you instead use modern PowerShell, you'll find that this and other behaviors are improved.
The problem is gone for me on the current 7.4 version, and someone else posted that 6.1 is also fine.

Alternatively, save your scripts as UTF8 with BOM to make 5.1 behave nicer.
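The version check above can be turned into a small snippet that tells you which of the two options applies. This is just a sketch: the messages and the `$advice` variable are my own, and the split at major version 5 assumes, in line with the reports in this thread, that PowerShell 6 and later read BOM-less scripts as UTF-8.

```powershell
# $PSVersionTable is a built-in automatic variable.
$version = $PSVersionTable.PSVersion

if ($version.Major -le 5) {
    # Windows PowerShell: a script without a BOM is not decoded as UTF-8.
    $advice = "Windows PowerShell $version detected: save scripts as UTF-8 with BOM."
}
else {
    # PowerShell 6 and later: a script without a BOM is decoded as UTF-8.
    $advice = "PowerShell $version detected: UTF-8 without BOM should work."
}

$advice
```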

Spherical answered 6/4 at 16:20 Comment(0)
