Write-Output with no BOM

If I run a command like this:

Write-Output March > a.txt

I get this result:

        U+FEFF    
M       U+004D          
a       U+0061          
r       U+0072    
c       U+0063          
h       U+0068 
        U+000D       
\n      U+000A       

I do not want the BOM. I tried different actions, like this:

$OutputEncoding = [System.Text.UTF8Encoding]::new($false)
$PSDefaultParameterValues['*:Encoding'] = 'utf8'
[Console]::InputEncoding = [System.Text.UTF8Encoding]::new($false)
[Console]::OutputEncoding = [System.Text.UTF8Encoding]::new($false)

but none of them seem to address the issue. Note I am using PowerShell 5.1. I did see some similar questions, but not quite the same issue as this, as they were dealing with piping and external commands.

Crepitate answered 8/12, 2020 at 1:7 Comment(0)

If you're only using ASCII characters, Set-Content is fine in PowerShell 5.1, because it defaults to the ANSI code page there and writes no BOM:

Write-Output March | set-content a.txt
'March' | set-content a.txt

Or set the default encoding of Out-File to ASCII in your $PROFILE with the hashtable below. In PowerShell 5.1, Out-File defaults to UTF-16LE (the 'unicode' encoding), and '>' is a shortcut for Out-File. The key name has to be quoted because it contains a colon. utf8NoBOM isn't available until later PowerShell versions. '>>' also invokes Out-File, so appending to a file that was written with a different encoding can mix encodings in the same file.

$PSDefaultParameterValues = @{ 'out-file:encoding' = 'ascii' }

Then this will make an ascii file:

Write-Output March > a.txt
Coquette answered 8/12, 2020 at 18:39 Comment(0)

tl;dr

  • If you want Windows PowerShell's > operator and cmdlets such as Out-File to output BOM-less UTF-8, your only option is to change to that encoding system-wide (see caveats in next section):

    • As a one-time step, run intl.cpl to open Control Panel's Region settings, switch to the Administrative tab, click the Change system locale... button and check Beta: Use Unicode UTF-8 for worldwide language support. Reboot required.

    • Additionally, run the following in every session, which is best done via your $PROFILE file:

      • $PSDefaultParameterValues['*:Encoding'] = 'Default'
  • Otherwise, you must use .NET APIs directly - see the answers to this question - or write a PowerShell-friendly wrapper around them - see this answer, which also shows a New-Item alternative.

  • Alternatively, you can install the cross-platform PowerShell (Core) v6+ edition, which consistently defaults to BOM-less UTF-8.
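As a minimal sketch of the ".NET APIs directly" route mentioned above: the function below collects pipeline input and writes it with a UTF8Encoding instance constructed with $false, which requests no BOM. The name Out-FileUtf8NoBom and the temp-file location are assumptions made for illustration; this is not a built-in cmdlet and not necessarily the wrapper the linked answer describes.

```powershell
# Out-FileUtf8NoBom is a hypothetical name chosen for this sketch; it is not a built-in cmdlet.
function Out-FileUtf8NoBom {
    param(
        [Parameter(Mandatory)] [string] $LiteralPath,
        [Parameter(ValueFromPipeline)] [string[]] $InputObject
    )
    begin {
        # .NET resolves relative paths against its own working directory, which
        # can differ from PowerShell's current location, so resolve the path first.
        $fullPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($LiteralPath)
        $lines = [System.Collections.Generic.List[string]]::new()
    }
    process {
        foreach ($line in $InputObject) { $lines.Add($line) }
    }
    end {
        # Passing $false to the constructor requests UTF-8 without a BOM.
        $utf8NoBom = [System.Text.UTF8Encoding]::new($false)
        [System.IO.File]::WriteAllLines($fullPath, $lines, $utf8NoBom)
    }
}

# Usage: write to a temp file and show the first three bytes.
$demoFile = Join-Path ([System.IO.Path]::GetTempPath()) 'a.txt'
Write-Output March | Out-FileUtf8NoBom -LiteralPath $demoFile
$firstBytes = [System.IO.File]::ReadAllBytes($demoFile)[0..2]
($firstBytes | ForEach-Object { $_.ToString('X2') }) -join ' '   # 4D 61 72, i.e. "Mar" with no EF BB BF prefix
```

Unlike Out-File, this sketch only handles strings and does not apply PowerShell's output formatting to other objects, so pipe through Out-String first if you need that.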


Starting with Windows 10, you can make Windows PowerShell default consistently to BOM-less UTF-8 - assuming you're willing to change to this encoding system-wide:

  • Change your system locale (language for non-Unicode programs) to BOM-less UTF-8, as described in this answer:

    • In short: Run intl.cpl to open Control Panel's Region settings, switch to the Administrative tab, click the Change system locale... button and check Beta: Use Unicode UTF-8 for worldwide language support. Note that you need administrative privileges to make this change and that a reboot is required for it to take effect.

    • Caveats:

      • This change sets both the OEM and the ANSI code page to 65001, i.e. (BOM-less) UTF-8, which affects not only all console windows, but all legacy (non-Unicode) applications, including GUI ones.

      • As of Windows 11 version 22H2, this feature is still in beta and can break legacy console applications.

  • Then, in Windows PowerShell v5.1, add the following to your $PROFILE file (this isn't necessary in PowerShell (Core) v6+):

    • $PSDefaultParameterValues['*:Encoding'] = 'Default'
    • $OutputEncoding = [System.Text.Utf8Encoding]::new($false)

With this in effect:

  • All file-writing[1] Windows PowerShell cmdlets that have an -Encoding parameter will then default to BOM-less UTF-8 (Default represents the active ANSI code page, which will then be 65001, i.e. BOM-less UTF-8) - notably including > / Out-File / Set-Content.

  • Windows PowerShell then also reads BOM-less files as UTF-8, including source code and via Get-Content; normally, Windows PowerShell interprets BOM-less files based on the system locale-appropriate ANSI code page (whereas PowerShell (Core) v6+ assumes UTF-8).

  • By virtue of the OEM code page then being BOM-less UTF-8 (as reflected in chcp.com reporting 65001), PowerShell will also use BOM-less UTF-8:

    • When interpreting data received from the outside via its CLI.
    • When interpreting data received from an external program inside a PowerShell session.
    • The $OutputEncoding assignment above additionally ensures that PowerShell sends data to external programs as BOM-less UTF-8. (This preference variable fortunately now defaults to BOM-less UTF-8 in PowerShell [Core] v6+.)

Note that the above also makes all PowerShell [Core] v6+ console windows use BOM-less UTF-8 in all respects, except that you don't need the $PROFILE additions (though they do no harm).
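To check whether the system-wide change and the $PROFILE additions took effect, you can inspect the relevant encodings from inside a session. This is a rough sketch; the variable name $report and its property names are made up for illustration. Keep in mind that in PowerShell (Core) v6+ [System.Text.Encoding]::Default always reports UTF-8, so the first value only tells you about the ANSI code page when run in Windows PowerShell.

```powershell
# $report and its property names are hypothetical, chosen for this sketch.
$report = [pscustomobject]@{
    # In Windows PowerShell: the active ANSI code page, which is what the
    # 'Default' value of -Encoding maps to (65001 once the UTF-8 option is on).
    AnsiCodePage          = [System.Text.Encoding]::Default.CodePage
    # Code page PowerShell uses to decode output from external programs;
    # initially this reflects the OEM code page.
    ConsoleOutputCodePage = [Console]::OutputEncoding.CodePage
    # Encoding PowerShell uses when piping data to external programs.
    PipeToExternal        = $OutputEncoding.WebName
}
$report
```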


Background information:

  • > a.txt is effectively the same as | Out-File a.txt.

  • Windows PowerShell's > / >> / Out-File default to UTF-16LE ("Unicode")[2], which invariably uses a BOM.

  • You have two options for choosing a different encoding:

    • Use Out-File explicitly and use its -Encoding parameter.

    • In v5.1 (and also in PowerShell [Core] v6+), you can set the default encoding for > / >> / Out-File via the $PSDefaultParameterValues preference variable, as discussed in this answer.

    • However, in Windows PowerShell, the utf8 value for -Encoding is invariably a UTF-8 encoding with BOM, so - unless you're willing to switch to UTF-8 system-wide, as explained above - the only way to create BOM-less UTF-8 files is to use .NET APIs directly.

      • Note that in PowerShell [Core] v6+ the utf8 value accepted by an -Encoding parameter now (more sensibly) refers to a BOM-less UTF-8 encoding; if you do want a UTF-8 BOM there, use utf8BOM instead.
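The difference between the encoding values is easy to see by writing the same string with an explicit -Encoding argument and looking at the first bytes of each file. The sketch below assumes temp-file names and a helper called Get-LeadingBytes, both invented for this example.

```powershell
# File names and the Get-LeadingBytes helper are hypothetical, chosen for this sketch.
$dir       = [System.IO.Path]::GetTempPath()
$asciiFile = Join-Path $dir 'march-ascii.txt'
$utf8File  = Join-Path $dir 'march-utf8.txt'

# Explicit -Encoding overrides the UTF-16LE default of > / Out-File in Windows PowerShell.
'March' | Out-File -LiteralPath $asciiFile -Encoding ascii
'March' | Out-File -LiteralPath $utf8File  -Encoding utf8

function Get-LeadingBytes {
    param([string] $Path, [int] $Count = 3)
    # Read the raw bytes and format the first few as two-digit hex values.
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    ($bytes | Select-Object -First $Count | ForEach-Object { $_.ToString('X2') }) -join ' '
}

Get-LeadingBytes $asciiFile   # 4D 61 72 ("Mar"): ASCII never has a BOM
Get-LeadingBytes $utf8File    # EF BB BF in Windows PowerShell; 4D 61 72 in PowerShell (Core) v6+
```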

As for what you tried:

The properties and variables you tried are related only to how PowerShell - in both editions - communicates with external programs:

  • $OutputEncoding determines the encoding PowerShell uses when sending data via the pipeline to an external program, which the latter can read via stdin (standard input).

  • [Console]::OutputEncoding determines the encoding that PowerShell uses when interpreting output received from an external program.

  • [Console]::InputEncoding is the encoding that PowerShell uses when it receives data from the outside, when its CLI is called.

    • Caveat: You can't change this encoding from within your PowerShell session in this case, as that would be too late.
    • It must be set by the caller, before calling the PowerShell CLI, which from cmd.exe is most easily done with chcp 65001 (see caveat re calling chcp from inside PowerShell below). While that invariably sets both [Console]::InputEncoding and [Console]::OutputEncoding, that is usually desirable.

Note:

  • On Windows, [Console]::OutputEncoding and [Console]::InputEncoding by default reflect the encoding of the legacy system locale's OEM code page, as reported by chcp.com; on Unix-like platforms (PowerShell [Core] v6+), they are (virtually without exception, these days) (BOM-less) UTF-8.

  • Due to caching of the encodings in these .NET properties, you cannot use chcp.com from inside PowerShell to change these properties - instead, assign the desired encoding directly.

  • See this answer for more information, which discusses how to make console windows on Windows use BOM-less UTF-8 consistently with respect to external programs.


[1] Technically, this preference is also applied to file-reading cmdlets, which is neither strictly necessary for BOM-less files nor does it do any harm for files with a BOM - even if that BOM indicates a UTF-16 or UTF-32 encoding - because a BOM always overrides an -Encoding argument.

[2] Unfortunately, in Windows PowerShell the default encodings vary wildly across cmdlets - see the bottom section of this answer.

Appositive answered 8/12, 2020 at 2:4 Comment(5)
Can system locale be set through cli or a group policy? - Majoriemajority
Also adding the two lines to the profile seems to trigger an addition of a 3rd line "値䑓晥畡瑬慐慲敭整噲污敵孳⨢䔺据摯湩≧⁝‽䐢晥畡瑬ഢ␊畏灴瑵湅潣楤杮㴠嬠祓瑳浥吮硥⹴呕㡆湅潣楤杮㩝渺睥⤨਍" (I have zero clue what it says) which causes Powershell to show an error every time I open it: imgur.com/a/c5Qfoz3 - Majoriemajority
@gargoylebident, re CLI: I assume yes, via the registry; re GPO: not sure - but I encourage you to create a new question specifically for that. As for the profile: sounds like an encoding mismatch, which you'd get if you added, say, UTF-8 or ANSI-encoded characters to a UTF-16LE ("Unicode") file. - Appositive
Thanks, you were right as always. I was adding both strings using | Out-File $PROFILE.AllUsersCurrentHost -Append after applying the system locale setting, and the profile file (the 5.1 one) is in UTF-16LE (as opposed to UTF-8 which Powershell now defaults to). Appending prior to changing the locale setting (or using 16LE when appending) solves this. - Majoriemajority
Glad to hear it, @gargoylebident. Note that for appending to existing files Add-Content is the best choice, because it tries to match the existing encoding. (If the file doesn't exist yet, it behaves like Set-Content, meaning it uses ANSI in Windows PowerShell, and BOM-less UTF-8 in PowerShell Core). - Appositive
