Mathias R. Jessen provides the crucial pointer in a comment on the question: it is the Import-Csv call, not Export-Csv, that is the cause of the problem in your case:

Like Export-Csv, Import-Csv too needs to be passed -Encoding Default in order to properly process text files encoded with the system's active "ANSI" legacy code page, which is an 8-bit, single-byte character encoding such as Windows-1252.
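A minimal sketch (the file name in.csv is just a placeholder for an ANSI-encoded input file):

# Windows PowerShell: interpret the input file per the system's active ANSI code page.
Import-Csv .\in.csv -Encoding Default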
In Windows PowerShell, even though the generic text-file processing Get-Content / Set-Content cmdlet pair defaults to Default encoding (as the name suggests), regrettably and surprisingly, Import-Csv and Export-Csv do not.

Note that, on reading, a default encoding is only assumed if the input file has no BOM (byte-order mark, a.k.a. Unicode signature: a magic byte sequence at the start of the file that unambiguously identifies the file's encoding).
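Conversely, a file that does have a BOM is read according to that BOM, regardless of the cmdlet's default. A small sketch (the file name t.utf16.csv is my own choice):

# Write a small CSV file as UTF-16 LE *with* a BOM
# (Set-Content -Encoding Unicode always writes one).
'Column1', 'aäöü' | Set-Content .\t.utf16.csv -Encoding Unicode

# Import-Csv detects the BOM and decodes the file correctly,
# even though its no-BOM default is a different encoding.
Import-Csv .\t.utf16.csv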
Not only do Import-Csv and Export-Csv have defaults that differ from Get-Content / Set-Content, they individually have different defaults:

- Import-Csv defaults to UTF-8.
- Export-Csv defaults to ASCII(!), which means that any non-ASCII characters, such as ä, ö, ü, are transliterated to literal ? characters, resulting in loss of data.
By contrast, in PowerShell Core, the cross-platform edition built on .NET Core, the default encoding is (BOM-less) UTF-8, consistently, across all cmdlets, which greatly simplifies matters and makes it much easier to determine when you do need to use the -Encoding parameter.
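For instance, in PowerShell Core the following round trip works without any -Encoding arguments, and the output file carries no BOM; a sketch with a file name of my own choosing:

# PowerShell Core only:
[pscustomobject] @{ Column1 = 'aäöü' } | Export-Csv .\t.core.csv   # BOM-less UTF-8 by default
Import-Csv .\t.core.csv                                            # UTF-8 assumed by default

# Optional check: the first bytes are those of '"Column1"', not a UTF-8 BOM (0xEF 0xBB 0xBF).
Get-Content .\t.core.csv -AsByteStream | Select-Object -First 3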
Demonstration of the Windows PowerShell Import-Csv / Export-Csv behavior

Import-Csv - defaults to UTF-8:
# Sample CSV content.
$str = @'
Column1
aäöü
'@
# Write sample CSV file 't.csv' using UTF-8 encoding *without a BOM*
# (Note that this cannot be done with standard PowerShell cmdlets.)
$null = New-Item -ItemType File t.csv -Force
[io.file]::WriteAllLines((Convert-Path t.csv), $str)
# Use Import-Csv to read the file, which correctly preserves the UTF-8-encoded
# umlauts
Import-Csv .\t.csv
The above yields:
Column1
-------
aäöü
As you can see, the umlauts were correctly preserved.
By contrast, had the file been "ANSI"-encoded ($str | Set-Content t.csv; -Encoding Default implied), the umlauts would have gotten corrupted.
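A quick sketch of that, building on the sample above (the file name t.ansi.csv is my own choice):

# Windows PowerShell: Set-Content writes the active ANSI code page by default.
$str | Set-Content .\t.ansi.csv

# Without -Encoding Default, Import-Csv assumes UTF-8 and mangles the umlauts ...
Import-Csv .\t.ansi.csv

# ... whereas with -Encoding Default they are read correctly.
Import-Csv .\t.ansi.csv -Encoding Default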
Export-Csv - defaults to ASCII - risk of data loss:
Building on the above example:
Import-Csv .\t.csv | Export-Csv .\t.new.csv
Get-Content .\t.new.csv
yields:
"Column1"
"a???"
As you can see, the umlauts were replaced by literal question marks (?).
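The fix in Windows PowerShell is to specify a suitable encoding explicitly on export, e.g. UTF-8 (this overwrites the t.new.csv file created above):

Import-Csv .\t.csv | Export-Csv .\t.new.csv -Encoding UTF8
Get-Content .\t.new.csv   # the umlauts (aäöü) are now preserved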
Comments:

- "... Import-Csv as well" – Silica
- "Default varies from machine to machine, user to user and even time to time, so it is very rarely one that a writer would want to use." – Mullein
- "... Get-Content and Set-Content cmdlets; despite even naming the ACP encoding Default when used with the -Encoding parameter, however, other cmdlets do not use it by default, which is the truly problematic aspect here." – Maleficent
- "... Get-WinEvent, and you don't use a Unicode encoding, such as UTF-8, you potentially get silent data loss. So, one strategy is to just specify UTF-8 everywhere." – Mullein