tl;dr:
In Windows PowerShell and PowerShell (Core) up to v7.3.x, if you need raw byte handling and/or need to prevent PowerShell from situationally adding a trailing newline to your text data, avoid the PowerShell pipeline altogether, as shown below.
Workarounds are no longer necessary in v7.4+: In v7.4, the previously experimental feature named PSNativeCommandPreserveBytePipe
became a stable feature: >
and |
when applied to external (native) programs now act as raw byte conduits, i.e. they bypass the usual string-decoding and re-encoding cycle in favor of passing the raw data through.
- However, two limitations remain:
- Sending a PowerShell string through the pipeline to an external program still invariably causes a newline to be appended. See this answer for workarounds.
- You cannot capture the raw bytes that form an external program's output in memory in PowerShell; see the bottom section for workarounds.
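A script that has to run on both sides of the v7.4 boundary can branch on the version. The following is a minimal sketch: the function name Test-RawBytePipeSupport is hypothetical, and it deliberately ignores the case where the experimental feature was manually enabled in v7.3.x.

```powershell
# Hypothetical helper: reports whether > and | act as raw byte conduits
# for external programs in the running PowerShell version (v7.4+).
function Test-RawBytePipeSupport {
  $v = $PSVersionTable.PSVersion
  # Compare Major/Minor explicitly, which works for both [version]
  # (Windows PowerShell) and [semver] (PowerShell (Core)) instances.
  $v.Major -gt 7 -or ($v.Major -eq 7 -and $v.Minor -ge 4)
}

if (Test-RawBytePipeSupport) {
  'Raw byte streaming is built in; > and | can be used directly.'
} else {
  'Use the cmd /c (or sh -c) workarounds described in this answer.'
}
```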
For raw byte handling in Windows PowerShell and PowerShell v7.3-, shell out to cmd
with /c
(on Windows; on Unix-like platforms / Unix-like Windows subsystems, use sh
or bash
with -c
):
cmd /c 'type .\test.txt | .\Crypt.exe --encrypt | .\Crypt.exe --decrypt'
Use a similar technique to save raw byte output in a file - do not use PowerShell's >
operator:
cmd /c 'someexe > file.bin'
Note that if you want to capture an external program's text output in a PowerShell variable or process it further in a PowerShell pipeline, you need to make sure that [Console]::OutputEncoding
matches your program's output character encoding (the active OEM code page, typically), which should be true by default in this case; see the next section for details.
Generally, however, byte manipulation of text data is best avoided.
There are two separate problems, only one of which has a simple solution:
Problem 1: There is indeed a character encoding problem, as you suspected:
PowerShell invisibly inserts itself as an intermediary in pipelines, even when sending data to and receiving data from external programs: It converts data from and to .NET strings (System.String
), which are sequences of UTF-16 code units.
- As an aside: Even when using only PowerShell-native commands, this means that reading input from files and saving them again can result in a different character encoding: the information about the original character encoding is not preserved once (string) data has been read into memory, and on saving it is the cmdlets' default character encoding that is used. While this default encoding is consistently BOM-less UTF-8 in PowerShell (Core) 6+, it varies by cmdlet in Windows PowerShell - see this answer.
In order to send to and receive data from external programs (such as Crypt.exe
in your case), you need to match their character encoding; in your case, with a Windows console application that uses raw byte handling, the implied encoding is the system's active OEM code page.
On sending data, PowerShell uses the encoding of the $OutputEncoding
preference variable to encode (what is invariably treated as text) data, which defaults to ASCII(!) in Windows PowerShell, and (BOM-less) UTF-8 in PowerShell (Core).
The receiving end is covered by default: PowerShell uses [Console]::OutputEncoding
(which itself reflects the code page reported by chcp
) for decoding data received, and on Windows this by default reflects the active OEM code page, both in Windows PowerShell and PowerShell (Core)[1].
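To check which encodings are in effect in a given session, something like the following can be used. It is a sketch that relies only on standard .NET Encoding properties; a $false result on the last line points to the kind of mismatch described above.

```powershell
# Encoding used when *sending* data to external programs.
"Sending:   $($OutputEncoding.WebName)"

# Encoding used when *decoding* output received from external programs.
"Receiving: $([Console]::OutputEncoding.WebName) (code page $([Console]::OutputEncoding.CodePage))"

# $true if both directions use the same code page.
$sameCodePage = $OutputEncoding.CodePage -eq [Console]::OutputEncoding.CodePage
$sameCodePage
```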
To fix your primary problem, you therefore need to set $OutputEncoding
to the active OEM code page:
# Make sure that PowerShell uses the OEM code page when sending
# data to `.\Crypt.exe`
$OutputEncoding = [Console]::OutputEncoding
Problem 2: PowerShell invariably appends a trailing newline to data that doesn't already have one when piping data to external programs:
That is, "foo" | .\Crypt.exe
doesn't send (the $OutputEncoding
-encoded bytes representing) "foo"
to .\Crypt.exe
's stdin, it sends "foo`r`n"
on Windows; i.e., a (platform-appropriate) newline sequence (CRLF on Windows) is automatically and invariably appended (unless the string already happens to have a trailing newline).
This problematic behavior is discussed in GitHub issue #5974 and also in this answer.
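The effect can be illustrated without an external program by computing the bytes by hand. Note that this sketch only emulates what PowerShell sends (string plus platform-native newline, encoded via $OutputEncoding); it does not observe the actual stdin stream, and it assumes the default $OutputEncoding.

```powershell
# Emulates (does not observe) what is sent for: "foo" | .\Crypt.exe
$text = 'foo'
$sent = $text + [Environment]::NewLine    # the implicitly appended newline
$bytes = $OutputEncoding.GetBytes($sent)

# Print the bytes as two-digit hex values.
$hex = ($bytes | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
$hex    # on Windows: 66 6F 6F 0D 0A
```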
In your specific case, the implicitly appended "`r`n"
is also subject to the byte-value-shifting, which means that the 1st Crypt.exe
call transforms it to -*
, causing another "`r`n"
to be appended when the data is sent to the 2nd Crypt.exe
call.
The net result is an extra newline that is round-tripped (the intermediate -*
), plus an encrypted newline that results in φΩ
.
In short: If your input data had no trailing newline, you'll have to cut off the last 4 characters from the result (representing the round-tripped and the inadvertently encrypted newline sequences):
# Ensure that data is sent to .\Crypt.exe using the same encoding
# with which its output is decoded (the OEM code page).
$OutputEncoding = [Console]::OutputEncoding
# Invoke the command and capture its output in variable $result.
# Note the use of the `Get-Content` cmdlet; in PowerShell, `type`
# is simply a built-in *alias* for it.
$result = Get-Content .\test.txt | .\Crypt.exe --decrypt | .\Crypt.exe --encrypt
# Remove the last 4 chars. and print the result.
$result.Substring(0, $result.Length - 4)
Given that calling cmd /c
as shown at the top of the answer works too, that hardly seems worth it.
How PowerShell handles pipeline data with external programs:
Note: The following mostly applies to v7.4+ as well, except where noted. (PowerShell) v7.3- is shorthand for both older PowerShell (Core) versions (7.3.x and below) and Windows PowerShell.
Unlike cmd
(or POSIX-like shells such as bash
):
PowerShell v7.3- doesn't support raw byte data in pipelines.[2]
When talking to external programs, it only knows text (whereas it passes .NET objects when talking to PowerShell's own commands, which is where much of its power comes from).
Specifically, this works as follows:
When you send data to an external program via the pipeline (to its stdin stream):
In v7.4+, you can now send raw byte data to an external program, as a stream of [byte]
instances or, preferably, for better performance, as a [byte[]]
array; one use case for this technique is to prevent PowerShell from appending a trailing newline to text, which invariably happens when a string is sent (see this answer for details and a 7.3- workaround); e.g.
# Sends bytes with values 65 and 66 to findstr.exe,
# which interprets them as single-byte characters 'A' and 'B'
# The unary form of "," in effect sends the byte array
# *as a whole*, which improves performance.
, [byte[]] (65, 66) | findstr . # -> 'AB'
Otherwise - and invariably in v7.3- - it is converted to text (strings) using the character encoding specified in the $OutputEncoding
preference variable, which defaults to ASCII(!) in Windows PowerShell, and (BOM-less) UTF-8 in PowerShell (Core).
Caveat: If you assign an encoding with a BOM to $OutputEncoding
, PowerShell will emit the BOM as part of the first line of output sent to an external program; therefore, for instance, do not use [System.Text.Encoding]::UTF8
(which emits a BOM) in Windows PowerShell, and use [System.Text.Utf8Encoding]::new()
(which doesn't) instead.
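Whether a given encoding instance emits a BOM can be checked via its preamble, as in this short sketch (the variable name $utf8NoBom is just for illustration):

```powershell
# [System.Text.Encoding]::UTF8 has a 3-byte preamble (the BOM: EF BB BF).
$bomLength = [System.Text.Encoding]::UTF8.GetPreamble().Length
$bomLength      # -> 3

# An explicitly constructed BOM-less instance has no preamble.
$utf8NoBom = [System.Text.UTF8Encoding]::new($false)
$noBomLength = $utf8NoBom.GetPreamble().Length
$noBomLength    # -> 0

# Use the BOM-less instance for sending data to external programs.
$OutputEncoding = $utf8NoBom
```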
If the data is not captured or redirected by PowerShell, encoding problems may not always become apparent, namely if an external program is implemented in a way that uses the Windows Unicode console API to print to the display.
Something that isn't already text (a string) is stringified using PowerShell's default output formatting (the same format you see when you print to the console), with an important caveat:
If the (last) input object already is a string that doesn't itself have a trailing newline, one is invariably appended (and even an existing trailing newline is replaced with the platform-native one, if different).
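The stringification can be previewed with Out-String -Stream, which applies the default output formatting and emits the result line by line. This is only an approximation of what an external program receives in v7.3-; in v7.4+, [byte] input is passed through as raw bytes instead.

```powershell
# Each byte is formatted as its decimal string representation, one per line.
$lines = [byte[]] (65, 66) | Out-String -Stream
$lines    # -> '65', '66'

# A non-string object is rendered the way it would print to the console.
[pscustomobject] @{ Name = 'foo'; Size = 42 } | Out-String -Stream
```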
When you capture / redirect data from an external program (from its stdout stream), it is invariably decoded as lines of text (strings), based on the encoding specified in [Console]::OutputEncoding
, which defaults to the active OEM code page on Windows (surprisingly, in both PowerShell editions, as of v7.0-preview6[1]).
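If an external program emits something other than the OEM code page (UTF-8, say), [Console]::OutputEncoding can be switched temporarily around the call. The following is a sketch: the function name Invoke-WithConsoleEncoding and the program .\utf8tool.exe are hypothetical, and setting [Console]::OutputEncoding may fail in sessions without an attached console.

```powershell
# Hypothetical helper: runs a script block with [Console]::OutputEncoding
# temporarily set to the given encoding, then restores the previous value.
function Invoke-WithConsoleEncoding {
  param(
    [Parameter(Mandatory)] [System.Text.Encoding] $Encoding,
    [Parameter(Mandatory)] [scriptblock] $ScriptBlock
  )
  $prev = [Console]::OutputEncoding
  try {
    [Console]::OutputEncoding = $Encoding
    & $ScriptBlock
  } finally {
    [Console]::OutputEncoding = $prev
  }
}

# Usage, assuming a hypothetical .\utf8tool.exe that emits UTF-8:
# $text = Invoke-WithConsoleEncoding ([System.Text.UTF8Encoding]::new()) { .\utf8tool.exe }
```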
PowerShell-internally text is represented using the .NET System.String
type, which is based on UTF-16 code units (often loosely, but incorrectly called "Unicode"[3]).
In Windows PowerShell and PowerShell (Core) up to v7.3.x only, the above also applies:
when piping data between external programs,
when data is redirected to a file; that is, irrespective of the source of the data and its original character encoding, PowerShell uses its default encoding(s) when sending data to files; in Windows PowerShell, >
produces UTF-16LE-encoded files (with BOM), whereas PowerShell (Core) sensibly defaults to BOM-less UTF-8 (consistently, across file-writing cmdlets).
In v7.4+, PowerShell now streams raw bytes in the two scenarios above, which not only improves performance noticeably, but prevents potential data corruption due to the previous as-text interpretation.
Note that capturing raw byte data from external programs in memory isn't directly possible: on assignment to a variable or on processing via a PowerShell command, the as-text interpretation still invariably applies; the simplest workaround is:
v7.4+: Use >
to redirect the external-program call to a file, and read that file with [System.IO.File]::ReadAllBytes()
, for instance; as always when passing file-system paths to .NET methods, be sure to pass full paths, because .NET's working directory usually differs from PowerShell's.
v7.3-: Call the external program via the platform-native shell and use its >
operator to capture the raw byte output in a file (cmd /c 'foo.exe ... > file'
on Windows, sh -c 'foo ... > file'
on Unix-like platforms), then read the file in PowerShell, as above.
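For v7.4+, the file-based workaround could be wrapped up roughly as follows. This is a sketch under stated assumptions: the function name Get-NativeOutputBytes and its parameters are hypothetical, and in v7.3- the > redirection inside it would still go through the as-text interpretation, so the platform-native shell would have to be used there instead.

```powershell
# Hypothetical helper (v7.4+ only): captures an external program's raw
# stdout bytes via a temporary file and returns them as a [byte[]] array.
function Get-NativeOutputBytes {
  param(
    [Parameter(Mandatory)] [string] $FilePath,
    [string[]] $ArgumentList = @()
  )
  # GetTempFileName() returns a *full* path, as .NET methods require.
  $tmp = [System.IO.Path]::GetTempFileName()
  try {
    # In v7.4+, > writes the external program's output as raw bytes.
    & $FilePath @ArgumentList > $tmp
    # The unary "," outputs the array as a whole rather than byte by byte.
    , [System.IO.File]::ReadAllBytes($tmp)
  } finally {
    Remove-Item -LiteralPath $tmp -ErrorAction SilentlyContinue
  }
}

# Usage, with a hypothetical program:
# $bytes = Get-NativeOutputBytes -FilePath .\someexe
```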
[1] In PowerShell (Core), given that $OutputEncoding
commendably already defaults to UTF-8, it would make sense to have [Console]::OutputEncoding
be the same - i.e., for the active code page to be effectively 65001
on Windows, as suggested in GitHub issue #7233.
[2] With input from a file, the closest you can get to raw byte handling is to read the file as a .NET System.Byte
array with Get-Content -AsByteStream
(PowerShell (Core)) / Get-Content -Encoding Byte
(Windows PowerShell), but the only way you can further process such an array is to pipe it to a PowerShell command that is designed to handle a byte array, or to pass it to a .NET type's method that expects a byte array. If you tried to send such an array to an external program via the pipeline in v7.3-, each byte would be sent as its decimal string representation on its own line.
[3] Unicode is the name of the abstract standard describing a "global alphabet". In concrete use, it has various standard encodings, UTF-8 and UTF-16 being the most widely used.
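To illustrate footnote [2], the following sketch reads a file as a byte array in either edition and passes it to a .NET method. The temporary sample file exists only to make the example self-contained; adding -Raw so that the array is returned as a whole is a choice made here, not something the text above prescribes.

```powershell
# Create a small sample file so that the example is self-contained.
$path = [System.IO.Path]::GetTempFileName()    # returns a full path
[System.IO.File]::WriteAllBytes($path, [byte[]] (65, 66))

# Read the file as a [byte[]] array; -Raw returns the array as a whole.
if ($PSVersionTable.PSEdition -eq 'Core') {
  $bytes = Get-Content -LiteralPath $path -AsByteStream -Raw
} else {
  $bytes = Get-Content -LiteralPath $path -Encoding Byte -Raw
}

# Pass the array to a .NET method that expects a byte array.
$hex = [System.BitConverter]::ToString($bytes)
$hex    # -> 41-42

Remove-Item -LiteralPath $path
```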