Fast and simple way to concatenate binary files in PowerShell

What's the best way of concatenating binary files using PowerShell? I'd prefer a one-liner that's simple to remember and fast to execute.

The best I've come up with is:

gc -Encoding Byte -Path ".\File1.bin",".\File2.bin" | sc -Encoding Byte new.bin

This seems to work ok, but is terribly slow with large files.

Cyclopropane answered 23/11, 2009 at 14:44 Comment(0)

The approach you're taking is the way I would do it in PowerShell. However, you should use the -ReadCount parameter to improve performance. You can also take advantage of positional parameters to shorten this even further:

gc File1.bin,File2.bin -Encoding Byte -Read 512 | sc new.bin -Encoding Byte

Editor's note: In the cross-platform PowerShell (Core) edition (version 6 and up), -AsByteStream must now be used instead of -Encoding Byte; also, the sc alias for the Set-Content cmdlet has been removed.
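
For reference, a rough PowerShell 7 equivalent of the one-liner above, based on that note (just a sketch: -AsByteStream replaces -Encoding Byte, and Set-Content has to be spelled out because the sc alias is gone):

Get-Content File1.bin,File2.bin -AsByteStream -ReadCount 512 | Set-Content new.bin -AsByteStream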

Regarding the use of the -ReadCount parameter, I did a blog post on this a while ago that folks might find useful - Optimizing Performance of Get Content for Large Files.

Billon answered 23/11, 2009 at 15:11 Comment(4)
I just ran this on my example files and the command went from taking 9 minutes to 3 seconds with the inclusion of the -read param. This is on a x25m drive. Nice. You get my accept.Cyclopropane
Just used your one-liner to join a 4.4gb iso spanned over 23 files. Reassembled the file fine, and took 35 minutes on my laptop using 1024 byte blocks.Pignut
I'm guessing this works because the pipe is sending .NET objects to sc? When I tried to pipe binary data to a C program, I noticed that I only got the first 7 bits of each byte, since "|" invokes encoding.Kuehl
No longer works in PowerShell 6/7. Byte is not an accepted encoding. Get-Content: Cannot process argument transformation on parameter 'Encoding'. 'Byte' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method. (Parameter 'name')Organzine

It's not PowerShell, but if you have PowerShell you also have the command prompt:

copy /b 1.bin+2.bin 3.bin

As Keith Hill pointed out, if you really need to run it from inside Powershell, you can use:

cmd /c copy /b 1.bin+2.bin 3.bin 
Gudrunguelderrose answered 23/11, 2009 at 15:10 Comment(4)
copy is an intrinsic command in cmd.exe. You would have to execute cmd /c copy /b 1.bin+2.bin 3.binBillon
Nice simple solution, works on any windows computer. Upvoted but accept to Keith since I asked for PS version. ThxCyclopropane
Note also that copy supports wildcards. So copy /b *.bin out.bin will concatenate all your bin-files and the output will be very fast (i.e. much faster than with PowerShell).Soulful
Thanks... It's about a billion times faster than the accepted answer ;). I missed the "cmd /c" when trying to run it from PowerShell. Sometimes the old ways are still the best.Wilburnwilburt

I had a similar problem recently, where I wanted to concatenate two large (2GB) files into a single (4GB) file.

I tried adjusting the -ReadCount parameter for Get-Content, but I couldn't get it to improve performance with files this large.

I went with the following solution:

function Join-File (
    [parameter(Position=0,Mandatory=$true,ValueFromPipeline=$true)]
    [string[]] $Path,
    [parameter(Position=1,Mandatory=$true)]
    [string] $Destination
)
{
    write-verbose "Join-File: Open Destination1 $Destination"
    $OutFile = [System.IO.File]::Create($Destination)
    foreach ( $File in $Path ) {
        write-verbose "   Join-File: Open Source $File"
        $InFile = [System.IO.File]::OpenRead($File)
        $InFile.CopyTo($OutFile)
        $InFile.Dispose()
    }
    $OutFile.Dispose()
    write-verbose "Join-File: finished"
} 
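
For example, a possible call once the function has been loaded into the session (the file names here are just placeholders):

Join-File -Path .\File1.bin,.\File2.bin -Destination .\Joined.bin -Verbose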

Performance:

  • cmd.exe /c copy file1+file2 file3: around 5 seconds (best)
  • gc file1,file2 | sc file3: around 1100 seconds (yuck)
  • Join-File file1,file2 file3: around 16 seconds (OK)
Sterigma answered 23/11, 2009 at 14:44 Comment(1)
cmd.exe copy is many times faster than the native PS cmdlets: 1.2MB/s versus >120MB/s. Not surprising considering how Get-Content works, even with the -ReadCount parameter.Clef

Performance is very much dependent on the buffer size used, and the defaults are fairly small. For concatenating 2x2GB files I'd take a buffer size of about 256KB. Going larger might sometimes fail; go smaller and you'll get less throughput than your drive is capable of.

With gc, that'd be -ReadCount, not simply -Read (PowerShell 5.0):

gc -ReadCount 256KB -Path $infile -Encoding Byte | ...
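
For illustration, a completed form of that pipeline might look like this (just a sketch in Windows PowerShell 5.x syntax; $outfile stands in for your destination path):

gc -ReadCount 256KB -Path $infile -Encoding Byte | sc $outfile -Encoding Byte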

I also found Add-Content, going file by file, to be better for a large number of small files, because piping even a moderate amount of data (200MB) drove my computer out of memory, with PowerShell freezing and the CPU at 100%.

However, Add-Content would randomly fail a few times over a few hundred files with an error about the destination file being in use, so I added a while loop and a try/catch:

# Empty the file first
sc -Path "$path\video.ts" -Value @() -Encoding Byte 
$tsfiles | foreach {    
    while ($true) {
        try { # I had -ReadCount 0 because the files are smaller than 256KB
            gc -ReadCount 0 -Path "$path\$_" -Encoding Byte | `
                Add-Content -Path "$path\video.ts" -Encoding Byte -ErrorAction Stop
            break;
        } catch {
        }
    }
}

Using a file stream is much faster still. You cannot specify a buffer size with [System.IO.File]::Open, but you can with the [System.IO.FileStream] constructor, like so:

# $path = "C:\"
$ins = @("a.ts", "b.ts")
$outfile = "$path\out.mp4"
$out = New-Object -TypeName "System.IO.FileStream" -ArgumentList @(
    $outfile, 
    [System.IO.FileMode]::Create,
    [System.IO.FileAccess]::Write,
    [System.IO.FileShare]::None,
    256KB,
    [System.IO.FileOptions]::None)
try {
    foreach ($in in $ins) {
        $fs = New-Object -TypeName "System.IO.FileStream" -ArgumentList @(
            "$path\$in", 
            [System.IO.FileMode]::Open,
            [System.IO.FileAccess]::Read,
            [System.IO.FileShare]::Read,
            256KB,
            [System.IO.FileOptions]::SequentialScan)
        try {
            $fs.CopyTo($out)
        } finally {
            $fs.Dispose()
        }
    }
} finally {
    $out.Dispose()
}
Barbe answered 9/8, 2015 at 15:22 Comment(1)
One guesses that this is very similar to the method used by the cmd.exe copy command.Clef

You could also use this method to concatenate files of any type. It's not that fast, but it's just another way to do it:

cmd /c type file1 file2 file3 file4 > outfile

Or you can use jobs to speed things up if more than one copy task exists:

$jobs = @()
# Each group lists the source files first and the output file as its last element
$Files = @(
    @("d:\path\app1.rar","d:\path\app2.rar","d:\path\app3.rar","d:\path\appOutFile.rar"),
    @("c:\users\user\desktop\file1.iso","c:\users\user\desktop\file2.iso","c:\users\user\desktop\file3.iso","c:\users\user\desktop\isoOutFile.iso"),
    @("c:\users\user\video\video1.mp4","c:\users\user\video\video2.mp4","c:\users\user\video\mp4OutFile.mp4")
)
$totalMb = 0
foreach ($Group in $Files) {
    # All but the last entry are sources; the last entry is the destination
    $sources = $Group | select -first ($Group.Length - 1)
    $sources | % { $totalMb += (Get-Item $_).Length / 1mb }
    # Quote each source path and join them with "+" for cmd's copy /b syntax
    $sources = $(($sources -replace "^|$", [char]34) -join "+")
    $destination = [char]34 + ($Group | select -last 1) + [char]34
    # Create/empty the destination file, then run the copy as a background job
    cmd /c type nul> $destination
    $jobs += start-job -scriptBlock { param($src, $dst) & cmd.exe "/c copy /b $src $dst" } -arg $sources, $destination
    # $jobs+=start-job -scriptBlock { param($src,$dst) &cmd.exe "/c type $($src -replace '`"\+`"','`" `"') > $dst" } -arg $sources,$destination
}

"Total MegaBytes To Concatenate is: $( [math]::Truncate($totalMb) )MB"
while ( $($jobs|% {$_.state -eq "running"}) -contains $true ) {"still concatenating files...";start-sleep -s 2}

"jobs finished !!!";$jobs|receive-job;$jobs|remove-job -force
Andeee answered 18/9, 2023 at 23:20 Comment(0)
