How do I compress files using powershell that are over 2 GB?
I am working on a project to compress files that range anywhere from a couple of MB to several GB in size, and I am trying to use PowerShell to compress them into a .zip. The main problem I am having is that Compress-Archive has a 2 GB cap on individual file size, and I was wondering if there is another method to compress them.

Edit:

For this project we are looking to implement a system that takes .pst files from Outlook, compresses them into a .zip, and uploads them to a server. Once they are uploaded, they will be pulled down onto a new device and extracted back into a .pst file.

Serg answered 13/6, 2022 at 19:7 Comment(9)
@anothervictimofthemouse I have been trying to use that, but due to the 2 GB cap per file I keep running into an error because the file is too big. – Serg
Use the .NET API directly. – Barranca
@SantiagoSquarzon do you have any articles or resources I could look at to implement this? – Serg
I'll post an answer later if nobody does; you will need to use a file stream for this. Please add more details to your question, such as the file / folder structure you're looking to compress. – Barranca
You can't use PowerShell alone. You either need C# code (see learn.microsoft.com/en-us/dotnet/api/system.io.compression) or install an external program and let that do the compression (the latter being a lot easier). You could invoke 7-Zip from PowerShell with Start-Process -FilePath "C:\path\to\7z.exe" -ArgumentList @("a", "c:\path\to\output.zip", "c:\path\to\inputfolder"). (www.7-zip.org - you must install 7za.exe). Chocolatey (chocolatey.org) can be used to install 7-Zip from PowerShell. – Raynard
@Raynard that's absolutely wrong. PowerShell can do anything C# can, because they're both based on .NET. You can call Win32 APIs and .NET Framework APIs directly from PowerShell. – Thermosiphon
powershell compress-archive File size issue – Thermosiphon
@Serg is it a must to use zip? If not, then just use 7z or rar, which are much better, or just use the built-in tar command, which can give you bz2 or gz, which are also much better than zip. Alternatively, compress to *.wim. – Thermosiphon
Similar, but with no alternative solutions provided: Compress-Archive command returns "Exception: Stream was too long.", Compress-Archive is throwing 'System.OutOfMemoryException'. – Cuspidor

NOTE

Further updates to this function will be published to the official GitHub repo as well as to the PowerShell Gallery. The code in this answer will no longer be maintained.

Contributions are more than welcome. If you wish to contribute, fork the repo and submit a pull request with the changes.


To explain the limitation noted in the PowerShell docs for Compress-Archive:

The Compress-Archive cmdlet uses the Microsoft .NET API System.IO.Compression.ZipArchive to compress files. The maximum file size is 2 GB because there's a limitation of the underlying API.

This happens because, presumably (since the Windows PowerShell 5.1 version is closed source), this cmdlet uses a MemoryStream to hold all zip archive entries in memory before writing the zip archive to a file. Inspecting the InnerException produced by the cmdlet, we can see:

System.IO.IOException: Stream was too long.
   at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
   at CallSite.Target(Closure , CallSite , Object , Object , Int32 , Object )
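As an aside, the inner exception can be inspected directly in a session. A minimal sketch (the file path is hypothetical; -ErrorAction Stop makes the cmdlet's error catchable):

```powershell
try {
    Compress-Archive -Path .\bigfile.bin -DestinationPath .\bigfile.zip -ErrorAction Stop
}
catch {
    # the surfaced exception is a wrapper; the InnerException
    # carries the real IOException shown above
    $_.Exception.InnerException | Format-List -Force *
}
```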

We would also see a similar issue if we attempted to read all bytes from a file larger than 2 GB:

Exception calling "ReadAllBytes" with "1" argument(s): "The file is too long.
This operation is currently limited to supporting files less than 2 gigabytes in size."

Unsurprisingly, we see the same limitation with System.Array (a MemoryStream is backed by a byte array):

.NET Framework only: By default, the maximum size of an Array is 2 gigabytes (GB).


There is also another limitation, pointed out in this question: Compress-Archive can't compress a file if another process has a handle on it.

How to reproduce?

# create a temporary folder and start a Job
# which keeps an open handle on a file in it
New-Item .\temp -ItemType Directory -Force | Out-Null
$job = Start-Job {
    0..1000 | ForEach-Object {
        "Iteration ${_}:" + ('A' * 1kb)
        Start-Sleep -Milliseconds 200
    } | Set-Content (Join-Path $using:PWD 'temp\test.txt')
}

Start-Sleep -Seconds 1
# attempt to compress
Compress-Archive .\temp\test.txt -DestinationPath test.zip
# Exception:
# The process cannot access the file '..\test.txt' because it is being used by another process.
$job | Stop-Job -PassThru | Remove-Job
Remove-Item .\temp -Recurse

To overcome this issue, and also to emulate File Explorer's behavior when compressing files that are in use by another process, the function posted below defaults to [FileShare] 'ReadWrite, Delete' when opening a FileStream.
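As an illustration of that sharing mode (the path is hypothetical), opening the source stream like this succeeds even while another process holds a writable handle on the file:

```powershell
# open the file for reading while permitting other processes
# to keep reading, writing, or deleting it concurrently
$sourcefs = [System.IO.File]::Open(
    'C:\temp\test.txt',
    [System.IO.FileMode]::Open,
    [System.IO.FileAccess]::Read,
    [System.IO.FileShare] 'ReadWrite, Delete')
try {
    # stream the bytes wherever needed, e.g. into a zip entry
    $sourcefs.Length
}
finally {
    $sourcefs.Dispose()
}
```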


To get around this issue there are two workarounds:

  • The easy workaround is to use the ZipFile.CreateFromDirectory method. There are 3 limitations to this static method:
    1. The source must be a directory; a single file cannot be compressed.
    2. All files in the source folder (recursively) will be compressed; we can't pick / filter the files to compress.
    3. It's not possible to update the entries of an existing zip archive.

Worth noting: if you need to use the ZipFile class in Windows PowerShell (.NET Framework), there must be a reference to System.IO.Compression.FileSystem. See the inline comments.

# Only needed if using Windows PowerShell (.NET Framework):
Add-Type -AssemblyName System.IO.Compression.FileSystem

[IO.Compression.ZipFile]::CreateFromDirectory($sourceDirectory, $destinationArchive)
  • The code-it-yourself workaround is to use a function which does all the manual work of creating the ZipArchive and the corresponding ZipEntries.

This function should be able to handle compression the same way the ZipFile.CreateFromDirectory method does, but it also allows filtering the files and folders to compress while keeping the file / folder structure untouched.

Documentation as well as usage example can be found here.

using namespace System.IO
using namespace System.IO.Compression
using namespace System.Collections.Generic

Add-Type -AssemblyName System.IO.Compression

function Compress-ZipArchive {
    [CmdletBinding(DefaultParameterSetName = 'Path')]
    [Alias('zip', 'ziparchive')]
    param(
        [Parameter(ParameterSetName = 'PathWithUpdate', Mandatory, Position = 0, ValueFromPipeline)]
        [Parameter(ParameterSetName = 'PathWithForce', Mandatory, Position = 0, ValueFromPipeline)]
        [Parameter(ParameterSetName = 'Path', Mandatory, Position = 0, ValueFromPipeline)]
        [string[]] $Path,

        [Parameter(ParameterSetName = 'LiteralPathWithUpdate', Mandatory, ValueFromPipelineByPropertyName)]
        [Parameter(ParameterSetName = 'LiteralPathWithForce', Mandatory, ValueFromPipelineByPropertyName)]
        [Parameter(ParameterSetName = 'LiteralPath', Mandatory, ValueFromPipelineByPropertyName)]
        [Alias('PSPath')]
        [string[]] $LiteralPath,

        [Parameter(Position = 1, Mandatory)]
        [string] $DestinationPath,

        [Parameter()]
        [CompressionLevel] $CompressionLevel = [CompressionLevel]::Optimal,

        [Parameter(ParameterSetName = 'PathWithUpdate', Mandatory)]
        [Parameter(ParameterSetName = 'LiteralPathWithUpdate', Mandatory)]
        [switch] $Update,

        [Parameter(ParameterSetName = 'PathWithForce', Mandatory)]
        [Parameter(ParameterSetName = 'LiteralPathWithForce', Mandatory)]
        [switch] $Force,

        [Parameter()]
        [switch] $PassThru
    )

    begin {
        $DestinationPath = $PSCmdlet.GetUnresolvedProviderPathFromPSPath($DestinationPath)
        if([Path]::GetExtension($DestinationPath) -ne '.zip') {
            $DestinationPath = $DestinationPath + '.zip'
        }

        if($Force.IsPresent) {
            $fsMode = [FileMode]::Create
        }
        elseif($Update.IsPresent) {
            $fsMode = [FileMode]::OpenOrCreate
        }
        else {
            $fsMode = [FileMode]::CreateNew
        }

        $ExpectingInput = $null
    }
    process {
        $isLiteral  = $false
        $targetPath = $Path

        if($PSBoundParameters.ContainsKey('LiteralPath')) {
            $isLiteral  = $true
            $targetPath = $LiteralPath
        }

        if(-not $ExpectingInput) {
            try {
                $destfs = [File]::Open($DestinationPath, $fsMode)
                $zip    = [ZipArchive]::new($destfs, [ZipArchiveMode]::Update)
                $ExpectingInput = $true
            }
            catch {
                $zip, $destfs | ForEach-Object Dispose
                $PSCmdlet.ThrowTerminatingError($_)
            }
        }

        $queue = [Queue[FileSystemInfo]]::new()

        foreach($item in $ExecutionContext.InvokeProvider.Item.Get($targetPath, $true, $isLiteral)) {
            $queue.Enqueue($item)

            $here = $item.Parent.FullName
            if($item -is [FileInfo]) {
                $here = $item.Directory.FullName
            }

            while($queue.Count) {
                try {
                    $current = $queue.Dequeue()
                    if($current -is [DirectoryInfo]) {
                        $current = $current.EnumerateFileSystemInfos()
                    }
                }
                catch {
                    $PSCmdlet.WriteError($_)
                    continue
                }

                foreach($item in $current) {
                    try {
                        if($item.FullName -eq $DestinationPath) {
                            continue
                        }

                        $relative = $item.FullName.Substring($here.Length + 1)
                        $entry    = $zip.GetEntry($relative)

                        if($item -is [DirectoryInfo]) {
                            $queue.Enqueue($item)
                            if(-not $entry) {
                                $entry = $zip.CreateEntry($relative + '\', $CompressionLevel)
                            }
                            continue
                        }

                        if(-not $entry) {
                            $entry = $zip.CreateEntry($relative, $CompressionLevel)
                        }

                        $sourcefs = $item.Open([FileMode]::Open, [FileAccess]::Read, [FileShare] 'ReadWrite, Delete')
                        $entryfs  = $entry.Open()
                        $sourcefs.CopyTo($entryfs)
                    }
                    catch {
                        $PSCmdlet.WriteError($_)
                    }
                    finally {
                        $entryfs, $sourcefs | ForEach-Object Dispose
                    }
                }
            }
        }
    }
    end {
        $zip, $destfs | ForEach-Object Dispose

        if($PassThru.IsPresent) {
            $DestinationPath -as [FileInfo]
        }
    }
}
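As a quick usage sketch (the paths and filter are hypothetical), the function can compress a filtered set of files, and the resulting archive can be expanded on the target device with the ZipFile.ExtractToDirectory method:

```powershell
# compress only the .pst files found under a hypothetical source folder
(Get-ChildItem C:\MailStore -Recurse -Filter *.pst).FullName |
    Compress-ZipArchive -DestinationPath C:\Backups\mail.zip -Force

# on the destination device, extract the archive back out
Add-Type -AssemblyName System.IO.Compression.FileSystem   # Windows PowerShell only
[System.IO.Compression.ZipFile]::ExtractToDirectory('C:\Backups\mail.zip', 'C:\Restore')
```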
Barranca answered 14/6, 2022 at 3:18 Comment(2)
FYI: MS just updated their code (currently in beta) to provide an x64 implementation which no longer has the 2 GB limit: github.com/PowerShell/Microsoft.PowerShell.Archive/issues/19 – Maines
@Maines that's good to know, it was about time already. If you want, you can edit my answer, adding your comment to it. – Barranca

© 2022 - 2024 — McMap. All rights reserved.