GZipStream on large data

I am attempting to compress a large amount of data, sometimes in the region of 100 GB. When I run the routine I have written, the output file appears to be exactly the same size as the input. Has anyone else had this issue with GZipStream?

My code is as follows:

        byte[] buffer = BitConverter.GetBytes(StreamSize);
        FileStream LocalUnCompressedFS = File.OpenWrite(ldiFileName);
        LocalUnCompressedFS.Write(buffer, 0, buffer.Length);
        GZipStream LocalFS = new GZipStream(LocalUnCompressedFS, CompressionMode.Compress);
        buffer = new byte[WriteBlock];
        UInt64 WrittenBytes = 0;
        while (WrittenBytes + WriteBlock < StreamSize)
        {
            fromStream.Read(buffer, 0, (int)WriteBlock);
            LocalFS.Write(buffer, 0, (int)WriteBlock);
            WrittenBytes += WriteBlock;
            OnLDIFileProgress(WrittenBytes, StreamSize);
            if (Cancel)
                break;
        }
        if (!Cancel)
        {
            double bytesleft = StreamSize - WrittenBytes;
            fromStream.Read(buffer, 0, (int)bytesleft);
            LocalFS.Write(buffer, 0, (int)bytesleft);
            WrittenBytes += (uint)bytesleft;
            OnLDIFileProgress(WrittenBytes, StreamSize);
        }
        LocalFS.Close();
        fromStream.Close();

StreamSize is an 8-byte UInt64 value that holds the size of the file. I write these 8 bytes raw to the start of the file so I know the original file size. WriteBlock has the value 32 KB (32768 bytes). fromStream is the stream to take data from, in this instance a FileStream. Are the 8 bytes in front of the compressed data going to cause an issue?
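To make the layout concrete, here is a minimal in-memory sketch of the format described above: an 8-byte length prefix followed by the gzip payload, written and then read back. The class and method names (LengthPrefixedGzip, Pack, Unpack) are made up for illustration and this is not the original routine. It assumes the prefix is written directly to the underlying stream before the GZipStream wraps it, and it uses the return value of Read to decide how many bytes to write.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, for illustration only.
static class LengthPrefixedGzip
{
    // Writes an 8-byte original length, then the gzip-compressed payload.
    public static void Pack(Stream source, ulong length, Stream destination)
    {
        byte[] header = BitConverter.GetBytes(length);
        destination.Write(header, 0, header.Length);

        // leaveOpen: true, so the caller keeps ownership of destination.
        using (var gzip = new GZipStream(destination, CompressionMode.Compress, true))
        {
            byte[] buffer = new byte[32768];
            ulong remaining = length;
            while (remaining > 0)
            {
                int want = (int)Math.Min((ulong)buffer.Length, remaining);
                int got = source.Read(buffer, 0, want);
                if (got == 0) break;          // source ended earlier than expected
                gzip.Write(buffer, 0, got);   // write only what was actually read
                remaining -= (ulong)got;
            }
        }
    }

    // Reads the 8-byte length, then decompresses everything after it.
    public static byte[] Unpack(Stream packed, out ulong originalLength)
    {
        byte[] header = new byte[8];
        int read = 0;
        while (read < header.Length)
        {
            int n = packed.Read(header, read, header.Length - read);
            if (n == 0) throw new EndOfStreamException("Length prefix is truncated.");
            read += n;
        }
        originalLength = BitConverter.ToUInt64(header, 0);

        using (var gzip = new GZipStream(packed, CompressionMode.Decompress, true))
        using (var result = new MemoryStream())
        {
            gzip.CopyTo(result);
            return result.ToArray();
        }
    }
}
```

Under these assumptions the prefix itself does not interfere with compression, since it never passes through the GZipStream. The resulting file is not a plain .gz file, though, so standard gzip tools would need the first 8 bytes skipped before they can read it.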

Bossuet answered 16/5, 2012 at 15:29 Comment(2)
Does your code work on smaller files? - Entirely
Can you confirm your code correctly compresses smaller datasets, for example a text file that you know normally compresses well? - Matchmaker
D
5

I ran a test using the following code for compression and it ran without issue on a 7GB and 12GB file (both known beforehand to compress "well"). Does this version work for you?

const string toCompress = @"input.file";
var buffer = new byte[1024*1024*64];

using(var compressing = new GZipStream(File.OpenWrite(@"output.gz"), CompressionMode.Compress))
using(var file = File.OpenRead(toCompress))
{
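    int bytesRead;
    // Read returns the number of bytes actually read; 0 signals end of file.
    while((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Write only the bytes that were read, not the whole buffer.
        compressing.Write(buffer, 0, bytesRead);
    }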
}

Have you checked out the documentation?

The GZipStream class cannot decompress data that results in over 8 GB of uncompressed data.

You probably need to find a different library that will support your needs or attempt to break your data up into <=8GB chunks that can safely be "sewn" back together.
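As a rough sketch of the chunking idea, the following splits the input into numbered .gz parts, each holding at most a fixed amount of uncompressed data. The names here (ChunkedGzip, CompressInChunks, outputPrefix, maxChunkBytes) are hypothetical, and the part naming scheme is just one possible choice. Each part is an independent gzip file, so the parts can be decompressed one at a time and concatenated in order.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, for illustration only.
static class ChunkedGzip
{
    // Compresses source into outputPrefix.000.gz, outputPrefix.001.gz, ...
    // Each part holds at most maxChunkBytes of uncompressed input.
    // Returns the number of parts written (0 for empty input).
    public static int CompressInChunks(Stream source, string outputPrefix, long maxChunkBytes)
    {
        if (maxChunkBytes <= 0) throw new ArgumentOutOfRangeException("maxChunkBytes");

        byte[] buffer = new byte[1024 * 1024];
        int part = 0;

        // Read the first block before creating a part, so no empty part is written.
        int got = source.Read(buffer, 0, (int)Math.Min(buffer.Length, maxChunkBytes));
        while (got > 0)
        {
            string partPath = outputPrefix + "." + part.ToString("D3") + ".gz";
            long inPart = 0;

            // Disposing the GZipStream also closes the underlying file.
            using (var gzip = new GZipStream(File.Create(partPath), CompressionMode.Compress))
            {
                while (got > 0)
                {
                    gzip.Write(buffer, 0, got);
                    inPart += got;

                    long room = maxChunkBytes - inPart;
                    if (room == 0) break; // this part is full

                    got = source.Read(buffer, 0, (int)Math.Min(buffer.Length, room));
                }
            }
            part++;

            // If the part filled up, fetch the first block of the next part.
            if (got > 0)
                got = source.Read(buffer, 0, (int)Math.Min(buffer.Length, maxChunkBytes));
        }
        return part;
    }
}
```

A call such as ChunkedGzip.CompressInChunks(input, @"output", 6L * 1024 * 1024 * 1024) would keep every part under the documented 8 GB decompression limit. The 6 GB figure is only an example value.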

Drud answered 16/5, 2012 at 15:34 Comment(7)
Hi Austin, thanks for the answer. My program will not be decompressing, so I don't think this matters, unless there is an 8 GB limit on compression too. - Bossuet
Hmm... what if you need more than that? Are there other options available? It seems strange that a stream would have that sort of limitation. - Pick
That's talking about decompression; the OP is talking about compression. - Robbyrobbyn
@Skintkingle: How are you testing the validity of your code now? - Drud
Hi Austin, I have not confirmed that the compression works on smaller data sets, although I cannot see that code snippet not working. One second, let me set StreamSize to something small so it only takes a portion of the data, and see if the compressed size is smaller. - Bossuet
I will try your new solution tomorrow when I'm back at work. :) - Bossuet
Curiously, when I split the large data into 6 GB chunks and wrote them to file, compression happened. It looks like there may be a performance hit on compression as well as decompression. I will mark your answer as the solution, as it mentions an 8 GB limit as well. - Bossuet
I
1

Austin Salonen's code doesn't work for me (buggy, 4GB error).

Here's the proper way:

using System;
using System.Collections.Generic;
using System.Text;

namespace CompressFile
{
    class Program
    {


        static void Main(string[] args)
        {
            string FileToCompress = @"D:\Program Files (x86)\msvc\wkhtmltopdf64\bin\wkhtmltox64.dll";
            FileToCompress = @"D:\Program Files (x86)\msvc\wkhtmltopdf32\bin\wkhtmltox32.dll";
            string CompressedFile = System.IO.Path.Combine(
                 System.IO.Path.GetDirectoryName(FileToCompress)
                ,System.IO.Path.GetFileName(FileToCompress) + ".gz"
            );


            CompressFile(FileToCompress, CompressedFile);
            // CompressFile_AllInOne(FileToCompress, CompressedFile);

            Console.WriteLine(Environment.NewLine);
            Console.WriteLine(" --- Press any key to continue --- ");
            Console.ReadKey();
        } // End Sub Main


        public static void CompressFile(string FileToCompress, string CompressedFile)
        {
            //byte[] buffer = new byte[1024 * 1024 * 64];
            byte[] buffer = new byte[1024 * 1024]; // 1MB

            using (System.IO.FileStream sourceFile = System.IO.File.OpenRead(FileToCompress))
            {

                using (System.IO.FileStream destinationFile = System.IO.File.Create(CompressedFile))
                {

                    using (System.IO.Compression.GZipStream output = new System.IO.Compression.GZipStream(destinationFile,
                        System.IO.Compression.CompressionMode.Compress))
                    {
                        int readLength;
                        // Loop until Read reports end of file (0 bytes). An int running
                        // total compared against sourceFile.Length overflows past 2 GB.
                        while ((readLength = sourceFile.Read(buffer, 0, buffer.Length)) > 0)
                        {
                            output.Write(buffer, 0, readLength);
                        } // Whend

                        destinationFile.Flush();
                    } // End Using System.IO.Compression.GZipStream output

                    destinationFile.Close();
                } // End Using System.IO.FileStream destinationFile 

                // Close the files.
                sourceFile.Close();
            } // End Using System.IO.FileStream sourceFile

        } // End Sub CompressFile


        public static void CompressFile_AllInOne(string FileToCompress, string CompressedFile)
        {
            using (System.IO.FileStream sourceFile = System.IO.File.OpenRead(FileToCompress))
            {
                using (System.IO.FileStream destinationFile = System.IO.File.Create(CompressedFile))
                {

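                    // Loads the whole file into memory, so this only suits files well under 2 GB.
                    // Note: a single Read call is not guaranteed to fill the buffer.
                    byte[] buffer = new byte[sourceFile.Length];
                    sourceFile.Read(buffer, 0, buffer.Length);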

                    using (System.IO.Compression.GZipStream output = new System.IO.Compression.GZipStream(destinationFile,
                        System.IO.Compression.CompressionMode.Compress))
                    {
                        output.Write(buffer, 0, buffer.Length);
                        output.Flush();
                        destinationFile.Flush();
                    } // End Using System.IO.Compression.GZipStream output

                    // Close the files.        
                    destinationFile.Close();
                } // End Using System.IO.FileStream destinationFile 

                sourceFile.Close();
            } // End Using System.IO.FileStream sourceFile

        } // End Sub CompressFile_AllInOne


    } // End Class Program


} // End Namespace CompressFile
Intendancy answered 30/4, 2014 at 8:40 Comment(0)
