FileHelpers throws OutOfMemoryException when parsing large csv file
I'm trying to parse a very large csv file with FileHelpers (http://www.filehelpers.net/). The file is 1GB zipped and about 20GB unzipped.

        string fileName = @"c:\myfile.csv.gz";
        using (var fileStream = File.OpenRead(fileName))
        {
            using (GZipStream gzipStream = new GZipStream(fileStream, CompressionMode.Decompress, false))
            {
                using (TextReader textReader = new StreamReader(gzipStream))
                {
                    var engine = new FileHelperEngine<CSVItem>();
                    CSVItem[] items = engine.ReadStream(textReader);                        
                }
            }
        }

FileHelpers then throws an OutOfMemoryException.

Test failed: Exception of type 'System.OutOfMemoryException' was thrown.

    System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
       at System.Text.StringBuilder.ExpandByABlock(Int32 minBlockCharCount)
       at System.Text.StringBuilder.Append(Char value, Int32 repeatCount)
       at System.Text.StringBuilder.Append(Char value)
       at FileHelpers.StringHelper.ExtractQuotedString(LineInfo line, Char quoteChar, Boolean allowMultiline)
       at FileHelpers.DelimitedField.ExtractFieldString(LineInfo line)
       at FileHelpers.FieldBase.ExtractValue(LineInfo line)
       at FileHelpers.RecordInfo.StringToRecord(LineInfo line)
       at FileHelpers.FileHelperEngine`1.ReadStream(TextReader reader, Int32 maxRecords, DataTable dt)
       at FileHelpers.FileHelperEngine`1.ReadStream(TextReader reader)

Is it possible to parse a file this big with FileHelpers? If not can anyone recommend an approach to parsing files this big? Thanks.

Alika answered 5/3, 2013 at 20:29 Comment(4)
BowserKingKoopa My first question would be the obvious one: how much available disk space do you have when unzipping the file? If it's 20GB, I would double that to see if you have 40GB free. – Decalogue
Do you want to put ~20GB of data into RAM? Really? – Pickmeup
Shouldn't you be using a BinaryReader instead of a TextReader? Does FileHelpers handle the buffer size, or do you need to set that yourself as well? – Decalogue
In my opinion this is something you should handle with a database (so the actual data remains on disk)... why not, for example, create a SQLite DB and import the CSV into a table? – Pickmeup

You must work record by record, like this:

    string fileName = @"c:\myfile.csv.gz";
    using (var fileStream = File.OpenRead(fileName))
    {
        using (GZipStream gzipStream = new GZipStream(fileStream, CompressionMode.Decompress, false))
        {
            using (TextReader textReader = new StreamReader(gzipStream))
            {
                var engine = new FileHelperAsyncEngine<CSVItem>();
                using (engine.BeginReadStream(textReader))
                {
                    foreach (var record in engine)
                    {
                        // Work with each item
                    }
                }
            }
        }
    }

If you use this async approach, you will only hold one record in memory at a time, and it will be much faster.

Scion answered 5/3, 2013 at 21:52 Comment(1)
Thanks! FileHelperAsyncEngine is just what I was looking for. – Alika

This isn't a complete answer, but if you have a 20GB CSV file, you'll need 20GB+ of RAM to hold the whole thing in memory at once, unless your reader keeps everything compressed in memory (unlikely). You need to read the file in chunks; the approach you're using, reading everything into a single array, will not work unless you have huge amounts of RAM.

You need a loop a bit more like this:

    CsvReader reader = new CsvReader(filePath);
    CSVItem item = reader.ReadNextItem();
    while (item != null)
    {
        DoWhatINeedWithCsvRow(item);
        item = reader.ReadNextItem();
    }

The garbage collector will then be smart enough to reclaim the old CSVItems as you go through them, provided you don't keep references to them hanging around.

A better version would read a chunk from the CSV (e.g. 10,000 rows), deal with all of those, then get another chunk, or create a task for DoWhatINeedWithCsvRow if you don't care about processing order.
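As a rough sketch of that chunked variant, combined with the FileHelperAsyncEngine from the accepted answer (CSVItem is the asker's record class, ProcessBatch is a hypothetical per-chunk handler, and the batch size of 10,000 is an arbitrary choice):

    // Sketch only: streams records one at a time and hands them off
    // in batches, so at most one batch is held in memory at once.
    const int batchSize = 10000;
    var engine = new FileHelperAsyncEngine<CSVItem>();
    var batch = new List<CSVItem>(batchSize);

    using (engine.BeginReadStream(textReader))
    {
        foreach (var record in engine)
        {
            batch.Add(record);
            if (batch.Count == batchSize)
            {
                ProcessBatch(batch); // hypothetical handler for one chunk
                batch.Clear();
            }
        }
    }
    if (batch.Count > 0)
        ProcessBatch(batch);         // flush the final partial chunk

If processing order doesn't matter, each ProcessBatch call could instead be wrapped in Task.Run to overlap parsing with processing, at the cost of copying the batch before clearing it.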

Bechuanaland answered 5/3, 2013 at 20:45 Comment(0)
