Handling very large text files using TStreamReader and TStringList

I am using Embarcadero's RAD Studio Delphi 10.2.3 and have encountered a memory issue while reading in very large text files (7 million+ lines, every line is different, lines can be 1 to ~200 characters long, etc.). I am fairly new to Delphi programming, so I have scoured SO and Google for help before posting.

I originally used a TStringList and read the file with its LoadFromFile method, but this failed spectacularly once the text files being processed became large enough. I then switched to a TStreamReader and used ReadLine to populate the TStringList, following the basic code found here:

TStringList.LoadFromFile - Exceptions with Large Text Files

Code Example:

//MyStringList.LoadFromFile(filename);
Reader := TStreamReader.Create(filename, true);
try
  MyStringList.BeginUpdate;
  try
    MyStringList.Clear;
    while not Reader.EndOfStream do
      MyStringList.Add(Reader.ReadLine);
  finally
    MyStringList.EndUpdate;
  end;
finally
  Reader.Free;
end;

This worked great until the files I needed to process became huge (~7 million lines or more). It appears that the TStringList grows so large that the process runs out of memory. I say "appears" because I don't actually have access to the file that is being run, and all error information is provided by my customer through email, which makes this problem even more difficult as I can't simply debug it in the IDE.

The code is compiled 32-bit and I am unable to use the 64-bit compiler. I can't include a database system or the like, either. Unfortunately, I have some tight restrictions. I need to load in every line to look for patterns and compare those lines to other lines to look for "patterns within patterns." I apologize for being very vague here.

The bottom line is this: is there a way to access every line in the text file without using a TStringList, or perhaps a better way to manage the TStringList's memory?

Maybe there is a way to load a specific block of lines from the StreamReader into the TStringList (e.g., read in and process the first 100,000 lines, then the next 100,000 lines, etc.) instead of everything at once? I think I could then write something to handle the possible "inter-block" patterns.

Thanks in advance for any and all help and suggestions!

***** EDITED WITH UPDATE *****

Ok, here is the basic solution that I need to implement:

var
  filename: string;
  sr: TStreamReader;
  sl: TStringList;
  total, blocksize: integer;
begin
  filename := 'thefilenamegoeshere';
  total := 0;         // Total number of lines read from the file so far
  blocksize := 10000; // The number of lines per "block"
  sl := TStringList.Create;
  try
    sr := TStreamReader.Create(filename, true);
    try
      sl.BeginUpdate;
      try
        while not sr.EndOfStream do
          begin
            sl.Clear;
            sl.Capacity := blocksize; // Clear resets Capacity, so reserve room for one block
            while (sl.Count < blocksize) and not sr.EndOfStream do
              begin
                sl.Add(sr.ReadLine);
                total := total + 1;
              end;
            // Handle the current block of lines here
          end;
      finally
        sl.EndUpdate;
      end;
    finally
      sr.Free;
    end;
  finally
    sl.Free;
  end;
end;

I have some test code that I will use to refine my routines, but this seems to be relatively fast, efficient, and sufficient. I want to thank everyone for their responses that got my gray matter firing!

Exist asked 17/10/2018 at 14:04. Comments (12):
Pah: Perhaps try IMAGE_FILE_LARGE_ADDRESS_AWARE? docwiki.embarcadero.com/RADStudio/Tokyo/en/…
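For reference, that flag is enabled with a compiler directive in the project (.dpr) file. A minimal sketch, with a hypothetical project name:

program MyLargeFileTool;  // hypothetical project name; the directive belongs in the .dpr

uses
  Winapi.Windows;  // declares the IMAGE_FILE_LARGE_ADDRESS_AWARE constant

// Mark the 32-bit EXE as large-address-aware so it can use up to 4 GB of
// address space when run on 64-bit Windows.
{$SETPEFLAGS IMAGE_FILE_LARGE_ADDRESS_AWARE}

begin
  // ... application startup code ...
end.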
Polyphyletic: If you don't want to make use of a 64-bit address space, then you'll need to redesign your code to avoid having to load the entire file into memory. It's that simple. Exactly how you do that will depend on the details that you have, but we do not. But it's hard to see past the fact that if the data can't fit in memory at once, then you need to avoid trying to fit it into memory at once.
Exist: @VilleKrumlinde - Thanks for the suggestion. I have set the LARGE_ADDRESS flag in my software, though I don't know if the customer has done everything on their end. They are running Windows 10 Pro 64-bit, though even that is sometimes through an emulator on a Linux box (depending on the particular user). And the users don't have Admin privileges, making all this even more complicated.
Exist: @DavidHeffernan - Yes, that's the true, common-sense answer! Since I can't avoid the memory limit at the moment, I have to come up with a workaround, thus my question(s) above. Is there a way to load blocks of the text file from the StreamReader (say, lines 1 - 100,000, then lines 100,001 - 200,000, etc.) directly? I didn't see any way to specify which line to start reading from. Thanks, David! I also used some code you had posted about executing external processes from the command line (WaitUntilSignaled and ExecuteProcess). Invaluable to me...thank you!!!
Polyphyletic: You just read the first N lines, deal with them. Then read the next N lines, deal with them, and so on. You can use a variable to count how many lines you have read.
Exist: @DavidHeffernan - Ha, I had my epiphany while you were replying. The stream knows where it left off after each ReadLine. I just need to come up with a modified loop routine that reads the file in blocks (outer loop) while ensuring that it stops at the EOF (inner loop), I think. After each block of lines is read in, I need to process it and store anything that might relate to inter-block patterns. Then I can clear the TStringList and start over. Thank you!!!
Lumper: @RicCrooks What processing do you do on these text lines? Perhaps you don't even need to use a string list at all.
Dilatory: @RicCrooks As a small point, your solution screams for an anonymous method to be inserted where you wrote "// Handle the current block of lines here". Keep your methods cohesive: one method should deal with breaking a large file into blocks according to some scheme, and the other should deal with processing the block. It will result in more maintainable code, and it will also be easier to swap in a new file-processing methodology in the future.
Exist: @Lumper - The file structure is a series of headers, followed by one or more "blocks" of text that include sub-headers and data. I have to read in each block sequentially, examine its sub-headers for key items or values, determine whether it is important based on that examination, and then either store it or discard it. The ones that are stored then have certain strings extracted, manipulated, and returned to the StringList. Finally, I add additional information to each line (simply appending more to each StringList record) before writing those blocks back out.
Exist: @DaveNovo - I thought about that after I wrote it. My first pass was a large "sub-process" within the method, though I will likely extract and separate everything into two distinct methods once the rewrite passes its initial tests. You are absolutely right! Thank you for keeping me straight!
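A minimal sketch of that split, assuming a hypothetical ProcessFileInBlocks helper that takes the per-block processing as an anonymous method (all names here are illustrative, not part of any library):

uses
  System.Classes, System.SysUtils;

// Hypothetical helper: reads the file in blocks of BlockSize lines and
// hands each block to the supplied callback for processing.
procedure ProcessFileInBlocks(const FileName: string; BlockSize: Integer;
  const HandleBlock: TProc<TStringList>);
var
  sr: TStreamReader;
  sl: TStringList;
begin
  sl := TStringList.Create;
  try
    sr := TStreamReader.Create(FileName, True);
    try
      while not sr.EndOfStream do
      begin
        sl.Clear;
        while (sl.Count < BlockSize) and not sr.EndOfStream do
          sl.Add(sr.ReadLine);
        HandleBlock(sl);  // process one block at a time
      end;
    finally
      sr.Free;
    end;
  finally
    sl.Free;
  end;
end;

// Usage: the block handler stays a separate, cohesive piece of code.
// ProcessFileInBlocks('thefilenamegoeshere', 10000,
//   procedure(Block: TStringList)
//   begin
//     // examine sub-headers, extract strings, etc.
//   end);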
Exist: @All - I just wanted to pass along another update. The methodology above has successfully handled the larger files that caused the previous memory errors. It hasn't been tested against the huge files yet (10+ GB), but the 1-2 GB files were read in and processed quickly. Thank you all again for your help!
Lumper: @RicCrooks Adjusting your code to read one block of data at a time is the right course of action, since your memory requirement for the StringList now drops to the size of an individual block. But I'm still wondering whether you need to use a StringList at all. Maybe you could write your code so that it does all the needed processing while reading data one line at a time directly from the file. But this depends on how you process each main header block.
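If the keep-or-discard decisions really can be made as the data is read, the TStringList can be dropped entirely. A minimal sketch of that line-at-a-time approach, with hypothetical IsInteresting and Transform routines standing in for the real sub-header checks and string edits:

uses
  System.Classes, System.SysUtils;

// Hypothetical stand-ins for the real sub-header checks and string edits.
function IsInteresting(const Line: string): Boolean;
begin
  Result := Line.StartsWith('HDR');  // placeholder test
end;

function Transform(const Line: string): string;
begin
  Result := Line + ';processed';     // placeholder edit
end;

procedure ProcessFileLineByLine(const InFile, OutFile: string);
var
  sr: TStreamReader;
  sw: TStreamWriter;
  line: string;
begin
  sr := TStreamReader.Create(InFile, True);
  try
    sw := TStreamWriter.Create(OutFile, False, TEncoding.UTF8);
    try
      while not sr.EndOfStream do
      begin
        line := sr.ReadLine;
        // Only lines that pass the test are transformed and written out,
        // so memory use stays flat regardless of file size.
        if IsInteresting(line) then
          sw.WriteLine(Transform(line));
      end;
    finally
      sw.Free;
    end;
  finally
    sr.Free;
  end;
end;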

As a (very) quick fix, you can try TALStringList (just replace TStringList with TALStringList in your code) from https://github.com/Zeus64/alcinoe. It's not a very clean way to go, but TALStringList keeps its text in UTF-8, roughly halving the memory used compared to the default UTF-16 strings. Since you have about 7,000,000 lines of around 100 characters each, that means roughly 700 MB, which can work in a 32-bit process.

Pycnidium answered 17/10/2018 at 16:18. Comment (1):
Exist: Thanks for the suggestion! My concern is that this solution, as you pointed out, is a quick fix. I know that there are even larger files (likely ~20 million lines) coming down the pipe, so I will need to find a permanent solution relatively soon. I think I have a fix (edited my original post) that I plan on using. Thanks again, though!!!
