I am using Embarcadero's RAD Studio Delphi (10.2.3) and have encountered a memory issue while reading very large text files (7+ million lines; every line is different and can be 1 to ~200 characters long). I am fairly new at Delphi programming, so I have scoured SO and Google looking for help before posting.
I originally used a TStringList and read the file with its LoadFromFile method, but this failed spectacularly once the text files became large enough. I then switched to a TStreamReader and populated the TStringList with ReadLine, using the basic code found here:
TStringList.LoadFromFile - Exceptions with Large Text Files
Code Example:
//MyStringList.LoadFromFile(filename);
Reader := TStreamReader.Create(filename, True);
try
  MyStringList.BeginUpdate;
  try
    MyStringList.Clear;
    while not Reader.EndOfStream do
      MyStringList.Add(Reader.ReadLine);
  finally
    MyStringList.EndUpdate;
  end;
finally
  Reader.Free;
end;
This worked great until the files I needed to process became huge (~7 million lines+). It appears that the TStringList grows so large that the process runs out of memory. I say "appears" because I don't have access to the file being run; all error information comes from my customer through email, which makes the problem even harder since I can't simply debug it in the IDE.
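For a rough sense of why a 32-bit process fails here, a back-of-envelope estimate helps. The figures below are assumptions for illustration: an average line of ~100 characters, a 2-bytes-per-character UnicodeString, an approximate 12-byte string header, and ~8 bytes of TStringList item overhead per entry in a 32-bit build.

```pascal
// Rough estimate of the memory needed to hold the whole file in a
// TStringList (32-bit Delphi; the overhead constants are approximate).
const
  Lines        = 7000000;
  AvgChars     = 100; // assumed average line length
  StringHeader = 12;  // approx. UnicodeString header (codepage, elem size, refcount, length)
  ItemOverhead = 8;   // per item: string pointer + object pointer
var
  EstimateBytes: Int64;
begin
  EstimateBytes := Int64(Lines) * (AvgChars * 2 + StringHeader + ItemOverhead);
  Writeln(EstimateBytes div (1024 * 1024), ' MB');
end.
```

Under these assumptions the estimate lands around 1.4-1.5 GB, uncomfortably close to the 2 GB address space of a default 32-bit process, and heap fragmentation during millions of allocations can exhaust memory well before the theoretical limit.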
The code is compiled 32-bit and I am unable to use the 64-bit compiler. I can't include a database system or the like, either. Unfortunately, I have some tight restrictions. I need to load in every line to look for patterns and compare those lines to other lines to look for "patterns within patterns." I apologize for being very vague here.
The bottom line: is there a way to access every line in the text file without using a TStringList, or perhaps a better way to handle the TStringList's memory?
Maybe there is a way to load a specific block of lines from the StreamReader into the TStringList (e.g., read in and process the first 100,000 lines, then the next 100,000, etc.) instead of everything at once? I think I could then write something to handle the possible "inter-block" patterns.
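Another way to keep random access to every line without holding all the strings in memory is to index the file once, storing only the byte offset of each line, and then fetch individual lines on demand. The sketch below assumes a UTF-8 (or ANSI) file with LF or CRLF line endings; the function names are mine, not from any library. Seven million Int64 offsets cost about 56 MB, which fits comfortably in a 32-bit process.

```pascal
uses
  System.SysUtils, System.Classes, System.Generics.Collections;

// Scan the file once and record the byte offset where each line starts.
function BuildLineIndex(Stream: TStream): TList<Int64>;
var
  Buffer: TBytes;
  BytesRead, i: Integer;
  Pos: Int64;
begin
  Result := TList<Int64>.Create;
  Result.Add(0); // first line starts at offset 0
  SetLength(Buffer, 64 * 1024);
  Pos := 0;
  Stream.Position := 0;
  repeat
    BytesRead := Stream.Read(Buffer[0], Length(Buffer));
    for i := 0 to BytesRead - 1 do
      if Buffer[i] = 10 then        // LF: the next line starts right after it
        Result.Add(Pos + i + 1);
    Inc(Pos, BytesRead);
  until BytesRead = 0;
end;

// Fetch a single line by number, decoding only that slice of the file.
function ReadLineAt(Stream: TStream; Index: TList<Int64>; LineNo: Integer): string;
var
  StartPos, EndPos: Int64;
  Bytes: TBytes;
begin
  StartPos := Index[LineNo];
  if LineNo + 1 < Index.Count then
    EndPos := Index[LineNo + 1]
  else
    EndPos := Stream.Size;
  SetLength(Bytes, EndPos - StartPos);
  if Length(Bytes) > 0 then
  begin
    Stream.Position := StartPos;
    Stream.ReadBuffer(Bytes[0], Length(Bytes));
  end;
  Result := TEncoding.UTF8.GetString(Bytes).TrimRight([#13, #10]);
end;
```

With this approach the comparison passes can revisit arbitrary lines (including across block boundaries) at the cost of extra disk reads, which an OS file cache usually absorbs well for sequential-ish access patterns.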
Thanks in advance for any and all help and suggestions!
***** EDITED WITH UPDATE *****
Ok, here is the basic solution that I need to implement:
var
  filename: string;
  sr: TStreamReader;
  sl: TStringList;
  total, blocksize: integer;
begin
  filename := 'thefilenamegoeshere';
  total := 0;         // Total number of lines read so far
  blocksize := 10000; // The number of lines per "block"
  sl := TStringList.Create;
  try
    sr := TStreamReader.Create(filename, True);
    try
      sl.Capacity := blocksize; // reserve one block's worth, not the whole file
      sl.BeginUpdate;
      try
        while not sr.EndOfStream do
        begin
          sl.Clear;
          while sl.Count < blocksize do
          begin
            sl.Add(sr.ReadLine);
            Inc(total);
            if sr.EndOfStream then
              Break;
          end;
          // Handle the current block of lines here
        end;
      finally
        sl.EndUpdate;
      end;
    finally
      sr.Free;
    end;
  finally
    sl.Free;
  end;
end;
I have some test code that I will use to refine my routines, but this seems to be relatively fast, efficient, and sufficient. I want to thank everyone for their responses that got my gray matter firing!