I was triggered to ask this question when trying to support this question with an MCVE.
I recently started noticing that TClientDataSet quickly runs out of memory. I had an issue in production where it couldn't load a dataset with about 60.000, which seemed surprisingly low to me. The client dataset was connected through a provider with an ADODataSet, which loaded fine. I ran that query separately and outputted the result to CSV, which gave me a file of < 30MB.
So I made a small test, where I can load up to about 165K records in the client dataset, which has a string field with a size of 4000. The actual value of the field is only 3 characters, but that doesn't seem to matter for the result.
It looks like each record takes up at least those 4000 characters. 4000 x 2 bytes x 165K records = 1.3GB, so that starts closing in to the 32 bit memory limit. If I turn it into a memo field, I can easily add 5 million rows.
program ClientDataSetTest;
{$APPTYPE CONSOLE}
uses SysUtils, DB, DBClient;
var
c: TClientDataSet;
i: Integer;
begin
c := TClientDataSet.Create(nil);
c.FieldDefs.Add('Id', ftInteger);
c.FieldDefs.Add('Test', ftString, 4000); // Actually claims this much space...
//c.FieldDefs.Add('Test', ftMemo); // Way more space efficient (and not notably slower)
//c.FieldDefs.Add('Test', ftMemo, 1); // But specifying size doesn't have any effect.
c.CreateDataSet;
try
i := 0;
while i < 5000000 do
begin
c.Append;
c['Id'] := i;
c['Test'] := 'xyz';
c.Post;
if (i mod 1000) = 0 then
WriteLn(i, c['Test']);
Inc(i);
end;
except
on e: Exception do
begin
c.Cancel;
WriteLn('Error adding row', i);
Writeln(e.ClassName, ': ', e.Message);
end;
end;
c.SaveToFile('c:\temp\output.xml', dfXML);
Writeln('Press ''any'' key');
ReadLn;
end.
So the question(s) themselves are a bit broad, but I'd like to have a solution for this and be able to load larger data sets by using the string space a bit more efficient. The reason the field is large, is because they can contain an annotation. For most records those will be empty or short though, so it's a tremendous waste of space.
- Can TClientDataSet be configured in such a way that it handles this differently? I browsed its properties, but I can't find anything that seems related to this.
- Can it be solved by using a different field type? I though of ftMemo, but that has some other disadvantages, like the size not being used for truncation, and some display issues, like TDBGrid displaying it as (MEMO), instead of the actual value.
- Are there drop-in replacements for TClientDataSet that solve this? It's not just about the in-memory part, but also about the communication with ADO components through a TProvider, which is the main way I use it in this project, so not any memory dataset would do the trick.
For that last point, I happened to find this question, where hidden away in comments, vgLib is mentioned, but all I find about that is broken links, and I don't even know if it would solve this issue. Apparently the C++ code for MidasLib is available now, but since it's 1.5MB of obscure code, I thought it might be worth asking here before I dive into that. ;)
(MEMO)
displayed in the grid can be easily fixed using the field'sOnGetText
event. I use it all the time to display the first handful of characters in the grid, and open a form with a memo control when the row is double-clicked to display the full content. The truncation can be handled inOnBeforePost
. – UncinateTO_CLOB
in the query to fit it into a memo field, makes the (Oracle 11g) query almost 10 times as slow, making it take minutes to open! Maybe not a dead end, but at least mortally wounded. :p – Toast