TClientDataSet uses too much memory for string fields

I was prompted to ask this question while trying to put together an MCVE for another question.

I recently started noticing that TClientDataSet quickly runs out of memory. I had an issue in production where it couldn't load a dataset of about 60,000 records, which seemed surprisingly low to me. The client dataset was connected through a provider to an ADODataSet, which loaded fine. I ran that query separately and output the result to CSV, which gave me a file of less than 30 MB.

So I made a small test, in which I can load up to about 165K records into a client dataset that has a string field with a size of 4000. The actual value of the field is only 3 characters, but that doesn't seem to matter for the result.

It looks like each record takes up at least those 4000 characters: 4000 characters x 2 bytes x 165K records = 1.3 GB, so that starts closing in on the 32-bit memory limit. If I turn it into a memo field, I can easily add 5 million rows.

program ClientDataSetTest;
{$APPTYPE CONSOLE}
uses SysUtils, DB, DBClient;

var
  c: TClientDataSet;
  i: Integer;
begin
  c := TClientDataSet.Create(nil);
  try
    c.FieldDefs.Add('Id', ftInteger);
    c.FieldDefs.Add('Test', ftString, 4000); // Actually claims this much space...
    //c.FieldDefs.Add('Test', ftMemo); // Way more space efficient (and not notably slower)
    //c.FieldDefs.Add('Test', ftMemo, 1); // But specifying size doesn't have any effect.
    c.CreateDataSet;

    try
      i := 0;
      while i < 5000000 do
      begin
        c.Append;
        c['Id'] := i;
        c['Test'] := 'xyz';
        c.Post;

        if (i mod 1000) = 0 then
          WriteLn(i, ' ', string(c['Test'])); // Cast the Variant; WriteLn can't take one directly

        Inc(i);
      end;
    except
      on e: Exception do
      begin
        c.Cancel;
        WriteLn('Error adding row ', i);
        WriteLn(e.ClassName, ': ', e.Message);
      end;
    end;

    c.SaveToFile('c:\temp\output.xml', dfXML);
    WriteLn('Press ''any'' key');
    ReadLn;
  finally
    c.Free;
  end;
end.

So the question(s) themselves are a bit broad, but I'd like to have a solution for this and be able to load larger datasets by using the string space more efficiently. The reason the field is large is that it can contain an annotation. For most records it will be empty or short, though, so it's a tremendous waste of space.

  • Can TClientDataSet be configured in such a way that it handles this differently? I browsed its properties, but I can't find anything that seems related to this.
  • Can it be solved by using a different field type? I thought of ftMemo, but that has some other disadvantages, like the size not being used for truncation, and some display issues, like TDBGrid showing (MEMO) instead of the actual value.
  • Are there drop-in replacements for TClientDataSet that solve this? It's not just about the in-memory part, but also about the communication with ADO components through a TProvider, which is the main way I use it in this project, so not just any in-memory dataset would do the trick.

For that last point, I happened to find this question, where hidden away in comments, vgLib is mentioned, but all I find about that is broken links, and I don't even know if it would solve this issue. Apparently the C++ code for MidasLib is available now, but since it's 1.5MB of obscure code, I thought it might be worth asking here before I dive into that. ;)

Toast answered 27/3, 2019 at 11:9 Comment(6)
FWIW, the (MEMO) displayed in the grid can be easily fixed using the field's OnGetText event. I use it all the time to display the first handful of characters in the grid, and open a form with a memo control when the row is double-clicked to display the full content. The truncation can be handled in OnBeforePost.Uncinate
@KenWhite Thanks for the suggestions! That is definitely an option for specific cases if a more generic solution isn't available. It does mean that the field type in the ADO dataset has to be memo too, otherwise that gives me errors. And that means I have to return it as a clob from my query, otherwise I get an error from the ADO Dataset. I'd have to figure out what the impact is of that, but it's no trivial change.Toast
I guess I should have said addressed instead of easily fixed. :-)Uncinate
Aiii, just adding TO_CLOB in the query to fit it into a memo field, makes the (Oracle 11g) query almost 10 times as slow, making it take minutes to open! Maybe not a dead end, but at least mortally wounded. :pToast
Yeah, minutes would be an issue. Didn't realize Oracle was involved. Just out of curiosity, are the Direct Oracle Access (DOA) components still around? They were vastly better than ADO or Delphi's own drivers when working with Delphi and Oracle. Just checked - they are, at AllRoundAutomationsUncinate
They are around, and I have used them for specific situations. But my project is 1 million lines and thousands of ADO components, so it's no easy fix to replace them all. Some of it is tricky too, because then I have two connections, and I have to be careful not to mix those in one transaction. But I ran the query in PL/SQL Developer as well (also by AllroundAutomations, probably using their DAC) and there the query is a lot slower too, although seemingly faster than in Delphi... Sorry I left out all that context, but it felt like noise in the original question.Toast

There is a difference between the way that the blob fields (memo) and regular fields store and retrieve their data. Blob fields don't store data in the record buffer (see TBlobField.GetDataSize) and they use a different set of methods when storing or retrieving that data.

The size each field occupies within the record buffer is returned by the call to TField.GetDataSize. For TStringField, this is the declared string size + 1.

TCustomClientDataSet.InitBufferPointers uses this as part of its calculation of FRecBufSize, which is used as the memory size to allocate for each record in TCustomClientDataSet.AllocRecordBuffer.
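You can check the per-field cost directly with a minimal console sketch (the exact values printed depend on the Delphi version and on whether ftString maps to an Ansi or a wide string field, so no outputs are claimed here):

```pascal
program DataSizeSketch;
{$APPTYPE CONSOLE}
uses SysUtils, DB, DBClient;
var
  cds: TClientDataSet;
begin
  cds := TClientDataSet.Create(nil);
  try
    cds.FieldDefs.Add('S', ftString, 4000);
    cds.FieldDefs.Add('M', ftMemo);
    cds.CreateDataSet;
    // DataSize is the number of bytes the field claims in every
    // record buffer, regardless of the value actually stored.
    WriteLn('ftString(4000) DataSize: ', cds.FieldByName('S').DataSize);
    WriteLn('ftMemo         DataSize: ', cds.FieldByName('M').DataSize);
  finally
    cds.Free;
  end;
end.
```

The string field reports its full declared size, while the memo field's data lives outside the record buffer, which is the whole difference described above.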

So, to answer your questions:

  • TClientDataSet can't be configured to do this any differently.
  • It can be solved by other field types, but they would have to descend from TBlobField. The buffer size is allocated up front, so regular fields can't occupy different amounts of space depending on their contents.
  • I am not sure about drop-in replacements. DevExpress has TdxMemData, but I don't know whether it runs into the same problem or whether it is a drop-in replacement.
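For the display side of the memo route, a hedged sketch of the OnGetText approach mentioned in the comments on the question (the form, field, and handler names are illustrative; only the event signature comes from the DB unit):

```pascal
// Attach to the memo field once the dataset is open, e.g.:
//   cds.FieldByName('Test').OnGetText := Form1.MemoGetText;
// The grid then shows the memo's content instead of "(MEMO)".
procedure TForm1.MemoGetText(Sender: TField; var Text: string;
  DisplayText: Boolean);
begin
  // AsString fetches the blob content for the current record;
  // truncate it so long annotations don't flood the grid.
  Text := Copy(Sender.AsString, 1, 100);
end;
```

A matching OnSetText or BeforePost handler could enforce the maximum length that ftString used to enforce via its Size.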
Vandiver answered 28/3, 2019 at 1:46 Comment(1)
Clear, thanks for the insight. It's indeed the way it allocates. I was kind of hoping for a way to influence that, but you've made it clear that that's not an option with the client dataset as it is.Toast

Whenever I need a rather long "string" field in a CDS, I tend to create a memo field instead. Besides the aforementioned display issue (which can be addressed rather painlessly), there are a few other restrictions, so I have a custom CDS descendant. Hyperbase (not vgLib) uses the same internal string format, so it won't change anything in that regard. BTW, there are DACs (such as FireDAC) that allow you to customize the target field type mapping; not sure whether the ADO components could be patched/enhanced to achieve similar functionality, though. Moreover, IIRC the FireDAC dataset has an option to control the internal string field layout ("inline" in-row buffer, or just a pointer to a dynamically allocated one), but it isn't a 1:1 replacement for a CDS.
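The FireDAC mapping mentioned here could look roughly like this. A sketch under assumptions: the procedure name and the 1000-character threshold are made up, and the exact rule properties should be checked against FireDAC's data type mapping documentation.

```pascal
// Remap long wide-string columns to wide memo fields so their data
// is not inlined in every record buffer. Assumes FireDAC.Stan.Intf,
// FireDAC.Stan.Option and FireDAC.Comp.Client in the uses clause.
procedure MapLongStringsToMemo(Query: TFDQuery);
begin
  Query.FormatOptions.OwnMapRules := True; // use our rules, not inherited ones
  with Query.FormatOptions.MapRules.Add do
  begin
    SourceDataType := dtWideString;
    SizeMin := 1000;                 // only remap columns at least this long
    TargetDataType := dtWideMemo;
  end;
end;
```

Called before Open, this would make the annotation columns arrive as memo fields without touching the SQL (no TO_CLOB needed), though it only helps if switching the DAC is on the table at all.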

Huelva answered 28/3, 2019 at 8:0 Comment(2)
Interesting, especially the option to control the internal string field layout. I was hoping that was possible in TClientDataSet as well, but it seems it isn't. Indeed, if I can find a good way to map a char or varchar query result to a memo field, I can also make it a memo field in the client dataset. I think that is probably the best way forward.Toast
If replacing ADO with a different DAC isn't an option, check its internals to see whether it is possible to implement a field type mapping feature there.Huelva
