TStringList of objects taking up tons of memory in Delphi XE
I

10

6

I'm working on a simulation program.

One of the first things the program does is read in a huge file (28 MB, about 79,000 lines), parse each line (about 150 fields), create an object for it, and add it to a TStringList.

It also reads in another file, which adds more objects during the run. By the end, there are about 85,000 objects.

I was working with Delphi 2007, and the program used a lot of memory, but it ran OK. I upgraded to Delphi XE, and migrated the program over and now it's using a LOT more memory, and it ends up running out of memory half way through the run.

In Delphi 2007, it would end up using 1.4 GB after reading in the initial file, which is obviously a huge amount, but in XE it ends up using almost 1.8 GB, which leads to running out of memory and getting the error.

So my question is

  1. Why is it using so much memory?
  2. Why is it using so much more memory in XE than 2007?
  3. What can I do about this? I can't change how big or long the file is, and I do need to create an object for each line and store it somewhere.

Thanks

Ingredient answered 25/8, 2011 at 16:4 Comment(9)
Are you sure the numbers aren't 1.4 and 2.8?Yugoslav
A UnicodeString is bigger than an AnsiString, and strings are the basis of the TStringList class.Priesthood
Until you can rely solely on 64-bit processes, you should redesign your app to use memory more frugally. Even at 1.4 GB you will be pushing the limits of the address space on a 32-bit system.Substantiate
@Andreas:if all his data were strings, yes, it would be around 2.8GB, but I assume that string data is 800MB (as opposed to 400MB in Ansi), and the rest (1GB) is occupied by his objects. As David Heffernan says, in one of the comments, both, 1.4GB and 1.8GB, are pushing the limits.Gaynellegayner
(Or, perhaps the data consists only of strings, and 1.8 is where the app stops. That is, if the app would run as expected, it would end up consuming 2.8.)Yugoslav
@Andreas: Could be. But even a class with 150 fields of Integers would still have an instance size of around 600 bytes. Take, say, 89,000 such items, and you already have 53+MB. So at least some of the fields will indeed be strings.Gaynellegayner
@David: well, then I guess we are really pushing the limit having 90+ GB models (200+ million instances) available in memory after a load of app. 6 hours... (and still serving 100's of users as well) You do the math, it can't be done, yet we do it. Don't ask me how, coz... I only know in broad terms. Got a couple of wiz colleagues for the details :-)) But we sure are not using TStringLists or TObjectLists as containers... Heck, we don't even have data in our objects, and we ignore/overwrite the Monitor Field that was so sneakily introduced in D2009. 64bit conversion is gonna be fun :-))Involucre
@marjan It doesn't matter how clever your colleagues are, they can't contravene the 4 GB address space limit.Substantiate
@david: no, but it seems like it sometimes...Involucre
D
6

It's hard to say why your 28 MB file is expanding to 1.4 GB worth of objects when you parse it, without seeing the code and the class declarations. Also, you say you're storing it in a TStringList instead of a TList or TObjectList. This sounds like you're using it as some sort of string->object key/value mapping. If so, you might want to look at the TDictionary class in the Generics.Collections unit in XE.

As for why you're using more memory in XE, it's because the string type changed from an ANSI string to a UTF-16 string in Delphi 2009, so every character now takes two bytes instead of one. If you don't need Unicode, you could use a TDictionary keyed on AnsiString instead of string to save space.

Also, to save even more memory, there's another trick you could use if you don't need all 79,000 of the objects right away: lazy loading. The idea goes something like this:

  • Read the file into a TStringList. (This will use about as much memory as the file size. Maybe twice as much if it gets converted into Unicode strings.) Don't create any data objects.
  • When you need a specific data object, call a routine that checks the string list and looks up the string key for that object.
  • Check if that string has an object associated with it. If not, create the object from the string and associate it with the string in the TStringList.
  • Return the object associated with the string.

This will keep both your memory usage and your load time down, but it's only helpful if you don't need all (or a large percentage) of the objects immediately after loading.
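
The steps above could be sketched roughly like this. It is only a minimal illustration: TPatient, RawLine and GetPatient are made-up names, the object is looked up by line index rather than by a string key, and the two Add calls stand in for loading the real file.

```pascal
program LazyLoadDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical data class; the real one would parse about 150 fields.
  TPatient = class
  public
    RawLine: string;
    constructor Create(const ALine: string);
  end;

constructor TPatient.Create(const ALine: string);
begin
  inherited Create;
  RawLine := ALine; // real code would split the line into typed fields here
end;

// Returns the object for line Index, creating it on first access.
function GetPatient(Lines: TStringList; Index: Integer): TPatient;
begin
  Result := TPatient(Lines.Objects[Index]);
  if Result = nil then
  begin
    Result := TPatient.Create(Lines[Index]);
    Lines.Objects[Index] := Result; // cache it next to its source line
  end;
end;

var
  Lines: TStringList;
  I: Integer;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('1001;Smith;O+');  // stands in for loading the real file
    Lines.Add('1002;Jones;AB-');

    Writeln(GetPatient(Lines, 1).RawLine);
    // A second call returns the cached instance instead of parsing again.
    Writeln(GetPatient(Lines, 1) = GetPatient(Lines, 1));

    // The list does not own its objects by default, so free them explicitly.
    for I := 0 to Lines.Count - 1 do
      Lines.Objects[I].Free;
  finally
    Lines.Free;
  end;
end.
```

Lines that are never requested never get an object, which is where the saving comes from.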

Diplostemonous answered 25/8, 2011 at 16:35 Comment(4)
Thanks for the reply. Unfortunately, I do need all the objects. The program is an organ allocator. I read in an initial waitlist of patients, and then I either read in a new patient, update a patient on the list, or try to allocate organs, and when I allocate I need to look through all the patients to see if they match (which requires having all the patient values)Ingredient
@KingOfKong: If you have specific criteria you're trying to match against like that, I'd suggest changing your architecture. What you're describing is a perfect match for a relational database and a SQL query. Doing it that way will dramatically decrease both your memory usage and the amount of time it takes to run a search.Diplostemonous
+1 Unless the requirement really is ultra time-critical, reassigning one single object on the fly during the search should work just fine. It basically is a choice between memory usage and duration.Ames
Keep in mind that TStringList requires twice the amount of memory to load (also in D2007): once to load the file in, once to split it into strings. The former is freed, the latter is kept.Briarwood
C
10

Just one idea which may save memory.

You could let the data stay on the original files, then just point to them from in-memory structures.

For instance, it's what we do for browsing big log files almost instantly: we memory-map the log file content, then we parse it quickly to create indexes of useful information in memory, then we read the content dynamically. No string is created during the reading, only pointers to the beginning of each line, with dynamic arrays containing the needed indexes. Calling TStringList.LoadFromFile would definitely be much slower and use more memory.

The code is here - see the TSynLogFile class. The trick is to read the file only once, and make all indexes on the fly.

For instance, here is how we retrieve a line of text from the UTF-8 file content:

function TMemoryMapText.GetString(aIndex: integer): string;
begin
  if (self=nil) or (cardinal(aIndex)>=cardinal(fCount)) then
    result := '' else
    result := UTF8DecodeToString(fLines[aIndex],GetLineSize(fLines[aIndex],fMapEnd));
end;

We use the exact same trick to parse JSON content. Such a mixed approach is also used by the fastest XML access libraries.

To handle your high-level data, and query it fast, you may try to use dynamic arrays of records, and our optimized TDynArray and TDynArrayHashed wrappers (in the same unit). Arrays of records will use less memory, will be faster to search because the data won't be fragmented (even faster if you use ordered indexes or hashes), and you'll be able to have high-level access to the content (you can define custom functions to retrieve the data from the memory-mapped file, for instance). Dynamic arrays aren't suited to fast deletion of items (or you'll have to use lookup tables), but you wrote that you are not deleting much data, so it won't be a problem in your case.

So you won't have any duplicated structure any more, only logic in RAM, and data on memory-mapped file(s) - I added a "s" here because the same logic could perfectly map to several source data files (you need some "merge" and "live refresh" AFAIK).

Caloric answered 25/8, 2011 at 17:47 Comment(0)
Y
3
  • In Delphi 2007 (and earlier), a string is an Ansi string, that is, every character occupies 1 byte of memory.

  • In Delphi 2009 (and later), a string is a Unicode string, that is, every character occupies 2 bytes of memory.

AFAIK, there is no way to make a Delphi 2009+ TStringList object use Ansi strings. Are you really using any of the features of the TStringList? If not, you could use an array of strings instead.

Then, naturally, you can choose between

type
  TAnsiStringArray = array of AnsiString;
  // or
  TUnicodeStringArray = array of string; // In Delphi 2009+, 
                                         // string = UnicodeString
Yugoslav answered 25/8, 2011 at 16:8 Comment(7)
The only thing I really want to do is store the objects and then have a way of deleting/adding them. I'm also using FindIdx to search through the list to find the objects to update/deleteIngredient
Would an array of strings use that much less memory? ISTM that his objects consume most of the memory. I don't know what the objects are for, but he should probably find a better way to process and store them, e.g. memory mapped files. But that would probably require quite a rewrite.Gaynellegayner
The problem might be the objects: they have about 60 fields, about 10 of which are lists. Most of the lists are small, but one in particular is 200 fields long.Ingredient
@Rudy, yes, memory mapped files is one of the techniques I'd recommend as well, though it is gonna be one heck of a rewrite.Involucre
@KingOfKong: see what you can do about duplicates. Don't know enough about string handling to know whether string refcounting is gonna help when you keep adding new objects that just happen to have the same string contents, but you could try to reduce memory consumption by identifying redundant/duplicate data and reusing objects instantiated earlier: ie when one instance has the same list contents as another instance.Involucre
Wasn't there a period when string was equivalent to WideString (just before switching to UnicodeString)? I thought there was, but perhaps it was some misapprehension on my side.Subclinical
@Andriy IMHO there was never such an epoch. String changed from AnsiString to UnicodeString with Delphi 2009. And WideString were always present, since early Delphi versions, for handling the Ole BSTR string type: in fact, WideString are handled as COM/OLE strings, using the Windows API for them, and with no reference count, whereas String (Ansi* or Unicode*) are pure Delphi types, allocated on Delphi heap, with reference counting.Caloric
M
3

Reading through the comments, it sounds like you need to lift the data out of Delphi and into a database.

From there it is easy to match organ donors to receivers*)

SELECT pw.* FROM patients_waiting pw
INNER JOIN organs_available oa ON (pw.bloodtype = oa.bloodtype) 
                              AND (pw.tissuetype = oa.tissuetype)
                              AND (pw.organ_needed = oa.organ_offered)
WHERE oa.id = '15484'

If you want to see the patients that might match against new organ-donor 15484.

In memory you only handle the few patients that match.

*) simplified beyond all recognition, but still.

Mccubbin answered 25/8, 2011 at 17:45 Comment(1)
Yes - the SQL option is always good, when you've to handle complex queries. If you create proper indexes, the SQL planner will probably do a better optimization job than brute-force lookup of objects in memory. And I suspect you can even use an in-memory database for this purpose, for instance using SQLite.Caloric
L
1

In addition to Andreas' post:

Before Delphi 2009, a string header occupied 8 bytes. Starting with Delphi 2009, a string header takes 12 bytes. So every unique string uses 4 bytes more than before, + the fact that each character takes twice the memory.
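
To make those figures concrete, here is a rough sketch of the arithmetic. StringHeapBytes is a made-up helper, it assumes the header sizes quoted above plus one terminating null character, and it ignores whatever rounding the memory manager adds on top.

```pascal
program StringSizeDemo;

{$APPTYPE CONSOLE}

// Approximate heap bytes for one string: header, characters, null terminator.
// Ignores allocator block headers and alignment.
function StringHeapBytes(CharCount, HeaderSize, CharSize: Integer): Integer;
begin
  Result := HeaderSize + (CharCount + 1) * CharSize;
end;

begin
  // A 10-character string with the pre-2009 layout (8-byte header, 1-byte chars)
  Writeln(StringHeapBytes(10, 8, 1));   // prints 19
  // The same string from Delphi 2009 on (12-byte header, 2-byte chars)
  Writeln(StringHeapBytes(10, 12, 2));  // prints 34
end.
```

So short strings grow by a bit less than a factor of two, and long strings approach exactly twice the size.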

Also, starting with Delphi 2010 I believe, TObject started using 8 bytes instead of 4. So for each single object created by Delphi, Delphi now uses 4 more bytes. Those 4 bytes were added to support the TMonitor class, I believe.

If you're in desperate need to save memory, here's a little trick that could help if you have a lot of string values that repeat themselves.

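var
  // Must be created with Sorted := True, because Find does a binary search.
  uUniqueStrings : TStringList;

// Returns the pooled copy of S, so equal values share one reference-counted string.
function ReduceStringMemory(const S : String) : string;
var idx : Integer;
begin
  if not uUniqueStrings.Find(S, idx) then
    idx := uUniqueStrings.Add(S); // on a sorted list, Add returns the insertion index

  Result := uUniqueStrings[idx];
end;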

Note that this will help ONLY if you have a lot of string values that repeat themselves. For example, this code uses 150 MB less on my system.

var sl : TStringList;
  I: Integer;
begin
  sl := TStringList.Create;
  try
    for I := 0 to 5000000 do
      sl.Add(ReduceStringMemory(StringOfChar('A',5)));
  finally
    sl.Free;
  end;
end;
Legatee answered 25/8, 2011 at 17:3 Comment(3)
I just checked it: TObject.Create.InstanceSize=4 before Delphi 2010 - that is only the pointer to the class type.Caloric
@Rudy Exactly. So much for a still-not-working Monitor field. End of trolling. ;)Caloric
About memory consumption, it's a bit more complex than that. Depending on the memory size, each block has a header and some alignment. It depends on the memory manager used, but due to the per-block allocation structure and reallocation preparation, a plain object will use more than 4 bytes of heap memory in any case (at least a 4-pointer header for every memory block). From the global memory consumption POV, changing from 4 bytes to 8 bytes didn't change anything (at least when using FastMM4).Caloric
G
1

I also read in a lot of strings in my program that can approach a couple of GB for large files.

Short of waiting for 64-bit XE2, here is one idea that might help you:

I found storing individual strings in a stringlist to be slow and wasteful in terms of memory. I ended up blocking the strings together. My input file has logical records, which may contain between 5 and 100 lines. So instead of storing each line in the stringlist, I store each record. Processing a record to find the line I need adds very little time to my processing, so this is possible for me.

If you don't have logical records, you might just want to pick a blocking size, and store every (say) 10 or 100 strings together as one string (with a delimiter separating them).
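
A fixed-size blocking scheme like that might look something like the sketch below. AddLine, GetLine, BlockSize and Delim are all made-up names, and it assumes the delimiter character never occurs inside a line.

```pascal
program BlockedLinesDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils, Classes;

const
  BlockSize = 10;  // lines per stored string; an arbitrary choice
  Delim = #10;     // assumes this character never occurs inside a line

// Appends Line to the last block, starting a new block every BlockSize lines.
procedure AddLine(Blocks: TStringList; var LineCount: Integer;
  const Line: string);
begin
  if LineCount mod BlockSize = 0 then
    Blocks.Add(Line)
  else
    Blocks[Blocks.Count - 1] := Blocks[Blocks.Count - 1] + Delim + Line;
  Inc(LineCount);
end;

// Extracts line number Index (0-based) by scanning inside its block.
function GetLine(Blocks: TStringList; Index: Integer): string;
var
  Block: string;
  Skip, StartPos, EndPos: Integer;
begin
  Block := Blocks[Index div BlockSize];
  StartPos := 1;
  // Step over one delimiter for each line that precedes ours in the block.
  for Skip := 1 to Index mod BlockSize do
    StartPos := PosEx(Delim, Block, StartPos) + 1;
  EndPos := PosEx(Delim, Block, StartPos);
  if EndPos = 0 then
    EndPos := Length(Block) + 1; // last line of the block has no delimiter
  Result := Copy(Block, StartPos, EndPos - StartPos);
end;

var
  Blocks: TStringList;
  Count, I: Integer;
begin
  Blocks := TStringList.Create;
  try
    Count := 0;
    for I := 0 to 24 do
      AddLine(Blocks, Count, 'line ' + IntToStr(I));

    Writeln(Blocks.Count);         // number of stored strings, not lines
    Writeln(GetLine(Blocks, 17));  // one original line recovered from its block
  finally
    Blocks.Free;
  end;
end.
```

The trade-off is a short scan inside the block on every lookup in exchange for far fewer separately allocated strings.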

The other alternative, is to store them in a fast and efficient on-disk file. The one I'd recommend is the open source Synopse Big Table by Arnaud Bouchez.

Greasepaint answered 25/8, 2011 at 17:31 Comment(1)
Thanks for your promotion for Open Source projects!Caloric
E
0

May I suggest you try the JEDI Code Library (JCL) class TAnsiStringList, which is like the TStringList from Delphi 2007 in that it is made up of AnsiStrings.

Even then, as others have mentioned, XE will be using more memory than Delphi 2007.

I really don't see the value of loading the full text of a giant flat file into a string list. Others have suggested a big-table approach such as Arnaud Bouchez's one, or using SQLite, or something like that, and I agree with them.

I think you could also write a simple class that will load the entire file you have into memory, and provide a way to add line-by-line object links to a giant in-memory ansichar buffer.
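
Something along these lines, as a very rough sketch. TLineBuffer and its members are made-up names, the literal in the demo stands in for the real file contents, and the sketch assumes lines end in LF or CR LF.

```pascal
program LineIndexDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical class: one AnsiString buffer plus an array of line starts.
  TLineBuffer = class
  private
    FData: AnsiString;          // whole file, one byte per character
    FStarts: array of Integer;  // 1-based offset of each line in FData
    function GetCount: Integer;
  public
    Objects: array of TObject;  // optional object linked to each line
    constructor Create(const AData: AnsiString);
    function Line(Index: Integer): AnsiString;
    property Count: Integer read GetCount;
  end;

constructor TLineBuffer.Create(const AData: AnsiString);
var
  I, N: Integer;
begin
  inherited Create;
  FData := AData;
  SetLength(FStarts, 16);
  N := 0;
  if FData <> '' then
  begin
    FStarts[0] := 1;
    N := 1;
  end;
  // Every LF except a trailing one starts a new line at the next position.
  for I := 1 to Length(FData) - 1 do
    if FData[I] = #10 then
    begin
      if N = Length(FStarts) then
        SetLength(FStarts, N * 2);
      FStarts[N] := I + 1;
      Inc(N);
    end;
  SetLength(FStarts, N);
  SetLength(Objects, N); // all nil until a line gets an object attached
end;

function TLineBuffer.GetCount: Integer;
begin
  Result := Length(FStarts);
end;

// Copies one line out of the buffer, without its line terminator.
function TLineBuffer.Line(Index: Integer): AnsiString;
var
  StopPos: Integer;
begin
  if Index < High(FStarts) then
    StopPos := FStarts[Index + 1] - 1
  else
    StopPos := Length(FData) + 1;
  while (StopPos > FStarts[Index]) and (FData[StopPos - 1] in [#10, #13]) do
    Dec(StopPos);
  Result := Copy(FData, FStarts[Index], StopPos - FStarts[Index]);
end;

var
  Buf: TLineBuffer;
begin
  Buf := TLineBuffer.Create('1001;Smith'#13#10'1002;Jones'#10);
  try
    Writeln(Buf.Count);
    Writeln(string(Buf.Line(1)));
  finally
    Buf.Free;
  end;
end.
```

The index costs 4 bytes per line, and a string is only allocated when a line is actually requested.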

Eupheemia answered 26/8, 2011 at 4:14 Comment(0)
C
0

Starting with Delphi 2009, not only strings but also every TObject has doubled in size. (See Why Has the Size of TObject Doubled In Delphi 2009?). But this would not explain this increase if there are only 85,000 objects. Only if these objects contain many nested objects, their size could be a relevant part of the memory usage.

Cubiform answered 26/8, 2011 at 5:49 Comment(1)
They have and they are: The problem might be the objects, they have about 60 fields, about 10 of which are lists. most of the lists are small but one in particular is 200 fields long.Ames
B
0

Are there many duplicate strings in your list? Storing only the unique strings may help reduce the memory size. See my question about a string pool for a possible (but maybe too simple) answer.

Baseball answered 26/8, 2011 at 6:56 Comment(0)
B
0

Are you sure you don't suffer from a case of memory fragmentation?

Be sure to use the latest FastMM (currently 4.97), then take a look at the UsageTrackerDemo demo that contains a memory map form showing the actual usage of the Delphi memory.

Finally take a look at VMMap that shows you how your process memory is used.

Broadspectrum answered 26/8, 2011 at 8:20 Comment(0)