How to solve memory segmentation and force FastMM to release memory to OS?
Asked Answered
P

3

6

Note: this is a 32-bit application, which is not planned to be migrated to 64 bit.

I'm working with a very memory-consuming application and have pretty much optimized all the relevant paths with respect to memory allocation/de-allocation. (There are no memory leaks, no handle leaks, and no other kind of leaks in the application itself, AFAIK and as far as tested. 3rd party libs which I cannot touch are of course candidates, but unlikely in my scenario.)

The application frequently allocates large one- and two-dimensional dynamic arrays of single and of packed records of up to 4 singles. By large I mean that 5000x5000 of record(single,single,single,single) is normal, and having even 6 or 7 such arrays in use at a given time is common. This is needed as there are a lot of cross-computations made on these arrays, and having them read from disk would be a real performance killer.
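For illustration, the shape of the data is roughly the following (type and variable names are made up for this sketch, not taken from the real code); a single such matrix is 5000 x 5000 x 16 bytes, i.e. roughly 381 MB:

type
  TSample = packed record
    a, b, c, d: Single; // 16 bytes per element
  end;
  TSampleMatrix = array of array of TSample;

procedure WorkWithMatrix;
var
  m: TSampleMatrix;
begin
  SetLength(m, 5000, 5000); // ~381 MB per matrix, 6-7 of them alive at once
  try
    // ... cross-computations ...
  finally
    SetLength(m, 0); // or Finalize(m): released to FastMM, but the address
                     // space is not necessarily returned to the OS
  end;
end;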

With that clarified: I am getting out-of-memory errors a lot because of these large dynamic arrays, whose memory will not go back to the OS after I release them, no matter whether I SetLength them to 0 or Finalize them. This is of course something FastMM is doing in order to be fast, I know that much.

I am tracking both FastMM-allocated blocks and process-consumed memory (RAM + page file) by using:

// requires Windows, PsAPI and Forms (for Application.ProcessMessages) in the uses clause
function CurrentProcessMemory(AWaitForConsistentRead: boolean): Cardinal;
var
  MemCounters: TProcessMemoryCounters;
  LastRead: Cardinal;
  maxCnt: integer;
begin
  Result := 0; // stupid D2010 compiler warning
  maxCnt := 0;
  repeat
    Inc(maxCnt);
    // this is a stabilization loop:
    // in tight loops, the system doesn't get much chance to release allocated
    // resources, which in turn will get falsely reported by this function as
    // still being used, resulting in a false-positive memory leak report in
    // the application. So we loop here, waiting, until the memory reported
    // for the application gets stable.
    LastRead := Result;
    MemCounters.cb := SizeOf(MemCounters);
    if GetProcessMemoryInfo(GetCurrentProcess,
        @MemCounters,
        SizeOf(MemCounters)) then
      Result := MemCounters.WorkingSetSize + MemCounters.PagefileUsage
    else
      RaiseLastOSError;
    if AWaitForConsistentRead and (LastRead <> 0) and (Abs(LastRead - Result) > 1024) then
    begin
      Sleep(60);
      Application.ProcessMessages;
    end;
  until (not AWaitForConsistentRead) or (Abs(LastRead - Result) < 1024) or (maxCnt > 1000);
  // 1000 iterations * 60 ms is a 60-second wait, which is a bit too much,
  // so if the system is that "unstable", let's just forget it.
end;

// GetMemoryManagerUsageSummary is declared in the FastMM4 unit
function CurrentFastMMMemory: Cardinal;
var
  mem: TMemoryManagerUsageSummary;
begin
  GetMemoryManagerUsageSummary(mem);
  Result := mem.AllocatedBytes + mem.OverheadBytes;
end;
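A sketch of how these two trackers might be used around an operation (the OutputDebugString logging and the procedure name are only illustrative):

// requires Windows and SysUtils in the uses clause
procedure LogMemoryAround(const AOperationName: string; const AOperation: TProc);
var
  procBefore, procAfter, mmBefore, mmAfter: Cardinal;
begin
  procBefore := CurrentProcessMemory(True);
  mmBefore := CurrentFastMMMemory;
  AOperation();
  procAfter := CurrentProcessMemory(True);
  mmAfter := CurrentFastMMMemory;
  OutputDebugString(PChar(Format('%s: FastMM %d -> %d bytes, process %d -> %d bytes',
    [AOperationName, Int64(mmBefore), Int64(mmAfter), Int64(procBefore), Int64(procAfter)])));
end;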

I am running the code on a 64-bit computer and my top memory consumption before crashes is about 3.3 - 3.4 GB. After that, I get memory/resource-related crashes anywhere in the application. It took me some time to pin it down to the use of the large dynamic arrays, which were buried in some 3rd party library.

The way I am getting over this is that I made the application resume itself from where it left off, by closing and re-starting itself with certain parameters. This is all nice and dandy if memory consumption is fair and the current operation finishes.

The big problem happens when current memory usage is 1 GB and the next operation requires 2.5 GB of memory or more. My current code limits itself to an upper value of 1.5 GB of used memory before restarting, but in this situation I'd have to drop the limit below 1 GB, which would basically have the application restart itself after each operation, and even that wouldn't guarantee that everything will be fine.

What if another operation has a larger data set to process and requires a total of 4 GB or more of memory?

Note that I am not talking about 4 GB of actually used memory, but about memory consumed by allocating huge dynamic arrays which the OS doesn't get back once they are de-allocated, and which it therefore still sees as consumed, so it adds up.

So, my next point of attack is to force FastMM to release all (or at least part of) its memory to the OS. I'm specifically targeting the huge dynamic arrays here. Again, these are in a 3rd party library, so re-coding that is not really among the top options. It's much easier and faster to tinker with the FastMM code and write a proc to release the memory.

I can't switch away from FastMM, as currently the entire application and some of the 3rd party libs are heavily coded around the use of PushAllocationGroup in order to quickly find and pinpoint any memory leaks. I know I could write a dummy FastMM unit to satisfy the compilation references, but I would be left without this quick and certain leak detection.

In conclusion: is there any way I can force FastMM to release at least some of its large blocks to the OS? (Well, sure there is; the actual question is: did anybody already write it, and if so, mind sharing?)

Thanks

later edit:

I will come up with a small relevant test application soon. It doesn't appear to be that easy to mock one up.

Preempt answered 18/12, 2013 at 20:53 Comment(11)
Interesting that you mention memory segmentation in your question, but no reference to memory fragmentation as the issue.Urdar
Why are you packing records? That typically leads to worse performance due to mis-alignment.Actinometer
And as regards the "stupid D2010 compiler warning", the compiler is accurate. If you remove result := 0 then LastRead := result reads an uninitialized variable.Actinometer
@Marcus Adams: memory fragmentation, in my scenario, is not really the issue. FastMM allocates huge blocks, then requires more, but instead of somehow re-using the existing available ones, it dies on the request. So OK, you could look at it as memory fragmentation, but with very huge fragments, which kind of outgrow the term "fragment".Preempt
@David Heffernan: the initial code, when the comment was made, didn't have the loop in place. Try it. The RaiseLastOSError call wasn't seen as an exception, and as such it was considered a valid path and the function was warned/hinted as not returning a value (Delphi 2010). I think this happens when calling a procedure that raises Abort as well.Preempt
@Preempt Fragmentation is exactly what this is. Are the requests really 380MB blocks?Actinometer
I was commenting on the code in the Q. I see what you mean about RaiseLastOSError. Of course, you cannot expect the compiler to know what's in there. So I would not call the compiler stupid. I have a rather different way to shut the compiler up: I'd create an overload of RaiseLastOSError that accepts an untyped var parameter, which it ignores before forwarding the call to the real RaiseLastOSError. Pass Result and the compiler has been shut up.Actinometer
@David Heffernan: in the application, they vary from 100-and-something MB to 800-and-something MB. I'll update the question.Preempt
@ciuly, since you are planning to stay with 32 bits, you should explicitly mark that in your question. The fact that you tried to run it on a 64-bit computer is about as relevant as the colour of the computer case.Sieber
@Free Consulting the 64-bit OS testing is actually relevant. From what I remember, on a 32-bit OS, by default (without the /3GB option) an application can really only use 2 GB of memory, whereas I sometimes use even more than 3. So it has a bit of relevance.Preempt
@FreeConsulting LARGEADDRESSAWARE of course!Actinometer
A
4

I doubt that the issue is actually down to FastMM. For huge memory blocks, FastMM will not do any sub-allocation. Your allocation request will be handled with a straight VirtualAlloc. And then deallocation is VirtualFree.

That's assuming that you are allocating those 380MB objects in one contiguous block. I suspect that what you actually have are ragged 2D dynamic arrays, and they are not single allocations. A 5000x5000 ragged 2D dynamic array takes 5001 allocations to initialise: one for the row pointers, and 5000 for the rows. Those will be medium FastMM blocks. There will be sub-allocation.
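To make the difference concrete, here is a sketch of both allocation patterns (the record type is assumed from the question):

type
  TSample = packed record
    a, b, c, d: Single; // 16 bytes
  end;

procedure AllocateBothWays;
var
  Ragged: array of array of TSample;
  Flat: array of TSample;
begin
  // Ragged: 1 allocation for 5000 row pointers plus 5000 allocations of
  // 5000 * 16 = ~78 KB each, i.e. 5001 FastMM medium-block allocations.
  SetLength(Ragged, 5000, 5000);

  // Flat: a single contiguous 5000 * 5000 * 16 = ~381 MB allocation, which
  // FastMM hands straight to VirtualAlloc and frees with VirtualFree.
  SetLength(Flat, 5000 * 5000);
  // element [row, col] is then addressed as Flat[row * 5000 + col]
end;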

I think you are asking too much. In my experience, any time you need over 3GB of memory in a 32 bit process, it's game over. Fragmentation of address space will stop you before you run out of memory. You cannot hope for this to work. Switch to 64 bit, or use a cleverer, less demanding allocation pattern. Or do you really need dense 2D arrays? Can you use sparse storage?

If you cannot alleviate your memory demands that way, you could use memory mapped files. This would allow you to make use of the extra memory that your 64 bit system has. The system's disk cache can be larger than 4GB and so your app can traverse more than 4GB of memory without actually needing to hit the disk.
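A rough sketch of that route, using a pagefile-backed mapping and mapping only a window of it at a time (error handling pared down; view offsets must be multiples of the 64 KB allocation granularity):

// requires Windows and SysUtils in the uses clause
function CreateScratchMapping(ASizeInBytes: Int64): THandle;
begin
  // Backed by the system paging file; the mapping itself can be larger than
  // the 32-bit address space, only the mapped views have to fit in it.
  Result := CreateFileMapping(INVALID_HANDLE_VALUE, nil, PAGE_READWRITE,
    Int64Rec(ASizeInBytes).Hi, Int64Rec(ASizeInBytes).Lo, nil);
  if Result = 0 then
    RaiseLastOSError;
end;

function MapWindow(AMapping: THandle; AOffset: Int64; AWindowBytes: Cardinal): Pointer;
begin
  // Map just a window of the big block; unmap it again with UnmapViewOfFile.
  Result := MapViewOfFile(AMapping, FILE_MAP_ALL_ACCESS,
    Int64Rec(AOffset).Hi, Int64Rec(AOffset).Lo, AWindowBytes);
  if Result = nil then
    RaiseLastOSError;
end;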

You could certainly try different memory managers. I honestly do not hold out any hope that it would help. You could write a trivial replacement memory manager that used HeapAlloc. And enable the low fragmentation heap (enabled by default from Vista on). But I sincerely doubt that it will help. I'm afraid that there won't be a quick fix for you. To resolve this you face a more fundamental modification to your code.
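A minimal sketch of such a replacement manager is below; the function signatures match the Delphi 2010 TMemoryManagerEx record (they changed in later versions), and it would have to be installed before anything allocates:

// requires Windows in the uses clause
function HeapGetMem(Size: Integer): Pointer;
begin
  Result := HeapAlloc(GetProcessHeap, 0, Size);
end;

function HeapFreeMem(P: Pointer): Integer;
begin
  if HeapFree(GetProcessHeap, 0, P) then
    Result := 0 // 0 signals success to the RTL
  else
    Result := 1;
end;

function HeapReallocMem(P: Pointer; Size: Integer): Pointer;
begin
  Result := HeapReAlloc(GetProcessHeap, 0, P, Size);
end;

function HeapAllocMem(Size: Cardinal): Pointer;
begin
  Result := HeapAlloc(GetProcessHeap, HEAP_ZERO_MEMORY, Size);
end;

function NopExpectedLeak(P: Pointer): Boolean;
begin
  Result := False; // leak registration is not supported by this sketch
end;

const
  HeapMemoryManager: TMemoryManagerEx = (
    GetMem: HeapGetMem;
    FreeMem: HeapFreeMem;
    ReallocMem: HeapReallocMem;
    AllocMem: HeapAllocMem;
    RegisterExpectedMemoryLeak: NopExpectedLeak;
    UnregisterExpectedMemoryLeak: NopExpectedLeak);

// in the project file, before anything else allocates:
//   SetMemoryManager(HeapMemoryManager);
// The low fragmentation heap is already the default for the process heap
// from Vista onwards, so no extra call is needed to enable it there.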

Actinometer answered 18/12, 2013 at 21:18 Comment(5)
@FreeConsulting Not if the compiler is D2010 as per the code. Or did I miss the part of the question which stated that the 64-bit compiler was being used? Please do point it out for me.Actinometer
I didn't switch to 64 bit, nor am I planning to. The problem is not in needing 3 GB, but in needing 500 MB 10 times. You declare a: array of array of record a,b,c,d: single; end; and do SetLength(a, 5000, 5000); do whatever; then SetLength(a, 0, 0) OR Finalize(a); and you do this in a loop and see what happens (sketched after these comments). After a few iterations, you run out of memory. Call it segmentation, fragmentation, or even a "leak": down the road, it shouldn't happen.Preempt
No probs at all doing that here. I can do 1000 iterations no problem. I think I can do it forever. I'm sure your actual program does more. You really cannot expect to have 3 GB of address space reserved in a 4 GB process, with lots of allocate/reallocate. That's bound to lead to fragmentation.Actinometer
Ah, I overlooked that comment about stupidity, sorry.Sieber
@David Heffernan: yes, my program does more. I thought that would suffice as a test case (based on my debugging of the application); I also see it does not. I will make a reproducible test app tomorrow (almost 1 am here at the moment).Preempt
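For reference, the loop described in the comments above, as a compilable sketch:

procedure StressAllocations;
type
  TQuad = packed record
    a, b, c, d: Single;
  end;
var
  a: array of array of TQuad;
  i: Integer;
begin
  for i := 1 to 1000 do
  begin
    SetLength(a, 5000, 5000); // ~381 MB spread over 5001 blocks
    // ... do whatever ...
    SetLength(a, 0);          // hand the memory back to FastMM
  end;
end;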
E
2

Your issue, as others have said, is most likely attributable to memory fragmentation. You could test this by using VirtualQuery to create a picture of how memory is allocated to your application. You will very likely find that although you may have more than enough total memory for a new array, you don't have enough contiguous memory.
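For example, a sketch that walks the address space with VirtualQuery and reports the largest free region, which is what a big SetLength actually needs:

// requires Windows in the uses clause
function LargestFreeRegion: Cardinal;
var
  mbi: TMemoryBasicInformation;
  addr: Cardinal;
begin
  Result := 0;
  addr := 0;
  // walk the whole user address space region by region
  while VirtualQuery(Pointer(addr), mbi, SizeOf(mbi)) <> 0 do
  begin
    if (mbi.State = MEM_FREE) and (mbi.RegionSize > Result) then
      Result := mbi.RegionSize;
    if Int64(addr) + Int64(mbi.RegionSize) > High(Cardinal) then
      Break; // reached the top of the 32-bit address space
    Inc(addr, mbi.RegionSize);
  end;
end;

If this reports, say, 200 MB even though a gigabyte or more is free in total, an allocation for a single ~381 MB array is bound to fail.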

FastMM already does a lot to try and avoid problems due to memory fragmentation. "Small" allocations are done at the low end of the address space, whereas "large" allocations are done at the high end. This avoids a common problem where a series of large then small allocations, followed by all the large allocations being released, results in a large amount of fragmented memory that is almost unusable. (Certainly unusable by anything slightly larger than the original large allocations.)

To see the benefits of FastMM's approach, imagine your memory laid out as follows:

Each digit represents a 100 MB block.
[0123456789012345678901234567890123456789]

Small allocations represented by "s".
Large allocations represented by capital letters.
[0sssss678901GGGGFFFFEEEEDDDDCCCCBBBBAAAA]

Now if you free all your large blocks, you should have no trouble performing similar large allocations later.
[0sssss6789012345678901234567890123456789]

The problem is that "large" and "small" are relative, and highly dependent on the nature of your application. FastMM defines a dividing line between "large" and "small". If you happen to have some small allocations that FastMM would classify as large, you may encounter the following problem.

[0sss4sGGGGsFFFFsEEEEsDDDDsCCCCsBBBBsAAAA]

Now if you free the large blocks you're left with:
[0sss4s6789s1234s6789s1234s6789s1234s6789]

And an attempt to allocate something larger than 400 MB will fail.


Options

  1. You may be able to tweak the FastMM settings so that all your "small" allocations are also considered small by FastMM. However, there are a few situations where this won't work:
    • Any DLLs you use that allocate memory to your application but bypass FastMM may still cause fragmentation.
    • If you don't release all your large blocks together, those that remain may induce fragmentation which will slowly get worse over time.
  2. You could take on the task of memory management yourself.
    • Allocate one very large block e.g. 3.5GB which you keep for the entire lifetime of the application.
    • Instead of using dynamic arrays, you determine the pointer locations to use when setting up a new array.
  3. Of course the simplest alternative would be to go 64-bit.
  4. You could consider alternate data structures.
    • Do you really need array lookup capability? If not, another structure that allocates in smaller chunks may suffice.
    • Even if you do need array lookup, consider a paged array. Paged arrays are a combination of arrays and linked lists: data is stored on pages, with a linked list chaining the pages (see the sketch after this list).
    • A simple variant (since you mentioned your arrays are 2 dimensional) would be to leverage that: One dimension forms its own array providing a lookup into one of multiple arrays for the second dimension.
  5. Related to the alternate data structures option, consider storing some data on disk. Yes, performance will be slower, but if an efficient caching mechanism can be found, then maybe not by much. It would be better to be a little slower than to crash.
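As a sketch of the paged idea from option 4 (this variant uses an array of pages rather than a linked list; names and page size are arbitrary):

const
  PageRows = 64; // 64 rows * 5000 columns * 16 bytes = ~5 MB per page

type
  TQuad = packed record
    a, b, c, d: Single;
  end;
  TQuadPage = array of TQuad;

  // 2D data stored in moderately sized pages, so no single allocation ever
  // needs a huge contiguous region of address space.
  TPagedMatrix = class
  private
    FPages: array of TQuadPage;
    FCols: Integer;
    function GetItem(Row, Col: Integer): TQuad;
    procedure SetItem(Row, Col: Integer; const Value: TQuad);
  public
    constructor Create(RowCount, ColCount: Integer);
    property Items[Row, Col: Integer]: TQuad read GetItem write SetItem; default;
  end;

constructor TPagedMatrix.Create(RowCount, ColCount: Integer);
var
  i: Integer;
begin
  inherited Create;
  FCols := ColCount;
  SetLength(FPages, (RowCount + PageRows - 1) div PageRows);
  for i := 0 to High(FPages) do
    SetLength(FPages[i], PageRows * ColCount); // last page allocated full size for simplicity
end;

function TPagedMatrix.GetItem(Row, Col: Integer): TQuad;
begin
  Result := FPages[Row div PageRows][(Row mod PageRows) * FCols + Col];
end;

procedure TPagedMatrix.SetItem(Row, Col: Integer; const Value: TQuad);
begin
  FPages[Row div PageRows][(Row mod PageRows) * FCols + Col] := Value;
end;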
Embrey answered 19/12, 2013 at 14:13 Comment(0)
P
0

Dynamic arrays are reference counted in Delphi, so they should be automatically released when they are not used anymore. Like strings, they are shared between variables/objects through reference counting (unlike strings, though, they are not copy-on-write). So it seems you have some kind of memory/reference leak (e.g. an object in memory that still holds a reference to an array). Just to be sure: you are not doing any kind of low-level pointer tricks, are you?

So please, yes, post a test program (or send the complete program privately via email) so one of us can take a look at it.
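A tiny sketch of the kind of lingering reference meant here (names invented):

procedure LingeringReference;
type
  TDoubleArray = array of Double;
var
  Work: TDoubleArray;
  Cache: TDoubleArray; // stand-in for e.g. a field of some long-lived object
begin
  SetLength(Work, 50 * 1000 * 1000); // ~381 MB
  Cache := Work;       // second reference; the array's refcount is now 2
  SetLength(Work, 0);  // drops only this reference...
  // ...the ~381 MB block stays allocated for as long as Cache refers to it
end;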

Phlox answered 19/12, 2013 at 8:39 Comment(1)
Releasing memory back to the memory manager does not mean that it is then released to the system. Sub-allocating memory managers may hold on to the memory and re-use it. Also, please don't suggest private e-mail. This site is all about sharing.Actinometer
