Processing big strings: is this Large Object Heap fragmentation?

I have a .NET 3.5 Application

  • A function runs about a million times
  • It does search & replace and regex operations on strings of 1MB+ (the sizes vary)

When I profile the application, I can confirm that these strings are stored in the LOH, but they are also reclaimed by the GC later on, so at any given time at most 10 of them are in the LOH (10 threads are running).

My understanding is that these big strings are allocated in the LOH and then reclaimed by the GC, but because of where they were allocated (and because the LOH is not compacted) this causes fragmentation. This happens even though there is no memory leak in the operation.

It doesn't cause a problem for ~100K iterations; however, when it reaches 1M+ it throws out-of-memory exceptions.

I'm using ANTS Memory Profiler and this is the result that I got in the early executions:

.NET Using 70MB of 210MB total private bytes allocated in to the application
Number of Fragments: 59
Number of Large Fragments : 48 (99.6% of free memory)
Largest Fragment: 9MB
Free Space: 52% of total memory  (37MB)
Unmanaged Memory: 66% of total private memory (160MB)

  1. Do you think my diagnosis is correct based on the data at hand?
  2. If so, how can I solve this LOH fragmentation problem? I have to process those strings, and they are big. Should I find a way to split them up and process them in pieces? In that case, running regexes etc. on the split strings will be really challenging.
Castra answered 11/9, 2011 at 2:31 Comment(6)
One other possible solution: make a separate process that does the string processing, and use a new process for each string (or for every 100K, if that seems fine for your strings, etc.). Each process starts with a clean slate. This is one of the reasons IIS recycles App Pools: fragmentation. Wendelin
@vcsjones, I thought of that, but before I actually do it and overdo it :) I want to be sure this is the cause. I'm new to the fine details of the GC, so I don't want to spend hours and later find that I didn't actually solve anything! The problem is that reproducing the actual failure is pretty hard; it might take a day or two, if I'm lucky. So I mostly take the profiler's word for it. Castra
You could try running the program in 64-bit mode. This would solve the problem because the virtual address space is much bigger. Trichloroethylene
I know you asked about 3.5, but look here: connect.microsoft.com/VisualStudio/feedback/details/521147/… The lead programmer says they partially fixed the problem in 4.0. Trichloroethylene
Do you know what happens if it's running on x64 but compiled for x86? Castra
I don't think it would change. You'll have to recompile... Or perhaps try 4.0 as I've written. (And you forgot to address your comment...) Trichloroethylene
W
2
  1. Yes. That sounds correct. The LOH is getting fragmented, which leads to the runtime being unable to allocate enough contiguous space for the large strings.

  2. You have a few options. Choose whichever is easiest and still effective; that depends entirely on how your application is written.

    1. Break your strings into chunks small enough to stay out of the LOH, which means under 85,000 bytes (note that the logic for when an object is put on the LOH isn't quite that cut-and-dried). Since .NET strings use two bytes per character, that is roughly 42,000 characters. The GC can then reclaim and compact that space. This is by no means guaranteed to fix fragmentation; it can still happen for other reasons. If you make the strings smaller but they still end up on the LOH, you are only putting off the problem, and how long you can put it off depends on how far beyond 1 million strings you need to go. The other downside is that you still have to load the whole string into memory in order to split it, so it ends up on the LOH anyway. You'd have to shrink the strings before your application even loads them. Kind of a Catch-22. EDIT: Gabe makes a point in the comments that if you can load your string into a StringBuilder first, under the covers it makes a good effort to keep things out of the LOH (until you call ToString on it).

    2. Move the string processing into a separate process. Use a process instead of a thread. Have each process handle, say, 10K strings, then kill it and start another. This way, each process starts with a clean slate. The advantage is that it doesn't change your string processing logic (in case you can't make your strings smaller for processing), and it avoids the Catch-22 in #1. The downside is that it probably requires a bigger change to your application, plus coordinating the work between the master process and the worker processes. The trick is that the master can only tell the worker where the large string is; it can't pass the string directly, otherwise you are back to the Catch-22.
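
As a rough illustration of the separate-process option, here is a minimal sketch that assumes the large strings live in files and that the program is a .NET Framework console executable. `BatchRunner`, the `--worker` switch, `BatchSize`, the `.out` suffix and `ProcessText` are all hypothetical names made up for this example. The master only passes file paths to the worker, never the strings themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Reflection;

public static class BatchRunner
{
    // Hypothetical batch size; tune it so a worker exits well before its heap fragments.
    const int BatchSize = 10000;

    public static int Main(string[] args)
    {
        if (args.Length == 2 && args[0] == "--worker")
            return RunWorker(args[1]);

        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: BatchRunner <input-directory>");
            return 2;
        }

        string[] inputs = Directory.GetFiles(args[0]);
        foreach (List<string> batch in SplitIntoBatches(inputs, BatchSize))
        {
            // Hand the worker a list of paths only, never the large strings.
            string listFile = Path.GetTempFileName();
            File.WriteAllLines(listFile, batch.ToArray());
            try
            {
                int exitCode = RunWorkerProcess(listFile);
                if (exitCode != 0) return exitCode;
            }
            finally
            {
                File.Delete(listFile);
            }
        }
        return 0;
    }

    // Splits the work items into consecutive batches of at most 'size' items.
    public static List<List<string>> SplitIntoBatches(IList<string> items, int size)
    {
        var batches = new List<List<string>>();
        for (int i = 0; i < items.Count; i += size)
        {
            var batch = new List<string>();
            for (int j = i; j < items.Count && j < i + size; j++)
                batch.Add(items[j]);
            batches.Add(batch);
        }
        return batches;
    }

    // Starts a fresh copy of this executable in worker mode and waits for it to exit.
    static int RunWorkerProcess(string listFile)
    {
        var info = new ProcessStartInfo();
        // Assumption: on .NET Framework the entry assembly is the .exe itself.
        info.FileName = Assembly.GetEntryAssembly().Location;
        info.Arguments = "--worker \"" + listFile + "\"";
        info.UseShellExecute = false;
        using (Process worker = Process.Start(info))
        {
            worker.WaitForExit();
            return worker.ExitCode;
        }
    }

    // Worker mode: loads each file, processes it, and writes the result next to it.
    static int RunWorker(string listFile)
    {
        foreach (string path in File.ReadAllLines(listFile))
        {
            string text = File.ReadAllText(path);
            File.WriteAllText(path + ".out", ProcessText(text));
        }
        return 0;
    }

    // Placeholder for the real search, replace, and regex work.
    static string ProcessText(string text)
    {
        return text;
    }
}
```

Every worker exits after one batch, so whatever fragmentation it built up disappears with its address space. Writing results next to the inputs is only for brevity; a real run would probably want a separate output directory so that a second pass does not pick up the `.out` files.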

Wendelin answered 11/9, 2011 at 2:55 Comment(1)
In general you can probably load your string into a StringBuilder which (at least in .NET 4.0) is very careful to avoid the LOH. This would make it easy to break up the strings. Unfortunately, you can't run a Regex on a StringBuilder and it may not be possible to run the Regex he needs on broken up strings either. Womanly
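
If the patterns never need to match across line boundaries (an assumption that may well not hold for the asker's regexes), one way to keep every string small is to stream the input line by line instead of loading it whole. The sketch below shows that idea; `LineStreamer` and `ReplaceByLine` are hypothetical names for this example.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class LineStreamer
{
    // Applies a regex replacement one line at a time, so the full text
    // is never held in memory as a single large string.
    public static void ReplaceByLine(TextReader input, TextWriter output,
                                     Regex pattern, string replacement)
    {
        string line;
        while ((line = input.ReadLine()) != null)
        {
            output.WriteLine(pattern.Replace(line, replacement));
        }
    }

    public static void Main()
    {
        var pattern = new Regex(@"\d+");
        // In real use these would be a StreamReader and StreamWriter over files.
        using (var input = new StringReader("order 123\nitem 45"))
        using (var output = new StringWriter())
        {
            ReplaceByLine(input, output, pattern, "#");
            Console.Write(output.ToString());
        }
    }
}
```

This only helps while individual lines stay below the LOH threshold (roughly 42,000 characters); a single very long line would still be allocated on the LOH. Line endings are also rewritten to the writer's `NewLine` setting.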
