Put GC on hold during a section of code

Is there a way to put the GC on hold completely for a section of code? The only thing I've found in other similar questions is GC.TryStartNoGCRegion, but it is limited to the amount of memory you specify, which itself is limited to the size of an ephemeral segment.

Is there a way to bypass that completely and tell .NET "allocate whatever you need, don't do GC, period", or to increase the size of segments? From what I found it is at most 1GB on a many-core server, and this is way less than what I need to allocate, yet I don't want GC to happen (I have up to terabytes of free RAM and there are thousands of GC spikes during that section; I'd be more than happy to trade those for 10 or even 100 times the RAM usage).
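
For reference, this is roughly how I understand GC.TryStartNoGCRegion is meant to be used (the budget value below is only an example), and it's exactly this budget limit I'd like to get around:

// using System; using System.Runtime;
long budget = 256L * 1024 * 1024; // example budget; must fit within an ephemeral segment

if (GC.TryStartNoGCRegion(budget))
{
    try
    {
        // allocations here stay below 'budget' and no GC happens
    }
    finally
    {
        // the runtime exits the region on its own if the budget is exceeded,
        // so only end it if it is still active
        if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
            GC.EndNoGCRegion();
    }
}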

Edit:

Now that there's a bounty I think it's easier if I specify the use case. I'm loading and parsing a very large XML file (1GB for now, 12GB soon) into objects in memory using LINQ to XML. I'm not looking for an alternative to that. I'm creating millions of small objects from millions of XElements and the GC is trying to collect non-stop while I'd be very happy keeping all that RAM used up. I have 100s of GBs of RAM and as soon as it hits 4GB used, the GC starts collecting non-stop which is very memory friendly but performance unfriendly. I don't care about memory but I do care about performance. I want to take the opposite trade-off.

While I can't post the actual code, here is some sample code that is very close to the final code and may help those who asked for more information:

var items = XElement.Load("myfile.xml")
.Element("a")
.Elements("b") // There are about 2 to 5 million instances of "b"
.Select(pt => new
{
    aa = pt.Element("aa"),
    ab = pt.Element("ab"),
    ac = pt.Element("ac"),
    ad = pt.Element("ad"),
    ae = pt.Element("ae")
})
.Select(pt => new 
{
    aa = new
    {
        aaa = double.Parse(pt.aa.Attribute("aaa").Value),
        aab = double.Parse(pt.aa.Attribute("aab").Value),
        aac = double.Parse(pt.aa.Attribute("aac").Value),
        aad = double.Parse(pt.aa.Attribute("aad").Value),
        aae = double.Parse(pt.aa.Attribute("aae").Value)
    },
    ab = new
    {
        aba = double.Parse(pt.ab.Attribute("aba").Value),
        abb = double.Parse(pt.ab.Attribute("abb").Value),
        abc = double.Parse(pt.ab.Attribute("abc").Value),
        abd = double.Parse(pt.ab.Attribute("abd").Value),
        abe = double.Parse(pt.ab.Attribute("abe").Value)
    },
    ac = new
    {
        aca = double.Parse(pt.ac.Attribute("aca").Value),
        acb = double.Parse(pt.ac.Attribute("acb").Value),
        acc = double.Parse(pt.ac.Attribute("acc").Value),
        acd = double.Parse(pt.ac.Attribute("acd").Value),
        ace = double.Parse(pt.ac.Attribute("ace").Value),
        acf = double.Parse(pt.ac.Attribute("acf").Value),
        acg = double.Parse(pt.ac.Attribute("acg").Value),
        ach = double.Parse(pt.ac.Attribute("ach").Value)
    },
    ad1 = int.Parse(pt.ad.Attribute("ad1").Value),
    ad2 = int.Parse(pt.ad.Attribute("ad2").Value),
    ae = new double[]
    {
        double.Parse(pt.ae.Attribute("ae1").Value),
        double.Parse(pt.ae.Attribute("ae2").Value),
        double.Parse(pt.ae.Attribute("ae3").Value),
        double.Parse(pt.ae.Attribute("ae4").Value),
        double.Parse(pt.ae.Attribute("ae5").Value),
        double.Parse(pt.ae.Attribute("ae6").Value),
        double.Parse(pt.ae.Attribute("ae7").Value),
        double.Parse(pt.ae.Attribute("ae8").Value),
        double.Parse(pt.ae.Attribute("ae9").Value),
        double.Parse(pt.ae.Attribute("ae10").Value),
        double.Parse(pt.ae.Attribute("ae11").Value),
        double.Parse(pt.ae.Attribute("ae12").Value),
        double.Parse(pt.ae.Attribute("ae13").Value),
        double.Parse(pt.ae.Attribute("ae14").Value),
        double.Parse(pt.ae.Attribute("ae15").Value),
        double.Parse(pt.ae.Attribute("ae16").Value),
        double.Parse(pt.ae.Attribute("ae17").Value),
        double.Parse(pt.ae.Attribute("ae18").Value),
        double.Parse(pt.ae.Attribute("ae19").Value)
    }
})
.ToArray();
Gearard answered 16/5, 2016 at 20:31 Comment(30)
Why do you want to put it on hold? GC is designed to only run when it needs to, and to run in times of low activity if possible, so if you're thinking it will improve performance you are probably mistaken. In fact, it may be worse because memory will likely build up quickly, increasing the amount of paging and fragmentation.Outstay
@DStanley Because there's no "low activity" in my case; it's a single continuous process that goes from point A to B and has to get from A to B as fast as possible, with no concern for anything else than going from A to B in that time (not a server, not something with multiple users, etc.), and because there is no memory pressure. I'm very fine with memory building up quickly as I have a "lot" of memory free (128GB in dev, potentially 2TB in production; the GC is spiking like crazy with only 7GB used up in the process)Gearard
Maybe this could help you: #6006365Nonattendance
Is this not a duplicate, #6006365?Twopenny
@NikBo I've already seen this answer (it is where I found TryStartNoGCRegion). The first answer is for Gen 2 objects (long lasting). I could try, but I don't think that's the issue here; rather, there's a lot being added to gen 0 / 1 all the time and the GC is spending a lot of time scanning them and promoting objectsGearard
@Twopenny Not exactly; by title it seems it is, but in content it is not, the need he has is very different. In his question he's trying to avoid stops at all costs for low latency; I don't care about that at all. I don't want to pause the GC because it's "a critical time where my program shouldn't stop", but instead because I don't want GC at all, as I want to trade performance (not latency) in exchange for memory. So I'm not looking for a quick pause but for a way to tell .NET "do the opposite of what you normally do, ignore memory, it's fine if it fills up"Gearard
To be clear, I want to be in that state for tens of minutes, maybe up to hours, so the goal is really different; if there are 2TB free I'm looking for a way to tell the GC "use up all of that before you even think of triggering". I know the GC will trigger no matter what if you go further than a segment size in allocation, which is why I need to find out if I can increase those segment sizes (4 to 1GB only, depending on proc count, on servers; I'd like to increase them to something more sensible for my use case like 64 or 128 GB)Gearard
I want 100s of GB's of RAM too!Aureus
I think you can find something useful to your problem, here in this article: infoq.com/articles/Big-Memory-Part-1 and second part : infoq.com/articles/Big-Memory-Part-2Monodrama
Perhaps you've picked the wrong tool for the job then? If you want explicit control over memory, pick an (unmanaged) language where you have that control (but also, responsibilities)Dicho
If the variables that referenced the objects didn't go out of scope then there would be no garbage collection.Twopenny
@Dicho No, it's clearly the right tool for the job: it works, is readable, and is maintainable, in one single big LINQ query and the 50 or so lines of LINQ to XML / anonymous object creation. Even with the slowdown of the GC triggering all the time it is the best tool for the job; I'd just like to turn it into an even better toolGearard
@Twopenny Quite a few may go out of scope (intermediary select statement returns would go out of scope pretty much as soon as they are created, well, as soon as they are consumed by the next step in the select I guess)Gearard
@ArkadiuszK None of those solutions would help me, as I'm not the one allocating the data so I can't preallocate / manage it; all I can hope for is a way to tell the GC "everything is fine, don't collect, don't check, just sleep until I crash with OutOfMemory"Gearard
Have you tried the "server" variant of the GC by setting <gcServer enabled="true"/> in the configuration file? It might have an increased memory threshold, which might help at least a bit in your case.Blitz
.NET uses virtual memory addressing, and you don't have control over whether the paging file or RAM is used at the physical layer; by default this addressing is limited to 2 GB. If you don't do the following: use at least .NET 4.5, target a 64-bit platform, and use the <gcAllowVeryLargeObjects> element in your configuration file, the 12 GB will be "never" instead of "soon"! Check these points before playing with the garbage collector.Impute
@MarcoGuignard I'm already using 4.6 targeting x64 only, and gcAllowVeryLargeObjects is enabled, although it has nothing to do with this at all. I have no issue using more RAM (another part of this project uses up hundreds of GB of RAM in .NET successfully); I have issues with TIME spent in GC.Gearard
All configuration about the garbage collector in .NET is here: msdn.microsoft.com/en-us/library/bb384202(v=vs.110).aspx. TryStartNoGCRegion is the "tool" given by Microsoft to prevent all GC operations in a performance-critical code section (not the whole application), and LowLatency is the "keep growing until Windows memory management starts crying" value. There are also tuning options with GcServer and GcConcurrent. GcServer allows the use of dedicated threads for the garbage collector; perhaps it could help.Impute
Are you responsible for loading the XML file? If so, think about streaming the parsing instead of parsing the XML as a whole: msdn.microsoft.com/en-us/library/bb387013(v=vs.100).aspxMachado
@Abbondanza This means using XmlReader instead of LINQ to XML, which is a no-go in terms of readability. I'm not looking for alternative solutions, just for a way to tell the GC to stop for a while; if that's not possible, the best second case is exactly what I have right now (streaming the file wouldn't help anything, I still can't do anything before it's all in memory; it would help intermediary memory consumption, which is exactly what I don't care about since memory is what I have plenty of)Gearard
You could still use Linq 2 XML -- check the example code.Machado
If I create the XElements myself, sure, but how does that differ at all from what XElement.Load() does in the slightest? It would return an IEnumerable instead of a materialized element, but since I immediately materialize it, it would still generate the same amount of garbage and still trigger just as much GCing, no?Gearard
@Abbondanza Also it would still mean I need to do the parsing of the XML myself (creating the nested levels), which is a lot of additional work for, I think, no benefit in my use case. Can you clarify how exactly you think this would help with the GC issue, because I don't see it at all?Gearard
@RonanThibaudau, what do you do with the items collection? Keep it in memory for later use? Or do you process the items one by one? In the latter case my guess is that streaming the processing will keep your memory profile low (as enumerated items can be GC'ed after processing while not-yet-enumerated ones are not occupying memory) and thus not trigger the GC redundantly when process memory hits a certain limit.Machado
@RonanThibaudau, also this gives you the opportunity to hide file I/O operations behind computations, as both can happen asynchronously. Anyway, I think your requirements are orthogonal: you want an easy-to-use tool that hides memory management, yet you also want to manage memory (partially) on your own.Machado
@Abbondanza I don't care about hiding file IO, nor hiding anything; as I already stated there is no UI nor server response, this is an autonomous process that just needs to be done as quickly as possible. Adding asynchrony wouldn't bring anything, as there is nothing else to do while waiting on the IO. Also, as already mentioned, no, I don't process items one by one; I need everything loaded in memory and I need the full result set. My requirements are not orthogonal at all; hell, if compiling the CLR from source were easy and legal in production, a two-character change would fit all my needsGearard
@Abbondanza Also, I never mentioned that I wanted an "easy to use" tool. But even with this issue I can't see anything more fitting, as I could switch to another language, fix that, and no longer fulfill all my other requirements that .NET fills perfectlyGearard
Then consider a feature request to the .Net core team if you want to stick to C#.Machado
Maybe the problem is you are trying to perform some operations on all elements at once. Have you considered "paging", e.g. using the Take() method?Trattoria
@Peuczyński Because I'm not processing the elements themselves, I'm doing something global on all of them at once; what I do past the query to format them requires every single byte of data from that query, at once, periodGearard

Currently the best I could find was switching to server GC (which changed nothing by itself), which has a larger segment size and let me use a much larger number for the no-GC section:

        GC.TryStartNoGCRegion(10000000000); // On Workstation GC this crashed with a much lower number, on server GC this works

It goes against my expectations (this is 10GB, yet from what I could find in the docs online my segment size in my current setup should be 1 to 4GB, so I expected an invalid argument).

With this setup I have what I wanted: the GC is on hold, I have 22GB allocated instead of 7, all the temporary objects aren't GCed, and the GC runs once (a single time!) over the whole batch process instead of many, many times per second (before the change, the GC view in Visual Studio looked like a straight line from all the individual dots of GC triggering).

This isn't great as it won't scale (adding a 0 leads to a crash), but it's better than anything else I found so far.

Unless anyone finds out how to increase the segment size so that I can push this further, or has a better alternative to completely halt the GC (and not just a certain generation but all of it), I will accept my own answer in a few days.
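
For completeness, here is roughly what the whole setup looks like on my side; the config snippet is the server GC switch mentioned in the comments, the 10GB budget is what happened to work for me, and LoadAndParse is just a placeholder for the LINQ to XML query from the question:

// app.config: switch to server GC, which has much larger segments
// <configuration>
//   <runtime>
//     <gcServer enabled="true"/>
//   </runtime>
// </configuration>

if (GC.TryStartNoGCRegion(10000000000)) // 10GB budget; workstation GC rejected anything close to this
{
    try
    {
        var items = LoadAndParse("myfile.xml"); // placeholder for the LINQ to XML query from the question
    }
    finally
    {
        // the runtime may already have left the region if the budget was exceeded,
        // so only end it if it is still active
        if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
            GC.EndNoGCRegion();
    }
}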

Gearard answered 23/5, 2016 at 3:22 Comment(3)
If those who downvoted made themselves known instead of just voting and going, it would help me improve the answer for future readers instead of, well, pondering what to do about it!Gearard
Some people's minds are not flexible enough to get into the details of why you want to prevent GC. They think you have chosen the wrong language, etc. I find it reasonable to stop GC in your case; don't bother too much about downvoters.Blitz
If it can make you feel better, I got downvoted without a comment when I shared my knowledge Q&A style #37310918, so I know how frustrating it can beTrattoria

I think the best solution in your case would be this piece of code I used in one of my projects some time ago:

var currentLatencySettings = GCSettings.LatencyMode;   
GCSettings.LatencyMode = GCLatencyMode.LowLatency;

//your operations

GCSettings.LatencyMode = currentLatencySettings;

You are suppressing as much as you can (to my knowledge), and you can still call GC.Collect() manually.

Look at the MSDN article here

Also, I would strongly suggest paging the parsed collection using the LINQ Skip() and Take() methods, and finally joining the output arrays, roughly as sketched below.
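
Something along these lines, where the batch size is made up and ParseItem is a hypothetical stand-in for the projection from the question:

// using System.Linq; using System.Xml.Linq;
const int pageSize = 100000; // illustrative batch size

var elements = XElement.Load("myfile.xml")
    .Element("a")
    .Elements("b")
    .ToList();

var items = Enumerable
    .Range(0, (elements.Count + pageSize - 1) / pageSize)
    .SelectMany(page => elements
        .Skip(page * pageSize)
        .Take(pageSize)
        .Select(pt => ParseItem(pt)) // ParseItem: hypothetical method doing the projection from the question
        .ToArray())                  // materialize one page at a time
    .ToArray();                      // join the pages into the final array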

Trattoria answered 19/5, 2016 at 13:57 Comment(3)
As mentioned in the comments, my issue isn't gen 2 collection but that I'm creating many small objects (so gen 0 and 1); I don't think this would have any effect, although as I mentioned I will try it (the server is busy on another task at the moment).Gearard
Awarded you the bounty as there were no more fitting answers, thanks!Gearard
Glad to hear that, sorry I couldn't be of more helpTrattoria

I am not sure whether it's possible in your case, but have you tried processing your XML file in parallel? If you can break the XML file down into smaller parts, you can spawn multiple processes from within your code, each process handling a separate file, and then combine all the results. This would certainly increase your performance, and since each process has its own memory allocation, it should also increase the total memory you can allocate at any given time while processing all the XML files.
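
For what it's worth, a minimal sketch of the split-and-combine idea; it uses PLINQ inside a single process rather than separate processes, and the part file names and the ParseItem helper are hypothetical:

// using System.Linq; using System.Xml.Linq;
// Hypothetical part files produced by splitting the original XML beforehand.
var partFiles = new[] { "part1.xml", "part2.xml", "part3.xml", "part4.xml" };

var items = partFiles
    .AsParallel()                    // parse each part concurrently
    .SelectMany(file => XElement.Load(file)
        .Element("a")
        .Elements("b")
        .Select(pt => ParseItem(pt)) // ParseItem: hypothetical stand-in for the projection from the question
        .ToArray())
    .ToArray();                      // combine all the results into one array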

Splenetic answered 23/5, 2016 at 5:6 Comment(8)
No, multiple processes wouldn't help; it's not an XML processor, I'm just loading the whole of it because I need all of it, as an in-memory object graph (the whole XML is exported just for that use case)Gearard
@RonanThibaudau XML is text, and text processing isn't exactly known for its speed and memory efficiency (just think about the redundancy level in the tags alone). I bet 80% of the memory is wasted on XElement internals, and thousands of instances of copies of the same strings, and most of your processing time is spent on parsing. If the XML file is exported just for this, perhaps you could replace it with a custom binary serialization format - that should change things quite a bit. Speaking from experience here - when you start to have largish files custom serialization pays off.Langelo
@RonanThibaudau Could you please provide some details on how you use that in-memory object graph?Splenetic
@LucasTrzesniewski I know the tradeoffs and I "do" want XML (the file must be human readable and clear, and the original graph from process A isn't in the same shape as the graph in process B, so serialization would still mean having to do the conversion; I chose to have that as a clear XML intermediary format). Note that the performance is "just fine" for my needs, and I don't care about memory being wasted, which is the whole point of this thread; I have plenty of memory, I'm happy wasting it, I'm not happy with the GC trying to save some of it at the expense of performanceGearard
@tarunjindal I can't post the code, if that's your question; if not, what kind of details would help? I can tell you everything is a single immediately materialized LINQ query loading a single file from XElement.Load, chaining into a select to select the appropriate elements / attributes and yet another select to convert those into their non-XML memory representation as a nested anonymous object graph.Gearard
@RonanThibaudau I understand, but the GC gets in your way because you're wasting memory, so wasting less of it will reduce the problem you're trying to solve. I'm skeptical about the need for a 12GB file to be human readable - but if that's really the case, then you could come up with a more efficient custom text-based storage format, and an optimized serializer. Are you sure the performance will stay "just fine" when you scale up?Langelo
@LucasTrzesniewski XML isn't that wasteful, and coming up with my own hierarchical format is not something I'm willing to allocate time for. XML has both human readability and great tool support, and it is fast enough for my needs. In my case the tags / XML parts take up a small part of the total text content, so I really doubt I could do better parsing and viewing than XML allows in a reasonable time frameGearard
@tarunjindal I added sample code that is very, very similar to what I do, not sure if that helpsGearard
