Large String in Large Object Heap causes issues - but in any case it has to end up as a String
I am following up from this question here

The problem is that I have some large objects, mainly Strings, coming from an MSMQ queue. I have narrowed down my memory problems to these objects being created in the Large Object Heap (LOH) and therefore fragmenting it (confirmed with some help from the profiler).

In the question linked above I got some workarounds, mainly in the form of splitting the String up into char arrays, which I did.

The problem I am facing is that at the end of the string processing (in whatever form that is) I need to send that string to another system which I have no control over. So I was thinking of the following solutions to avoid having this String placed in the LOH:

  1. Represent it as an array of char arrays, each smaller than 85,000 bytes (the size threshold above which objects are placed in the LOH).
  2. Compress it on the sender end (i.e. before it is received by the system we are talking about here, which is the receiver) and decompress it only just before passing it to the third-party system.
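A minimal sketch of option 1, assuming the text can be read through a `TextReader` instead of being materialized as one String first. The class name `ChunkedText`, the method `ReadChunks`, and the 40,000-char chunk size are illustrative choices, not part of the original code. Each chunk is 40,000 chars * 2 bytes = 80,000 bytes, which stays under the 85,000-byte LOH threshold.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ChunkedText
{
    // Hypothetical chunk size: 40,000 chars * 2 bytes = 80,000 bytes,
    // which is below the 85,000-byte LOH threshold.
    private const int ChunkChars = 40000;

    // Reads everything from the reader into a list of small char arrays.
    public static List<char[]> ReadChunks(TextReader reader)
    {
        var chunks = new List<char[]>();
        var buffer = new char[ChunkChars];
        int read;
        while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            // Copy only the chars actually read, so the last chunk is trimmed.
            var chunk = new char[read];
            Array.Copy(buffer, chunk, read);
            chunks.Add(chunk);
        }
        return chunks;
    }

    static void Main()
    {
        // Demo only: the StringReader stands in for the real message stream.
        // (The demo string itself is large; real code would read from a stream.)
        using (var reader = new StringReader(new string('x', 100000)))
        {
            List<char[]> chunks = ReadChunks(reader);
            Console.WriteLine(chunks.Count); // 40000 + 40000 + 20000 chars
        }
    }
}
```

This only keeps the data out of the LOH while it is held as chunks; it does not help at the point where a single String has to be handed over.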

Whatever I do - one way or another - the String will have to be complete (no char arrays or compressed).

Am I stuck here? I am wondering whether using a managed environment was a mistake here and whether we should bite the bullet and go for a C++ kind of environment.

Thanks, Yannis

EDIT: I have narrowed down the problem to exactly the code posted here

The large string that comes through is placed in the LOH. I have removed every single processing module from the point where I receive the message onwards, and the memory consumption trend remains the same.

So I guess I need to change the way this WorkContext is passed around between systems.

Jem answered 17/10, 2011 at 7:25 Comment(18)
Should you really be sending messages that are that large? - Playbill
How are you sending the string to the other system? Can't you use streams? Also, using C++ might not help you, since its heap can get fragmented too. - Affricative
Just to be sure... Have you tried using the server GC? https://mcmap.net/q/1470585/-c-gc-for-server/… - Vidette
Could you stream the 'string' to the other system? Streaming would avoid having to have it in one continuous block of memory, which is why it ends up in the LOH. - Greenman
I'll add that, reading this connect.microsoft.com/VisualStudio/feedback/details/521147/… they say they already "solved" this problem in 4.0 :-) - Vidette
I think the key here is svick's question. If you need to get a fully assembled string, then you need to get a fully assembled string - if you can avoid that, great! - Philipphilipa
Mitch - I don't think that Strings of 100-120k are considered large for message queuing - I might be wrong though. xanatos - I have tried the server GC (which is enabled by default on multi-processor machines like mine). svick - If I choose to stream the String, what intermediate infrastructure or system do you think I can use between the sender and the receiver instead of the current MSMQ? - Jem
I'll add that it's probably even worse than you are depicting here: your string is UTF-16. But your web service will probably transmit/receive a UTF-8 version of it, so you convert to a string and the string is converted to a byte array :-) - Vidette
@Vidette - and that's why I find large byte[] in the LOH that I couldn't figure out the origin of - thanks for this. - Jem
@Jem I repeat my question: have you tried the server GC? It's much better for server apps. Much much better. - Vidette
Yeah, I have when profiling - it didn't make any difference. Can you explain maybe how the server GC works with regards to objects in the LOH? - Jem
@Jem There isn't any exact specification on how it works. But it's normally "better" as a GC for server load. And you said you had differences between your computer and the server. This can be: A) one of the two is 32 bits and the other is 64 bits or B) different GC. - Vidette
How do you send this? Over the network, unmanaged? A string that's larger than "85k" seems like a lot. You could try the StringBuilder when building the string, but I guess that doesn't help you. - Tharp
Added a comment in the original post that illustrates the problem and the message mechanism. - Jem
You know that in Message message = new Message(); message = _Queue.Receive(); the first new Message() is useless? One less object! - Vidette
Yeah :) I don't know how that got there :) - Jem
I'll ask a stupid question... You know that Message is IDisposable, right? And you know the using pattern, right? - Vidette
@Vidette - the code I posted isn't actually copy/paste. It clearly contains other bits that I have omitted for the sake of simplicity. But yes, I do understand what IDisposable does and I am using it in the actual part of the code that does the work. - Jem

You could perhaps implement a class (call it LargeString) that reuses previously allocated strings and keeps a small collection of them.

Since strings are normally immutable, you'd have to do every change and new assignment by unsafe pointer juggling. After passing a string to the receiver, you'd need to manually mark it as free for reuse. Different message lengths might also be a problem, unless the receiver can cope with messages that are too long, or you have a collection of strings of every length.

Probably not a great idea, but maybe beats rewriting everything in C++.

Quitclaim answered 17/10, 2011 at 7:57 Comment(2)
Thanks Jens - The problem is this: The worker node gets work from a queue (example here: pastebin.com/j0VTVrjK). By the time I assign the LargeString (good idea!) I would have already allocated the WorkContext String. Or are you suggesting to change that to a char[] or something so that it doesnt go to the LOH and then re-use the LargeString structure?Jem
@Yannis: You'd need to avoid assigning your long messages to strings anywhere, of course. I think both of your suggested solutions would work. You could use LargeString just to pass the result to the other system.Quitclaim

Well, your options depend on how the 3rd party system receives data. If you can stream to it somehow, then you don't have to have it all in memory in one go. If that is the case, then compressing (which will probably really help your network load if it's easily compressible data) is great, as you can decompress through a stream and punt it to the 3rd party system in chunks.

The same of course would work if you split your strings up to stay below the LOH threshold.

If not, then I would still advocate splitting the payload on the MSMQ message, and then using a memory pool of preallocated and reused byte arrays for the re-assembly before sending it to the client. Microsoft has an implementation you can use: http://msdn.microsoft.com/en-us/library/system.servicemodel.channels.buffermanager.aspx
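The streaming decompression described above might look roughly like the following sketch. It assumes the payload was GZip-compressed by the sender and that the 3rd party system can be written to as a `Stream`; the class and method names are hypothetical, and `MemoryStream` stands in for both ends in the demo.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class StreamingDecompress
{
    // Decompresses in 8 KB steps, so no single large array or string
    // is ever allocated for the whole payload.
    public static void DecompressTo(Stream compressed, Stream destination)
    {
        var buffer = new byte[8192];
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress))
        {
            int read;
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                destination.Write(buffer, 0, read);
            }
        }
    }

    static void Main()
    {
        // Demo only: compress a small payload in memory, as the sender would.
        byte[] payload = Encoding.UTF8.GetBytes("example work context");
        var compressed = new MemoryStream();
        using (var gzip = new GZipStream(compressed, CompressionMode.Compress, true))
        {
            gzip.Write(payload, 0, payload.Length);
        }
        compressed.Position = 0;

        // The MemoryStream here stands in for the 3rd party system's stream.
        var received = new MemoryStream();
        DecompressTo(compressed, received);
        Console.WriteLine(Encoding.UTF8.GetString(received.ToArray()));
    }
}
```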
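A rough sketch of the pooled re-assembly idea, using `BufferManager` from `System.ServiceModel.Channels` (this needs a reference to the System.ServiceModel assembly). The pool sizes, the `PooledReassembly` class, and the `send` callback are assumptions made for illustration; the real hand-off to the client would replace the callback.

```csharp
using System;
using System.ServiceModel.Channels;

static class PooledReassembly
{
    // Assumed limits: up to 4 MB pooled in total, buffers of up to 256 KB each.
    // Requests larger than the second value are allocated without pooling.
    private static readonly BufferManager Pool =
        BufferManager.CreateBufferManager(4 * 1024 * 1024, 256 * 1024);

    // Copies the message parts into one pooled buffer, hands it to the
    // (hypothetical) send callback, then returns the buffer to the pool.
    public static void Reassemble(byte[][] parts, Action<byte[], int> send)
    {
        int total = 0;
        foreach (byte[] part in parts)
        {
            total += part.Length;
        }

        // The returned buffer may be larger than requested, so the
        // valid length is passed along separately.
        byte[] buffer = Pool.TakeBuffer(total);
        try
        {
            int offset = 0;
            foreach (byte[] part in parts)
            {
                Buffer.BlockCopy(part, 0, buffer, offset, part.Length);
                offset += part.Length;
            }
            send(buffer, total);
        }
        finally
        {
            Pool.ReturnBuffer(buffer);
        }
    }
}
```

The pooled buffers are still large objects, but because they are reused rather than reallocated per message, they should put far less pressure on the LOH.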

The final option I can think of, is to handle the msmq deserialisation in unmanaged code in C++ and create your own custom large block memory pool using placement new to deserialise the strings into that. You could keep it relatively simple by ensuring your pool buffers are sufficient for the longest message possible rather than trying to be clever and dynamic which is hard.

Gibby answered 21/10, 2011 at 0:29 Comment(0)

You can try streaming the values using a StringBuilder (the 4.0 version that uses a rope-like implementation).

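This example must be built in Release mode and run with Start Without Debugging (Ctrl+F5). Both Debug mode and Start Debugging interfere too much with the GC.

// Requires a reference to the System.Messaging assembly.
using System;
using System.Messaging;
using System.Text;
using System.Xml;

public class SerializableWork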
{
    // This is very often between 100-120k bytes. This is actually a String - not just for the purposes of this example
    public String WorkContext { get; set; }

    // This is quite large as well but usually less than 85k bytes. This is actually a String - not just for the purposes of this example
    public String ContextResult { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Initial memory: {0}", GC.GetTotalMemory(true));
        var sw = new SerializableWork { WorkContext = new string(' ', 1000000), ContextResult = new string(' ', 1000000) };
        Console.WriteLine("Memory with objects: {0}", GC.GetTotalMemory(true));

        using (var mq = new MessageQueue(@".\Private$\Test1"))
        {
            mq.Send(sw);
        }

        sw = null;

        Console.WriteLine("Memory after collect: {0}", GC.GetTotalMemory(true));

        using (var mq = new MessageQueue(@".\Private$\Test1"))
        {
            StringBuilder sb1, sb2;

            using (var msg = mq.Receive())
            {
                Console.WriteLine("Memory after receive: {0}", GC.GetTotalMemory(true));

                using (var reader = XmlReader.Create(msg.BodyStream))
                {
                    reader.ReadToDescendant("WorkContext");
                    reader.Read();

                    sb1 = ReadContentAsStringBuilder(reader);

                    reader.ReadToFollowing("ContextResult");
                    reader.Read();

                    sb2 = ReadContentAsStringBuilder(reader);

                    Console.WriteLine("Memory after creating sb: {0}", GC.GetTotalMemory(true));
                }
            }

            Console.WriteLine("Memory after freeing mq: {0}", GC.GetTotalMemory(true));

            GC.KeepAlive(sb1);
            GC.KeepAlive(sb2);
        }

        Console.WriteLine("Memory after final collect: {0}", GC.GetTotalMemory(true));
    }

    private static StringBuilder ReadContentAsStringBuilder(XmlReader reader)
    {
        var sb = new StringBuilder();
        char[] buffer = new char[4096];

        int read;

        while ((read = reader.ReadValueChunk(buffer, 0, buffer.Length)) != 0)
        {
            sb.Append(buffer, 0, read);
        }

        return sb;
    }
}

I read the Message.BodyStream of the message directly with an XmlReader, then move to the elements I need and read the data in chunks using XmlReader.ReadValueChunk.

In the end, I never use string objects anywhere. The only big block of memory is the Message.
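If the receiving side can accept a `TextWriter` (an assumption, since the question does not say how the third-party system is called), the StringBuilder contents could be forwarded in chunks without ever calling ToString(). A minimal sketch, with hypothetical names:

```csharp
using System;
using System.IO;
using System.Text;

static class StringBuilderStreaming
{
    // Copies the StringBuilder to the writer 4096 chars at a time,
    // so no full-length string is created.
    public static void WriteTo(StringBuilder source, TextWriter writer)
    {
        var buffer = new char[4096];
        for (int index = 0; index < source.Length; index += buffer.Length)
        {
            int count = Math.Min(buffer.Length, source.Length - index);
            source.CopyTo(index, buffer, 0, count);
            writer.Write(buffer, 0, count);
        }
    }

    static void Main()
    {
        var sb = new StringBuilder();
        sb.Append('a', 10000);

        // Demo only: the StringWriter stands in for the real destination.
        var writer = new StringWriter();
        WriteTo(sb, writer);
        Console.WriteLine(writer.GetStringBuilder().Length); // 10000
    }
}
```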

Vidette answered 21/10, 2011 at 8:53 Comment(0)