MemoryStream and Large Object Heap

I have to transfer large files between computers over unreliable connections using WCF.

Because I want to be able to resume a transfer and I don't want WCF to limit my file size, I am chunking the files into 1 MB pieces. These chunks are transported as streams, which works quite nicely so far.

My steps are:

  1. open filestream
  2. read chunk from file into byte[] and create memorystream
  3. transfer chunk
  4. back to 2. until the whole file is sent

My problem is in step 2. I assume that when I create a memory stream from a byte array, it will end up on the LOH and ultimately cause an OutOfMemoryException. I have not actually been able to reproduce this error, so maybe my assumption is wrong.

Now, I don't want to send the byte[] in the message, as WCF will tell me the array size is too big. I can change the max allowed array size and/or the size of my chunk, but I hope there is another solution.

My actual question(s):

  • Will my current solution create objects on the LOH, and will that cause me problems?
  • Is there a better way to solve this?

Btw.: On the receiving side I simply read smaller chunks from the arriving stream and write them directly into the file, so no large byte arrays are involved.
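For reference, the receiving side described above could look roughly like the sketch below. The names ChunkReceiver and AppendToFile and the 16 KB buffer size are assumptions made for illustration; they are not taken from the actual service code.

```csharp
using System;
using System.IO;

public static class ChunkReceiver
{
    // Appends everything from 'incoming' to the file at 'path' using one small,
    // reused buffer. 16 KB is far below the 85,000-byte LOH threshold.
    public static long AppendToFile(Stream incoming, string path, int bufferSize = 16 * 1024)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        using (FileStream target = new FileStream(path, FileMode.Append, FileAccess.Write))
        {
            int read;
            // Read returns 0 only at the end of the stream.
            while ((read = incoming.Read(buffer, 0, buffer.Length)) > 0)
            {
                target.Write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }
}
```

Calling ChunkReceiver.AppendToFile(arrivingStream, targetPath) once per received chunk keeps the memory footprint at a single small buffer per call.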

Edit:

current solution:

for (int i = resumeChunk; i < chunks; i++)
{
    byte[] buffer = new byte[chunkSize];
    fileStream.Position = i * chunkSize;
    int actualLength = fileStream.Read(buffer, 0, (int)chunkSize);
    Array.Resize(ref buffer, actualLength);
    using (MemoryStream stream = new MemoryStream(buffer))
    {
        UploadFile(stream);
    }
}
Debouch answered 12/5, 2010 at 13:15 Comment(0)
B
41

I hope this is okay. It's my first answer on StackOverflow.

Yes, absolutely: if your chunk size is 85,000 bytes or more, the array will be allocated on the large object heap. You will probably not run out of memory very quickly, because you are allocating and deallocating contiguous areas of memory that are all the same size, so when memory fills up the runtime can fit a new chunk into an old, reclaimed memory area.
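If you want to check this on your own machine, a small probe like the one below may help. It relies on GC.GetGeneration reporting large-object-heap objects as generation 2; the class and method names are made up for this sketch.

```csharp
using System;

public static class LohProbe
{
    // Allocates a byte array of the given length and reports its GC generation.
    // Objects on the large object heap are reported as generation 2, while a
    // freshly allocated small object normally starts in generation 0.
    public static int GenerationOfNewArray(int length)
    {
        byte[] array = new byte[length];
        return GC.GetGeneration(array);
    }
}
```

With 1 MB chunks, LohProbe.GenerationOfNewArray(1024 * 1024) should report generation 2, while a 1 KB array will typically report generation 0.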

I would be a little worried about the Array.Resize call, as that will create another array (see http://msdn.microsoft.com/en-us/library/1ffy6686(VS.80).aspx). This is an unnecessary step if actualLength == chunkSize, as it will be for all but the last chunk. So I would, as a minimum, suggest:

if (actualLength != chunkSize) Array.Resize(ref buffer, actualLength);

This should remove a lot of allocations. If actualLength is not the same as chunkSize but is still over 85,000 bytes, then the new array will also be allocated on the large object heap, potentially causing it to fragment and possibly causing apparent memory leaks. I believe it would still take a long time to actually run out of memory, as the leak would be quite slow.

I think a better implementation would be to use some kind of buffer pool to provide the arrays. You could roll your own (it would be too complicated) but WCF does provide one for you. I have rewritten your code slightly to take advantage of that:

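// BufferManager lives in System.ServiceModel.Channels.
// Pool up to 10 chunks; no pooled buffer is larger than one chunk.
BufferManager bm = BufferManager.CreateBufferManager(chunkSize * 10, (int)chunkSize);

for (int i = resumeChunk; i < chunks; i++)
{
    // TakeBuffer may return an array longer than requested, so track actualLength.
    byte[] buffer = bm.TakeBuffer((int)chunkSize);
    try
    {
        fileStream.Position = (long)i * chunkSize;
        int actualLength = fileStream.Read(buffer, 0, (int)chunkSize);
        if (actualLength == 0) break;
        //Array.Resize(ref buffer, actualLength);
        using (MemoryStream stream = new MemoryStream(buffer))
        {
            UploadFile(stream, actualLength);
        }
    }
    finally
    {
        // Always hand the buffer back so the next iteration can reuse it.
        bm.ReturnBuffer(buffer);
    }
}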

This assumes that the implementation of UploadFile can be rewritten to take an int for the number of bytes to write.
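If changing the UploadFile signature is not an option, one possible alternative is to wrap only the filled part of the pooled buffer, using the MemoryStream constructor that takes an offset and a count. The helper below is a sketch with made-up names (ChunkStreams, Wrap), not part of WCF:

```csharp
using System;
using System.IO;

public static class ChunkStreams
{
    // Exposes only the first 'actualLength' bytes of a (possibly larger) pooled
    // buffer as a read-only stream. No bytes are copied, so the buffer must not
    // be returned to the pool until the stream has been fully consumed.
    public static MemoryStream Wrap(byte[] buffer, int actualLength)
    {
        return new MemoryStream(buffer, 0, actualLength, writable: false);
    }
}
```

In the loop above this would replace new MemoryStream(buffer) with ChunkStreams.Wrap(buffer, actualLength), so the stream's Length already matches the number of bytes read.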

I hope this helps

joe

Barry answered 18/5, 2010 at 13:29 Comment(5)
Thanks a lot! This is seriously a brilliant answer. Thanks for pointing out the Array.Resize issue. I also had never heard of the BufferManager; it sounds like it will help me a lot in other areas, too. This is a lot more than I expected, so I thought about starting a small bounty and giving it to you, but I have to wait 23h after starting a bounty... So you have to wait, too :) - Debouch
Thanks for that. I'm really glad I can be of help. Let me know if there's anything else. Looking back at it, it might be worth pointing out that the optimum implementation would share a single instance of the BufferManager across the whole service. I don't know how practical that would be for you. - Barry
+1 Just stumbled across this one looking for an answer to a similar issue. Never heard of BufferManager before - awesome! Guess this will be something to remember for the future. - Brake
I think you can also do this: var stream = new MemoryStream(buffer, 0, actualLength) - Quackery
I think the "if (actualLength != chunkSize)" check is unnecessary. From the Array.Resize documentation: "If newSize is equal to the Length of the old array, this method does nothing." - Iphlgenia
P
10

See also RecyclableMemoryStream. From this article:

Microsoft.IO.RecyclableMemoryStream is a MemoryStream replacement that offers superior behavior for performance-critical systems. In particular it is optimized to do the following:

  • Eliminate Large Object Heap allocations by using pooled buffers
  • Incur far fewer gen 2 GCs, and spend far less time paused due to GC
  • Avoid memory leaks by having a bounded pool size
  • Avoid memory fragmentation
  • Provide excellent debuggability
  • Provide metrics for performance tracking
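A minimal usage sketch, assuming the Microsoft.IO.RecyclableMemoryStream NuGet package is referenced. The class name PooledChunks, the method FromBytes and the "chunk-upload" tag are made up for illustration:

```csharp
using System;
using System.IO;
using Microsoft.IO; // NuGet package: Microsoft.IO.RecyclableMemoryStream

public static class PooledChunks
{
    // One manager for the whole process; it owns the pooled blocks that
    // the streams borrow.
    private static readonly RecyclableMemoryStreamManager Manager =
        new RecyclableMemoryStreamManager();

    // Copies 'count' bytes into a pooled stream and rewinds it for reading.
    // Disposing the returned stream hands its blocks back to the pool.
    public static MemoryStream FromBytes(byte[] source, int count)
    {
        MemoryStream stream = Manager.GetStream("chunk-upload"); // the tag is an arbitrary label
        stream.Write(source, 0, count);
        stream.Position = 0;
        return stream;
    }
}
```

Note that this copies the chunk into pooled blocks rather than wrapping an existing array, so it is most useful when the data can be written into the stream directly instead of going through a large intermediate byte[].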
Placid answered 4/3, 2017 at 17:53 Comment(0)
S
2

I'm not so sure about the first part of your question, but as for a better way: have you considered BITS? It allows background downloading of files over HTTP. You can provide it an http:// or file:// URI. It can resume from the point at which it was interrupted, and it downloads in chunks of bytes using the Range header in its HTTP requests. It is used by Windows Update. You can subscribe to events that give information on progress and completion.

Struble answered 12/5, 2010 at 13:31 Comment(2)
Thanks for your suggestion, but I don't want to install IIS on every machine. - Debouch
No problem, just a thought. Just to point out you only need IIS on every machine if they are uploading. If the clients are only downloading using BITS then they do not require IIS. - Struble
D
1

I have come up with another solution for this, let me know what you think!

Since I don't want to keep large amounts of data in memory, I was looking for an elegant way to temporarily store byte arrays or a stream.

The idea is to create a temp file (you don't need specific rights to do this) and then use it like a memory stream. Overriding Dispose lets the class clean up the temp file after it has been used.

public class TempFileStream : Stream
{
    private readonly string _filename;
    private readonly FileStream _fileStream;

    public TempFileStream()
    {
        // GetTempFileName creates an empty file and returns its path.
        this._filename = Path.GetTempFileName();
        this._fileStream = File.Open(this._filename, FileMode.OpenOrCreate, FileAccess.ReadWrite);
    }

    public override bool CanRead
    {
        get
        {
            return this._fileStream.CanRead;
        }
    }

    // and so on with wrapping the stream to the underlying filestream

    ...

    // finally override the Dispose method and remove the temp file
    protected override void Dispose(bool disposing)
    {
        base.Dispose(disposing);

        if (disposing)
        {
            // Dispose also closes the stream, so a separate Close() is not needed.
            this._fileStream.Dispose();

            try
            {
                File.Delete(this._filename);
            }
            catch (Exception)
            {
                // if something goes wrong while deleting the temp file we can ignore it.
            }
        }
    }
}
Debouch answered 3/7, 2012 at 8:39 Comment(0)
