If I use a MemoryStream for storing a half-gigabyte chunk of data and then discard it, what long-term effects will it have?

In my Azure role, running C# code inside a 64-bit process, I want to download a ZIP file and unpack it as fast as possible. I figured I could do the following: create a MemoryStream instance, download into it, pass the stream to some ZIP handling library for unpacking, and discard the stream once unpacking is done. This way I would avoid the write-read-write sequence that otherwise performs a lot of unnecessary disk I/O.
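A minimal modern C# sketch of that plan (the URL and target directory are placeholders; HttpClient and ZipArchive are one possible choice of APIs, not necessarily what the asker had in mind):

```csharp
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

class InMemoryUnzip
{
    static async Task Main()
    {
        var url = "https://example.com/package.zip";   // hypothetical URL
        var targetDir = @"C:\unpacked";                // hypothetical directory

        using var http = new HttpClient();

        // Buffer the whole ZIP in memory; at ~500 MB the backing array
        // lives on the large object heap.
        using var buffer = new MemoryStream();
        using (var download = await http.GetStreamAsync(url))
        {
            await download.CopyToAsync(buffer);
        }
        buffer.Position = 0;

        // ZipArchive needs a seekable stream, which MemoryStream provides,
        // so nothing touches the disk until extraction.
        using var zip = new ZipArchive(buffer, ZipArchiveMode.Read);
        zip.ExtractToDirectory(targetDir);
    }
}
```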

However, I've read that MemoryStream is backed by an array, and at half a gigabyte that array will definitely be considered a "large object" and will be allocated on the large object heap (LOH), which is not compacted during garbage collection. This makes me worried that this usage of MemoryStream will fragment the process memory and have negative long-term effects.
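One related detail: if the final size is known up front (say, from a Content-Length header; the value below is hypothetical), presizing the stream avoids MemoryStream's growth policy, which doubles the backing array as data arrives and would otherwise strand a trail of abandoned large arrays on the LOH:

```csharp
// Assumed known in advance, e.g. from the response's Content-Length.
long contentLength = 512L * 1024 * 1024;

// A single LOH allocation instead of a series of doubling buffers.
var buffer = new System.IO.MemoryStream(capacity: (int)contentLength);
```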

Will this likely have any long-term negative effects on my process?

Marisolmarissa asked 9/8, 2012 at 11:16
Can't you send the "download stream" directly to the ZIP handling library? – Helbonnas
@Magnus: I guess not; AFAIK ZIP requires non-sequential reads while unpacking. – Marisolmarissa
How often will you need to read large ZIP files? – Serrell
@devundef: Each time an Azure instance starts. Unpacking currently takes about two minutes, and that time is wasted since the instance is not doing anything useful during it. – Marisolmarissa
So array pooling is not an option. The LOH has been improved in CLR 4.0: infoq.com/news/2011/10/loh-net-gc – Serrell

The answer is in the accepted answer to the question you linked to. Thanks for providing the reference.

The real problem is the assumption that a program should be allowed to consume all virtual memory at any time; that problem disappears completely when the code runs on a 64-bit operating system.

I would say that if this is a 64-bit process, you have nothing to worry about.

The hole that is created only fragments the virtual address space of the LOH, and fragmentation there isn't a big problem for you. In a 64-bit process, any whole pages wasted due to fragmentation simply become unused, and the physical memory they were mapped to becomes available again to back a new page. Very few partial pages will be wasted, because these are large allocations. And locality of reference (the other advantage of defragmentation) is mostly preserved, again because these are large allocations.
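If LOH fragmentation ever did become measurable, note that .NET Framework 4.5.1 and later (newer than this thread) expose an opt-in, one-shot compaction of the LOH:

```csharp
using System;
using System.Runtime;

class LohCompaction
{
    static void Main()
    {
        // One-shot, opt-in LOH compaction (added in .NET Framework 4.5.1).
        // The setting resets to Default after the compaction happens.
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();   // the next blocking full GC compacts the LOH
    }
}
```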

Dawson answered 9/8, 2012 at 11:35
By the way, I still don't think this is the right solution to your original problem. You ought to be able to decompress the ZIP on the fly. Doesn't the download give you a stream? And you can decompress from a stream to a stream (see the sketch after these comments). – Dawson
Are you aware of a library that can unpack a ZIP in one read and has a reasonably permissive license? – Marisolmarissa
I'm coming from the Java world and I don't know all the equivalents, but I know you can read a gzip stream in one pass. Any chance you could switch? Are there lots of files in the ZIP? Do you need to read all of them? – Dawson
There's a whole tree of files, and that's why I need ZIP; if it were a single file I would just store it as is. – Marisolmarissa
If you don't even need compression, I recommend tar. You'll still need a third-party lib. Or you could make your own archive format: <filename>\0<file_length_long><file_bytes><filename>... (a reader sketch follows these comments). – Dawson
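For the one-pass idea raised in these comments, here is a minimal sketch assuming the payload were a single gzip-compressed file rather than a ZIP archive (URL and output path are hypothetical). Bytes flow network → decompressor → disk, with no full in-memory buffer and hence no large array on the LOH:

```csharp
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Threading.Tasks;

class StreamingUnpack
{
    static async Task Main()
    {
        using var http = new HttpClient();

        // GetStreamAsync returns a forward-only network stream, which is
        // all GZipStream needs: no seeking, no buffering of the whole file.
        using var network = await http.GetStreamAsync("https://example.com/data.gz");
        using var gzip = new GZipStream(network, CompressionMode.Decompress);
        using var output = File.Create(@"C:\unpacked\data.bin");
        await gzip.CopyToAsync(output);
    }
}
```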
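And a sketch of a single-pass reader for the ad-hoc format proposed in the last comment. The framing details (ASCII file names stored NUL-terminated, little-endian Int64 lengths) are assumptions for illustration, not from the thread, and this is not a real library API:

```csharp
using System;
using System.IO;
using System.Text;

static class AdHocArchive
{
    // Reads <filename>\0<file_length_long><file_bytes>... records until EOF.
    public static void Unpack(Stream input, string targetDir)
    {
        int b;
        var name = new StringBuilder();
        while ((b = input.ReadByte()) != -1)          // -1 means clean end of archive
        {
            name.Clear();
            for (; b > 0; b = input.ReadByte())       // filename is NUL-terminated
                name.Append((char)b);

            var lengthBytes = new byte[8];
            ReadExactly(input, lengthBytes);
            long remaining = BitConverter.ToInt64(lengthBytes, 0);

            // Copy exactly 'remaining' bytes to the output file.
            using var output = File.Create(Path.Combine(targetDir, name.ToString()));
            var buffer = new byte[81920];
            while (remaining > 0)
            {
                int n = input.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                if (n == 0) throw new EndOfStreamException();
                output.Write(buffer, 0, n);
                remaining -= n;
            }
        }
    }

    static void ReadExactly(Stream input, byte[] dest)
    {
        int offset = 0;
        while (offset < dest.Length)
        {
            int n = input.Read(dest, offset, dest.Length - offset);
            if (n == 0) throw new EndOfStreamException();
            offset += n;
        }
    }
}
```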
