In C#/.NET, does a dynamic type take less space than object?

I have a console application that allows users to specify variables to process. These variables come in three flavors: string, double and long (with double and long being by far the most commonly used types). The user can specify whatever variables they like, in whatever order, so my system has to be able to handle that. To this end I had been storing these as object in my application and then casting/uncasting them as required. For example:

public class UnitResponse
{
    public object Value { get; set; }
}

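For example, a typical round trip looks something like this (the values here are made up for illustration):

UnitResponse response = new UnitResponse();
response.Value = 42.5;              // the double is boxed into a heap object
double d = (double)response.Value;  // reading it back requires a cast, which unboxes

response.Value = 123456789L;        // a long is boxed the same way
long l = (long)response.Value;      // casting to the wrong type throws InvalidCastException
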
My understanding was that boxed objects take up a bit more memory (about 12 bytes) than a standard value type.

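If you want to sanity-check that figure on your own runtime, a rough (and GC-noise-prone) measurement looks something like the sketch below; the exact overhead varies by runtime and bitness:

using System;

class BoxingFootprint
{
    static void Main()
    {
        const int N = 1000000;
        object[] boxes = new object[N];  // allocated up front so the array itself is in the baseline
        long before = GC.GetTotalMemory(true);
        for (int i = 0; i < N; i++)
            boxes[i] = (double)i;        // each assignment allocates one box
        long after = GC.GetTotalMemory(true);
        Console.WriteLine("~{0:F1} bytes per boxed double (8 of those are the double itself)",
            (after - before) / (double)N);
        GC.KeepAlive(boxes);
    }
}
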
My question is: would it be more efficient to use the dynamic keyword to store these values? Would it get around the boxing/unboxing issue, and if it is more memory-efficient, how would it impact performance?

EDIT

To provide some context, and to head off the "are you sure you're using enough RAM to worry about this" comments: in my worst case I have 420,000,000 datapoints to worry about (60 variables * 7,000,000 records). This is in addition to a bunch of other data I keep about each variable (including a few booleans, etc.), so reducing memory has a HUGE impact.

Sigridsigsmond answered 27/1, 2011 at 23:33 Comment(13)
Have you done any profiling? Is this boxing/unboxing actually a bottleneck? Is your program's RAM usage ballooning? Or are you micro-optimizing?Cioban
If there was such a simple way to avoid boxing as you seem to think, why do you think boxing exists in the first place?Johannisberger
@cdhowie: My program consumes copious amounts of RAM, in some cases 20-30GB. Reducing memory usage is imperative.Sigridsigsmond
@Timwi: dynamic didn't exist until .NET 4, so it came in well after boxing.Sigridsigsmond
I'd personally be keeping my seven million records in a database, and letting the SQL Server team worry about how to optimize memory usage. Is there some reason why that's not an option?Knickers
@Jeffrey: You missed my point. If there was a simple way to avoid boxing and achieve better memory and performance characteristics, then boxing would never have been invented. But it was, and for good reason. In order to deny this, you’d have to believe that they came up with some new way of doing it in C# 4.0 which they somehow didn’t think of earlier.Johannisberger
@Eric: Unfortunately there is a reason the records are kept in memory: performance. The program does donor imputation of invalid/inconsistent responses. For every unit that requires imputation, it randomly searches in a ripple pattern outward from the failed unit. Imagine a dataset with the population of Canada (36,000,000+ records), 25% of whom require imputation. That's a LOT of ripple searching. I have evaluated using a cache and an object-oriented database to store the records but it just seems too slow. Any other thoughts?Sigridsigsmond
@Timwi: that is exactly what I was hoping actually. I'm not very familiar with how types are managed in memory in dynamic languages but I was hoping that the best minds in the world had found a way to tackle the problem. My pappy always told me it never hurt to ask :)Sigridsigsmond
@Jeffrey: Is spending $2K to build a custom box with 24GB of RAM an option?Liselisetta
@Brian: We are running it on a virtual server right now, 4 CPUs with 32GB of RAM. We can upgrade to more if necessary.Sigridsigsmond
@Jeffrey: I think you missed Eric’s point. You say you need performance. Well guess what the optimiser in an SQL engine does. That’s right, it optimises for performance. And in my 5 years of working with MSSQL I have continually been impressed with its ability to do that. If you have 32GB RAM and you access the same 36 million records constantly, then the SQL optimiser will keep them in memory.Johannisberger
@Timwi: I apologize, but this is the second time you say I missed a point. I must be quite obtuse. Unfortunately, it is you who doesn't get it. MSSQL might be quite optimized (though that is very open to interpretation) but using it would open up a whole new bottleneck: moving data back and forth from SQL. The data I'm talking about is in object form, with various properties representing what has happened to the record. To use MSSQL as the backend I would need to use NHibernate or some other tool to map the objects. This is VERY slow ...Sigridsigsmond
... especially since NHibernate/MSSQL would be sending individual INSERT/UPDATE/SELECT statements on every random access. I have thought of using an OODB like db4o (mentioned in another comment) because they have better serializers (and caching like MSSQL). To test, I simply tried loading all of the data into the database in a single run. It took 8X longer than reading in from flat files. Add on top of that random access and writing and you have yourself quite a bottleneck.Sigridsigsmond

OK, so the real question here is "I've got a freakin' enormous data set that I am storing in memory, how do I optimize its performance in both time and memory space?"

Several thoughts:

  • You are absolutely right to hate and fear boxing. Boxing has big costs. First, yes, boxed objects take up extra memory. Second, boxed objects get stored on the heap, not on the stack or in registers. Third, they are garbage collected; every single one of those objects has to be interrogated at GC time to see if it contains a reference to another object, which it never will, and that's a lot of time on the GC thread. You almost certainly need to do something to avoid boxing.

Dynamic ain't it; it's boxing plus a whole lot of other overhead. (C#'s dynamic is very fast compared to other dynamic dispatch systems, but it is not fast or small in absolute terms).

It's gross, but you could consider using a struct whose layout shares memory between the various fields - like a union in C. Doing so is really, really gross and not at all safe, but it can help in situations like these. Do a web search for "StructLayoutAttribute"; you'll find tutorials. (There's a sketch of this after the list.)

  • Long, double or string, really? Can't be int, float or string? Is the data really either in excess of several billion in magnitude or accurate to 15 decimal places? Wouldn't int and float do the job for 99% of the cases? They're half the size.

Normally I don't recommend using float over double because it's a false economy; people often economise this way when they have ONE number, as if the saving of four bytes is going to make the difference. The difference between 42 million floats and 42 million doubles is considerable: about 168MB versus 336MB for the values alone.

  • Is there regularity in the data that you can exploit? For example, suppose that of your 42 million records there are only 100,000 actual distinct values for each long, 100,000 for each double, and 100,000 for each string. In that case, you can make an indexed storage of some sort for the longs, doubles and strings, and then each record gets an integer where the low bits are the index and the top two bits indicate which storage to get it out of. Now you have 42 million records each containing an int, and the values are stored away in some nicely compact form somewhere else. (This is sketched after the list as well.)

  • Store the booleans as bits in a byte; write properties to do the bit shifting to get 'em out. Save yourself several bytes per record that way. (Also sketched below.)

  • Remember that memory is actually disk space; RAM is just a convenient cache on top of it. If the data set is going to be too large to keep in RAM then something is going to page it back out to disk and read it back in later; that could be you or it could be the operating system. It is possible that you know more about your data locality than the operating system does. You could write your data to disk in some conveniently pageable form (like a b-tree) and be more efficient about keeping stuff on disk and only bringing it in to memory when you need it.

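To make the union idea concrete, here is a minimal sketch (the type and field names are invented; note that the CLR will not let a reference type like string overlap a value type, so strings have to go through an index into a separate table):

using System.Runtime.InteropServices;

// Long and Double share the same eight bytes; Kind records which
// interpretation (or which entry in a separate string table) is live.
[StructLayout(LayoutKind.Explicit)]
public struct Variant
{
    [FieldOffset(0)] public long Long;
    [FieldOffset(0)] public double Double;
    [FieldOffset(8)] public byte Kind;   // 0 = long, 1 = double, 2 = string index held in Long
}
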
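Similarly, a sketch of the indexed storage from the third bullet (hypothetical names; real code would also deduplicate values through a dictionary on insert):

using System.Collections.Generic;

// Each record stores one int handle: the top two bits say which pool
// the value lives in, the remaining 30 bits are an index into it.
public sealed class ValuePool
{
    const int TagShift = 30;
    const int IndexMask = (1 << TagShift) - 1;

    readonly List<long> longs = new List<long>();
    readonly List<double> doubles = new List<double>();
    readonly List<string> strings = new List<string>();

    public int AddLong(long v)     { longs.Add(v);   return (0 << TagShift) | (longs.Count - 1); }
    public int AddDouble(double v) { doubles.Add(v); return (1 << TagShift) | (doubles.Count - 1); }
    public int AddString(string v) { strings.Add(v); return (2 << TagShift) | (strings.Count - 1); }
    // (tag 2 makes the handle negative, which is harmless: the mask below strips it off)

    public long   GetLong(int handle)   { return longs[handle & IndexMask]; }
    public double GetDouble(int handle) { return doubles[handle & IndexMask]; }
    public string GetString(int handle) { return strings[handle & IndexMask]; }
}
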
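And the boolean packing from the fourth bullet (the flag names are invented for the example):

// Eight flags in a single byte instead of eight one-byte bools.
public struct RecordFlags
{
    byte bits;

    public bool RequiresImputation
    {
        get { return (bits & 1) != 0; }
        set { if (value) bits |= 1; else bits &= unchecked((byte)~1); }
    }

    public bool IsImputed
    {
        get { return (bits & 2) != 0; }
        set { if (value) bits |= 2; else bits &= unchecked((byte)~2); }
    }

    // ...and so on, up to eight flags per byte.
}
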
Knickers answered 29/1, 2011 at 16:51 Comment(5)
WOW! Thanks Eric, there are some great ideas on here. It's true, I may be able to get away with int and float rather than long and double, I'd have to check our data to ensure that this is a possibility (we do have at least one identifier that requires a long but it may not be used in the data). Given my limited space I will answer your questions in different comments:Sigridsigsmond
1. We do have some regularity in the data we could exploit; the most common value we work with is a coded value where there are often fewer than 100 distinct values. We could save a lot of space hereSigridsigsmond
2. Storing the booleans in a byte is a genius idea; that should save some space and be easy to accommodate.Sigridsigsmond
3. The trouble with paging is the "ripple-search" algorithm we use. As soon as I get a failed unit that lands on a page boundary when I ripple search around it I get a lot of page thrashing. I have thought of using an LRU cache to get around this (since it would stop the thrashing by hitting the cache) but my problem has always been what do I do with the stuff that expires from the cache if I need to load it back into memory? I do have a defined search area so I suppose I could just make the cache big enough that I don't need to reload anything back into memory at all, ... food for thoughtSigridsigsmond
Oh, the StructLayoutAttribute looks interesting but, as you say, quite ugly. I'd consider it, but only as a last resort.Sigridsigsmond

I think you might be looking at the wrong thing here. Remember what dynamic does. It starts the compiler again, in process, at runtime. It loads hundreds of thousands of bytes of code for the compiler, and then at every call site it emits caches that contain the results of the freshly-emitted IL for each dynamic operation. You're spending a few hundred thousand bytes in order to save eight. That seems like a bad idea.

And of course, you don't save anything. "dynamic" is just "object" with a fancy hat on. "Dynamic" objects are still boxed.

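A toy test makes the boxing visible (nothing official, just something you can run yourself):

using System;

class DynamicIsBoxed
{
    static void Main()
    {
        dynamic d = 42.0;   // the double is boxed, exactly as it would be for object
        object o = d;       // no conversion happens here; the reference is copied

        Console.WriteLine(d.GetType());           // System.Double
        Console.WriteLine(ReferenceEquals(d, o)); // True: one and the same box
    }
}
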
Knickers answered 27/1, 2011 at 23:44 Comment(2)
I didn't know that dynamic objects were boxed underneath. After I read this I did some quick tests with a sample project and saw that they were boxed underneath. Is there anywhere it indicates this in the documentation? Thanks!Sigridsigsmond
@Jeffrey: I refer you to section 4.7 of the C# 4 specification, which states "The type dynamic is indistinguishable from object at run-time."Knickers

No. dynamic has to do with how operations on the object are performed, not how the object itself is stored. In this particular context, value types would still be boxed.

Also, is all of this effort really worth 12 bytes per object? Surely there's a better use for your time than saving a few kilobytes (if that) of RAM? Have you proved that RAM usage by your program is actually an issue?

Cioban answered 27/1, 2011 at 23:39 Comment(2)
12 bytes per object seems small, but when you have 420,000,000 of them (as I do in some scenarios) the difference becomes significant. Add to that that for each datapoint (12 bytes per object) I need to keep several boolean values and some references, and you have a ton of memory. In some tests we have already exceeded 8GB of RAM used.Sigridsigsmond
Ok, just checking. If this is the case then you should consider using generics or strongly-typed data structures instead.Cioban

No. dynamic will simply store it as an object.

Chances are this is a micro-optimization that will provide little to no benefit. If this really does become an issue, there are other mechanisms you can use (generics) to speed things up.

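For example, a generic version of the container from the question keeps the value unboxed (a sketch; it only helps where the element type is known at the use site):

// Generic counterpart of UnitResponse: the double or long lives inline,
// unboxed. Mixing types in one collection then requires a non-generic
// base interface or separate collections per type.
public class UnitResponse<T>
{
    public T Value { get; set; }
}

// UnitResponse<double> r = new UnitResponse<double> { Value = 42.5 };  // no box
// double d = r.Value;                                                  // no cast, no unbox
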
Cultch answered 27/1, 2011 at 23:38 Comment(0)
