Efficient code: short vs integer data types in VB.Net

I'm writing an application where performance is fairly critical. I'm a bit confused as to which is the most efficient data type for x64 CPUs.

MSDN says that "In some cases, the common language runtime can pack your Short variables closely together and save memory consumption," but also that "The Integer data type provides optimal performance on a 32-bit processor."

I'm using a huge amount of data (on average around 5 million values in a jagged array [10 or more][30][128,128]) to generate bitmaps in real time (heat maps of the data values). All of the data points are whole numbers between 200 and 3500, so I could use either Short or Integer. Which would be most efficient?

Thanks.

Gregoriagregorian answered 23/9, 2015 at 9:47 Comment(8)
What does "huge" mean? Are you at risk of getting an OutOfMemoryException with integers? Otherwise use integers; a CPU is designed to work efficiently with 32-bit values.Soap
On average 128 * 128 * 30 * 10 data values (4,915,200) in a jagged array. Memory use is OK, about 230 MB on my machine, which is an average spec for the company. The reason I'm asking is because I'm doing real-time manipulation of the images (changing hues and so on), so I need it to be as efficient as possible.Gregoriagregorian
If you want a smaller memory footprint for the raw data then use short. If you want faster rendering of the images another data type might be the correct choice, but there is no code to examine that shows how the data is used.Sonja
Thanks for the reply. I can't post exact code in this context as it's commercially sensitive, but basically I'm taking a range of 128 x 128 cells from Excel holding values in the range 200 - 3500 and converting them to RGB values with some simple formulas to colour the pixels of 128 x 128 bitmaps; the approach is similar to the color-scale conditional formatting in Excel. I'm much more concerned with rendering speed than memory footprint.Gregoriagregorian
I wonder why you can't find this out for yourself. Simply try it both ways and use Stopwatch to measure (see the sketch after these comments). There are non-zero odds that it makes no noticeable difference, because the real cost is getting that much data out of an Excel spreadsheet.Mellifluous
It's a complicated beast already, I don't fancy changing data types for dozens of deeply interwoven variables without a bit of advice first.Gregoriagregorian
I'm going to suggest that you use the size of integer that is native for the registers in your CPU. That will most likely be best for processing performance as there is no thunking between shorter and longer types. If memory efficiency is important then use short, but actually measure the memory usage to see if it makes a difference.Borlase
Now I am confused. Doesn't RGB use a byte for each of the channels? Or integer?Sonja
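Following up on the Stopwatch suggestion in the comments above, here is a minimal benchmark sketch. The array size, fill values and the summing workload are placeholders chosen to roughly match the question's data volume, not the asker's actual code:

    Imports System
    Imports System.Diagnostics

    Module Benchmark
        Sub Main()
            Const N As Integer = 5000000 ' roughly the question's data volume
            Dim shorts(N - 1) As Short
            Dim ints(N - 1) As Integer
            For i As Integer = 0 To N - 1
                Dim v As Integer = 200 + (i Mod 3301) ' stays in the 200-3500 range
                shorts(i) = CShort(v)
                ints(i) = v
            Next

            Dim sw As New Stopwatch()

            ' Time a linear pass over the Short array.
            sw.Restart()
            Dim sumS As Long = 0
            For i As Integer = 0 To N - 1
                sumS += shorts(i)
            Next
            sw.Stop()
            Console.WriteLine("Short:   " & sw.ElapsedMilliseconds & " ms (sum " & sumS & ")")

            ' Time the same pass over the Integer array.
            sw.Restart()
            Dim sumI As Long = 0
            For i As Integer = 0 To N - 1
                sumI += ints(i)
            Next
            sw.Stop()
            Console.WriteLine("Integer: " & sw.ElapsedMilliseconds & " ms (sum " & sumI & ")")
        End Sub
    End Module

Run a Release build and repeat each measurement a few times; the first pass pays JIT and page-fault costs that would otherwise skew the comparison.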

The Int32 type is most efficient for regular variables, such as loop counters, in both 32-bit and 64-bit applications.

When you handle large arrays of data, the efficiency of reading or writing a single value doesn't matter much; what matters is accessing the data with as few memory cache misses as possible. A cache miss is very expensive compared to an access to cached memory. (And a page fault, i.e. memory swapped to disk, is in turn very expensive compared to a cache miss.)

To avoid cache misses, store the data as compactly as possible, and process it as linearly as possible so that the memory area you touch at any one time stays as small as possible.

Using Int16 is most likely more efficient than Int32 for any array too large to fit in the CPU cache: halving the element size halves the number of cache lines the data occupies, and cache lines are generally only around 64 bytes, with even a first-level data cache holding just a few tens of kilobytes.
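As an illustration, here is a sketch of a linear traversal over a structure shaped like the one in the question; the declaration and loop bounds are assumptions based on the question's description, not the asker's code:

    ' Assumed shape: 10 x 30 x (128,128) Short values, per the question.
    Dim data(9)()(,) As Short
    For a As Integer = 0 To 9
        data(a) = New Short(29)(,) {}
        For b As Integer = 0 To 29
            data(a)(b) = New Short(127, 127) {}
        Next
    Next

    ' .NET 2-D arrays are stored row-major (the last index varies fastest),
    ' so keeping the column loop innermost walks memory in order.
    For a As Integer = 0 To data.Length - 1
        For b As Integer = 0 To data(a).Length - 1
            Dim block As Short(,) = data(a)(b)
            For row As Integer = 0 To block.GetUpperBound(0)
                For col As Integer = 0 To block.GetUpperBound(1)
                    Dim value As Short = block(row, col)
                    ' ... map value (200-3500) to a pixel colour here ...
                Next
            Next
        Next
    Next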

As your values fit in just 12 bits, it might even be more efficient to store each value in 1.5 bytes, even though that means more processing to handle the data. The 25% reduction in data size might more than make up for the extra processing.

Colston answered 23/9, 2015 at 10:23 Comment(2)
Thanks for the reply. I use a loop (stepping 1) for each dimension of the array to iterate the elements so I assume this leads to fairly linear access. Can you please clarify "it might even be more efficient to store each value in 1.5 bytes" - how would you do this? Given the variety of answers I think it's best to road test each option :)Gregoriagregorian
@Absinthe: As there obviously isn't a 1.5-byte data type, you would store a value in parts of two bytes, or several values in a larger data type. You could store the 12-bit values a and b in three bytes: aaaaaaaa aaaabbbb bbbbbbbb. You could also store five values (60 bits) in an Int64 (8 bytes) with four unused bits. To read a 12-bit value from the Int64 array you would use (arr[i / 5] >> ((i % 5) * 12)) & 0xFFF.Colston
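In VB.NET, the C-style expression from the comment above translates along these lines; the helper names are illustrative, not from the answer:

    ' Pack five 12-bit values (0-4095) into each Int64, lowest bits first.
    Function Pack(values As Short()) As Long()
        Dim packed((values.Length + 4) \ 5 - 1) As Long
        For i As Integer = 0 To values.Length - 1
            packed(i \ 5) = packed(i \ 5) Or (CLng(values(i) And &HFFF) << ((i Mod 5) * 12))
        Next
        Return packed
    End Function

    ' VB.NET equivalent of (arr[i / 5] >> ((i % 5) * 12)) & 0xFFF.
    Function Unpack(packed As Long(), i As Integer) As Integer
        Return CInt((packed(i \ 5) >> ((i Mod 5) * 12)) And &HFFF)
    End Function

Note the trade-off the answer mentions: every read now costs an extra shift and mask, so this only pays off when the workload is memory-bound rather than compute-bound.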

As a general rule, the less memory a variable uses, the faster it can be processed, and your application's overall memory footprint stays smaller.

Short needs only half the memory that Integer does. If a 16-bit number is enough and you are sure the values will never grow beyond that range, use Short.

Fructification answered 23/9, 2015 at 10:04 Comment(2)
Thanks for the reply. Would have been nice of who rated this down to explain why.Gregoriagregorian
Not true... If the hardware reads X bits at a time into an X-bit register, and memory is packed, then needing only half of the X bits means that the other half must be cleared, and perhaps shifted.Spenserian
