Why use flags+bitmasks rather than a series of booleans?
C

11

34

Given a case where I have an object that may be in one or more true/false states, I've always been a little fuzzy on why programmers frequently use flags+bitmasks instead of just using several boolean values.

It's all over the .NET framework. Not sure if this is the best example, but the .NET framework has the following:

public enum AnchorStyles
{
    None = 0,
    Top = 1,
    Bottom = 2,
    Left = 4,
    Right = 8
}

So given an anchor style, we can use bitmasks to figure out which of the states are selected. However, it seems like you could accomplish the same thing with an AnchorStyle class/struct with bool properties defined for each possible value, or an array of individual enum values.

Of course the main reason for my question is that I'm wondering if I should follow a similar practice with my own code.

So, why use this approach?

  • Less memory consumption? (it doesn't seem like it would consume less than an array/struct of bools)
  • Better stack/heap performance than a struct or array?
  • Faster compare operations? Faster value addition/removal?
  • More convenient for the developer who wrote it?
Carousel answered 10/9, 2009 at 17:12 Comment(3)
Not that I find that a strong argument, but it does consume less memory. It uses up an int (4 bytes) while each bool uses up one byte. So, 4 bools uses up the same as one int. 32 bools use up 32 bytes, while all those bools can be in the same enum. And if you go to not-recommended paths, you can make enums 8 bytes long (sizeof(long)).Instable
Thanks for clarifying that. It led me to this post: #295405Carousel
So from the responses it is clear that enum flags are more lightweight than structs/arrays of bools in terms of memory. However, it also seems that there are some .NET framework classes that would be well-suited to the task, such as BitVector32 or BitArray. What about a struct that uses a BitVector32 (backed by a uint) for storage and provides properties that get/set bits (as bools) at specific indices? Windows Forms seems to do this. More code for the developer, but it seems like it would perform well and the encapsulation would make it easier for downstream API consumers to use. hmm?Carousel
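The BitVector32 idea from the last comment could look roughly like the sketch below. `AnchorState` and its property names are hypothetical, made up for illustration; only `BitVector32`, `CreateMask`, the indexer and `Data` come from `System.Collections.Specialized`. This is one possible shape under those assumptions, not how Windows Forms actually implements it.

```csharp
using System.Collections.Specialized;

// Hypothetical struct: four bool properties backed by one BitVector32.
public struct AnchorState
{
    // CreateMask() yields the first bit; CreateMask(previous) yields the next one up.
    private static readonly int TopMask = BitVector32.CreateMask();
    private static readonly int BottomMask = BitVector32.CreateMask(TopMask);
    private static readonly int LeftMask = BitVector32.CreateMask(BottomMask);
    private static readonly int RightMask = BitVector32.CreateMask(LeftMask);

    private BitVector32 bits; // all flags live in this single 32-bit value

    public bool Top { get => bits[TopMask]; set => bits[TopMask] = value; }
    public bool Bottom { get => bits[BottomMask]; set => bits[BottomMask] = value; }
    public bool Left { get => bits[LeftMask]; set => bits[LeftMask] = value; }
    public bool Right { get => bits[RightMask]; set => bits[RightMask] = value; }

    // Raw packed value, for example for persistence.
    public int Data => bits.Data;
}
```

Callers only ever see bools (`state.Top = true;`), while the storage stays a single int.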
G
28

It was traditionally a way of reducing memory usage. So, yes, it's quite obsolete in C# :-)

As a programming technique, it may be obsolete in today's systems, and you'd be quite alright to use an array of bools, but...

It is fast to compare values stored as a bitmask: apply the bitwise AND and OR operators and compare the two resulting ints.
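As a minimal sketch of that comparison: the enum below is a local copy of the one in the question (so the example compiles without Windows Forms), and `AnchorChecks`, `HasAll` and `HasAny` are hypothetical helper names, not framework APIs.

```csharp
using System;

[Flags]
public enum AnchorStyles
{
    None = 0,
    Top = 1,
    Bottom = 2,
    Left = 4,
    Right = 8
}

// Hypothetical helpers showing the two usual mask comparisons.
public static class AnchorChecks
{
    // True when every bit in 'required' is also set in 'value'.
    public static bool HasAll(AnchorStyles value, AnchorStyles required)
        => (value & required) == required;

    // True when at least one bit in 'candidates' is set in 'value'.
    public static bool HasAny(AnchorStyles value, AnchorStyles candidates)
        => (value & candidates) != 0;
}
```

Each check is one AND plus one integer comparison, no matter how many flags are tested at once.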

It uses considerably less memory. Putting all 4 of your example values in a bitmask would use half a byte. An array of bools would most likely use a few bytes for the array object plus at least a byte for each bool. If you have to store a million values, you'll see exactly why a bitmask version is superior.

It is easier to manage: you only have to deal with a single integer value, whereas an array of bools would be stored quite differently in, say, a database.

And, because of the memory layout, it is much faster in every aspect than an array. It's nearly as fast as using a single 32-bit integer, and we all know that is as fast as you can get for operations on data.

Grettagreuze answered 10/9, 2009 at 17:21 Comment(7)
Isn't getting a bool from the heap and checking it faster than getting an int from the heap and running AND and OR on it? Bools don't need to be stored in an array; they can be accessed through their own memory addresses, unlike bits in an int.Beneath
@Beneath that bool is an int, and while running AND on it may be unnecessary (depending on the compiler), what if you needed 2 bools? Suddenly that int looks good. Also, running the AND on it is a CPU operation, so fast you couldn't imagine how quick it is. Getting a new bool from RAM is a week's nightmare of slowness, relatively speaking.Grettagreuze
@gbjbaans Bools are 8 bits. There is the stack, and CPU caching. Cache sizes and memory throughput are also continuously increasing. But it's hard to compare the speed, because the measurement errors are as large as or larger than the actual measurements.Beneath
The point is that with flags you lose the AND time regardless of the case, but bools can be kept in CPU registers. I guess I was tired when I wrote the first comment.Beneath
@Beneath last comment from me on this. You don't know how CPUs work. Look under the cover of your compiler.Grettagreuze
I'm speaking in a general sense; C#/.NET can run unmanaged code. I know enough, have experience with C and assembler, and repeatedly dig into .NET internals.Beneath
Another point on this. When bools are passed to functions as parameters, they are put in CPU registers, and the logic that specifies the values of the bools is identical. When they are passed as flags, the packing and unpacking are additional operations.Beneath
T
12
  • Easy to set multiple flags in any order.

  • Easy to save a series of bits such as 0101011 to the database and read it back.
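A rough sketch of the database point, assuming the flags are stored in a single integer column. `UserOptions`, `FlagStorage` and the method names are hypothetical and only illustrate the cast in each direction.

```csharp
using System;

// Hypothetical flags enum for illustration.
[Flags]
public enum UserOptions
{
    None = 0,
    EmailAlerts = 1,
    DarkMode = 2,
    BetaFeatures = 4
}

public static class FlagStorage
{
    // The whole set of flags becomes one integer, ready for an INT column.
    public static int ToColumn(UserOptions options) => (int)options;

    // Reading it back is the reverse cast.
    public static UserOptions FromColumn(int stored) => (UserOptions)stored;

    // Binary text form, padded to the three bits this enum uses.
    public static string ToBits(UserOptions options)
        => Convert.ToString((int)options, 2).PadLeft(3, '0');
}
```

For example, `EmailAlerts | BetaFeatures` is stored as 5, and `ToBits` renders it as "101".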

Tokoloshe answered 10/9, 2009 at 17:16 Comment(1)
Note that even as separate columns, SQL Server will optimize these into a single byte: msdn.microsoft.com/en-us/library/ms177603.aspxRijeka
V
8

Among other things, it's easier to add new bit meanings to a bitfield than to add new boolean values to a class. It's also easier to copy a bitfield from one instance to another than a series of booleans.

Voltage answered 10/9, 2009 at 17:17 Comment(5)
It seems to me that adding boolean values to a class is as easy as: bool newState; Regarding copying, it seems just as easy to copy a struct.Carousel
@Winston: Serialization format changes, and good serializers that accept default values for old data, and where old versions do not throw away unknown fields are hard to find. The binary interface changes, which may cause a chain of required updates, and requires full versioning support for the structure. (Of course the contract would have to state explicitly "unknown bits are ignored" or "unknown bits cause an error".) Also, at the implementation level, handling them as a whole IS easier.Selfcontained
@Winston what if you have created an API? Then everybody who upgrades to your new version would have to change their code because a new bool was added to a method, whereas if it were an enum, no changes would be needed on their end to keep their code working. Which is why the .NET framework favors enums over booleans.Selhorst
@David - I wasn't talking about adding arguments to methods, but rather adding bool fields to structs, which wouldn't affect any calling methods, but to @Peter's point, would affect serialization.Carousel
Adding a new field to a C# struct/class is not a breaking change (excepting that previous code is unaware of it). In both cases existing code must be updated to be aware of how to use the new flag.Ontario
G
7

It can also make Methods clearer. Imagine a Method with 10 bools vs. 1 Bitmask.
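To make the comparison concrete, here is a small sketch with three options instead of ten. `ExportOptions`, `Exporter` and `Export` are hypothetical names; the method just returns a description string so the effect is visible.

```csharp
using System;

// Hypothetical options enum for illustration.
[Flags]
public enum ExportOptions
{
    None = 0,
    IncludeHeaders = 1,
    Compress = 2,
    Overwrite = 4
}

public static class Exporter
{
    // Boolean version: a call site reads as Export("a.csv", true, false, true).
    public static string Export(string path, bool includeHeaders, bool compress, bool overwrite)
    {
        var options = ExportOptions.None;
        if (includeHeaders) options |= ExportOptions.IncludeHeaders;
        if (compress) options |= ExportOptions.Compress;
        if (overwrite) options |= ExportOptions.Overwrite;
        return Export(path, options);
    }

    // Flags version: the call site names each option it turns on.
    public static string Export(string path, ExportOptions options)
        => $"{path}: {options}";
}
```

With the flags overload the call becomes `Exporter.Export("a.csv", ExportOptions.IncludeHeaders | ExportOptions.Overwrite)`, which says what is enabled without the reader having to count argument positions.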

Goto answered 10/9, 2009 at 17:23 Comment(0)
D
3

From a domain Model perspective, it just models reality better in some situations. If you have three booleans like AccountIsInDefault and IsPreferredCustomer and RequiresSalesTaxState, then it doesn't make sense to add them to a single Flags-decorated enumeration, because they are not three distinct values for the same domain model element.

But if you have a set of booleans like:

 [Flags] enum AccountStatus {AccountIsInDefault=1, 
         AccountOverdue=2, AccountFrozen=4}

or

  [Flags] enum CargoState {ExceedsWeightLimit=1,  
         ContainsDangerousCargo=2, IsFlammableCargo=4, 
         ContainsRadioactive=8}

Then it is useful to be able to store the total state of the Account, (or the cargo) in ONE variable... that represents ONE Domain Element whose value can represent any possible combination of states.

Dobbin answered 10/9, 2009 at 17:44 Comment(0)
I
2

I would suggest never using enum flags unless you are dealing with some pretty serious memory limitations (not likely). You should always write code optimized for maintenance.

Having several boolean properties makes it easier to read and understand the code, change the values, and provide IntelliSense comments, not to mention reducing the likelihood of bugs. If necessary, you can always use an enum flag field internally; just make sure you expose the setting/getting of the values with boolean properties.
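A minimal sketch of that last suggestion, assuming a private flags enum as the backing field. The `Account` class, its `Status` enum and the property names are hypothetical.

```csharp
using System;

public class Account
{
    // Internal representation only; callers never see this enum.
    [Flags]
    private enum Status
    {
        None = 0,
        InDefault = 1,
        Overdue = 2,
        Frozen = 4
    }

    private Status status;

    public bool IsInDefault { get => Get(Status.InDefault); set => Set(Status.InDefault, value); }
    public bool IsOverdue { get => Get(Status.Overdue); set => Set(Status.Overdue, value); }
    public bool IsFrozen { get => Get(Status.Frozen); set => Set(Status.Frozen, value); }

    private bool Get(Status flag) => (status & flag) == flag;

    // Sets the bit when 'on' is true, clears it otherwise.
    private void Set(Status flag, bool on)
        => status = on ? status | flag : status & ~flag;
}
```

The bit manipulation lives in two private helpers, so the public surface is plain boolean properties.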

Interbrain answered 10/9, 2009 at 17:23 Comment(0)
S
2

Raymond Chen has a blog post on this subject.

Sure, bitfields save data memory, but you have to balance it against the cost in code size, debuggability, and reduced multithreading.

As others have said, its time is largely past. It's tempting to still do it, because bit fiddling is fun and cool-looking, but it's no longer more efficient, it has serious drawbacks in terms of maintenance, it doesn't play nicely with databases, and unless you're working in an embedded world, you have enough memory.

Sesquiplane answered 10/9, 2009 at 17:24 Comment(1)
Raymond is talking about bitfields, not bitmasks.Grettagreuze
M
1

Actually, it can have better performance, mainly if your enum derives from a byte. In that extreme case, each enum value would be represented by a byte, containing all the combinations, up to 256. Having so many possible combinations with booleans would lead to 256 bytes.

But, even then, I don't think that is the real reason. The reason I prefer those is the power C# gives me to handle those enums. I can add several values with a single expression. I can remove them also. I can even compare several values at once with a single expression using the enum. With booleans, code can become, let's say, more verbose.
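The single-expression operations mentioned above might look like this. The enum is a local copy of the one from the question, and `AnchorEdits` with its `Add`, `Remove` and `Toggle` methods is a hypothetical wrapper around the plain operators.

```csharp
using System;

[Flags]
public enum AnchorStyles
{
    None = 0,
    Top = 1,
    Bottom = 2,
    Left = 4,
    Right = 8
}

public static class AnchorEdits
{
    // Add any number of flags at once with OR.
    public static AnchorStyles Add(AnchorStyles value, AnchorStyles flags) => value | flags;

    // Remove any number of flags at once with AND NOT.
    public static AnchorStyles Remove(AnchorStyles value, AnchorStyles flags) => value & ~flags;

    // Flip any number of flags at once with XOR.
    public static AnchorStyles Toggle(AnchorStyles value, AnchorStyles flags) => value ^ flags;
}
```

In everyday code the operators are usually written inline, for example `anchor |= AnchorStyles.Left | AnchorStyles.Right;` or `anchor &= ~AnchorStyles.Top;`.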

Monitorial answered 10/9, 2009 at 17:19 Comment(2)
There are 256 combinations, but only 8 flags. Don't confuse them.Mathematics
256 combinations using bool? It is 8 bool values. 8 bool values is not 256 bytes.Antihelix
K
1
  1. Space efficiency - 1 bit
  2. Time efficiency - bit comparisons are handled quickly by hardware.
  3. Language independence - where the data may be handled by a number of different programs you don't need to worry about the implementation of booleans across different languages/platforms.

Most of the time, these are not worth the tradeoff in terms of maintenance. However, there are times when it is useful:

  1. Network protocols - there will be a big saving in reduced size of messages
  2. Legacy software - once I had to add some information for tracing into some legacy software.

Cost to modify the header: millions of dollars and years of effort. Cost to shoehorn the information into 2 bytes in the header that weren't being used: 0.

Of course, there was the additional cost in the code that accessed and manipulated this information, but this was done by functions anyway, so once you had the accessors defined it was no less maintainable than using Booleans.

Kazantzakis answered 10/9, 2009 at 17:52 Comment(1)
1. Space efficiency only applies in very dense-packing or extremely limited environments; 2. Time efficiency depends on efficient use of the mask (and it is surely not faster to mask-and-compare a single bit than it is to compare a single boolean value); 3. Not applicable, using a boolean type incorrectly is using a boolean type incorrectly.Ontario
A
-1

It is for speed and efficiency. Essentially all you are working with is a single int.

if ((flags & AnchorStyles.Top) == AnchorStyles.Top)
{
    //Do stuff
} 
Agency answered 10/9, 2009 at 17:16 Comment(5)
That's a pretty high-level answer. Can you be specific about what operations are faster/more efficient and why? Or link to an article that justifies your claim?Carousel
Do I really need to give you proof that working with native types and simple logic expressions is fast and efficient?Agency
Don't forget the order of operations. You have to put parentheses around the bitwise operation there.Monique
Nice catch. I am used to Visual Studio having my back.Agency
-1 No justification is given for "speed and efficiency" and in this case I suspect that it is neither. Consider the counter-style/proposal of: if (AnchorsTop) { .. }.Ontario
I
-1

I have seen answers citing time efficiency and compatibility. Those are the reasons, but I do not think it has been explained why they are still sometimes necessary in times like ours. From all the answers, and from my experience chatting with other engineers, I have seen it pictured as some sort of quirky old-time way of doing things that should just die because the new ways of doing things are better. Yes, in very rare cases you may want to do it the "old way" for performance's sake, for example if you have the classic million-iteration loop. But I say that is the wrong perspective.

While it is true that you should NOT care at all and should use whatever the C# language throws at you as the new right-way™ to do things (enforced by some fancy AI code analysis slapping you whenever you do not meet its code style), you should understand deeply that low-level strategies aren't there by accident; even more, in many cases they are the only way to solve things when you have no help from a fancy framework. Your OS, your drivers, and even .NET itself (especially the garbage collector) are built using bitfields and transactional instructions. Your CPU's instruction set is itself a very complex bitfield, so JIT compilers encode their output using complex bit processing and a few hardcoded bitfields so that the CPU can execute it correctly.

When we talk about performance, these things have a much larger impact than people imagine, today more than ever, especially when you start considering multicore systems.

When multicore systems started to become more common, all CPU manufacturers started to mitigate the issues of SMP by adding dedicated transactional memory access instructions. These were made specifically to mitigate the near-impossible task of making multiple CPUs cooperate at kernel level without a huge drop in performance, but they actually provide additional benefits, such as an OS-independent way to boost the low-level parts of most programs. Basically, your program can use CPU-assisted instructions to perform memory changes on integer-sized memory locations, that is, a read-modify-write where the "modify" part can be anything you want, although the most common patterns are a combination of set/clear/increment. Usually the CPU simply monitors whether any other CPU is accessing the same address, and if contention happens it usually stops the operation from being committed to memory and signals the event to the application within the same instruction.

This seems a trivial task, but superscalar CPUs (each core has multiple ALUs allowing instruction parallelism), multi-level caches (some private to each core, some shared by a cluster of CPUs) and Non-Uniform Memory Access systems (check the Threadripper CPUs) make things difficult to keep coherent. Luckily, the smartest people in the world work to boost performance and keep all these things happening correctly. Today's CPUs have a large number of transistors dedicated to this task so that caches and our read-modify-write transactions work correctly.

C# allows you to use the most common transactional memory access patterns through the Interlocked class. It is only a limited set (for example, a very useful clear-mask-and-increment is missing), but you can always use CompareExchange instead, which gets very close to the same performance.

To achieve the same result using an array of booleans you must use some sort of lock, and in case of contention the lock performs several orders of magnitude worse than the atomic instructions.
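A rough sketch of the CompareExchange pattern described above: a lock-free allocator that hands out slots from a bitmask. `ChannelAllocator`, `TryAllocate` and `Release` are hypothetical names, and the 1 to 32 slot limit is an assumption that follows from using a single int. Only `Interlocked.CompareExchange` and `Volatile.Read` are real `System.Threading` calls.

```csharp
using System;
using System.Threading;

// Hypothetical lock-free allocator: bit i set means slot i is taken.
public sealed class ChannelAllocator
{
    private int inUse;
    private readonly int channelCount;

    public ChannelAllocator(int channelCount)
    {
        if (channelCount < 1 || channelCount > 32)
            throw new ArgumentOutOfRangeException(nameof(channelCount));
        this.channelCount = channelCount;
    }

    // Returns the index of a newly allocated slot, or -1 if all are taken.
    public int TryAllocate()
    {
        while (true)
        {
            int snapshot = Volatile.Read(ref inUse);

            // Find the lowest clear bit in the snapshot.
            int index = -1;
            for (int i = 0; i < channelCount; i++)
            {
                if ((snapshot & (1 << i)) == 0) { index = i; break; }
            }
            if (index < 0) return -1;

            // Commit only if nobody changed the mask since the snapshot; otherwise retry.
            int updated = snapshot | (1 << index);
            if (Interlocked.CompareExchange(ref inUse, updated, snapshot) == snapshot)
                return index;
        }
    }

    // Clears the bit for 'index', retrying if another thread raced with us.
    public void Release(int index)
    {
        int mask = 1 << index;
        int snapshot;
        do
        {
            snapshot = Volatile.Read(ref inUse);
        }
        while (Interlocked.CompareExchange(ref inUse, snapshot & ~mask, snapshot) != snapshot);
    }
}
```

The integer itself plays the role of the lock: a failed CompareExchange simply means another thread got there first, and the loop retries against the new value.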

Here are some examples of highly appreciated HW-assisted transactional access using bitfields, which would require a completely different strategy without them. Of course, these are outside the scope of C#:

  • Assume a DMA peripheral that has a set of DMA channels, let's say 20 (but any number up to the number of bits in the interlocked integer will do). When any peripheral's interrupt handler, which might execute at any time (including inside your beloved OS) and on any core of your latest-gen 32-core CPU, wants a DMA channel, you want to allocate a DMA channel (assign it to the peripheral) and use it. A bitfield will cover all those requirements and will use just a dozen instructions to perform the allocation, which can be inlined in the requesting code. Basically, you cannot go faster than this, and your code is just a few functions; we delegate the hard part of the problem to the HW. Constraints: bitfield only.

  • Assume a peripheral that requires some working space in normal RAM to perform its duty. For example, assume a high-speed I/O peripheral that uses scatter-gather DMA: in short, it uses fixed-size blocks of RAM, each populated with the description of the next transfer (by the way, the descriptor is itself made of bitfields) and chained to one another, creating a FIFO queue of transfers in RAM. The application prepares the descriptors first and then chains them to the tail of the current transfers without ever pausing the controller (not even disabling interrupts). The allocation/deallocation of such descriptors can be done using a bitfield and transactional instructions, so when it is shared between different CPUs and between the driver interrupt and the kernel, everything still works without conflicts. One use case would be that the kernel allocates descriptors atomically, without stopping or disabling interrupts and without additional locks (the bitfield itself is the lock), and the interrupt deallocates them when the transfer completes. Most old strategies were to preallocate the resources and force the application to free them after usage.

If you ever need multitasking on steroids, C# allows you to use Threads + Interlocked, but lately C# introduced lightweight Tasks. Guess how they are made? Transactional memory access using the Interlocked class. So you likely do not need to reinvent the wheel; all of the low-level part is already covered and well engineered.

So the idea is: let smart people (not me, I am a common developer like you) solve the hard part for you, and just enjoy a general-purpose computing platform like C#. If you still see some remnants of these techniques, it is because someone may still need to interface with worlds outside .NET and access some driver or system call, which for example requires you to know how to build a descriptor and put each bit in the right place. Do not be mad at those people; they made our jobs possible.

In short: Interlocked + bitfields. Incredibly powerful; don't use it.

Interject answered 5/12, 2021 at 17:3 Comment(0)
