What is CHAR_BIT?
Asked Answered
N

3

116

Quoting the code for computing the integer absolute value (abs) without branching from http://graphics.stanford.edu/~seander/bithacks.html:

int v;           // we want to find the absolute value of v
unsigned int r;  // the result goes here 
int const mask = v >> sizeof(int) * CHAR_BIT - 1;

r = (v + mask) ^ mask;

Patented variation:

r = (v ^ mask) - mask;

What is CHAR_BIT and how use it?

Ninnette answered 8/7, 2010 at 5:49 Comment(3)
Note: v + mask can lead to int overflow - which is undefined behavior. (v ^ mask) - mask may have like issues.Consecration
This code as written isn't portable by the standards and is dangerous. In general, bit shifting signed numbers can easily invoke implementation-defined (right shift of negatives) or even undefined behavior (left shift a 1 bit into the sign bit) and should be avoided. The addition can also cause undefined behavior. It should be easy to rewrite this code with unsigned int's, which have well defined behavior on all platforms according to the standards.Etamine
It also assumes 2's complement representation of negative values. I realize this code is going to ridiculous lengths to avoid a conditional, but that is probably the easiest, most understandable, and most portable thing to do here.Etamine
N
2

You should be aware that this code depends on the implementation-defined behavior of right bitshift on signed types. gcc promises to always give the sane behavior (sign-bit-extension) but ISO C allows the implementation to zero-fill the upper bits.

One way around this problem:

#ifdef HAVE_SIGN_EXTENDING_BITSHIFT
int const mask = v >> sizeof(int) * CHAR_BIT - 1;
#else
int const mask = -((unsigned)v >> sizeof(int) * CHAR_BIT - 1);
#endif

Your Makefile or config.h etc. can define HAVE_SIGN_EXTENDING_BITSHIFT at build time depending on your platform.

Nuthatch answered 8/7, 2010 at 6:31 Comment(7)
I don’t understand how this can be an accepted answer as it doesn’t answer the question, even though it is a very interesting comment.Ferrochromium
@Mauris: Somebody edited the question and promoted a sub-question to the question title. The original title was admittedly awful, but the OP's question was about how the cited bit hack code works, and "it doesn't, at least not portably, and here's why" is a useful answer.Nuthatch
Ah, I understand. Sadly, this question shows up very high in the Google Search results for "What is CHAR_BIT?", even if that wasn't the original question. :( Given your explanation, I understand why you wrote this answer, but for posterity it might be more useful to either (a) remove your answer and rewrite it as a comment to the question, so that @AraK's shows up on top, or (b) edit your answer so that it answers the current title of the question.Squatter
Due to the difference in intention(s) between the OP's original question and the editor's interpretation thereof, it appears as though the nature of the original request was involuntarily shifted. While both questions (original and edited) have merit, this discrepancy needs to be addressed. I now inquire: Could this answer be added to a wiki? This would possibly help people who are searching for this type of info, although it does not pertain to the original question. After such, the question could be edited again, to fit dato datuashvili's original request. Just a concerned reader...Sterner
I just looked at the history of this question and the original question doesn't actually ask anywhere how the code works. The question that the editor promoted to the title is the only actual question in there.Supplemental
Note: you can find "sign extending bitshift" also referred to as "arithmetic bitshift" (as opposed to "logical bitshift).Refection
For the fallback version, v < 0 ? -1 : 0 is simpler to read, obviously correct, and optimizes at least as well with mainstream compilers. (I guess maybe -1U would be appropriate if you don't want to assume 2's complement either.) godbolt.org/z/MeEoWofK7 - MSVC uses shr/neg for the fully-portable version using the unsigned shift, but all of GCC/Clang/MSVC know the peephole optimization of v < 0 ? -1 : 0 to sar reg, 31 for x86 at least, even ancient versions like GCC4.1 and clang 3.0, with only -O1 optimization. But perhaps some embedded compilers aren't as nice.Hammerhead
C
265

CHAR_BIT is the number of bits in char. These days, almost all architectures use 8 bits per byte but it is not the case always. Some older machines used to have 7-bit byte.

It can be found in <limits.h>.

Choragus answered 8/7, 2010 at 5:52 Comment(14)
Some DSPs have 10 or more bit-bytes.Hindoo
C requires CHAR_BIT>=8 and allows much larger values for DSPs which only have a single type size, often 32bit. POSIX requires CHAR_BIT==8. In general, you can assume any multi-user/multitasking server-oriented or interactive-use-oriented architecture with any chance of being connected to the internet or interchanging textual data with the outside world has CHAR_BIT==8.Nuthatch
As an implication of other requirements C99 (in contrast to C89 as in R.'s comment) also defacto requires CHAR_BIT==8.Griddle
@Jens: Are you referring to the issue of unsigned char being promoted to unsigned int rather than int?Stockyard
@caf: No, it is that C99 requires the types int8_t and uint8_t to exist. Thus there exists a type of width 8. Since sizeof any type must be compatible with sizeof char actually sizeof int8_t must be 1. So CHAR_BIT == 8. I have written up something around that obeservation here: gustedt.wordpress.com/2010/06/01/how-many-bits-has-a-byteGriddle
@Jens Gustedt: Please cite a section in the C99 spec. Of the exact-width integer types, the C99 spec says "These types are optional." (7.18.1.1/3) The minimum-width and fastest-width types are required, however.Rangy
@Jens: As @Rangy says, the exact-width types are optional. The wording is: "Conversely, for each type described herein that the implementation does not provide, <stdint.h> shall not declare that typedef name nor shall it define the associated macros. An implementation shall provide those types described as "required", but need not provide any of the others (described as "optional")." This is one good reason why you shouldn't use those types unless you actually, really do need an exact-width type.Stockyard
@Rangy & caf: sorry I mixed things up. yes the requirement I refered to actually comes from POSIX for stdint.h. So there it is required, and it is also marked as Extension to the ISO C standard, without referring to a particular version of that standard. My bad.Griddle
@Jens Gustedt: POSIX is in ISO and C is in ISO. Version of POSIX has to always refer to the latest version of C standard on the moment of POSIX being accepted as a standard. POSIXv6/SUSv3 (2001) uses C99. SUSv2/earlier use C89. en.wikipedia.org/wiki/Single_UNIX_SpecificationRosy
@datodatuashvili Which machine has 7-bit byte?Bouse
@Bouse stone age computers.Wyoming
@Bouse My old TI-57 pocket calcultor had a 4 bit processor.Tenterhook
@AlexisWilke At least 4 is a power of 2; 7 is not.Bouse
@Alex: The PDP-10 apparently had a suite of instructions to process data as "bytes" of variable width, with a common software convention being to use 7-bit "bytes" such that five characters (and one unused bit) could be stored per word. There are other extremely esoteric and perhaps arguable cases in other answers at that link.Mansoor
S
6

Trying to answer both the explicit question (what is CHAR_BIT) and the implicit question (how does this work) in the original question.


A char in C and C++ represents the smallest unit of memory the C program can address*.

CHAR_BIT in C and C++ represents the number of bits in a char. It must always be at least 8 due to other requirements on the char type. In practice on all modern general purpose computers it is exactly 8 but some historic or specialist systems may have higher values.

Java has no equivalent of CHAR_BIT or sizeof, there is no need for it as all primitive types in Java are fixed size and the internal structure of objects is opaque to the programmer. If translating this code to Java you can simply replace sizeof(int) * CHAR_BIT - 1 by the fixed value 31.

In this particular code, it is being used to calculate the number of bits in an int. Be aware that this calculation assumes that the int type doesn't contain any padding bits.

Assuming that your compiler chooses to sign extend on bit shifts of signed numbers and assuming your system uses 2s complement representation for negative numbers this means that mask will be 0 for a positive or zero value and -1 for a negative value.

To negate a twos complement number we need to perform a bitwise not and then add one. Equivalently we can subtract one and then bitwise negate it.

Again assuming twos complement representation -1 is represented by all ones, so exclusive or with -1 is equivalent to bitwise negation.

So when v is zero the number is left alone, when v is one it is negated.

Something to be aware of is that signed overflow in C and C++ is undefined behaviour. So using this abs implementation on the most negative value leads to undefined behaviour. This can be fixed by adding casts such that the final line of the program is evaluated in unsigned int.

* Which is usually but not necessarily the same as the smallest unit of memory the hardware can address. An implementation can potentially combine multiple units of hardware-addressable memory into one unit of program-addressable memory or split one unit of hardware addressable memory into multiple units of program-addressable memory.

Supplemental answered 3/10, 2017 at 16:55 Comment(2)
"simply replace [...] to value 31" is not that easy with generics, though.Tenterhook
True but if you are translating to Java (I mentioned Java because the original question did, though that was later edited out) does not really have generics in a form that are useful for numeric code.Supplemental
N
2

You should be aware that this code depends on the implementation-defined behavior of right bitshift on signed types. gcc promises to always give the sane behavior (sign-bit-extension) but ISO C allows the implementation to zero-fill the upper bits.

One way around this problem:

#ifdef HAVE_SIGN_EXTENDING_BITSHIFT
int const mask = v >> sizeof(int) * CHAR_BIT - 1;
#else
int const mask = -((unsigned)v >> sizeof(int) * CHAR_BIT - 1);
#endif

Your Makefile or config.h etc. can define HAVE_SIGN_EXTENDING_BITSHIFT at build time depending on your platform.

Nuthatch answered 8/7, 2010 at 6:31 Comment(7)
I don’t understand how this can be an accepted answer as it doesn’t answer the question, even though it is a very interesting comment.Ferrochromium
@Mauris: Somebody edited the question and promoted a sub-question to the question title. The original title was admittedly awful, but the OP's question was about how the cited bit hack code works, and "it doesn't, at least not portably, and here's why" is a useful answer.Nuthatch
Ah, I understand. Sadly, this question shows up very high in the Google Search results for "What is CHAR_BIT?", even if that wasn't the original question. :( Given your explanation, I understand why you wrote this answer, but for posterity it might be more useful to either (a) remove your answer and rewrite it as a comment to the question, so that @AraK's shows up on top, or (b) edit your answer so that it answers the current title of the question.Squatter
Due to the difference in intention(s) between the OP's original question and the editor's interpretation thereof, it appears as though the nature of the original request was involuntarily shifted. While both questions (original and edited) have merit, this discrepancy needs to be addressed. I now inquire: Could this answer be added to a wiki? This would possibly help people who are searching for this type of info, although it does not pertain to the original question. After such, the question could be edited again, to fit dato datuashvili's original request. Just a concerned reader...Sterner
I just looked at the history of this question and the original question doesn't actually ask anywhere how the code works. The question that the editor promoted to the title is the only actual question in there.Supplemental
Note: you can find "sign extending bitshift" also referred to as "arithmetic bitshift" (as opposed to "logical bitshift).Refection
For the fallback version, v < 0 ? -1 : 0 is simpler to read, obviously correct, and optimizes at least as well with mainstream compilers. (I guess maybe -1U would be appropriate if you don't want to assume 2's complement either.) godbolt.org/z/MeEoWofK7 - MSVC uses shr/neg for the fully-portable version using the unsigned shift, but all of GCC/Clang/MSVC know the peephole optimization of v < 0 ? -1 : 0 to sar reg, 31 for x86 at least, even ancient versions like GCC4.1 and clang 3.0, with only -O1 optimization. But perhaps some embedded compilers aren't as nice.Hammerhead

© 2022 - 2024 — McMap. All rights reserved.