Is char guaranteed to be exactly 8-bit long? [duplicate]

That's all. Didn't find any similar topic, so bear with me if there is.

Countershaft answered 19/5, 2009 at 10:7 Comment(1)
If you don't have to support pre-C99 compilers and you want to know the exact size and signedness of your types include and use the types defined in stdint.h.Onward

From a copy of the ANSI C specification, see Section 3.1.2.5 - Types:

An object declared as type char is large enough to store any member of the basic execution character set. If a member of the required source character set enumerated in $2.2.1 is stored in a char object, its value is guaranteed to be positive. If other quantities are stored in a char object, the behavior is implementation-defined: the values are treated as either signed or nonnegative integers.

The concept of "execution character set" is introduced in Section 2.2.1 - Character sets.

In other words, a char has to be at least big enough to contain an encoding of at least the 95 different characters which make up the basic execution character set.

Now add to that Section 2.2.4.2 - Numerical limits:

A conforming implementation shall document all the limits specified in this section, which shall be specified in the headers <limits.h> and <float.h> .

Sizes of integral types

The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

  • number of bits for smallest object that is not a bit-field (byte)
    CHAR_BIT 8

  • minimum value for an object of type signed char
    SCHAR_MIN -127

  • maximum value for an object of type signed char
    SCHAR_MAX +127

  • maximum value for an object of type unsigned char
    UCHAR_MAX 255

....

So there you have it - the number of bits in a char must be at least 8.

Blacktop answered 19/5, 2009 at 10:29 Comment(11)
You missed the further requirements on char types imposed by the requirements of limits.h.Harryharsh
Good point, have amendedBlacktop
Put short, sizeof(char) is guaranteed to result in 1 byte (8 bits). However, some (quite old?) machines use(d) different byte sizes (e.g.: 7 bits).Snappish
@jweyrich, not exactly - the parenthetical bit count is not guaranteed. See #2098649Pullover
@ChrisStratton: I think you misinterpreted my affirmation. A byte may vary in size depending on architecture, but sizeof(char) is guaranteed to be 1 (byte) regardless of that fact. That 1 byte is composed by CHAR_BIT bits.Snappish
It's the parenthetical "(8 bits)" in your guarantee statement which is inaccurate.Pullover
@ChrisStratton: If you read my second sentence, it makes clear that the byte size may vary. But I'm probably wrong on the "quite old?" parenthesis. Thanks for pointing out.Snappish
WRT "quite old" it may depend on the direction - the current-production non-8-bit cases I'm personally aware of are longer than 8 bits, rather than shorter (of which I've only heard of historic anecdotes).Pullover
@PaulDixon, do you mean "at least 8" or "exactly 8"?Cesena
" Their implementation-defined values shall be equal or greater in magnitude to those shown"...so yes, I meant at least 8 bits.Blacktop
Late in my career, I realized the subtle point that the width of C's data types is not fixed. The role of C is to be the API of the hardware. If the hardware has an unusual char width, then so be it. From the C specs, the only guarantees on the data types are their relations to each other (>, >=, etc.) and the documented minima. The presence of uint8_t means that char on this hardware is 8 bits. That is 100% true for that hardware, and for eternity. But only for that hardware alone. C is like water: put it in a glass, and its shape will be that of the glass.Jalisajalisco

No, it is not guaranteed to be 8-bits. sizeof(char) is guaranteed to be 1, but that does not necessarily mean one 8-bit byte.

Guggenheim answered 19/5, 2009 at 10:9 Comment(6)
reference / explanation? possible solutions? please :)Countershaft
It is guaranteed in the C standard that char is exactly 1 byte. Almost all computers these days have 8 bits/byte thoughFinney
Reference - the C Standard, which I don't have a copy of. Explanation - not all platforms use 8-bit bytes. Solution - what's the problem?Guggenheim
@1800 - you are forgetting about unicode. sizeof a unicode character may also be 1.Guggenheim
And the C++ standard does not mention an 8-bit minimum - it says a char must be able to hold any member of the implementation's basic character set. I imagine C says something similar.Guggenheim
yes, it does mention that, as above. a balance of +10, really?Wigfall

No, but the char data type must contain at least 8 bits (see the ANSI C specification).

Sihonn answered 19/5, 2009 at 10:10 Comment(3)
exact quote, please?Boche
ANSI C may say that, ANSI C++ does not.Guggenheim
@Neil: the standard does include <climits> (thus CHAR_BIT), and says its content is the same as C's <limits.h>.Cigarillo

The C99 standard draft says that a byte must be at least 8 bits wide, because <limits.h> contains a macro CHAR_BIT which yields the number of bits per byte, and is guaranteed to be at least 8 (§5.2.4.2.1).

The C++ standard draft includes C's <limits.h> under the name <climits> (§18.2.2).

Gob answered 19/5, 2009 at 11:11 Comment(0)

Let's see exactly what the standard says:

5.2.4.2.1 Sizes of integer types
...
Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.


number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8

This tells us that a byte is at least 8 bits wide (the entry just above defines a byte as the smallest object that is not a bit-field).

If the value of an object of type char is treated as a signed integer when used in an expression, the value of CHAR_MIN shall be the same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the value of CHAR_MAX shall be the same as that of UCHAR_MAX. The value UCHAR_MAX shall equal 2^CHAR_BIT - 1


For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.


For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter).

These passages tell us that :

  • an unsigned char must be able to represent every value from 0 to 2^CHAR_BIT - 1, which requires at least CHAR_BIT bits (in the pure binary representation prescribed by the standard)
  • an unsigned char does not contain any additional (padding) bits
  • a signed char takes exactly the same space as an unsigned char
  • a char is implemented in the same way as either a signed char or an unsigned char

Conclusion: a char and its variants unsigned char and signed char are guaranteed to be exactly one byte in size, and a byte is guaranteed to be at least 8 bits wide.

There are other indications (though not formal proof, as above) that a char is indeed one byte:

Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.


Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n].


The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.


When applied to an operand that has type char, unsigned char, or signed char (or a qualified version thereof) the result is 1. When applied to an operand that has array type, the result is the total number of bytes in the array. When applied to an operand that has structure or union type, the result is the total number of bytes in such an object, including internal and trailing padding.

(Note that there is an ambiguity here: does the sizeof(char) rule override the general sizeof(type) rule, or does it merely give an example?)

Still, there's a problem left to tackle: what exactly is a byte? According to the standard, it is "the smallest object that is not a bit-field". Note that this need not correspond to a machine byte, and that "machine byte" is itself ambiguous: it could be whatever a given manufacturer calls a "byte" (and each manufacturer may define it differently), or a general definition such as "a sequence of bits that a computer processes in individual units" or "the smallest addressable chunk of data".

For instance, a machine that has 7-bit machine bytes would have to implement a C byte as two machine bytes.

Source of all citations : Committee Draft — September 7, 2007 ISO/IEC 9899:TC3.

Abirritant answered 29/1, 2011 at 21:46 Comment(0)

From the C standard describing limits.h (some reformatting required):

  1. number of bits for smallest object that is not a bit-field (byte): CHAR_BIT 8
  2. minimum value for an object of type signed char: SCHAR_MIN -127
  3. maximum value for an object of type signed char: SCHAR_MAX +127

The CHAR_BIT minimum of 8 ensures that a character is at least 8 bits wide. The ranges on SCHAR_MIN and SCHAR_MAX ensure that the representation of a signed char uses at least eight bits.

Blackmun answered 19/5, 2009 at 11:16 Comment(0)

First, I would say that if you need a type with an exact number of bits, use a size-specific type. Depending on your platform, that could range from __s8 for a signed 8-bit type on Linux to __int8 in VC++ on Windows.

Now, according to Robert Love in his chapter on portability in "Linux Kernel Development" he states that the C standard "leaves the size of the standard types up to implementations, although it does dictate a minimum size."

Then in a footnote at the bottom of the page he says, "With the exception of char which is always 8 bits"

Now I'm not sure what he's basing this on, but maybe it's this section from the ANSI C spec?

2.2.4.2 Numerical limits

A conforming implementation shall document all the limits specified in this section, which shall be specified in the headers limits.h and float.h

"Sizes of integral types limits.h"

The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

number of bits for smallest object that is not a bit-field (byte)

CHAR_BIT 8

minimum value for an object of type signed char

SCHAR_MIN -127

maximum value for an object of type signed char

SCHAR_MAX +127

maximum value for an object of type unsigned char

UCHAR_MAX 255

minimum value for an object of type char

CHAR_MIN see below

maximum value for an object of type char

CHAR_MAX see below

maximum number of bytes in a multibyte character, for any supported locale

MB_LEN_MAX 1

minimum value for an object of type short int

SHRT_MIN -32767

maximum value for an object of type short int

SHRT_MAX +32767

maximum value for an object of type unsigned short int

USHRT_MAX 65535

minimum value for an object of type int

INT_MIN -32767

maximum value for an object of type int

INT_MAX +32767

maximum value for an object of type unsigned int

UINT_MAX 65535

minimum value for an object of type long int

LONG_MIN -2147483647

maximum value for an object of type long int

LONG_MAX +2147483647

maximum value for an object of type unsigned long int

ULONG_MAX 4294967295

If the value of an object of type char sign-extends when used in an expression, the value of CHAR_MIN shall be the same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of SCHAR_MAX . If the value of an object of type char does not sign-extend when used in an expression, the value of CHAR_MIN shall be 0 and the value of CHAR_MAX shall be the same as that of UCHAR_MAX ./7/

Kone answered 19/5, 2009 at 11:28 Comment(1)
Ugh, I'd ask readers to please use guaranteed width types via the standard <cstdint> and its uint16_t et al., not platform/compiler-specific __magicWords.Wigfall
