Are there machines where sizeof(char) != 1, or at least CHAR_BIT > 8?

Are there machines (or compilers) where sizeof(char) != 1?

Does the C99 standard say that sizeof(char) on a standard-compliant implementation MUST be exactly 1? If it does, please give me the section number and a citation.

Update: If I have a machine (CPU) that can't address individual bytes, only aligned groups of 4 bytes (uint32_t), so that the minimal read is 4 bytes, can a compiler for this machine define sizeof(char) to be 4? sizeof(char) will be 1, but char will have 32 bits (see the CHAR_BIT macro).

Update 2: But the result of sizeof is NOT in bytes! It is the size in units of char. And can a char be 2 bytes, or (maybe) 7 bits?

Update 3: OK. All machines have sizeof(char) == 1. But which machines have CHAR_BIT > 8?

Jamarjamb answered 7/2, 2010 at 1:2 Comment(10)
What are you really worried about? You don't like calling sizeof()?Mariken
I'm worried about C99 standard compliance. I work closely with C99 compilers.Jamarjamb
As Unicode becomes even more important, there might come non-standard compilers that use Unicode characters as char (instead of wchar.) Even if the standard says that sizeof(char) must be 1, I wouldn't rely on that assumption.Subplot
@Chip Uni, very interesting. Please, add it as answer. Can you name some compilers with such renaming?Jamarjamb
There are no C compilers where sizeof(char) is not 1, Unicode or not.Biforate
@Chip: sizeof(char) is always 1, even if char is 32-bits (as happens on some systems). C has lots of fun warts.Weevily
All versions of the C standard require CHAR_BIT to be at least 8; you can't have CHAR_BIT == 7 and be standard compliant. However, it is perfectly feasible for machines to have CHAR_BIT > 8. Old Cray machines did, I believe (sizeof(char) == sizeof(short) && sizeof(char) == sizeof(int) on those; I don't remember whether sizeof(int) == sizeof(long) or whether CHAR_BIT was 32 or 64; I expect it was 32, and I think sizeof(long) == 1 too). (You can find a reference to, but not online access to, a Cray C manual.)Sutlej
This compiler has 16 bit chars.Biforate
What you call "bytes" are better referred to as "octets". In C, "bytes" and "chars" mean the exact same thing - the smallest unit of memory.Mckelvey
@Mckelvey Well done. You know what I mean. Have fun.Harragan

It is always one in C99, section 6.5.3.4:

When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

Edit: not part of your question, but for interest, from Harbison and Steele's C: A Reference Manual, Third Edition, Prentice Hall, 1991 (pre-C99), p. 148:

A storage unit is taken to be the amount of storage occupied by one character; the size of an object of type char is therefore 1.
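As a small illustration of the two quotations above, here is a minimal sketch in C99. The typedef name `sizeof_char_must_be_one` and the helper `string_storage` are hypothetical names made up for this example; they are not from the standard or from Harbison and Steele. The typedef is a common pre-C11 trick for a compile-time check: it should fail to compile on any implementation where sizeof(char) were not 1.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* C99-compatible compile-time check (C11 could use _Static_assert instead):
   the array size becomes -1, which is a constraint violation,
   if the condition is ever false. */
typedef char sizeof_char_must_be_one[(sizeof(char) == 1) ? 1 : -1];

/* Hypothetical helper: bytes needed to store a string of n characters
   plus the terminating '\0'. */
size_t string_storage(size_t n)
{
    assert(n < (size_t)-1);          /* guard against wrap-around in n + 1 */
    return (n + 1) * sizeof(char);   /* multiplying by sizeof(char) multiplies by 1 */
}
```

Calling `string_storage(5)` should give 6 on any conforming implementation, whatever CHAR_BIT is, because the unit of the result is "one char".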

Edit: In answer to your updated question, the following question and answer from Harbison and Steele is relevant (ibid, Ex. 4 of Ch. 6):

Is it allowable to have a C implementation in which type char can represent values ranging from -2,147,483,648 through 2,147,483,647? If so, what would be sizeof(char) under that implementation? What would be the smallest and largest ranges of type int?

Answer (ibid, p. 382):

It is permitted (if wasteful) for an implementation to use 32 bits to represent type char. Regardless of the implementation, the value of sizeof(char) is always 1.

While this does not specifically address a case where, say, bytes are 8 bits and a char is 4 of those bytes (actually impossible with the C99 definition, see below), the fact that sizeof(char) == 1 always holds is clear from the C99 standard and from Harbison and Steele.

Edit: In fact (this is in response to your Update 2 question), as far as C99 is concerned, sizeof(char) is in bytes; from section 6.5.3.4 again:

The sizeof operator yields the size (in bytes) of its operand

So, combined with the quotation above, 8-bit bytes with a char made of 4 of those bytes is impossible: for C99, a byte is the same size as a char.
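Since sizeof counts bytes and a byte has CHAR_BIT bits, the storage width of an object in bits can be sketched as sizeof times CHAR_BIT. The function name `storage_bits` below is an assumption for illustration only, and the result includes any padding bits the type may have.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: convert a size in C bytes (as returned by sizeof)
   into a size in bits. One C byte always holds CHAR_BIT bits. */
size_t storage_bits(size_t size_in_bytes)
{
    /* refuse sizes whose bit count would not fit in size_t */
    assert(size_in_bytes <= (size_t)-1 / CHAR_BIT);
    return size_in_bytes * CHAR_BIT;
}
```

For example, `storage_bits(sizeof(char))` is CHAR_BIT by construction, so on a machine with 32-bit chars it would be 32 even though sizeof(char) is still 1.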

In answer to your mention of the possibility of a 7 bit char: this is not possible in c99. According to section 5.2.4.2.1 of the standard the minimum is 8:

Their implementation-defined values shall be equal or greater [my emphasis] in magnitude to those shown, with the same sign.

— number of bits for smallest object that is not a bit-field (byte)

CHAR_BIT 8

— minimum value for an object of type signed char

SCHAR_MIN -127

— maximum value for an object of type signed char

SCHAR_MAX +127

— maximum value for an object of type unsigned char

UCHAR_MAX 255

— minimum value for an object of type char

CHAR_MIN see below

— maximum value for an object of type char

CHAR_MAX see below

[...]

If the value of an object of type char is treated as a signed integer when used in an expression, the value of CHAR_MIN shall be the same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the value of CHAR_MAX shall be the same as that of UCHAR_MAX. The value UCHAR_MAX shall equal 2^CHAR_BIT − 1.
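The relationships in the quoted text can be checked against an implementation's own limits.h. This is only a sketch: the function names `uchar_value_bits` and `plain_char_limits_consistent` are hypothetical, and the code assumes a C99 compiler that provides stdint.h.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Count the value bits of unsigned char by shifting UCHAR_MAX down to zero.
   Because UCHAR_MAX should be 2^CHAR_BIT - 1, every bit seen is expected to be 1. */
int uchar_value_bits(void)
{
    uintmax_t v = UCHAR_MAX;
    int bits = 0;
    while (v != 0) {
        assert((v & 1u) == 1u);   /* an all-ones pattern has no zero bits */
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Plain char must share its limits with signed char or with unsigned char. */
int plain_char_limits_consistent(void)
{
    if (CHAR_MIN < 0)
        return CHAR_MIN == SCHAR_MIN && CHAR_MAX == SCHAR_MAX;
    return CHAR_MIN == 0 && CHAR_MAX == UCHAR_MAX;
}
```

On a conforming implementation `uchar_value_bits()` should equal CHAR_BIT, which in turn should be at least 8.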

Tampa answered 7/2, 2010 at 1:3 Comment(8)
If you know you're working with char types and you know the language requires them to have a size of 1, why is it a good idea to always put the redundant sizeof(char)?Hydrometeor
I wonder what CHAR_BIT is on machines that are not binary (e.g. CHAR_MAX = 499, CHAR_MIN = -500).Pomegranate
@Roger. Generally, of course, it is highly important to use sizeof for implementation independence. Yes, given all of the above, char is a bit of an exception, and it is safe to assume sizeof(char)=1. I said "good idea" because: (a) if someone later changes to use, e.g., long, it lowers the chance of error, since sizeof(char) serves as a reminder, (b) a code reader, such as the OP, who isn't sure about sizeof(char), doesn't waste time worrying if the code is correct, (c) current non-standard or future implementations (unlikely). That is the reason for my habit, anyway.Tampa
(a) and (c) have much more serious ramifications this can't hope to solve, or even get close to solving; also YAGNI. Someone as in (b) just needs to be told once---I don't need to teach them in every line of my code. However, there are drawbacks to using sizeof(char): it's another item to debate/check/etc. in your coding conventions/standards/guidelines, wastes my time wondering if you really know C and what else may be incorrect, takes up visual/mental/text-line "bandwidth".Hydrometeor
Yours is certainly a valid and defensible approach. I agree for (c) YAGNI. (a) is quite possible but, yes, much more thought would be needed. For (b), I agree that teaching someone about C once and for all is a good solution. As for the drawbacks, your points are all valid, but for some, assuming sizeof(char)=1 is a cause for debate too. Your source "bandwidth" comment is valid, but (for others' info - I'm certain you're only too aware) the compiled code is identical: when I do gcc -S the resulting assembly of int a = 7*sizeof(char); or int a = 7*1; or int a = 7; is identical.Tampa
@Ramashalanka: Yes, the compiled code is equivalent. It's all the issues around readability and otherwise how people use the source code that I'm talking about. (And FWIW, I think you have a decent +1 answer here, I just find "always use sizeof(char)" to be misguided and a hotbutton issue for me, even if a small issue.)Hydrometeor
@Mk12: I agree. That comment didn't really fit with the certainty of the rest of the answer, so I've now removed it.Tampa
@Ramashalanka: I suppose it is a subjective thing though. It's fine if some people really want to use it, to be consistent with using it all the rest of the time. In my opinion though, there is no point, since if you can't trust that sizeof(char) == 1 will stay the same, you can't really trust anything.Thermotensile

There are no machines where sizeof(char) is 4. It's always 1 byte. That byte might contain 32 bits, but as far as the C compiler is concerned, it's one byte. For more details, I'm actually going to point you at the C++ FAQ 26.6. That link covers it pretty well and I'm fairly certain C++ got all of those rules from C. You can also look at comp.lang.c FAQ 8.10 for characters larger than 8 bits.

Update 2: But the result of sizeof is NOT in bytes! It is the size in units of char. And can a char be 2 bytes, or (maybe) 7 bits?

Yes, it is bytes. Let me say it again. sizeof(char) is 1 byte according to the C compiler. What people colloquially call a byte (8 bits) is not necessarily the same as what the C compiler calls a byte. The number of bits in a C byte varies depending on your machine architecture. It's also guaranteed to be at least 8.
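One practical consequence of the byte/8-bit distinction: code that exchanges 8-bit units with the outside world can mask each unsigned char to its low 8 bits instead of assuming CHAR_BIT == 8. The sketch below stores a 32-bit value as four such 8-bit groups, most significant first. The function names `put_u32_be` and `get_u32_be` are hypothetical, and unsigned long is used because C guarantees it has at least 32 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Store the low 32 bits of value as four 8-bit groups, one per unsigned char,
   most significant group first. Each element only ever holds 0..255,
   even if a char is wider than 8 bits on this machine. */
void put_u32_be(unsigned char out[4], unsigned long value)
{
    assert(out != NULL);
    out[0] = (unsigned char)((value >> 24) & 0xFFu);
    out[1] = (unsigned char)((value >> 16) & 0xFFu);
    out[2] = (unsigned char)((value >> 8) & 0xFFu);
    out[3] = (unsigned char)(value & 0xFFu);
}

/* Rebuild the value, ignoring any bits above the low 8 in each element. */
unsigned long get_u32_be(const unsigned char in[4])
{
    assert(in != NULL);
    return ((unsigned long)(in[0] & 0xFFu) << 24)
         | ((unsigned long)(in[1] & 0xFFu) << 16)
         | ((unsigned long)(in[2] & 0xFFu) << 8)
         |  (unsigned long)(in[3] & 0xFFu);
}
```

The buffer is always 4 C bytes long here; on a machine with wide chars that wastes space, but the stored values stay the same.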

Broadleaved answered 7/2, 2010 at 1:57 Comment(5)
Please!!! C++ is a really DIFFERENT language from C (C99). This question is about plain C only.Jamarjamb
<strike>What can I do when the machine/CPU can't access 8-bit bytes? Unaligned access is prohibited.</strike> (Even on x86, malloc returns aligned data and allocates memory in multiples of 4 bytes.) <strike>Then CHAR_BIT will be greater than 8. Yes, such a platform can be rather special.</strike>Jamarjamb
@osgx, I tend to scream as much as you just did when people try to mix C and C++. But I think in this case that one C++ FAQ entry applies equally well to C.Broadleaved
The correct name for "8 bits" is octet. The C Standard uses the word "byte" for an object that is the size of a char. Others may use the word "byte" in different ways, often when they mean "octet", but in C (and C++, or Objective-C) it means "object the size of a char". A char may be more than 8 bits, or more than one octet, but it's always one byte.Conny
Texas Instruments has microcontrollers with 16 bit chars.Schreibe

The PDP-10 and PDP-11 were such machines.

Update: there seem to be no C99 compilers for the PDP-10.

Some models of the Analog Devices 32-bit SHARC DSP have CHAR_BIT=32, and Texas Instruments DSPs from the TMS320F28xx family have CHAR_BIT=16, reportedly.

Update: There is GCC 3.2 for PDP-10 with CHAR_BIT=9 (check include/limits.h in that archive).
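To see what these CHAR_BIT values mean in practice, here is a small sketch that computes how many chars are needed to hold a given number of bits. The function name `chars_for_bits` is hypothetical, and the width is passed as a parameter only so that the values mentioned above (9, 16, 32) can be tried on an ordinary 8-bit-char machine; real code would use CHAR_BIT from limits.h.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: number of chars needed to hold `bits` bits
   when each char is `char_bit` bits wide (rounding up). */
unsigned chars_for_bits(unsigned bits, unsigned char_bit)
{
    assert(char_bit >= 8);   /* the C standard requires CHAR_BIT >= 8 */
    return (bits + char_bit - 1) / char_bit;
}

/* The same question answered for the implementation compiling this file. */
unsigned chars_for_bits_here(unsigned bits)
{
    return chars_for_bits(bits, CHAR_BIT);
}
```

With these definitions a 32-bit quantity needs 4 chars at CHAR_BIT 8 or 9, 2 chars at CHAR_BIT 16, and 1 char at CHAR_BIT 32.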

Jamarjamb answered 7/2, 2010 at 2:41 Comment(6)
Don't confuse implementations of similar-but-not-C languages with C. You even said "I'm worried about C99 standard compliance. I work closely with C99 compilers."Hydrometeor
@Roger: Not fair to call GCC3 not C99 compliant unless you are dealing with extreme edge cases which are considered to be bugs in GCC.Pomegranate
@Joshua, I think Roger is talking about the historic K&R and pcc compilers. It's also not fair to claim it is C99 compliant before a C99 compliance test suite has been run on a PDP-10, compiled with this port (there can be bugs from the port and from the machine itself). But it can be expected to be as close to the C99 standard as GCC 3.2 on x86 is.Jamarjamb
@Joshua: CHAR_BIT is allowed, in C99, to be greater than 8, but sizeof(char) must still be 1 (and this answer was much different when I left that comment). I'm not calling GCC3 non-compliant, and C89 makes the same requirement here, BTW. I quoted that text to say that osgx is the one worried about C99 compliancy and uses C99 compilers, so why is he worried about non-C99 compilers?Hydrometeor
I'm interested in "C99 standard compliance". For that, I want to find the differences between C99 and non-C99 compilers in some edge cases, and find out what is allowed for them and what is not. (See, for example, my question about the elimination of infinite loops.)Jamarjamb
Author of PDP-10 GCC here. CHAR_BIT is 9, but sizeof(char) is still 1.Injurious
