Why do arithmetic operations on unsigned chars promote them to signed integers?

Many answers to similar questions point out that this is mandated by the standard. But I cannot understand the reasoning behind this decision by the standard's authors.

From my understanding, an unsigned char does not store its value in two's complement form, so I don't see a situation where, say, XORing two unsigned chars would produce unexpected behavior. Promoting them to int therefore just seems like a waste of space (in most cases) and CPU cycles.
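That said, the promotion is easy to observe even for XOR. A minimal sketch (the exact sizes assume a typical 32-bit int):

    #include <stdio.h>

    int main(void) {
        unsigned char a = 0x0F, b = 0xF0;
        /* Both operands are promoted to int before the XOR is evaluated,
           so the result has type int, not unsigned char. */
        printf("%zu\n", sizeof(a ^ b)); /* prints sizeof(int), e.g. 4, not 1 */
        unsigned char c = 0xFF;
        /* ~ applies to the promoted int 255, so the result is -256. */
        printf("%d\n", ~c);
        return 0;
    }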

Moreover, why int? If a variable is declared unsigned, the unsignedness is clearly important to the programmer, so a promotion to unsigned int would make more sense than int, in my opinion.

[EDIT #1] As pointed out in the comments, promotion to unsigned int takes place if an int cannot represent all the values of the unsigned char.

[EDIT #2] To clarify the question: if this is about the performance benefit of operating on int rather than char, why make it part of the standard? It could have been left as a suggestion to compiler designers, an optimization hint. As the rule stands, a compiler that didn't perform the promotion would not fully adhere to the C/C++ standard, even if, hypothetically, it supported every other required feature of the language. In a nutshell, I cannot figure out why I cannot operate directly on unsigned chars, so the requirement to promote them to ints seems unnecessary. Can you give me an example which proves this wrong?
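For concreteness, here is the kind of behavior I am asking about (a minimal sketch, assuming a 32-bit int): the promotion changes observable results, not just performance.

    #include <stdio.h>

    int main(void) {
        unsigned char x = 200, y = 100;
        /* x and y are promoted to int, so x + y evaluates to 300; if the
           addition were carried out directly in unsigned char, it would
           wrap modulo 256 and yield 44. */
        int sum = x + y;
        printf("%d\n", sum); /* prints 300 */
        return 0;
    }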

Ferromagnesian answered 27/5, 2020 at 9:34 Comment(13)
32/64-bit operations are faster on modern CPUs than 8-bit arithmetic. If the char is not in an array, it's likely that it's already been stored as an integer anyway. – Moult
This might help: idryman.org/blog/2012/11/21/integer-promotion – Translocation
What sort of answer do you want? (Note that the standard doesn't specify promotion of unsigned char to int, and may promote to unsigned int given suitable definitions of char and int.) – Lymphocytosis
@Translocation That article contains many errors. It's completely misguided in many regards. – Lymphocytosis
@MichaelChourdakis Fair enough, but shouldn't that be left up to the compiler to optimize rather than making it part of the standard? What if I try to compile this kind of statement on a machine where int is 16 bits? – Ferromagnesian
@PaulHankin What do you mean by suitable definitions? – Ferromagnesian
For example, if unsigned char and unsigned int both have 32 bits, unsigned char will promote to unsigned int, not int. – Lymphocytosis
@PaulHankin But how can an unsigned char ever have 32 bits? Isn't it the shortest primitive data type, fixed at exactly 8 bits? – Ferromagnesian
@DashwoodIce9: char is one byte, but one byte might be 32 bits (in a future system). – Ajax
Did you check for yourself whether chars must be 8 bits before disagreeing? It seems not so hard to check, either in the standard itself or online. – Lymphocytosis
@PaulHankin Apologies, I did not check that. I was somehow under the delusion that chars are always 8 bits wide. Some research and Jarod's comment cleared that up. Thanks. I've updated the question accordingly. – Ferromagnesian
The only thing I have to add is that a char is precisely CHAR_BIT bits wide. I'll concede that on most modern architectures CHAR_BIT == 8 is true, but you should not assume that to hold universally (now, in the past, or in the future). – Sherman
For C++, the implicit type promotion rules were inherited from C. – Clinkerbuilt

You can find this document on-line: Rationale for International Standard - Programming Languages - C (Revision 5.10, 2003).

Chapter 6.3 (pp. 44-45) is about conversions:

Between the publication of K&R and the development of C89, a serious divergence had occurred among implementations in the evolution of integer promotion rules. Implementations fell into two major camps which may be characterized as unsigned preserving and value preserving.

The difference between these approaches centered on the treatment of unsigned char and unsigned short when widened by the integer promotions, but the decision had an impact on the typing of constants as well (see §6.4.4.1).

The unsigned preserving approach calls for promoting the two smaller unsigned types to unsigned int. This is a simple rule, and yields a type which is independent of execution environment.

The value preserving approach calls for promoting those types to signed int if that type can properly represent all the values of the original type, and otherwise for promoting those types to unsigned int.

Thus, if the execution environment represents short as something smaller than int, unsigned short becomes int; otherwise it becomes unsigned int. Both schemes give the same answer in the vast majority of cases, and both give the same effective result in even more cases in implementations with two's complement arithmetic and quiet wraparound on signed overflow - that is, in most current implementations. In such implementations, differences between the two only appear when these two conditions are both true:

  1. An expression involving an unsigned char or unsigned short produces an int-wide result in which the sign bit is set, that is, either a unary operation on such a type, or a binary operation in which the other operand is an int or “narrower” type.

  2. The result of the preceding expression is used in a context in which its signedness is significant:

    • sizeof(int) < sizeof(long) and it is in a context where it must be widened to a long type, or

    • it is the left operand of the right-shift operator in an implementation where this shift is defined as arithmetic, or

    • it is either operand of /, %, <, <=, >, or >=.

In such circumstances a genuine ambiguity of interpretation arises. The result must be dubbed questionably signed, since a case can be made for either the signed or unsigned interpretation. Exactly the same ambiguity arises whenever an unsigned int confronts a signed int across an operator, and the signed int has a negative value. Neither scheme does any better, or any worse, in resolving the ambiguity of this confrontation. Suddenly, the negative signed int becomes a very large unsigned int, which may be surprising, or it may be exactly what is desired by a knowledgeable programmer. Of course, all of these ambiguities can be avoided by a judicious use of casts.

One of the important outcomes of exploring this problem is the understanding that high-quality compilers might do well to look for such questionable code and offer (optional) diagnostics, and that conscientious instructors might do well to warn programmers of the problems of implicit type conversions.

The unsigned preserving rules greatly increase the number of situations where unsigned int confronts signed int to yield a questionably signed result, whereas the value preserving rules minimize such confrontations. Thus, the value preserving rules were considered to be safer for the novice, or unwary, programmer. After much discussion, the C89 Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving.

QUIET CHANGE IN C89

A program that depends upon unsigned preserving arithmetic conversions will behave differently, probably without complaint. This was considered the most serious semantic change made by the C89 Committee to a widespread current practice.

For reference, you can find more details about those conversions updated to C11 in this answer by Lundin.
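To make the difference between the two promotion schemes concrete, here is a minimal sketch (assuming 16-bit short and 32-bit int; this is the comparison-operand case from the list above):

    #include <stdio.h>

    int main(void) {
        unsigned short us = 65535;
        /* Value preserving (C89 and later): us promotes to int 65535,
           so the signed comparison -1 < 65535 is true.
           Unsigned preserving (older practice): us would promote to
           unsigned int; -1 would then convert to UINT_MAX and the
           comparison would be false. */
        if (-1 < us)
            printf("value preserving: -1 < us\n");
        else
            printf("unsigned preserving: -1 >= us\n");
        return 0;
    }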

Plowshare answered 27/5, 2020 at 11:45 Comment(6)
Thank you Bob. The linked document clears up all doubts as to why implicit promotion to int takes place rather than unsigned int. I, however, still do not see the need to promote in the first place. Lundin's answer mentions: "The harsh reality caused by the integer promotions means that almost no operation in C can be carried out on small types like char or short. Operations are always carried out on int or larger types." He then says that the compiler is allowed to optimize the code. I didn't get how it is being optimized, though. Could you clarify that? – Ferromagnesian
@Ferromagnesian I believe it was a matter of performance expectations. See e.g. #5069989 or #5347542. Having the calculations performed on types of the CPU word size should require fewer cycles. Now, with SIMD instruction sets, it's a bit more complicated. – Plowshare
@Ferromagnesian -- re: optimization -- under the "as if" rule, the compiler is allowed to do things differently if the result is the same as it would have gotten by exactly following the rules. – Mufinella
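(To illustrate the "as if" point, a hypothetical sketch; the function name is made up:)

    /* Formally, a and b are promoted to int, added, and the int result is
       converted back to unsigned char on return. Because the truncated
       result is bit-identical to an 8-bit add, the "as if" rule lets the
       compiler emit a single byte-wide add instruction instead. */
    unsigned char add_bytes(unsigned char a, unsigned char b) {
        return (unsigned char)(a + b);
    }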
Since this is tagged C++, I wanted to point out that the C++ standard (cppreference link) says something different -- adding two unsigned shorts should return an unsigned short, even though every compiler I've tried converts it to an int. – Moersch
@Moersch I'm not sure where you read that. That link says: "If the operand passed to an arithmetic operator is integral or unscoped enumeration type, then before any other action [...], the operand undergoes integral promotion." Following that link: "In particular, arithmetic operators do not accept types smaller than int as arguments, and integral promotions are automatically applied after lvalue-to-rvalue conversion, if applicable. [...] unsigned char or unsigned short can be converted to int if it can hold its entire value range, and unsigned int otherwise;" – Plowshare
@Plowshare Hmph. I didn't look at the integral promotion page; the language in the operator arithmetic page implies that the smaller types survive, short being mentioned as a separate conversion rank. – Moersch
