Why must a short be converted to an int before arithmetic operations in C and C++?

4

82

From the answers I got from this question, it appears that C++ inherited this requirement for conversion of short into int when performing arithmetic operations from C. May I pick your brains as to why this was introduced in C in the first place? Why not just do these operations as short?

For example (taken from dyp's suggestion in the comments):

short s = 1, t = 2 ;
auto  x = s + t ;

x will have type of int.
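The claim can be checked at compile time. This is a minimal sketch assuming a C++11 compiler; the helper name sum_of_shorts is made up for illustration and is not part of the question:

```cpp
#include <type_traits>

// Hypothetical helper: returns s + t and lets the compiler name the type.
constexpr auto sum_of_shorts(short s, short t) -> decltype(s + t) {
    return s + t;   // both operands are promoted to int before the addition
}

// The result type is int, not short.
static_assert(std::is_same<decltype(sum_of_shorts(1, 2)), int>::value,
              "short + short has type int");
```

If the addition were done in short, the static_assert would fail to compile.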

Elene answered 23/6, 2014 at 17:30 Comment(9)
@Jefffrey Integral promotion is part of the usual arithmetic conversions. short s=1, t=2; auto x = s+t; then x is an int.Eyelid
possible duplicate of What does 'Natural Size' really mean in C++?Alton
IMO: There was very little to be gained by obliging a compliant compiler to perform integer math as short (or as a char). Without a demand, why require it?Polypoid
maxshort + maxshort > maxshortZischke
@Zischke that wouldn't explain why int is not promoted to long (maxint + maxint > maxint).Spiraea
I don't get the downvotes on this question. This is a good question with an interesting answer. Four downvotes and no comments is pretty discouraging.Flaviaflavian
If it weren't promoted to int, you'd see lots of questions like why char i = 120; int j = i+i; int k = i << 11; doesn't give the correct value. That's what "not to mention the correct answer more often" in the rationale below refers to.Catchings
@dyp: The rules for why x is type int are completely different in C and C++ though... ;-)Unmannerly
@Shoe: True enough, but overflow does kick in a lot sooner with short than with int. But of course that is insufficient evidence; a satisfactory answer to this question requires a citation, if at all possible.Cundiff
46

If we look at the Rationale for International Standard—Programming Languages—C in section 6.3.1.8 Usual arithmetic conversions it says (emphasis mine going forward):

The rules in the Standard for these conversions are slight modifications of those in K&R: the modifications accommodate the added types and the value preserving rules. Explicit license was added to perform calculations in a “wider” type than absolutely necessary, since this can sometimes produce smaller and faster code, not to mention the correct answer more often. Calculations can also be performed in a “narrower” type by the as if rule so long as the same end result is obtained. Explicit casting can always be used to obtain a value in a desired type
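As a rough illustration of "the correct answer more often" (my own example, not the Rationale's): a sum of two shorts that would not fit in a short is still computed exactly. The helper name add_wide is hypothetical, and the sketch assumes int is wide enough to hold SHRT_MAX + SHRT_MAX, which is typical but not guaranteed:

```cpp
#include <climits>

// Assumption: int can hold the sum of any two short values.
static_assert(INT_MAX / 2 >= SHRT_MAX,
              "this sketch assumes int can hold SHRT_MAX + SHRT_MAX");

// Hypothetical helper: the addition is carried out in int.
constexpr int add_wide(short a, short b) {
    return a + b;   // a and b are promoted to int, so the sum cannot overflow here
}

// The mathematically correct result, even though it does not fit in a short.
static_assert(add_wide(SHRT_MAX, SHRT_MAX) == 2 * SHRT_MAX,
              "sum computed in int is exact");
```

Storing the result back into a short would still narrow it; the promotion only protects the intermediate value.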

Section 6.3.1.8 of the draft C99 standard covers the usual arithmetic conversions, which are applied to the operands of arithmetic expressions. For example, section 6.5.6 Additive operators says:

If both operands have arithmetic type, the usual arithmetic conversions are performed on them.

We find similar text in section 6.5.5 Multiplicative operators as well. In the case of a short operand, first the integer promotions are applied from section 6.3.1.1 Boolean, characters, and integers which says:

If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
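The quoted rule can be observed from C++, which has an equivalent rule for these types. The sketch below uses unary +, which applies the integer promotions to its operand; the alias name promoted_t is an assumption of this example, not a standard facility:

```cpp
#include <climits>
#include <type_traits>
#include <utility>

// Hypothetical alias: the type of +x, i.e. T after the integer promotions.
template <typename T>
using promoted_t = decltype(+std::declval<T>());

// Types narrower than int are promoted to int.
static_assert(std::is_same<promoted_t<signed char>, int>::value, "signed char -> int");
static_assert(std::is_same<promoted_t<short>, int>::value, "short -> int");

// unsigned short becomes int only if int can represent all of its values.
static_assert(std::is_same<promoted_t<unsigned short>,
                           std::conditional<(USHRT_MAX <= INT_MAX),
                                            int, unsigned int>::type>::value,
              "unsigned short -> int or unsigned int");

// int and wider types are unchanged.
static_assert(std::is_same<promoted_t<int>, int>::value, "int unchanged");
static_assert(std::is_same<promoted_t<long>, long>::value, "long unchanged");
```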

The discussion of integer promotions in section 6.3.1.1 of the Rationale for International Standard—Programming Languages—C is actually more interesting. I am going to quote it selectively because it is too long to quote in full:

Implementations fell into two major camps which may be characterized as unsigned preserving and value preserving.

[...]

The unsigned preserving approach calls for promoting the two smaller unsigned types to unsigned int. This is a simple rule, and yields a type which is independent of execution environment.

The value preserving approach calls for promoting those types to signed int if that type can properly represent all the values of the original type, and otherwise for promoting those types to unsigned int. Thus, if the execution environment represents short as something smaller than int, unsigned short becomes int; otherwise it becomes unsigned int.

This can have some rather unexpected results in some cases, as the question "Inconsistent behaviour of implicit conversion between unsigned and bigger signed types" demonstrates, and there are plenty more examples like it. In most cases, though, the operations work as expected.
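One such surprise, sketched here as my own example rather than the one from the linked question: subtracting 1 from zero gives different results for unsigned short and unsigned int. The helper names are hypothetical, and the sketch assumes unsigned short is narrower than int:

```cpp
#include <climits>

// Assumption: int can represent every unsigned short value.
static_assert(USHRT_MAX <= INT_MAX,
              "this sketch assumes unsigned short is promoted to int");

// Hypothetical helpers: x - 1 for two unsigned types, widened for comparison.
constexpr long long dec_ushort(unsigned short x) {
    return x - 1;   // x is promoted to (signed) int, so 0 - 1 is -1
}
constexpr long long dec_uint(unsigned int x) {
    return x - 1;   // stays unsigned int, so 0 - 1 wraps around to UINT_MAX
}

static_assert(dec_ushort(0) == -1, "unsigned short: signed result");
static_assert(dec_uint(0) == static_cast<long long>(UINT_MAX),
              "unsigned int: wrapped result");
```

Under the unsigned preserving rule described above, both functions would have wrapped around.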

Flaviaflavian answered 23/6, 2014 at 17:56 Comment(7)
Yes, sometimes it'll be smaller and faster because you don't need extra instructions to sign/zero extend the values to int or mask the high bits. On x86 you don't need extra instruction prefixes to change argument sizes either.Catchings
Too bad the rationale didn't add a secondary rule that if the result of an additive, multiplicative, or bitwise operator is coerced to an unsigned type smaller than int, the expression will behave as if its operands were likewise coerced and the operation performed on the smaller type. There are no defined cases which would contradict such a rule, but some compilers may use promotion as an excuse to infer that a statement like x*=y; (with both variables unsigned short) promises that x cannot exceed 2147483648/y.Rhachis
If I have something like int x = 1234 and char *y = &x, the binary representation of 1234 is 00000000 00000000 00000100 11010010. My machine is little-endian, so it stores the bytes in memory as 11010010 00000100 00000000 00000000, least significant byte first. Now the main part: if I use printf("%d", *y), printf reads only the first byte, 11010010, and the output is -46, but 11010010 is 210, so why does it print -46? I am really confused. I guess some char-to-integer promotion is doing something, but I don't know what.Turnabout
You quote the C99 standard, but is this behaviour no older than that? I need to get to bed, o/w I would see if I could find something in K&R.Cundiff
@Cundiff Well, Wikipedia points to a version of C89, although you can't get an official draft. In that version, under Usual arithmetic conversions, it describes a very similar procedure. So I would say yes. Note the quote above says slight modifications of those in K&R, so K&R should be different.Flaviaflavian
Also see this document, which explains how K&R did it, which was different.Flaviaflavian
The details in the document linked above can also be found in the C99 rationale section 6.3.1.1Flaviaflavian
21

It's not so much a feature of the language as a limitation of the physical processor architectures on which the code runs. The int type in C is usually the size of your standard CPU register. More silicon takes up more space and more power, so in many cases arithmetic can only be done on the "natural size" data types. This is not universally true, but most architectures still have this limitation. In other words, when adding two 8-bit numbers, what actually goes on in the processor is some type of 32-bit arithmetic followed by either a simple bit mask or another appropriate type conversion.
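At the language level, the "wide arithmetic, then mask or convert" description looks roughly like this. It is a sketch with made-up helper names, assuming 8-bit bytes; it shows what the language specifies, not what any particular processor does:

```cpp
#include <climits>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Hypothetical helper: the sum as the language computes it, in int.
constexpr int wide_sum(unsigned char a, unsigned char b) {
    return a + b;   // both operands are promoted to int first
}

// Hypothetical helper: the same sum converted back to 8 bits.
constexpr unsigned char narrow_sum(unsigned char a, unsigned char b) {
    return static_cast<unsigned char>(a + b);   // reduced modulo 256
}

static_assert(wide_sum(200, 100) == 300, "no wrap-around in int");
static_assert(narrow_sum(200, 100) == 44, "300 modulo 256");
// Converting to unsigned char gives the same value as masking the low 8 bits.
static_assert(narrow_sum(200, 100) == (wide_sum(200, 100) & 0xFF),
              "conversion and mask agree");
```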

Pergolesi answered 23/6, 2014 at 17:39 Comment(5)
I'm not sure that there's necessarily a bit mask. The processor does the arithmetic in its native word size, and then only stores the lower bits back to memory. (Also, while you're right that most architectures do only do word arithmetic, the one notable exception, Intel, is pretty widespread.)Petrosal
@JamesKanze You're right. I edited my answer. And yes, Intel is way out there when it comes to optimized arithmetic, especially with their IPP libraries.Pergolesi
I disagree with "it is not a feature of the language"; it is a feature of the language. It is defined like that because ... but it is defined by the language, not by the processor.Kidd
@JonathanLeffler It is certainly a feature of the language. Of most languages, I think. But Phonon's answer explains why languages have this feature. (It's probably worth pointing out that in the past, machines only had words, not bytes, half-words, etc. And when byte addressing was introduced, it only affected memory access, not registers and operations. So while the PDP-11 had both byte and word instructions, when the target address of a byte instruction was a register, the byte was sign extended to a word.)Petrosal
How the CPU executes commands is completely hidden from user code. You've not answered the question at all.Geezer
18

The standard treats the short and char types as a sort of "storage types", i.e. sub-ranges that you can use to save some space but that are not going to buy you any speed because their size is "unnatural" for the CPU.

On certain CPUs this is not true but good compilers are smart enough to notice that if you e.g. add a constant to an unsigned char and store the result back in an unsigned char then there's no need to go through the unsigned char -> int conversion. For example with g++ the code generated for the inner loop of

void incbuf(unsigned char *buf, int size) {
    for (int i=0; i<size; i++) {
        buf[i] = buf[i] + 1;
    }
}

is just

.L3:
    addb    $1, (%rdi,%rax)
    addq    $1, %rax
    cmpl    %eax, %esi
    jg  .L3
.L1:

where you can see that an unsigned char addition instruction (addb) is used.

The same happens if you're doing your computations between short ints and storing the result in short ints.
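For reference, a 16-bit counterpart of incbuf might look like the sketch below. The names inc16 and incbuf16 are hypothetical, and no generated code is shown because the instructions actually emitted depend on the compiler, target and flags:

```cpp
#include <climits>

// Hypothetical helper: one element's worth of work.
constexpr unsigned short inc16(unsigned short v) {
    // v + 1 is computed in the promoted type; the cast narrows it back.
    return static_cast<unsigned short>(v + 1);
}

// Hypothetical counterpart of incbuf for unsigned short elements.
void incbuf16(unsigned short *buf, int size) {
    for (int i = 0; i < size; i++) {
        buf[i] = inc16(buf[i]);
    }
}

// The visible result is the same as if the addition had been done in
// unsigned short directly, which is what lets a compiler skip the widening.
static_assert(inc16(USHRT_MAX) == 0, "wraps around like narrow arithmetic");
```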

Irredentist answered 23/6, 2014 at 17:49 Comment(0)
8

The linked question seems to cover it pretty well: the CPU just doesn't work at that size. A 32-bit CPU has its native arithmetic operations set up for 32-bit registers. The processor prefers to work in its favorite size, and for operations like this, copying a small value into a native-size register is cheap. (On the x86 architecture, the 32-bit registers are named as if they were extended versions of the 16-bit registers (eax for ax, ebx for bx, etc.); see x86 integer instructions.)

For some extremely common operations, particularly vector/float arithmetic, there may be specialized instructions that operate on a different register type or size. For something like a short, widening it by (up to) 16 bits of zeroes (or of sign bits, for a signed value) has very little performance cost, and adding specialized instructions is probably not worth the time or space on the die (if you want to get really physical about why; I'm not sure they would take actual space, but it does get way more complex).

Libyan answered 23/6, 2014 at 17:39 Comment(2)
This is not purely a hardware issue, there was a conscious choice made during the drafting of the C99 standard to make integer promotions work a specific way.Flaviaflavian
"Note that the 32-bit registers are also named as if they are extended versions of the 16-bit registers (eax to ax, ebx to bx, etc)" this is true for x86 but not correct for most other architectures. MIPS registers have the same name regardless of 32- or 64-bit mode and they always work in the native size, so you can't do arithmetic in 8 or 16 bits anyway.Catchings
