What are integer literal types? And how are they stored?
D

5

3

I have just started learning C and a question has bugged me for a while now. If I write

int i = -1;
unsigned int j = 2;
unsigned int k = -2;

What is the type of integer literal -1 and 2 and -2, and how does it get converted to get stored in signed int and unsigned int?

What is meant by signed integer, is that the property of variable or integer literal too? Like -2 is signed integer and 2 is unsigned integer?

Damnatory answered 31/12, 2016 at 4:53 Comment(9)
In C, literals are constants whose values are predefined. Literals can be of any of the basic data types: an integer constant, a floating constant, a character constant, or a string literal. As for signed and unsigned integers, the signedness specifies how the most significant bit is to be interpreted. In your case all three of -1, 2 and -2 are of type int. A question about the difference between signed and unsigned integers was already asked on Stack Overflow; you can refer to #3812522Etherealize
@GAURANGVYAS Detail: It is odd that in C, character constants are not of type char, but of type int. "An integer character constant has type int" C11 §6.4.4.4 10Unilingual
No one here has mentioned two's complement which is how all "integer" types are stored whether signed or unsigned. (the bits are the same, only the interpretation is different) see: en.wikipedia.org/wiki/Two's_complementCellarer
The expression -1 is not a constant (literal). It's an expression consisting of a unary - operator applied to the constant 1.Erna
@ebyrob: Two's complement doesn't apply to unsigned types. It's the most common representation for signed types, but not the only one.Erna
@KeithThompson representations other than two's complement are pretty rare: #12277457 Also, the "difference" between signed and unsigned is two's complement, and that difference seems to be what the question is about. It's also one of the first subjects that comes up in an intro to programming course when discussing basic integer types.Cellarer
@ebyrob: Yes, representations other than two's complement are rare. But again, two's complement is used only for signed types. Unsigned types use a pure binary representation.Erna
@KeithThompson See #41407070; it asks the same thing. Please answer it; it is a doubt I have had for many days now.Damnatory
@KeithThompson Also, when you say "Two's complement is only used for signed types", you mean when the object (variable) is of a signed type, right?Damnatory
E
11

First off, -1 is not an integer constant. It's an expression consisting of a unary - operator applied to the constant 1.

In C99 and C11, the type of a decimal integer constant is the first of int, long int, or long long int in which its value will fit. Similarly, the type of an octal or hexadecimal constant is the first of int, unsigned int, long int, unsigned long int, long long int, or unsigned long long int in which its value will fit. The details are in N1570 6.4.4.1.

-1 and -2 are constant expressions. The result of the unary - operator has the same type as the operand (even if that result causes an overflow, as -INT_MIN does in most implementations).
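
As a rough illustration (not part of the original answer), the rules above can be checked with C11's _Generic. TYPE_NAME and print_constant_types are hypothetical names introduced only for this sketch, and the types reported for the larger values depend on the widths of int and long on your implementation.

```c
#include <stdio.h>

/* TYPE_NAME is a hypothetical helper macro, not a standard facility.
   _Generic (C11) picks a string according to the type of its argument;
   the argument itself is not evaluated. */
#define TYPE_NAME(x) _Generic((x),              \
    int: "int",                                 \
    unsigned int: "unsigned int",               \
    long: "long",                               \
    unsigned long: "unsigned long",             \
    long long: "long long",                     \
    unsigned long long: "unsigned long long",   \
    default: "other")

/* Prints the type the compiler gives to a few constants and to the
   expressions formed by applying unary - to them. */
void print_constant_types(void)
{
    printf("1           : %s\n", TYPE_NAME(1));
    printf("-1          : %s\n", TYPE_NAME(-1));
    printf("2           : %s\n", TYPE_NAME(2));
    printf("-2          : %s\n", TYPE_NAME(-2));
    /* Does not fit in a 32-bit int, so the constant is long or long long;
       the negated expression keeps that type. */
    printf("2147483648  : %s\n", TYPE_NAME(2147483648));
    printf("-2147483648 : %s\n", TYPE_NAME(-2147483648));
}
```

Calling print_constant_types() from main should report int for the first four lines on any conforming implementation. The last two lines are never an unsigned type, because unsigned types are not candidates for an unsuffixed decimal constant.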

int i = -1;

The constant 1 and the expression -1 are both of type int. The value is stored in the int object i; no conversion is necessary. (Strictly speaking, it's converted from int to int, but that doesn't matter.)

unsigned int j = 2;

2 is of type int. It's converted from int to unsigned int.

unsigned int k = -2;

-2 is of type int. It's converted from int to unsigned int. This time, because -2 is outside the range of unsigned int, the conversion is non-trivial; the result is UINT_MAX - 1.
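
A minimal sketch of that last conversion, assuming nothing beyond <limits.h>; the function names are made up for the example. The conversion from int to unsigned int is defined as reduction modulo UINT_MAX + 1, so it does not depend on how negative numbers are represented.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: returns the value an unsigned int holds after
   being initialized from the int expression -2. */
unsigned int negative_two_as_unsigned(void)
{
    unsigned int k = -2;   /* int value -2 is converted modulo UINT_MAX + 1 */
    return k;
}

/* Prints the converted value next to UINT_MAX - 1 for comparison. */
void print_conversion(void)
{
    unsigned int k = negative_two_as_unsigned();
    printf("k            = %u\n", k);
    printf("UINT_MAX - 1 = %u\n", UINT_MAX - 1);
    printf("equal        = %d\n", k == UINT_MAX - 1);
}
```

With a 32-bit unsigned int, both values print as 4294967294.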

Some terminology:

A constant is what some other languages call a literal. It's a single token that represents a constant value. Examples are 1 and 0xff.

A constant expression is an expression that's required to be evaluated at compile time. A constant is a constant expression; so is an expression whose operands are constants or constant expressions. Examples are -1 and 2+2.

Erna answered 31/12, 2016 at 10:30 Comment(9)
You explain absolutely wonderfully, hats off. I have never seen such a clear and wonderful explanation. I have written a follow-up question; it is here: "#41408068"Damnatory
You said "-1 is not an integer constant. It's an expression consisting of a unary - operator applied to the constant 1." But "the type of a decimal integer constant is the first of int, long int, or long long int", so if -1 is not an integer constant, why would it have any type? Types are defined for integer constants.Damnatory
Every expression has a type.Erna
You said "In C99 and C11, the type of a decimal integer constant is the first of int, long int, or long long int in which its value will fit." So isn't that contradictory?Damnatory
@Stranger: No, why would it be?Erna
A decimal integer constant has one of those types, and you said -1 is not a decimal integer constant.Damnatory
I like the added correctness and aim of this answer over my now deleted one.Unilingual
@Damnatory A few years later: There is no contradiction. 1 is a decimal integer constant of type int. It's also an expression. -1 is not a decimal integer constant, but it is an expression, and it also has type int. Its type is not determined by the rules for decimal integer constants, but by the rules for the unary - operator. "The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type."Erna
great. you haven't mentioned the literal suffix and why octal or hexadecimal literal is differentFingerling
S
1

In C99 and C11

If you want to specify the type of an integer constant, you can add a suffix to it.

You can write an integer constant in decimal, octal or hexadecimal representation:

int decimal = 42; // nothing special
int octal = 052; // 0 in front of the number
int hexa = 0x2a; // 0x
int HEXA = 0X2A; // 0X

Decimal representation:

By default, the type of a decimal constant such as 0, 1, etc. is int, long int or long long int. The compiler must pick the first of these types that can represent the value:

int a = 1; // 1 is an int
long int b = 1125899906842624; // 1125899906842624 is a long int if long is 64 bits wide, otherwise a long long int

That only works for signed types; if you want an unsigned type you need to add u or U:

unsigned int a = 1u;
unsigned long int b = 1125899906842624u;

If you want long int or long long int but not int, you can use l or L:

long int a = 1125899906842624l;

You can combine u and l:

unsigned long int a = 1125899906842624ul;

Finally, if you want only long long int, you can use ll or LL:

long long int a = 1125899906842624ll;

And again you can combine with u.

unsigned long long int a = 1125899906842624ull;
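
To see what each suffix does, here is a small sketch using C11's _Generic. TYPE_NAME and print_suffixed_types are names invented for this example, not standard facilities. The value 42 is used because it fits in every candidate type, so the result depends only on the suffix.

```c
#include <stdio.h>

/* TYPE_NAME is a hypothetical helper macro: _Generic (C11) maps the
   type of its argument to a string without evaluating the argument. */
#define TYPE_NAME(x) _Generic((x),              \
    int: "int",                                 \
    unsigned int: "unsigned int",               \
    long: "long",                               \
    unsigned long: "unsigned long",             \
    long long: "long long",                     \
    unsigned long long: "unsigned long long",   \
    default: "other")

/* Prints the type selected by each suffix for the same small value. */
void print_suffixed_types(void)
{
    printf("42    : %s\n", TYPE_NAME(42));
    printf("42u   : %s\n", TYPE_NAME(42u));
    printf("42l   : %s\n", TYPE_NAME(42l));
    printf("42ul  : %s\n", TYPE_NAME(42ul));
    printf("42ll  : %s\n", TYPE_NAME(42ll));
    printf("42ull : %s\n", TYPE_NAME(42ull));
}
```

Calling print_suffixed_types() reports int, unsigned int, long, unsigned long, long long and unsigned long long, in that order.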

Octal and Hexadecimal representation:

Without a suffix, an octal or hexadecimal constant takes the first of int, unsigned int, long int, unsigned long int, long long int and unsigned long long int that can represent its value.

int a = 0xFFFF;
long int b = -0xFFFFFFFFFFFFFF;
unsigned long long int c = 0xFFFFFFFFFFFFFFFF;

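The u suffix behaves as in the decimal representation. With l or L and ll or LL, the unsigned counterparts (unsigned long int, unsigned long long int) are added to the list of candidate types.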
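
The practical difference from decimal shows up for values that are too large for int but fit in unsigned int. A sketch, assuming a 32-bit int for the interesting cases; TYPE_NAME and print_hex_vs_decimal are hypothetical names used only here.

```c
#include <stdio.h>

/* TYPE_NAME is a hypothetical helper macro built on C11 _Generic. */
#define TYPE_NAME(x) _Generic((x),              \
    int: "int",                                 \
    unsigned int: "unsigned int",               \
    long: "long",                               \
    unsigned long: "unsigned long",             \
    long long: "long long",                     \
    unsigned long long: "unsigned long long",   \
    default: "other")

/* Prints the type of the same value (2^32 - 1) spelled three ways,
   plus the hexadecimal spelling with an L suffix. */
void print_hex_vs_decimal(void)
{
    /* Decimal: only signed types are candidates. */
    printf("4294967295   : %s\n", TYPE_NAME(4294967295));
    /* Hexadecimal and octal: unsigned types are candidates too. */
    printf("0xFFFFFFFF   : %s\n", TYPE_NAME(0xFFFFFFFF));
    printf("037777777777 : %s\n", TYPE_NAME(037777777777));
    /* With L the candidates start at long, then unsigned long, ... */
    printf("0xFFFFFFFFL  : %s\n", TYPE_NAME(0xFFFFFFFFL));
}
```

With a 32-bit int, the hexadecimal and octal spellings are unsigned int, while the decimal spelling is long or long long depending on the width of long.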


This is similar to string literals.

Stitching answered 31/12, 2016 at 6:31 Comment(0)
C
0

What is the type of integer literal -1 and 2 and -2, and how does it get converted to get stored in signed int and unsigned int?

The C parser/compiler, as previously said by chux, "understands" an unsuffixed decimal literal as a signed integer, always. The value is then converted to fit the variable you assign it to, which can be of a different type. In doing so, some bits can be lost or can change their meaning (for example, when assigning a negative value to an unsigned int). Some compilers may warn you about a "literal out of range"; other compilers may silently accept (and truncate) your literals.

What is meant by signed integer, is that the property of variable or integer literal too? Like -2 is signed integer and 2 is unsigned integer?

It is a property of the variable. More precisely, it is part of the type, whose name is written as two words (for example, unsigned int).

Cymbiform answered 31/12, 2016 at 6:52 Comment(0)
S
0

I would say it depends on the compiler and the architecture of the machine. Given 8 bits = 1 byte, the following table summarizes different Integer types with their required sizes for both (signed) int and unsigned int on 32 and 64-bit machines:

+------+------+---------+-------+--------+-------------+-----------+
|Type  |char  |short int|int    |long int|long long int|int pointer|
+------+------+---------+-------+--------+-------------+-----------+
|32-bit|8 bits|16 bits  |32 bits|32 bits |64 bits      |32 bits    |
+------+------+---------+-------+--------+-------------+-----------+
|64-bit|8 bits|16 bits  |32 bits|64 bits |64 bits      |64 bits    |
+------+------+---------+-------+--------+-------------+-----------+

As you may know, the biggest difference between (signed) int and unsigned int is that in (signed) int the Most Significant Bit (MSB) is reserved for the sign of the Integer and hence:

  • a (signed) int having n bits can have a value between -(2^(n-1)) to (2^(n-1))-1
  • an unsigned int having n bits can have a value between 0 to (2^n)-1
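
The two formulas can be sketched in C as follows. The helper names are invented for this example, and the comparison with <limits.h> assumes two's complement, no padding bits, and an int narrower than 64 bits.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helpers for the formulas above. n is the width in bits
   and must be between 1 and 63 so the shifts stay in range. */
long long signed_min(unsigned n)          { return -(1LL << (n - 1)); }
long long signed_max(unsigned n)          { return (1LL << (n - 1)) - 1; }
unsigned long long unsigned_max(unsigned n) { return (1ULL << n) - 1; }

/* Compares the formulas with the limits the implementation reports. */
void print_int_range(void)
{
    /* Assumes int has no padding bits and is narrower than 64 bits. */
    unsigned n = (unsigned)(sizeof(int) * CHAR_BIT);
    printf("int is %u bits wide\n", n);
    printf("formula : %lld to %lld\n", signed_min(n), signed_max(n));
    printf("limits.h: %d to %d\n", INT_MIN, INT_MAX);
    printf("formula : 0 to %llu\n", unsigned_max(n));
    printf("limits.h: 0 to %u\n", UINT_MAX);
}
```

For n = 8 the helpers give -128 to 127 for the signed range and 0 to 255 for the unsigned range.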

Now, we can calculate the range (possible values) of different (signed) int types as follows:

+------+---------+----------+----------+----------+-------------+-----------+
|Type  |char     |short int |int       |long int  |long long int|int pointer|
+------+---------+----------+----------+----------+-------------+-----------+
|32-bit|-(2^7) to|-(2^15) to|-(2^31) to|-(2^31) to|-(2^63) to   |-(2^31) to |
|      |+(2^7)-1 |+(2^15)-1 |+(2^31)-1 |+(2^31)-1 |+(2^63)-1    |+(2^31)-1  |
+------+---------+----------+----------+----------+-------------+-----------+
|64-bit|-(2^7) to|-(2^15) to|-(2^31) to|-(2^63) to|-(2^63) to   |-(2^63) to |
|      |+(2^7)-1 |+(2^15)-1 |+(2^31)-1 |+(2^63)-1 |+(2^63)-1    |+(2^63)-1  |
+------+---------+----------+----------+----------+-------------+-----------+

Furthermore, we can calculate the range (possible values) of different unsigned int types as follows:

+------+-------+---------+--------+--------+-------------+-----------+
|Type  |char   |short int|int     |long int|long long int|int pointer|
+------+-------+---------+--------+--------+-------------+-----------+
|32-bit|0 to   |0 to     |0 to    |0 to    |0 to         |0 to       |
|      |(2^8)-1|(2^16)-1 |(2^32)-1|(2^32)-1|(2^64)-1     |(2^32)-1   |
+------+-------+---------+--------+--------+-------------+-----------+
|64-bit|0 to   |0 to     |0 to    |0 to    |0 to         |0 to       |
|      |(2^8)-1|(2^16)-1 |(2^32)-1|(2^64)-1|(2^64)-1     |(2^64)-1   |
+------+-------+---------+--------+--------+-------------+-----------+

Finally, to see how and why we store a long long int using 8 bytes (64 bits) on a 32-bit machine see this post.

Strawworm answered 31/12, 2016 at 6:57 Comment(9)
The range for signed ints is not correct. As the MSB is not simply a sign bit but two's complement is used on most machines, there is one more negative value than positive values. Also you need one bit too much. For signed int the range is -(2^(n-1)) .. 2^(n-1)-1. Your formula would result in -257..255 for char while it is -128..127Arron
Good catch! I fixed the formula and updated the table as well!Strawworm
I just saw that also the unsigned values are "off by 1". They go from 0 to (2^n) -1. Apart from that the size of your data types for 32 and 64 bit machines are not fixed. On 64 bit machines you could also get 64 bit integers.Arron
Well, another one! I bet you can't find the third! Thanks for your help improving this answer! :-)Strawworm
How does it answer the question though? Where does it say what the type of the integer constant 1 is? It is always of type int regardless of machine. It would also have been far more relevant to cite the table in 6.4.4 than this whole derailing of the topic.Cipango
@Cipango - This answer focuses on the storage and distinct properties of signed and unsigned, showing that both signed and unsigned have the same amount of storage. That being said, signed and unsigned are two different data types. From the same document that you're citing, subsection 6.2.5, bullet 7: "The standard signed integer types and standard unsigned integer types are collectively called the standard integer types, .." which says int is an opaque data type for signed and unsigned types, and I am pretty sure that the C compiler deals with them differently.Strawworm
@MohammadHMofrad The part you cite is about grouping of types, where one group of types have the same integer conversion rank. Nowhere does it say that "int is an opaque data type for signed and unsigned", you are making this up. int is always equivalent to signed int, see 6.2.5/4. It is true that signed int and unsigned int are different types. None of which has anything to do with the question which is about what type an integer constant ("literal") has. Which can be answered by reading/citing 6.4.4.Cipango
@Cipango I am not making anything up here, because the material I borrowed from the reference is quoted and you can find it in the 6.2.5 specification. Also, what I'm trying to say here is that the signed and unsigned types have the same storage yet a different conversion step. Of course, for a constant of type signed int the compiler decays it to a smaller unsigned int while doing the conversion, yet the range is smaller compared to the unsigned one, which is also in my answer.Strawworm
What. Is. The. Type. Of. An. Integer literal. This is the question. Not "how big is an integer", that's a different question. Nowhere in your "answer" is the term integer literal mentioned, nor is the correct formal term integer constant. Just read the answer by Keith Thompson which does answer the question.Cipango
S
-1

There are good answers here but there is a bit of confusion about why and how the literal integers are handled.

As was pointed out in Keith Thompson's answer, a value like -1 is an expression, comprised of an operator (-) and an operand (1). The unary operator can be interpreted as 0 - 1, where the literal 0 is implied.

The type of the literal 1 is determined by best fit. The type of value 1 happens to be int but this is confusing because the value 1 is not negative. The reason for this is coincidental in this case. When the compiler determines the best fit, it evaluates the smallest data type that will accommodate the value. signed types have a higher precedence than unsigned types of the same size.

This is because a signed int for example, can only store positive values up to 2^31-1 (<= 0x7FFFFFFF), whereas an unsigned int can store positive values up to 2^32-1 (<= 0xFFFFFFFF). So even though the actual value of 1 is positive, it fits into a signed int, which has a higher priority.

If the expression were instead -2147483648 (0 - 0x80000000), then the type of the literal 2147483648 would be unsigned int because it does not fit into 31 bits but it does fit in 32 bits. The type of the entire expression 0 - 2147483648 would be signed long (not signed int), because the value -2147483648 requires 32 bits just for the unsigned value plus 1 more bit for the sign so it is promoted to a 64bit type and stored as 0xFFFFFFFF80000000 which is indeed a signed value (but this is not a literal, it is a computed intermediate value).

No negative integer literal values are actually stored, even though the type may be signed.

Slit answered 7/4 at 22:6 Comment(7)
Re “When the compiler determines the best fit, it evaluates the smallest data type that will accommodate the value”: The smallest and narrowest data type that will accommodate 1 is _Bool. After that comes char, signed char, unsigned char, short, and unsigned short. None of these are the type of 1 because the C standard does not say the smallest data type is used. It says the first in a list it gives is used, and, for unsuffixed decimal constants, that list is int, long, long long, and then implementation extended types.Phyliciaphylis
Re “This is because a signed int for example, can only store positive values up to 2^31-1…”: This varies. It is okay to assume it for illustration in an answer, but that should be stated.Phyliciaphylis
Re “signed int … unsigned int … it fits into a signed int, which has a higher priority”: unsigned int does not have a priority at all here. Per my comment above, it is not in the list for consideration in this case.Phyliciaphylis
Re “If the expression were instead -2147483648 (0 - 0x80000000), then the type of the literal 2147483648 would be unsigned int”: No, it would be long or long long, because unsigned int is not an option.Phyliciaphylis
Re “The type of the entire expression 0 - 2147483648 would be signed long (not signed int), because the value -2147483648 requires 32 bits just for the unsigned value plus 1 more bit for the sign so it is promoted to a 64bit type”: Well, it could be signed long, but not for the reasons given in that sentence. C does not determine the type of an expression by what is needed to represent the supposed mathematical result of the expression. It determines the type of each operand separately, and then, for -, it applies the usual arithmetic conversions.Phyliciaphylis
@EricPostpischil every one of your comments is completely pedantic, misses the entire point and, in many cases, is wrong. I'm not going to engage with this nonsense. Things you are disputing were already addressed in my answer. It's not my fault you can't grok that the type of the literal '1' is an int, a signed int has a higher type priority and literal integers are never signed, do not contradict.Slit
My first and third comments are correct per C 2018 clause 6.4.4.1 paragraph 5. My second comment is correct per C 2018 5.2.4.2.1 1. The fourth also uses 6.4.4.1 5 but also 6.5.6 4. These matter because a programmer using the incorrect rules stated in this answer would in some circumstances write code that produced undesired results. Your best course of action might be to edit this answer to correct it.Phyliciaphylis
