Why is the size of a character sizeof('a') different in C and C++? [duplicate]
#include <stdio.h>
int main(void)
{
    printf("sizeof(char) = %zu\n", sizeof(char));
    printf("sizeof('a')  = %zu\n", sizeof('a'));
}

See https://godbolt.org/z/1eThqvMhx

When running this code in C, it prints

sizeof(char) = 1
sizeof('a')  = 4

When running this code in C++, it prints

sizeof(char) = 1
sizeof('a')  = 1

Why does the output differ between languages? What is the size of a character in C and C++? As far as I know, the size of char is 1 byte in both C and C++.

Stole answered 31/1, 2010 at 19:15 Comment(0)

In C, the type of a character constant like 'a' is actually an int, with size of 4 (or some other implementation-dependent value). In C++, the type is char, with size of 1. This is one of many small differences between the two languages.
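
One way to confirm the type difference is to compile the same file once as C and once as C++ (a minimal sketch, assuming a C11 compiler for _Generic/_Static_assert and a C++11 compiler for static_assert; nothing here is specific to any one platform):

#include <stdio.h>

#ifdef __cplusplus
#include <type_traits>
/* In C++, a narrow character literal has type char. */
static_assert(std::is_same<decltype('a'), char>::value,
              "'a' has type char in C++");
#else
/* In C, a character constant has type int; _Generic selects the matching branch. */
_Static_assert(_Generic('a', int: 1, default: 0) == 1,
               "'a' has type int in C");
#endif

int main(void)
{
    printf("sizeof('a') = %zu, sizeof(int) = %zu\n", sizeof('a'), sizeof(int));
}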

Taveda answered 31/1, 2010 at 19:17 Comment(11)
Okay, can you please specify the standard reference? :)Stole
This is pretty much a generic answer given the above code, :PArlaarlan
In the C++ Standard it's section 2.13.2/1, in C 6.4.4.4, at least in the doc I've got.Taveda
+1 (Except that, while the "size of 4" obviously applies to nthrgeek's platform, it doesn't necessarily apply to all platforms.)Nureyev
@nthrgeek: I'm too lazy to quote both standards, but the C++ standard has an appendix dedicated to incompatibilities with C. Under Appendix C.1.1, it mentions that "Type of character literal is changed from int to char", which explains the behavior. :)Corcyra
It makes sense that C and C++ would have this difference. C++ is much more strongly typed than C.Spore
@nthrgeek: §6.4.4.4, paragraph 10: "An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer."Stale
@Omnifarious: It's especially needed in C++ for overloading: void foo(int); void foo(char); That's not an issue in C.Nureyev
@nthrgeek: You should not be asking for a standard reference unless you are having an argument about a specific point and you want to understand why the other person has a different opinion. If everybody agrees, just accept it. You (as a developer) should be quite intelligent enough to quickly find a common answer like this all by yourself.Commensal
What about sizeof and the types of Unicode characters? sizeof('Ç') == 4 and std::is_same<int, decltype('Ç')>() == true is what I get from g++, while in UTF-8 (which is what I use) Ç should take only two bytes.Tatum
The answer can be improved. sizeof(char)==1 in C++, and 1 means 1 byte in C++. 1 byte does not necessarily mean 8 bits, though. I know that the answer is already old, but it would be great to add this: [intro.memory] "The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined."Torruella

As Paul stated, it's because 'a' is an int in C but a char in C++.

I cover that specific difference between C and C++ in something I wrote a few years ago, at: http://david.tribble.com/text/cdiffs.htm

Izolaiztaccihuatl answered 13/8, 2012 at 19:42 Comment(2)
Just curious, but are you working on updating that (very detailed) doc to include the new changes in C++11 and C11?Orlosky
Not at the moment. My interest in C and C++ has waned a lot in the last five years or so.Izolaiztaccihuatl

In C the type of a character literal is int, whereas in C++ it is char. In C++ this is required to support function overloading. See this example:

#include <stdio.h>

void foo(char c)
{
    puts("char");
}
void foo(int i)
{
    puts("int");
}
int main()
{
    foo('i');   // 'i' has type char in C++, so the char overload is selected
    return 0;
}

Output:

char
Hypotrachelium answered 5/1, 2015 at 16:33 Comment(2)
Interesting detail not mentioned in the other answers. Thanks.Almaraz
best explanation, should be the accepted answer.Giavani

In C, a character literal does not have type char; C treats character literals as integers, so there is no difference between sizeof('a') and sizeof(1).

In other words, the size of a character literal in C is equal to sizeof(int).

In C++, a character literal has type char. cppreference says:

1) narrow character literal or ordinary character literal, e.g. 'a' or '\n' or '\13'. Such literal has type char and the value equal to the representation of c-char in the execution character set. If c-char is not representable as a single byte in the execution character set, the literal has type int and implementation-defined value.

So in C++ a character literal has type char, and its size is one byte.
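
The int fallback mentioned in the quote is related to the g++ observation in one of the comments above: a literal containing more than one code unit (such as 'ab', or 'Ç' in a UTF-8 source file) is a multicharacter literal, which has type int. A minimal sketch, assuming a C++11 compiler; note that multicharacter literals are only conditionally supported, usually draw a warning, and have implementation-defined values:

#include <cstdio>
#include <type_traits>

int main()
{
    static_assert(std::is_same<decltype('a'), char>::value,
                  "narrow character literal has type char");
    static_assert(std::is_same<decltype('ab'), int>::value,
                  "multicharacter literal has type int");
    std::printf("sizeof('a')  = %zu\n", sizeof('a'));   // 1
    std::printf("sizeof('ab') = %zu\n", sizeof('ab'));  // sizeof(int)
    return 0;
}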

Also, be careful with the format specifier used to print the result of the sizeof operator: it yields a value of type size_t, not int.

C11 §7.21.6.1 (P9) :

If a conversion specification is invalid, the behavior is undefined.275) If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.

So the %zu format specifier must be used rather than %d; otherwise the behaviour is undefined in C.
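
A small sketch of both ways to print it portably (%zu requires a C99-conforming printf; the cast works with older libraries as well):

#include <stdio.h>

int main(void)
{
    /* C99 and later: size_t is printed with %zu */
    printf("sizeof('a') = %zu\n", sizeof('a'));

    /* Fallback for pre-C99 libraries: convert to a type printf knows */
    printf("sizeof('a') = %d\n", (int)sizeof('a'));
    return 0;
}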

Mayberry answered 1/11, 2017 at 10:41 Comment(2)
%zu is not supported on many platforms; for better portability, use (int)sizeof(char) and the %d format.Unalienable
The value of character literals is not necessarily the corresponding ASCII code. It depends on the source and execution character sets and whether the char type is signed or unsigned by default.Unalienable
