Why is char neither signed nor unsigned, but wchar_t is?

The following C++ program compiles without errors:

void f(char){}
void f(signed char){}
void f(unsigned char){}
int main(){}  

The wchar_t version of the same program does not:

void f(wchar_t){}
void f(signed wchar_t){}
void f(unsigned wchar_t){}
int main(){}

error: redefinition of ‘void f(wchar_t)’
void f(signed wchar_t){}

It seems that wchar_t is unsigned.
Why is there an inconsistency in overloading?

Accordion answered 30/9, 2015 at 1:49 Comment(7)
This doesn't look like legal C to me. Is it C++? - Dehypnotize
@ArlieStephens Yeah, better delete the C tag. It's C++, with no headers needed. - Accordion
@John3136 C++. Built-in type. - Accordion
wchar_t is allowed to be signed or unsigned per [basic.fundamental]. - Instrument
@Instrument I see. So wchar_t is either, and char is neither. - Accordion
wchar_t differs from char: char may be signed or unsigned, yet it is still a type distinct from both signed char and unsigned char, whereas wchar_t may be signed or unsigned but there is no way to name a separate signed wchar_t or unsigned wchar_t type. - Weakling
Your conclusion is incorrect. The error is telling you that wchar_t and signed wchar_t are the same. Interesting that it doesn't make the same complaint about char. - Leoni

char, signed char, and unsigned char are all distinct types, so a function can be overloaded on each of them.

[basic.fundamental] / 1

[...] Plain char, signed char, and unsigned char are three distinct types, collectively called narrow character types. [...]
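The "three distinct types" rule can be checked at compile time. The following is a minimal sketch, not taken from the original answer; the names `Picked`, `which`, and `plain_char_is_signed` are made up for illustration.

```cpp
#include <type_traits>

// Hypothetical tag values, used only to report which overload was chosen.
enum class Picked { PlainChar, SignedChar, UnsignedChar };

// Three overloads can coexist because the three narrow character types are distinct.
Picked which(char)          { return Picked::PlainChar; }
Picked which(signed char)   { return Picked::SignedChar; }
Picked which(unsigned char) { return Picked::UnsignedChar; }

// Plain char is never the same type as either of the other two,
// whichever signedness the implementation gives it.
static_assert(!std::is_same<char, signed char>::value,
              "char is distinct from signed char");
static_assert(!std::is_same<char, unsigned char>::value,
              "char is distinct from unsigned char");

// Whether plain char is signed is implementation-defined; this only reports it.
constexpr bool plain_char_is_signed = std::is_signed<char>::value;
```

A character literal such as 'a' has type char, so which('a') selects the plain char overload; the other two are reached only through an explicit conversion.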

wchar_t is also a distinct type, but it cannot be combined with signed or unsigned; those specifiers may only be combined with char, short, int, and long.

[dcl.type] / 2

As a general rule, at most one type-specifier is allowed in the complete decl-specifier-seq of a declaration or in a type-specifier-seq or trailing-type-specifier-seq. The only exceptions to this rule are the following:

[...]

signed or unsigned can be combined with char, long, short, or int.

[dcl.type.simple] / 2

[...] Table 9 summarizes the valid combinations of simple-type-specifiers and the types they specify.

[Image: Table 9, simple-type-specifiers and the types they specify]

The signedness of wchar_t is implementation-defined:

[basic.fundamental] / 5

[...] Type wchar_t shall have the same size, signedness, and alignment requirements (3.11) as one of the other integral types, called its underlying type.
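Since the signedness cannot be chosen in a declaration, it can only be queried. A small sketch of how that might look, assuming a C++11 compiler; `wchar_is_signed` and `wchar_signedness` are illustrative names, not standard ones.

```cpp
#include <limits>
#include <type_traits>

// True when this implementation's wchar_t behaves as a signed type.
constexpr bool wchar_is_signed = std::is_signed<wchar_t>::value;

// wchar_t is an integral type either way.
static_assert(std::is_integral<wchar_t>::value, "wchar_t is an integral type");

// A signed wchar_t has a negative minimum; an unsigned one has a minimum of zero.
static_assert(wchar_is_signed == (std::numeric_limits<wchar_t>::min() < 0),
              "signedness agrees with the minimum value");

// Hypothetical helper that names the signedness found on this platform.
const char* wchar_signedness() {
    return wchar_is_signed ? "signed" : "unsigned";
}
```

The result differs between platforms, so code that depends on it should test the trait rather than assume one answer.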

Lorylose answered 30/9, 2015 at 2:1 Comment(2)
I wonder why my compiler lets me get away with "unsigned wchar_t" if it's not a valid combination. I certainly wouldn't be able to do something like "unsigned bool"... - Accordion
@TrevorHickey File a bug report :) - Lorylose

char is a distinct type from both signed char and unsigned char. wchar_t is yet another distinct type (for type identity purposes), but one that has exactly the same properties (size, signedness, and alignment) as some other integral type.

From ISO 14882:2003, 3.9.1:

Plain char, signed char, and unsigned char are three distinct types.

(...)

Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales (22.1.1). Type wchar_t shall have the same size, signedness, and alignment requirements (3.9) as one of the other integral types, called its underlying type.

There is no such thing as signed wchar_t or unsigned wchar_t. Neither is mentioned anywhere in the standard.
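One way to see that wchar_t is a type of its own, and not an alias for its underlying type, is to overload on it alongside the ordinary integer types. This is a sketch assuming a conforming compiler mode (some compilers have an option that turns wchar_t into a typedef, which would make it fail); `pick` and its return codes are invented for the example.

```cpp
#include <type_traits>

// Overloads on wchar_t and on ordinary integer types coexist,
// because wchar_t is a separate type rather than a typedef.
int pick(wchar_t)        { return 0; }
int pick(short)          { return 1; }
int pick(unsigned short) { return 2; }
int pick(int)            { return 3; }
int pick(unsigned int)   { return 4; }
int pick(long)           { return 5; }
int pick(unsigned long)  { return 6; }

// wchar_t shares size and signedness with one of these types,
// but it is not the same type as any of them.
static_assert(!std::is_same<wchar_t, short>::value &&
              !std::is_same<wchar_t, unsigned short>::value &&
              !std::is_same<wchar_t, int>::value &&
              !std::is_same<wchar_t, unsigned int>::value &&
              !std::is_same<wchar_t, long>::value &&
              !std::is_same<wchar_t, unsigned long>::value,
              "wchar_t is distinct from the integer types listed");
```

A wide character literal such as L'x' has type wchar_t, so pick(L'x') resolves to the first overload without any conversion.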

Scarabaeid answered 30/9, 2015 at 2:3 Comment(0)

char is a fundamental type. wchar_t started out as a library solution (in C, as a typedef) and later became a built-in type with an underlying type, corresponding to the type that was earlier used to typedef it:

C++11 §3.9.1/5

Type wchar_t shall have the same size, signedness, and alignment requirements (3.11) as one of the other integral types, called its underlying type.

This explains why you cannot change the signedness of wchar_t, but it does not explain why there is a char type with unspecified signedness.


Also, the signed char behavior that most compilers default to for plain char is impractical for several reasons. One reason is that negative values are annoying and generally have to be cast to unsigned in order to compare them. Another reason is that the C character classification functions require non-negative values (except when passed EOF). A third reason is that on old sign-and-magnitude or one's complement machines there is one unusable value.
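The classification-function problem is usually handled by converting to unsigned char before the call. A minimal sketch of that idiom; `is_alpha_safe` is a hypothetical wrapper name, not a standard function.

```cpp
#include <cctype>

// Hypothetical wrapper: converting to unsigned char first keeps the argument
// in the range the C classification functions accept, even when plain char
// is signed and the character has a negative value.
bool is_alpha_safe(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}
```

Calling std::isalpha directly on a negative char value (other than one equal to EOF) is undefined behavior, which is why the cast goes inside the wrapper rather than being left to each caller.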

There may be some explanation of that in Stroustrup's “The design and evolution of C++”, but I doubt it.

It sounds like frozen history, something that at one point made some kind of sense, for the technology at the time.

Ureter answered 30/9, 2015 at 2:10 Comment(0)
