Is char signed or unsigned by default?

In the book "Complete Reference of C", it is mentioned that char is unsigned by default.

But when I try to verify this with GCC as well as Visual Studio, both treat char as signed by default.

Which one is correct?

Polysynthetic answered 13/1, 2010 at 6:47 Comment(5)
The one C reference book I trust is Harbison & Steele's "C: A Reference Manual" (careferencemanual.com). Of course the standard is the final word, but it's not very readable and only gives the slightest information on pre-standard and common (i.e., POSIX) uses that are outside the standard. Harbison & Steele is quite readable, detailed and probably more correct than most references. However it also isn't a tutorial, so if you're in the initial stages of learning it's probably not a great thing to jump into.Genevagenevan
I think the book you are reading is C: The Complete Reference, by Herbert Schildt. From a review of this book (accu.informika.ru/accu/bookreviews/public/reviews/c/c002173.htm): "I am not going to recommend this book (too many of you give too much weight to my opinions) but I do not think it deserves the same opprobrium that has been legitimately thrown at some of his other work." As Michael says, a much better reference is Harbison & Steele.Barrier
My two cents here: Because char can be unsigned, as a rule of thumb use an int to read a value with getchar(), which might return EOF. EOF is usually defined as -1 or some other negative value, and storing that in an unsigned type is not what you want. Here's the declaration: extern int getchar(); BTW, this recommendation also comes from the "C: A Reference Manual" book.Cruzcruzado
The one C reference I trust is ISO/IEC 9899:2011 :-)Aldas
@MaxChetrusca good advice but bad rationale: even in the signed char case, you'd have to use int to store the return value.Loireatlantique

The book is wrong. The standard does not specify whether plain char is signed or unsigned.

In fact, the standard defines three distinct types: char, signed char, and unsigned char. If you #include <limits.h> and then look at CHAR_MIN, you can find out whether plain char is signed or unsigned (CHAR_MIN will be either less than 0 or equal to 0), but even then, the three types are distinct as far as the standard is concerned.

Do note that char is special in this way. If you declare a variable as int it is 100% equivalent to declaring it as signed int. This is always true for all compilers and architectures.
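
For illustration, a minimal sketch of the CHAR_MIN check described above (the message wording is just an example):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* CHAR_MIN is guaranteed to be either 0 (plain char behaves like
           unsigned char) or SCHAR_MIN (plain char behaves like signed char). */
    #if CHAR_MIN < 0
        puts("plain char is signed here");
    #else
        puts("plain char is unsigned here");
    #endif
        return 0;
    }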

Barrier answered 13/1, 2010 at 6:48 Comment(14)
@Alok: the same is not true for some other datatypes, for example int means signed int always, right? Apart from char, what other datatypes have the same confusion in C?Combustion
@eSKay: yes, char is the only type that can be signed or unsigned. int is equivalent to signed int for example.Barrier
There is a hysterical, er, historical reason for this -- early in the life of C the "standard" was flip-flopped at least twice, and some popular early compilers ended up one way and others the other.Gentes
@AlokSinghal: It's also implementation-defined whether a bit field of type int is signed or unsigned.Rosalba
@KeithThompson thanks for the correction. I tend to forget some details about bit field types since I don't use them much.Barrier
I wonder why ANSI has yet to define any standard means by which code can say things like "within this region, I want the compiler to either regard char as unsigned or refuse compilation if it can't do that"? I understand that the standard must allow for the existence of different dialects of C, but if there's no standard way to say whether 0xFFFF+1 should yield 0u or 65536, then I would posit that such an expression should be considered meaningless in "standard C".Forespent
@supercat: #if CHAR_MIN < 0 ... #error "Plain char is signed" ... #endifRosalba
@KeithThompson: That would cause a program to refuse compilation on any compiler whose default behavior is to use signed characters, without regard for whether the compiler has a means of switching between signed and unsigned character types.Forespent
@KeithThompson: The way I would like to see such directives, the expectation would be that a good compiler should attempt to support any options the platform could support in practical fashion. The fact that a processor has 32-bit registers shouldn't make it impossible to run code which expects that 0xFFFF+1 equals 0, nor code that expects that 0xFFFFFFFF+1 equals 0x100000000, provided that code which expects such things is marked to let the compiler know of its expectations.Forespent
@ChrisChiasson: I would think it unlikely that even a compiler where character-signedness is configurable would be able to automatically configure itself based upon such a static assertion, much less confine such behavior to a designated region of the code.Forespent
@ChrisChiasson: If there were a defined directive which said "Within this region I want the compiler to either regard char as unsigned or refuse compilation of it can't", compilers would be allowed to treat such directives as static assertions, but could also use them to instead control behavior. If the term "implementation-defined" can stretch far enough to let a compiler define a sequence of configuration settings it will try in sequence to see if any will work, a compiler could use "static assert" as a configuration directive, but that seems really hokey and unlikely.Forespent
Unlike in C++, in C they are not three distinct types, I believe. In C there are two distinct types, signed char and unsigned char, and char is just an alias for one of those. The standard just does not say which one. Therefore in C++ you need explicit casts between them, while in C you do not.Masterson
@Masterson Regarding "in C, they are not three distinct types": in C, char, signed char, and unsigned char are distinct types: "The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char."Engelhart
(cont) Footnote 45 adds: "CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either."Engelhart

As Alok points out, the standard leaves that up to the implementation.

For GCC, the default is signed, but you can modify that with -funsigned-char (note: for GCC in the Android NDK, the default is unsigned). You can also explicitly ask for signed characters with -fsigned-char.

On MSVC, the default is signed, but you can change it to unsigned with /J.
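
To see the effect of those flags, a tiny test program could look like this (a sketch; the file name is made up):

    /* char_sign.c */
    #include <stdio.h>

    int main(void)
    {
        char c = (char)-1;   /* holds -1 if char is signed, 255 if it is unsigned (with 8-bit char) */
        printf("char is %s\n", c < 0 ? "signed" : "unsigned");
        return 0;
    }

Compiling it with gcc char_sign.c, gcc -funsigned-char char_sign.c, or (for MSVC) cl /J char_sign.c should then report the corresponding signedness.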

Odd answered 13/1, 2010 at 6:51 Comment(8)
Interesting that Schildt's description doesn't match MSVC's behavior since his books are usually geared toward MSVC users. I wonder if MS changed the default at some point?Genevagenevan
I thought it wasn't dependent on the compiler, but on the platform. I thought char was left as a third type of "character datatype" to conform to what the systems at that time used as printable characters.Earthwork
GCC docs say it's machine-dependent: "Each kind of machine has a default for what char should be. It is either like unsigned char by default or like signed char by default."Transcription
Can you please provide a source for your note that on android the default is unsigned char?Pignut
@Earthwork the C standard makes no real distinction between compilers, platforms and CPU architectures. It just lumps them all together under "implementation".Rateable
@Transcription So, the part "For gcc, the default is signed" in this answer is wrong?Piraeus
With GCC, char is (typically) signed on x86/x86_64 but unsigned on ARM.Inland
I found the NDK caveat the hard way, thanks!Kendalkendall

C99 N1256 draft 6.2.5/15 "Types" has this to say about the signedness of type char:

The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.

and in a footnote:

CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.
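
To make the footnote's "separate type" point concrete, a small sketch (the function name is made up): even if plain char has the same representation as one of the other two, the pointer types are incompatible, so a compiler is required to diagnose assignments like these:

    void demo(void)
    {
        char c = 'A';
        signed char   *sp = &c;   /* constraint violation: char * is not compatible with signed char * */
        unsigned char *up = &c;   /* likewise, whichever way plain char actually behaves */
        (void)sp;
        (void)up;
    }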

Genevagenevan answered 13/1, 2010 at 7:9 Comment(0)

According to The C Programming Language by Kernighan and Ritchie, the de facto standard book for ANSI C, whether plain chars are signed or unsigned is machine-dependent, but printable characters are always positive.

Hexad answered 31/7, 2012 at 8:36 Comment(1)
It's not necessarily the case that printable characters are always positive. The C standard guarantees that all members of the basic execution character set have non-negative values.Rosalba

According to the C standard, the signedness of plain char is "implementation-defined".

In general, implementors chose whichever was more efficient to implement on their architecture. On x86 systems, char is generally signed. On ARM systems, it is generally unsigned (Apple iOS is an exception).

Rateable answered 23/11, 2016 at 18:55 Comment(4)
Why are unsigned types more efficient on ARM? And is char signed or unsigned by default on iOS?Dosia
@Rateable Your answer was probably downvoted because Tim Post lost his keys. Seriously though, you shouldn't worry about a single downvote as long as you're sure your answer is correct (which it is in this case). It's happened to me several times to have my posts downvoted for no valid reason. Don't worry about it, sometimes people just do odd things.Sestina
Why is signed char more efficient on x86? Any sources?Hinshelwood
@Hinshelwood Necropost but: I don’t think signed char is more efficient as such on x86, but it’s also not less efficient than unsigned. Reasons for picking it might also include consistency with other integer types defaulting to signed, and maybe signed types sometimes leading to better optimisation due to signed overflow being undefined behaviour (i.e., compiler can assume it won’t overflow).Sulphur

Now, we know the standard leaves that up to the implementation.

But how can we check whether a type such as char is signed or unsigned?

I wrote a macro to do this:

#define IS_UNSIGNED(t) ((t)~1 > 0)

and tested it with gcc, clang, and cl, but I am not sure it is always safe in other cases.
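
For what it's worth, a small usage sketch of that macro might look like this (output wording is just an example):

    #include <stdio.h>

    #define IS_UNSIGNED(t) ((t)~1 > 0)   /* the macro from above */

    int main(void)
    {
        /* ~1 is a negative int; after the cast, a signed type keeps a negative
           value while an unsigned type converts it to a positive one. */
        printf("char  is %s\n", IS_UNSIGNED(char)  ? "unsigned" : "signed");
        printf("short is %s\n", IS_UNSIGNED(short) ? "unsigned" : "signed");
        return 0;
    }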

Bindle answered 20/12, 2018 at 8:57 Comment(2)
What is wrong with usual CHAR_MIN < 0 (or WCHAR_MIN < 0 for wchar_t)?Footage
This builds on the assumption that signed integers are represented in two's complement. Although this almost always holds, some systems may use one's complement, where all bits set to one means negative zero, which equals positive zero, and your macro returns the wrong answer.Watkin
