Why do putchar, toupper, tolower, etc. take an int instead of a char?

Asked 3/7, 2013 at 16:4 Answered 3/7, 2013 at 17:48

In C, strings are arrays of char (usually handled through a char *), and individual characters are usually stored in a char. I noticed that some functions from the C standard library take an int argument instead of a char.

For instance, take toupper() and tolower(), which both take an int. The man page says:

If c is not an unsigned char value, or EOF, the behavior of these functions is undefined.

My guess is that with an int, toupper and tolower can handle both unsigned char values and EOF. But in practice EOF is (is there any rule about its value?) a value that can be stored in a char, and since these functions leave EOF unchanged, I'm wondering why toupper does not simply take a char as its argument.

In any case, why do these functions need to accept something that is not a character (such as EOF)? Could someone provide a relevant use case?

The same applies to fputc and putchar, which also take an int that is converted to unsigned char anyway.

I am looking for the precise motivation behind that choice. I want to be convinced; I don't want to have to answer "I don't know" if someone asks me one day.

Issiah answered 3/7, 2013 at 16:4 Comment(3)

I do not know of any rule that EOF must fit into a char, and I can assure you that char is not guaranteed to be signed, which makes your discussion of char versus unsigned char the wrong one. You mean signed char throughout. – Opt 3/7, 2013 at 16:9

Did you look at implementations of these functions? I think the int parameter is there for optimization purposes, because its size fits the processor registers well. A one-byte char variable, in turn, must be converted to int behind the curtain, and that conversion takes some processor time. – Kentonkentucky 3/7, 2013 at 16:18

Read this: Definition of EOF and how to use it effectively – Dividend 3/7, 2013 at 16:25

C11 7.4

The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

C11 7.21.1

EOF

which expands to an integer constant expression, with type int and a negative value, ...

The C standard explicitly states that EOF is an int with a negative value. Furthermore, whether plain char is signed is implementation-defined, so char may be unsigned and therefore unable to store a negative value:

C11 6.2.5

If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.

Kt answered 3/7, 2013 at 16:28 Comment(2)

One of my questions is: why should a function that converts one letter into another accept something that is not a letter (EOF included)? – Receptionist 7/7, 2013 at 13:9

The other answer addresses your question (why a function like tolower() needs to accept something that is not a letter, e.g. EOF): in an idiom such as tolower(ch = getchar()), ch is an int that might well be EOF, so it is desirable that tolower accept EOF. – Fetus 17/12, 2019 at 13:1

Back in the day, a common coding method was:

/* example */
#include <ctype.h>
#include <stdio.h>

int GetDecimal(void) {
  int sum = 0;
  int ch;
  while (isdigit(ch = getchar())) { /* isdigit(EOF) returns 0 */
    sum *= 10;
    sum += ch - '0';
  }
  ungetc(ch, stdin);  /* If ch is EOF, ungetc fails and the input stream is unchanged. */
  return sum;
}

ch, possibly holding EOF, could then be passed to various functions such as isalpha() and tolower().

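This style caused problems with putchar(EOF): since putchar converts its argument to unsigned char, on a typical platform where EOF is -1 and char has 8 bits it writes the same byte as putchar(255).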

The method is discouraged today for various reasons; patterns like the following are preferred instead.

int GetDecimal(void) {
  int ch;
  while ((ch = getchar()) != EOF && isdigit(ch)) {
    ...
  }
  ...
}

Headman answered 3/7, 2013 at 17:48 Comment(2)

I've upvoted your answer for the usage example. But, as you said, this style is discouraged, so why should isdigit accept something that is not a character? – Receptionist 3/7, 2013 at 22:24

@Maxime I am certain that isdigit() accepts -1 for historical reasons. Conceptually, it is not hard to think of EOF as just another char. The isthis...() functions are often implemented with a 256-byte array; extending it to a 257-byte array that also accepts EOF (-1) is trivial. Since it was easy and made for tighter code, back when tight code was valued more highly, it was a good extension. Now that code maintainability is of ever-increasing value, this idiom has lost favor. – Headman 7/7, 2013 at 22:50

If c is not an unsigned char value, or EOF, the behavior of these functions is undefined.

But EOF is a negative int in C and some platforms (hi ARM!) have char the same as unsigned char.

Wood answered 3/7, 2013 at 16:12 Comment(1)

Yes, but in the places where it matters, the C standard says "unsigned char or EOF". – Carrasco 4/7, 2013 at 0:48
