From the cppreference page for isspace(): "The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF."
When *bp is negative, for example -42, it is not representable as unsigned char, because an unsigned char can only hold values that are zero or positive.
On two's-complement systems, a value converted to a wider type is sign-extended, so a negative value gets all of its high (left-most) bits set. Masking the wider value with 0xff clears those high bits, and you end up with a value between 0 and 0xff, i.e. one that is representable as unsigned char.
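A small C sketch of the sign extension and masking just described, assuming 8-bit char and two's-complement int as above. The helper names widened_bits and check_mask_range are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Sketch: assumes 8-bit char and two's-complement int, as in the text.
   Helper names are hypothetical. */

/* Widen a signed char to int (value-preserving, like the integer
   promotions), then convert to unsigned int (modulo UINT_MAX + 1).
   On two's complement this equals the sign-extended bit pattern:
   -42 becomes 0xffffffd6 with a 32-bit int, high bits all set. */
unsigned int widened_bits(signed char c) {
    int wide = c;
    return (unsigned int)wide;
}

/* For every signed char value, masking the widened int with 0xff clears
   all bits above the lowest 8, leaving a value in [0, UCHAR_MAX]. */
void check_mask_range(void) {
    for (int v = SCHAR_MIN; v <= SCHAR_MAX; v++) {
        int wide = (signed char)v;
        int masked = wide & 0xff;
        assert(masked >= 0 && masked <= UCHAR_MAX);
    }
}
```

The conversion to unsigned int is used only to make the high bits visible; it is defined for negative values, so the sketch itself has no undefined behavior.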
Note that the operands of & undergo the usual arithmetic conversions (here, the integer promotions), so *bp is converted to int before the & is applied, and therefore before isspace is even called. Let's assume, for example, that *bp == -42 on a sane platform where char is 8 bits and signed and int is 32 bits. Then:
*bp & 0xff # expand *bp = -42
(char)-42 & 0xff # apply promotion
(int)-42 & 0xff # let's convert to hex, assuming two's complement
(int)0xffffffd6 & 0xff # do the & operation
(int)0xd6 # let's convert to decimal
214 # representable as unsigned char, all fine
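The promotion and the arithmetic in the trace can be checked with C11 _Generic, which selects a branch by the static type of an expression without evaluating it. This sketch assumes the same platform as the trace (8-bit char, 32-bit two's-complement int); the macro HAS_TYPE_INT and the helper names are hypothetical.

```c
#include <assert.h>

/* 1 if expression e has type int, else 0. _Generic (C11) selects by the
   static type of e and does not evaluate e. */
#define HAS_TYPE_INT(e) _Generic((e), int: 1, default: 0)

/* Hypothetical helpers mirroring the expression from the question. */

/* *bp on its own has type char, which is not int. */
int operand_is_int(const char *bp) { return HAS_TYPE_INT(*bp); }

/* In *bp & 0xff both operands are promoted first, so the result is int. */
int masked_is_int(const char *bp) { return HAS_TYPE_INT(*bp & 0xff); }

/* The trace step by step, for a signed char holding -42. */
int trace_masked(signed char c) {
    int promoted = c;              /* (int)-42: 0xffffffd6 as 32-bit two's complement */
    int masked = promoted & 0xff;  /* 0xd6 */
    assert(masked == (unsigned char)c);  /* same value the cast would give */
    return masked;                 /* 214 for c == -42 */
}
```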
Without the & 0xff, the negative value would result in undefined behavior.
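I would recommend preferring isspace((unsigned char)*bp): it states the intent directly, and the conversion to unsigned char is well defined for every char value, whatever the width of char.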
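A minimal sketch of the recommended form wrapped in a helper, next to the & 0xff form for comparison. The names safe_isspace, masked_isspace and check_forms_agree are hypothetical; the two forms pass the same argument to isspace only when char is 8 bits wide and two's complement, which is the assumption here.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* The recommended form: the conversion to unsigned char is defined for
   every char value, so isspace always receives a value in [0, UCHAR_MAX]. */
int safe_isspace(char c) {
    return isspace((unsigned char)c);
}

/* The & 0xff form from the question, for comparison. */
int masked_isspace(char c) {
    return isspace(c & 0xff);
}

/* Under the stated assumption (8-bit, two's-complement char) both forms
   pass the same argument, so their results agree for every char value. */
void check_forms_agree(void) {
    for (int v = CHAR_MIN; v <= CHAR_MAX; v++) {
        char c = (char)v;
        assert((c & 0xff) == (unsigned char)c);
        assert(!safe_isspace(c) == !masked_isspace(c));
    }
}
```

isspace returns a nonzero value rather than exactly 1 for whitespace, which is why the comparison above normalizes both results with !.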
Basically, a minimal isspace implementation can look just like:
static const char bigarray[257] = { 0,0,0,0,0,...1,0,1,0,... };
// note: assumes EOF is -1 (common, but the standard only requires a negative int)
#define isspace(x) (bigarray[(x) + 1])
and in such a case you can't pass, for example, -42, because bigarray[-41] is an out-of-bounds access.
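A runnable toy version of the lookup-table idea above, assuming EOF is -1 and the "C" locale set of whitespace characters. It is a model for illustration, not how any particular libc implements isspace; space_table and toy_isspace are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */

/* Toy model of a table-driven isspace. Assumes EOF == -1 (common, but
   the standard only requires a negative int) and the "C" locale set of
   whitespace characters. Names are hypothetical. */
_Static_assert(EOF == -1, "this toy table assumes EOF is -1");

/* Index x + 1 holds the answer for x, so index 0 is EOF and
   indices 1..256 are the unsigned char values 0..255. */
static const unsigned char space_table[257] = {
    [' '  + 1] = 1, ['\t' + 1] = 1, ['\n' + 1] = 1,
    ['\v' + 1] = 1, ['\f' + 1] = 1, ['\r' + 1] = 1,
};

/* Defined only for x == EOF or 0 <= x <= 255. For x == -42 the index
   would be -41, an out-of-bounds read: undefined behavior. */
static int toy_isspace(int x) {
    assert(x == EOF || (x >= 0 && x <= 255));  /* catch misuse in debug builds */
    return space_table[x + 1];
}
```

Calling toy_isspace((unsigned char)c) always stays inside the table, which is exactly the guarantee the cast provides.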
isspace, the behavior is undefined if the value of *bp is not representable as unsigned char and is not equal to EOF. - so perhaps this is a fancy cast, instead of doing (unsigned char) *bp. Is bp a char*? – Cob
11111111 in binary, or all 1's for a single byte. This is useful if you only want a single byte of data, for example, instead of the entire value (which could be multiple bytes). For example, an int may be 4 bytes, so if you only want the lowermost 1 byte you can simply do int_variable & 0xff to get the value. – Henchman
bp, discarding possible other bytes (by doing & 0xff) then seeing if the result matches a whitespace character. – Henchman
bp is a char* - the formatting threw me off a bit. – Cob
bp and more to do with the type of the argument of isspace, which is an int, which may be multiple bytes. – Henchman
isspace etc. that I use says "To use these functions safely with plain chars (or signed chars), the argument should first be converted to unsigned char" - so if the ctype.h function makes the assumption that it will get [-1, 255] and has a simple lookup table, like static bool isspace[257] = { false, false ... }; then if you send in a negative value (except EOF), it could catch fire. – Cob
unsigned char since implementations are allowed to make the assumption that it'll get EOF or something representable as an unsigned char. I think KamilCuk's answer explains it. – Cob
sizeof ('a'), for instance] Oh, and there is the signedness of char. – Igor
'a' is an int in C and not a char as in C++ but that wasn't the point. – Cob
unsigned char is the same in C and C++. – Cob
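Two points raised in the comments above can be checked directly in C: a character constant such as 'a' has type int (not char, as it would in C++), and int_variable & 0xff keeps only the lowest byte of an int. A sketch assuming 8-bit bytes; both function names are hypothetical.

```c
#include <assert.h>

/* In C, 'a' has type int (in C++ it would be char). With 8-bit bytes,
   sizeof(int) >= 2 while sizeof(char) == 1, so this returns 1 in C. */
int char_constant_has_int_size(void) {
    return sizeof('a') == sizeof(int);
}

/* Keep only the lowest byte of an int, as in int_variable & 0xff. */
int lowest_byte(int int_variable) {
    int low = int_variable & 0xff;
    assert(low >= 0 && low <= 0xff);
    return low;
}
```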