Why is isascii() deprecated?

Asked 25/9, 2014 at 5:6 Answered 15/1, 2024 at 11:4

According to the isascii() manpage:

http://linux.die.net/man/3/isascii

POSIX.1-2008 marks isascii() as obsolete, noting that it cannot be used portably in a localized application.

I'm not sure I see where the portability problem is. A very simple implementation of this function is:

int isascii(int ch) { return ch >= 0 && ch < 128; }

In which situations is the above implementation either not sufficient or not portable?

Thank you

Ignatius answered 25/9, 2014 at 5:6 Comment(3)

@SaiyamDoshi: does this do anything different? – Aires 25/9, 2014 at 5:28

Whether that implementation is sufficient and portable depends on what you want to use this function for. What do you have in mind? – Cancroid 25/9, 2014 at 5:55

The function is definitely not portable to systems not using ASCII encoding for characters, like IBM mainframes. – Calicut 25/9, 2014 at 15:53

I suppose it would not work with a character encoding that does not reserve the low seven-bit range exclusively for ASCII. This probably happens in some multibyte encodings, where a given byte may be only part of a character.

For example, in Shift-JIS, the second byte of a two-byte character can be as low as 0x40, which overlaps with the ASCII range. Even the single-byte values differ slightly from ASCII: 0x5C is the yen sign instead of a backslash, and 0x7E is an overline instead of a tilde.
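A minimal sketch of the overlap, assuming the usual Shift-JIS encoding of katakana "so" as the byte pair 0x83 0x5C. The names naive_isascii, count_ascii_range_bytes, sjis_so and check_shift_jis_example are made up for this illustration; naive_isascii is just the implementation from the question under a different name so it does not clash with libc.

```c
#include <assert.h>
#include <stddef.h>

/* The implementation from the question, renamed (hypothetical name). */
int naive_isascii(int ch) { return ch >= 0 && ch < 128; }

/* Count the bytes in buf that pass the naive range check. */
size_t count_ascii_range_bytes(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (naive_isascii(buf[i]))
            n++;
    return n;
}

/* Assumed Shift-JIS bytes for katakana "so": lead 0x83, trail 0x5C. */
const unsigned char sjis_so[] = { 0x83, 0x5C };

void check_shift_jis_example(void)
{
    assert(!naive_isascii(sjis_so[0])); /* lead byte is outside 0..127 */
    assert(naive_isascii(sjis_so[1]));  /* trail byte passes the range check */
    assert(count_ascii_range_bytes(sjis_so, sizeof sjis_so) == 1);
}
```

The range check reports one "ASCII" byte even though the buffer holds a single non-ASCII character, so a byte-by-byte test says nothing reliable about the text in such an encoding.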

I found this article where someone explained the reason behind the non-inclusion of POSIX functions in their own OS design:

This function is rather pointless. If we use a character encoding that wasn't ascii compatible, then it doesn't make sense. If we use a sane character encoding such as UTF-8, then you can simply check if the value is at most 127.
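The "at most 127" check mentioned in the quote could look roughly like the sketch below. It relies on the UTF-8 property that every byte of a multibyte sequence has its high bit set, so a byte-wise test is safe there. The function names utf8_is_all_ascii and check_utf8_example are hypothetical, not taken from the quoted article.

```c
#include <assert.h>

/* Returns 1 if every byte of the NUL-terminated string is at most 127. */
int utf8_is_all_ascii(const char *s)
{
    /* Read through unsigned char so bytes >= 0x80 are not seen as negative. */
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if (*p > 127)
            return 0;
    return 1;
}

void check_utf8_example(void)
{
    assert(utf8_is_all_ascii("plain text"));
    /* "caf" followed by U+00E9, which UTF-8 encodes as 0xC3 0xA9. */
    assert(!utf8_is_all_ascii("caf\xC3\xA9"));
}
```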

Aires answered 25/9, 2014 at 5:19 Comment(1)

I think he's asking for a specific encoding in which it would be true, and I can't think of any. The encodings that I know of overlap the initial 128 values. – Emu 25/9, 2014 at 5:22

In which situations is the above implementation either not sufficient or not portable?

When any EBCDIC character set is in use.

There are EBCDIC code pages where non-ASCII characters have a value between 0 and 127. For example, the NL (next line) control character has value 0x15 in EBCDIC code page 037 but is not an ASCII character: it corresponds to the Unicode code point U+0085, which is encoded as 0xC2 0x85 in UTF-8 and lies outside the ASCII range.

There are also ASCII characters that have a value greater than 127 in EBCDIC, such as all the alphanumeric characters! See https://en.wikipedia.org/wiki/EBCDIC#Code_page_layout which shows that all of a-z, A-Z and 0-9 are above 127. EBCDIC was always an 8-bit encoding, so the basic alnum chars did not need to be in the low 7 bits.

So on a system using EBCDIC, your implementation would return true for isascii('\u0085'), whereas on a system using UTF-8, or most other encodings that overlap with ASCII, \u0085 cannot even be represented in a single char. More problematically, on EBCDIC your function returns false for isascii('a'), isascii('0'), and every other letter and digit.
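Without an EBCDIC machine at hand, the effect can be sketched by feeding the question's function the code values that EBCDIC code page 037 assigns to a few characters. The constants below are assumptions taken from the common cp037 table ('a' = 0x81, 'A' = 0xC1, '0' = 0xF0, cent sign = 0x4A), and the names naive_isascii and check_ebcdic_example are made up for this illustration.

```c
#include <assert.h>

/* The implementation from the question, renamed (hypothetical name). */
int naive_isascii(int ch) { return ch >= 0 && ch < 128; }

/* Assumed code page 037 values for a few characters. */
enum {
    CP037_LOWER_A   = 0x81, /* 'a' */
    CP037_UPPER_A   = 0xC1, /* 'A' */
    CP037_DIGIT_0   = 0xF0, /* '0' */
    CP037_CENT_SIGN = 0x4A  /* U+00A2, not an ASCII character */
};

void check_ebcdic_example(void)
{
    /* ASCII characters rejected because their EBCDIC values exceed 127. */
    assert(!naive_isascii(CP037_LOWER_A));
    assert(!naive_isascii(CP037_UPPER_A));
    assert(!naive_isascii(CP037_DIGIT_0));
    /* A non-ASCII character accepted because its EBCDIC value is below 128. */
    assert(naive_isascii(CP037_CENT_SIGN));
}
```

Both failure directions show up: the range test rejects characters that ASCII does contain and accepts one that it does not.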

Live demo: https://godbolt.org/z/evK3ErErT

Dagger answered 15/1, 2024 at 11:4 Comment(0)


The meeting minutes have this to say:

isascii: mark obsolete. Application Usage should note that this cannot be used portably in a localized application.

Agaric answered 25/9, 2014 at 5:37 Comment(2)

Which doesn't really tell us anything, does it? He's asking why it cannot be used portably in a localized application. – Sundew 25/9, 2014 at 6:31

It's still good to have someone cite the original source of the decision. This might be the best we can get from posix as to the original reasoning. – Ignatius 25/9, 2014 at 13:30
