What's the proper technical term for "high ascii" characters?
What is the technically correct way of referring to "high ascii" or "extended ascii" characters? I don't just mean the range of 128-255, but any character beyond the 0-127 scope.

Often they're called diacritics or accented letters, and sometimes they're casually referred to as "national" or non-English characters, but these names are either imprecise or cover only a subset of the possible characters.

What is a correct, precise term that programmers will immediately recognize? And what would be the best English term to use when speaking to a non-technical audience?

Reentry answered 2/10, 2009 at 17:12 Comment(1)
I was trying to be concise, but perhaps I should have explained why I asked. I am a translator, my job is software localization. Often (still!) I encounter bugs where only those "national", "extended" characters in my language are garbled on display, usually because a wrong codepage was applied at some point. Therefore I need a term to refer to those specific characters, so that I don't always have to resort to a descriptive sentence, if possible. My audience are programmers, engineers and managers, for whom English isn't always their native tongue.Lissalissak
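
A minimal Python sketch of the kind of garbling described in that comment: text written as UTF-8 but decoded with the wrong codepage (the sample string and the Windows-1252 choice are just assumptions for illustration):

    # Polish text with "national" characters, saved as UTF-8
    text = "Zażółć gęślą jaźń"
    raw = text.encode("utf-8")

    # A program that wrongly assumes Windows-1252 garbles everything above 127,
    # while the plain ASCII letters survive untouched.
    garbled = raw.decode("cp1252", errors="replace")
    print(garbled)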

"Non-ASCII characters"

Frowzy answered 2/10, 2009 at 17:24 Comment(1)
It seems definition by negation is the best we can do. As soon as we add "Unicode", the term won't be applicable in non-Unicode contexts, etc. I liked sgm's idea of "trans-ascii", but a fresh coinage won't cut it, especially when communicating across languages.Lissalissak
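
A quick sketch of how "non-ASCII characters" maps directly onto code (the helper name is mine, not a standard API):

    def has_non_ascii(s: str) -> bool:
        """True if the string contains any character outside the 0-127 ASCII range."""
        return not s.isascii()  # Python 3.7+; equivalent to any(ord(c) > 127 for c in s)

    print(has_non_ascii("hello"))   # False
    print(has_non_ascii("héllo"))   # True -- 'é' is outside ASCII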

ASCII character codes above 127 are not defined. Many different equipment and software suppliers developed their own character sets for the values 128-255. Some chose drawing symbols, some chose accented characters, and others chose other characters.

Unicode is an attempt to make a universal set of character codes which includes the characters used in most languages. This includes not only the traditional Western alphabets, but Cyrillic, Arabic, Greek, and even a large set of characters from Chinese, Japanese and Korean, as well as many other languages, both modern and ancient.

There are several implementations of Unicode. One of the most popular is UTF-8. A major reason for that popularity is that it is backwards compatible with ASCII: character codes 0 to 127 are the same for both ASCII and UTF-8.

That means it is better to say that ASCII is a subset of UTF-8. Character codes 128 and above are not ASCII. They can be UTF-8 (or other Unicode), or they can be a custom implementation by a hardware or software supplier.
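
To make the "ASCII is a subset of UTF-8" point concrete, a small Python sketch (the sample characters are chosen only for illustration):

    # ASCII text produces identical bytes under both encodings.
    print("ABC".encode("ascii"))   # b'ABC'
    print("ABC".encode("utf-8"))   # b'ABC'

    # A character above code point 127 is not ASCII; UTF-8 spends two bytes on it.
    print("é".encode("utf-8"))     # b'\xc3\xa9'
    # "é".encode("ascii") would raise UnicodeEncodeError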

Grata answered 2/10, 2009 at 18:1 Comment(2)
The UTFs are not "implementations" of Unicode. They are encodings of Unicode text into bytestrings. Unicode text is represented as a sequence of numbers (not ints or longs, numbers), and the UTFs are ways of translating each number into a sequence of one or more bytes.Stambul
Jim, thank you, but I am more or less aware of what those are :) I was only looking for a precise name.Lissalissak

You could coin a term like “trans-ASCII,” “supra-ASCII,” “ultra-ASCII” etc. Actually, “meta-ASCII” would be even nicer since it alludes to the meta bit.

Paleozoic answered 2/10, 2009 at 17:44 Comment(1)
I like "trans-ascii" and I think it correctly expresses the idea, but I am primarily looking for a good term to communicate the concept. Using a self-coined term may not do that :)Lissalissak

"Extended ASCII" is the term I'd use, meaning "characters beyond the original 0-127".

Unicode is one possible set of Extended ASCII characters, and is quite, quite large.

UTF-8 is the way to represent Unicode characters that is backwards-compatible with the original ASCII.

Stomodaeum answered 2/10, 2009 at 17:25 Comment(4)
My thought was "extended ascii" would only refer to 128-255. Anything that cannot be expressed in that range isn't really ascii any more :)Lissalissak
Note also (from wikipedia) that the use of the term 'extended ASCII' has been criticized, because it can be mistaken for an extension of the ASCII standard.Extrajudicial
@thomasrutter; if you're going to alter my answer that much in an edit, please just post a different answer, and/or leave a comment here at least?Stomodaeum
Gee, I was just trying to be helpful. I've rolled everything back.Extrajudicial

A bit sequence that doesn't represent an ASCII character is not definitively a Unicode character.

Depending on the character encoding you're using, it could be any of the following:

  • an invalid bit sequence
  • a Unicode character
  • an ISO-8859-x character
  • a Microsoft 1252 character
  • a character in some other character encoding
  • a bug, binary data, etc

The one definition that would fit all of these situations is:

  • Not an ASCII character

To be highly pedantic, even "a non-ASCII character" wouldn't precisely fit all of these situations, because sometimes a bit sequence outside this range may be simply an invalid bit sequence, and not a character at all.
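
A small Python sketch of the list above: the same high byte read under several encodings, plus the case where it is simply invalid (the byte 0xE4 is just an example):

    raw = bytes([0xE4])                  # one byte with the high bit set

    print(raw.decode("latin-1"))         # 'ä'  (ISO-8859-1)
    print(raw.decode("cp1252"))          # 'ä'  (Windows-1252)
    print(raw.decode("iso8859_7"))       # 'δ'  (ISO-8859-7, Greek)
    try:
        raw.decode("utf-8")              # a lone 0xE4 is an incomplete UTF-8 sequence
    except UnicodeDecodeError as err:
        print("not valid UTF-8:", err)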

Extrajudicial answered 27/5, 2010 at 5:56 Comment(0)

I have taken these words from an online resource (a cool website, by the way) because I found them useful and appropriate for this answer.

At first ASCII included only capital letters and numbers, but in 1967 lowercase letters and some control characters were added, forming what is known as US-ASCII, i.e. the characters 0 through 127. This set of only 128 characters was published as a standard in 1967, and it contains everything needed to write in the English language.

In 1981, IBM developed an 8-bit extension of the ASCII code, called "code page 437", in which some obsolete control characters were replaced with graphic characters. Another 128 characters were also added, with new symbols, signs, graphics and Latin letters, plus all the punctuation signs and characters needed to write texts in other languages such as Spanish. In this way the ASCII characters ranging from 128 to 255 were added.

IBM included support for this code page in the hardware of its model 5150, known as the "IBM-PC", considered the first personal computer. The operating system of this model, "MS-DOS", also used this extended ASCII code.
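
A short Python sketch of how those code page 437 positions differ from other legacy codepages (the two byte values are picked purely for illustration):

    raw = bytes([0x82, 0xA2])        # two values from the 128-255 "extended" range

    print(raw.decode("cp437"))       # 'éó' -- IBM PC code page 437
    print(raw.decode("cp1252"))      # '‚¢' -- the same bytes under Windows-1252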

Leapfrog answered 4/7, 2017 at 6:8 Comment(0)

Non-ASCII Unicode characters.

Groundsel answered 2/10, 2009 at 17:16 Comment(4)
This is incorrect. Unicode has nothing to do with ASCII, except for being backwards compatible for the first 128 code points.Higher
That's the point. All of the Unicode characters that don't have ASCII equivalents.Groundsel
@Dervin: just as values over 127 have nothing to do with ASCII.Spearing
A character outside of the ASCII range is not a Unicode character. It's a character outside of the ASCII range. Depending on the character encoding you're using, it's either: an invalid bit sequence; a Unicode character, an ISO-8859-x character, a Microsoft 1252 character, or a character in some other character encoding.Extrajudicial

If you say "High ASCII", you are by definition in the range 128-255 decimal. ASCII itself is defined as a one-byte (actually 7-bit) character representation; the use of the high bit to allow for non-English characters happened later and gave rise to the Code Pages that defined particular characters represented by particular values. Any multibyte (> 255 decimal value) is not ASCII.

Fine answered 2/10, 2009 at 19:51 Comment(0)
