An encoding is a mapping between bytes and characters from a character set, so it will be helpful to discuss and understand the difference between between bytes and characters.
Think of bytes as numbers between 0 and 255, whereas characters are abstract things like "a", "1", "$" and "Ä". The set of all characters that are available is called a character set.
Each character has a sequence of one or more bytes that are used to represent it; however, the exact number and value of the bytes depends on the encoding used and there are many different encodings.
Most encodings are based on an old character set and encoding called ASCII which is a single byte per character (actually, only 7 bits) and contains 128 characters including a lot of the common characters used in US English.
For example, here are 6 characters in the ASCII character set that are represented by the values 60 to 65.
Extract of ASCII Table 60-65
╔══════╦══════════════╗
║ Byte ║ Character ║
╠══════╬══════════════║
║ 60 ║ < ║
║ 61 ║ = ║
║ 62 ║ > ║
║ 63 ║ ? ║
║ 64 ║ @ ║
║ 65 ║ A ║
╚══════╩══════════════╝
In the full ASCII set, the lowest value used is zero and the highest is 127 (both of these are hidden control characters).
However, once you start needing more characters than the basic ASCII provides (for example, letters with accents, currency symbols, graphic symbols, etc.), ASCII is not suitable and you need something more extensive. You need more characters (a different character set) and you need a different encoding as 128 characters is not enough to fit all the characters in. Some encodings offer one byte (256 characters) or up to six bytes.
Over time a lot of encodings have been created. In the Windows world, there is CP1252, or ISO-8859-1, whereas Linux users tend to favour UTF-8. Java uses UTF-16 natively.
One sequence of byte values for a character in one encoding might stand for a completely different character in another encoding, or might even be invalid.
For example, in ISO 8859-1, â is represented by one byte of value 226
, whereas in UTF-8 it is two bytes: 195, 162
. However, in ISO 8859-1, 195, 162
would be two characters, Ã, ¢.
When computers store data about characters internally or transmit it to another system, they store or send bytes. Imagine a system opening a file or receiving message sees the bytes 195, 162
. How does it know what characters these are?
In order for the system to interpret those bytes as actual characters (and so display them or convert them to another encoding), it needs to know the encoding used. That is why encoding appears in XML headers or can be specified in a text editor. It tells the system the mapping between bytes and characters.