'charset=iso-8859-1' with <!DOCTYPE HTML> is throwing a warning
G

6

10

I just validated an HTML document using the W3C validator, and found that if I use:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

with:

<!DOCTYPE HTML>

it throws this warning: Line 4, Column 72: Using windows-1252 instead of the declared encoding iso-8859-1.

However, the warning goes away if I use:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

I don't really understand what is happening. Also, I don't even know how to use the DOCTYPE tag; I just copied and pasted one from around the web.

  • Why does this happen?
  • How should I use the DOCTYPE tag?
Giffer answered 3/1, 2012 at 5:21 Comment(0)
G
7

A couple of points:

  1. Any HTML5 validation should be taken with a grain of salt. The specification is still under active development, and not everything is set in stone.
  2. You're using the HTML4 syntax for that meta tag. Try <meta charset="iso-8859-1">

That said, HTML validators don't serve that much purpose in this day and age.

Apparently the default for HTML4 was iso-8859-1, whereas the default charset for HTML5 is UTF-8.

More information about the HTML5 doctype can be found in this post by John Resig.

Gnarly answered 3/1, 2012 at 5:27 Comment(5)
utf-8 is the preferred charset for HTML5. You can find more information here. - Gnarly
What does this comment mean?: "HTML validators don't serve that much purpose in this day and age." - Shellashellac
@Madmartigan Most browsers use different rules for interpreting HTML than the W3 validator does. For instance, put a <title> in the <body> instead of the <head>. No browser in the world will have problems with that! Yet the validator complains that you shouldn't do that. (That said, you really shouldn't do that, but you see what I mean. For browser intercompatibility, testing in many different browsers is more relevant than making sure your source passes the validator.) - Compel
I totally disagree: The validator will catch things that can escape the naked eye. Invalid HTML is a great way to get unexpected, inconsistent behavior because each browser may handle it differently. Examples: unclosed or mismatched tags, invalid or broken attributes, quotes where they shouldn't be, unterminated entity strings, improper nesting, missing required attributes, etc. I'm not sure what point you're making with that example. - Shellashellac
The way that HTML is parsed was standardized in 2007, as far as I know. - Map
B
20

Changing the DOCTYPE is simply turning off the warning - it isn't actually fixing anything.

iso-8859-1 and windows-1252 are very similar encodings. They differ only in the characters associated with the 32 byte values from 0x80 to 0x9F, which in iso-8859-1 are mapped to control characters and in windows-1252 are mapped to some useful characters such as the Euro symbol.
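
To see that difference concretely, here is a small Python sketch that decodes every single byte value under both encodings and collects the ones that disagree. It relies on Python's built-in `latin-1` and `cp1252` codecs; note that Python's `cp1252` leaves five byte values undefined, so the sketch substitutes a replacement character for those rather than failing.

```python
# Compare how ISO-8859-1 and Windows-1252 decode each single byte value.
differing = []
for value in range(256):
    raw = bytes([value])
    latin1 = raw.decode("latin-1")  # ISO-8859-1: every byte value is defined
    # Python's cp1252 codec leaves a few byte values undefined,
    # so substitute U+FFFD for those instead of raising an error.
    cp1252 = raw.decode("cp1252", errors="replace")
    if latin1 != cp1252:
        differing.append(value)

print(f"{len(differing)} differing bytes, "
      f"from 0x{differing[0]:02X} to 0x{differing[-1]:02X}")

# Byte 0x80: a control character in ISO-8859-1, the Euro sign in Windows-1252.
print(repr(b"\x80".decode("latin-1")), "vs", repr(b"\x80".decode("cp1252")))
```

Everything outside that 32-byte block decodes identically, which is why the two labels are so often confused.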

The control characters are useless in HTML, and web authors often mistakenly declare iso-8859-1 yet use one or more of those 32 values as if they were using windows-1252. So when browsers see iso-8859-1 declared as the charset, they automatically treat it as windows-1252.
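
As a rough illustration of that substitution, the sketch below resolves a declared label to the encoding actually used and then decodes with it. The table is a small, hand-picked subset of the label mappings in the WHATWG Encoding Standard, and `resolve_label` is a hypothetical helper name, not a browser API.

```python
from typing import Optional

# Partial, illustrative table: a few labels and the encoding a browser
# actually uses for them. The real table is much longer.
LABEL_TO_ENCODING = {
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "us-ascii": "windows-1252",
    "windows-1252": "windows-1252",
    "utf-8": "utf-8",
}

def resolve_label(label: str) -> Optional[str]:
    """Return the effective encoding for a declared label, or None if unknown here."""
    return LABEL_TO_ENCODING.get(label.strip().lower())

effective = resolve_label("ISO-8859-1")
print(effective)

# Bytes 0x93 and 0x94 are curly quotes in windows-1252.
decoded = b"\x93quoted\x94".decode(effective)
print(decoded)
```

So a page that declares iso-8859-1 but contains "smart quotes" typed on Windows still renders as its author expected.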

The validator is simply warning you that this will happen. If you're not using any of the 32 byte values, then you can simply ignore the warning - it's not an error. If you are, and you genuinely want the iso-8859-1 interpretation of the byte values and not the windows-1252 interpretation, you are doing something wrong.
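
If you want to check whether the warning matters for a particular page, one option is to scan the raw bytes for values in that range. This is only a sketch: `c1_range_bytes` is a hypothetical helper and the sample bytes are made up for illustration.

```python
def c1_range_bytes(data: bytes):
    """Return (offset, byte value) pairs for bytes in the 0x80-0x9F range."""
    return [(i, b) for i, b in enumerate(data) if 0x80 <= b <= 0x9F]

# Hypothetical page bytes; 0x80 is the Euro sign in windows-1252.
sample = b"<p>Price: \x80 5</p>"
hits = c1_range_bytes(sample)

if hits:
    for offset, value in hits:
        print(f"offset {offset}: byte 0x{value:02X}")
else:
    print("No bytes in 0x80-0x9F; both encodings decode this page identically.")
```

An empty result means the declared encoding and the one the browser uses produce exactly the same text.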

Again, this switching happens in browsers for any DOCTYPE, it's just that the HTML5 validator is being more helpful about what it is telling you than the HTML4 validator is.

Betjeman answered 3/1, 2012 at 7:36 Comment(0)
C
3

It throws a warning Line 4, Column 72: Using windows-1252 instead of the declared encoding iso-8859-1.

It means the file was saved in the Windows-1252 encoding when it was created (also known as Western Windows 1252 or CP1252), while your charset declaration says "hey, read this file as ISO 8859-1", which is not the encoding the file has.

The meta charset exists for that reason: to declare the encoding of the file you are sending, reading, or using, so that a browser, for example, knows which encoding the document uses when it reads it.

In detail, you have this charset declared:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

But the file you are validating is actually encoded in Windows-1252. How? Why? Check the text editor you are using and what encoding it is using to save files. If the editor can be configured to change the encoding, choose the one you want to use.
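
If the editor cannot change the encoding, re-saving the file with a script is another option. The following is a minimal sketch that assumes you already know the file's current encoding; `resave` and the file name `page.html` are hypothetical, and the temporary directory is only there to keep the example self-contained.

```python
import tempfile
from pathlib import Path

def resave(path: Path, source_encoding: str, target_encoding: str) -> None:
    """Rewrite a text file in target_encoding, assuming it is in source_encoding."""
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode(target_encoding))

with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "page.html"  # hypothetical file name
    # Simulate a file that an editor saved as Windows-1252.
    page.write_bytes("<p>caf\u00e9 \u20ac5</p>".encode("cp1252"))

    resave(page, "cp1252", "utf-8")
    result = page.read_bytes()

print(result.decode("utf-8"))
```

After converting, remember to update the charset declaration in the page so that it matches the new encoding.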

About HTML5

Using

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

or

<meta charset="iso-8859-1">

are both valid for HTML5. See <meta charset="utf-8"> vs <meta http-equiv="Content-Type">

Cashandcarry answered 4/12, 2013 at 17:38 Comment(2)
The format of the <meta> tags is valid, but the value is not. For HTML5 the value must be utf-8. From the HTML5 Spec: "The charset attribute specifies the character encoding used by the document. This is a character encoding declaration. If the attribute is present, its value must be an ASCII case-insensitive match for the string "utf-8"." - Knighten
@Knighten read OP's question. This answer is in the context of the question, which is why it is using that charset. UTF-8 is the recommendation and "best practice", sure, but you are free to use another one as long as you comply with it, meaning that the file you are sending is actually saved in that charset. Read w3's answer on that: w3.org/International/questions/… - Cashandcarry
A
1

The W3C validator offers options for which encoding the validator uses. You have specified encoding in your document, so you should see "Encoding: iso-8859-1" in the top block of information once the validator has been run.

To the right of that, there is a pull-down menu. Change the choice from "(detect automatically)" to "iso-8859-1 (Western European)". The validator will then use ISO 8859-1 instead of its own choice, and you will not receive the warning.

Adalard answered 5/1, 2013 at 12:38 Comment(0)
D
1

Do the following:

ISO 8859-15. Yeah, -15, and it will work.

Decalescence answered 25/4, 2015 at 21:33 Comment(2)
iso-8859-15 differs from iso-8859-1. If they were the same, only the first two rows would differ for iso-8859-15 and windows-1252 (but they differ outside 0x80..0x9f range too). - Smoking
"ISO 8859-15" is not an action. However, the OP has left the building ("Last seen more than 6 years ago"). - Softspoken
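
For anyone who wants to verify that claim, here is a short Python sketch using the built-in `iso-8859-1` and `iso-8859-15` codecs. It lists the byte values the two encodings decode differently and shows that the Euro sign sits at a different byte in iso-8859-15 than in windows-1252.

```python
# Byte values that ISO-8859-1 and ISO-8859-15 decode to different characters.
diff = [
    b for b in range(256)
    if bytes([b]).decode("iso-8859-1") != bytes([b]).decode("iso-8859-15")
]
print([f"0x{b:02X}" for b in diff])

# The Euro sign is byte 0xA4 in ISO-8859-15 but byte 0x80 in Windows-1252.
euro_8859_15 = "\u20ac".encode("iso-8859-15")
euro_cp1252 = "\u20ac".encode("cp1252")
print(euro_8859_15, euro_cp1252)
```

So swapping the declaration to iso-8859-15 changes how existing bytes are interpreted rather than matching what browsers do with iso-8859-1.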
P
0

Don't place too much stock in the validators. There are typically too many Internet Explorer workarounds, particularly in the CSS content, that will trip up the validator. If your pages work in all browsers and your client is happy, it doesn't matter what some validator says.

If you are specifying the HTML5 doctype, then you should be consistent with the meta charset attribute. Try this though for your pages:

<!DOCTYPE HTML>
<html>
<head>
<meta charset="UTF-8">
</head>

<body>
</body>
</html>
Pantheas answered 3/4, 2013 at 20:48 Comment(0)
