In order to define charset for HTML5 Doctype, which notation should I use?
Short:
<meta charset="utf-8" />
Long:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
In HTML5, they are equivalent. Use the shorter one, as it is easier to remember and type. Browser support is fine since it was designed for backwards compatibility.
Does <meta charset='utf-8'> work in IE6? – Theologue
Using <meta> to set the character encoding disables the lookahead downloader in IE8, which can impact your page load times. Yeah, yeah, I know... drop IE8. @MészárosLajos can come back here in a couple of years and bust our balls for still supporting IE8. ;-) – Eagan
charset in meta tag important? Where is it used, or what is the advantage of charset in HTML? – Overwhelm
Both forms of the meta charset declaration are equivalent and should work the same across browsers. However, there are a few things to remember when declaring your web files' character set as UTF-8:
Apache servers are often configured to serve files in ISO-8859-1 by default, so you may need to add the following line to your .htaccess file:
AddDefaultCharset UTF-8
This will configure Apache to serve your files declaring UTF-8 encoding in the Content-Type response header, but your files must be saved in UTF-8 (without BOM) to begin with.
Older versions of Notepad cannot save your files in UTF-8 without the BOM. A free editor that can is Notepad++. On the program menu bar, select "Encoding > Encode in UTF-8 without BOM". You can also open files and re-save them in UTF-8 using "Encoding > Convert to UTF-8 without BOM".
More on the Byte Order Mark (BOM) at Wikipedia.
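If your editor has already written a BOM, you can check for it and strip it with a few lines of script. This is only a minimal sketch: the function name strip_utf8_bom and the sample bytes are made up for illustration, and it assumes the file really is UTF-8.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample: a page that an editor saved with a BOM.
sample = codecs.BOM_UTF8 + "<!DOCTYPE html>".encode("utf-8")
cleaned = strip_utf8_bom(sample)
print(sample[:3], cleaned[:9])
```

To apply it to a real file, read the file in binary mode, pass the bytes through the function, and write the result back.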
meta and an HTTP header. You just need one of BOM, meta or HTTP header. – Kane
URIEncoding="utf-8" to each connector. – Fecund
Why do you say UTF-8 HTML should be without a BOM? Indeed, the absence of the BOM is the very reason you would need an HTTP header or meta tag in the first place. – Effluent
"Summing up: don't use BOM for UTF-8" I can't agree with this. The BOM in UTF-8 is very useful for signaling the encoding type. Otherwise we have to guess, or use things like the meta tags this question refers to. The cool thing about the BOM is that it is part of the Unicode spec and thus can be used for all data encoded in Unicode, not just HTML. What we should do is use BOMs everywhere, let legacy software blow up on it, report those bugs and get them fixed. – Effluent
Another reason to go with the short one is that it matches other instances where you might specify a character set in markup. For example:
<script type="text/javascript" charset="UTF-8" src="/script.js"></script>
<p><a charset="UTF-8" href="http://example.com/">Example Site</a></p>
Consistency helps to reduce errors and make code more readable.
Note that the value of the charset attribute is case-insensitive. You can use UTF-8 or utf-8; however, UTF-8 is clearer, more readable, and more accurate.
Also, there is absolutely no reason at all to use any value other than UTF-8 in the meta charset attribute or page header. Unicode has been the document character set for Web documents since HTML4 in 1999, and UTF-8 is the only practical way to make modern Web pages.
Also, you should not use HTML entities in UTF-8 documents. Characters like the copyright symbol should be typed directly. The only entities you should use are for the five reserved markup characters: less than, greater than, ampersand, apostrophe (single quote), and double quote.
Entities need an HTML parser, which you may not always want to use going forward. They introduce errors, make your code less readable, increase your file sizes, and sometimes decode incorrectly in various browsers depending on which entities you used. Learn how to type/insert copyright, trademark, open quote, close quote, apostrophe, em dash, en dash, bullet, Euro, and any other characters you encounter in your content, and use those actual characters in your code.
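If you have existing pages full of entities, a short script can do the conversion for you. The following is only a sketch: the function name literalize_entities is made up for illustration, it relies on Python's standard html.unescape, and it treats the markup as a flat string, so it would also touch entities inside script or style blocks.

```python
import html
import re

# Characters that must stay escaped because they are markup syntax.
RESERVED_CHARS = set("<>&\"'")

# Named (&copy;), decimal (&#169;) and hexadecimal (&#xA9;) references.
ENTITY = re.compile(r"&(?:#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def literalize_entities(markup: str) -> str:
    """Replace entities with the literal character, except the reserved five."""

    def replace(match: re.Match) -> str:
        entity = match.group(0)
        decoded = html.unescape(entity)
        # Keep the entity when it stands for a reserved markup character.
        if decoded in RESERVED_CHARS:
            return entity
        return decoded

    return ENTITY.sub(replace, markup)


print(literalize_entities("&copy; 2024 A &amp; B &trade; &lt;b&gt;"))
```

Unknown names such as &foo; are left alone, because html.unescape returns them unchanged.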
The Mac has a Character Viewer that you can turn on in the Keyboard System Preference, and you can find and then drag and drop the characters you need, or use the matching Keyboard Viewer to see which keys to type. For example, trademark is Option + 2. UTF-8 contains all of the characters and symbols from every written human language.
So there is no excuse for using -- instead of an em dash. It is not a bad idea to learn the rules of punctuation and typography also ... for example, knowing that a period goes inside a close quote, not outside.
"Using a <meta> tag for something like content-type and encoding is highly ironic, since without knowing those things, you couldn't parse the file to get the value of the meta tag."
No, that is not true. The browser starts out parsing the file in its default encoding, either UTF-8 or ISO-8859-1. Since US-ASCII is a subset of both ISO-8859-1 and UTF-8, the browser can read <html><head> just fine either way; the bytes are the same. When the browser encounters the meta charset tag, if the encoding is different from the one the browser is already using, the browser restarts parsing the page in the specified encoding.
That is why we put the meta charset tag at the top, right after the head tag, before anything else, even the title. That way you can use UTF-8 characters in your title.
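To make the point concrete, here is a rough sketch of that kind of pre-scan. It is not what any browser actually runs: the function name sniff_meta_charset, the regular expression, and the 1024-byte limit are simplifying assumptions for illustration. It only shows that the declaration can be found by looking at raw bytes, before the encoding is known.

```python
import re

# Matches both the short form and the long http-equiv form, on raw bytes.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_:.-]+)""",
    re.IGNORECASE,
)


def sniff_meta_charset(raw: bytes, limit: int = 1024):
    """Return the declared charset (lowercased) or None if none is found.

    Only the first `limit` bytes are scanned, which is why the tag
    should sit at the very top of <head>.
    """
    match = META_CHARSET.search(raw[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


# Hypothetical page: the tag names are ASCII, the title is not.
page = '<!DOCTYPE html><html><head><meta charset="UTF-8"><title>привет</title></head></html>'.encode("utf-8")
encoding = sniff_meta_charset(page)
print(encoding, page.decode(encoding))
```

The tag names and the charset value are plain ASCII, so the search works on the bytes regardless of how the rest of the document is encoded.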
"You must save your file(s) in UTF-8 encoding without BOM"
That is not strictly true. If you only have US-ASCII characters in your document, you can save it as US-ASCII and serve it as UTF-8, because US-ASCII is a subset of UTF-8. But if there are non-ASCII Unicode characters, you are correct, you must save as UTF-8 without BOM.
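You can check the subset claim yourself. A small sketch, with a made-up helper name is_ascii_safe and made-up sample strings:

```python
def is_ascii_safe(text: str) -> bool:
    """True if text has only US-ASCII characters, so its ASCII bytes are also valid UTF-8."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


plain = "<!DOCTYPE html><title>Plain ASCII</title>"
fancy = "<title>Copyright © Example</title>"

# For pure ASCII text, both encodings produce identical bytes.
print(is_ascii_safe(plain), plain.encode("ascii") == plain.encode("utf-8"))
# The copyright sign is outside US-ASCII, so this file must be saved as UTF-8.
print(is_ascii_safe(fancy), len(fancy.encode("utf-8")) - len(fancy))
```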
If you want a good text editor that will save your files in UTF-8, I recommend Notepad++.
On the Mac, use Bare Bones TextWrangler (free) from Mac App Store, or Bare Bones BBEdit which is at Mac App Store for $39.99 ... very cheap for such a great tool.
In either app, there is a menu at the bottom of the document window where you specify the document encoding and you can easily choose "UTF-8 no BOM". And of course you can set that as the default for new documents in Preferences.
"But if your Webserver serves the encoding in the HTTP header, which is recommended, both [meta tags] are needless."
That is incorrect. You should of course set the encoding in the HTTP header, but you should also set it in the meta charset attribute so that the page can be saved by the user, out of the browser onto local storage and then opened again later, in which case the only indication of the encoding that will be present is the meta charset attribute.
You should also set a base tag for the same reason ... on the server, the base tag is unnecessary, but when opened from local storage, the base tag enables the page to work as if it is on the server, with all the assets in place and so on, no broken links.
AddDefaultCharset UTF-8
Or you can just change the encoding of particular file types like so:
AddType text/html;charset=utf-8 html
A tip for serving both UTF-8 and Latin-1 (ISO-8859-1) files is to give the UTF-8 files a "text" extension and Latin-1 files "txt."
AddType text/plain;charset=iso-8859-1 txt
AddType text/plain;charset=utf-8 text
Finally, consider saving your documents with Unix line endings, not legacy DOS or (classic) Mac line endings, which don't help and may hurt, especially down the line as we get further and further from those legacy systems.
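If your editor cannot convert line endings, it is easy to script. A minimal sketch, with a made-up function name, that works on raw bytes and assumes an ASCII-compatible encoding such as UTF-8 (it would corrupt UTF-16 files):

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert DOS (CRLF) and classic Mac (CR) line endings to Unix (LF)."""
    # Replace CRLF first so each pair becomes one LF, then any lone CR.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


mixed = b"<html>\r\n<head>\r</head>\n</html>\n"
print(to_unix_line_endings(mixed))
```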
An HTML document with valid HTML5, UTF-8 encoding, and Unix line endings is a job well done. You can share and edit and store and read and recover and rely on that document in many contexts. It's lingua franca. It's digital paper.
"You should also set a base tag..." should come with the caveats described here. – Hurtado
than the default glyph, or some strange character I don't recognize. – Veronaveronese
<meta charset="utf-8"> was introduced with/for HTML5.
As mentioned in the documentation, both are valid. However, <meta charset="utf-8"> is only for HTML5 (and easier to type/remember).
The old style is bound to become deprecated in the near future. I'd stick to the new <meta charset="utf-8">.
There's only one way to go: up. In tech's case, that means phasing out the old (really, REALLY fast).
Documentation: HTML meta charset Attribute—W3Schools
While not contesting the other answers, I think the following is worthy of mentioning.
The “long” (http-equiv) notation and the “short” one are equal. Whichever comes first wins;
<meta> tags;
You can test by running
echo 'HTTP/1.1 200 OK\r\nContent-type: text/html; charset=windows-1251\r\n\r\n\xef\xbb\xbf<!DOCTYPE html><html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta charset="windows-1251"><title>привет</title></head><body>привет</body></html>' | nc -lp 4500
and pointing your browser at localhost:4500. (Of course you will want to change or remove parts. The BOM part is \xef\xbb\xbf. Be wary of the encoding of your shell.)
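If you would rather not depend on how your shell's echo handles escapes, the same test response can be assembled in a script. This is only a sketch: the function name build_test_response and its parameters are made up, and it assumes the body is encoded as UTF-8, which the shell version leaves to your terminal.

```python
import codecs


def build_test_response(header_charset: str = "windows-1251",
                        include_bom: bool = True) -> bytes:
    """Build raw HTTP response bytes with deliberately conflicting declarations."""
    body = (
        "<!DOCTYPE html><html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        '<meta charset="windows-1251">'
        "<title>привет</title></head><body>привет</body></html>"
    ).encode("utf-8")  # assumption: the body bytes are UTF-8
    if include_bom:
        body = codecs.BOM_UTF8 + body
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: text/html; charset={header_charset}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


response = build_test_response()
print(response[:80])
```

Write the bytes to a file and pipe that file into nc as before, then toggle the header charset and the BOM to see which declaration your browser honors.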
Please mind that it's very important that you explicitly declare the encoding. Letting browsers guess can lead to security issues.
UTF-7 from what I remember. Also sniffing on the web is generally bad, e.g. when you upload an image something which is sniffed as script content. – Perpetual
Use <meta charset="utf-8" /> for web browsers when using HTML5.
Use <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> when using HTML4 or XHTML, or for outdated DOM parsers, like DOMDocument in PHP 5.3.
To embed a signature in an email, I would use the long version:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
The reason is that not many email readers support HTML5, so it's always better to use old HTML styles. Actually, it's better to use tables than divs + CSS as well.
According to the Mozilla Foundation and SitePoint:
Do not use this value (http-equiv=content-type) as it is obsolete. Prefer the charset attribute on the <meta> element.
encoding parameter right away. – Aseptic
Content-Type response header will be used. The meta tag is only used when the page is loaded from local disk file system. – Sheeree
http-equiv version. – Exclamation
DOMDocument does not get the long one :( – Treacherous