UTF-8 html without BOM displays strange characters
P

2

9

I have some HTML which contains some foreign characters (€, ó, á). The HTML document is saved as UTF-8 without BOM. When I view the page in the browser, the foreign characters get replaced with strange character combinations (â‚¬, Ã³, Ã¡). Only when I save the HTML document as UTF-8 with BOM do the characters display properly.

I'd really rather not have to include a BOM in my files. Does anybody know why this happens, and is there a way to fix it (other than including a BOM)?

Prudhoe answered 1/3, 2012 at 15:6 Comment(4)
Does the HTML identify itself as UTF-8-encoded? - Matron
Have you specified the charset in a meta tag? - Incivility
Did you do the appropriate things with your server and meta tags to inform the browser that the content is, in fact, UTF-8? - Tungting
Try using the standard character encodings: utf8-chartable.de - Scission
H
18

You are probably not specifying the correct character set in your HTML file. The BOM (thanks @Jukka) sends the browser into UTF-8 mode; in its absence, you need to use other means to declare the document as UTF-8.
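To illustrate the garbling described in the question, here is a minimal Python sketch. It assumes the browser falls back to windows-1252 when nothing declares the encoding; the actual fallback depends on the browser and locale, so treat this as an illustration rather than a description of any particular browser.

```python
import codecs

text = "€, ó, á"
utf8_bytes = text.encode("utf-8")

# Assumption: with no declared encoding, the bytes are interpreted as
# windows-1252 (cp1252). Each multi-byte UTF-8 sequence then shows up
# as two or three unrelated characters.
misread = utf8_bytes.decode("cp1252")
print(misread)

# With a BOM, the file starts with the bytes EF BB BF, which a reader
# can take as a signal that the content is UTF-8.
with_bom = codecs.BOM_UTF8 + utf8_bytes
print(with_bom.startswith(codecs.BOM_UTF8))

# "utf-8-sig" decodes UTF-8 and strips a leading BOM if one is present.
print(with_bom.decode("utf-8-sig") == text)
```

The bytes on disk are identical in both cases apart from the three-byte BOM; only the interpretation changes, which is why declaring the encoding fixes the display without touching the content.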

If you have access to your server configuration, you may want to make sure the server isn't sending the wrong character set info. See e.g. How to change the default encoding to UTF-8 for Apache?

If you have access only to your HTML, adding this meta tag in your document's head should do the trick:

<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>

or, as @Mathias points out, the newer HTML5 form:

<meta charset="utf-8"> 

(valid only if you use an HTML5 doctype, against which there is no good argument any more, even if you don't use HTML5 markup.)
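If you want to check whether a saved file actually carries one of these declarations, a rough Python sketch follows. The function name `declared_encoding` and the regular expression are illustrative assumptions, not part of any library, and the check is much simpler than what a real browser does; it only looks near the start of the document, since that is where browsers typically expect the declaration.

```python
import codecs
import re
from typing import Optional

# Hypothetical pattern: matches both <meta charset="utf-8"> and the
# older http-equiv form, capturing the charset name.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def declared_encoding(raw: bytes) -> Optional[str]:
    """Return the encoding a document declares for itself, or None."""
    # A UTF-8 BOM at the very start counts as a declaration.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8 (BOM)"
    # Otherwise look for a meta declaration in the first 1024 bytes.
    match = META_CHARSET.search(raw[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return None
```

For example, `declared_encoding(open("page.html", "rb").read())` (with `page.html` as a placeholder file name) returns `None` for a BOM-less file without a meta tag, which is the situation in the question.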

Hochheimer answered 1/3, 2012 at 15:8 Comment(5)
+1. Nowadays you can just use <meta charset="utf-8">. (If you’re not using the HTML5 DOCTYPE in 2012, you’re doing it wrong.) - Subtract
The symptoms suggest that the browser tested actually listens to the BOM: apparently neither the server nor the document itself declares the encoding, forcing the browser to infer or guess the encoding; and it seems that it then takes a BOM as indicating UTF-8, which makes perfect sense (and in the absence of a BOM, the data is taken as iso-8859-1, windows-1252, or something similar, explaining the â‚¬, Ã³, Ã¡ stuff). - Delanty
@Jukka but he has a BOM and is getting â‚¬ and such - that would mean the browser is not listening to the BOM, wouldn't it? (Re-reading question...) - Hochheimer
Ahh @Jukka I didn't read the question properly. Fixing, thanks. - Hochheimer
@JukkaK.Korpela Usually this is because the server is configured to send everything out MIME-tagged as being in ISO-8859-1, no matter what’s actually in the file. If so, the <meta> won’t be enough to persuade it of the file encoding. For example in Apache, you need an AddDefaultCharset Off directive, which can go in the .htaccess file for that directory under most configurations. - Gilman
S
2

Insert <meta charset="utf-8"> in <head>.
Or set the header Content-Type: text/html;charset=utf-8 on the server-side.

You can also add this to .htaccess: AddDefaultCharset UTF-8. More info here: http://www.askapache.com/htaccess/setting-charset-in-htaccess.html
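For anyone not on Apache, the same idea of setting the header on the server side can be sketched with Python's standard library. The names `Utf8Handler`, `make_server`, and `PAGE` are made up for this example, and this is only a local test server, not a production setup.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder page: no BOM and no meta tag, so the header is the only
# place the encoding is declared.
PAGE = "<!DOCTYPE html><html><head><title>Test</title></head><body>€, ó, á</body></html>"


class Utf8Handler(BaseHTTPRequestHandler):
    """Serves PAGE and declares its encoding in the HTTP header."""

    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        # This header tells the browser how to decode the bytes.
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    """Create (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), Utf8Handler)
```

Calling `make_server().serve_forever()` and opening http://127.0.0.1:8000/ should show the characters correctly even though the page has neither a BOM nor a meta tag.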

Sesquioxide answered 1/3, 2012 at 15:10 Comment(1)
Note that <meta charset="utf-8"> is for HTML5 only. For HTML4 and earlier, use <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> instead. - Junina
