server-side includes and character encoding

I created a static website in which each page has the following structure:

  1. Common stuff like header, menu, etc.
  2. Page specific stuff in main content div
  3. Footer

In this website, all the common content is duplicated in each page. To improve maintainability, I refactored the pages to use server-side includes (SSI) so that the common content is no longer duplicated. The structure of each page is now:

  1. SSI for Common stuff like header, menu, etc.
  2. Page specific stuff in main content div
  3. SSI for footer

In the refactored site, for some reason the French characters no longer display properly in the page-specific content area, though they display fine in the content included via SSIs.

The included header specifies the character set as:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

If I open one of the main content pages in a browser, it tells me that the character encoding is ISO-8859-1. I've tried adding a .htaccess file to the folder with the lines:

AddDefaultCharset UTF-8
AddCharset UTF-8 .shtml
AddCharset UTF-8 .html

But still those pesky French accents aren't displaying properly on the version of the site that uses SSIs.

Rage answered 12/2, 2009 at 1:39 Comment(1)
The link to the "refactored site" no longer works, but I suspect there was no BOM (Byte Order Mark) provided at the beginning of it. There is one in the original site. Or at least cURL shows me that familiar ´╗┐<!DOCTYPE html> - Circumambient

You are serving your pages as UTF-8, which is good, but at least some of the page is being dragged in from files which are not actually saved as UTF-8. SSI just throws the raw bytes in; it doesn't attempt to recode the includes so that their charsets match the file they're being included into.
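
A minimal Python sketch of that byte-level behaviour, assuming a made-up header include saved as UTF-8 and a made-up page body saved as ISO-8859-1 (neither is taken from the actual site). It only imitates the concatenation; it is not how Apache's mod_include is implemented.

```python
# Imitate an SSI-style include: raw bytes are concatenated, never re-encoded.
# Both snippets are hypothetical stand-ins for files on disk.
header_include = "<p>Menu: Accueil, À propos</p>\n".encode("utf-8")   # saved as UTF-8
page_body = "<p>Bienvenue à l'école</p>\n".encode("iso-8859-1")       # saved as ISO-8859-1

served = header_include + page_body  # the byte stream the browser receives

# Read as UTF-8: the include is intact, but the lone ISO-8859-1 accent bytes
# are invalid UTF-8 and become U+FFFD replacement characters.
as_utf8 = served.decode("utf-8", errors="replace")

# Read as ISO-8859-1: the body is intact, but each UTF-8 accent in the
# include turns into two unrelated characters.
as_latin1 = served.decode("iso-8859-1")
```

Whichever charset the browser picks, one half of a mixed page comes out wrong, which is why the files themselves have to agree.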

You need to go through all your HTML and include files in a text editor and make sure each one is saved as UTF-8.
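
If there are many files, a small script can flag the ones that need re-saving. This is a rough sketch: the function name `find_non_utf8` and the default suffix list are my own choices, and a file that passes is only known to be valid UTF-8 (pure ASCII files pass too). Text saved as ISO-8859-1 with accented letters will almost always fail the check.

```python
from pathlib import Path

def find_non_utf8(root, suffixes=(".html", ".shtml")):
    """Return the files under root (with the given suffixes) whose bytes are not valid UTF-8."""
    bad = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            try:
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return bad

# Example use, run from the site's root folder:
#     for path in find_non_utf8("."):
#         print(path)
```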

As John mentioned, you can avoid encoding issues by using character references for all non-ASCII characters, but it's a tremendous pain.
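
For what it's worth, the conversion to character references can be scripted rather than done by hand. A minimal sketch using Python's built-in `xmlcharrefreplace` error handler; the helper name `to_char_refs` and the sample string are just for illustration:

```python
def to_char_refs(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

escaped = to_char_refs("<p>Bienvenue à l'école</p>")
# escaped == "<p>Bienvenue &#224; l'&#233;cole</p>"
```

The result is pure ASCII, so it displays the same whichever charset the page ends up being served as, at the cost of less readable source.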

Gershom answered 12/2, 2009 at 2:46 Comment(5)
Thanks for the suggestion. In Eclipse (the editor I use), I changed the file encoding of all files to UTF-8, but the result is still the same. Is there a way I can check whether Eclipse did actually change the encoding correctly? - Tuner
Try loading the files (even just as text) into a web browser, setting View->Character Encoding to ‘UTF-8’ and seeing if the accents display correctly. Even Notepad can do it, at a pinch, so I'd be surprised if Eclipse couldn't! - Gershom
Is it the text in the ‘test/index.html’ file that comes out wrong, or in the includes? Have you tried dropping a ‘.htaccess’ file in the folder, containing the line ‘AddDefaultCharset UTF-8’? Currently it is served as plain ‘text/html’, not that it matters with the <meta> in place, but still. - Gershom
The accents in the included files display correctly. It's the accents in the files that do the including that don't work. I tried adding a .htaccess but still no joy. - Tuner
+1 as this solved an identical SSI problem where I had the page encoded as utf-8, but hadn't saved the included component files as utf-8, so thanks! - Cyzicus

Your HTML document is using UTF-8 encoding. Try these character codes for your accented letters: http://www.tony-franks.co.uk/UTF-8.htm

Supertonic answered 12/2, 2009 at 1:46 Comment(2)
But why does this only happen when using SSIs? I'm using UTF-8 in the non-SSI version and the accented letters display fine. - Tuner
Have you tried adding "AddCharset UTF-8 .shtml" to your httpd.conf file? I don't know if this will work or not but it's worth a try (assuming you're including .shtml files). - Supertonic

I had the same problem as you and finally found a solution that fixed it.

Related question: "UTF8 makes an extra line on my site"

Save all your files as UTF-8 without BOM (http://en.wikipedia.org/wiki/Byte_order_mark).
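
If your editor has no "without BOM" option, the three BOM bytes can be stripped afterwards. A minimal sketch, where the helper name `strip_utf8_bom` is my own; it works on raw bytes, so nothing else in the file is touched:

```python
import codecs

def strip_utf8_bom(data):
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):  # the bytes EF BB BF
        return data[len(codecs.BOM_UTF8):]
    return data

cleaned = strip_utf8_bom(b"\xef\xbb\xbf<!DOCTYPE html>")
# cleaned == b"<!DOCTYPE html>"
```

Presumably this matters more with SSI because an included file's BOM ends up in the middle of the assembled page rather than at the very start, which fits the extra-line symptom in the linked question's title.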

Commensal answered 29/5, 2011 at 22:46 Comment(0)
