python3: UTF-8 encoding in http.server

I have encoding problems when serving a simple web page in Python 3 using BaseHTTPRequestHandler.

Here is a working example:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from http.server import BaseHTTPRequestHandler, HTTPServer
from os import curdir, sep, remove

HTML_FILE_NAME = 'test.html'
PORT_NUMBER = 8080

# This class handles any incoming request from the browser
class myHandler(BaseHTTPRequestHandler):

    # Handler for the GET requests
    def do_GET(self):
        self.path = HTML_FILE_NAME
        try:
            with open(curdir + sep + self.path, 'r') as f:
                self.send_response(200)
                self.send_header('Content-type', 'text/html')
                self.end_headers()
                self.wfile.write(bytes(f.read(), 'UTF-8'))
            return
        except IOError:
            self.send_error(404, 'File Not Found: %s' % self.path)

# Write the test page; no encoding is given, so the locale's default is used
with open(HTML_FILE_NAME, 'w') as f:
    f.write('<!DOCTYPE html><html><body> <p> My name is Jérôme </p> </body></html>')

# Create a web server and define the handler to manage the incoming requests
server = HTTPServer(('', PORT_NUMBER), myHandler)
print('Started httpserver on port %i.' % PORT_NUMBER)

try:
    # Wait forever for incoming HTTP requests
    server.serve_forever()
except KeyboardInterrupt:
    print('Interrupted by the user - shutting down the web server.')
    server.socket.close()
    remove(HTML_FILE_NAME)

The expected result is to serve a web page displaying My name is Jérôme.

Instead, I have: My name is Jérôme

As you can see, the HTML page is correctly encoded by self.wfile.write(bytes(f.read(), 'UTF-8')), so I think the problem comes from the web server.

How can I tell the web server to serve the page in UTF-8?

Publus answered 3/6, 2016 at 10:35 Comment(0)
P
11

The problem goes away if I add:

<meta content="text/html;charset=utf-8" http-equiv="Content-Type">
<meta content="utf-8" http-equiv="encoding">

to the head of my HTML.

Publus answered 6/6, 2016 at 14:32 Comment(1)
Using the header is better: for instance, source files such as .js can't carry a <meta> tag. - Sheathe
I
11

Your web server is already sending the text encoded as UTF-8, but you need to tell the browser the encoding of the bytes it receives. The HTTP/1.1 spec (RFC 2616) declares ISO-8859-1 as the default charset for text media types.
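To see why the page shows up garbled, here is a small sketch (not part of the original code; the variable names are only illustrative) that decodes the same UTF-8 bytes the way a client assuming ISO-8859-1 would, and then the way a client told about UTF-8 would:

```python
# Illustrative sketch: the same bytes, decoded with two different charsets.
text = 'Jérôme'
utf8_bytes = text.encode('utf-8')  # what the server in the question sends

# A client assuming ISO-8859-1 maps every single byte to one character,
# so each two-byte UTF-8 sequence becomes two unrelated characters.
misread = utf8_bytes.decode('iso-8859-1')

# A client that knows the bytes are UTF-8 recovers the original text.
correct = utf8_bytes.decode('utf-8')

print(misread)
print(correct)
```

The first line printed should match the garbled name from the question, which suggests the bytes on the wire are fine and only the declared (or assumed) charset is off.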

The standard HTTP way of doing this is to add a charset parameter to the Content-type header value.

Therefore, you should change your code to read:

self.send_header('Content-type', 'text/html; charset=utf-8')

Also, watch out for the encoding of your HTML file. Without an explicit encoding argument, open() chooses one based on your locale. That won't break anything unless you end up running this script where the locale is C or POSIX, or on a non-Latin Windows installation, where characters such as 'é' and 'ô' may not be encodable.
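Putting both points together, a minimal sketch of the handler could look like the following. It assumes the same test.html file name as the question; the class name Utf8Handler and the helper make_server are made up for this example, and the Content-Length header is an optional extra that the original code does not send.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

HTML_FILE_NAME = 'test.html'  # same file name as in the question


class Utf8Handler(BaseHTTPRequestHandler):  # hypothetical name
    def do_GET(self):
        try:
            # Read the file as UTF-8 explicitly instead of relying on the locale.
            with open(HTML_FILE_NAME, 'r', encoding='utf-8') as f:
                body = f.read().encode('utf-8')
        except OSError:
            self.send_error(404, 'File Not Found: %s' % HTML_FILE_NAME)
            return
        self.send_response(200)
        # Declare the body's encoding so the browser does not have to guess.
        self.send_header('Content-type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8080):
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(('127.0.0.1', port), Utf8Handler)
```

Calling make_server().serve_forever() would then serve the page. The file should also be written with encoding='utf-8', so that reading and writing agree regardless of the locale.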

Iorgos answered 6/6, 2016 at 13:41 Comment(1)
The "Also" hint saved my day; I added a reference :) - Heyduck
