Unicode chars are converted to broken symbols when I use wkhtmltopdf
M

4

5

I have an HTML file that contains some Unicode characters and is saved to disk as UTF-8. When I view it with less, all the characters display correctly:

<h1>什么是Action?</h1>
<p>Play程序接收到的大部分请求,都是由<code>Action</code>来处理的。

But when I use "wkhtmltopdf" to convert it to PDF, it shows broken characters:

[image: broken unicode]

My command is:

wkhtmltopdf --encoding utf-8 book.html book.pdf

How can I fix this?

Membrane answered 12/7, 2012 at 7:24 Comment(0)
M
17

I finally found the cause: my Ubuntu server didn't have any Unicode fonts installed.

I uploaded some TrueType fonts from my local Ubuntu machine to the server, and now everything works fine.

freewind@freewind:/usr/share/fonts$ cd truetype/
freewind@freewind:/usr/share/fonts/truetype$ ls
arphic             ttf-dejavu               ttf-lao
freefont           ttf-devanagari-fonts     ttf-liberation
kochi              ttf-gujarati-fonts       ttf-malayalam-fonts
msttcorefonts      ttf-indic-fonts-core     ttf-oriya-fonts
openoffice         ttf-japanese-gothic.ttf  ttf-punjabi-fonts
sazanami           ttf-japanese-mincho.ttf  ttf-tamil-fonts
takao              ttf-kacst-one            ttf-telugu-fonts
thai               ttf-kannada-fonts        unfonts
ttf-bengali-fonts  ttf-khmeros-core         wqy

I simply uploaded them all, which fixed the problem, although I don't know which font was the key one.
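
Since it was unclear which font mattered, one way to narrow it down is to list which scripts the document actually uses and then install fonts covering only those. The following is a minimal sketch, not part of the original answer: `scripts_used` is a hypothetical helper, and it assumes that the first word of each character's Unicode name (e.g. "CJK", "THAI") is a good enough proxy for its script.

```python
import unicodedata
from collections import Counter


def scripts_used(text):
    """Count non-ASCII characters by the first word of their Unicode name.

    The first word (e.g. "CJK", "HIRAGANA", "THAI") is only a rough proxy
    for the script, but it hints at which font families the server needs.
    """
    counts = Counter()
    for ch in text:
        # ASCII and whitespace are covered by practically any font.
        if ord(ch) < 128 or ch.isspace():
            continue
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts


# Sample taken from the question's HTML; for a real file, pass
# open("book.html", encoding="utf-8").read() instead.
sample = "<h1>什么是Action?</h1>"
print(scripts_used(sample))
```

If the result is dominated by CJK, as it would be for the HTML in the question, a Chinese-capable font is probably the one that matters. On a system with fontconfig, `fc-list :lang=zh` should show whether any installed font claims Chinese coverage.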

Membrane answered 12/7, 2012 at 10:17 Comment(5)
As a last resort, you could use the Code 2000 font, which has plenty of glyphs -- it is one of the more complete Unicode fonts out there. en.wikipedia.org/wiki/Code2000 - Cimah
You should accept this as the answer, since it fixed the problem. I struggled with this for a while, and in my case it was because I was writing the source HTML files without a proper encoding; the fix was new StreamWriter(this.path, false, System.Text.Encoding.UTF8). There are lots of different reasons why this might fail :) - Delapaz
I solved this the same way, but on CentOS. I just copied Arial.ttf from /Library/Fonts on my local Mac to /usr/share/fonts/local on the remote server (I created the local directory myself), then ran fc-cache -v to update the font cache, and it worked. - Conall
I am having the same problem converting the website odialanguage.com. I have also tried importing fonts. Can you please help me identify what I am missing? - Hevesy
Installing the Code 2000 font did the trick for me. - Liuka
W
3

I was having this problem too. It turned out that the HTML file had a meta tag that set the wrong charset. For example, the HTML file had:

<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<style>

and the issue was resolved when I switched the charset to utf-8 instead, like so:

<head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<style>
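
A quick way to spot this before converting is to read the charset that the file declares. This is a minimal sketch, assuming a simple regular-expression scan is good enough (it is not a full HTML parser); `declared_charset` is a hypothetical helper name.

```python
import re

# Matches both <meta charset="utf-8"> and the http-equiv form
# <meta http-equiv=Content-Type content="text/html; charset=utf-8">.
_CHARSET_RE = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)


def declared_charset(html_bytes):
    """Return the lowercased charset named in a <meta> tag, or None if absent."""
    match = _CHARSET_RE.search(html_bytes)
    return match.group(1).decode("ascii").lower() if match else None


bad = b'<head>\n<meta http-equiv=Content-Type content="text/html; charset=windows-1252">'
good = b'<head>\n<meta http-equiv=Content-Type content="text/html; charset=utf-8">'
print(declared_charset(bad))
print(declared_charset(good))
```

If this reports anything other than utf-8 for a file that is actually saved as UTF-8, or reports None, the file is likely in the situation described above.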
Weems answered 28/8, 2018 at 4:57 Comment(1)
Yes, this fixed it for me too; I had simply not defined the charset. - Dumah
S
0

Try wkhtmltopdf-i386 book.html book.pdf

Shutout answered 12/7, 2012 at 9:23 Comment(0)
G
0

If you are on an MS Windows machine (the answer above applies to the X Windows font server), the following worked for me:

  1. You can use YaHei or SimSun with wkhtmltoimage.

  2. In your stylesheet, explicitly assign the new font-family to any content that uses Chinese characters:

    .smsnotification_chinese {
        font-size: 30px;    
        font-family: "Microsoft Yahei", SimSun;
    }
    

    This works on stock US Windows machines. A more thorough description of font fallbacks is available here: Chinese Standard Web Fonts: A Guide to CSS Font Family Declarations for Web Design in Simplified Chinese.

  3. Note: The wkhtmltoimage binary does not work on Azure worker machines due to GDI+ sandbox restrictions. You can get around this by writing your own web service wrapper or using this free wrapper: Convert HTML to PDF in .Net on Azure

Greenish answered 13/5, 2016 at 2:19 Comment(0)
