Trouble converting a Unicode template to PDF using xhtml2pdf

Asked 23/8, 2013 at 2:16 Answered 20/12, 2013 at 12:1

My HTML page contains Unicode text, which displays correctly in the browser. But when I convert the page to PDF using xhtml2pdf, black, solid square boxes are generated in place of the Unicode characters. Is there some setting for Unicode other than the UTF-8 setting? I don't think it's a Unicode problem.

# convert HTML to PDF
pisaStatus = pisa.CreatePDF(
        StringIO(sourceHtml.encode('utf-8')),                 
        dest=resultFile)

Complete py code:

# -*- coding: utf-8 -*-

from xhtml2pdf import pisa
from StringIO import StringIO

source = """<html>
            <style>
                @font-face {
                font-family: Preeti;
                src: url("preeti.ttf");
                }

                body {
                font-family: Preeti;
                }
            </style>
            <body>
                This is a test <br/>
                       सरल
            </body>
        </html>"""

# Utility function
def convertHtmlToPdf(source):
    # render into an in-memory buffer rather than a file

    pdf = StringIO()
    pisaStatus = pisa.CreatePDF(StringIO(source.encode('utf-8')), pdf)

    # pisaStatus.err is 0 on success and non-zero on errors
    print "Errors: ", pisaStatus.err
    return pdf

# Main program
if __name__=="__main__":
    print pisa.showLogging()
    pdf = convertHtmlToPdf(source)
    fd = open("test.pdf", "w+b")
    fd.write(pdf.getvalue())
    fd.close()

generated pdf file

Do I even need to include the @font-face rule?

Royce answered 23/8, 2013 at 2:16 Comment(2)

Is sourceHtml actually a unicode string? Because normally, an HTML file is in some 8-bit encoding, and calling encode('utf-8') on a str that's already UTF-8 (or, worse, something like Latin-1) isn't going to help. – Yalta 23/8, 2013 at 2:21

Yes, they are something like this in sourceHtml: u093e\u0917\u093f \u0924\u092f\u093e\u0930 : <br/>\n\t\t\t\n \n\t\t\t\n\t\t\t\t\u0909\u092a\u092d\u094b\u0915\u094d\u0924\u093e \u092a\u0942\u0930\u094d\u0923 \u092a\u094d\u092f\u093e\u0915\u0947\u091c\u0915\u094b \u0932\u093e\u0917\u093f \u0924\u092f\u093e\u0930 : <br/>\n\t\t\t\n – Royce 23/8, 2013 at 2:29

It's partially solved by providing the absolute path to the font, i.e.

    <style>
        @font-face {
        font-family: Preeti;
        src: url("c:/static/fonts/preeti.ttf");
        }

        body {
        font-family: Preeti;
        }
    </style>

Now another problem has arisen. I have mixed text, partly in Unicode and partly in a normal font (I think I should call it a normal font :D). Since the font has been overridden, the normal text now comes out as rectangular boxes, in this case empty boxes.

Royce answered 25/8, 2013 at 17:57 Comment(2)

Did you solve this problem? I need to include both Hindi and English in a PDF file. I tried using Mangal (a Hindi font), but only one of them is displayed at a time. – Bystreet 11/10, 2016 at 15:43

This worked for me on xhtml2pdf v0.2.3 with the Django template rendering system, @Bystreet – Jetta 24/2, 2019 at 10:1

A slightly late answer, but I think it is important to know why relative paths do not work in @font-face for xhtml2pdf:

The CreatePDF function (which is the same as the pisaDocument function, as can be seen in https://github.com/chrisglass/xhtml2pdf/blob/master/xhtml2pdf/pisa.py) has a named parameter called path. If you don't set this parameter and use a relative path, it will try to find your fonts under a folder named __dummy__, as can be seen in the file https://github.com/chrisglass/xhtml2pdf/blob/master/xhtml2pdf/context.py (search for dummy).

So, that's why your .ttf files only work when you use absolute paths.

To resolve this, you can either:

create a __dummy__ folder and put your .ttf files there, or
pass a value to the path named parameter of CreatePDF

For example, in my case, I am creating PDFs through Django, so I passed path='.' and put my .ttf in the same folder as my manage.py -- everything is working fine. Of course, a better solution would be to define SETTINGS.PROJECT_PATH and use that.
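If changing the path parameter is not convenient, the absolute-path approach that the asker confirmed can be automated. The following is only a minimal sketch: the helper name font_face_css and the "fonts" folder are hypothetical, and it does nothing more than build the @font-face rule with an absolute path before the HTML is handed to CreatePDF.

```python
import os


def font_face_css(family, font_file, base_dir):
    """Build an @font-face rule whose src is an absolute path.

    font_face_css is a hypothetical helper, not part of xhtml2pdf.
    """
    absolute = os.path.abspath(os.path.join(base_dir, font_file))
    # Use forward slashes, as in the c:/static/fonts/preeti.ttf example.
    absolute = absolute.replace(os.sep, "/")
    return '@font-face { font-family: %s; src: url("%s"); }' % (family, absolute)


# Assumed layout: the font lives in a "fonts" folder under the working directory.
css = font_face_css("Preeti", "preeti.ttf", os.path.join(os.getcwd(), "fonts"))
source_html = (
    "<html><style>" + css + " body { font-family: Preeti; }</style>"
    "<body>This is a test</body></html>"
)
```

The resulting source_html can then be encoded and passed to CreatePDF exactly as in the question.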

Moldboard answered 20/12, 2013 at 12:1 Comment(0)

From the documentation, it looks like you're supposed to give CreatePDF an encoding, otherwise "this is guessed by the HTML5 parser".

So, say the HTML file's headers specify whatever legacy charset was used for Devanagari. You decode that properly to Unicode somewhere before the code you've shown us, then re-encode it as UTF-8, but the headers are specifying a different charset. In that case, html5lib will guess the wrong charset, and interpret the characters incorrectly and give you mojibake.

Of course I can't be sure that's exactly the problem you're facing without a complete example, but it's likely something like that. And the most likely solution is the same for any of them: If you encode to UTF-8, tell the converter to use UTF-8 instead of guessing:

pisaStatus = pisa.CreatePDF(
    StringIO(sourceHtml.encode('utf-8')),                 
    dest=resultFile,
    encoding='utf-8')

Yalta answered 23/8, 2013 at 18:5 Comment(0)

I had a black box character in my pdf when converting html to pdf with xhtml2pdf and pisa. Turns out I had a BOM (byte-order mark) character in the document.

The BOM can be removed by doing 'save as' in most text editors. In UltraEdit, I did Save As... and selected type UTF-8 (NO BOM).

See: How do I remove the BOM character from my xml file
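The BOM can also be stripped in code before the bytes reach the converter. This is a small sketch using only the standard library; the function name strip_utf8_bom is made up for illustration, and it assumes the HTML is already available as UTF-8 encoded bytes.

```python
import codecs


def strip_utf8_bom(data):
    """Return the byte string without a leading UTF-8 byte-order mark."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Example input: HTML bytes saved by an editor that prepends a BOM.
html_bytes = codecs.BOM_UTF8 + "<html><body>test</body></html>".encode("utf-8")
clean = strip_utf8_bom(html_bytes)
```

When reading from a file instead, decoding with the "utf-8-sig" codec drops a leading BOM in the same way.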

Patinous answered 11/11, 2013 at 20:4 Comment(1)

From what the author of the question was asking, all the Unicode characters were failing. Not just the first character. – Falster 18/11, 2014 at 22:47
