Convert HTML page containing Arabic characters to PDF using FlyingSaucer

I want to convert an HTML page that contains Arabic characters to a PDF file using FlyingSaucer, but in the generated PDF the letters are not joined into their combined forms and the text is printed backwards.

HTML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    </head>

    <body style="font-size:15px;font-family: Arial Unicode MS;">

        <center  style="font-size: 18px; font-family: Arial Unicode MS;">
            <b>
                <i style="font-family: Arial Unicode MS;">
                    &#x062C;&#x0645;&#x064A;&#x0639; &#x0627;&#x0644;&#x062D;&#x0642;&#x0648;&#x0642;<br />
                </i>
            </b>
        </center>
    </body>
</html>

Java Excerpt:

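import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
// Assumes the iText 2.x build of Flying Saucer; the iText 5 build uses com.itextpdf.text.pdf.BaseFont
import com.lowagie.text.pdf.BaseFont;
import org.xhtmlrenderer.pdf.ITextRenderer;

String inputFile = "c:\\html.html";
String url = new File(inputFile).toURI().toURL().toString();
String outputFile = "c:\\html.pdf";

// try-with-resources closes the stream even if rendering throws
try (OutputStream os = new FileOutputStream(outputFile)) {
    ITextRenderer renderer = new ITextRenderer();
    // Register the font named in the HTML and embed it with Unicode (Identity-H) encoding
    renderer.getFontResolver().addFont("c:\\ARIALUNI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

    renderer.setDocument(url);
    renderer.layout();
    renderer.createPDF(os);
}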

Actual PDF Result: [image: actual result]

Expected PDF Result: [image: expected result]

What can I do to obtain the right result?

Gastelum answered 2/11, 2014 at 16:54 Comment(4)
Are you actually trying to convert a canvas image to PDF? - Fattish
This looks like a Flying Saucer bug to me. Arabic Unicode characters are in a well-defined range and are known to be RTL (right to left). The browser is clearly rendering RTL, but Flying Saucer is not. Report the bug to Google. - Spermicide
Did you find a solution for the Arabic formatting? - Oh
Thanks, I handled it by drawing the Arabic text as an image inside a canvas, so the converted PDF contains an image rather than text :) See this example: jsfiddle.net/amaan/WxmQR/1 - Oh

I ran into a similar alignment issue while working with an Arabic font. Arabic is an RTL (right-to-left) language, and you need additional JARs to generate PDFs in an RTL language. At the moment your PDF is being generated in the default LTR mode, which is why you get the output you are seeing.
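To make the LTR/RTL difference concrete, here is a minimal sketch that uses only `java.text.Bidi` from the JDK. It is not part of Flying Saucer and not a fix for it. The class name `BidiSketch` and the helper `toVisualOrder` are made up for illustration. It reports whether a string is RTL and computes the order in which the characters should appear on screen, which is the step an LTR-only renderer skips. It covers ordering only: joining Arabic letters into their contextual forms is a separate shaping step that is not shown, and reversing characters like this can misplace combining marks.

```java
import java.text.Bidi;

public class BidiSketch {

    // Hypothetical helper: returns one line of text in visual (display) order.
    static String toVisualOrder(String logical) {
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];
        for (int i = 0; i < runCount; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // Odd embedding levels are right-to-left, so reverse the characters in that run.
            runs[i] = (levels[i] % 2 == 1)
                    ? new StringBuilder(run).reverse().toString()
                    : run;
        }
        // Reorder the runs themselves from logical order to visual order.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        return String.join("", runs);
    }

    public static void main(String[] args) {
        // The same text as in the question's HTML, written as Unicode escapes.
        String arabic = "\u062C\u0645\u064A\u0639 \u0627\u0644\u062D\u0642\u0648\u0642";
        Bidi bidi = new Bidi(arabic, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);

        System.out.println("Right-to-left: " + bidi.isRightToLeft());
        System.out.println("Mixed direction: " + bidi.isMixed());
        System.out.println("Visual order: " + toVisualOrder(arabic));
    }
}
```

For text that mixes Latin and Arabic, `isMixed()` returns true and the string is split into several runs, each of which is reversed or kept as-is according to its level.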

Eggplant answered 14/8, 2015 at 6:21 Comment(2)
How do I fix it? - Oh
What about the case where the PDF is mixed? - Gallego
