Django/Python: generate pdf with the proper language
S

4

18

I use Pisa/xhtml2pdf in my Django apps to generate pdf from an HTML source. That is:

  1. I generate an HTML file with all the print-specific formatting (e.g. page breaks, header, footer, etc.)
  2. I convert this HTML into pdf using Pisa
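
For illustration, the two steps above can be sketched roughly as follows. This is only a minimal sketch: the template and the function names (`build_html`, `html_to_pdf`) are made up for the example, `string.Template` stands in for the Django template engine, and the conversion step assumes xhtml2pdf is installed and uses its `pisa.CreatePDF` entry point.

```python
import html
import io
from string import Template

# Hypothetical template; a real Django app would render one of its own templates.
REPORT_TEMPLATE = Template("""\
<html>
  <body>
    <h1>$title</h1>
    <table>
$rows
    </table>
  </body>
</html>
""")


def build_html(title, rows):
    """Step 1: produce the HTML source from application data."""
    row_html = "\n".join(
        "      <tr>"
        + "".join(f"<td>{html.escape(str(cell))}</td>" for cell in row)
        + "</tr>"
        for row in rows
    )
    return REPORT_TEMPLATE.substitute(title=html.escape(title), rows=row_html)


def html_to_pdf(source_html):
    """Step 2: convert the HTML to PDF bytes with xhtml2pdf (Pisa)."""
    from xhtml2pdf import pisa  # third-party, imported lazily

    buffer = io.BytesIO()
    status = pisa.CreatePDF(source_html, dest=buffer)
    if status.err:
        raise RuntimeError("xhtml2pdf reported errors during conversion")
    return buffer.getvalue()
```

Usage would be something like `html_to_pdf(build_html("Orders", [[1, "Widget"]]))`, with the resulting bytes returned in an HTTP response.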

This process works, but it is slow (especially when dealing with long tables), and I have to write the HTML/CSS around Pisa's features and limitations.

The question is: is this the right way to generate a pdf from a web application (i.e. create HTML and then convert it to pdf), or is there a more direct way, that is, "writing" the pdf with a more suitable language?

Stitt answered 19/12, 2012 at 11:0 Comment(1)
Followup: For pixel-perfect reports we have decided to switch to a dedicated report designer (Pentaho). - Stitt
S
13

WeasyPrint author here. The point of using HTML/CSS to generate PDF (vs. using a lower-level PDF library directly) is to get automatic layout. It lets you specify high-level constraints like h1 { page-break-after: avoid } and let the layout engine figure it out, rather than specifying the absolute position of everything. The former is much more maintainable when you make changes to your documents.

Some tools like rst2pdf have their own stylesheet syntax, but that’s just a bad way of re-inventing CSS.

But yes, dumping complex stylesheets made for screen might not give great results. It’s better to build the stylesheets with print in mind, or even use completely different stylesheets with @media print in CSS or <link media="print"> in HTML.
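
As a rough sketch of what a print-oriented stylesheet might look like in practice: the CSS rules, the `.no-print` class and the helper names below are illustrative assumptions, not a recommended setup. The rendering step assumes WeasyPrint is installed and uses its `HTML(string=...).write_pdf()` call.

```python
# Illustrative print stylesheet (assumed rules, adapt to the real document).
PRINT_CSS = """
@page { size: A4; margin: 2cm; }
h1, h2 { page-break-after: avoid; }
tr { page-break-inside: avoid; }
@media print {
  nav, .no-print { display: none; }
}
"""


def wrap_with_print_styles(body_html, css=PRINT_CSS):
    """Embed the print stylesheet and the body in a complete HTML document."""
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<style>{css}</style></head><body>{body_html}</body></html>"
    )


def render_pdf(document_html):
    """Let the layout engine place everything; returns the PDF as bytes."""
    from weasyprint import HTML  # third-party, imported lazily

    return HTML(string=document_html).write_pdf()
```

The constraints stay declarative: changing the document content does not require touching any positions, only the stylesheet if the rules themselves change.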

Slier answered 19/12, 2012 at 16:26 Comment(0)
H
7

I think generating a pdf from HTML with libraries like Pisa or http://weasyprint.org/ is the simplest approach, because it takes care of inserting images, CSS, barcodes (in Pisa), etc.

If you want to write the pdf yourself, take a look at Reportlab, but it will take much longer to implement. In both cases I suggest always generating the pdf in the background with celery or python-rq, so the web request does not have to wait for it.
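
A minimal sketch of the background approach with python-rq, assuming a Redis server on the default local port and a separately started `rq worker`. The function names and the placeholder converter are hypothetical; in a real project the job function has to live in a module the worker can import.

```python
from pathlib import Path


def convert_html_to_pdf(source_html):
    """Placeholder (assumption): plug in Pisa, WeasyPrint or another converter."""
    raise NotImplementedError("no converter configured")


def generate_pdf_file(source_html, out_path, convert=convert_html_to_pdf):
    """The slow part, meant to run inside a worker process."""
    pdf_bytes = convert(source_html)
    Path(out_path).write_bytes(pdf_bytes)
    return str(out_path)


def enqueue_pdf(source_html, out_path):
    """Called from the web request: queue the job and return immediately."""
    from redis import Redis  # third-party, imported lazily
    from rq import Queue

    queue = Queue(connection=Redis())
    return queue.enqueue(generate_pdf_file, source_html, out_path)
```

The view can then store the returned job's id and let the user download the file once the job has finished.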

Hancock answered 19/12, 2012 at 13:3 Comment(0)
B
4

Pisa is known to have various issues, especially with long tables. In general one should avoid using Pisa. Other options are:

  • using Reportlab directly
  • z3c.rml (Reportlab template language clone)
  • commercial alternatives:
    • PrinceXML
    • PDFreactor
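
To give an idea of the first option, here is a minimal sketch of a long table written with Reportlab's Platypus layer. The function names and the styling are assumptions made for the example, and it assumes the reportlab package is installed.

```python
def build_table_data(header, records):
    """Turn application records into the list-of-rows shape a Table expects."""
    return [list(header)] + [[str(value) for value in record] for record in records]


def write_report(path, header, records):
    """Write a PDF whose table flows over as many pages as it needs."""
    from reportlab.lib import colors  # third-party, imported lazily
    from reportlab.lib.pagesizes import A4
    from reportlab.platypus import SimpleDocTemplate, Table, TableStyle

    # repeatRows=1 repeats the header row on every page the table spans.
    table = Table(build_table_data(header, records), repeatRows=1)
    table.setStyle(TableStyle([
        ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),
        ("GRID", (0, 0), (-1, -1), 0.25, colors.grey),
    ]))
    SimpleDocTemplate(path, pagesize=A4).build([table])
```

For example, `write_report("orders.pdf", ("Id", "Item"), [(1, "Widget"), (2, "Gadget")])` goes straight from Python data to a PDF, with no HTML in between.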

The general rule when it comes to PDF production: you get what you pay for.

Converters like Pisa or Apache FOP are half-baked solutions that work for simple cases but suck in general.

Brummell answered 19/12, 2012 at 11:20 Comment(5)
Thanks. But do you think that converting HTML to pdf is a good approach? - Stitt
Strange followup question... a good approach for doing what? What else do you have as input data? - Brummell
I mean: I have to extract data from my app and put it into a pdf; the HTML is just an intermediate format between the data and the pdf. I wonder if I can write the pdf directly or use a different intermediate format. - Stitt
Maybe my English is not good enough. Which of your solutions do you suggest: Reportlab (i.e. from Python to pdf) or Pisa-like "HTML to pdf" converters (i.e. from Python to HTML to pdf)? - Stitt
The latest xhtml2pdf from GitHub behaves properly with long tables and now also has a pdftotalpages count, though it is not documented; check the Google group. I am using it in production for long invoices, after a LaTeX nightmare. LaTeX is faster anyway, but it also has limitations, such as memory limits that will make you move to LuaLaTeX. There are also about 8 packages for creating long tables. So it's back to good old HTML. (That's a joke, because LaTeX is older, but not in my experience.) Also: before LaTeX I implemented pdf generation using headless LibreOffice, which randomly dies. - Damnation
W
3

You can also use the Qt WebKit rendering engine to create PDFs from HTML with http://code.google.com/p/wkhtmltopdf/ and django-wkhtmltopdf.

The advantage is that you can write the HTML and CSS as you would normally for WebKit. This works well if you are outputting an existing web page but may be less appropriate if generating PDFs from scratch.
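
Outside of django-wkhtmltopdf, the command-line tool can also be called directly. A minimal sketch, assuming the `wkhtmltopdf` binary is on the PATH; the helper names are made up for the example, and the source can be a local HTML file or a URL.

```python
import subprocess


def wkhtmltopdf_command(source, output, page_size="A4", binary="wkhtmltopdf"):
    """Build the argument list: options first, then input, then output."""
    return [binary, "--quiet", "--page-size", page_size, source, output]


def render_with_wkhtmltopdf(source, output, **options):
    """Run the converter; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(wkhtmltopdf_command(source, output, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the source is a URL with query parameters.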

Walter answered 20/12, 2012 at 7:32 Comment(0)
