Pisa pdf converter is very slow with large tables

I'm using Pisa to convert HTML to PDF (in a Django project). It is very slow when handling tables that span multiple pages: a 200-row table takes up to 150 seconds to convert, while the same document takes 15 seconds if I split the table into smaller tables.

Are there tips or best practices for building HTML tables to be handled by Pisa?

Each answered 29/9, 2011 at 9:4 Comment(8)
It might not help, but have you looked at wkhtmltopdf? - Lajuanalake
Maybe I will have a look at it if no other solution is possible: I chose Pisa because of its integration with Django... - Each
FWIW, my table-based documents with only about a page of content load almost instantly, but my reports with close to 1300 rows spanning 50 pages take almost 30 seconds. Which fork are you using? I may try ChrisGlass/Xhtml2pdf to see if it has improved over the unmaintained original version. - Artwork
Thanks, I think I'm using the latest version, but I'll check your links. My 200-row table takes about 150 seconds to convert. If I split it into small tables, the same document takes 15 seconds. - Each
I know this thread is a bit old by now, but if you are still looking for some sort of improvement, I started using django-webodt; the provided link goes straight to their table syntax. Using OpenOffice templates has also proved to be easy and more robust in terms of formatting than the limited CSS support provided by pisa. The same ~50-page documents I mentioned before now take <10 seconds to load instead of >1 minute. - Artwork
Oh, and if you try it, make sure you change the stylized, directional apostrophes and quotes to ' and ". The option is somewhere in the OpenOffice settings. - Artwork
Did you test reportlab directly? - Forthwith
@Efazati: no, I only tried Pisa. - Each

I had the same problem. The document was just a front page and a huge table. PDF rendering time was increasing exponentially with the size of my content table.

I timed each step of my PDF rendering function separately (since the slow part could have been the HTML rendering, passing it to StringIO, or creating the HTTP response) and found that the pisa.pisaDocument call alone took 60 seconds to return. I then made a checklist of things that might be the problem and worked through them one by one. The checklist included images, CSS, markup complexity, and frames.
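For anyone wanting to repeat that kind of per-stage timing, here is a minimal sketch. The `timed` helper and the stage names are my own illustration, not part of pisa or Django, and the three stages are stand-ins: in a real view they would be the template render, the pisa.pisaDocument call, and building the HTTP response.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def slowest_stage(results):
    """Return the label of the stage that took the longest."""
    return max(results, key=results.get)


# Stand-in stages (hypothetical): replace each body with the real call.
timings = {}
with timed("render_html", timings):
    html = "<table>" + "<tr><td>x</td></tr>" * 200 + "</table>"
with timed("convert_pdf", timings):
    time.sleep(0.05)  # placeholder for the slow conversion call
with timed("build_response", timings):
    payload = html.encode("utf-8")

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
print("slowest:", slowest_stage(timings))
```

Timing the stages separately is what lets you rule out the template and the response before blaming the converter.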

Images barely affected the rendering time (I only had one per page, so YMMV). Neither did Frames.

Markup complexity was the main problem in my template. Apparently pisa renders tables with several columns very, very slowly.

The table was taking too long to render, but I noticed that if I split it into smaller tables, the rendering time no longer increased exponentially, and the total time to render everything was cut in half. I used the following code in my Django template:

    {% if forloop.counter|divisibleby:20 %}</table><table>{% endif %}

Edit: this fix does not work well with repeating table headers, so if you use repeat="1" you have to know exactly how many rows fit on each page.
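The same split can also be done in the view instead of the template, by chunking the rows first and rendering one table per chunk. This is only a sketch under my own assumptions: `chunk_rows` and `render_tables` are hypothetical helpers, and the one-cell-per-row markup is just for illustration. One possible advantage is that no empty trailing table is produced; depending on where the template snippet sits in the loop, it can leave one when the row count is an exact multiple of 20.

```python
import html


def chunk_rows(rows, size=20):
    """Split rows into consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    rows = list(rows)
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def render_tables(rows, size=20):
    """Render one small <table> per chunk instead of one huge table."""
    parts = []
    for chunk in chunk_rows(rows, size):
        body = "".join(
            f"<tr><td>{html.escape(str(cell))}</td></tr>" for cell in chunk
        )
        parts.append(f"<table>{body}</table>")
    return "".join(parts)


# 45 rows become three tables holding 20, 20 and 5 rows.
print([len(chunk) for chunk in chunk_rows(range(45))])
```

In a Django view you would pass the result of `chunk_rows` to the template and loop over the chunks there, with an inner loop over the rows of each chunk.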

Also, I had this monster of a selector in my CSS:

    html, body, div, span, applet, object, iframe,
    h1, h2, h3, h4, h5, h6, p, blockquote, pre,
    a, abbr, acronym, address, big, cite, code,
    del, dfn, em, img, ins, kbd, q, s, samp,
    small, strike, strong, sub, sup, tt, var,
    b, u, i, center,
    dl, dt, dd, ol, ul, li,
    fieldset, form, label, legend,
    table, caption, tbody, tfoot, thead, tr, th, td,
    article, aside, canvas, details, embed,
    figure, figcaption, footer, header, hgroup,
    menu, nav, output, ruby, section, summary,
    time, mark, audio, video{
        ...
    }

By changing it to * {...} the rendering sped up a bit. This was counter-intuitive, since browsers do not render a page as fast with the * selector as they do with the above monster.

Also, for some reason, merging two in-page <style> tags into one tag decreased rendering time, too.
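If the <style> tags come from different included templates, merging them by hand may not be convenient. A rough sketch of doing it on the rendered HTML before handing it to the converter, assuming a regex is good enough for your markup (`merge_style_tags` is a hypothetical helper, not a pisa function, and it drops any attributes such as media from the original tags):

```python
import re

# Matches a whole <style ...>...</style> element, case-insensitively.
STYLE_RE = re.compile(r"<style\b[^>]*>(.*?)</style\s*>", re.IGNORECASE | re.DOTALL)


def merge_style_tags(html):
    """Collapse all <style> blocks into one, placed where the first one was."""
    blocks = [m.group(1).strip() for m in STYLE_RE.finditer(html)]
    if len(blocks) < 2:
        return html  # nothing to merge
    merged = '<style type="text/css">\n' + "\n".join(blocks) + "\n</style>"
    first = True

    def _swap(match):
        # Replace the first block with the merged CSS and drop the rest.
        nonlocal first
        if first:
            first = False
            return merged
        return ""

    return STYLE_RE.sub(_swap, html)


sample = "<head><style>a{color:red}</style><style>b{color:blue}</style></head>"
print(merge_style_tags(sample))
```

The rules keep their original order, so the cascade should behave the same as before the merge.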

Gambier answered 29/9, 2011 at 9:4 Comment(4)
I also ended up splitting my tables, but there are cases in which I cannot tell in advance how many rows fit on a single page. - Each
For my case, it doesn't matter. I just have to pray that nobody tells me to repeat table headers, because this fix doesn't work with repeat="1". - Gironde
Breaking up the table gave me a ~3x speed increase. I think the slowness might be due to an issue within Reportlab: groups.google.com/forum/#!topic/xhtml2pdf/vUoq1IRauvg - Surveillance
Yes, I'd wager it's a problem with Reportlab. - Gironde
