Bulk template-based PDF generation in PHP using pdftk

Asked 29/8, 2012 at 9:33 Answered 7/9, 2012 at 5:27

I am doing a bulk generation of pdf files based on templates and I ran into big performance issues pretty fast. My current scenario is as follows:

1. Get the data to be filled in from the DB.
2. Create an FDF based on a single data row and the PDF form.
3. Write the .fdf file to disk.
4. Merge the PDF with the FDF using pdftk (fill_form with the flatten option).
5. Continue iterating over the rows until all the .pdf files are generated.
6. Merge all the generated files together at the end and give the single PDF to the client.
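The FDF-building step above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the function names `fdfEscape` and `buildFdf` are hypothetical, the FDF body is the smallest structure that pdftk's fill_form generally accepts, and field values are assumed to be plain ASCII (non-ASCII text would need additional encoding).

```php
<?php
// Hypothetical helper: escape the characters that are special inside
// a PDF literal string (backslash and parentheses).
function fdfEscape(string $value): string
{
    return strtr($value, ['\\' => '\\\\', '(' => '\\(', ')' => '\\)']);
}

// Hypothetical helper: build a minimal FDF document from one data row,
// given as an associative array of form field name => value.
// Assumes ASCII-only names and values.
function buildFdf(array $fields): string
{
    $entries = '';
    foreach ($fields as $name => $value) {
        $entries .= sprintf(
            "<< /T (%s) /V (%s) >>\n",
            fdfEscape((string) $name),
            fdfEscape((string) $value)
        );
    }

    return "%FDF-1.2\n"
        . "1 0 obj\n"
        . "<< /FDF << /Fields [\n" . $entries . "] >> >>\n"
        . "endobj\n"
        . "trailer\n"
        . "<< /Root 1 0 R >>\n"
        . "%%EOF\n";
}
```

Because the function returns a string, the caller can decide whether to write it to disk or hand it to pdftk some other way.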

I use passthru to send the raw output to the client (which saves the time of writing a file), but this is only a small performance improvement. The total operation time is about 50 seconds for 200 records, and I would like to get it down to 10 seconds or less.

The ideal scenario would be to handle all these PDFs in memory instead of writing every single one to a separate file, but then I don't see how to produce the output, as I can't pass that kind of data to an external tool like pdftk. Another idea was to generate one big .fdf file with all the rows, but it looks like that is not allowed.

Am I missing something very trivial here?

I'm thankful for any advice.

PS. I know I could use a good library like PDFlib, but I am only considering open-licensed libraries for now.

EDIT:

I am trying to figure out the syntax for building an .fdf file with multiple pages using the same PDF as a template. I have spent a few hours on it and couldn't find any good documentation.

Townes answered 29/8, 2012 at 9:33 Comment(3)

Can you use a profiling tool like Xdebug with Webgrind to see what is actually taking the time (and then resolve that)? I ran into the very same situation a few days back; in my case it was an open source queuing system that was taking the time. I was using dompdf, which is also an open source solution. – Swope 31/8, 2012 at 12:21

I did the profiling and the main thing that takes a lot of time is writing separate pdfs over and over. – Townes 31/8, 2012 at 13:8

Why not just run 6 or 7 pdftk conversions in parallel? That should take your total time down to your 10-second threshold. – Braddy 7/9, 2012 at 14:38
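One way the parallel suggestion could look in PHP is sketched below. It is only an illustration: `runParallel` is a hypothetical helper name, it uses the array form of `proc_open` (which needs PHP 7.4 or later), and it simply polls the child processes until they finish. Each command would be one pdftk call, for example `['pdftk', 'form.pdf', 'fill_form', 'row1.fdf', 'output', 'row1.pdf', 'flatten']`, where the file names are placeholders.

```php
<?php
// Hypothetical helper: run the given commands with at most $maxConcurrent
// child processes alive at once. Each command is an array of program + args.
// Returns the exit code of every command, indexed by the command's key.
function runParallel(array $commands, int $maxConcurrent): array
{
    $pending = $commands; // commands that have not been started yet
    $running = [];        // key => process resource
    $exitCodes = [];

    while ($pending || $running) {
        // Start new processes while a slot is free.
        while ($pending && count($running) < $maxConcurrent) {
            $key = array_key_first($pending);
            $command = $pending[$key];
            unset($pending[$key]);

            // No pipes requested: the child inherits stdin/stdout/stderr.
            $process = proc_open($command, [], $pipes);
            if (!is_resource($process)) {
                throw new RuntimeException("Could not start command '$key'");
            }
            $running[$key] = $process;
        }

        // Collect the processes that have finished.
        foreach ($running as $key => $process) {
            $status = proc_get_status($process);
            if (!$status['running']) {
                $exitCodes[$key] = $status['exitcode'];
                proc_close($process);
                unset($running[$key]);
            }
        }

        usleep(10000); // poll every 10 ms to avoid a busy loop
    }

    return $exitCodes;
}
```

Whether 6 or 7 is the right level of concurrency depends on the number of CPU cores and on disk speed, so the limit is left as a parameter.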

After being faced with the same problem for a long time (I wanted to generate my PDFs based on LaTeX), I finally decided to switch to another crude but effective technique:

I generate my PDFs in two steps: first I generate HTML with a template engine like Twig or Smarty; second, I use mPDF to generate the PDFs from it. I tried many other HTML-to-PDF frameworks and ended up using mPDF. It is very mature and has been developed for a long time (frequent updates, rich functionality). The benefit of this technique is that you can use CSS to design your documents (mPDF supports CSS), which brings the usual CSS benefits (http://www.csszengarden.com), and you can generate dynamic tables very easily.

mPDF parses the HTML tables, looks for the thead and tfoot elements, and puts them on each page if your tables are longer than one page. You can also define page header and page footer elements with dynamic entities such as the page number.

I know this detour looks like a workaround, but to be honest, no LaTeX or PDF engine is as strong and simple as HTML!

Galloway answered 7/9, 2012 at 5:27 Comment(0)

Try a different, less complex library like FPDF (http://www.fpdf.org/).

I find it quite good and lightweight.

Always find libraries that are small and only do what you need them to do.

The bigger the library the more resources it consumes.

Tenia answered 1/9, 2012 at 18:26 Comment(2)

Yeah, I've seen and used those libraries (fpdf, tcpdf, dompdf), but they don't do what I need. I have to generate a big PDF based on one PDF template and fill it out with data. None of these libraries do this. They are good for creating your own PDF from scratch, but not for templating. – Townes 1/9, 2012 at 20:56

Then you should consider generating them one by one in separate PHP calls to ensure the memory is cleared after each generation. Make sure that you also clear any large variables while generating. Maybe even consider a non-PHP solution. – Tenia 1/9, 2012 at 22:44

This won't help your multiple-page problem, but I notice that pdftk accepts the - character to mean 'read from standard input'.

You may be able to send the .fdf to the pdftk process via its stdin, in order to avoid having to write it to disk.
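A rough sketch of that idea in PHP, assuming PHP 7.4 or later for the array form of `proc_open`: the helper names `pipeThrough` and `fillForm` are hypothetical, and the pdftk call assumes that both `fill_form -` (FDF from stdin) and `output -` (PDF to stdout) behave as documented on your pdftk version. The FDF is written in one go before the output is read, which is fine for small inputs but could block if the input were larger than the pipe buffer.

```php
<?php
// Hypothetical helper: run a command, feed $input to its stdin, and
// return everything it writes to stdout. Throws on a non-zero exit code.
function pipeThrough(array $command, string $input): string
{
    $descriptors = [
        0 => ['pipe', 'r'], // child's stdin, we write to it
        1 => ['pipe', 'w'], // child's stdout, we read from it
    ];

    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start process');
    }

    // Send the input and close stdin so the child sees end-of-file.
    fwrite($pipes[0], $input);
    fclose($pipes[0]);

    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    $exitCode = proc_close($process);
    if ($exitCode !== 0) {
        throw new RuntimeException("Process exited with code $exitCode");
    }

    return $output;
}

// Hypothetical helper: fill and flatten one form without writing the
// FDF or the resulting PDF to disk. Requires pdftk on the PATH.
function fillForm(string $templatePath, string $fdf): string
{
    return pipeThrough(
        ['pdftk', $templatePath, 'fill_form', '-', 'output', '-', 'flatten'],
        $fdf
    );
}
```

This removes the FDF write and the per-row PDF write, but the final merge step would still need the individual PDFs in some form that pdftk can read.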

Otten answered 1/9, 2012 at 21:31 Comment(0)
