How to bulk generate PDF from HTML templates ready for double-sided printing in PHP?

I've been struggling with this for a while and feel helpless. Prestashop uses TCPDF to generate invoices and delivery slips from HTML templates filled using Smarty. We are working on updating the invoice design and found TCPDF to be lacking in CSS support. After some research we settled on wkhtmltopdf as the right tool for converting the HTML/CSS templates to PDF.

The problem

The store has a feature for exporting multiple invoices into a single PDF. Using TCPDF, I was able to make the batch file ready for double-sided printing by inserting a blank page after each invoice that had an odd number of pages, before the file was generated. But now that we have switched to wkhtmltopdf, I cannot achieve the same result.

The crucial problem is that while wkhtmltopdf allows for the usage of multiple HTML templates there seems to be no reliable way to determine the number of pages they are each going to have before the file is generated. The header and footer templates can receive the page count that the invoice ends up being but they are separate from the main content and therefore I cannot insert a page break accordingly.

I've also tried to calculate the height of the content / PDF page height but there were various issues with that once I started exporting multiple templates (worked alright with a single template). This approach isn't great either because inserting a blank page into the content itself causes the footer to appear on the new page as well which is not what I want.

My best attempt

The only way I've figured out that could get me around these issues is very inefficient. Each time a template is added to the batch I could pre-generate it using a separate instance of a wrapper for wkhtmltopdf, get the temporary file name, determine how many pages it has using pdfinfo and add a blank HTML template to the main instance accordingly. Here's a draft of a function to get the number of pages of the last template added (from a class that extends the wrapper, based on some other pdfinfo questions I found on SO):

/**
 * Return the number of pages in the last invoice template added.
 * If $complete === true, return the length of the entire document.
 */
public function getNumPages($complete = false)
{
    if (!$complete) {
        // Generate a PDF of the last template added
        $tmpPdf = new WKPdf($this->_options);
        $tmpPdf->addPage($this->content, array(
            'footer-html' => $this->footer
        ));

        // The createPdf method is protected, so get the content as a
        // string here to force the wrapper to call wkhtmltopdf.
        $tmpPdf->toString();
        $document = $tmpPdf->getPdfFilename();
    } else {
        // Generate the entire batch
        $this->createPdf();
        $document = $this->getPdfFilename();
    }

    // Use pdfinfo to get the PDF page count; escape the file name for the shell
    exec('pdfinfo ' . escapeshellarg($document), $output);

    $pagecount = 0;
    foreach ($output as $op) {
        // Extract the number from the "Pages:" line
        if (preg_match('/Pages:\s*(\d+)/i', $op, $matches) === 1) {
            $pagecount = intval($matches[1]);
            break;
        }
    }

    return $pagecount;
}
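
Once the per-invoice page counts are known, deciding where the blank templates go is simple arithmetic. Here is a minimal sketch, assuming the counts have already been collected into an array in batch order (the function name blankPagesAfter is hypothetical and not part of the wrapper):

```php
<?php
/**
 * Given the page count of each invoice (in batch order), return the
 * zero-based indexes of the invoices that need a blank page after them
 * so that the next invoice starts on a fresh sheet.
 * Hypothetical helper for illustration only.
 */
function blankPagesAfter(array $pageCounts): array
{
    $needsBlank = [];
    foreach ($pageCounts as $index => $count) {
        // An odd page count leaves the back of the last sheet unused.
        if ($count % 2 === 1) {
            $needsBlank[] = $index;
        }
    }
    return $needsBlank;
}
```

For invoices of 1, 2 and 3 pages, this would ask for a blank page after the first and the third invoice.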

This is very inefficient - it takes about 80 seconds to generate a batch of 25 invoices because I have to call wkhtmltopdf 25 times to create the temporary PDF files so that I can call pdfinfo 25 times to get their individual lengths and insert blank pages where necessary and then generate the final document.

The advantage of TCPDF is that it can give you the number of pages on the fly and a similar functionality takes about 5 seconds to generate a batch file of 25 invoices.

Does anyone have any ideas on how to speed things up, or a better way to do this altogether? I've considered various tools for the generation, including dompdf, but wkhtmltopdf is simply the most powerful. The batch generation is really only used from the back office by the store admins, so maybe they could be patient. But still.

Diplomate answered 8/8, 2018 at 12:50 Comment(6)
I've since tried to replace pdfinfo with the Fpdi library. Not a big fan of having to use two tools (three actually, because Fpdi requires Fpdf...) to achieve something as simple as this. - Diplomate
Interesting. I'll have a look into this over the weekend. I believe I may have a solution that will help, but there are a few 'ifs'. I'm sure I'll be able to help in some way though. Check back here after the weekend. - Heronry
I would maybe use TCPDF and use the background image method, then put the custom text where you want. If you really are trying to avoid TCPDF (one look at their documentation and I understand why), then maybe you can cut the time by running a thread for each PDF #71355 - Enschede
@ChadK The project is on shared hosting, so I'm not sure if they allow something like that. It's pure coincidence that they have wkhtmltopdf installed and allow it to be called via exec. I don't know if a pthreads CLI script would be possible, to be honest. - Diplomate
@PeterTheLobster, could you please share an example PDF file (a link) of what should be generated? If I understand you correctly, it is one blank page after each invoice, in A4 format? - Maintop
@Maintop Hey, sorry. I don't think I can provide you with an example file at the moment as I haven't discussed this with the client. We are basically trying to generate a single PDF containing multiple invoices. Depending on the number of items ordered or the amount of customer data, each invoice can be anywhere from 1 to 3 pages long. When generating the bulk PDF containing multiple invoices, I therefore need to insert a blank page after each invoice with an odd number of pages (1, 3 etc.) so that the entire file can be printed double-sided (to avoid printing multiple invoices on the same sheet of paper). - Diplomate

Unfortunately, wkhtmltopdf is an external command-line tool written in C++, so we cannot dynamically add a page on the fly as we can with PHP libraries.

Quoting your comment: Depending on the number of items ordered or the amount of customer data, each invoice can be anywhere from 1 to 3 pages long.

Because of this, we cannot precalculate the number of pages and write it to a database.

I think you have only one option: append a blank page after every invoice and, once the whole PDF has been generated, edit it with a free PHP library such as FPDI. With FPDI it is even possible to edit existing PDF documents.

When editing the PDF, you can delete every blank page you do not need, i.e. each one that would land on an odd page number (3, 5, etc.) in the final document. FPDI lets you determine the page number. This is much faster than the solution you use now.
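
The keep-or-delete decision can be sketched as a small pure function. This is only an illustration under stated assumptions: every invoice is followed by exactly one blank page, invoices contain no blank pages of their own, and the blank pages have already been detected. The name pagesToKeep is hypothetical.

```php
<?php
/**
 * $isBlank is a list where $isBlank[$i] is true when source page $i + 1
 * is a blank separator page. Returns the 1-based source page numbers
 * that should be copied into the final PDF.
 * Hypothetical helper for illustration only.
 */
function pagesToKeep(array $isBlank): array
{
    $keep = [];
    foreach ($isBlank as $i => $blank) {
        // With an even number of pages kept so far, the blank page would
        // land on an odd page number (the front of a new sheet): drop it.
        if ($blank && count($keep) % 2 === 0) {
            continue;
        }
        $keep[] = $i + 1;
    }
    return $keep;
}
```

The returned page numbers could then be imported one by one into a new document. Working from the number of pages already kept, rather than the source page number, matters because page numbers shift as soon as one blank page is dropped.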

You can detect the blank (empty) pages by their content length with FPDI, as follows:

<?php
require('fpdf.php');
require_once('setasign/Fpdi/autoload.php');

class Pdf extends \setasign\Fpdi\Fpdi
{
    private $pdfReader;

    // FPDF calls Header() at the start of every page added with AddPage()
    function Header()
    {
        // Open the source PDF once and reuse the reader
        if (is_null($this->pdfReader)) {
            $readerId = $this->getPdfReaderId('blank-pages.pdf');
            $this->pdfReader = $this->getPdfReader($readerId);
        }

        // Read the raw content stream of the source page with the same number
        $page_fpdi = $this->pdfReader->getPage($this->PageNo());
        $content = $page_fpdi->getContentStream();

        $this->Cell(0, 15, 'page content length: ' . strlen($content));
    }

    protected function _putimages(){}
}

$pdf = new Pdf();
$pdf->SetFont('Arial', '', 12);
$pdf->AddPage(); //page content length: 70 // page with 'Hello World!' string
$pdf->AddPage(); //page content length: 30 // empty page
$pdf->AddPage(); //page content length: 30 // empty page
$pdf->Output();
?>

I generated my blank-pages.pdf using FPDF with the following code:

<?php
require('fpdf.php');

$pdf = new FPDF();
$pdf->AddPage();
$pdf->SetFont('Arial','B',16);
$pdf->Cell(40,10,'Hello World!');
$pdf->AddPage();
$pdf->AddPage();
$pdf->Output();
?>
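
Turning the measured lengths into a blank/non-blank decision could look like the sketch below. It assumes the content length of an empty page has been measured beforehand on the target system, since the value 30 from the example above comes from FPDF output and need not match pages produced by wkhtmltopdf. The function name flagBlankPages and the tolerance parameter are hypothetical.

```php
<?php
/**
 * Flag every page whose content stream is no longer than a measured
 * blank-page length, plus an optional tolerance in bytes.
 * Hypothetical helper for illustration only.
 */
function flagBlankPages(array $contentLengths, int $blankLength, int $tolerance = 0): array
{
    $flags = [];
    foreach ($contentLengths as $length) {
        // Anything at or below the threshold is treated as an empty page.
        $flags[] = $length <= $blankLength + $tolerance;
    }
    return $flags;
}
```

With the lengths from the example above (70, 30, 30) and a measured blank length of 30, only the first page is treated as having content.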
Maintop answered 15/8, 2018 at 20:41 Comment(12)
That sounds like a better idea, but FPDI doesn't allow you to read page contents as far as I remember. Or am I wrong about that? I wanted to try a similar approach where I would read page numbers from the footer, but I think I needed a third library to read the PDF contents. Your suggestion, while better, still has a flaw: if I use a footer in wkhtmltopdf then the inserted page would not be completely blank (i.e. it would be numbered). But I suppose I could get around that by using JS in the footer to decrement the total page count on render and hide it entirely when page==pageCount. - Diplomate
Correction: The footer isn't actually a problem. I briefly forgot that I can just insert a separate blank HTML page/document with no footer HTML. I was thinking about page breaks too much. I vaguely remember that FPDI allows you to get the size of a page. So maybe the blanks could be of a different format and then replaced by the right size or removed as needed. This does feel very hacky, but it might work so... - Diplomate
@PeterTheLobster, for detecting blank pages you could use the FPDI function getImportedPageSize(). But these blank pages must be a different size from the other pages. Or, if you really need the page content, you could use the SetaPDF-Extractor component, which is written in PHP and lets PHP developers extract textual content from existing PDF documents. But this component is not free: 200 EUR / Project License. - Maintop
Or you could use this JavaScript library for extracting textual content from a PDF document. But I think you will be able to detect blank pages with the FPDI function getImportedPageSize() and will not need any of this. - Maintop
Yeah, the different page size solution is what I was thinking. I'll try it later today. - Diplomate
@PeterTheLobster, maybe today is your lucky day, or rather my lucky day, but after a lot of trying I have found out how you can detect empty pages. Because I assumed that you cannot have pages of various sizes with wkhtmltopdf, I wrote you a solution for detecting empty pages. Please see my updated answer. - Maintop
Oh, that's even better than the different page size idea. I still haven't gotten around to trying it due to some work constraints, but I'll award you the bounty now because I might be busy until tomorrow. Thanks for all the help. - Diplomate
@PeterTheLobster, you are welcome! Thanks to you too! - Maintop
I just got home. Works great. I don't like the fact that it's so hacky, but what can you do. - Diplomate
@PeterTheLobster, I'm really happy for you! Congratulations! - Maintop
@PeterTheLobster, and does my solution work at your company? - Maintop
Yes, we've been able to implement it in a production environment with the content length solution. The content length of blank pages generated in the real environment differs from that of blank pages generated in a local Windows environment, but apart from that there haven't been any major problems so far. Thanks again. - Diplomate
