python-pdfkit (wkhtmltopdf) TOC overflow

I am currently creating a perfectly good PDF; there is nothing technically wrong with it. However, the TOC is ugly.

The TOC is generated via XSL, which is passed through Jinja2 to fill in simple details in the top section of the page. I have modified the XSL to match the client's branding and design precisely. However, the list keeps growing in height and overflows onto a new page.

Here is the current result (sorry for blurring the text). You can see that the TOC picks up at the right spot on the new page, but there seems to be no way to apply a top margin to the new page.

The code. Here is the XSL:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:outline="http://wkhtmltopdf.org/outline"
            xmlns="http://www.w3.org/1999/xhtml">
  <xsl:output doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
          doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
          indent="yes" />
  <xsl:template match="outline:outline">
    <html>
      <head>
        <title>Table of Contents</title>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <style>

      body{
        background-color: #fff;
        margin-left: 0px;
        margin-top: 0px;
        color:#1e1e1e;
        font-family: arial, verdana,sans-serif;
        font-size: 90px;
      }
      .contentSection{
        position:relative;
        height:3200px;
        width:6100px;
      }
      .profile{
        position:absolute;
        display:inline-block;
        top:200px !important;
      }


      h1 {
        text-align: left;
        font-size: 70px;
        font-family: arial;
        color: #ef882d;
      }
      li {
        border-bottom: 1px dashed rgb(45,117,183);
      }
      span {float: right;}
      li {
        list-style: none;
        margin-top:30px;
      }
      ul {
        font-size: 70px;
        font-family: arial;
        color:#2d75b7;
      }

      ul ul {font-size: 80%; padding-top:0px;}
      ul {padding-left: 0em; padding-top:0px;}
      ul ul {padding-left: 1em; padding-top:0px;}
      a {text-decoration:none; color: color:#2d75b7;}


      #topper{
        width:100%;
        border-bottom:8px solid #ef882d;
      }
      #title{
        position:absolute;
        top:60px;
        font-size:60px;
        left:150px;
        color:#666666;
      }

      h1, h2{
        font-size:60px;
        -webkit-margin-before: 0px;
        -webkit-margin-after: 0px;
        -webkit-margin-start: 0px;
        -webkit-margin-end: 0px;
      }


      #profile{
        position:static;
        -webkit-border-top-left-radius: 40px;
        -webkit-border-bottom-left-radius: 40px;
        -moz-border-radius-topleft: 40px;
        -moz-border-radius-bottomleft: 40px;
        border-top-left-radius: 40px;
        border-bottom-left-radius: 40px;
        right:-540px;
        background-color: #2d75b7;
        padding:4px;
        padding-left:60px;
        padding-right:250px;
        color:#fff;
        display:inline-block;
        margin-top:200px;
        float:right;
      }

      #room{
        padding-top: 200px;
        padding-left: 150px;
        display:inline-block;
      }
      #section{
        padding-left: 150px;
        color: #ef882d;
        text-transform: uppercase;
        font-size:60px;
        font-weight: bold;
        display:inline-block;
        margin-top: 30px;
        margin-bottom: 5px;
      }
      #area{
        padding-left: 150px;
        font-size:60px;
        color:#2d75b7;
        margin-top: 15px;
      }
      #dims{
        padding-left: 150px;
        font-size:60px;
        color:#2d75b7;
        margin-top: 15px;
      }
      #toc{
        width:50%;
        margin-top:150px;
        margin-left:300px;
      }
    </style>
    <script>
      var value = {{profile|e}};
    </script>
  </head>
  <body>
    <div class="contentSection">
      <div id="title">A title here</div>
      <div id="topper">
        <div id="profile" class="profile">{{profile|e}}</div>
        <div id="room"> {{profile|e}} </div>
        <div id="area"> Revision Date </div>
        <div id="dims"> {{area|e}} </div>
        <div id="section">Table of Contents</div>
      </div>
      <div id="toc">
        <ul><xsl:apply-templates select="outline:item/outline:item"/></ul>
      </div>
    </div>
  </body>
</html>
 </xsl:template>
  <xsl:template match="outline:item">
    <! begin LI>
    <li>
      <xsl:if test="@title!=''">
        <div>
          <a>
            <xsl:if test="@link">
              <xsl:attribute name="href"><xsl:value-of select="@link"/> . 
 </xsl:attribute>
            </xsl:if>
            <xsl:if test="@backLink">
              <xsl:attribute name="name"><xsl:value-of select="@backLink"/> .   </xsl:attribute>
            </xsl:if>
            <xsl:value-of select="@title" />
          </a>
          <span>
            <xsl:value-of select="@page" />
          </span>
        </div>
      </xsl:if>
      <ul>
        <xsl:comment>added to prevent self-closing tags in QtXmlPatterns</xsl:comment>
        <xsl:apply-templates select="outline:item"/>
      </ul>
    </li>
  </xsl:template>
</xsl:stylesheet>
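
For context, the stylesheet above is also a Jinja2 template ({{profile|e}}, {{area|e}}), so it has to be rendered before wkhtmltopdf can use it. A minimal sketch of that step, assuming the template is stored in a file such as toc.xsl.j2 (a hypothetical name) and that the rendered stylesheet is handed over through pdfkit's toc argument with the xsl-style-sheet key, as the python-pdfkit README shows:

```python
import tempfile
from pathlib import Path

from jinja2 import Template


def render_toc_xsl(template_text, **context):
    """Fill the Jinja2 placeholders (e.g. {{profile|e}}) in the TOC stylesheet.

    The |e filter escapes each value, so characters such as & stay valid XML.
    """
    return Template(template_text).render(**context)


def build_pdf(html_path, pdf_path, template_path, **context):
    """Render the TOC stylesheet to a temporary file and pass it to wkhtmltopdf."""
    # Imported here so render_toc_xsl() can be used without wkhtmltopdf installed.
    import pdfkit

    template_text = Path(template_path).read_text(encoding="utf-8")
    xsl_text = render_toc_xsl(template_text, **context)
    with tempfile.TemporaryDirectory() as tmp:
        xsl_path = Path(tmp) / "toc.xsl"
        xsl_path.write_text(xsl_text, encoding="utf-8")
        # pdfkit forwards this dict as options of wkhtmltopdf's "toc" object.
        pdfkit.from_file(html_path, pdf_path,
                         toc={"xsl-style-sheet": str(xsl_path)})
```

A call would look like build_pdf("report.html", "report.pdf", "toc.xsl.j2", profile="Demo", area="Demo"), where all file names and values are placeholders. Since from_file() runs wkhtmltopdf synchronously, the temporary stylesheet still exists while wkhtmltopdf reads it.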

I have dealt with content overflows in other areas of the PDF using traditional HTML, JavaScript, and a document-ready flag. The TOC, however, requires an XSL file instead.

I tried to do this with the CSS nth-child selector, but nth-child is ignored.

The question:

*Is there a way within wkhtmltopdf or python-pdfkit to handle page breaks in the TOC specifically, and to apply a better top margin on the new page? Alternatively, is there a way to supply the TOC as a traditional HTML page so that I can do this with JavaScript instead?*

Sully answered 15/5, 2018 at 17:56 Comment(5)
Hi @Sully, the python-pdfkit wrapper provides options to set margins. Please refer to the GitHub repo github.com/JazzCore/python-pdfkit for more detail. I haven't worked with this package, though. - Coliseum
@Coliseum thank you for the kind reply. Unfortunately, those are global margins and will not work with the page designs for the document. - Sully
You probably want something like this: #42006319 - Mcabee
@Mcabee it certainly is what I want! But here is the problem: the table of contents does not recognize a new page when one appears. The rest of the document is quite straightforward, using CSS rules to cope with page breaks in a sophisticated way. The TOC does not. - Sully
Please, can you help me make the TOC? I believe I need to pass toc = {'xsl-style-sheet': 'toc.xsl'}, but I don't know what is supposed to go in that file... - Conover

Code review

I did a quick code review of your XSL (and CSS) file. Even if it doesn't solve your problem, it helps to reproduce and understand it. Here are my comments:

  • Your XSL has a typo: <! begin LI> is not a valid XML tag. Is it meant to be a comment? If so, write it as <!-- begin LI -->.

  • I prefer using the XPath concat() function to append characters, because if you re-indent your code, you may introduce extra whitespace.

    So, I replaced:

    <xsl:attribute name="href"><xsl:value-of select="@link"/> . </xsl:attribute>
    

    By:

    <xsl:attribute name="href">
      <xsl:value-of select="concat(@link, ' . ')"/>
    </xsl:attribute>
    
  • I added an xsl:if to avoid generating an empty <ul> when it is not necessary:

    <xsl:if test="count(outline:item)">
      <ul>
        <xsl:comment>added to prevent self-closing tags in QtXmlPatterns</xsl:comment>
        <xsl:apply-templates select="outline:item"/>
      </ul>
    </xsl:if>
    
  • I also fixed duplicate and malformed CSS entries. I replaced:

    li {
      border-bottom: 1px dashed rgb(45, 117, 183);
    }
    
    span {
      float: right;
    }
    
    li {
      list-style: none;
      margin-top: 30px;
    }
    
    ul ul {font-size: 80%; padding-top:0px;}
    ul {padding-left: 0em; padding-top:0px;}
    ul ul {padding-left: 1em; padding-top:0px;}
    a {text-decoration:none; color: color:#2d75b7;}
    

    by:

    span {
      float: right;
    }
    
    li {
      list-style: none;
      margin-top: 30px;
      border-bottom: 1px dashed rgb(45, 117, 183);
    }
    
    ul {
      font-size: 70px;
      font-family: arial;
      color: #2d75b7;
      padding-left: 0em;
      padding-top: 0px;
    }
    
    ul ul {
      font-size: 80%;
      padding-left: 1em;
      padding-top: 0px;
    }
    
    a {
      text-decoration: none;
      color: #2d75b7;
    }
    
    
  • If you target XHTML, the <style> tag has a mandatory type attribute. The same applies to the <script> tag:

    <style type="text/css">...</style>
    <script type="text/javascript">...</script>
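
The two XSL changes above (concat() for the href and the xsl:if around the nested <ul>) can be checked outside wkhtmltopdf with lxml, which the reproduction script below also uses. This sketch applies a cut-down version of the item template (not your full stylesheet) to a small hypothetical outline, once with the literal " ." text followed by a re-indented line break and once with concat():

```python
from lxml import etree

# Hypothetical outline: entries sit inside one top-level <item>, as in the sample TOC.
OUTLINE = b"""<outline xmlns="http://wkhtmltopdf.org/outline">
  <item>
    <item title="Intro" link="#intro" page="2">
      <item title="Background" link="#background" page="3"/>
    </item>
    <item title="Results" link="#results" page="8"/>
  </item>
</outline>"""

# Cut-down TOC stylesheet; %s is replaced by the body of xsl:attribute.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:outline="http://wkhtmltopdf.org/outline">
  <xsl:template match="outline:outline">
    <ul><xsl:apply-templates select="outline:item/outline:item"/></ul>
  </xsl:template>
  <xsl:template match="outline:item">
    <li>
      <a>
        <xsl:attribute name="href">%s</xsl:attribute>
        <xsl:value-of select="@title"/>
      </a>
      <xsl:if test="count(outline:item)">
        <ul><xsl:apply-templates select="outline:item"/></ul>
      </xsl:if>
    </li>
  </xsl:template>
</xsl:stylesheet>"""

# Original style: " ." and the line break after it form one literal text
# node, so the newline and indentation end up inside the href.
LITERAL_HREF = """<xsl:value-of select="@link"/> .
        """

# Reviewed style: the surrounding text is whitespace-only, which XSLT strips
# from the stylesheet, so only concat() contributes characters.
CONCAT_HREF = """
          <xsl:value-of select="concat(@link, ' . ')"/>
        """


def render(href_body):
    """Transform OUTLINE and return the root <ul> element of the result."""
    xslt = etree.XSLT(etree.XML(STYLESHEET % href_body))
    return xslt(etree.XML(OUTLINE)).getroot()


if __name__ == "__main__":
    for body in (LITERAL_HREF, CONCAT_HREF):
        root = render(body)
        print([a.get("href") for a in root.iter("a")])
        print("nested <ul> count:", len(root.findall(".//ul")))
```

Only the concat() variant yields clean values such as "#intro . ", and in both variants the xsl:if leaves the leaf entries ("Background", "Results") without an empty nested <ul>.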
    

Reproducing the problem

It was a little hard to reproduce your bug because of a lack of information, so I had to guess.

First, I created a sample TOC file, which looks like this:

outline.xml

<?xml version="1.0" encoding="UTF-8"?>
<outline xmlns="http://wkhtmltopdf.org/outline">
  <item>
    <item title="Lorem ipsum dolor sit amet, consectetur adipiscing elit." page="2"/>
    <item title="Cras at odio ultrices, elementum leo at, facilisis nibh." page="8"/>
    <item title="Vestibulum sed libero bibendum, varius massa vitae, dictum arcu." page="19"/>
    ...
    <item title="Sed semper augue quis enim varius viverra." page="467"/>
  </item>
</outline>

This file contains 70 items so that I can see the page breaks.
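
If you would rather generate a similar sample than type it, here is a small sketch using only the standard library. The titles ("Section 1", "Section 2", ...) and page numbers are placeholders, not the lorem ipsum entries shown above; as in the sample, all entries sit inside a single top-level <item>, which is what the XSL's outline:item/outline:item path selects:

```python
import xml.etree.ElementTree as ET

OUTLINE_NS = "http://wkhtmltopdf.org/outline"


def q(tag):
    """Qualify a tag name with the wkhtmltopdf outline namespace."""
    return "{%s}%s" % (OUTLINE_NS, tag)


def make_outline(n_items=70, first_page=2, pages_per_item=7):
    """Build a flat sample outline with n_items placeholder entries."""
    root = ET.Element(q("outline"))
    wrapper = ET.SubElement(root, q("item"))  # the single top-level <item>
    for i in range(n_items):
        ET.SubElement(wrapper, q("item"),
                      title="Section %d" % (i + 1),
                      page=str(first_page + i * pages_per_item))
    return ET.ElementTree(root)


if __name__ == "__main__":
    # Serialize with the outline namespace as the default (unprefixed) one.
    ET.register_namespace("", OUTLINE_NS)
    make_outline().write("outline.xml", encoding="utf-8", xml_declaration=True)
```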

To build the HTML and the PDF, I used your (fixed) XSL file and ran pdfkit:

import io
import os

import pdfkit
from lxml import etree

HERE = os.path.dirname(__file__)


def layout(src_path, dst_path):
    # load the XSL
    xsl_path = os.path.join(HERE, "layout.xsl")
    xsl_tree = etree.parse(xsl_path)

    # load the XML source
    src_tree = etree.parse(src_path)

    # transform
    transformer = etree.XSLT(xsl_tree)
    dst_tree = transformer(src_tree)  # calling the XSLT object applies it

    # write the result
    with io.open(dst_path, mode="wb") as f:
        f.write(etree.tostring(dst_tree, encoding="utf-8", method="html"))


if __name__ == '__main__':
    layout(os.path.join(HERE, "outline.xml"), os.path.join(HERE, "outline.html"))
    pdfkit.from_file(os.path.join(HERE, "outline.html"),
                     os.path.join(HERE, "outline.pdf"),
                     options={'page-size': 'A1', 'orientation': 'landscape'})

Note: your page size looks very large…

Solution

You are right: after a page break, wkhtmltopdf doesn't take the margin in your CSS into account:

li {
  list-style: none;
  border-bottom: 1px dashed rgb(45, 117, 183);
  margin-top: 30px;  /* <-- not working after page break */
}

This is normal behavior. Consider, for instance, headings (h1, h2, etc.). A heading can have a top margin to add white space between a paragraph and the following heading, but if the heading starts a new page, we want to get rid of that margin and have the heading touch the top margin of the page.

For your TOC, there is a solution: use padding instead of margin:

li {
  border-bottom: 1px dashed rgb(45, 117, 183);
  list-style: none;
  padding-top: 30px;
}
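
A toy model makes the difference concrete. The function below is not how wkhtmltopdf lays out pages; it is a simplified sketch of the CSS fragmentation rule that margins adjoining an unforced page break are truncated to zero, while padding, being part of the box, is kept (whether wkhtmltopdf's WebKit follows the rule exactly is an assumption). The heights and the page size are arbitrary numbers:

```python
def paginate(item_heights, page_height, gap, gap_is_margin):
    """Toy pagination of TOC entries; returns (page, content_top) per entry.

    gap is the entry's margin-top (gap_is_margin=True) or padding-top.
    A margin adjoining an unforced page break is truncated to zero;
    padding belongs to the box and survives the break.
    """
    placed = []
    page, y = 1, 0  # y is the current offset from the top of the page
    for height in item_heights:
        # Break only if something is already on the page and this entry won't fit.
        needs_break = y > 0 and y + gap + height > page_height
        if needs_break:
            page, y = page + 1, 0
        top_gap = 0 if (needs_break and gap_is_margin) else gap
        y += top_gap
        placed.append((page, y))
        y += height
    return placed


if __name__ == "__main__":
    for is_margin in (True, False):
        print("margin" if is_margin else "padding",
              paginate([30, 30, 30], page_height=100, gap=10, gap_is_margin=is_margin))
```

In this model the third entry starts at (2, 0) with a margin, flush against the top of the new page, but at (2, 10) with padding, which is the spacing the li rule above keeps after the break.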

Also, the TOC content (the #toc element) has a fixed margin-top:

#toc {
  width: 50%;
  margin-top: 150px;
  margin-left: 300px;
}

So you can reduce the margin-top to suit your needs, for instance:

#toc {
  width: 50%;
  margin-top: 120px;
  margin-left: 300px;
}
Salami answered 27/1, 2019 at 14:9 Comment(0)
