How to render pd.DataFrame table in pdf with nbconvert+pandoc

Asked 30/6, 2020 at 8:58 Answered 18/2, 2022 at 11:27

Solved python pandas dataframe pdf pandoc

I am generating a pdf from a set of Jupyter notebooks. For each .ipynb file, I'm running

$ jupyter-nbconvert --to markdown Untitled1.ipynb

and then merging them together with:

$ pandoc Untitled1.md [Untitled2.md ...] -f gfm --pdf-engine=pdflatex -o all_notebooks.pdf
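A minimal sketch of scripting the two steps above, assuming the notebooks sit in the current directory and that nbconvert writes each .md file next to its notebook. The function names `build_commands` and `run_all` are made up for illustration; the commands themselves are the ones shown above.

```python
import subprocess
from pathlib import Path


def build_commands(notebooks, output="all_notebooks.pdf"):
    """Return the argv lists for the nbconvert steps followed by the pandoc step."""
    notebooks = [Path(nb) for nb in notebooks]
    # One nbconvert call per notebook, as in the first command above.
    convert = [["jupyter-nbconvert", "--to", "markdown", str(nb)] for nb in notebooks]
    # Assumption: nbconvert names its output <notebook>.md in the same folder.
    markdown_files = [str(nb.with_suffix(".md")) for nb in notebooks]
    merge = ["pandoc", *markdown_files, "-f", "gfm",
             "--pdf-engine=pdflatex", "-o", output]
    return convert + [merge]


def run_all(notebooks, output="all_notebooks.pdf"):
    """Run every step in order and stop at the first one that fails."""
    for argv in build_commands(notebooks, output):
        subprocess.run(argv, check=True)
```

Calling `run_all(["Untitled1.ipynb", "Untitled2.ipynb"])` would then run both conversions followed by the merge.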

(I am mostly following the example here.) One thing I noticed is that the pandas DataFrames, e.g.

import pandas as pd
df = pd.DataFrame({'a':[1,2,3]})
df.head()

are rendered in the pdf as

rather than

Any idea how to fix this issue, please? I am using nbconvert 5.6.1 (from jupyter-nbconvert --version) and pandoc 2.9.2.1 (from pandoc --version). In the md file, the table turns into the HTML block below, and I suspect pandoc does not interpret it correctly. I tried the from-markdown-strict option suggested here, without any luck.

Thank you!

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>a</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
    </tr>
    <tr>
      <th>1</th>
      <td>2</td>
    </tr>
    <tr>
      <th>2</th>
      <td>3</td>
    </tr>
  </tbody>
</table>
</div>

Bowler answered 30/6, 2020 at 8:58

The issue here is that nbconvert writes the DataFrames out as HTML (plus the styling, which is what you're seeing in the output; see the issue here), and that HTML gets ignored by pandoc's Markdown converter.

One way around this is to change pandas' behavior to not write out DataFrames as HTML in notebooks. You can do this by setting the option at the top of each notebook:

pd.set_option("display.notebook_repr_html", False)
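To see what this option changes, the sketch below checks the HTML representation that Jupyter asks a DataFrame for. It assumes that `DataFrame._repr_html_()`, the hook Jupyter calls, returns None when the option is off; that is how pandas behaves as far as I know, but it is not a documented guarantee. `pd.option_context` is used here only to keep the change local to the example; in a notebook you would use `pd.set_option` as above.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# Default: Jupyter receives an HTML table, which nbconvert copies into the .md file.
html_default = df._repr_html_()

# With the option off there is no HTML representation, so the notebook
# (and therefore nbconvert) falls back to the plain-text repr.
with pd.option_context("display.notebook_repr_html", False):
    html_disabled = df._repr_html_()
    text_fallback = repr(df)

print(html_disabled)   # no HTML is produced
print(text_fallback)   # plain-text table
```

The plain-text table is not as pretty as the HTML one, but it survives the Markdown step unchanged.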

Another option is to use the HTML representation as the intermediate step rather than Markdown:

$ jupyter-nbconvert --to html Untitled1.ipynb
$ pandoc Untitled1.html -t latex --pdf-engine=pdflatex -o all_notebooks.pdf

And of course if you don't need to do other formatting, you can just save your notebooks directly as pdfs:

$ jupyter-nbconvert --to pdf Untitled1.ipynb

(To combine multiple notebooks, see the discussion here.)
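Since the cause described above is the HTML block that nbconvert writes into the .md file, another possible workaround is to rewrite that block as a pipe table before calling pandoc, which pandoc's gfm reader should then parse as a table. This is only a sketch using the standard library: `TableParser`, `html_table_to_pipe` and `replace_dataframes` are made-up names, and it assumes simple tables like the one in the question (one header row, no MultiIndex, no `|` characters inside cells).

```python
import re
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of one HTML table, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # list of rows, each a list of cell strings
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def html_table_to_pipe(html):
    """Turn one simple <table> (first row is the header) into a pipe table."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    lines = ["| " + " | ".join(header) + " |",
             "|" + "|".join(["---"] * len(header)) + "|"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Matches the <div><style scoped>...</style><table class="dataframe">...</table></div>
# wrapper shown in the question.
DATAFRAME_BLOCK = re.compile(
    r'<div>\s*<style scoped>.*?</style>\s*'
    r'(<table[^>]*class="dataframe".*?</table>)\s*</div>',
    re.DOTALL,
)


def replace_dataframes(markdown):
    """Replace every DataFrame HTML block in the Markdown text with a pipe table."""
    return DATAFRAME_BLOCK.sub(
        lambda m: "\n" + html_table_to_pipe(m.group(1)) + "\n", markdown)
```

You would read each Untitled1.md, pass its text through `replace_dataframes`, write it back, and then run the pandoc command from the question unchanged.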

Samurai answered 8/2, 2021 at 15:0

The problem seems to be in the connection between Jupyter and pandoc: Jupyter doesn't output the table as formatted Markdown, and hence pandoc doesn't format it in the PDF.

Extortion answered 8/2, 2021 at 5:44

For me the best way is using ipypublish (https://ipypublish.readthedocs.io/en/latest/)

Install

conda install -c conda-forge ipypublish

Setup pandas

from ipypublish import nb_setup
pd = nb_setup.setup_pandas(escape_latex=False)
...
pd.DataFrame(mydata)

Profit

jupyter nbconvert notebook.ipynb --no-input --no-prompt --to pdf

Make sure you run the notebook again before converting it, so that all the tables are rendered with ipypublish. Then they look good in the notebook as well as in the PDF.

Borrego answered 18/2, 2022 at 11:27
