Pdf2Image library failing to read pdf signed using DocuSign

I'm trying to convert a PDF signed with DocuSign to image format. The convert_from_path method raises an error. The code and the error are shown below:

import pdf2image

data=pdf2image.convert_from_path('name.pdf')

PDFPageCountError: Unable to get page count.
Syntax Error: Gen inside xref table too large (bigger than INT_MAX)
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Invalid XRef entry
Syntax Error: Invalid XRef entry
Syntax Error: Top-level pages object is wrong type (null)
Command Line Error: Wrong page range given: the first page (1) can not be after the last page (0).
Unquestionable answered 15/3, 2021 at 10:43 Comment(0)

I requested here that poppler add installation instructions for Ubuntu, even if just linking to this answer. My request was swiftly rejected and closed. If you'd like to see poppler better support the installation and building of their own product on at least the 4th most-popular operating system in the world (Ubuntu), please go upvote this request to show your support.

pdftoppm/Poppler needs to be updated. See here:

  1. Answer by @thijs123
  2. https://gitlab.freedesktop.org/poppler/poppler/-/issues/1014 (thanks to @Bohumir Zamecnik's comment here)
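
Before upgrading, it can help to confirm that the installed version really predates the fix. Here is a minimal Python sketch, assuming `pdftoppm -v` prints a line such as `pdftoppm version 0.86.1` (the format shown further down in this answer). The 20.11.0 threshold is an assumption taken from a comment on another answer in this thread, not from the poppler changelog, and the helper names are hypothetical:

```python
import re
import shutil
import subprocess

# Assumption: 20.11.0 comes from a comment in this thread, not from poppler's changelog.
MIN_VERSION = (20, 11, 0)


def parse_poppler_version(text):
    """Extract (major, minor, patch) from text like 'pdftoppm version 0.86.1'."""
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_pdftoppm_version():
    """Return the version tuple of the pdftoppm found on PATH, or None."""
    exe = shutil.which("pdftoppm")
    if exe is None:
        return None
    # The version banner may go to stderr, so capture both streams.
    result = subprocess.run([exe, "-v"], capture_output=True, text=True)
    return parse_poppler_version(result.stdout + result.stderr)


if __name__ == "__main__":
    version = installed_pdftoppm_version()
    if version is None:
        print("could not determine the pdftoppm version")
    elif version < MIN_VERSION:
        print("pdftoppm", version, "is older than", MIN_VERSION, "so upgrade poppler")
    else:
        print("pdftoppm", version, "should be new enough")
```

Comparing tuples of integers rather than strings matters here, because the old 0.86.1 numbering would otherwise sort incorrectly against 20.x and 22.x.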

How to install/upgrade to the latest version of Poppler/pdftoppm (version 22.11.0 at the time of this writing) on Linux Ubuntu:

Tested on Ubuntu 20.04:

1. First, try installing from the latest Linux Ubuntu distribution release:

# check current version
# Mine shows: "pdftoppm version 0.86.1"
pdftoppm -v

# try to update it
sudo apt update
sudo apt install poppler-utils

# check current version again
pdftoppm -v

# Now run the `pdftoppm` command again to convert a PDF to a bunch of TIF files,
# for example:
pdftoppm -upw "My PDF Password" -tiff -r 300 "in.pdf" "path/to/output/dir"
# OR (if you don't have a password on the PDF):
pdftoppm -tiff -r 300 "in.pdf" "path/to/output/dir"

# If it works, you're done! Otherwise, upgrade poppler by building it from
# source, as shown below.

On Ubuntu 20.04, one of the lines in my output from sudo apt install poppler-utils says:

poppler-utils is already the newest version (0.86.1-0ubuntu1).

That means apt did not upgrade poppler for me, and I still have the old version.

If the steps above work for you (because you are on Ubuntu 22.04 or later, for example), stop here. They did not work for me on Ubuntu 20.04, so I had to install from source code.

2. Install from source code to get the latest version of poppler and pdftoppm

Here are my instructions on how to install poppler from source code. It looks like a lot, but it's very repetitive and follows the same patterns again and again to manually install a bunch of dependencies with aptitude, so don't be scared. Just read the instructions and it should work out without great difficulty.

1. Download the source code

# First, check your current version. 
# Mine shows: "pdftoppm version 0.86.1"
pdftoppm -v

# Now go here and look for the latest download link and see what URL it points
# to: https://poppler.freedesktop.org/ 
# Under the "Download" section I see: 
#
#       The latest stable release is poppler-22.11.0.tar.xz, released on
#       November 1, 2022:
#
# Hovering my mouse over the download link shows it to be the link below. 
# **Update the link and version in all commands below.**

# Download it
wget https://poppler.freedesktop.org/poppler-22.11.0.tar.xz
# Extract the compressed file and cd into the extracted dir
tar -xf poppler-22.11.0.tar.xz
cd poppler-22.11.0


# Build and install it. See the "INSTALL" file in this dir for *some* level of 
# help. Most of my instructions below are NOT in there :(.

mkdir -p build
cd build
git clone git://git.freedesktop.org/git/poppler/test

2. Install dependencies

sudo apt update

# Install the "easy" dependencies first:
# I expect these dependencies to install on Ubuntu 20.04 without issue
sudo apt install \
    libfreetype-dev \
    libfontconfig-dev \
    libboost-dev \
    libpng-dev \
    zlib1g-dev \
    liblcms2-dev \
    libcurl4 \
    libcurl4-gnutls-dev

# Install the "hard" dependencies second:
# Try this command too, but I expect at least some of these dependencies to
# fail to install on Ubuntu 20.04. For each one that fails, use `aptitude` to
# install it instead, as I explain and do below.
sudo apt install \
    libjpeg-dev \
    libcairo-dev \
    libopenjp2-7-dev \
    libtiff-dev \
    libcurl4-gnutls-dev \
    libnss3-dev

# When I run `sudo apt install libjpeg-dev` alone, for instance, I see the
# following output errors:
#
#       Reading package lists... Done
#       Building dependency tree       
#       Reading state information... Done
#       Some packages could not be installed. This may mean that you have
#       requested an impossible situation or if you are using the unstable
#       distribution that some required packages have not yet been created
#       or been moved out of Incoming.
#       The following information may help to resolve the situation:
#       
#       The following packages have unmet dependencies:
#        libjpeg-dev : Depends: libjpeg8-dev but it is not going to be installed
#       E: Unable to correct problems, you have held broken packages.

# So, we will use the `aptitude` package installer tool to solve that.
# First, install aptitude.
sudo apt install aptitude

# Install `libjpeg-dev` via `aptitude`:
sudo aptitude install libjpeg-dev
#
# Assuming your output looks the same as mine, and the options it gives you are
# the same and in the same order, choose **no** then **yes** to downgrade
# `libjpeg-turbo8`, thereby allowing `libjpeg-dev` to install. See the arrows
# (<=========================) below which I use to indicate where I make my
# selections in the interactive prompts during the installation process via
# the 'aptitude' installation tool:
#
#
#       $ sudo aptitude install libjpeg-dev
#       The following NEW packages will be installed:
#         libjpeg-dev libjpeg-turbo8-dev{ab} libjpeg8-dev{a} 
#       0 packages upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
#       Need to get 238 kB/241 kB of archives. After unpacking 1,041 kB will be used.
#       The following packages have unmet dependencies:
#        libjpeg-turbo8-dev : Depends: libjpeg-turbo8 (= 2.0.3-0ubuntu1) but 2.0.3-0ubuntu1.20.04.3 is installed
#       The following actions will resolve these dependencies:
#       
#            Keep the following packages at their current version:
#       1)     libjpeg-dev [Not Installed]                        
#       2)     libjpeg-turbo8-dev [Not Installed]                 
#       3)     libjpeg8-dev [Not Installed]                       
#       
#       
#       
#       Accept this solution? [Y/n/q/?] n           <=========================
#       The following actions will resolve these dependencies:
#       
#            Downgrade the following packages:                                        
#       1)     libjpeg-turbo8 [2.0.3-0ubuntu1.20.04.3 (now) -> 2.0.3-0ubuntu1 (focal)]
#       
#       
#       
#       Accept this solution? [Y/n/q/?] y           <=========================
#       The following packages will be DOWNGRADED:
#         libjpeg-turbo8 
#       The following NEW packages will be installed:
#         libjpeg-dev libjpeg-turbo8-dev{a} libjpeg8-dev{a} 
#       0 packages upgraded, 3 newly installed, 1 downgraded, 0 to remove and 0 not upgraded.
#       Need to get 356 kB/359 kB of archives. After unpacking 1,040 kB will be used.
#       Do you want to continue? [Y/n/?] y          <=========================
#       Get: 1 http://us.archive.ubuntu.com/ubuntu focal/main amd64 libjpeg-turbo8 amd64 2.0.3-0ubuntu1 [118 kB]
#       Get: 2 http://us.archive.ubuntu.com/ubuntu focal/main amd64 libjpeg-turbo8-dev amd64 2.0.3-0ubuntu1 [238 kB]
#       Fetched 356 kB in 1s (490 kB/s)             
#       dpkg: warning: downgrading libjpeg-turbo8:amd64 from 2.0.3-0ubuntu1.20.04.3 to 2.0.3-0ubuntu1
#       (Reading database ... 474322 files and directories currently installed.)
#       Preparing to unpack .../libjpeg-turbo8_2.0.3-0ubuntu1_amd64.deb ...
#       Unpacking libjpeg-turbo8:amd64 (2.0.3-0ubuntu1) over (2.0.3-0ubuntu1.20.04.3) ...
#       Selecting previously unselected package libjpeg-turbo8-dev:amd64.
#       Preparing to unpack .../libjpeg-turbo8-dev_2.0.3-0ubuntu1_amd64.deb ...
#       Unpacking libjpeg-turbo8-dev:amd64 (2.0.3-0ubuntu1) ...
#       Selecting previously unselected package libjpeg8-dev:amd64.
#       Preparing to unpack .../libjpeg8-dev_8c-2ubuntu8_amd64.deb ...
#       Unpacking libjpeg8-dev:amd64 (8c-2ubuntu8) ...
#       Selecting previously unselected package libjpeg-dev:amd64.
#       Preparing to unpack .../libjpeg-dev_8c-2ubuntu8_amd64.deb ...
#       Unpacking libjpeg-dev:amd64 (8c-2ubuntu8) ...
#       Setting up libjpeg-turbo8:amd64 (2.0.3-0ubuntu1) ...
#       Setting up libjpeg-turbo8-dev:amd64 (2.0.3-0ubuntu1) ...
#       Setting up libjpeg8-dev:amd64 (8c-2ubuntu8) ...
#       Setting up libjpeg-dev:amd64 (8c-2ubuntu8) ...
#       Processing triggers for libc-bin (2.31-0ubuntu9.9) ...

# Install `libcairo-dev` the same way and with the same selections:
sudo aptitude install libcairo-dev
#
#       $ sudo aptitude install libcairo-dev
#       Note: selecting "libcairo2-dev" instead of the virtual package "libcairo-dev"
#       The following NEW packages will be installed:
#         libcairo-script-interpreter2{a} libcairo2-dev libice-dev{a} libpixman-1-dev{a} libpthread-stubs0-dev{a} libsm-dev{a} libx11-dev{ab} libxau-dev{a} libxcb-render0-dev{a} libxcb-shm0-dev{a} libxcb1-dev{a} 
#         libxdmcp-dev{a} libxext-dev{a} libxrender-dev{a} x11proto-core-dev{a} x11proto-dev{a} x11proto-xext-dev{a} xorg-sgml-doctools{a} xtrans-dev{a} 
#       0 packages upgraded, 19 newly installed, 0 to remove and 0 not upgraded.
#       Need to get 2,569 kB of archives. After unpacking 10.5 MB will be used.
#       The following packages have unmet dependencies:
#        libx11-dev : Depends: libx11-6 (= 2:1.6.9-2ubuntu1) but 2:1.6.9-2ubuntu1.2 is installed
#       The following actions will resolve these dependencies:
#       
#            Keep the following packages at their current version:
#       1)     libcairo2-dev [Not Installed]                      
#       2)     libx11-dev [Not Installed]                         
#       3)     libxext-dev [Not Installed]                        
#       4)     libxrender-dev [Not Installed]                     
#       
#       
#       
#       Accept this solution? [Y/n/q/?] n       <=========================
#       The following actions will resolve these dependencies:
#       
#            Downgrade the following packages:                                
#       1)     libx11-6 [2:1.6.9-2ubuntu1.2 (now) -> 2:1.6.9-2ubuntu1 (focal)]
#       
#       
#       
#       Accept this solution? [Y/n/q/?] y       <=========================
#       The following packages will be DOWNGRADED:
#         libx11-6 
#       The following NEW packages will be installed:
#         libcairo-script-interpreter2{a} libcairo2-dev libice-dev{a} libpixman-1-dev{a} libpthread-stubs0-dev{a} libsm-dev{a} libx11-dev{a} libxau-dev{a} libxcb-render0-dev{a} libxcb-shm0-dev{a} libxcb1-dev{a} 
#         libxdmcp-dev{a} libxext-dev{a} libxrender-dev{a} x11proto-core-dev{a} x11proto-dev{a} x11proto-xext-dev{a} xorg-sgml-doctools{a} xtrans-dev{a} 
#       0 packages upgraded, 19 newly installed, 1 downgraded, 0 to remove and 0 not upgraded.
#       Need to get 3,141 kB of archives. After unpacking 10.5 MB will be used.
#       Do you want to continue? [Y/n/?] y      <=========================

# Install `libopenjp2-7-dev` the same way and with the same selections:
sudo aptitude install libopenjp2-7-dev
#
#       $ sudo aptitude install libopenjp2-7-dev
#       The following NEW packages will be installed:
#         libopenjp2-7-dev{b} 
#       0 packages upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
#       Need to get 26.7 kB of archives. After unpacking 168 kB will be used.
#       The following packages have unmet dependencies:
#        libopenjp2-7-dev : Depends: libopenjp2-7 (= 2.3.1-1ubuntu4) but 2.3.1-1ubuntu4.20.04.1 is installed
#       The following actions will resolve these dependencies:
#       
#            Keep the following packages at their current version:
#       1)     libopenjp2-7-dev [Not Installed]                   
#       
#       
#       
#       Accept this solution? [Y/n/q/?] n       <=========================
#       The following actions will resolve these dependencies:
#       
#            Downgrade the following packages:                                      
#       1)     libopenjp2-7 [2.3.1-1ubuntu4.20.04.1 (now) -> 2.3.1-1ubuntu4 (focal)]
#       
#       
#       
#       Accept this solution? [Y/n/q/?] y       <=========================
#       The following packages will be DOWNGRADED:
#         libopenjp2-7 
#       The following NEW packages will be installed:
#         libopenjp2-7-dev 
#       0 packages upgraded, 1 newly installed, 1 downgraded, 0 to remove and 0 not upgraded.
#       Need to get 168 kB of archives. After unpacking 168 kB will be used.
#       Do you want to continue? [Y/n/?] y      <=========================

# Install `libtiff-dev` the same way and with the same selections:
sudo aptitude install libtiff-dev
#
#       $ sudo aptitude install libtiff-dev
#       The following NEW packages will be installed:
#         libjbig-dev{a} liblzma-dev{ab} libtiff-dev{b} libtiffxx5{a} 
#       0 packages upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
#       Need to get 461 kB of archives. After unpacking 1,796 kB will be used.
#       The following packages have unmet dependencies:
#        libtiff-dev : Depends: libtiff5 (= 4.1.0+git191117-2build1) but 4.1.0+git191117-2ubuntu0.20.04.3 is installed
#        liblzma-dev : Depends: liblzma5 (= 5.2.4-1) but 5.2.4-1ubuntu1.1 is installed
#       The following actions will resolve these dependencies:
#       
#            Keep the following packages at their current version:
#       1)     liblzma-dev [Not Installed]                        
#       2)     libtiff-dev [Not Installed]                        
#       
#       
#       
#       Accept this solution? [Y/n/q/?] n       <=========================
#       The following actions will resolve these dependencies:
#       
#            Downgrade the following packages:                                                     
#       1)     liblzma5 [5.2.4-1ubuntu1.1 (now) -> 5.2.4-1 (focal)]                                
#       2)     libtiff5 [4.1.0+git191117-2ubuntu0.20.04.3 (now) -> 4.1.0+git191117-2build1 (focal)]
#       
#       
#       
#       Accept this solution? [Y/n/q/?] y       <=========================
#       The following packages will be DOWNGRADED:
#         liblzma5 libtiff5 
#       The following NEW packages will be installed:
#         libjbig-dev{a} liblzma-dev{a} libtiff-dev libtiffxx5{a} 
#       0 packages upgraded, 4 newly installed, 2 downgraded, 0 to remove and 0 not upgraded.
#       Need to get 715 kB of archives. After unpacking 1,788 kB will be used.
#       Do you want to continue? [Y/n/?] y      <=========================

# Install `libcurl4-gnutls-dev` the same way and with the same selections:
sudo aptitude install libcurl4-gnutls-dev
#
#       $ sudo aptitude install libcurl4-gnutls-dev
#       The following NEW packages will be installed:
#         libcurl4-gnutls-dev{b} 
#       0 packages upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
#       Need to get 318 kB of archives. After unpacking 1,525 kB will be used.
#       The following packages have unmet dependencies:
#        libcurl4-gnutls-dev : Depends: libcurl3-gnutls (= 7.68.0-1ubuntu2) but 7.68.0-1ubuntu2.12 is installed
#       The following actions will resolve these dependencies:
#       
#            Keep the following packages at their current version:
#       1)     libcurl4-gnutls-dev [Not Installed]                
#       
#       
#       
#       Accept this solution? [Y/n/q/?] n       <=========================
#       The following actions will resolve these dependencies:
#       
#            Downgrade the following packages:                                      
#       1)     libcurl3-gnutls [7.68.0-1ubuntu2.12 (now) -> 7.68.0-1ubuntu2 (focal)]
#       
#       
#       
#       Accept this solution? [Y/n/q/?] y       <=========================
#       The following packages will be DOWNGRADED:
#         libcurl3-gnutls 
#       The following NEW packages will be installed:
#         libcurl4-gnutls-dev 
#       0 packages upgraded, 1 newly installed, 1 downgraded, 0 to remove and 0 not upgraded.
#       Need to get 549 kB of archives. After unpacking 1,524 kB will be used.
#       Do you want to continue? [Y/n/?] y      <=========================

# Install `libnss3-dev` the same way and with the same selections:
sudo aptitude install libnss3-dev
#
#       $ sudo aptitude install libnss3-dev
#       The following NEW packages will be installed:
#         libnspr4-dev{a} libnss3-dev{b} 
#       0 packages upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
#       Need to get 437 kB of archives. After unpacking 2,611 kB will be used.
#       The following packages have unmet dependencies:
#        libnss3-dev : Depends: libnss3 (= 2:3.49.1-1ubuntu1) but 2:3.49.1-1ubuntu1.8 is installed
#       The following actions will resolve these dependencies:
#       
#            Keep the following packages at their current version:
#       1)     libnss3-dev [Not Installed]                        
#       
#       
#       
#       Accept this solution? [Y/n/q/?] n       <=========================
#       The following actions will resolve these dependencies:
#       
#            Downgrade the following packages:                                 
#       1)     libnss3 [2:3.49.1-1ubuntu1.8 (now) -> 2:3.49.1-1ubuntu1 (focal)]
#       
#       
#       
#       Accept this solution? [Y/n/q/?] y       <=========================
#       The following packages will be DOWNGRADED:
#         libnss3 
#       The following NEW packages will be installed:
#         libnspr4-dev{a} libnss3-dev 
#       0 packages upgraded, 2 newly installed, 1 downgraded, 0 to remove and 0 not upgraded.
#       Need to get 1,608 kB of archives. After unpacking 2,473 kB will be used.
#       Do you want to continue? [Y/n/?] y      <=========================

3. Build poppler and re-test pdftoppm to ensure it has been upgraded and now works

cmake -DTESTDATADIR=./test -DCMAKE_INSTALL_MANDIR:PATH=/usr/local/share/man ..
time make
sudo make install

# re-initialize your bash terminal by re-sourcing your startup file
. ~/.bashrc

# Refresh the shared-library cache so the new executable finds the newly
# installed poppler library. Note: `ldconfig` takes library directories, not
# executables, as arguments; with no arguments it rescans the configured
# library directories.
sudo ldconfig

# Ensure your version is now newer than it was when we started!
# Mine now outputs `pdftoppm version 22.11.0`!
pdftoppm -v

# Now run the `pdftoppm` command again to convert a PDF to a bunch of TIF files,
# for example:
pdftoppm -upw "My PDF Password" -tiff -r 300 "in.pdf" "path/to/output/dir"
# OR (if you don't have a password on the PDF):
pdftoppm -tiff -r 300 "in.pdf" "path/to/output/dir"

# It should work now. Done!

# NB: the man pages may not be updated for the newest executables you just
# built. So, you may need to use `pdftoppm --help` instead of `man pdftoppm`.
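
To come back to the original question: once the new build is installed, pdf2image should pick it up as long as the new pdftoppm comes first on PATH. If an older copy still shadows it, pdf2image's `poppler_path` argument can point at the directory explicitly. Here is a minimal Python sketch, assuming the build installed into `/usr/local/bin` (where pdftoppm landed on my system) and reusing the `name.pdf` file name from the question. `convert_with_poppler_dir` is a hypothetical helper name, not part of pdf2image:

```python
import os


def convert_with_poppler_dir(pdf_path, poppler_dir="/usr/local/bin"):
    """Convert a PDF into a list of PIL images using the Poppler tools in poppler_dir."""
    # Assumption: /usr/local/bin is where `sudo make install` placed pdftoppm.
    if not os.path.isfile(os.path.join(poppler_dir, "pdftoppm")):
        raise FileNotFoundError("no pdftoppm found in " + poppler_dir)
    # Imported here so the directory check above works even without pdf2image.
    from pdf2image import convert_from_path

    # poppler_path tells pdf2image where to look for pdftoppm and pdfinfo.
    return convert_from_path(pdf_path, poppler_path=poppler_dir)


# Example call (requires pdf2image and a real PDF file):
# pages = convert_with_poppler_dir("name.pdf")
# pages[0].save("page-1.png")
```

Passing the directory explicitly removes any doubt about which of several installed Poppler versions pdf2image is actually running.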

References:

  1. Looooots of trial and error and my own digging, for many hours.
  2. Main website: https://poppler.freedesktop.org/
  3. Repo on Gitlab: https://gitlab.freedesktop.org/poppler/poppler
  4. Repo install instructions on Gitlab: https://gitlab.freedesktop.org/poppler/poppler/-/blob/master/INSTALL
  5. How to extract *.tar.xz file: https://linuxize.com/post/how-to-extract-unzip-tar-xz-file/
  6. Force install apt-get
  7. cmake list all build options (-DOPTION_NAME): https://mcmap.net/q/209812/-how-to-list-all-cmake-build-options-and-their-default-values
  8. How to install libcurl on Ubuntu: https://askubuntu.com/a/78185/327339
  9. How to install zlib: https://itsfoss.com/install-zlib-ubuntu/
  10. [my repo]: "Syntax Error: Gen inside xref table too large (bigger than INT_MAX)" / "ERROR: 'pdftoppm' failed. ret_code = 1": https://github.com/ElectricRCAircraftGuy/PDF2SearchablePDF/issues/29
    1. My pdf2searchablepdf repo which uses poppler and may require an upgraded version of poppler.
  11. One of the places I learned about aptitude: "Broken packages" do not let me install packages
Judaea answered 4/11, 2022 at 7:3 Comment(12)
Thank you for the thorough walk-through. This worked a treat for me on our beefier live environment, but on the dev environment the make step kept getting killed because it used too many system resources. Any idea if/when they'll push this seemingly important update to Ubuntu 20? - Sezen
I get the c++: fatal error: Killed signal terminated program cc1plus error. - Sezen
@dearsina, they will likely never push this update to Ubuntu 20 and 22, and they will likely only fix it in Ubuntu 24 if someone like you or me requests it. - Judaea
@dearsina, a Google search of that error indicates you are out of RAM. That's an easy fix. Simply give yourself more virtual memory via a swap file. See my instructions here: How do I increase the size of swapfile without removing it in the terminal?. I ran into this problem when building with Bazel. See my notes and info about that here: java.lang.OutOfMemoryError when running bazel build. - Judaea
I would buy you a chocolate if I could, haha. Thanks for the effort of sharing this and saving us time! - Leonelleonelle
@FernandoWittmann, I'd love that. Meanwhile, can you at least upvote this request to get it some attention? I just added a blurb about it at the top of my answer. - Judaea
I transferred the instructions (with slight modifications) into a Dockerfile for quick testing: gist.github.com/FabianTe/f0225b5a2b7e55ff2b86534ba29c9451 - Calcification
Weirdly enough, I am getting this error on a DocuSign PDF but my pdftoppm is 22.12.0. - Polysepalous
@GregW, verify that it is in your PATH and that you don't accidentally have two versions installed at once. Otherwise, I have no idea why that would be happening. Run which pdftoppm to see which is the primary one in your path. Then run $(which pdftoppm) --version. - Judaea
@GabrielStaples Ah, there we go. Our production Docker image has pdftoppm version 20.09.0. Silly me for thinking all builds would have been on the same version! That must be the issue. - Polysepalous
I think the libgpgmepp-dev package is missing (I got the error Could not find a configuration file for package "Gpgmepp"). - Diaster
Related: Updating gpgmepp from 1.13 to latest (at least 1.19). - Diaster
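
The check suggested in the comments above (making sure two versions are not installed at once) can also be scripted. Here is a minimal Python sketch that lists every executable with a given name on PATH, in lookup order. `find_all_on_path` is a hypothetical helper name, and the sketch assumes a POSIX-style system where executables have no file extension:

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable called `name` on PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    # The first path printed is the one a plain `pdftoppm` command would run.
    for exe in find_all_on_path("pdftoppm"):
        print(exe)
```

More than one line of output means several copies are reachable. Note that the same file can appear twice if PATH contains a directory and a symlink to it.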

This problem is caused by an older version of poppler. Upgrading to the latest version (21.03.0 at the time of writing) solves the problem.

Ethno answered 23/3, 2021 at 9:54 Comment(4)
This is the issue: gitlab.freedesktop.org/poppler/poppler/-/issues/1014 - Theocrasy
CentOS 8 only has 20.11.0, which is sufficient for the fix. - Lowrance
Upgrading poppler on Ubuntu 20.04 proved quite difficult. But with the instructions I just wrote, it should now be easy. - Judaea
I am using the convert_from_bytes method. It works when I run it locally, but when I run my script in Docker I get the same error. What could it be? I don't have Poppler installed in my Poetry env; should I add it? - Paradrop
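
On the Docker question above: Poppler is a set of system binaries, not a Python package, so it has to be installed in the image itself (for example through the distribution's poppler-utils package) and not through the Python environment. The version inside the image may also differ from the local one. Here is a minimal fail-fast sketch in Python for container start-up, relying on the fact that pdf2image calls both pdftoppm and pdfinfo. The helper names are hypothetical:

```python
import shutil


def missing_poppler_tools(which=shutil.which):
    """Return the Poppler command-line tools that cannot be found on PATH."""
    required = ("pdftoppm", "pdfinfo")
    return [tool for tool in required if which(tool) is None]


def require_poppler(which=shutil.which):
    """Raise a clear error early instead of failing later inside pdf2image."""
    missing = missing_poppler_tools(which)
    if missing:
        raise RuntimeError("Poppler tools not found on PATH: " + ", ".join(missing))
```

Calling `require_poppler()` once at start-up makes a missing install obvious. It does not check the version, so an image with an old Poppler would still hit the original error.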

You will need to contact the pdf2image maintainers for a possible solution, but it looks like upgrading should fix it.

Fried answered 24/3, 2021 at 9:9 Comment(0)
