How to read this pdf form using PyPDF2 in python

https://www.fda.gov/downloads/AboutFDA/ReportsManualsForms/Forms/UCM074728.pdf

I'm trying to read this PDF using PyPDF2 or pdfminer. PyPDF2 says the file has not been decrypted, and pdfminer says it can't decompress the PDF. Can somebody tell me how to do this in a Python 3 environment on Windows? I can't use poppler because I can't install it on this Windows machine.

Olgaolguin answered 13/4, 2018 at 18:7 Comment(1)
I suggest removing the URL from the question title; it's sufficient to include it in the question's body text. - Raguelragweed

This is a restricted PDF file. In most cases you can decrypt a file that doesn't prompt you for a password using PyPDF2 with an empty string:

from PyPDF2 import PdfFileReader

reader = PdfFileReader('sample.pdf')
reader.decrypt('')

Unfortunately, that is not the case for your file, or for any other file with 128-bit AES encryption: PyPDF2's decrypt() method does not support that encryption level and raises a NotImplementedError.
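To make the two outcomes explicit, here is a minimal sketch of the empty-password attempt that also catches the unsupported-encryption case. The helper names `try_empty_password` and `open_pdf` are hypothetical, and the sketch assumes the PyPDF2 1.x API (`isEncrypted`, and `decrypt()` returning 0 when the password does not match).

```python
def try_empty_password(reader):
    """Return True if `reader` is readable, trying an empty user password.

    `reader` is assumed to behave like a PyPDF2 1.x PdfFileReader:
    an `isEncrypted` attribute and a `decrypt(password)` method that
    returns 0 when the password does not match.
    """
    if not reader.isEncrypted:
        return True  # nothing to decrypt
    try:
        return reader.decrypt('') != 0
    except NotImplementedError:
        # Raised for encryption schemes this PyPDF2 version cannot handle
        return False


def open_pdf(path):
    """Open `path` with PyPDF2 and report whether it can be read."""
    from PyPDF2 import PdfFileReader  # imported lazily; requires PyPDF2 1.x
    reader = PdfFileReader(path)
    return reader, try_empty_password(reader)
```

Calling `open_pdf('sample.pdf')` returns the reader together with a flag; when the flag is False, the file needs one of the workarounds described next.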

As a simple workaround you can save this file as a new file in Adobe Reader or similar and the new file should work for your code.

You can also do it programmatically using qpdf, as discussed in this GitHub issue:

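import os
import shutil
import tempfile
from subprocess import check_call

def decrypt_in_place(filename):
    # Decrypt `filename` in place with qpdf (the function name is illustrative).
    # Work in a temporary directory next to the input file.
    tempdir = tempfile.mkdtemp(dir=os.path.dirname(os.path.abspath(filename)))
    try:
        temp_out = os.path.join(tempdir, 'qpdf_out.pdf')
        # A FileNotFoundError here means the qpdf executable is not on PATH.
        check_call(['qpdf', '--password=', '--decrypt', filename, temp_out])
        # Replace the original file with the decrypted copy.
        shutil.move(temp_out, filename)
        print('File Decrypted')
    finally:
        shutil.rmtree(tempdir)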
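If check_call fails with a file-not-found error, the qpdf executable is most likely not on PATH. A minimal sketch, assuming you would rather keep the original file and write the decrypted copy elsewhere, checks for qpdf first. The names `build_qpdf_command` and `decrypt_copy` are hypothetical helpers, not part of qpdf or PyPDF2.

```python
import shutil
import subprocess


def build_qpdf_command(src, dst, password='', qpdf_exe='qpdf'):
    """Return the argument list for decrypting `src` into `dst`."""
    return [qpdf_exe, '--password=' + password, '--decrypt', src, dst]


def decrypt_copy(src, dst, password='', qpdf_exe='qpdf'):
    """Write a decrypted copy of `src` to `dst`, leaving `src` untouched."""
    # Resolve the executable up front so a missing qpdf gives a clear error
    exe = shutil.which(qpdf_exe)
    if exe is None:
        raise RuntimeError(
            f'{qpdf_exe!r} not found on PATH; install qpdf or pass its full path'
        )
    subprocess.check_call(build_qpdf_command(src, dst, password, exe))
    return dst
```

On Windows, passing the full path to qpdf.exe as `qpdf_exe` avoids having to modify PATH.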
Seismism answered 14/4, 2018 at 2:4 Comment(3)
Worked like a charm, so thank you. First I decrypted the file using qpdf, then I got all the fields in the PDF. It was amazing. Why don't we implement this feature in PyPDF2? - Olgaolguin
Is there any way to identify whether a PDF contains URLs, bookmarks, annotations and comments using PyPDF2? @Seismism - Olgaolguin
Hi, I'm getting a file not found error in check_call. :( - Hesta
