Open a protected PDF file in Python
B

5

12

I wrote a PDF password cracker and found the password of a protected PDF file. I want to write a Python program that displays that PDF on screen without asking for the password. I use the PyPDF library. I know how to open a file that has no password, but I can't figure out how to open a protected one. Any ideas? Thanks.

import subprocess
import sys

filePath = raw_input()  # Python 2; use input() on Python 3
password = 'abc'
if sys.platform.startswith('linux'):
    subprocess.call(["xdg-open", filePath])
Beaverboard answered 30/9, 2014 at 21:0 Comment(0)
S
18

The approach shown by KL84 basically works, but the code is not correct (it writes the output file for each page). A cleaned up version is here:

https://gist.github.com/bzamecnik/1abb64affb21322256f1c4ebbb59a364

# Decrypt password-protected PDF in Python.
# 
# Requirements:
# pip install PyPDF2

from PyPDF2 import PdfFileReader, PdfFileWriter

def decrypt_pdf(input_path, output_path, password):
  with open(input_path, 'rb') as input_file, \
    open(output_path, 'wb') as output_file:
    reader = PdfFileReader(input_file)
    reader.decrypt(password)

    writer = PdfFileWriter()

    for i in range(reader.getNumPages()):
      writer.addPage(reader.getPage(i))

    writer.write(output_file)

if __name__ == '__main__':
  # example usage:
  decrypt_pdf('encrypted.pdf', 'decrypted.pdf', 'secret_password')
Speechmaking answered 18/1, 2017 at 7:56 Comment(5)
What is decrypted.pdf here?Connatural
That's output_path, the name of the output file to be written.Speechmaking
Just want to say that I had to remove the password argument and the reader.decrypt(password) call to make it work.Dauphine
How come? Was your PDF really encrypted? Decrypting is the main point of the script.Speechmaking
I get an error on the line reader.decrypt(password): "only algorithm code 1 and 2 are supported". I think it is related to how my PDF is encrypted.Fluxmeter
M
11

You should use the pikepdf library nowadays instead:

import pikepdf

with pikepdf.open("input.pdf", password="abc") as pdf:
    num_pages = len(pdf.pages)
    print("Total pages:", num_pages)

PyPDF2 doesn't support many encryption algorithms. pikepdf seems to handle them: it supports most password protection methods, and it is also documented and actively maintained.

Milurd answered 17/5, 2020 at 6:7 Comment(1)
Excellent! You are correct, PyPDF2 can't decrypt files that use some encryption algorithms.Ludwigshafen
I
4

You can use the pdfplumber library. It is very easy to use and reads machine-written PDF files seamlessly, better than any other library I have used.

import pdfplumber
with pdfplumber.open(r'D:\examplepdf.pdf', password='abc') as pdf:
    first_page = pdf.pages[0]
    print(first_page.extract_text())
Inhalator answered 19/10, 2021 at 13:59 Comment(0)
B
1

I found the answer to this question. Basically, you need to install and use the PyPDF2 library to get this working.

import subprocess
import sys

from PyPDF2 import PdfFileReader, PdfFileWriter

# The password is known to be 'abc'; call decrypt() on the reader to unlock the file
filePath = raw_input("Enter pdf file path: ")  # Python 2; use input() on Python 3
f = PdfFileReader(open(filePath, "rb"))
output = PdfFileWriter()
f.decrypt('abc')

# Copy the pages of the encrypted PDF into an unencrypted PDF named noPassPDF.pdf
for pageNumber in range(f.getNumPages()):
    output.addPage(f.getPage(pageNumber))

# Write "output" to noPassPDF.pdf once, after all pages have been added
with open("noPassPDF.pdf", "wb") as outputStream:
    output.write(outputStream)

# Open the file now
if sys.platform.startswith('darwin'):  # open on macOS
    subprocess.call(["open", "noPassPDF.pdf"])
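The snippets in this thread open the result with xdg-open on Linux and open on macOS. A possible way to cover both, plus Windows, in one place is sketched below. The helper names viewer_command and open_in_viewer are hypothetical, and the Windows branch (cmd /c start) is an assumption that does not appear anywhere in the thread.

```python
import subprocess
import sys


def viewer_command(path, platform=sys.platform):
    """Return the argv list that opens `path` in the default viewer.

    Hypothetical helper: maps a sys.platform value to the launcher
    conventionally used on that operating system.
    """
    if platform.startswith("linux"):
        return ["xdg-open", path]
    if platform.startswith("darwin"):
        return ["open", path]
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    raise OSError("no known viewer command for platform %r" % platform)


def open_in_viewer(path):
    """Launch the default viewer and return the launcher's exit code."""
    return subprocess.call(viewer_command(path))
```

Keeping the command selection in a separate function from the subprocess call makes the platform mapping easy to check without actually launching a viewer.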
Beaverboard answered 23/10, 2014 at 21:22 Comment(1)
"raise NotImplementedError("only algorithm code 1 and 2 are supported")" error.Avidity
H
1

Updated version of Bohumir Zamecnik's code, for PyPDF2 3.0.0 and above:

from PyPDF2 import PdfReader, PdfWriter

def decrypt_pdf(input_path, output_path, password):
  with open(input_path, 'rb') as input_file, \
    open(output_path, 'wb') as output_file:
    reader = PdfReader(input_file)
    reader.decrypt(password)

    writer = PdfWriter()

    for i in range(len(reader.pages)):
      writer.add_page(reader.pages[i])

    writer.write(output_file)

decrypt_pdf('encrypted.pdf', 'decrypted.pdf', 'password')
Hermaherman answered 28/10, 2023 at 6:58 Comment(0)