Convert PDF to HTML using python and pdfkit

Asked 16/3, 2023 at 13:42 Answered 16/3, 2023 at 14:17

On this site, Adobe writes about converting PDF to HTML using pdfkit.

They use the pdfkit.from_pdf(...) method.

This script uses the ‘pdfkit’ library to convert the PDF file to HTML. To use this script, you will need to install the ‘pdfkit’ library...

When I try to use this method, I get an error:

Traceback (most recent call last):
  File "C:\TestPdfToHtml\script.py", line 7, in <module>
    html_file = pdfkit.from_pdf(pdf_file, "my_html_file.html")
                ^^^^^^^^^^^^^^^
AttributeError: module 'pdfkit' has no attribute 'from_pdf'. Did you mean: 'from_url'?

How can I resolve this problem?

Below is the full script

import pdfkit
# Read the PDF file
pdf_file = open('test2.pdf', 'rb')
# Convert the PDF to HTML
html_file = pdfkit.from_pdf(pdf_file, "my_html_file.html")
# Close the PDF file
pdf_file.close()

Daveta answered 16/3, 2023 at 13:42 Comment(5)

What does the documentation of package pdfkit say? – Uralaltaic 16/3, 2023 at 13:46

Hello, based on the library description ("Wkhtmltopdf python wrapper to convert html to pdf"), are you sure this is the correct tool for this kind of conversion, i.e. from PDF to HTML? – Iveson 16/3, 2023 at 13:49

That seems to be the only page on the entire internet claiming that pdfkit has a from_pdf() function. A thing you can try is seeing if its from_file() function (which exists) happens to open a PDF, something I would not bet on. – Glassware 16/3, 2023 at 13:59

Related: #34838207 – Glassware 16/3, 2023 at 14:2

The pdfkit documentation lists only 3 functions, from_url/from_file/from_string, and says nothing about converting PDF to HTML. Maybe Adobe is trolling... – Daveta 16/3, 2023 at 14:6
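
Rather than trusting a third-party page, one way to check what a module really exposes is to list its public callables whose names start with from_. This is only a sketch: list_converters is a hypothetical helper name (not part of pdfkit), and the pdfkit import is attempted only if the package happens to be installed.

```python
import importlib
from types import ModuleType


def list_converters(module: ModuleType, prefix: str = "from_") -> list:
    """Return the sorted names of callables in `module` that start with `prefix`."""
    return sorted(
        name
        for name in dir(module)
        if name.startswith(prefix) and callable(getattr(module, name))
    )


if __name__ == "__main__":
    try:
        # Assumption: pdfkit may or may not be installed in this environment.
        pdfkit = importlib.import_module("pdfkit")
    except ImportError:
        print("pdfkit is not installed")
    else:
        # Any name missing from this list (e.g. from_pdf) will raise AttributeError.
        print(list_converters(pdfkit))
```

If from_pdf is not in the printed list, calling it fails with the AttributeError shown in the question, whatever the article claims.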

-8

Maybe the newer version of pdfkit does not support pdfkit.from_pdf. You can try pdfkit.from_file()

pdfkit.from_file(pdf_file, html_file)

Hope this helps.

Turboprop answered 16/3, 2023 at 14:17 Comment(2)

Downvote. I'm pretty sure you didn't try this. It doesn't work. When I try this with an actual PDF input file (even an extremely simple one generated by pdfkit itself), I get: wkhtmltopdf exited with non-zero code 1. error: Exit with code 1, due to unknown error. I'm pretty sure pdfkit just can't do this. I wonder why Adobe posted that nonsense... – Salena 24/5, 2023 at 17:36

This doesn't appear to work; from_file appears to expect an HTML file as input and a PDF file as output – Bushtit 5/6, 2023 at 8:39
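
Since from_file seems to expect HTML input, a cheap guard is to look at the first bytes of the input before calling it: well-formed PDF files start with the %PDF- signature. A minimal sketch, assuming a header check is good enough for your files; looks_like_pdf and check_html_input are hypothetical helper names, not pdfkit functions, and they only detect the file type, they convert nothing.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path: str) -> bool:
    """Return True if the file at `path` starts with the PDF signature."""
    with open(path, "rb") as handle:
        return handle.read(len(PDF_MAGIC)) == PDF_MAGIC


def check_html_input(path: str) -> str:
    """Return `path` unchanged, or raise ValueError if the file is actually a PDF."""
    if looks_like_pdf(path):
        raise ValueError(
            f"{path} is a PDF; pdfkit converts HTML to PDF, not PDF to HTML"
        )
    return path
```

Wrapping the input path in check_html_input(...) before handing it to pdfkit.from_file would fail early with a readable message instead of the opaque "wkhtmltopdf exited with non-zero code 1" reported above.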
