Convert PDF to HTML using Python and pdfkit

On this site, Adobe writes about converting PDF to HTML using pdfkit.

They use the pdfkit.from_pdf(...) method.

This script uses the ‘pdfkit’ library to convert the PDF file to HTML. To use this script, you will need to install the ‘pdfkit’ library...

When I try to use this method, I get this error:

Traceback (most recent call last):
  File "C:\TestPdfToHtml\script.py", line 7, in <module>
    html_file = pdfkit.from_pdf(pdf_file, "my_html_file.html")
                ^^^^^^^^^^^^^^^
AttributeError: module 'pdfkit' has no attribute 'from_pdf'. Did you mean: 'from_url'?

How can I resolve this problem?

Below is the full script:

import pdfkit
# Read the PDF file
pdf_file = open('test2.pdf', 'rb')
# Convert the PDF to HTML
html_file = pdfkit.from_pdf(pdf_file, "my_html_file.html")
# Close the PDF file
pdf_file.close()
Daveta answered 16/3, 2023 at 13:42 Comment(5)
What does the documentation of the pdfkit package say? – Uralaltaic
Hello, based on the library description, "Wkhtmltopdf python wrapper to convert html to pdf", are you sure this is the correct tool for this kind of conversion, i.e. converting from PDF to HTML? – Iveson
That seems to be the only page on the entire internet claiming that pdfkit has a from_pdf() function. One thing you could try is seeing whether its from_file() function (which exists) happens to open a PDF, something I would not bet on. – Glassware
Related: #34838207 – Glassware
The pdfkit documentation lists only three functions, from_url/from_file/from_string, and says nothing about converting PDF to HTML. Maybe Adobe is trolling... – Daveta
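Rather than trusting a third-party article, you can ask the installed package what it actually exposes. The sketch below is a minimal check, assuming only that pdfkit's converters follow the from_* naming seen in the traceback; the helper name converter_names is hypothetical and not part of pdfkit.

```python
def converter_names(module):
    """Return the sorted names of callable from_* attributes on a module."""
    return sorted(
        name
        for name in dir(module)
        if name.startswith("from_") and callable(getattr(module, name))
    )


if __name__ == "__main__":
    try:
        import pdfkit
    except ImportError:
        print("pdfkit is not installed")
    else:
        # Lists the converters this pdfkit version really provides.
        print(converter_names(pdfkit))
        # False here means from_pdf() simply does not exist in this version.
        print(hasattr(pdfkit, "from_pdf"))
```

If from_pdf is missing from that list, no amount of argument tweaking will make the Adobe snippet run.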

Maybe newer versions of pdfkit don't support pdfkit.from_pdf. You can try pdfkit.from_file():

pdfkit.from_file(pdf_file, html_file)

Hope this helps.

Turboprop answered 16/3, 2023 at 14:17 Comment(2)
Downvote. I'm pretty sure you didn't try this. It doesn't work. When I try this with an actual PDF input file (even an extremely simple one generated by pdfkit itself), I get: "wkhtmltopdf exited with non-zero code 1. error: Exit with code 1, due to unknown error." I'm pretty sure pdfkit just can't do this. I wonder why Adobe posted that nonsense... – Salena
This doesn't appear to work; from_file appears to expect an HTML file as input and a PDF file as output. – Bushtit
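As the comments point out, pdfkit.from_file goes from HTML to PDF, so handing it a PDF can only fail inside wkhtmltopdf. A minimal sketch, assuming you want a clearer error than "Exit with code 1": check for the %PDF- signature that most PDF files start with before calling pdfkit. The names looks_like_pdf and html_file_to_pdf are hypothetical helpers, not pdfkit API, and the signature check is a heuristic. This does not convert PDF to HTML; it only makes the unsupported direction fail early.

```python
# Most PDF files begin with this signature; treat the check as a heuristic.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF signature."""
    with open(path, "rb") as f:
        return f.read(len(PDF_MAGIC)) == PDF_MAGIC


def html_file_to_pdf(src, dst):
    """Hypothetical wrapper around pdfkit.from_file (HTML in, PDF out).

    Rejects PDF input up front instead of letting wkhtmltopdf fail
    with an opaque non-zero exit code.
    """
    if looks_like_pdf(src):
        raise ValueError(f"{src} is already a PDF; pdfkit only renders HTML to PDF")
    import pdfkit  # imported here so the check above works without pdfkit installed
    return pdfkit.from_file(str(src), str(dst))
```

Calling html_file_to_pdf("test2.pdf", "my_html_file.html") raises ValueError immediately, while an HTML source is passed through to pdfkit (which still needs wkhtmltopdf installed).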
