Convert PDF to HTML [closed]
A

4

20

What is the best way to convert PDF documents so they can be viewed in the browser as HTML? The site has several PDF documents, and a visitor should be able to click "View as HTML" and see the document rendered on screen as an HTML page.

Standard website running PHP, Linux, Apache.

Ancel answered 5/6, 2009 at 15:29 Comment(1)
Have you looked into pdf.js? mozilla.github.io/pdf.js - Glynas
C
6

pdftohtml works fine: it is fast and stable, but the HTML it produces is ugly at best. I used it for quite some time on a website that hosts many job resumes.

It is a good solution for extracting textual content, however.
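
As a rough illustration of the text-extraction use case, here is a minimal Python sketch that pulls the visible text out of an HTML page such as the one pdftohtml writes. The names `TextExtractor` and `html_to_text` are made up for this example, and the sample markup is an assumption about what the generated HTML roughly looks like, not actual pdftohtml output.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("br", "p"):
            # Treat line breaks and paragraphs as newlines in the output.
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace (including non-breaking spaces)
        # within each line and drop empty lines.
        raw_lines = "".join(self._chunks).splitlines()
        lines = (" ".join(line.split()) for line in raw_lines)
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Return the visible text of an HTML string, one line per <br>/<p>."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For a resume site, the resulting plain text could then be indexed for search while the original PDF stays available for download.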

I would give the Scribd API a try, or the Google Apps document API. Google does a great job of displaying and converting PDF files.

Coterie answered 12/2, 2010 at 6:1 Comment(3)
For scientific papers, this looks unbelievable: github.com/coolwanglu/pdf2htmlEX - Felafel
@Felafel one caveat: the resulting HTML code is unreadable, generally uneditable, and takes megabytes of space, at least for the PDF I tested it on (2.8 MiB HTML for a 674.5 KiB PDF). That size makes it particularly bad for serving and makes for a poor reading experience (sluggish scrolling, etc.). - Kala
@Felafel the result looks good, but the HTML is basically useless: it breaks words apart, encloses each part in various tags, and extracts and embeds fonts for each size (of the same font), making the file huge (as Ruslan said). You're better off converting the PDF to a PNG image than using pdf2htmlEX. - Northampton
P
4

Have you considered keeping the PDF data in a database and then dynamically creating either the PDF or the HTML page, depending on what the visitor selects?

Project answered 5/6, 2009 at 16:8 Comment(0)
N
4

If you have command-line access at your hosting provider, there is a utility called pdftohtml in the poppler-utils package.

http://poppler.freedesktop.org/

It looks quite easy to use. I have not called it from inside PHP, but it should work.
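
To give an idea of what calling the tool from a script might look like, here is a minimal Python sketch (the same command line could be run from PHP's `exec` with arguments passed through `escapeshellarg`). The function names and the timeout are assumptions made for this example. The flags `-s` (single document), `-noframes`, `-q` (quiet) and `-i` (ignore images) are pdftohtml options, but the output file name assumed below is worth checking against the installed version.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftohtml_command(pdf_path, out_base, ignore_images=False):
    """Build the argument list for pdftohtml (hypothetical helper)."""
    cmd = ["pdftohtml", "-s", "-noframes", "-q"]
    if ignore_images:
        cmd.append("-i")
    cmd += [str(pdf_path), str(out_base)]
    return cmd


def convert_pdf(pdf_path, out_dir):
    """Run pdftohtml on one PDF and return the expected HTML path."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found; install poppler-utils")

    pdf_path = Path(pdf_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_base = out_dir / pdf_path.stem

    # Passing a list (no shell) keeps odd file names from being
    # interpreted as shell syntax.
    subprocess.run(
        build_pdftohtml_command(pdf_path, out_base),
        check=True,   # raise CalledProcessError on a non-zero exit
        timeout=60,   # assumed limit so a bad PDF cannot hang a request
    )

    # Assumption: with -noframes the result is written to <out_base>.html.
    return out_dir / (pdf_path.stem + ".html")
```

Since conversion can be slow for large files, it is probably better to convert each PDF once and serve the cached HTML rather than converting on every click.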

Northernmost answered 5/6, 2009 at 17:18 Comment(1)
pdftohtml doesn't preserve style. - Northampton
T
1

If you are prepared to call Java from PHP you could have a look at http://www.jpedal.org/html_index.php

Tergiversate answered 17/1, 2012 at 8:0 Comment(1)
Yeah, with a yearly $3000 USD license... - Beldam
