Convert pdf, doc, ppt to html5 [closed]
S

6

39

I've googled (without any luck) for open source software that can convert DOC, PPT, and PDF files to HTML5, which is exactly what Scribd does. Are there open source equivalents to the type of conversion Scribd does?

If anyone knows of a paid service, that would also work. Scribd has an API, but that's for use with the Flash viewer. Also, I would like to host my own content, as I need further control over the converted HTML documents.

Skindeep answered 7/7, 2010 at 23:28 Comment(0)
C
15

You're unlikely to find a single offering that does all this, especially in the open source world. It's more likely that you'll end up relying on a mishmash of tools, and you may even need to chain some converters in order to get to HTML (e.g. PDF -> PS -> HTML).

OpenOffice supports conversion to HTML, and can be called from the command line.
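As a rough sketch of the command-line route, the snippet below builds and runs a headless conversion. It assumes a LibreOffice-style `soffice` binary on the PATH that accepts `--headless`, `--convert-to`, and `--outdir`; older OpenOffice builds may need different options. The function names and file names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, binary="soffice"):
    """Return the argv list for a headless conversion of `source` to HTML.

    Assumption: `binary` is a LibreOffice-style launcher that understands
    --headless, --convert-to and --outdir.
    """
    return [
        binary,
        "--headless",            # do not open a GUI window
        "--convert-to", "html",  # target format
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_to_html(source, out_dir, binary="soffice"):
    """Run the conversion and return the path where the HTML is expected."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} was not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(source, out_dir, binary), check=True)
    return Path(out_dir) / (Path(source).stem + ".html")
```

Calling `convert_to_html("slides.ppt", "html_out")` would, under those assumptions, leave the result in `html_out/slides.html`. The quality of the generated markup depends entirely on the office suite's export filter.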

http://pdftohtml.sourceforge.net/ looks reasonably good at converting pdf to html.
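A minimal sketch of driving pdftohtml from a script, assuming the `pdftohtml` binary is installed and supports the `-c` (complex output) and `-noframes` options. The helper names are hypothetical, and the exact set of output files varies between pdftohtml versions.

```python
import subprocess


def build_pdftohtml_command(pdf_path, out_base, complex_output=False):
    """Return the argv list for a pdftohtml run.

    Assumption: the installed pdftohtml accepts -noframes and -c.
    """
    cmd = ["pdftohtml", "-noframes"]  # skip the frameset wrapper page
    if complex_output:
        cmd.append("-c")  # "complex" mode: tries to preserve the page layout
    cmd += [str(pdf_path), str(out_base)]
    return cmd


def pdf_to_html(pdf_path, out_base, complex_output=False):
    """Run pdftohtml and raise CalledProcessError if it fails."""
    subprocess.run(
        build_pdftohtml_command(pdf_path, out_base, complex_output),
        check=True,
    )
```

Simple mode tends to give plainer markup, while complex mode keeps more of the layout at the cost of heavier output, so it is worth trying both on a sample file first.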

For Word documents in WordML or Open XML format, it's conceivable that you could use XSLT transforms, since both the input and output formats are XML. I've seen some stylesheets floating around the net that do this, but YMMV.

Incidentally, why is there a specific requirement for open source? MS PowerPoint already supports save-as-HTML, for example.
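To illustrate why XML input makes this tractable, here is a small sketch that is not XSLT: it swaps in a plain ElementTree walk over `word/document.xml` inside a .docx archive and emits one `<p>` per paragraph. It assumes an Open XML (.docx) file and ignores styles, tables, images, and lists; `docx_to_html` is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML main namespace used by .docx documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_html(docx_file):
    """Return a bare-bones HTML page with one <p> per non-empty paragraph.

    `docx_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over several <w:t> runs; join them.
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        if text:
            paragraphs.append(f"<p>{escape(text)}</p>")

    body = "\n".join(paragraphs)
    return f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>"
```

A real stylesheet or converter would also have to map formatting, headings, and embedded media, which is where most of the effort goes.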

Claudell answered 3/9, 2010 at 7:15 Comment(2)
The old version of PowerPoint used to export a series of images when converting content like PDFs to HTML. This may have changed, as I haven't tried it in a while. Lynnelle
By default, pdftohtml produces a single HTML page, but it doesn't look very nice. The complex output mode gives nicer HTML, but it creates a separate HTML document for every page of the PDF, which might not be feasible for a large number of PDF files. Farceuse
L
5

OpenOffice will convert PDF to HTML, but you'll take a hit to design quality.

I suggest either Crocodoc as a paid service (it provides API libraries for different platforms such as Python, Ruby, Java, and PHP, so developers can work with its API) or waiting for an official Adobe tool (it's in the works).

Lynnelle answered 18/2, 2011 at 21:42 Comment(0)
C
3

For PDF to HTML conversion, pdf2htmlEX seems like a pretty good tool (looking at all the examples/samples):

https://github.com/coolwanglu/pdf2htmlEX

Cactus answered 2/10, 2013 at 20:56 Comment(0)
R
1

http://wvware.sourceforge.net/

wvHtml: converts your Word document into HTML 4.0.

Possibly http://www.abisource.com/ (AbiWord), but in that case it looks like a manual "open doc" > "export HTML" step; maybe plugins help. I'm not sure what you mean by "source software that can convert".

Or this: http://www.zope.org/Members/sf/NuxDocument

pdftohtml will also give you an HTML page as output, but you will have to work on its graphical interface, since it doesn't seem to be very interactive.

Rondure answered 7/7, 2010 at 23:28 Comment(0)
F
1

For PDF, there is an open source project started by Mozilla, and it's very good: https://github.com/mozilla/pdf.js/

You can see a hello world example here: https://github.com/mozilla/pdf.js/tree/master/examples/helloworld

For the other document types, I think LibreOffice said they are planning to build something in HTML5, but so far nothing has been done.

Fearsome answered 11/6, 2013 at 8:10 Comment(0)
S
-1

I know the question is a bit old, but I have found a new open source tool called FlexPaper: http://flexpaper.devaldi.com/

Septillion answered 27/9, 2013 at 9:16 Comment(1)
OK, it WAS open source; now they charge for the service. Things changed over the years. Septillion
