Can I do this conversion with any programming language or library?
The short answer is yes, it can be done in any programming language.
Basic steps:
- Convert your HTML to XHTML (+ CSS). This can be done in your program or through an XSLT file.
- Copy your files (XHTML, CSS, any images and fonts) into a directory structure that follows the EPUB format.
- Zip the directory structure up and name the archive with a ".epub" extension. The "mimetype" file must be the first entry in the archive and must be stored uncompressed.
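The zipping step above can be sketched in Python with only the standard library. This is a minimal sketch, not a complete converter: it assumes the source directory already holds a valid EPUB layout (META-INF/container.xml, the package file, and your XHTML/CSS), and the function name build_epub is made up for illustration. It writes the "mimetype" entry first and uncompressed, then adds everything else.

```python
import zipfile
from pathlib import Path


def build_epub(source_dir, epub_path):
    """Zip an already-prepared EPUB directory tree into epub_path.

    Assumption: source_dir already contains META-INF/container.xml,
    the package (.opf) file, and the XHTML/CSS/image files.
    """
    source = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr(
            "mimetype",
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        # Add every other file, keeping paths relative to the source root.
        for path in sorted(source.rglob("*")):
            arcname = path.relative_to(source).as_posix()
            if path.is_file() and arcname != "mimetype":
                archive.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

For example, build_epub("my_book", "my_book.epub") would package a directory named my_book (a hypothetical name). Running the result through a validator such as epubcheck is still a good idea, since this sketch does not check the contents.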
Some web sites to help you get started:
- A good tutorial for what's in an epub file (and how to create one yourself) can be found here: http://www.jedisaber.com/eBooks/Introduction.shtml. I used this to get started myself.
- Specs for the .epub standard are here: http://www.idpf.org/
- A validator for .epubs can be downloaded from here: https://github.com/IDPF/epubcheck
June 2015 Note: The epubcheck validator has moved from Google Code to GitHub; note the new URL.
Calibre supports a wide variety of input formats, including HTML, and a wide variety of output formats, including EPUB, but it's not "a programming language or library". Are there specific reasons you desire a programming-based approach rather than a free-standing tool? If so, maybe Python and ebookmaker.py, for example, could help you.
A late reply, but I found the Python 3-based ebookmaker to be of value, at least after I contributed a pull request to remove a UTF-8 BOM. One problem with it appears to be that it uses brittle regular expressions to parse HTML, but I guess I'll have to report it there.
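For anyone hitting the same BOM problem in their own scripts, here is a minimal sketch of one way to handle it in Python 3. It is not ebookmaker's actual code, and the function name decode_without_bom is made up for illustration. It relies on the standard "utf-8-sig" codec, which decodes UTF-8 and drops a leading byte order mark if one is present.

```python
def decode_without_bom(raw: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading byte order mark if present."""
    # "utf-8-sig" behaves like "utf-8" when there is no BOM.
    return raw.decode("utf-8-sig")
```

When reading from a file, passing encoding="utf-8-sig" to open() has the same effect.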
Here's PDF to EPUB. I know that's not what you're after, but it's a start.
The calibre package may have what you want.
I am using the following library from Aspose - http://www.aspose.com/categories/.net-components/aspose.words-for-.net/default.aspx
In just two lines of code I am able to do HTML to EPUB conversions. I am currently using this in a production system.
using Aspose.Words;
// Load the source HTML file, then save it in EPUB format.
Document doc = new Document(_sourceFilePath);
doc.Save(_destinationFilePath, SaveFormat.Epub);
I just started to implement such a tool in Java (OpenJDK compatible): html2epub. To avoid editing the config file by hand, I'll probably start a separate tool that generates the config file from any given directory. The order of the XHTML files in the EPUB would still have to be determined somehow: for non-programmatic use, a GUI helper tool could be considered; for a fully flexible programmatic solution, I haven't come up with an idea yet. Before that, I implemented shell-script-based converters for custom XML input (the hag2epub tools). If you're interested, I would probably port them to XHTML input, with the EPUB metadata coming either from a config file or from the topmost index.html of a directory, if one exists.
I had the same issue previously, because I wanted to read some web page content offline on my iPad. I had no idea how to do it, and I am not computer savvy. There are Calibre, Stanza, and so on.
But for me they are just format converters, and I needed an ePub book creator that would let me combine many documents into one book to read. Then I found a bookish html to ePub converter: I save the HTML page from the web, then convert it with that tool. It's quite a good tool for me now.