Use of XSL-FO, CSS3 instead of CSS2 to create Paginated documents like PDF?

Many older texts, like this 2002 book, state that we must use "CSS for the Web" and "XSL-FO for print". I think that nowadays (2012) we can finally use CSS with rendering engines that understand the paged media features of CSS2 and some of CSS3... But where are the "new texts", the consensus among programmers, and the investment from software houses?

XSL-FO, or "XSL Formatting Objects" (a W3C standard), has been the most widely used technology for generating PDF documents from XML or XHTML content. Version 1.1 of XSL-FO was published in 2006, and version 1.0 in 2001.

CSS2.1 is from 2011, but CSS2.0 is a 1998 standard, revised in 2008... I think the age of the standards is not a problem. CSS with HTML, XHTML or XML has "the power of print": see tools like PrinceXML, the WebKit print module (or wkhtmltopdf), ABCpdf and others.

Choosing between CSS and XSL-FO: with CSS2 you can fit the text exactly to the paper page, etc. The choice is not a matter of pagination, multi-column layouts, footnote placement, running headers, or page margins... Both CSS (paged media) and XSL-FO are good standards for doing this.
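To make the comparison concrete, here is a minimal sketch of a print stylesheet covering margins, running headers, page numbers, page breaks and columns. The selectors, sizes and header text are illustrative assumptions, not taken from any real project, and support for the margin boxes (`@top-center`, `@bottom-right`) varies between rendering engines. Footnotes are not shown.

```css
/* Illustrative print stylesheet; all values and class names are assumptions. */
@page {
  size: A4;                      /* paper size */
  margin: 25mm 20mm;             /* page margins */

  /* Running header and page numbering via CSS3 paged media margin boxes */
  @top-center   { content: "Running header text"; }
  @bottom-right { content: "Page " counter(page) " of " counter(pages); }
}

/* A larger top margin on the first page only */
@page :first { margin-top: 50mm; }

/* Pagination control (CSS2.1 properties) */
h1 { page-break-before: always; }  /* each chapter starts on a new page */
h2 { page-break-after: avoid; }    /* keep a heading with the text after it */
p  { orphans: 3; widows: 3; }      /* minimum lines left at a page bottom/top */

/* Two-column body text (CSS3 multi-column layout) */
.body-text { column-count: 2; column-gap: 8mm; }
```

Whether the output matches the paper page exactly still depends on the rendering engine that processes the stylesheet.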

PS: there are some related questions/answers in this context, about the WebKit transform, converting with PHP, and generating PDF from HTML. None of them has a good answer to the question presented here.

Polygon answered 17/5, 2012 at 18:43 Comment(8)
CSS2.1 was not from 2011; it was only made a W3C Recommendation in 2011, and has been around since CSS2 was revised (to, you guessed it, CSS2.1). - Dituri
XSL-FO is a vocabulary for exactly describing the presentation and placement of print elements. CSS describes the styling of HTML. If you are starting with something which isn't HTML it's probably fairly tricky to transform it to HTML+CSS to achieve the exact desired result. - Eulogize
CSS is much easier to develop, but XSL-FO employs the powerful XSLT engine. So the choice should be made based on whether or not your data is de-normalized for visualization. If it is normalized, you need an extra de-normalizing step, which is XSL. If the data is already de-normalized for this particular output document, I would just use CSS. - Analog
And why in the first place would you want to "transform HTML into good PDF" when there is CSS2? Looks like you're asking another question that you should try and make more obvious. Do you require PDF for professional printing? What are the requirements you summarize as "good PDF"? TOC? Page references? Or just something that downloads in just one file, more or less seamlessly across browsers, and prints more or less OK? - Timmy
Alain, because many of us WANT to "transform HTML into good PDF". We have HTML documents, reports, etc. that we'd like to deliver using PDF. IE has made this possible for YEARS with a very simple use of a containing table and thead/tfoot. If HTML is to be the document standard moving forward, as the world seems to be heading toward, print needs to be addressed, and the most basic aspects of print are pagination, margins, headers and footers. They're being largely ignored, even with CSS3 (browser implementations anyway). - Simplicidentate
BoltClock and Marcin: the question text was edited in 2012 in response to your comments. Alain and rainabba: please check the CSS3-page and PrinceXML "print powers" to understand the context (CSS3 does all that XSL-FO does), and see my new post below, which answers and updates more details. - Polygon
About W3C standards: after all these years (CSS2-page has been nearly the same since 1998!) there is no CSS3-page, but perhaps CSS2.2-page is coming... So standardization is still the main problem today. - Polygon
Oops, about W3C standards: it is coming! The old "css-page" is replaced by "css-break", and "paged media" by "fragmentation"... It is now a Candidate Recommendation, w3.org/TR/css-break-3 - Polygon

Thanks for all the comments and answers!

Now, in 2014, more than 1.5 years after my post (May 17 '12), it is time to consolidate: no answer was, for me, a "full answer", but all answers (see Nenotlep's and Alex's) contributed to the big picture. My main motivation for consolidating now is @mzjn's news (here) of 2013-11.

XSL-FO is officially dying

On Sat, 2013-11-02, Liam R. E. Quin, W3C XML Activity Lead, wrote about the discontinuation of XSL-FO 2.0: "We have closed the Working Group because not enough people were taking part" (see a better copy here).

The last update of the Working Draft was in January 2012, and it is now confirmed: W3C has stopped developing XSL-FO 2.0.

Why? It will be replaced by CSS3-page, see below.

PS: to discuss the "official statement", use https://mcmap.net/q/47144/-xsl-fo-is-xsl-fo-dead-technology-and-only-used-by-niche-companies-closed

CSS3 is officially growing

The CSS3-page standard is still a draft, but applications like PrinceXML v9 and AntennaHouse Formatter v6 have demonstrated that it is ready (!), and the launch of HTML5, expected in 2014, brings with it the forecast release of CSS3.

So, I understand that for W3C, CSS3-page does all that we need to express good prints and good PDF.

Other motivations

One day, in the far future, PDF will die (it is complex and is not part of the XML family or of W3C's investments), and many claim that EPUB will replace it. This is another good motivation: tablet readers and PC browsers will print HTML, XHTML and EPUB as well as they print PDF, so PDF will no longer be necessary. On that day, the only standard needed by, e.g., the WebKit printing project will be the CSS3-page standard.

CSS3 is the key to two strategic goals: 1) generating good PDF from XML or HTML content; 2) replacing PDF.


NOTE: further 2014 updates to the links in the question: wkhtmltopdf is now here. As for "new texts", we now have many; see e.g. Building Books with CSS3.



An updated answer for programmers to this page's question, "Why use XSL-FO instead of CSS2, for transform HTML into good PDF?"

If you are going to implement a new XML publishing system, there is no good reason to use XSL-FO. SUMMARIZING:

  • XSL-FO is a dead technology today, used only by niche companies to maintain legacy systems at big publishers like Elsevier... Most writers/readers of Stack Overflow are from small and medium-sized companies. Companies like O'Reilly Media, Inc. already use CSS3 for print.

  • CSS3 will replace CSS2, covering all the gaps (and fears, such as @AlexS's) of CSS2.

  • today (2014), as you can check via Google or my links (see PrinceXML v9 and AntennaHouse Formatter v6), we have some good software for rendering content with CSS2 or CSS3.

  • as @bytebuster says, "CSS is much easier to develop" (and easier to learn!).

  • as I said above, CSS3 is not isolated: it is part of the "XML/HTML/SVG" family.

  • it is much cheaper to develop "HTML+CSS templates" (the hourly cost of an ordinary web designer doing a simple task) than "XSL-FO templates" (the hourly cost of a rare professional doing a complex task).

  • ....



News...

Jan'2016, the definitive CSS3 standard is coming!

About W3C standards: the old "css-page" was replaced by "css-break", and "paged media" by "fragmentation"... It is now a Candidate Recommendation, see https://www.w3.org/TR/css-break-3
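As a small illustration of the vocabulary shift, here is a sketch using the generic fragmentation properties defined by css-break-3, which apply to pages, columns and regions alike. The selectors (including the `chapter` class) are illustrative assumptions, and actual support differs between engines.

```css
/* Fragmentation properties from css-break-3; selectors are assumptions. */
figure, table, pre { break-inside: avoid; }  /* do not split these across fragments */
h2, h3             { break-after: avoid; }   /* keep a heading with what follows */
section.chapter    { break-before: page; }   /* force a page break before a chapter */

/* Repeat borders and padding on each fragment of a box that does get split */
blockquote { box-decoration-break: clone; }
```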

Apr'2020, Blimey, +4 years and nothing!... Ok, need more tests

A total of 8 years since the question was posted, and 4 years since the "css-break-3 finished!" announcement...

Chrome was the first to finish, in 2019, but something went wrong in the W3C test validation team, and in 2020 it went back... Now the status (across 23 tests) is:

  • Chrome's Blink engine fails 1 test;
  • Firefox's Gecko engine fails 3 tests.

The draft is now here, and the tests are here.

Polygon answered 17/5, 2012 at 18:43 Comment(11)
Those are good points, but the reality is that Apache FOP seems to be the only available library for generating good PDFs. All those commercial products cost money, and for most websites paying money for a small piece of functionality is not an option. And HTML can't replace PDF right now or in the near future. So XSL-FO seems to be the only viable solution on the market. - Contradance
Prince and others are free for personal use. For non-personal use, even FOP is not an option: it is not "100% professional" and does not generate "such good PDFs"; it cannot be used for a competitive magazine or in a publishing house... It needs some adaptations or has to be replaced by proprietary software. So the question today is not about the use of free software, but about the use of "free standards". XSL-FO is for "Adobe's ecosystem"; CSS3 is for "W3C's ecosystem", which is free and interoperable with free software and other free standards. - Polygon
"I understand that for W3C, CSS3-page does all that we need to express good prints and good PDF" - I disagree. I've just done a layout job in CSS; compared to FO there still are some omissions. There's no way to insert more than one text block in one header or footer block, for instance. You also have fewer options for conditional formatting: you can't say "do X if the previous paragraph had property Y". - Felicita
Hmm... But it depends on the tool that does your "layout job in CSS". Did you use something more professional, like Prince? About "Turing-complete" logic: the intent of using CSS is to separate concerns (!). JavaScript is a standard of the CSS/XML/HTML ecosystem: you can use XSLT (or XQuery etc.) for pre-processing, or JavaScript for run-time processing. - Polygon
The example I mentioned (only one text block per header/footer block) seems to be a limitation of the specification, not a feature that's missing from my rendering engine (Antennahouse). Another example is keep-with-next: in CSS, this is a Boolean; in FO you can assign priorities. - Felicita
@PeterKrauss - Why this? - [ANN] Apache FOP 2.0 Released - markmail.org/message/s5jox57bbhtkd5o6 - Cas
@AlexS, Apache FOP 2.0 is software that implements version 1.1 of the standard. This discussion is about the W3C XSL-FO 1.1 standard, which will never become 2.0. - Polygon
@PeterKrauss - :) I noticed that a lot of the pagination stuff has been absorbed into CSS3, so why would Apache release a new FOP now? That's what I was asking... Why? Why would they bother? Or is FOP going to be made CSS3 pagination compliant to pull in and push out PDFs? - Cas
@AlexS Sorry ;-) Well, they have a niche, a lot of investment was made in the last 10 years, and Apache contributors also come from this still-living niche; see e.g. 3B2. - Polygon
@PeterKrauss - Yeah. We were some of the pioneers of developing an enterprise-wide system using XSL-FO in 2003. The best and only open standards option at the time, compared to a lot of proprietary stuff. I think enterprise systems will continue to use it for some time to come. - Cas
More news: Blink (Chrome's engine) is the first to pass all CSS-break tests (!), so we need to wait for Firefox before the Recommendation is released.... test.csswg.org/harness/results/css-break-3_dev/grouped - Polygon

Updated 01.10.2015

I used to do CSS to PDF (wkhtmltopdf) and XSL-FO to PDF and I prefer CSS, but there are lots of issues with it. IMO the best CSS/HTML to PDF renderer is wkhtmltopdf, but it has tons of problems like print-quality material issues, page breaking issues, CMYK coloring, exact positioning and fullscreen rendering.

Requirements like "move that box 1.8mm to the right and up so that it touches the top of the paper" and "we need the last page to be a 100% wide marginless table" are both quite doable in XSL-FO, but in CSS they are too frightening to even consider. In some cases CSS just doesn't cut it, because good enough software to render it doesn't exist even if the properties do. Even wkhtmltopdf (0.11, not sure about later) uses XSLT when rendering the TOC and doesn't really support @page.
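For reference, this is roughly how the "marginless last page" requirement reads in CSS paged media terms. It is only a sketch of what the specification allows: the page name `fullbleed` and the class `last-page` are made-up names, and a renderer that does not really support @page will not honor it.

```css
/* Named page with no margins; "fullbleed" is a hypothetical page name. */
@page fullbleed {
  margin: 0;
}

/* Put the final table on that page type and let it span the full width.
   "last-page" is a hypothetical class on the table element. */
table.last-page {
  page: fullbleed;      /* switching page type starts a new page */
  width: 100%;
}
```

So the markup exists on paper; the difficulty described here is finding software that renders it reliably.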

I can't speak for PrinceXML as although it looks great I know in advance that the price tag would be impossible so it's not an option - I suspect this is true for a lot of developers and companies.

If there were better software to do the rendering and more users, I really do think CSS would usually be the better option, as it's so much nicer to write (both the CSS and the source (X)HTML) and there are tons of editors out there. It's a bit like the old Linux vs Windows debate - IMO Linux is nicer to use but lacks the software, existing expertise and support that is often required.

And to echo the comments, source material is always an issue with CSS. CSS for XML is a bit uncharted territory and just about everything everywhere is XML. Unfortunately. I have a severe dislike for XML even though it's practically much more usable than (X)HTML.

Mendie answered 18/7, 2012 at 10:41 Comment(4)
Also, "with CSS2 you can fit the text exactly to the paper page" - that I think really depends on your rendering engine. With wkhtmltopdf I have tried to do this a few times and I have failed every time. - Mendie
Hello, thanks a lot! Some checks: "IMO"="in my opinion"? Can you try PrinceXML? There is a free download (it is a good reference for CSS2PDF usability), and you can generate a blank first page and cut it on Linux with pdftk. For personal use it was my best choice. - Polygon
I'm actually contributing to wkhtmltopdf (rainabba) lately in the hopes of using it to replace a buggy, overly-complicated and EXPENSIVE solution, but headers/footers are still a major oversight which I hope to try and address. Only IE allows for decent control so far. I keep hearing that FF has some potential too, but (FF, really?) - Simplicidentate
@Simplicidentate I see your two ducks and raise you one thanks: everything driving wkhtmltopdf forward is a great thing :-)! Lately there have been a few commits to antialize's repo, which is great too. I think I even saw someone do a new Windows compile a while back on the mailing list. For me the biggest issues are page breaking, advanced positioning, text quality and image support. Although the image support is quite good, there are some vector formats that browsers in general don't like and I would like to see that change, but I might have to wait a while for that. - Mendie

One possible reason for banking on CSS rather than XSL-FO in the future is that the XML Print and Page Layout Working Group at W3C is no longer active. There was not enough interest to sustain this working group. The group published an XSL 2.0 working draft in early 2012, but now it seems quite unlikely that an updated W3C recommendation will ever emerge.

There is a very recent thread on the XSL-List mailing list about the reasons for closing the working group and about the future of XSL-FO vs. CSS. See http://markmail.org/thread/65j2ah2kulcp35fm.

And by the way, even though this is an interesting topic, I'm not sure if the question is a good fit for Stack Overflow. IMHO, it is more of an open-ended invitation to discuss something rather than a question about a specific, practical, answerable problem.

Farriery answered 3/11, 2013 at 18:32 Comment(1)
There is a Print and Page Layout Community Group at W3C. This is an open forum that is something of a continuation of the defunct Print and Page Layout Working Group. - Farriery

I agree with some of what has been posted by @Nenotlep. But I am not sure if CSS markup is yet as extensive for Paginated documents as XSL-FO. But I would not know that.

I also added this part to his answer because I was unable to "comment" on the answer.

There is some history to the whole issue.

Additionally, XSL-FO is rich, and its learning and burn-in curve over the last 10+ years of FO rendering has given it quite a long time to get "more" things ironed out.

I was responsible for proof of concept and prototyping an Enterprise wide XML Content related system for a Fortune 20 back in 2003.

One of the pieces of that system had to render PDF, Word, X/HTML versions of documents on the fly as people changed, added & modified content XML.

Even XSL-FO > PDF and to Word-ML had a bunch of teething issues at the time.

These were inherent due to the following reasons:

  • Original and new goals and capabilities of the Markup & Styling languages
  • Ability & Limitations of the Final Rendering Component to accurately represent the given markup (i.e. XSL-FO to PDF Component or X/HTML to Screen via Web Browser)

It has been 10 years since I have been frequently hands on with XSL-FO / HTML/ CSS but the above issues were interesting to discuss with the Gods of XML/ XSL world at the time (Dave Pawson, Michael Kay, Wendell Piez etc.)

It is quite possible that all representative markup that XSL-FO had over CSS for Paginated output, is now (2013) possibly replicated in CSS3 and is rendered appropriately.

I hope this helps.

2017 Edit:

Apparently CSS is still playing catch up in some ways and I remember having most of this in 2003 - That is 14 years and in web tech that's an eon too slow :) .

https://twitter.com/t_machine_org/status/917025348646199297

Cas answered 4/10, 2013 at 12:14 Comment(3)
Hello Alex, thanks for the historical perspective (!). Your 10 years of experience are an important reference for a fuller discussion: maybe you can improve your text and indicate tool-names, companies, and dates (there are no enemies)... Well, there is an "open war", and today Adobe is winning. InDesign is so popular that every "PDFdesigner" knows it. Adobe InDesign is closed: it does not export good XML or XHTML+CSS2, and does not import standard XHTML+CSS2 or XML+CSS2 layouts; Adobe avoids supporting good ePUB or "full and open XML-publishing". - Polygon
I believe so, "all representative markup that XSL-FO had over CSS for paginated output, is now (2013)" replicated in CSS3, but, perhaps due to Adobe's lobbying, CSS3-page as of March 2013 is only a draft... Today, if your "PDFdesigner" can code, or if for templates and automation PDF designers and programmers are merged into one team, you can use PrinceXML; otherwise the only "standard and inexpensive" tool is a pirated InDesign. - Polygon
Do elaborate on what you mean by this: tool-names, companies, and dates (there are no enemies). What tool names do you want? The slight difference here is that we used XML data + XSLT (containing XSL-FO markup) to create .FO (XML data inside XSL-FO markup), which was fed to an XSL-FO (.FO) to PDF engine, e.g. RenderX XEP etc. We had 3 different flavors of XSLTs: common logic for XML parsing, but different output markup - HTML, FO, Word ML. We were hand coding using XMLSpy; no WYSIWYG. Now, StyleVision and other tools exist. PS: I've got no experience with InDesign (is it like Dreamweaver?) - Cas

As far as I know you cannot generate SVG charts or SVG barcodes with CSS.

Capping answered 13/7, 2012 at 8:53 Comment(1)
False assertion. PrinceXML (and, I think, wkhtmltopdf and others) CAN transform (HTML+SVG+CSS) or (XHTML+SVG+CSS) or (XML+SVG+CSS) into good PDF. - Polygon
