Will HTML 5 validation be worth the candle?

Asked 11/1, 2009 at 13:35 Answered 27/2, 2015 at 0:18

It's widely considered that the best reason to validate one's HTML is to ensure that all browsers will treat it consistently and predictably.

The HTML 5 draft, however, contains two specifications in one. First, an author spec describing the elements and attributes that HTML authors should use, and their interrelationships. Validation of an HTML 5 page is based on this spec. The elements and attributes included are not directly drawn from HTML 4, but have needed to be justified from first principles, which means that some HTML 4 features, such as the summary attribute on <table>, longdesc on <img> and the profile attribute on <head>, do not currently appear in this draft. Such features are not considered deprecated; they are simply not included. (Their absence from the draft remains a matter of dispute, although their inclusion any time soon does not seem likely.)

Second, the draft defines a browser processing specification that seeks to define exactly how a browser's parser will treat any byte stream it's given, regardless of how well formed and valid the HTML. This means that when the browsers fully support HTML 5, it will be possible to predict how any browser will treat HTML for a much wider range of inputs than merely those that pass validation.

In particular, because HTML 5 is defined to be 100% backward compatible with today's web, all valid HTML 4, and all invalid but commonly used mark-up, will continue to be processed exactly the same as it is today, regardless of whether it is HTML 5 valid or not.

Therefore, at the very minimum, anyone using any feature from HTML 5, HTML 4, or any previous version of HTML, plus many proprietary extensions, can be confident that their HTML will get consistent and predictable treatment across all browsers.

Given this, does it make any sense to limit one's HTML 5 to that which will validate, and what practical benefit will we get from doing so?

Apprehensible answered 11/1, 2009 at 13:35 Comment(7)

"It's widely considered that the best reason to validate ones HTML is to ensure that all browsers will treat it consistently and predictably." I assume by browsers, you didn't mean IE :) There are lots of standard stuff that fail to work consistently on IE. – Depot 11/1, 2009 at 13:45

Apart from the "Q" element, what other valid HTML doesn't work correctly on IE? Note that we're not talking CSS or JavaScript here, just HTML. – Apprehensible 11/1, 2009 at 15:32

“when the browsers fully support HTML 5 it will be possible to predict how any browser will treat HTML for a much wider range of inputs than merely those that pass validation.” — Oh, such sweet, untrammelled, blissfully naïve optimism. Have fun waiting for all browsers to implement the same HTML parsing algorithm with 100% accuracy. – Institutive 6/4, 2011 at 8:32

@Paul - lol. Yeah, I'm not holding my breath. But then, it's just like any other aspect of HTML5. Many parts are implemented consistently even if others are implemented patchily or inconsistently. As web authors, we can just use the reliable bits. – Apprehensible 6/4, 2011 at 8:57

yeh, true. And Firefox actually includes an HTML5 parser now, right? – Institutive 6/4, 2011 at 9:27

So that's an idiom I haven't heard before. Where does "worth the candle" come from? – Tinware 22/7, 2011 at 20:51

@james - This seems to explain it pretty well: phrases.org.uk/meanings/260900.html – Apprehensible 26/7, 2011 at 13:31

First there’s the layer of validity corresponding to “parse errors” in the HTML5 parsing algorithm. This layer is similar to XML well-formedness. The foremost reason to avoid having errors in your documents on this layer is that you may get a surprising parse tree. If your document is error-free on this layer, you get fewer surprises to debug when writing JS or CSS that works with the DOM.
As a special case of the above-mentioned layer, there’s the HTML5 doctype: <!DOCTYPE html>. The reason why one would want to comply here is getting the standards mode in the easiest way possible. It’s something you can memorize unlike the HTML 4.01 or XHTML 1.0 doctypes you need to look up and copy and paste each time. Of course, the reason why you’d want the standards mode is fewer surprises on the CSS layer.
The main reason to care about validation on the layer higher than the parsing algorithm is catching your typos so that you spend less time debugging why your page isn’t working as you expect.
The previous point does not explain why you should care about validation when a given element or attribute that you did not misspell is supported by browsers as a matter of legacy but the HTML5 spec still shuns it. Here’s why HTML5 has obsoleted syntax like this:
- HTML5 uses obsoletion to signal to authors that some features are a waste of their time. These include longdesc, summary and profile. (Note that people disagree on whether these are, indeed, a waste of time, but as currently drafted, HTML5 makes them obsolete.) That is, if you have limited resources to improve accessibility, your limited resources are better spent on something other than longdesc and summary. If you have limited resources for semantic purity, your resources are better spent on something other than making sure you have the right incantation in profile.
- HTML5 obsoletes some presentational features that can be duplicated in CSS to guide authors to use CSS for their own good. This way, authors who don’t consider maintainability on their own are supposed to be guided to more maintainable code nonetheless. Personally, I’d prefer making more of the legacy presentational stuff conforming and leaving it to authors themselves to decide which way of doing things works for them.
- Some things are obsoleted for political reasons. The <font> element is obsoleted, because making it conforming would make anti-<font> standardistas think that the HTML5 people have gone crazy, which could lead to bad PR. <applet> is obsoleted mainly as a matter of principle of not giving special markup to one particular plug-in. The classid attribute on <object> is obsoleted, because it’s in practice ActiveX-specific.
- Some things are obsoleted on the basis of language design aesthetics. This includes the name attribute on <a> and the language attribute on <script>.
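To make the typo-catching layer mentioned above a little more concrete, here is a minimal Python sketch using the standard library's html.parser. The names KNOWN_ATTRS, TypoFinder and find_typos are hypothetical, made up for this illustration, and the allow-list is deliberately tiny. It is not meant to represent how any real validator is implemented; it only shows why a misspelled attribute that a browser silently ignores is easy for a checker to flag.

```python
from html.parser import HTMLParser

# Illustrative, deliberately tiny allow-list (an assumption for this sketch).
# A real conformance checker works from the full rules of the HTML spec.
KNOWN_ATTRS = {
    "a": {"href", "title", "rel"},
    "img": {"src", "alt", "width", "height"},
}


class TypoFinder(HTMLParser):
    """Collects warnings for element or attribute names not in KNOWN_ATTRS."""

    def __init__(self):
        super().__init__()
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag not in KNOWN_ATTRS:
            self.warnings.append(f"unknown element <{tag}>")
            return
        for name, _value in attrs:
            if name not in KNOWN_ATTRS[tag]:
                self.warnings.append(f"unknown attribute {name!r} on <{tag}>")


def find_typos(markup):
    """Return a list of warnings for the given markup string."""
    finder = TypoFinder()
    finder.feed(markup)
    finder.close()
    return finder.warnings


if __name__ == "__main__":
    # "scr" is a typo for "src": the browser just shows a broken image,
    # whereas the checker points straight at the misspelled attribute.
    for warning in find_typos('<img scr="cat.png" alt="A cat">'):
        print(warning)
```

A browser gives no hint about the misspelled attribute; the value of a checker at this layer is simply that it tells you where to look.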

(I develop the Validator.nu HTML5 validator which is also the HTML5 validation engine used by the W3C validator.)

Mosaic answered 15/1, 2009 at 13:35 Comment(2)

Thank you Henri, for a very considered and detailed answer. – Apprehensible 15/1, 2009 at 14:12

Great answer. I think parse errors are the main reason to validate. It's just fine to use some html5 in xhtml right now. Why not use an email field instead of a text field? It will just revert to a text field if the browser does not know this field type. This might make the page invalid but it gives a better result. – Sollows 13/9, 2010 at 2:31

Given this, does it make any sense to limit one's HTML 5 to that which will validate, and what practical benefit will we get from doing so?

Yes, of course. You forget that the future is not fixed. In particular, you implicitly assume that HTML 5 specs will never change, and never deprecate any features. This, of course, only cements the status quo. It is definitely desirable to remove support for some features in the long term, to make it easier for new developments to take place (in particular if these might conflict with each other).

There may be no immediate benefit in producing valid HTML 5 (except that it still makes validation and thus development easier). But there may be a long-range benefit if most websites improve in quality because it makes moving on beyond the current technologies and standards much easier.

Ellen answered 11/1, 2009 at 13:57 Comment(3)

There's an implicit assumption that features won't be dropped, yes. Browsers only ever drop features though if a) they won't break the extant web - which is extremely rare; or b) the feature is shown to be a security vulnerability - which can apply to valid and invalid HTML alike. – Apprehensible 11/1, 2009 at 15:41

This is very optimistic on your part. I fervently hope that this will not be the case because it's ultimately stupid and just means that future browser versions have to carry around more and more useless ballast. – Ellen 11/1, 2009 at 23:20

@KonradRudolph, The sad thing is that that's how the world works: #23454121 . Dang economics. – Shipboard 4/5, 2014 at 10:6

Validation has never really been about getting consistent results across browsers, even before HTML5 began. That's a myth propagated by those who don't understand what they're talking about, even if they think they do.

The real reason for validation is and always has been purely an issue of quality assurance. It's just a way of detecting errors. Even though results for any given error may be, or may soon become, consistent among browsers, it's still possible that the result itself is not as intended.

It's important for authors to be able to catch errors in their code because cleaner, error-free markup is easier to work with and maintain, especially when working in a team environment. While most individual errors may end up being benign and not cause any major problems, there are some that can give unexpected results. For example, incorrectly overlapping or unclosed elements can cause unexpected layout problems in some cases, and letting a validator tell you where the error is helps in rectifying the problem. But if the results are filled with dozens of otherwise benign errors, it can make the detection process more difficult than need be.
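As a hedged illustration of the overlapping and unclosed elements mentioned above, the following Python sketch tracks open elements on a stack with the standard library's html.parser and reports where the nesting goes wrong. NestingChecker and check_nesting are hypothetical names for this example. It is stricter than the actual HTML rules (it knows nothing about optional end tags such as </p> or </li>) and it is not a validator, just a sketch of the kind of error location a validator hands you.

```python
from html.parser import HTMLParser

# Void elements never take an end tag, so they are not tracked on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Reports overlapping, stray and unclosed tags with line numbers."""

    def __init__(self):
        super().__init__()
        self.stack = []      # entries are (tag, line) for each open element
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            line, _col = self.getpos()
            self.stack.append((tag, line))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()  # properly nested: closes the innermost element
            return
        if any(open_tag == tag for open_tag, _ in self.stack):
            # The end tag matches an outer element: everything opened
            # inside it is either overlapping or was never closed.
            while self.stack[-1][0] != tag:
                inner, line = self.stack.pop()
                self.problems.append(
                    f"line {line}: <{inner}> still open when </{tag}> was seen")
            self.stack.pop()
        else:
            line, _col = self.getpos()
            self.problems.append(f"line {line}: stray </{tag}>")

    def close(self):
        super().close()
        # Anything left on the stack at end of input was never closed.
        for tag, line in self.stack:
            self.problems.append(f"line {line}: <{tag}> never closed")
        self.stack = []


def check_nesting(markup):
    """Return a list of nesting problems found in the markup string."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    for problem in check_nesting("<p><b><i>text</b></i></p>"):
        print(problem)
```

On a page with one genuinely harmful overlap, a report like this is only useful if it is not buried under dozens of harmless ones, which is the point about keeping markup clean.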

Counterwork answered 12/1, 2009 at 12:7 Comment(0)

This is, indeed, one of my quibbles with HTML5. There's no point defining a subset of streams as 'valid', if a browser must handle all streams in the same way anyway. The eons spent on the WHATWG list debating fallback mechanisms are a massive waste of everyone's time, especially when XML should already have solved all the parsing issues.

It would have been a useful idea to produce a best-practices document on parsing legacy invalid documents, but this has no part in a web standard; it's just another ingredient to further muddy the waters around HTML5, which can't decide whether it wants to be codifying existing behaviour (like HTML 3.2 did), redefining a cleaner platform (like HTML 3.0 tried) or adding new extensions piecemeal.

Anyhow, the question may be misplaced because there will never be a browser that "fully supports HTML5". There is just far, far too much of it: browser manufacturers could not implement absolutely everything down to the minutiae even if they wanted to, which at least Microsoft explicitly do not. Instead, obviously useful features will be cherry-picked from it by vendors and meet wider acceptance.

HTML5 is not a coherent HTML specification, it's Hixie's sprawling, unreadable and unfinished recipe for every random thing he thinks a web browser should do. It will fail. And W3's alternative approach, XHTML2, has already failed. There is no coherent future direction for web standards. We have dropped the ball.

Pelvis answered 11/1, 2009 at 23:43 Comment(6)

@bobince. The WHATWG is dominated by the browser manufacturers (other than MS). Hixie doesn't put stuff into the HTML5 draft if the browser manufacturers say they won't implement it. Indeed, that's probably his number one reason for not including stuff. – Apprehensible 16/1, 2009 at 9:55

There is lots in HTML5 that has little-to-zero vendor buy-in. It's not as bad as it used to be - some of the early drafts read like little more than "the year's most popular suggestions from fools on alt.html" - but there's still a lot of spurious stuff... – Pelvis 16/1, 2009 at 12:11

...luckily some of the worst features, such as data binding (which we should have learned from IE was a terrible idea) and repetition templates (which is the ugliest proposal I've ever seen in a putative standards document) have recently bitten the dust. – Pelvis 16/1, 2009 at 12:12

Sounds to me like a drafting process that's working well. HTML 5's raison d'être is to document the web as it actually is, rather than how people would have liked it to have been. I don't see any possibility that unimplementable features will remain by the time it reaches W3C recommendation. – Apprehensible 16/1, 2009 at 16:33

If HTML5 really was just that, catching up to reality, I'd be happy. Unfortunately, it's adding reams of features and requirements at the same time - none of them unimplementable, sure (neither were even repetition templates) but some of them ugly. This should have been done as a two-stage process. – Pelvis 17/1, 2009 at 22:57

Alohci: Indeed, nowadays even the W3C Process requires two interoperable implementations for a feature to reach Proposed Recommendation. – Cajole 31/5, 2011 at 8:47

It's a good question.

The primary purpose of validation (for me at least) is to help me catch errors in my markup, and to give me a good base on which to build when testing pages in different browsers; if the markup is valid, and the page is borked in IE6, it's an IE6 issue.

The fact that browsers should all still behave in a predictable manner even if your markup includes technically invalid HTML5 such as a table summary, or an anchor accesskey, muddies the waters somewhat.

As a general rule of thumb, I'd always want my pages to validate, for the aforementioned reason. However, if (for example) an attribute was dropped from the HTML5 spec without an apparently suitable replacement being added, I might be inclined to continue using the deprecated or obsolete attribute, and accept the validation errors.

As ever, I think it's a case of knowing your craft.

If you know what you're doing, and have made a conscious decision to build a page that doesn't validate for sound reasons, it's not a problem. If you're just writing code that doesn't validate because you don't know any better, that's another matter entirely.

Stephen

Titanomachy answered 11/1, 2009 at 21:49 Comment(0)

W3C HTML5 validator maintainer here. I recently wrote a short “Why Validate?” section for the “About” section of the HTML5 validator:

http://validator.w3.org/nu/about.html#why-validate

The source for the text of that section is here:

https://github.com/validator/validator/blob/master/site/nu-about.html#L160

And pull requests with suggested refinements/additions are welcome.

What I have there currently is this:

The core reason to run your HTML documents through a conformance checker is simple: To catch unintended mistakes—mistakes you might have otherwise missed—so that you can fix them.

Beyond that, some document-conformance requirements (validity rules) in the HTML spec are there to help you and the users of your documents avoid certain kinds of potential problems. To explain the rationale behind those requirements, the HTML spec contains these two sections:

- rationale for syntax-level errors
- rationale for restrictions on content models and on attribute values

To summarize what’s stated in those two sections:

There are some markup cases defined as errors because they are potential problems for accessibility, usability, interoperability, security, or maintainability—or because they can result in poor performance, or that might cause your scripts to fail in ways that are hard to troubleshoot.

Along with those, some markup cases are defined as errors because they can cause you to run into potential problems in HTML parsing and error-handling behavior—so that, say, you’d end up with some unintuitive, unexpected result in the DOM.

Validating your documents alerts you to those potential problems.

Deucalion answered 27/2, 2015 at 0:18 Comment(0)
