Where is the HTML5 Document Type Definition?
L

5

73

The "old" HTML/XHTML standards have a DTD (Document Type Definition) defined for them:

HTML 4.01 http://www.w3.org/TR/html401/sgml/dtd.html
XHTML 1.0 http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Strict

These DTDs specify the rules for nesting elements - "which types of elements may appear in which types of elements". I made a diagram for XHTML 1.0 here (sorry, I no longer have that resource)

I would like to update that diagram with a new version which also includes the new HTML5 elements. However, there doesn't seem to be an HTML5 DTD. It seems that the nesting rules are defined by the various content models that are defined in HTML5.

So there is no DTD, correct?

Follow-up question: Is there a reason why there is no DTD in HTML5? The DTD is such a nice method of defining the nesting rules for all the different types of elements. Why wouldn't they include such a thing?

Update: I found this: http://www.w3.org/TR/html5/dom.html#kinds-of-content I guess this is the closest to having a DTD.

Update: The Visual Studio Team made an XML Schema for XHTML5. I guess that answers my question: Link

Lenora answered 29/10, 2010 at 16:31 Comment(5)
The link http://vidasp.net... redirected me hereMenarche
@pythonforspss.org Yes. I no longer have that domain. I've removed that link. Thanks for informing me.Prakash
Same for XSD: #5638866Tabret
This is not an answer, but if you're still interested in the question, you might be interested in this: github.com/unsoup/validatorDetrital
Keep in mind that the ‘HTML Living Standard’ is not a ‘standard’ in the traditional sense at all; it might more usefully be understood as a sort of crowd-sourced documentation describing what the ‘major’ browser engines generally aim to support at the current moment in time.Untouchable
S
68

There is no HTML5 DTD. The HTML5 RC explicitly says this when discussing XHTML serialization, and this clearly applies to HTML serialization as well.

DTDs have been regarded by the designers of HTML5 as too limited in expressive power, and HTML5 validators (basically the HTML5 mode of http://validator.nu and its copy at http://validator.w3.org/nu/) use schemas and ad hoc checks, not DTD-based validation.

Moreover, HTML5 has been designed so that writing a DTD for it is impossible. For example, there is no SGML way to capture the HTML5 rule that any attribute name that starts with “data-” and complies with certain general rules is valid. In SGML, attributes need to be listed individually, so a DTD would need to be infinite.
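
To illustrate the point about “data-” attributes: the set of valid names is described by a pattern, not by a list, which is easy to express as a check in code but not as a finite set of attribute declarations. A minimal sketch in Python, assuming a simplified ASCII-only version of the naming rule (the function name and the regular expression are my own illustration, and the actual specification allows a wider range of XML-compatible characters):

```python
import re

# Simplified assumption: ASCII only. The rule modelled here is "starts with
# 'data-', has at least one more character, no uppercase ASCII letters, and
# no colon". The real HTML rule permits more characters than this pattern.
DATA_ATTRIBUTE_NAME = re.compile(r"data-[a-z0-9_.\-]+")


def is_data_attribute(name: str) -> bool:
    """Return True if name looks like a custom data attribute (ASCII subset)."""
    return DATA_ATTRIBUTE_NAME.fullmatch(name) is not None


# A DTD would have to declare every accepted name one by one; the pattern
# accepts an unbounded set of names instead.
candidates = ["data-id", "data-user-name", "data-", "data-Foo", "title"]
accepted = [name for name in candidates if is_data_attribute(name)]
print(accepted)
```

A schema language or an ad hoc check can apply a rule like this to whatever attribute names a document happens to use, which is roughly what the non-DTD validators mentioned above have to do.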

It is possible to design DTDs that correspond to HTML5 with some omissions and perhaps with some extra rules imposed, but they won’t really be HTML5 DTDs. My experiment with the idea is not very encouraging: too many limitations, too tricky, and the DTD would need to be so permissive that many syntax errors would go uncaught.

Stood answered 6/3, 2013 at 11:14 Comment(4)
DTD is an SGML and XML thing. An XML DTD is even more limited in expressive power than an SGML DTD; XML is a simplification of SGML in this area, too.Stood
@JukkaK.Korpela - not sure if you still care about this, but the <colgroup> entry in your faux HTML5 DTD seems to be definitely incorrect. The only allowed child is <col>, and this doesn't seem to be included, while invalid children are listed via %phrase;Erudite
All this is so unfortunate; there are some things that make no sense to do, e.g. a head tag inside a body tag, or a div inside a span. So, there should be a way to validate your HTML syntax, just like JavaScript would throw errors when you make a logic mistake.Sweepback
@Sweepback I agree; though perhaps you hit the nail on the head there. You could write a validator in JavaScript at leastBurkes
L
24

Correct. There is no DTD. However, HTML5 documents should start with <!DOCTYPE html> So there's a DOCTYPE, but no DTD.
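
To make the difference concrete, here is a small sketch using Python's standard `html.parser` module. It reads the doctype declaration from a document and checks whether it carries a PUBLIC or SYSTEM identifier, which is how the older doctypes pointed at a DTD. The class and function names are hypothetical, chosen only for this example, and the keyword test is a rough heuristic rather than a full doctype parser:

```python
from html.parser import HTMLParser


class DoctypeCollector(HTMLParser):
    """Collects the contents of <!...> declarations such as the doctype."""

    def __init__(self):
        super().__init__()
        self.declarations = []

    def handle_decl(self, decl):
        # decl is the text inside "<!" and ">", e.g. "DOCTYPE html"
        self.declarations.append(decl)


def find_doctype(markup):
    """Return the first declaration found in markup, or None."""
    parser = DoctypeCollector()
    parser.feed(markup)
    parser.close()
    return parser.declarations[0] if parser.declarations else None


def references_dtd(decl):
    """Rough heuristic: an external identifier keyword is present."""
    return "PUBLIC" in decl or "SYSTEM" in decl


html5 = find_doctype("<!DOCTYPE html><p>t</p>")
html4 = find_doctype(
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd"><p>t</p>'
)
print(html5, references_dtd(html5))
print(references_dtd(html4))
```

The HTML5 doctype comes back as just `DOCTYPE html`, with no identifier for a DTD, while the HTML 4.01 one names both a public identifier and a DTD URL.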

See:

Lentz answered 29/10, 2010 at 16:34 Comment(8)
@Lentz The DOCTYPE has no reference to a DTD, obviously. However, it would be nice if there were an "unofficial" DTD just for the sake of having a good overview of the nesting rules...Prakash
@Šime Vidas The DTD is from HTML's SGML roots. HTML5 is no longer based on SGML so there is no DTD.Lentz
+1 concise answer. Also worth mentioning that HTML5 is currently a working draft, with a bunch of changes in the last months. A DTD makes sense after reaching a stable status, which is not the case right now. It is safe to assume and use some elements and APIs which are stable, but the whole spec isn't.Rusell
@Lentz But what about XHTML5? It is an application of XML. So, it should have a DTD or XML Schema, right?Prakash
@Šime Vidas Good point. I didn't know about XHTML5. You're right, it should be possible to create one. I did a quick search to see if anyone had made one and I found johndyer.name/post/… and for HTML5 entities w3.org/2003/entities/2007/w3centities-f.entLentz
@Lentz Excellent. The link to the XML Schema is here: blogs.msdn.com/b/webdevtools/archive/2009/11/18/…Prakash
@Lentz Sorry to respond to this old thread but if I leave out the DTD, then how can the browser know how I'd like it to interpret the HTML it receives?Menarche
@pythonforspss.org The browser knows from the doctype that the document is HTML5. Modern browsers know how to interpret HTML5.Lentz
E
8

I have created an HTML5 DTD for use in my PHP XML projects. It ain't beautiful, but it works with well-formed XHTML5 (that is, HTML5 expressed as XML).

You can grab it from my bitbucket account here:

https://bitbucket.org/kashbridge/dtd/overview

Enjoy!

Extraterrestrial answered 16/2, 2015 at 14:40 Comment(4)
The DTD by Jukka K. Korpela has already been mentioned by himself in the accepted answer, @Hibou57.Pereira
Another hand-rolled DTD for HTML5 is provided in an answer to this related questionNereid
Another answer to the same question also offers a possible solution, though it is not completely clear whether that DTD is actually open-source.Nereid
You can build a limited DTD for HTML5 in XML, but it won't allow you to fully validate documents (notably HTML5's "data-" attributes, which can be freely customized) without adding an additional XSD for XML. There are also complex security issues when using the SGML parser for DTDs, even in XML. DTDs are too limited: no extension to the SGML rules has been standardized that would allow defining "data-" attributes, and DTDs do not allow restricting the values of many HTML attributes, meaning that you need an additional validator for HTML5 (but that was already the case in HTML4).Miscarriage
K
3

A certain Marcus from sgmljs.net created and analyzed an SGML DTD for HTML 5.1 and started a thread on the XML-DEV mailing list for review and discussion. The discussion revolves around entity definitions so far.

I've just completed my analysis of W3C's HTML 5.1 recommendation at http://sgmljs.net/docs/html5.html (from a markup language rather than web development PoV), and I'm publishing it here for review in the form of an initial SGML DTD for parsing HTML 5.1, along with a lengthy analysis text.

[…]

I'm aware that WHATWG and W3C have since long moved away from SGML (and XML in most web-related specification work), treating it as a legacy technique and with a somewhat presumptuous attitude in the specification text and elsewhere. But as the analysis of HTML5's grammar shows, they've essentially abandoned use of any formal methods altogether (and it shows in at least two flaws discussed in the analysis).

Nothing official yet, but maybe this initiative will get traction, or at least find its users as an unofficial resource.

Kaleena answered 18/11, 2016 at 21:56 Comment(4)
I found the info about the initiative interesting and this answer certainly brought more than just a link, @cpburnz (and other reviewers). Another answer in this Q&A has very similar content – just a link and a short description of another unofficial HTML 5 DTD. It got 6 upvotes and no downvotes. I included relevant info from the xml-dev list and I don’t see a better way to answer this question now.Pereira
Keep in mind that the ‘HTML Living Standard’ is not a ‘standard’ in the traditional sense at all; it might more usefully be understood as a sort of crowd-sourced documentation describing what the ‘major’ browser engines generally aim to support at any moment in time.Untouchable
Adding limited support of DTD, only for entity definition, may be possible; however defining entities in documents with the same "free rules" of SGML would cause severe security issues (a defined entity in SGML can embed absolutely everything without any limitation and without any form of validation). If HTML5 is later extended to allow defining larger sets of entities, it will certainly not use any SGML-like DTD, and will impose restrictions (e.g. allowing them to be defined only for a single grapheme cluster or simply a single combining sequence, with a restricted set of Unicode codepoints)Miscarriage
So for now, it's still impossible to define new entities commonly needed for Hebrew, Arabic, or Chinese, and for most scripts other than Latin, Greek, Cyrillic, and the symbols used in mathematical notation. Sorry for users of Brahmic scripts: they won't get their letters supported by entities in HTML5. Likewise, CJK users won't get their ideographic space or Bopomofo and Kana letters as entities in HTML5 (but they can use NCRs instead). Maybe some later "HTML6" will introduce a new way to define (and validate!) larger sets of entities (but without using the old and dangerous SGML DTD).Miscarriage
J
0

I think they did away with the old DTDs; now we just start HTML pages with: <!DOCTYPE HTML>

Maybe the W3C will come out with one eventually.

Johannejohannes answered 29/10, 2010 at 19:43 Comment(3)
This is very unlikely to ever occur. DTDs are very dangerous and can't be validated. Instead the W3C may introduce a safer system to define some well-behaved set of entities, e.g. based on a basic JSON syntax and strict validation rules (IMHO, they will first limit themselves to a single combining sequence per entity, possibly extended by variation selectors: combining sequences and variation sequences are very strictly defined in the Unicode standard; supporting generic grapheme clusters will be much harder and trickier: e.g. look at emoji sequences, and OpenType rules for complex clusters!).Miscarriage
And if the W3C ever standardizes a new, better system (than old DTDs) for HTML, it should also be supported in XML, probably in a new, safer version of XML1 that forbids the use of DTDs and replaces them with a safer system (the XSD system used in XML alone is clearly not sufficient). A safer and more performant system should be based on a JSON-based "data language" with strict validation rules for its own data schema; this would be the base for future "HTML6" and "XML2" standards simultaneously, making it possible to define an efficient and secure "XHTML6". Goodbye SGML!Miscarriage
But for now HTML5 is not fully interoperable with XHTML without opening a dangerous can of worms. Mapping HTML5 to XHTML is dangerous. Freely interchanging HTML5 as XHTML offers no advantage (except if used privately for internal use, where you fully control the DTDs you generate, use and validate locally). The HTML5 syntax is simpler, safer to interchange, and much more efficient to parse and validate.Miscarriage
