NO
See Edit Below
It depends.
If you are talking about the subset consisting of only theoretical HTML, then yes.
If you also include real life, working HTML that is accessed and used successfully by millions of people daily on many of the top sites on the internet then NO.
That is what gives HTML flexibility. The parsing engine adds tags, closes tags, and takes care of stuff that a theoretical CFG can't do. If you took automata you might remember that a production rule in a formal grammar cannot be empty (aka epsilon/lambda) on the lhs (left-hand side). Since the parsing engine is basically using knowledge that a formal grammar and automata couldn't have, it isn't restricted by that and the 'grammar' would have epsilon/lambda -> result
where the specific epsilon/lambda rule is chosen based on information not available in the grammar.
Since I don't think empty lhs are allowed in any formal grammars, HTML cannot be defined by a formal grammar and is not a formal language at all.
Sure, HTML5 might try to move towards a 'more formal' language description but the likelihood that it becomes a context free language in reality (i.e. strings not matched by the grammar are rejected) is about the likelihood XHTML 2.0 takes the world by storm and replaces HTML altogether (XHTML is the attempt they made to make HTML a formal language...it was rejected en masse due to its fragility).
Noteworthy is the fact that HTML 5 is the FIRST HTML standard to be defined before being implemented! That's right, HTML 1-4 consist of random ideas someone just implemented in a browser, and were collected into standards after the fact based on which features were popularly used and widely implemented. Then they tried XHTML, which totally failed to be adopted. Even 'xhtml' on the web is automatically parsed as HTML under almost every circumstance to prevent stuff from just breaking with a cryptic syntax error. Now you can see how we got here and why it is unlikely to be formalized any time soon.
Lesson: "In theory, there is no difference between theory and practice. In practice, there is." - Yogi Berra
EDIT:
Actually, after reading through the documents it turns out that HTML, even according to the HTML 4.01 specification, doesn't actually conform to SGML. To see for yourself, view the HTML 4.01 Strict document type definition (doctype) at http://www.w3.org/TR/html4/strict.dtd and note the following lines:
The HTML 4.01 specification includes additional
syntactic constraints that cannot be expressed within
the DTDs.
So I would say that it is probably not a CFL due to those features (although it technically it doesn't disprove the hypothesis that there is some possible PDA that accepts HTML 4.01, it does prevent the argument that SGML is a CFL therefore HTML is a CFL).
HTML5 flip-flops, abandoning any implied conformance to SGML, but is presumably describable by a CFG. However it will still provide best-effort parsing not based on a cfg, so IMO the current situation (i.e. language specification is defined formally, with invalid strings still being accepted, parsed and rendered in a best effort fashion) in this regard is unlikely to change drastically for a long, long, long time.
id
attributes etc..) – Danika