
Dancing Naked in the Streets: A Madman Takes on HTML 5

Kurt Cagle's picture

Editor's Note: This was first published 9 November 2009

The response to my post about HTML 5 has proved to be rather stunning, and interesting in the wide variety of opinions being expressed. In a few cases, most notably from both Robin Berjon and Shelley Powers, people I respect, the reaction was that I had transgressed the bounds of good taste and descended to a deliberate campaign of misinformation and malicious ad hominem attacks. Looking over the post, I can understand where that conclusion would be drawn, though it was not my intent. There has been some clarification concerning mis-statements I did make, from ignorance, including the fact that the XHTML expansion of binary attributes is always the name of the attribute as the value (something I missed, thanks for the clarification), and in one notable case, a surprisingly conciliatory tone from perhaps the one person I was expecting to be confrontational. If, in attacking what I saw as rather boorish behavior I was myself boorish, then I humbly apologize.

However, one of the benefits of blogging is that you can in fact dance naked in the streets, and in so doing cement your reputation as a madman or a fool, without the rational voices of older, and perhaps wiser, hands to stop you from doing so. Having danced once (and likely still lacking suitable habiliments) I figure that it's probably a good time to address my central, fundamental problem with HTML 5:

It's not XML.

Now, I have no doubt that there are people who are going to immediately jump on this and say "of course it is - there's an XHTML version that's been there for a while!", and they would in fact be absolutely correct ... and will be missing the point. What exists right now is a data model, laid out in a long, highly detailed written specification that is, as its proponents would properly point out, an order of magnitude more detailed than the HTML 4 specification. Yes, absolutely, and I think as a conceptual model, there's a lot to like about HTML 5. While I don't agree with all of the inclusions, I like the underlying document structure, I think there are a number of good examples, and overall the tagset is good enough to hit most of the major bases (and I am writing articles on several of these structures for DevX that are in no way as acerbic as what I'm writing here, because I think that for better or worse HTML 5 is here to stay).

My problem is not with the content, but the underlying form. As it stands right now HTML 5 has no formal schema for it, no DTD, no ... well, nothing. There's deliberately a simplified SGML-like header at the top, but it's a header that is syntactic sugar - it contains no information, no pointer about structures, only an assertion of its own existence. By extension, I suspect that XHTML 5 also does not have such a schema, though given my latest gaffes I may be wrong there. However, XHTML could easily have a schema developed as a variation of the XHTML 1.0 schema, because its underlying architecture is modular - SVG, MathML, etc. are all well defined schemas, XSD has a clear extension mechanism, and the XHTML space outside of these particular sub-pieces is relatively well constrained. HTML 5 has ... a spec.

I'll come back to the full implications of this in a bit from a structural standpoint, but for the nonce, the argument comes down to the fact that HTML 5 as it exists right now is making an assertion that it is fundamentally its own construct, complete in its own right, and that this is in fact a good thing. HTML 5 - deal with it. We have spent an entire decade creating not one but two highly descriptive, very powerful, fully declarative languages (XML and RDF) that are internally consistent, have extensive sets of tools and storage devices, and have billions of instances in use for everything from documentation systems to messaging for car brakes. Either could have been chosen to represent these structures, and people would likely not have squawked dramatically about it.

Instead, for reasons that I have yet to fathom, there's a bare nod to SGML with HTML 5, but that's about the extent of any formal machine specification that is made about this language - and therein lie many, many fundamental problems. Shelley Powers raised the argument (and it's a compelling one) that HTML validation isn't really necessary, and by not requiring validation you effectively also eliminate the need for worrying about whether XML is in fact included at some deeper level. I would actually agree with her that validation is not really necessary - I don't think I've validated an HTML document in years - but there is a secondary area that I think is actually in many ways far more important than validation: tools.

I work with XHTML and subordinate XML constructs extensively using both XML Spy and Oxygen, as well as many other XML languages from OOXML and ODF to NIEM and XBRL. How can I work with so many different XML languages? Easy - each language has its own schema or set of schemas, and the tools that I use are able to read these schemas, and from them determine what are in fact the valid tagsets to be used regardless of where in that XML document you are ... without a priori knowing a single damn thing about the language. I type in the root tag, and the editor will let me know what attributes and child elements that tag has, will let me know whether I can type in a number (or even a specific constrained range of numbers) or a string of text, will even pop up documentation about the tags. If I'm using NVDL, I can even intertwine different namespaces with ease - a real boon when it comes to working with XForms or MathML.
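As a rough illustration of that schema-driven behavior, here is a minimal Python sketch, using only the standard library, in which a tool with no built-in knowledge of a vocabulary reads a small XSD and reports which child elements and attributes a given element allows. The `recipe` schema and the `allowed_content` helper are invented for this example, and the lookup only handles top-level element declarations; real editors such as Oxygen or XML Spy cover far more of XSD than this does.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema, invented purely for illustration.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="recipe">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="ingredient" type="xs:string" maxOccurs="unbounded"/>
        <xs:element name="servings" type="xs:positiveInteger"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def allowed_content(schema_text, element_name):
    """Return (children, attributes) declared for a top-level element.

    children is a list of (name, type) pairs; attributes is a list of names.
    """
    schema = ET.fromstring(schema_text)
    for decl in schema.findall(f"{XS}element"):
        if decl.get("name") == element_name:
            # iter() also yields decl itself, so skip it explicitly.
            children = [
                (child.get("name"), child.get("type"))
                for child in decl.iter(f"{XS}element")
                if child is not decl
            ]
            attributes = [a.get("name") for a in decl.iter(f"{XS}attribute")]
            return children, attributes
    raise KeyError(element_name)


children, attributes = allowed_content(SCHEMA, "recipe")
print(children)
print(attributes)
```

Nothing in `allowed_content` knows what a recipe is; swap in a different schema and the same code describes a different language, which is the property the editors above rely on.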

I'm going to give a plug to Oxygen 11 here, which came out last week and has a very nice XML author layer (getting close to par with XML Spy, which I think has perhaps marginally a better one, but the distinction's getting close) where you can actually write content quickly with nary an angle bracket in sight. The reason you can is the magic of those schemas, and it is something that just about anyone who's spent some time working with XML learns to appreciate about the language. What this means in practice is that your renderer doesn't need to understand the underlying semantics - it only needs to know enough to apply the relevant bindings (via CSS, XSLT, XBL, etc.) to the entities in question to be able to render in an appropriate fashion. It is semantically neutral.

The approach taken with HTML 5 is in a way a step back in that it assumes that the primary role of the browser is to render HTML - and that each of the tags has an implicit rendering (albeit one that can be overridden). This represents a profound dichotomy in thinking, one that I think influences the debate to a very great degree. The HTML position is that the HTML language is somehow privileged in the browser, and that ultimately everything ends up being mapped into some combination of HTML+CSS+JavaScript. It's a majority view because it was originally true - until the emergence of XML, the role of the browser was to render HTML, and even prior to CSS, the elements even had a very definite (and minimally configurable) presentation that varied from browser to browser.

The second viewpoint is more subtle - the browser is a generalized renderer of XML content in which the bindings associated with given elements and/or attributes are defined by some combination of CSS, XSLT and XBL - and these are all dynamic and document-centric. In this model, HTML can be seen as being just another XML format, SVG is just another XML format and so forth. In some cases (such as SVG) it makes a lot of sense to build these bindings deep into the architecture, because graphics (and most notably animation) need to have high performance capabilities to be effective, but in many respects such languages are comparatively rare, and are usually the ones most directly tied into some form of input/output capability. XForms can also be seen as such a binding language.

The moment that HTML 5 becomes just another XML language, it loses its primacy in the browser, though this process will take a comparatively long time to happen simply by dint of HTML 4 legacy. This factor is already being seen in the ODF/OOXML process in a slightly different context; once in XML, word processing documents or spreadsheets no longer need to be constrained to a distinct application, and instead become primarily carriers of content and format between processes. I would suspect that most OOXML documents are not being stored as such, but are instead being generated from prima facie XML content through some form of transformation, such that these formats become increasingly seen as end-point formats rather than document-storage formats. Different and better renderers and similar applications could thus make them obsolete fairly quickly.

This is true even for the case where the only XML characteristic of HTML 5 were that it was well-formed XML (i.e., didn't have namespaces, but followed a true containment model rather than an implied one). In the comments to my "Train Wreck" article, there were a number of people that brought up the SGML aspect of HTML 5, that binary attributes were perfectly legitimate forms of SGML. Yes, of course they are. However, I'd make the case that this is irrelevant. Outside of HTML, SGML is making up a small and rapidly shrinking percentage of the total markup space. When XML was first designed, the need for a simplified form of SGML was quite strong given significant bandwidth and processing constraints, but since the emergence of HTML, generalized processing power has increased by roughly a factor of 10,000.

I suspect that a case could very well be made for the re-emergence of SGML, given that significant increase in power, but in many respects XML has created its own solutions to most of SGML's major limitations, and other languages, such as RDF/OWL, may very well even end up superseding XML (though I don't think this will happen for some time). At this stage in the game, however, XML is the primary messaging standard for both enterprise level and syndication level architectures, and despite the many benefits that JSON provides, is even beginning to regain some of the ground lost to JSON after the emergence of AJAX.

Thus, the refusal to provide even well-formed XML is again puzzling, especially in light of what such well-formedness gives you. Well-formed XML is unambiguous - no tags inadvertently left open to italicize the rest of your document, no overlapping tags creating odd behavior (especially in editing solutions). If HTML were XML well-formed, then it could be read by any XML parser, not just a dedicated HTML parser, could be transformed via XSLTs, could be queried by XQuery, could be modularized as necessary. It could be stored in XML databases without needing to be parsed as HTML in or out. It could run through XML pipelines, with different steps resolving specific tags (or tag collections). It could be validated, even if validation isn't strictly necessary, and it could additionally hold metadata schemas (for everything from publishing information to markup for recipes) simultaneously in the same document.
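To make the "any XML parser" point concrete, here is a small sketch in Python, assuming nothing beyond the standard library's general-purpose `xml.etree.ElementTree` parser. The page content is invented for illustration; the only thing the code relies on is that the markup is well formed and uses the XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative page, invented for this example.
PAGE = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Recipes</title></head>
  <body>
    <h1>Recipes</h1>
    <p>Try the <a href="/soup">soup</a> or the <a href="/bread">bread</a>.</p>
  </body>
</html>
"""

# A generic XML parser is enough; nothing here is HTML-specific.
root = ET.fromstring(PAGE)
ns = {"h": XHTML_NS}

# Query the page with the parser's built-in path support.
title = root.findtext("h:head/h:title", namespaces=ns)
links = [a.get("href") for a in root.findall(".//h:a", ns)]

print(title, links)
```

The same parsed tree could just as easily be handed to an XSLT or XQuery processor; ElementTree's limited path syntax is used here only to keep the sketch dependency-free.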

This makes no presuppositions about namespaces, CDATA sections, or schema content; this just assumes that the document is well formed. Yet even this point seems to be considered too controversial - and it's a consequence that could end up costing potentially tens or even hundreds of millions of dollars to businesses, as more and more of these businesses move their CMS systems into XML-based workflows and repositories.

A good case in point of this can be seen with another spec that exists only on paper - RSS 2.0. The intent with the publication of 2.0 was to both "keep it simple" and (I suspect) to provide the finger to the XML community after the publication of the RSS 1.0 spec. RSS 2.0 need not be well formed, and the specification is fairly ambiguous - there is no formalized schema (though there are a number of unformalized schemas). The result of this is that a significant portion of all RSS 2.0 feeds cannot be fed into an XML parser, there are a huge number of variations in the way that RSS 2.0 documents are laid out, and this has resulted in the need to create any number of different RSS processors that each behave in different ways, have different APIs, and are in many cases not even consistent with one another in what is produced.

This isn't a failure of XML. The Atom format, which does have a requirement of being both well formed and schematically valid, is gaining ground rapidly on the much older RSS 2.0 format precisely because it is interchangeable, it does have a clearly defined and machine validatable schema, and it only requires an XML parser in order to be processed, rather than requiring people to write one Atom processor after another. As such it is seeing far more use in mash-ups and applications in general than RSS 2.0.
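As a sketch of what "only requires an XML parser" looks like in practice, the following Python reads an Atom feed with the standard library's generic XML parser and no feed-specific library. The feed content, the `urn:example:` identifiers, and the `entries` helper are made up for the example; the namespace URI is the one Atom defines.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Illustrative feed, invented for this example.
FEED = """\
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <id>urn:example:feed</id>
  <updated>2009-11-09T00:00:00Z</updated>
  <entry>
    <title>First post</title>
    <id>urn:example:post-1</id>
    <updated>2009-11-08T12:00:00Z</updated>
    <link href="http://example.org/post-1"/>
  </entry>
  <entry>
    <title>Second post</title>
    <id>urn:example:post-2</id>
    <updated>2009-11-09T00:00:00Z</updated>
    <link href="http://example.org/post-2"/>
  </entry>
</feed>
"""


def entries(feed_text):
    """Return (title, link) pairs for every entry in an Atom feed."""
    feed = ET.fromstring(feed_text)
    return [
        (
            entry.findtext("atom:title", namespaces=ATOM),
            entry.find("atom:link", ATOM).get("href"),
        )
        for entry in feed.findall("atom:entry", ATOM)
    ]


posts = entries(FEED)
print(posts)
```

The helper assumes every entry carries a `link` element, which this sample feed does; a production reader would need to handle entries without one.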

Now, I would seem to be consciously avoiding mentioning XHTML 5 here. At the moment, in the specification as it stands, there is nothing that specifically indicates that a user agent developer should in fact accept both an HTML 5 representation and an XHTML 5 representation of the same content. From the HTML perspective, this may not be a bad thing - after all, XHTML content should be generally parsable as HTML. From the XML perspective, though, this is a recipe for disaster.

Today, I develop nearly all of my applications, applications that support XForms, XBL, SVG, MathML, XML Events, XLinks and other similar schemas, using XHTML. Currently every browser vendor supports XHTML natively - except Microsoft with Internet Explorer. This means that when I send content as xml or xhtml, Internet Explorer will render the content as XML - even if it is a valid XHTML document. This means that for many of my clients, IE is no longer an option; or it dramatically limits the power of the applications that I can create. This is the case precisely because XHTML is not seen by Microsoft as a standard that its customers are clamouring for (though in my experience, once those customers understand what XHTML buys them, they do clamour for it), and the cost of re-engineering their browser to support XHTML is seen as not worth the effort.

If XHTML 5 is optional, then it means that (at best) you end up with two development paths for EVERY project that deals with HTML 5 - one for HTML, one for XHTML. Yes, they may share a lot of code, but in practice, there are enough differences that user agent development can get to be complex given the support of both standards. For instance, XHTML 5 should be able to support RDFa simply by dint of being an XML language. HTML 5 creates its own "microformats" approach, an approach that frankly is falling out of favor even in the HTML 4 world precisely because it fails to handle overlapping content formats and lacks rich semantics, which means that the two approaches cannot even encode the same information.

Ultimately it's my own belief (and only my belief) that it should in fact be possible to encode the same information in both the HTML 5 and the XHTML 5 format. This seems to be a reasonable expectation; the fact that it is not in fact possible raises red flags in my mind.

This also becomes a factor when dealing with distributed extensibility. The XHTML solution for that is namespaces - essentially a mechanism for creating category:term pairs both for disambiguating terms and for identifying - for the user agent - otherwise out of scope terms. In the XML view of the browser, such namespaces facilitate binding, especially for non-contiguous elements within a document. HTML 5 has no analog - the closest to it is the use of the SVG element as a mechanism for embedding non-namespace bound XML, something which I see frankly as being a rather bad kludge.
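The disambiguation role of namespaces can be shown with a short Python sketch, again using only the standard library. The document below is invented for illustration: it contains two elements both named `title`, one from XHTML and one from SVG, and the parser keeps them apart purely by namespace. The `split_tag` helper is hypothetical and simply unpacks ElementTree's `{namespace}local` tag notation.

```python
import xml.etree.ElementTree as ET

# Illustrative compound document, invented for this example.
DOC = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <head><title>Page title</title></head>
  <body>
    <svg:svg width="10" height="10">
      <svg:title>Drawing title</svg:title>
      <svg:circle cx="5" cy="5" r="4"/>
    </svg:svg>
  </body>
</html>
"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    namespace, _, local = tag[1:].partition("}")
    return namespace, local


root = ET.fromstring(DOC)

# Collect the text of every element whose local name is "title",
# keyed by the namespace ("category") that qualifies it.
titles_by_namespace = {}
for element in root.iter():
    namespace, local = split_tag(element.tag)
    if local == "title":
        titles_by_namespace[namespace] = element.text

print(titles_by_namespace)
```

A user agent can use the same namespace key to decide which binding applies to an element, without needing the two vocabularies to coordinate their element names.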

Most of this likely seems fairly arcane to the HTML view of the language. Namespaces are verbose and make reading HTML content more difficult, and hence are seen as being bad - despite the fact that they serve an almost indispensable function to a fairly significant swath of the language users who are dealing with XML on a daily basis. Schemas may be seen as limiting and serving little purpose to those who don't use them on a daily basis, but again for those that do (and this includes nearly anyone who works with XML pipelines in almost any form), schemas do everything from providing significant constraints on incoming data (and weeding out a significant percentage of invalid or erroneous data) to providing the IntelliSense-like behavior that makes our general purpose tools able to handle a broad range of document and data types. Indeed, the seemingly hyper-complex language that can't be modelled in HTML readily resolves down to a clear number of well-established modules in XHTML.

I think there's one last factor here. One of the principal arguments made in terms of keeping the SGML-based format (even sans DTD or schema) is that this format is easier for non-developers to use, and being more forgiving, will see much wider adoption. Personally, I think this argument was specious in 1993 when it first popped up; it encouraged bad programming practices that eventually became encoded in badly written software, caused a great deal of ambiguity in the specification which meant different interpretations of the same element by different vendors, and it ultimately resulted in the inconsistencies in implementation in different formats that led to the need for a great deal of "translator" JavaScript, opened up the browser to exploits, and reduced the overall declarativeness of the web significantly.

No other computer language is designed to work when ill-formed, nor should it be. Such ill-formedness is analogous to the excessive use of default arguments in programs, a practice which ultimately can cause serious maintenance issues and unexpected behaviors in your programs that can be very difficult to track down. If something doesn't work, as a code creator, I want my code to throw an exception so I can figure out the problem, not simply ignore it until I end up blowing up a rocket because of a misplaced semi-colon. I realize that this is a browser issue rather than an HTML 5 issue, but the philosophy in the latter very much drives the former ... especially when you figure that nearly all HTML content produced today is generally the result of an automated process (via WYSIWYG editors, BBCode and similar alt-markup languages, database output, Wiki filters and so forth) rather than being hand coded.
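The difference between the two philosophies can be sketched in a few lines of Python using two standard library parsers: the forgiving `html.parser.HTMLParser` and the strict `xml.etree.ElementTree`. The markup and the `TagLogger` class are invented for the example, and `HTMLParser` is only a tokenizer, not an implementation of the HTML 5 parsing algorithm, so treat this as an illustration of lenient versus strict handling rather than of what any browser does.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Ill-formed markup: the <i> element is never closed.
BROKEN = "<p>Some <i>italic text</p><p>Next paragraph</p>"


class TagLogger(HTMLParser):
    """Record start and end tags as the lenient parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


# The forgiving parser reports what it sees and raises nothing.
lenient = TagLogger()
lenient.feed(BROKEN)
lenient.close()
print(lenient.events)

# The strict parser refuses the same markup with an exception.
try:
    ET.fromstring(f"<div>{BROKEN}</div>")
    strict_error = None
except ET.ParseError as exc:
    strict_error = exc
print("strict parser raised:", strict_error is not None)
```

With the lenient parser the missing `</i>` goes unreported and it is left to downstream code to guess where the italics end; with the strict parser the author of the content finds out immediately.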

Now, I'm going to be realistic. I'm dancing naked in the streets. I'd like to believe that these arguments will make some difference, but I'm not directly involved in the process (and am unlikely to be). I've yet to see a compelling rationale for the current path being taken. There isn't even a legacy issue ... the HTML 5 involved here has a unique declaration and the only support for HTML 5 is completely experimental in the browser space. There is no legacy code involved.

I will ask this - can anyone give me a compelling reason why this isn't being done just as an XML specification exclusively - at least to the extent of being well formed? I'd really truly like to be convinced, because at the moment, I'm just having trouble getting behind this spec, no matter how detailed it is.