An Open Letter to WHATWG

For the last decade or so, Web developers have been moving more and more towards standardization. With the advent and popularity of XHTML, we’ve all been encouraged to ensure that all of the elements we open are closed when we’re done using them, to use all lowercase type for element and attribute names (we could just as easily have used all uppercase, but then we’d look like we were shouting in our code), to explicitly define attribute values and more. We have come to a golden age in Web development.

Whenever we view the code from other Web sites, assuming it’s written in valid XHTML, it makes sense to most of us. We can tell specifically where paragraphs, divs, spans and other HTML elements begin and end. We have come a long way from the wild west days of the mid-nineties when anything could happen. Some of us have been writing HTML long enough to remember when it was fast and loose, and when browsers would do strange things trying to figure out where we really intended one element to end and another to begin.

Those days, however, came and went. XHTML made us all clean up our code, and it made it much easier and more sensible to create attractive, clean Web sites. With the proposal of HTML 5, however, we are very much in danger of returning to those days.

Misinformation

You will see a lot of articles online telling you that HTML 5, though it allows for implicit definitions, open-ended elements, mixed-case attributes and more, will not force you to code that way. Those posts are absolutely correct, but they’re not telling you the whole story. I liken the advent of HTML 5 to repealing the law on driving while intoxicated. Sure, if we were to do away with drunk driving laws, none of us would be forced to drink and drive. However, the fact that there would be no legal consequences for doing so would mean that a lot more people would most likely do so. The same goes for the abolition of the XML standards that have been introduced to Web markup languages.

Further, you will hear a lot of people tell you that XHTML will continue with HTML 5. They’ll explain that there’s no reason you can’t use XHTML formatting when writing in HTML 5. The problem with those statements is that they’re not exactly true. What is true, then?

  1. You can write well-formed HTML, mimicking much of XHTML when developing in HTML 5. However, as I originally understood it, you cannot have self-closing tags like the line break (<br />) or the image tag (<img />). Instead, you would omit the trailing slash altogether and just have a hanging tag (<br> or <img>). Update: I have since found out that the specification will allow the trailing slash on empty elements, but will not encourage it.
  2. If you really want to use strict XHTML code, you will have to adjust your server settings so that the document is served with an XML MIME type (application/xhtml+xml or application/xml) instead of being served as text/html (no, you cannot simply adjust the doctype declaration, you have to modify the MIME type on the server).
    1. In doing so, however, you will not be able to serve your pages to Internet Explorer, as it does not recognize the application/XML MIME type.
    2. In addition, if you serve the document as application/XML, your pages will no longer fail gracefully when an error is encountered. Instead, they will simply stop being rendered. Therefore, if a tag is typed incorrectly or an entity is not encoded properly, the page will stop loading rather than just moving on with messed up formatting.
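
The difference in error handling described above can be seen with a small experiment. This is only a sketch: it uses Python's standard xml.etree.ElementTree as a stand-in for a browser's XML parser and html.parser as a stand-in for a forgiving HTML tokenizer. Neither is a real browser engine, and the sample markup and the names parse_as_xml and parse_as_html are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: the first <p> is never closed.
BROKEN = "<p>First paragraph<p>Second paragraph &amp; more</p>"


class TagCollector(HTMLParser):
    """Records every start tag the forgiving HTML tokenizer sees."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def parse_as_xml(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # A strict XML parser gives up at the first well-formedness error.
        return False


def parse_as_html(markup):
    """Return the start tags found; the tokenizer does not reject bad nesting."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


if __name__ == "__main__":
    print("XML parser accepted it:", parse_as_xml(BROKEN))
    print("HTML tokenizer saw tags:", parse_as_html(BROKEN))
```

The strict parser refuses the whole document because of one unclosed element, while the lenient one carries on and reports both paragraphs. That is the trade-off a page served as XML takes on.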

Proposal

With these issues in mind, and with a great fear of moving into a future when HTML code is just as sloppy as it was in 1995, I therefore propose the following adjustments to the HTML 5 specification.

  1. Continue to require all non-empty elements to be explicitly opened and closed.
  2. Continue to require all attribute values to be explicitly indicated. If necessary, support shorter names for attributes and their values. Supporting boolean true and false (or 0 and 1) for attributes that only accept two values would be a great improvement.
    1. For instance, something like sel="1" is much better than simply using selected all by itself.
  3. Continue to require all tags and attributes to be lowercase. Mixed case and all uppercase tags are quite simply unattractive.
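
The three rules above are mechanical enough to check with a script. The following is a minimal sketch, not a validator: it is built on Python's standard html.parser, the class name StrictStyleChecker and the helper check are hypothetical, the list of empty (void) elements is an assumption based on current HTML, and for rule 3 it only inspects tag names, not attribute names.

```python
from html.parser import HTMLParser

# Assumed list of empty ("void") elements that never take a closing tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class StrictStyleChecker(HTMLParser):
    """Flags unclosed elements, valueless attributes and uppercase tag names."""

    def __init__(self):
        super().__init__()
        self.problems = []
        self._open = []  # stack of currently open non-empty elements

    def _check_start(self, tag, attrs):
        raw = self.get_starttag_text()
        # html.parser lowercases `tag`; compare with the text as written.
        if raw[1:1 + len(tag)] != tag:
            self.problems.append(f"uppercase in tag name: {raw}")
        for name, value in attrs:
            if value is None:
                self.problems.append(f"attribute without a value: {name}")

    def handle_starttag(self, tag, attrs):
        self._check_start(tag, attrs)
        if tag not in VOID_ELEMENTS:
            self._open.append(tag)

    def handle_startendtag(self, tag, attrs):
        # XHTML-style <br /> opens and closes in one step.
        self._check_start(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in self._open:
            self.problems.append(f"end tag with no open element: {tag}")
            return
        # Anything opened after `tag` and still on the stack was never closed.
        while self._open:
            top = self._open.pop()
            if top == tag:
                break
            self.problems.append(f"element never closed: {top}")

    def close(self):
        super().close()
        while self._open:
            self.problems.append(f"element never closed: {self._open.pop()}")


def check(markup):
    """Return a list of style problems found in the markup."""
    checker = StrictStyleChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    for line in check("<UL><li>one<li>two</li></UL><option selected>x</option>"):
        print(line)
```

A check like this could run in a validator or in a build step. Whether such rules belong in the specification itself or only in tooling is a separate question.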

I recognize that there is trepidation about introducing these XHTML values into the HTML specification, for fear that doing so will “break the Web.” However, the browsers already support old-style HTML, and will always be required to do so. Standardizing the HTML spec will not stop browsers from supporting old and invalid HTML. Many older HTML tags are already made obsolete by the HTML 5 spec; why should implicit attribute definitions and unclosed tags not be made obsolete as well?

Conclusion

Instead of returning the Web to where it was a decade ago, you should use this opportunity to take what we’ve learned from XHTML and apply it to the HTML spec. There is absolutely no reason that HTML 5 should be a step backwards in standardization.

Further, I would posit that, even if the HTML spec were to include the proposal I have made above, developers could still go back to using the old-style “spaghetti” code they used in the past. There is absolutely no reason, however, that such sloppy code should validate against the spec. Instead, it should be discouraged.

I sincerely hope that, by the time the HTML 5 spec is finalized, some of the great standards introduced by XHTML will be brought back into the fold, and that we will continue to move forward into the future.

If it just does not seem possible to add these standards into an HTML specification, then I hereby call for a proposal to create a new markup language for the Web, using the old, comfortable elements found in the HTML spec, the standards and best practices introduced by XHTML 1 and potentially greater extensibility to allow for custom, valid elements; and that we work toward making that the next major standard in Web development.

10 Responses

  • The first point in your numbered list, the one about not being able to self close the BR and IMG elements with a trailing slash, is not true. You can add the trailing slash if you like. When it comes to validating as HTML5, it doesn’t matter (from a browser’s perspective, it has never mattered).

    I agree with you that it would be nice to be able to test for a preferred writing style (such as XHTML syntax) but I fundamentally disagree with you about that being baked into the language. The place for that test of strictness is in the testing tools. I would really like to see the HTML5 validator include a checkbox for checking against XML-like syntax.

    But, in the same way that I welcome JavaScript lint tools like JSLint without welcoming stricter syntax in the JavaScript language itself, I do not want to see one preferred writing style canonised in a markup spec.

  • XHTML is one of the W3C’s greatest failings. If you don’t extend the code with namespaces, use MathML, have your own DTDs and so on, why would you want to use XHTML? While proper markup such as using the correct tags for the correct purposes (H1 for headings etc), writing lean and minimal code, and closing all optional tags like LI, P and so on received prominent attention in the XHTML spec, those were not the primary reasons for XHTML. Extensibility was!

    For normal general web page design, what’s the gain?

    If you write valid HTML 4.01 strict, and live up to that using the HTML 4 Strict Doctype, and separate content (HTML) from layout/look (CSS) and interactivity (JavaScript) you don’t need XHTML.

    Referring to http://www.w3.org/TR/xhtml1/ “Since HTML’s inception, there has been rapid invention of new elements for use within HTML (as a standard) and for adapting HTML to vertical, highly specialized, markets. This plethora of new elements has led to interoperability problems for documents across different platforms.” What? Not in 99.9% of web sites.

    There was never any excuse for writing sloppy HTML, such as failing to ensure that all of the elements we open are closed. Properly written HTML 4.01 is valid and accessible, fulfilling all modern expectations of a properly designed web page/site.

    The MISINFORMATION is that XHTML is needed for general web development: nowhere in the W3C’s specifications for HTML was it stated that developers were permitted to write sloppy markup.

    Ergo HTML 5. A realization that the REAL intended purposes of XHTML have not gained traction nor are generally wanted. Browser makers basically ignored application/xhtml+xml. There is demonstrably no need for XHTML in the vast majority of web sites but there is a need to update HTML to recognize REAL extensibilities which are needed.

  • Don’t forget to actually send your letter to the WHATWG. :-)

  • http://carsonified.com/blog/web-apps/the-future-of-html-5/

  • Thank you for your comments, guys.

    Rod – regarding browser makers not generally supporting application/xml, Internet Explorer is the only modern browser that doesn’t. True, IE makes up the majority of use-cases, but Microsoft does not make up the majority of browser makers.

    I agree that extensibility is the real potential behind XHTML, and that it was never fully realized. However, that doesn’t mean that we shouldn’t continue to work toward that. Just because people haven’t used it (since it’s not truly available, even now) doesn’t mean people wouldn’t if that extensibility were ever fully realized. My logic may be flawed, but so is yours. If something’s never been truly available, we can’t possibly comment on the demand for it.

    Pushing that extensibility aside, though, the main point of this article is to make a call for keeping the strict standards for well-formed code. Just as the back-end development world is moving more and more toward object-oriented programming, the front-end development world has been moving toward clean, valid XHTML.

    I’m not saying I dislike the innovations coming with HTML5. There are some really cool new ideas, and many of them will make it so much easier to make pages accessible. Also, I’m not necessarily asking for us to move further toward XML (i.e. XHTML 2, which was a mess); I’m just asking that we keep the structured code many of us have come to know and love.

    Again, I am old enough to remember when tags were left open all over the place and browsers (mainly IE) made a mess of our pages trying to presume where tags should have been opened and closed.

    Jeremy – I like your idea of being able to specify a preferred writing style, but I am still very fearful that a whole new crop of coders will abandon the well-formed ideals of XHTML. It’s possible that I’m just being paranoid, but I do remember how sloppy and difficult things were when we all started using HTML.

    Also, Jeremy, thanks for the update on the trailing slashes. I found the information to which you were referring in the WHATWG FAQ (is that enough acronyms?). It does appear that they are going to allow (and validate) the closing slash on empty elements.

  • Oh, and Ian – I have just subscribed to the WHATWG mailing list and will be sending my letter (with some revisions based on the comments I’ve received thus far) to the WHATWG.

    You reading this blog post doesn’t count? :)

    Thank you for your comment, and you will be hearing from me, soon.

    • Curtiss: I try to keep an eye on blog posts, but for anything more than straightforward technical issues (that I just fix), it’s somewhat unwieldy. For example, I only remembered to come back here to check for further comments because I happened upon another site that mentioned your post. So I prefer the mailing list for this kind of comment, as then I can track it and make sure I don’t miss a response or anything.

  • Ms2ger

    “In doing so, however, you will not be able to serve your pages to Internet Explorer, as it does not recognize the application/XML MIME type.”—Look at it from the bright side: you won’t need *any* IE hacks anymore :)

  • I have gone ahead and sent my thoughts to the WHATWG mailing list. I tried, in my message, to focus on the fact that loosening the standards back to the old HTML style, rather than keeping them consistent with the XHTML standards, will put more responsibility and control in the browsers and take control away from the coders. I hope I got my point across successfully.

  • mynth

    See comments at:
    http://crisp.tweakblogs.net/blog/321/html5-why-not-use-xml-syntax.html

    I agree with you. Using XML syntax will make the language simpler because there will be no exceptions, just simple rules: open and close, quote everything. That’s all.