> My favorite is how they work to make it permissible to have a THEAD with an im...

taeric · on Aug 3, 2017

The specific example that blows my mind is that:

  <table>
    <thead>
      <tr>..
      <tr>...
  </table>

has an implicit tbody. Sure, there are some sane reasons to have implicit values, and in some cases I think it is actually obvious what the implied tags would be. This case, however, does not seem obvious to me. It is just as likely that this was a table that has a header, but no body.

I don't fully understand why "existing documents" are relevant at all. Since you basically have to "opt in" to the new version by declaring the doctype, we could have had much cleaner semantics under a new doctype. That seemed to be the goal of the XHTML push a few years prior. I am not privy to all of the history of why that failed.

To answer directly, my goal for the HTML spec would have been one with fewer special cases. Preferably, one with fewer surprises for people who know both XML and HTML.

bzbarsky · on Aug 3, 2017

> has an implicit tbody.

Uh.... That case has no implicit tbody in HTML5. See http://software.hixie.ch/utilities/js/live-dom-viewer/?saved... in any modern browser.

In HTML4 it does because the DTD has "TBODY+" instead of "TBODY*", and yeah, I have no idea why someone thought that was a good idea, apart from the theoretical purity of "a table with no body makes no sense".
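One way to see the TBODY+ vs TBODY* difference: SGML content models are essentially regular expressions over an element's children. The sketch below assumes the HTML 4.01 DTD's TABLE model, `(CAPTION?, (COL*|COLGROUP*), THEAD?, TFOOT?, TBODY+)`, and compares it with a hypothetical relaxed variant ending in `TBODY*` (not taken from any spec). A head-only table fails the first model, which is the DTD-level reason for the implied TBODY mentioned above.

```python
import re

# Children are serialized as "tag " tokens so the content model can be
# written as an ordinary regular expression over those tokens.
# Assumed HTML 4.01 TABLE model: (CAPTION?, (COL*|COLGROUP*), THEAD?, TFOOT?, TBODY+)
HTML4_TABLE = r"(caption )?((col )*|(colgroup )*)(thead )?(tfoot )?(tbody )+"
# Hypothetical relaxed model: identical except TBODY* (zero or more bodies).
RELAXED_TABLE = r"(caption )?((col )*|(colgroup )*)(thead )?(tfoot )?(tbody )*"


def satisfies(model: str, children: list[str]) -> bool:
    """Return True if the sequence of child tag names matches the content model."""
    tokens = "".join(f"{name.lower()} " for name in children)
    return re.fullmatch(model, tokens) is not None


head_only = ["thead"]
print(satisfies(HTML4_TABLE, head_only))    # False: TBODY+ demands a body
print(satisfies(RELAXED_TABLE, head_only))  # True: a header-only table is fine
```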

> It is just as likely that this was a table that has a header, but no body.

That's exactly what it has.

> Since you basically have to "opt in" to the new version by declaring the doctype,

Er.... you don't. The "new version" is the only version. The doctype affects a very small number of quirks but that's it, and that part way predates HTML5.
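To make the "very small number of quirks" point concrete: in the HTML parser the doctype only selects a rendering mode (quirks, limited-quirks or no-quirks); it never switches you over to an XML parser. Below is a sketch of a small subset of the spec's doctype rules. The real list of legacy public identifiers that trigger quirks mode is much longer, and the `doctype_mode` name and signature are made up for illustration.

```python
from typing import Optional

# Public-identifier prefixes from the spec's doctype rules (subset only).
HTML401_LOOSE = ("-//w3c//dtd html 4.01 frameset//",
                 "-//w3c//dtd html 4.01 transitional//")
XHTML10_LOOSE = ("-//w3c//dtd xhtml 1.0 frameset//",
                 "-//w3c//dtd xhtml 1.0 transitional//")


def doctype_mode(name: Optional[str], public_id: Optional[str] = None,
                 system_id: Optional[str] = None) -> str:
    """Hypothetical helper: map a doctype token to a rendering mode.

    name is None when the document has no doctype at all.
    """
    if name is None or name.lower() != "html":
        return "quirks"
    pub = (public_id or "").lower()  # comparisons are ASCII case-insensitive
    if pub.startswith(HTML401_LOOSE):
        # HTML 4.01 Transitional/Frameset: quirks without a system id,
        # limited-quirks with one.
        return "quirks" if system_id is None else "limited-quirks"
    if pub.startswith(XHTML10_LOOSE):
        return "limited-quirks"
    return "no-quirks"


print(doctype_mode("html"))  # no-quirks: <!DOCTYPE html>
print(doctype_mode("html", "-//W3C//DTD XHTML 1.0 Strict//EN",
                   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"))  # no-quirks
```

So an XHTML 1.0 Strict doctype served as text/html lands in the same mode as `<!DOCTYPE html>` and goes through the same HTML parser.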

> I am not privy to all of the history of why that failed.

There were a few reasons. First, it turned out that neither authors nor users wanted the hard-fail behavior of an XML parser. Users, because it would mean they couldn't read the page they wanted to read. Authors, because they did not sufficiently control all the markup ending up on the page (multiple people authoring snippets, CMS templates, random bits of markup pulled from databases provided by other companies, etc).
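The hard-fail difference is easy to see with Python's standard library. This is a sketch, not browser behavior: `xml.etree.ElementTree` stands in for an XML parser, and `html.parser.HTMLParser` is only a lenient tokenizer, not a full HTML5 tree builder. One unclosed `<b>` is fatal for the former and a non-event for the latter.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A single author mistake: <b> is never closed.
SNIPPET = "<p>Hello <b>world</p>"


def parse_as_xml(markup: str) -> str:
    """XML rules: any well-formedness error aborts the whole parse."""
    try:
        ET.fromstring(markup)
        return "ok"
    except ET.ParseError as err:
        return f"fatal: {err}"


class StartTagCollector(HTMLParser):
    """Records start tags; HTMLParser keeps going past mismatched end tags."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_as_html(markup: str) -> list[str]:
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


print(parse_as_xml(SNIPPET))   # fatal: mismatched tag ...
print(parse_as_html(SNIPPET))  # ['p', 'b']
```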

Second, because there was no sane migration path. Suppose an author wanted to switch some page over to XHTML. But not all browsers support XHTML (and in particular the browser with 95%+ market share does not), so they need to provide an HTML version too. The normal answer to that was to make use of XHTML 1.0 Appendix C to provide a document that could be parsed as either HTML or XHTML, and to use HTTP content negotiation to send either the text/html or application/xhtml+xml MIME type. But then the problem was a tendency to only test the text/html case and have the application/xhtml+xml case not end up as well-formed XML. There were tons of documents all over the place that had an XHTML doctype and were attempting to comply with Appendix C, but were not actually well-formed; luckily most of them were only served up as text/html. All of this was a strong disincentive for browsers to advertise application/xhtml+xml support, because they would get broken pages. Even the browsers that had started off advertising such support ended up removing it in the face of user complaints; see first reason above.
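A rough sketch of the server-side negotiation described above, assuming a hypothetical `choose_mime_type` helper that only serves application/xhtml+xml to clients that list it explicitly (a bare `*/*` is not taken as XHTML support). Real Accept parsing has more edge cases, and none of this addresses the actual failure mode: the Appendix C document sent under the XHTML type still has to be well-formed.

```python
def parse_accept(header: str) -> dict[str, float]:
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    prefs: dict[str, float] = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media] = q
    return prefs


def choose_mime_type(accept: str) -> str:
    """Hypothetical helper: pick the MIME type for an Appendix C document."""
    prefs = parse_accept(accept)
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)  # explicit mention only
    html_q = prefs.get("text/html", prefs.get("text/*", prefs.get("*/*", 0.0)))
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"


# A header that lists XHTML explicitly vs. one that only sends */*
print(choose_mime_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_mime_type("*/*"))
```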

Note that all this would have been _much_ worse if the switching had been on doctype, not MIME type; as I noted above, there were tons of documents around that had the XHTML doctype but were not well-formed.

I should note that the actual semantics of XHTML1 were not that different from HTML4; apart from parsing there were no significant differences. And the parsing semantics turned out to be something no one wanted in practice, per above.

As for XHTML2, which did attempt new semantics of various sorts, it suffered from several problems as well. Most glaring, again, was complete lack of migration path. Unlike XHTML1 there was no way to create a document that would work with a UA that didn't implement XHTML2 _and_ one that did. The XML parsing semantics were still not wanted in the market. The new semantics XHTML2 introduced were not that wanted either, because the working group decided to not talk to any actual authors or browser vendors or anyone else who would be involved in creating or consuming XHTML2, pretty much. The result was a spec that was solving problems people didn't have, not solving problems they did have, and with no clear way to deploy it in the market.

All of the above is why when WHATWG started working on an evolution of HTML the priority of constituencies (now captured at https://www.w3.org/TR/html-design-principles/#priority-of-co... ) was users, authors, implementors, specifiers, theoretical purity. Because the approach of putting theoretical purity first had been tried and failed spectacularly...

Note that a large part of the failure was in fact due to the "existing documents" problem, because the lack of a migration path was one of the most significant barriers to XHTML adoption. Of course the lack of strong reasons to adopt it didn't help either.

> my goal for the spec of HTML would have been a spec with fewer special cases.

This is not an unreasonable goal, sure. I should note that in terms of priority of constituencies this is a "theoretical purity" goal. Getting rid of specific special cases that are confusing people could be a goal in terms of the "authors" or "implementors" or "specifiers" constituency, of course.

Note that HTML5 did in fact remove various special cases HTML4 had that were due to its SGML heritage, most of which had never actually been widely implemented in browsers. For example, comment parsing was simplified significantly, such that "" is actually a closed comment (which it's not in HTML 4, and wasn't in Firefox, which actually implemented the HTML 4 semantics for comments, until the switch to the HTML5 parser). The special cases that remained were the ones that were needed to actually render existing web pages correctly.
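For readers who never ran into the SGML comment rules: inside `<!...>`, each `--` toggles in and out of a comment, and the declaration only ends at a `>` seen outside one, whereas HTML5 (roughly) ends a comment at the first `-->`. The two functions below are simplified models written for illustration, with `<!-- -- -->` as a sample input; they ignore HTML5 edge cases such as `<!-->` and `--!>`.

```python
from typing import Optional


def html5_comment_end(doc: str) -> Optional[int]:
    """Simplified HTML5 model: a comment ends at the first '-->' after '<!--'."""
    start = doc.index("<!--") + 4
    end = doc.find("-->", start)
    return None if end == -1 else end + 3


def sgml_comment_end(doc: str) -> Optional[int]:
    """Simplified SGML model: '--' toggles comment state; '>' ends the
    declaration only when seen outside a comment."""
    i = doc.index("<!") + 2
    in_comment = False
    while i < len(doc):
        if doc.startswith("--", i):
            in_comment = not in_comment
            i += 2
        elif doc[i] == ">" and not in_comment:
            return i + 1
        else:
            i += 1
    return None  # declaration never closed


sample = "<!-- -- -->"
print(html5_comment_end(sample))  # 11: closed at the final '-->'
print(sgml_comment_end(sample))   # None: the last '--' reopened a comment
```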

taeric · on Aug 3, 2017

Hmm, I have to confess I was cribbing this example from a link above. I'll dive further on it and see where I got lost.

I am a bit befuddled by the claim that HTML5 was decided not to be an opt-in schema. My view is probably colored by the fact that, by the time I actually cared about this, most of my documents used the XHTML doctype. So for me it definitely was a sort of "opt-in" and a migration, which, frankly, is logical and makes the most sense.

So, I grant that the "existing documents" problem meant a ton of documents that were not well-formed. But a large chunk of existing code also produces excessive warnings, and the solution there is not to just give up, but to come up with better tools and guide people toward the higher-quality paths.

In the end, I fully accept that this is something we will just have to agree to disagree on. My assertion is that the contortions made to avoid raising the bar for creating documents did little to advance the state of the web. I do not have a clear way to test this assertion, and I have since moved on from web development.