Apparently this isn't a typo: HTML 5 really has no DTD. Hasn't the industry already wasted enough developer-years trying to parse tag soup? Are we really going to make this disastrous blunder again?
What blunder are you talking about? HTML 4.01 has a DTD, XHTML 1.0 has a DTD; did that help in any way?
Tag soup is already there and will stay for a long time.
Of all the specs so far, HTML5 has the best description of how to deal with it.
Yes, I do think they helped. Before HTML 2, validation didn't exist, so whether or not a document's markup made any sense was merely an opinion. "Best viewed with" was the order of the day because there was no way to know whether a browser would handle a document reasonably other than testing them together.
The web mostly interoperates now, but if we eliminate validation with HTML 5 I fully expect a return to the trainwreck we faced in 1995.
First, not having a DTD is not the same as not being able to validate: there is a validator for HTML5, validator.nu.
You can also use the XML serialization of HTML5 and XML tools to make sure your document is well-formed XML.
Second, having the means to validate markup does not mean that authors will care enough to use them: what portion of the documents with a DTD-referencing doctype slapped on top are actually valid?
Good markup is not produced by tools, but by those who care.
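To make the XML-tools point concrete, here is a minimal sketch in Python that checks whether a page written in the XML serialization is well-formed. It uses only the standard library's xml.etree.ElementTree; the helper name is_well_formed and the two sample documents are made up for illustration. It checks well-formedness only, not conformance to the HTML5 content rules, which is the job of a validator such as validator.nu.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample page in the XML serialization.
good = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head>"
    "<body><p>Hello<br/></p></body></html>"
)

# The same page with an unclosed <br> and a missing </p>.
bad = good.replace("<br/></p>", "<br>")

print(is_well_formed(good))
print(is_well_formed(bad))
```

An XML parser stops at the first error, so a check like this catches structural mistakes early but says nothing about whether the elements are used correctly.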
Thanks! http://about.validator.nu/#pitch seems to be using a RELAX NG schema instead of a DTD, which is fine, though it's not clear where the schema actually came from. The draft alludes to "Criteria that cannot be expressed by a DTD, but can still be checked by a machine", and that's a good sign. But I can't find any formalized list of those criteria, and that's really not a good sign.
And sure, there will always be overworked or ignorant authors who roll out slipshod work. But I at least want it to be possible to expect better, as it was not before we had the first DTD for HTML.
More clearly, you are conflating two things: a Document Type Definition, which HTML5 does not have (since it is not an SGML application), and a Document Type Declaration, which HTML5 does have.
The former is (glossing over a lot here) an SGML document outlining the grammar and vocabulary of a particular markup language. The latter is a statement that a document uses a particular markup language. And such a statement does not require or imply the existence of a corresponding formal SGML definition of the language.
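To illustrate the difference, here is a rough Python sketch that reads the declaration from two sample pages and reports whether it carries a PUBLIC or SYSTEM identifier. It relies on the standard library's html.parser, whose handle_decl callback receives the declaration text; the class DoctypeReader, the helpers read_doctype and has_external_identifier, and the two sample strings are names invented for this example.

```python
from html.parser import HTMLParser


class DoctypeReader(HTMLParser):
    """Record the text of the first document type declaration seen."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def read_doctype(markup):
    reader = DoctypeReader()
    reader.feed(markup)
    reader.close()
    return reader.doctype


def has_external_identifier(doctype):
    # PUBLIC or SYSTEM as a separate keyword introduces an external identifier.
    words = doctype.upper().split()
    return "PUBLIC" in words or "SYSTEM" in words


html4 = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd"><title>t</title>'
)
html5 = "<!DOCTYPE html><title>t</title>"

for page in (html4, html5):
    doctype = read_doctype(page)
    print(doctype, "->", has_external_identifier(doctype))
```

Both pages have a declaration; only the HTML 4.01 one points at a definition.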
Right, they removed the public and system identifiers from the declaration because there will be no SGML DTD to point at. But as far as I can tell, the current draft fails to provide an official schema of any kind to take the place of a DTD.
That's because the specification instead defines the manner in which HTML5 documents should be parsed. In theory someone can provide an XML schema which defines the rules for the XML serialization of HTML5, but parsing HTML5 as HTML isn't something which (so far as I know) can be expressed in a standard machine-readable schema format.
The downside is you don't have "drop this file into an SGML parser and it works". But since that never worked with real-world SGML-based HTML anyway, the upside is that you get a parsing mechanism which actually handles the sorts of things you'll encounter in the wild.
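As a small illustration of that trade-off, the Python sketch below feeds the same piece of tag soup to a strict XML parser and to a tolerant one. Note the assumption: the standard library's html.parser is only a forgiving tokenizer, not an implementation of the HTML5 parsing algorithm, so it reports tags and text without building or repairing a tree. The names EventLogger, strict_parse_fails, lenient_events and the sample markup are invented for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Collect start tags, end tags and non-blank text as they are seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))


def strict_parse_fails(markup):
    """Return True if an XML parser rejects the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return True
    return False


def lenient_events(markup):
    logger = EventLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


# Unclosed <p> elements and mis-nested <b>/<i>, as often found in the wild.
soup = "<p>one<p><b>two<i>three</b>four</i>"

print(strict_parse_fails(soup))
for event in lenient_events(soup):
    print(event)
```

The strict parser gives up, while the tolerant one still delivers every tag and text run. What the HTML5 draft adds on top of this kind of tolerance is a single specified answer for what tree the mis-nested markup turns into.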
I was also very dubious about HTML 5 until I attended a talk last May given by Mike Smith (http://sideshowbarker.net/). Slides here (for what they're worth without the talking to go with them):
Previously I was quite pro XHTML but now, not so much. Mike mentioned in his talk that he did not think that the XHTML 2 specification would ever come to fruition and it appears that he was correct. The XHTML 2 Working Group's charter will expire at the end of this year and not be renewed. So it's looking more and more likely that HTML 5 is our future.
Mike seemed like a knowledgeable guy with his head screwed on right. I don't think he and the others working on HTML 5 would be doing anything to set the web back. I'm sure there will be good options for people to validate their HTML 5 mark-up.