In the face of arbitrary user content, like comments? Are you checking they don't include a character like U+FFFF, which is illegal in XML even when escaped? (Ten years ago almost none of the biggest XHTML advocates had websites that would keep outputting well-formed XML in the face of a malicious user, who could sometimes bring the whole site down.)
It's absolutely possible to write a toolchain that ensures this; it's just that essentially nobody does.
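As a minimal sketch of what such a toolchain has to do (the name `strip_illegal_xml_chars` is mine, not from any library): drop everything outside the XML 1.0 `Char` production before user text reaches the serializer, since escaping alone can't help with characters that XML forbids outright.

```python
import re

# Characters allowed by the XML 1.0 "Char" production: Tab, LF, CR,
# U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF. Everything else
# (including U+FFFF and most C0 controls) is illegal even inside
# a character reference, so it must be stripped, not escaped.
_XML_ILLEGAL = re.compile(
    "[^\x09\x0a\x0d\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)

def strip_illegal_xml_chars(text: str) -> str:
    return _XML_ILLEGAL.sub("", text)

print(strip_illegal_xml_chars("ok\uffffstill ok"))  # -> "okstill ok"
```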
> Isn't the non-strict parser for XHTML just the normal HTML parser?
Yes. It's literally the same parser; browsers fork purely on the Content-Type (text/html vs. application/xhtml+xml), with no regard for the content itself.
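You can see the fork for yourself with a throwaway server like this (port and paths are arbitrary) that serves the exact same bytes under both content types. The unclosed `<br>` should render fine at /html but produce an XML well-formedness error at /xhtml:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Identical bytes; the unclosed <br> is fine in HTML but not in XML.
PAGE = (b'<html xmlns="http://www.w3.org/1999/xhtml">'
        b'<body><p>hi<br></p></body></html>')

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # /xhtml -> draconian XML parser; anything else -> forgiving HTML parser
        ctype = ("application/xhtml+xml" if self.path == "/xhtml"
                 else "text/html")
        self.send_response(200)
        self.send_header("Content-Type", ctype + "; charset=utf-8")
        self.end_headers()
        self.wfile.write(PAGE)  # same bytes either way

HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```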
The bigger problem with XML parsers is handling DOCTYPEs (even if you refuse to resolve external entities, you still have to expand the internal ones), and DOCTYPE support makes an XML parser about as complex as an HTML one. Sure, an XML parser without DOCTYPE support is simpler than an HTML parser, but then you aren't parsing XML.
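For illustration, even Python's stdlib ElementTree has to read the internal DTD subset and expand entities declared there; this same mechanism is what makes "billion laughs" expansion bombs possible:

```python
import xml.etree.ElementTree as ET

# No external entity resolution involved: a conforming parser still has
# to parse the internal DTD subset and expand its entity declarations.
doc = """<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY greeting "hello, world">
]>
<root>&greeting;</root>"""

print(ET.fromstring(doc).text)  # -> "hello, world"
```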
The problem is that, with the glut of documents declaring strict conformance but failing to achieve it, fallback mechanisms had to be implemented, effectively making it a two-pass parser: if the strict parse fails, you reparse in non-strict mode. The end result is slightly more complex, and definitely slower.
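Roughly what that fallback looks like, as a sketch (`parse_page` is a hypothetical name, and the stdlib `HTMLParser` stands in for a real HTML5 tree builder):

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

def parse_page(markup: str):
    """Two-pass scheme: attempt a strict XML parse first, and on any
    well-formedness error reparse with the forgiving HTML parser."""
    try:
        return "strict", ET.fromstring(markup)
    except ET.ParseError:
        # Second pass: the lenient parser never rejects input, so the
        # combined pipeline is strictly more machinery than HTML alone,
        # and malformed documents get parsed twice.
        fallback = HTMLParser()
        fallback.feed(markup)
        fallback.close()
        return "lenient", fallback

print(parse_page("<p>fine</p>")[0])      # -> strict
print(parse_page("<p>unclosed<br>")[0])  # -> lenient
```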
> Because parsing invalid XHTML, which all browsers ended up doing, is more complicated than parsing HTML5...
I don't understand what you mean. Isn't the non-strict parser for XHTML just the normal HTML parser? The complexity should be the same either way.