
Lax parsers made the modern web possible. New pages could add image tags, forms, and scripts without breaking parsing in older browsers.



I think that at this point, folks have realized this is not in fact true. If anything, lax parsers made the evolution of the web more difficult and more subject to corporate co-option.


What's the argument for that? If HTML had been strictly parsed, the first page with an image tag would have been broken in every other browser.


Ignoring for one second the specifics of <img> in relation to SGML's `O` option (which was rectified in XML), this needn't really have been the case. HTML could easily have said "if you encounter an unknown tag, render its contents as PCDATA" and sites would have degraded at least as gracefully as they do today. (If less gracefully than they did in 2000.)

(Heck, it could've been a generic SGML feature! "Unknown elements' contents are CDATA, unless they have this attribute in which case they're PCDATA, or this attribute in which case they're ignored" as a rule in the DTD...)
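A minimal sketch of what that rule could have looked like, assuming an invented "unknown-handling" attribute (nothing like it exists in any real HTML or SGML DTD; the element names are just examples):

    <!-- Hypothetical: the DTD itself declares how a browser should
         degrade when it doesn't implement an element. -->
    <!ELEMENT marquee (#PCDATA)>
    <!ATTLIST marquee unknown-handling (pcdata|cdata|ignore) #FIXED "pcdata">
    <!-- human-readable contents: render the text -->
    <!ELEMENT script (#PCDATA)>
    <!ATTLIST script unknown-handling (pcdata|cdata|ignore) #FIXED "ignore">
    <!-- machine-only contents: skip them entirely -->

A browser that implements the element ignores the attribute; one that doesn't reads it and knows whether the contents are worth showing.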


> HTML could easily have said...

But it didn't! The problem with being strict here is that every possible usage has to be pre-imagined and perfectly implemented. You're suggesting the original developers should have just made affordances for everything that would be added over the next 30 years. That's easy to say now. The first web browser was essentially just a hugely successful prototype.

And can you imagine having to type all your tags in upper-case? Yuck. :)


We never needed to parse tag soup. We only needed to say which DTD defines the new elements a document uses, and what a browser should do with valid but unknown elements. The latter could use #FIXED or default attribute values in the DTD, because some new elements have human-readable content and others don’t.
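A rough sketch of that idea, with an invented "if-unsupported" attribute, a made-up DTD address, and simplified content models:

    <!-- The document says which DTD defines its new elements: -->
    <!DOCTYPE html SYSTEM "http://example.org/dtd/html-plus-img.dtd">

    <!-- Inside that DTD, defaulted attribute values tell a browser what
         to do with a valid element it doesn't implement: -->
    <!ELEMENT img EMPTY>
    <!ATTLIST img src CDATA #REQUIRED
                  if-unsupported (render|ignore) "ignore">
    <!ELEMENT form (#PCDATA)>
    <!ATTLIST form if-unsupported (render|ignore) "render">

An old browser keeps the <form>'s human-readable contents on screen and silently drops the <img> it can't display.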


The looseness of HTML is usually about automatically closing tags or unquoted attributes. You can keep strict syntax enforcement and still recognize and skip unknown tags or attributes.
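For instance (the made-up <sparkle> tag stands in for any future element), a strict parser could reject the first fragment on syntax alone while accepting the second, unknown tag and all:

    <!-- Syntax error: unquoted attribute value, unclosed tag. Reject. -->
    <p align=center>Hello

    <!-- Well-formed, just uses a tag the browser has never seen.
         Accept it, keep the element in the tree, ignore its semantics. -->
    <p align="center"><sparkle>Hello</sparkle></p>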


> Lax parsers made the modern web possible.

that's the whole problem indeed


https://mcshowblog.files.wordpress.com/2019/08/dumpsterfire....

Yeah bad example if you wanted to convince me it’s any good :-D


This is not due to lax parsing though. It is due to the convention of how to process undefined tags and attributes:

> User agents must treat elements and attributes that they do not understand as semantically neutral; leaving them in the DOM (for DOM processors), and styling them according to CSS (for CSS processors), but not inferring any meaning from them.

https://html.spec.whatwg.org/multipage/infrastructure.html#e...

This is unrelated to parsing: for example, an XHTML parser will be strict and draconian in its parsing, but it will still process undefined tags and attributes as above.
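A small illustration (the <futuretag> element is made up): this document is well-formed, so even draconian XML parsing accepts it, and the unknown element simply sits in the DOM where CSS can reach it:

    <?xml version="1.0"?>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Unknown element demo</title>
        <style type="text/css">
          futuretag { color: red; } /* styled, but no meaning inferred */
        </style>
      </head>
      <body>
        <p><futuretag>This text still renders.</futuretag></p>
      </body>
    </html>

(It wouldn't validate against the XHTML DTD, but draconian error handling is about well-formedness, not vocabulary.)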


That's like saying sloppy programming makes software development possible. At best, it speeds things up (a temporary benefit) at the cost of quality (causing potentially permanent damage).


That’s not like what was said above. They said that a strict parser would choke on unrecognized tags, thus making experimentation non-viable.

Sloppy programming is not about enabling new syntax at all. That simile is not useful.



