
Lax parsers made the modern web possible. New pages could add image tags, forms, and scripts without breaking parsing in older browsers.



I think that at this point, folks have realized this is not in fact true. If anything, lax parsers made the evolution of the web more difficult and more subject to corporate co-option.


What's the argument for that? If HTML had been strictly parsed, the first page with an image tag would have been broken in every other browser.


Ignoring for one second the specifics of <img> in relation to SGML's `O` option (which was rectified in XML), this needn't really have been the case. HTML could easily have said "if you encounter an unknown tag, render its contents as PCDATA" and sites would have degraded at least as gracefully as they do today. (If less gracefully than they did in 2000.)

(Heck, it could've been a generic SGML feature! "Unknown elements' contents are CDATA, unless they have this attribute in which case they're PCDATA, or this attribute in which case they're ignored" as a rule in the DTD...)
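A minimal sketch of what that rule could have looked like, assuming an invented "unknown-handling" attribute (nothing like it exists in any real HTML or SGML DTD; the element names are just examples):

    <!-- Hypothetical: the DTD itself declares how a browser should
         degrade when it doesn't implement an element. -->
    <!ELEMENT marquee (#PCDATA)>
    <!ATTLIST marquee unknown-handling (pcdata|cdata|ignore) #FIXED "pcdata">
    <!-- human-readable contents: render the text -->
    <!ELEMENT script (#PCDATA)>
    <!ATTLIST script unknown-handling (pcdata|cdata|ignore) #FIXED "ignore">
    <!-- machine-only contents: skip them entirely -->

A browser that implements the element ignores the attribute; one that doesn't reads it and knows whether the contents are worth showing.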


> HTML could easily have said...

But it didn't! The problem with being strict here is that every possible usage has to be pre-imagined and perfectly implemented. You're suggesting the original developers should have just made affordances for everything that would be added over the next 30 years. That's easy to say now. The first web browser was essentially just a hugely successful prototype.

And can you imagine having to type all your tags in upper-case? Yuck. :)


We never needed to parse tag soup. We only needed to say which DTD defines the new elements a document uses, and what a browser should do with valid but unknown elements. The latter could use #FIXED or default attribute values in the DTD, because some new elements have human-readable content and others don’t.
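A rough sketch of that idea, with an invented "if-unsupported" attribute, a made-up DTD address, and simplified content models:

    <!-- The document says which DTD defines its new elements: -->
    <!DOCTYPE html SYSTEM "http://example.org/dtd/html-plus-img.dtd">

    <!-- Inside that DTD, defaulted attribute values tell a browser what
         to do with a valid element it doesn't implement: -->
    <!ELEMENT img EMPTY>
    <!ATTLIST img src CDATA #REQUIRED
                  if-unsupported (render|ignore) "ignore">
    <!ELEMENT form (#PCDATA)>
    <!ATTLIST form if-unsupported (render|ignore) "render">

An old browser keeps the <form>'s human-readable contents on screen and silently drops the <img> it can't display.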


The looseness of HTML is usually about automatically closing tags or unquoted attributes. You can keep strict syntax enforcement and still recognize and skip unknown tags or attributes.
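For instance (the made-up <sparkle> tag stands in for any future element), a strict parser could reject the first fragment on syntax alone while accepting the second, unknown tag and all:

    <!-- Syntax error: unquoted attribute value, unclosed tag. Reject. -->
    <p align=center>Hello

    <!-- Well-formed, just uses a tag the browser has never seen.
         Accept it, keep the element in the tree, ignore its semantics. -->
    <p align="center"><sparkle>Hello</sparkle></p>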


> Lax parsers made the modern web possible.

that's the whole problem indeed


https://mcshowblog.files.wordpress.com/2019/08/dumpsterfire....

Yeah bad example if you wanted to convince me it’s any good :-D


This is not due to lax parsing though. It is due to the convention of how to process undefined tags and attributes:

> User agents must treat elements and attributes that they do not understand as semantically neutral; leaving them in the DOM (for DOM processors), and styling them according to CSS (for CSS processors), but not inferring any meaning from them.

https://html.spec.whatwg.org/multipage/infrastructure.html#e...

This is unrelated to parsing: for example, an XHTML parser will be strict and draconian in its parsing, but it will still process undefined tags and attributes as above.
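A small illustration (the <futuretag> element is made up): this document is well-formed, so even draconian XML parsing accepts it, and the unknown element simply sits in the DOM where CSS can reach it:

    <?xml version="1.0"?>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Unknown element demo</title>
        <style type="text/css">
          futuretag { color: red; } /* styled, but no meaning inferred */
        </style>
      </head>
      <body>
        <p><futuretag>This text still renders.</futuretag></p>
      </body>
    </html>

(It wouldn't validate against the XHTML DTD, but draconian error handling is about well-formedness, not vocabulary.)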


That's like saying sloppy programming makes software development possible. At best, it speeds things up (a temporary benefit) at the cost of quality (causing potentially permanent damage).


That’s not like what was said above. They said that a strict parser would choke on unrecognized tags, thus making experimentation non-viable.

Sloppy programming is not about enabling new syntax at all. That simile is not useful.



