I disagree, having these tags give the document a structure that makes it easier...

jacobsenscott · on May 21, 2019

I can't think of why it would be easier to write software to parse

    <html><head><title>foobar</title></head><body><p>Hi</body></html>

vs

    <title>foobar</title>
    <p>Hi

Number two looks way easier. If you were to write code to parse #1 you would just need extra code to ignore the useless tags.

Anyway, parsing HTML is a nightmare, and HTML documents are usually broken in the wild, and browsers still, amazingly, manage to render almost anything you throw at them. I'm sure they all already handle it.
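To make the comparison between the two snippets concrete, here is a minimal sketch using Python's standard `html.parser` module. It only lists the start tags a tokenizer reports for each snippet. The helper name `start_tags` and the `TagCollector` class are made up for this illustration, and this is not how a browser builds its tree: `HTMLParser` reports only the tags that are literally written and does not infer omitted ones.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Called once per start tag that literally appears in the input.
        self.tags.append(tag)


def start_tags(markup):
    """Return the start tags found in markup, in document order."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


full = "<html><head><title>foobar</title></head><body><p>Hi</body></html>"
short = "<title>foobar</title>\n<p>Hi"

print(start_tags(full))
print(start_tags(short))
```

At this level the short form simply yields fewer events. Whatever work is needed to recover the implied `html`, `head`, and `body` elements has to happen in a later tree-building step, which this sketch does not attempt.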

boomlinde · on May 22, 2019

> I can't think of why it would be easier to write software to parse

Maybe you haven't considered it closely enough?

If you look at the specifications, there is a complex set of rules governing the conditions under which a tag may be omitted. These naturally complicate the syntax. Consistency is preferable because it typically results in a simpler syntax.

Consider the following language:

    document ::= tags
    tags     ::= tag { tag }
    tag      ::= '(' symbol [ tags ] ')'
    symbol   ::= 'x' | 'y' | 'z' | 'w'

It encodes a document like the following:

    (x (y) (z) (y (w (x))))
    (z (w (z) (y)))

i.e. a list of trees of tags named by symbols.
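As a rough illustration of how little code the consistent grammar needs, here is a sketch of a recursive-descent parser for it in Python. The function names and the `(symbol, children)` tuple representation are my own choices, and it assumes whitespace between tokens is insignificant, which the grammar above does not actually state.

```python
# Sketch of a recursive-descent parser for the toy grammar above.
# Assumption: whitespace between tokens is insignificant.

SYMBOLS = {"x", "y", "z", "w"}


def tokenize(text):
    """Split the input into '(', ')' and symbol tokens, skipping whitespace."""
    tokens = []
    for ch in text:
        if ch.isspace():
            continue
        if ch in "()" or ch in SYMBOLS:
            tokens.append(ch)
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


def parse_tag(tokens, pos):
    """tag ::= '(' symbol [ tags ] ')'. Returns ((symbol, children), next_pos)."""
    if pos >= len(tokens) or tokens[pos] != "(":
        raise SyntaxError(f"expected '(' at token {pos}")
    pos += 1
    if pos >= len(tokens) or tokens[pos] not in SYMBOLS:
        raise SyntaxError(f"expected symbol at token {pos}")
    symbol = tokens[pos]
    pos += 1
    children = []
    # Optional nested tags: keep parsing while the next token opens a tag.
    while pos < len(tokens) and tokens[pos] == "(":
        child, pos = parse_tag(tokens, pos)
        children.append(child)
    # The closing bracket is always required in this grammar.
    if pos >= len(tokens) or tokens[pos] != ")":
        raise SyntaxError(f"expected ')' at token {pos}")
    return (symbol, children), pos + 1


def parse_document(text):
    """document ::= tags, tags ::= tag { tag } (at least one tag)."""
    tokens = tokenize(text)
    tree, pos = parse_tag(tokens, 0)
    trees = [tree]
    while pos < len(tokens):
        tree, pos = parse_tag(tokens, pos)
        trees.append(tree)
    return trees


doc = "(x (y) (z) (y (w (x))))\n(z (w (z) (y)))"
print(parse_document(doc))
```

Each grammar rule maps onto one short function, and the parser never needs to know which symbol it is looking at in order to decide where a tag ends.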

Now, try to describe the language that's identical except that the closing bracket is optional if it belongs to a 'w' tag that precedes an 'x' tag. Try even to describe a language where those closing brackets are not optional, but must be omitted. Both will be more complex than the language I described above. Now imagine that there are about a hundred such exceptions, as in HTML. Chances are that rather than encoding them as syntax for the parser, you'll complicate the lexer to automagically insert the optional tags, blurring the boundary between the lexer and parser.

> Anyway, parsing HTML is a nightmare, and HTML documents are usually broken in the wild

Exactly because the syntax is needlessly complicated.

kllrnohj · on May 21, 2019

> having these tags give the document a structure that makes it easier for machines to understand

What is a machine supposed to do with the <html> tag? It's entirely useless and tells you nothing. At best it could be used to identify the file type, but the doctype has long since replaced that usage, and even that wouldn't justify the end tag.

Similarly for <head> and <body>. They don't really do anything. There's no machine-useful structure to be had from that.

For some of the optional end tags, sure, keeping them definitely makes things easier for machines to understand. That's why XHTML exists. It's not really used, though. If you really want a document structure that is easy for machines to understand, you wouldn't go anywhere close to HTML in the first place; you'd use XHTML instead.

hombre_fatal · on May 21, 2019

Which machines have trouble, here?

Sounds superstitious.