Write HTML Right (lofi.limo)
263 points by aparks517 on June 10, 2022 | 205 comments



Whilst the spec certainly allows you to omit the closing tags of a whole range of elements, it's not necessarily the wisest choice. In my experience, the parser actually gets slower when you fail to close your tags.

Unscientific stats from a recent project where I noticed it:

+ Document is about 50,000 words in size. About 150 words to a paragraph element, on average.

+ Converting the entire thing to unclosed p elements (omitting the end tags) added an overhead of about 120ms in Firefox on Linux, before initial render.

+ Converting the entire thing to unclosed p elements added an overhead of about 480ms in Chrome on Linux, before initial render.

+ Converting the entire thing to unclosed p elements added an overhead of about 400ms in Firefox on Android, before initial render.

+ Converting the entire thing to unclosed p elements added an overhead of about 560ms in Chrome on Android, before initial render.

+ The time differences appeared to be linearly increasing, as the document grew from 20,000 to 50,000 words.

+ Curiously, Quirks Mode also increased the load times by about 250ms on Firefox and 150ms on Chrome. (Tried it just because I was surprised at the massive overhead of removing/adding the tag endings.)

The most common place this was going to be opened was Chrome on Android, and a whopping half-second slower to first render is going to be noticeable to the end user - all for some prettier markup.

Whilst you can debate whether that increased latency actually affects the user, decreased latency will always make people smile more. So including the end tags is a no-brainer. Feel free to write it without them - but you _might_ consider generating them before you serve up the content, if that's appropriate for your target.


I can't verify your numbers. As far as I can tell, loading a ~900,000 word document with no other differences than including or excluding </p> has about the same load time, though there's too much variance from load to load for me to really give definitive numbers.

Are you sure you converted it properly? I'd expect those kinds of numbers if your elements were very deeply nested by mistake (e.g. omitting tags where it's not valid to do so), but I don't see why leaving out </p> should be so slow.

Try these two pages:

https://niconii.github.io/lorem-unclosed.html

https://niconii.github.io/lorem-closed.html


For five runs, on the same hardware with the same load:

+ Unclosed: 4.00s, 3.91s, 3.59s, 4.45s, 3.93s

+ Closed: 3.90s, 2.74s, 3.9s, 2.05s, 3.39s

Though I'd note that the newline you have immediately following the paragraph, even when closing, would probably reduce the backtracking effect. And having no explicit body or head element would probably cause some different rendering patterns as well.


I don't know what you're measuring (onload?), but it's not giving you enough precision to make a conclusion about the performance of the HTML parser. If you profile the page w/ devtools Performance panel, you'll see that just 5% of the CPU cost used to load & render the page is spent parsing the HTML. At that level I'm seeing costs of 22-36ms per load.

And, spoiler alert: after repeated runs I'm not seeing any substantial difference between these test pages. And based on how the HTML parser works, I wouldn't expect it.

(I work on web performance on the Chrome team)


Were the five unclosed runs before the five closed runs? I could see that making a difference vs. interleaving them, if the hardware needs to "warm up" first.

For me, on Firefox on Linux (I know it's the one with the smallest difference, but I don't have the others on hand, sorry), using the "load" time at the bottom of the Network tab, with cache disabled and refreshing with Ctrl+F5, interleaving the tests:

- Unclosed: 1.38s, 1.49s, 1.45s, 1.52s, 1.48s

- Closed: 1.47s, 1.37s, 1.48s, 1.49s, 1.35s

The one with </p> omitted takes about 0.032s longer on average going by these numbers, but that's about 2 frames of extra latency for a page almost twice the length of The Lord of the Rings.
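For anyone who wants to re-run the arithmetic, checking those averages takes only a few lines:

```javascript
// Sanity-check the averages quoted above (times in seconds).
const unclosed = [1.38, 1.49, 1.45, 1.52, 1.48];
const closed = [1.47, 1.37, 1.48, 1.49, 1.35];

const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

console.log(mean(unclosed).toFixed(3)); // 1.464
console.log(mean(closed).toFixed(3));   // 1.432
console.log((mean(unclosed) - mean(closed)).toFixed(3)); // 0.032
```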

Regarding the page itself, I tried to keep everything else as identical between the two versions as possible, including the DOM, which is why I wrote the </p> immediately before each <p>. As for backtracking, I'm not sure what you mean. The rule for the parser is simply "If the start tag is one from this list, and there's an open <p> element on the stack, close the <p> element before handling the start tag."
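That rule is simple enough to sketch as a toy tree builder (purely illustrative - the real spec algorithm handles far more cases, and this hypothetical `CLOSES_P` set is a tiny subset of the real list):

```javascript
// Toy illustration of the implied-</p> rule: when certain start tags
// appear while a <p> is open, the parser closes the <p> first.
// (Hypothetical sketch; CLOSES_P is a small subset of the real list.)
const CLOSES_P = new Set(['p', 'div', 'ul', 'ol', 'h1', 'blockquote']);

function buildTree(tokens) {
  const root = { tag: '#root', children: [] };
  const stack = [root];
  for (const tok of tokens) {
    if (tok.type === 'start') {
      // The rule in question: an open <p> is closed before these tags.
      if (CLOSES_P.has(tok.tag) && stack[stack.length - 1].tag === 'p') {
        stack.pop();
      }
      const el = { tag: tok.tag, children: [] };
      stack[stack.length - 1].children.push(el);
      stack.push(el);
    } else if (tok.type === 'end') {
      if (stack[stack.length - 1].tag === tok.tag) stack.pop();
    } else {
      stack[stack.length - 1].children.push({ tag: '#text', text: tok.text });
    }
  }
  return root;
}

// "<p>Hello<p>World" yields two sibling paragraphs, not nested ones.
const tree = buildTree([
  { type: 'start', tag: 'p' }, { type: 'text', text: 'Hello' },
  { type: 'start', tag: 'p' }, { type: 'text', text: 'World' },
]);
console.log(tree.children.map((c) => c.tag)); // [ 'p', 'p' ]
```

Notice there's no rewinding anywhere: the decision is made the moment the start tag arrives.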


Well, this sounds like a really interesting observation. May I ask where exactly the original closing tags were located and what the stripped source looked like? I can imagine there _might_ be some differences among differently formatted code: e.g. I'd expect

    <p>Content<p>Content[EOF fig1]
to be (slightly) slower, than

    <p>Content</p><p>Content</p>[EOF fig2]
(most likely because of some "backtracking" when hitting `<p[>]`), or

    <p>Content</p>
    <p>Content</p>[EOF fig3]
(with that small, insignificant `\n` text node between paragraph nodes), which should possibly be faster than "the worst scenarios":

    <p>Content
    <p>Content[EOF fig4a]
or even

    <p>
    Content
    <p>
    Content
    [EOF fig4b]
with paragraph text nodes `["Content\n","Content"]` / `["\nContent\n","\nContent\n"]`, where the `\n` must also be preserved in the DOM but, due to white-space collapsing rules, is not present in the render tree (if not overridden by some non-default CSS) - but still with backtracking, which

    <p>Content
    </p>
    <p>Content
    </p>[EOF fig5]
should eliminate (again, similarly to fig2 vs fig1).

(Sorry for wildly biased guesswork, worthless without measurements.)


It was just paragraphs of text. p, strong, em, and q mingled at most. No figures or images or anything of the like to radically shift DOM computations. That the effect can even be seen is probably due to the scale of the document, as I noted it's a little larger than most things.

All paragraphs had a blank line between them, both with and without the p end tag. The p opening tag was always at the top-left, with no gap between it and the content.

So, for example:

    <p>Cheats open the doorway for casual play. They make it easier for disabled players to enjoy the same things as their peers, and allow people to skip parts of a game that <em>they bought</em> that they find too difficult.</p>

    <p>Unfortunately, cheats are going away, because of extensive online play, and a more corporate approach to developing games that despises anything hidden.</p>
Versus:

    <p>Cheats open the doorway for casual play. They make it easier for disabled players to enjoy the same things as their peers, and allow people to skip parts of a game that <em>they bought</em> that they find too difficult.

    <p>Unfortunately, cheats are going away, because of extensive online play, and a more corporate approach to developing games that despises anything hidden.
(You can also discount CSS from having a major effect. Less than a hundred lines of styles, where most rules are no more complicated than: `p { font-family: sans-serif; }`. No whitespace rules.)

However, if you wanted to look at this in a more scientific way - it should be entirely possible to generate test cases fairly easily, given the simplicity of the text data I saw my results with.
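If anyone wants to reproduce this, a generator for paired test documents in the layout described (blank line between paragraphs, opening tag flush left) is only a few lines - a hypothetical sketch with made-up helper names:

```javascript
// Generate two documents that differ only in the presence of </p>:
// same paragraphs, blank line between them, opening tag flush left.
function makeDocument(paragraphs, { closeTags }) {
  const body = paragraphs
    .map((text) => `<p>${text}` + (closeTags ? '</p>' : ''))
    .join('\n\n');
  return `<!doctype html>\n<title>test</title>\n${body}\n`;
}

const paras = Array.from({ length: 3 }, (_, i) => `Paragraph number ${i}.`);
const closed = makeDocument(paras, { closeTags: true });
const unclosed = makeDocument(paras, { closeTags: false });
console.log(closed.includes('</p>'), unclosed.includes('</p>')); // true false
```

Scaling `length` up to a few thousand paragraphs would get you into the territory of the documents discussed above.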


Yay, thanks for the info and inspiration - it sure seems like a fun weekend project.

(BTW your snippet's content sounds interesting and feels relatable, definitely intrigued.)


Finally did some synthetic measurements of (hopefully) parse times (not render, CSSOM, or anything like that). The differences seem microscopic but are overall aligned with my initial expectations (omitting the closing tag actually shaves a bit of the yak's hair), so I suspect that the real overhead you observed is caused by something happening after parse, where the absence of trailing white-space in DOM nodes (ensured by closing tags) helps in some way. I guess something around white-space or text layout. (Speaking of insignificant white-space, you could probably gain some more microseconds if you stuck paragraphs together (`..</p>\n\n<p>..` -> `..</p><p>..`), however such minification seems like a nuisance.)

Tested only on Windows, in browser consoles.

Numbers:

Firefox (Nightly) (performance.now is clamped to milliseconds)

    total; median; average; snippet
    2279.0; 4.0; 4.558; '<p>_'
    2652.0; 4.0; 5.304; '<p>_</p>'
    2471.0; 4.0; 4.942; '<p>_abcd'
    2387.0; 4.0; 4.774; '<p>_\n'
    3615.0; 5.0; 7.230; '<p>_</p>\n'
    2380.0; 4.0; 4.760; '<p>_abcd\n'
    3093.0; 5.0; 6.186; '<p>_\n</p>\n'
    3107.0; 5.0; 6.214; '<p>_</p>\n\n'
    2317.0; 4.0; 4.634; '<p>_abcd\n\n'
    2344.0; 4.0; 4.688; '<p>_\n\n'
Google Chrome (performance.now is sub-millisecond)

    total; median; average; snippet
    2870.4; 5.2; 5.741; '<p>_'
    2895.2; 5.4; 5.790; '<p>_</p>'
    2684.7; 5.2; 5.369; '<p>_abcd'
    2845.4; 5.2; 5.690; '<p>_\n'
    3836.7; 7.3; 7.673; '<p>_</p>\n'
    2837.8; 5.2; 5.676; '<p>_abcd\n'
    4022.5; 7.4; 8.045; '<p>_\n</p>\n'
    4044.3; 7.3; 8.089; '<p>_</p>\n\n'
    2928.4; 5.2; 5.857; '<p>_abcd\n\n'
    2805.3; 5.2; 5.611; '<p>_\n\n'
Test config

    Snippets per document: 5000
    Rounds: 500
    Wrap: '<!doctype html>(items-paragraphs)'
    Content of each item (_): a bunch of random digit chunks, something like '1943965927 52 27 5 51664138859173 5161 7226 5 15 2 55679 6553712585'
Code: https://gist.github.com/myfonj/57a6a8fcb1c5686527412543a897c...

(Before realizing I could use a synthetic DOMParser I made something that measures document load time in an iframe (http://myfonj.github.io/tst/html-parsing-times.html), but it gives quite unconvincing results, although probably closer to the real world. Understandably, a synthetic DOMParser can crunch much more code than a visible iframe.)


> For some prettier mark up.

But then if you run it through Prettier it'll add all the closing tags for you :)


If you’re running it through a processor, why not just write markdown and call it a day?


Is there a standard definition for the "Markdown" language?

There are several standards for the different HTML versions, and it is standardized that you can omit some closing tags, and some tags altogether.

The benefit of writing in a standardized language is that later you or anybody can run tools against your sources that check for conformity.

So that is why I prefer HTML. But I would like to hear your opinion on what is the best mark-down dialect currently?


Yes, CommonMark is a standard with implementations in many different languages.


That is an interesting development.

From their Github page I read: "The spec contains over 500 embedded examples which serve as conformance tests."

So it's not so simple any more, is it?

(https://github.com/commonmark/commonmark-spec)


Less than a thousand conformance tests for a standard? Sounds pretty simple to me; no way you could make an HTML compliance suite that small.


> So it's not so simple any more, is it?

I claimed the specification existed, I didn’t claim it was a simple specification.


I'm not claiming you claimed it was a simple specification :-)

I just find it interesting. This would indicate to me that there are 500 "features" in the language. I thought mark-down languages just provided a few shortcuts for producing the most commonly needed HTML features and then provide a fallback to HTML. So if you cannot do it in the markdown language, use HTML instead.


I can't really be bothered to take a look at the tests, but I strongly doubt there are actually 500 features. A large part of those tests are probably trying combinations of features. E.g. suppose markdown only had tables as a feature, and nothing else. That feature alone deserves several tests (for tables of various sizes, edge cases such as having only the header, having rows with an incorrect number of columns, etc.).

But let's assume we can get away with just a single test for tables. And then we introduce the features "section headers" and "bold" and "underline". All these features can interact (e.g. underlined bold section headers), so we want to test combinations of all those features, and have a nice combinatorial explosion.


I see, combinations. But also the ability to use different combinations of "basic" features in a sense are a specific feature too. Like you can mark text bold and you can mark text as representing a table. But can you mark text within tables bold? If you can that would to me be a "feature" too. If you can not, then that "feature" is missing.


Well, one simply formats the source file as you write it. The other requires an infile -> outfile build step that's more complex.

Whether the latter is worth it tends to depend on other things than parse time.


Why would I care if one is merely “formatting” or not? If I have to run a tool either way, I would prefer one that accepts a user-friendly input language and decouples content from presentation.


Because transforming an .md file into an .html file is a lot more invasive (though taken for granted here I think) than just writing the .html file. It's a build step where there wasn't one before.

I'm not saying it's never worth it.


How does Markdown decouple content from presentation?


You typically write your content in markdown and merge it with HTML templates and CSS.


Are more strict html parsers/renderers, and aren't they faster?


Lenient parsers still benefit from strict input because it lets them avoid lookaround/backtracking.


What do you mean by lookaround/backtracking? You're inside <p>. You encounter another <p>. You can't nest one <p> inside another <p>, so you close the current <p> and open a new <p>. That's about it. I fail to see where you need any kind of backtracking.


Well, even in this one example, imagine parser combinators, which often mean backtracking the inner <p> so that you can commit to the `openTag('p')` parser. Or your logic may be `consume all tags that aren't <p>`, which is a lookahead.

A better example here is whether you are lenient and accept unescaped characters like "<" instead of "&lt;". If you require "<" to be escaped as "&lt;", or if all such characters in your inputs are always escaped, then your text parser never has to backtrack. But if you are lenient, your text parser can do catastrophic levels of backtracking if there is a single "<" somewhere (unless you are careful). Imagine input that starts off "<a small mouse once said". It could be quite a while before your parser knows it's not an anchor open tag.
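For what it's worth, lenient HTML tokenization commits after a single character of lookahead: a "<" begins a tag only when the next character is an ASCII letter, otherwise the "<" is treated as plain text. A toy scanner sketching just that decision (illustrative only - the real tokenizer also handles "</", "<!", "<?", attributes, and so on):

```javascript
// Toy scanner for the lenient-"<" decision: one character of lookahead
// decides tag vs. literal text. (Illustrative only - the real tokenizer
// also handles "</", "<!", "<?", attributes, and so on.)
function splitTextAndTags(input) {
  const out = [];
  let i = 0;
  let text = '';
  while (i < input.length) {
    if (input[i] === '<' && /[a-zA-Z]/.test(input[i + 1] || '')) {
      if (text) { out.push({ type: 'text', value: text }); text = ''; }
      const end = input.indexOf('>', i);
      if (end === -1) {          // EOF before ">" - treat the tail as text
        text = input.slice(i);
        break;
      }
      out.push({ type: 'tag', value: input.slice(i + 1, end) });
      i = end + 1;
    } else {
      text += input[i];
      i += 1;
    }
  }
  if (text) out.push({ type: 'text', value: text });
  return out;
}

// A "<" not followed by a letter never starts the tag scan at all:
console.log(splitTextAndTags('x < 3 and y > 2'));
// [ { type: 'text', value: 'x < 3 and y > 2' } ]
```

So "<a small mouse once said" costs a forward scan for ">", not a rewind: the scanner commits at the "<" and never backtracks.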


> Are more strict html parsers/renderers, and aren't they faster?

Are what more strict? You're missing a subject there.

At a guess, you're referencing the differences between Chrome/Firefox rendering times? And are surprised that Chrome is always slower?

In the same completely unscientific stat taking, I found that Chrome was significantly faster at parsing the HTML head element of a document than Firefox, and that difference was enough for Chrome to pull ahead of Firefox in overall rendering times for smaller pages. (Chrome was about 30% of Firefox's time spent in the head.)

However, Firefox was faster at parsing the body, and as I had a larger-than-usual body (50k words is not your average webpage), Firefox was overall faster.


To you and all that have responded: there is no variation in HTML parsing between browsers. All engines are using precisely the same exhaustively-defined algorithm. There is no leniency or strictness. Their performance characteristics may differ outside of parsing, which includes what they do with the result of parsing, but in the parsing itself there should be basically no difference between engines or parsers.


That’s interesting, but surely relying on the user agent to ‘fill in the gaps’ is error-prone? Transpiling prior to or during render would surely be more resilient than trusting browser behaviour.


If you're in a situation where resilience against odd browser quirks matters, you probably shouldn't be writing HTML like this anyway. This style is fine for writing HTML for a blog. For any kind of application, it would be a nightmare to try to maintain.

Every time the author introduced a shorthand, they had to clarify that it works only in specific situations. The result of those qualifiers is that you will have to have some code written in the more verbose style anyway. Context switching between those styles and having to decide whether the shorthand works in any given case just isn't worth it on a large project that you'll be making changes to over time.


HTML parsing is exhaustively defined, so there’s not any filling of gaps, but only rules to be aware of. If you don’t know those rules, this may be error-prone, but if you do, it’s not, and things like the start and end tag omissions discussed in the article are quite straightforward rules to learn.


Although, as the article correctly points out, omitting the <html> tag is technically fine, there is one rather important argument for its inclusion: it can and should have a lang attribute:

    <html lang=en-GB>
It's not verbose after all, and IIUC it may be omitted if and only if the document is served with the corresponding information in the `Content-Language:` HTTP header, but nasty (or rather annoying) things may happen if that fails [1], so when it comes to "right HTML", following this advice sounds reasonable.

[1] https://adrianroselli.com/2015/01/on-use-of-lang-attribute.h...


No thanks. With the full markup you can see where things end, not just where they start.

I think this is similar to semicolons in Javascript: with semicolons at the end of each statement there is no ambiguity, but if you do not have semicolons, you have to know about edge cases, like if a line starts with a square bracket or paren.
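The square-bracket case mentioned here is easy to demonstrate concretely (variable names made up for illustration):

```javascript
// A line starting with "[" continues the previous statement instead of
// beginning a new one - the classic hazard behind that edge-case rule.
const xs = [10, 20, 30]

function pick() {
  return xs
  [1, 2]   // parsed as `return xs[1, 2]`; the comma operator makes it xs[2]
}

console.log(pick()) // 30
```

With a semicolon after `return xs`, the function would instead return the whole array.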


You can't disable this "feature", so you still don't know where things end / begin. Some tags can't be nested in <p> while you could expect that they can:

  <p>
     Paragraph with a list won't work as you could think
     <ul> <li> Test </li> </ul>
     Something else
  </p>
Parses to:

  <p>
    Paragraph with a list won't work as you could think
  </p>
  <ul> <li> Test </li> </ul>
  Something else
  <p></p>

Similarly, in JS you are paying the price for optional semicolons even if you decide to use them.

   return
   {
      x: 1
   };
Will still not work even if you use semicolons elsewhere. So I don't see any advantage to actually using semicolons. JS is not worse than Python with it's basic inference, and yet in Python people will almost yell at you if you attempt to use a semicolon :)

I'd much prefer these features to be opt-in (yeah, give me XHTML back for generated content). But when I can't disable them, why not embrace them ;)


> JS is not worse than Python with [its] basic inference

JS semicolon insertion is worse, because it depends on the following line. In Python, an unescaped newline outside of brackets always ends the statement, but in JavaScript, parentheses, brackets, binary operators, and template literals on the following line change that. The Python rule also makes a dangling operator outside of brackets a syntax error, which is a potential source of unintentional introduction of ASI when making changes to code in JavaScript.


On the point about semicolons in JavaScript, the logic I’ve heard is that if you consistently use semicolons, you can have a linter warn you if there is an inferred semicolon, so you know if you have made a mistake. If you don’t use semicolons and accidentally produce code with an inferred semicolon that should not be there, then there is no way for any tool to warn you. (Well, no general way; in your example with the return, many linters would warn you about unreachable code.)


I never use semicolons and I never have these issues.

Even in the rarest cases I maybe had them like when copy pasting in the wrong place they were so rare that I don't think it's worth the additional noise of semicolons.


There are 3 major footguns with automatic semicolon insertion IIRC (one involves having the return statement on its own line). As long as you know them all it's fine, I guess, but it's not to my taste.


> give me XHTML

You can still use XHTML; just send "Content-Type: application/xhtml+xml". You can express the same things as an HTML document, but with a saner parser mode.


> You can express the same things as an HTML document

This is not quite true. There are a number of mutual incompatibilities between the XML and HTML syntaxes at both parse and run time.

At parse time, it’s mostly in the direction of XML syntax making things possible (e.g. nesting paragraphs or links, which the HTML parser prevents), but also in the other direction (e.g. <noscript> has no effect in XML syntax since it’s essentially an HTML parser instruction); you’ve also got case sensitivity which matters for SVG; and there’s the matter of the contents of <script> and <style> elements and their handling of <>&, where the best but still imperfect solution is a crazy mix of XML comments, JavaScript/CSS comments and XML CDATA markers. (See https://www.w3.org/TR/html-polyglot/ for more details of all this kind of stuff.)

At run time, behaviour changes in such a way that it will break some JavaScript libraries, due to differences like .tagName being lowercase instead of uppercase, and .innerHTML requiring and producing XML syntax.


What is a saner parser mode?


In this context (although I would dispute calling it “saner,” as someone who was fully on board the XHTML train a decade ago), an XML parser, which among other things enforces that the markup is “well‐formed” by the XML definition, thus prohibiting implicit closing tags and unquoted attributes.


Agree 100%. It's also about a thousand times easier for people with a very basic HTML understanding to parse (if you open something, with pretty much the exception of an image, you gotta close it).

Periodically I have to send code to people who then make some of their own changes inline. God forbid trying to explain "yeah, they don't need to be closed, but that does because it's nested and..." Disaster (/hours of extra support) waiting to happen.


You have to know HTML in order to know where things end. Otherwise, you will see nested paragraphs here:

  <p>Hello <p>World</p>!</p>
when it’s actually two consecutive paragraphs, an exclamation mark outside of any paragraph, and a closing p tag without an opening counterpart.

And when you do know HTML, you might as well omit optional tags.

If you think that HTML syntax is crazy, I won’t blame you, and you might consider XHTML instead, but you should be prepared for different woes.


I have a tendency to forget ASI in JS exists when I've only been looking at my own code rather than other people's for a while.

I remain unconvinced it was a wise idea.


What is ASI?



This works for blog posts, where the body of the document is one long block of paragraphs, but I suspect this style would quickly become untenable for complex apps. Indentation _is_ information, which is lost here.


it doesn't work for even slightly complex documents either. there's been a little meme-fad lately around minimalistic html like this, but to claim it's the "right" way to write html is pompous at best.

not closing tags for instance is really asking for future headaches. sure, it works for a simple text list, but not when it gets even a little complicated (add links, images, buttons, etc.). even worse are p tags, where you have to memorize a whole matrix of what it can contain and what breaks out implicitly. with every insertion/deletion, you need to check the list. it's needless mental drag.


You have to know about what breaks out of <p> tags regardless of whether or not you leave off the end tag, though.

<p><div></div></p> is invalid HTML because <div> ends the paragraph, resulting in an unpaired </p>.


And not just because of that. In XHTML‐as‐XML, where <div> does not implicitly end the paragraph, what you posted is still invalid because <p> cannot contain <div>.


I've been using this style - with some tweaks - for web apps too. I don't think I have it completely figured out yet, but it's promising so far. You can view the source of http://lofi.limo/ to see how it's working out.


I feel like this style just makes it harder to read and understand the HTML. But hey, if it works for you, great.


This is the output of an app/templating system, i.e. not a single HTML page. Have you ever read the HTML of any dynamically generated page? It's unreadable.


> This is the output of an app/templating system, i.e. not a single HTML page.

I don't think that's correct. The article is literally talking about how to write HTML, and explaining the benefits of writing it in this style.


> Have you ever read the HTML of any dynamically generated page? It's unreadable.

Not with that attitude... if you write consistently and with intention, it turns out just fine.

Check out the source for https://try.nodebb.org, for example. Dynamically generated, (mostly) syntactically correct, (mostly) human readable.


All the HTML code in the app I maintain is pretty readable. At some point of complexity any HTML is difficult to parse, but if I hand-wrote a page in my app I think the HTML would be largely the same.


> Indentation _is_ information, which is lost here.

Isn't it a "view" of information? Any sufficiently advanced text editor can recreate it with a simple key combination.


Sure, but the author is advocating that you compose HTML this way. It would quickly become a mess of nested elements with zero visual indication of hierarchy.

The DOM is a tree, with nested elements. Losing that information doesn't get you anything but tag soup (which is, oddly, what the author suggests this style is supposed to avoid)


First and foremost, the author advocates for organising documents in a much flatter DOM tree. In this style all major page elements sit at the same hierarchical level, so there is no "mess of nested elements"; there is no need for a visual indication of hierarchy if there is no hierarchy to begin with.

I think that is a very compelling format for a text-first web page, like a blog post or news article. Of course it is a coding style not well suited for complex web apps with deep hierarchy.


One interesting detail is that a lack of deep nesting was in fact a deliberate design goal for HTML originally, to make WYSIWYG editing more feasible.

http://info.cern.ch/hypertext/WWW/MarkUp/HTMLConstraints.htm...


In a tree you have branches off branches off branches etc.

You can’t orient yourself - you can’t tell where you are - unless you count the branches. And indenting makes that visible.

In the examples from TFA, you can tell your location from the names of the elements. E.g. <td> is enough for you to know you’re probably inside a tr inside a table.

And that is the more common case than the general tree example.

But a method of describing html does have to answer the question of how it represents arbitrarily deep nesting. But I like the answers it’s given for the more common case of structures that are not arbitrarily deep.


What’s a TFA?


The Fine(or Fucking) Article


I always read the 'F' as "Featured"


Urbandictionary agrees. I reprogrammed myself to read it as "Fabulous" using a text replacement addon.


And this terminology (TFA) comes from at least the Slashdot days, about 20 years ago.


In turn, probably descended from “RTFM”—on Slashdot people who commented despite obviously not having read the article were told to “RTFA,” which eventually led to “TFA” as a general term to refer to the original article.


The "F" has always been "Forementioned" for me.


The problem is that HTML has multiple uses. The author is describing the case of authoring content, with HTML used as a markup language. However a lot of websites and web applications use HTML more like a layout and templating engine for a GUI framework.


Only if the formatter is unaware of HTML. If it can't handle automatically closing <p> tags, then it's unaware and is trying to treat HTML like XML.

Or, to put it another way, HTML != DOM, even though HTML can be rendered into a DOM.


Sometimes I indent in a way that my text editor doesn't exactly understand to better state where complex expressions begin and end.


I don’t think the author means the information-theory kind of information. I could gzip the file without a loss in that kind of information.


Incorrect indentation is therefore misinformation.


Hi guido!


That’s why HTML is not a language for ‘apps’.


Except for the fact that native apps also use SGML- or XML-inspired markup for their layout engines. A tree of heterogeneous objects maps extremely well to how people think about UI.


I agree that a tree structure can work well for mapping UIs, but HTML does not. It was specifically designed as a textual markup language. Its role has been expanded, but it has been done so poorly.

What really needs to happen is a separation of HTML from UI markup elements. HTML will be used solely for textual markup, and a new markup language can be used for UIs. This would allow us to return to a proper separation of concerns.


Sure, but that's an argument for creating new paradigms for having instantly available non-downloaded "apps". Right now, if you want a lot of what a webapp offers (100% cross-compatibility with any platform, instant updates, online syncing for free), you're basically stuck with HTML / Javascript.


> What really needs to happen is a separation of HTML from UI markup elements

Do you mean CSS? Using <b>, <i>, <strong>, etc. has been “bad form” for a while (maybe not <strong>, though)


<strong> and <em> are the recommended ways to semantically bold and italicize text. <b> and <i> don't have any semantics and can still be used where it makes sense.


Regarding writing "one-sentence-per-line", I've noticed that style before in LaTeX. While I don't use that style, one advantage that I like is the ability to include comments on the sentence level in LaTeX.

So instead of this:

  First sentence. Second sentence. % Comment on first sentence.
I can write:

  First sentence. % Comment on first sentence.
  Second sentence.
(Of course, one could define a new TeX macro that doesn't display anything to add comments anywhere in-line. That's not as readable, though.)

I've also read that one-sentence-per-line works better with diff programs, but I haven't had any problems with the program meld, so this isn't convincing to me. The advantage the linked article mentions in terms of rearranging sentences also is worth considering, though I haven't found the normal way to be that bad so I'm not convinced by that either.

Some other links on this coding/writing style:

https://rhodesmill.org/brandon/2012/one-sentence-per-line/

https://news.ycombinator.com/item?id=4642395

http://www.uvm.edu/pdodds/writings/2015-05-13better-writing-...
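A naive reflow into one-sentence-per-line is nearly a one-liner - a sketch only, since real sentence segmentation has to cope with abbreviations like "e.g.":

```javascript
// Break after sentence-ending punctuation followed by whitespace.
function oneSentencePerLine(paragraph) {
  return paragraph.replace(/([.!?])\s+/g, '$1\n');
}

console.log(oneSentencePerLine('First sentence. Second sentence. Third!'));
// First sentence.
// Second sentence.
// Third!
```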


I've been working on turning a pretty massive scanned book into a git repo of markdown files, with multiple collaborators. Using sentence-per-line has been useful (compared to line-per-paragraph) because, even with / despite --word-diff , PRs are far more concise, and merge conflicts are more rare. From memory, with paragraph-per-line, I think a series of paragraphs, each changed, even with minor changes, kinda breaks git diff and GitHub diff.


Oh, wow... I hadn't even thought of the diff angle, but it makes all the sense in the world. I've heard some authors even start each clause on its own line. I'm not sure I'm ready for that yet.


> A few years ago, I found out I'd been tying my shoes wrong for my entire life. I thought laces came undone easily and didn't usually look very good. At least that's how mine were, and I never paid much attention to anyone else's. It took a couple of weeks to re-train my hands but now I have bows in my laces that look good and rarely come undone.

I’m equally interested in this as the HTML. Any clue what the author is referring to?



Likely the author was tying granny knots instead of slipped/bowed reef knots

If your first cross is left over right you need to make your second cross right over left, or vice versa. I found an image showing the difference for the un-slipped version, but it's the same with a bow: http://www.tikalon.com/blog/2020/square_granny_knots.png

Granny knots untie themselves and the bow will end up perpendicular to the knot instead of parallel.


Aw... you didn't read to the end ;)

> The right way to tie your shoes is with a square knot. It's easy to confuse this with the granny knot, which is the wrong way. The square knot is a simple and sound knot with many uses. The granny knot is an unsound knot whose only known uses are to make your shoelaces look crooked and to trip you.


Possibly the Ian knot https://www.fieggen.com/shoelace/ianknot.htm

You look goofy trying to relearn to tie your shoes, but it really is fast and sturdy.


It was apparently the square knot, but I'm a big fan of the Ian knot. I learned it about a year ago and at the very least, tying my shoes is more fun now. I'm not yet convinced it's better than the old-school method, but it looks impressive when you do it and it's more fun.


What immediately came to mind for me was this (short) Ted talk https://youtu.be/zAFcV7zuUDA


Not sure what he's referring to; I'm not familiar with the parallel posts. I just do the "rotate around the loop" part twice. I've had an untied shoelace maybe twice in the last 5 years.


I appreciate that this blog post itself is written in the exact same style! I really miss being able to read the view-source: version of websites easily, but this blog post does it well :)


Certainly beats the "you don't need so much JavaScript!" blog posts that load 10 external scripts.


Or the articles about tracking and the ad industry with consent popups asking for permissions to let their ad "partners" track you.


Thanks to Aaron for posting this. Such a great reminder.

Anyone interested in this subject, check out a series of three very tiny books called “UPGRADE YOUR HTML” by Jens Oliver Meiert.

They give great step-by-step examples for eliminating optional tags and attributes, reducing HTML to its cleanest simplest valid form. The author is a super-expert in this specific subject, working with Google and W3C on this. His bio here: https://meiert.com/en/biography/

From LeanPub: https://leanpub.com/b/upgrade-your-html-123

From Amazon: https://www.amazon.com/gp/product/B08NP4GXY2/


> Such a great reminder.

Reminder of what? To me this reads like satire, even if it wasn't intended as such.


This is how SingleFile writes HTML by default :). However, it is also the most duplicated issue in the tracker.


Example "issue" (feature) from the tracker: https://github.com/gildas-lormeau/SingleFile/issues/967

(Also: a huge thank you for creating SingleFile. One of my favourite extensions of all time.)


You can link it: https://github.com/gildas-lormeau/SingleFile

Pretty neat extension!


I was hesitating, thanks!


The remarks by the person who opened #967 are beyond frustrating—and it's frustrating to see your responses to them. People putting stuff into the bugtracker that aren't bugs deserve a harsher response. Don't enable "putting stuff into the bugtracker without clearly articulating a defect [in the form of observed behavior versus expected behavior‡]" to be a viable way to interact with a project. Indulging these kinds of persons' requests for support and freeform banter is harmful in the long run. Giving them the answers that they're looking for even though their questions/comments are out of scope is way too forgiving, and it ends up causing problems for other maintainers when these numbskulls inevitably pop up around other projects and expect the same standard of treatment because they take it as a given that their fripperies are kosher.

‡ including sound, solid reasoning for why the former is incorrect and the latter is correct


At first, I thought people would respect the issue template. In practice, very few do, even when a proper bug is reported. I completely agree with you but it seems to be a losing battle. So I just deal with these kinds of cases according to my mood. Concerning the bug #967, maybe I was not angry enough. Overall, the atmosphere on the bug tracker is fortunately very positive.


I think having one set of fairly clear, complete, and polite responses to the question that can then be linked from elsewhere (or possibly turned into an FAQ ... and then linked to when people inevitably don't spot it in the FAQ before opening an issue ;) is probably a net win in terms of maintaining a positive atmosphere on your bug tracker.


It can help indeed! Thanks for the suggestion BTW [1]

[1] https://github.com/gildas-lormeau/SingleFile/commit/6c7a2ef1...


<3


At this point you are better off making a DSL that compiles to html.

- it will be possible to be consistent with closing tags or not

- you can do other arbitrary things to improve your working experience with it

Ever tried Slang styled templates?


I like this idea. As someone who argued vehemently for XHTML a couple decades ago (even wrote a fair amount of XSLT in those XML-crazed days), who's been wandering between different levels of "how strict should I be?" since that time, this article marks the step of my journey where I feel like I can really embrace the goodness that SGML has to offer for the first time. So thank you. This article has changed me.


Regarding tables, there is one trick: the sizes of borders are actually weighted semantic separators, and should be in HTML, not in CSS.


Regarding tables, don't use tables. :)


…for non-tabular data such as “your pretty design elements that frame and organize the text because it is 1995 and CSS doesn’t exist yet and this is the only tool at your disposal for aligning stuff across the page”. Or because it is 2000 and putting stuff where you want it is a hell of CSS2 floats and box models and eventually you just say “fuck it” and assign table-like behavior to a bunch of divs because Tables For Layout Are Considered Harmful.

If you’ve got stuff that would look good as a table, use a table.


It’s funny you bring this up because while I have joined the Tables For Layout Are Considered Harmful club, I never really have heard a completely convincing argument on why tables have this bad rap. I think it’s mostly because, semantically, tables don’t make sense for layout, but back in the days before frameworks such as Foundation and Bootstrap (and more recently native CSS3 mechanisms), tables with invisible borders were nearly perfect for layout containers.


The "Tables are Harmful" club largely came from the crew who thinks HTML carries lots of semantics and that if you don't use the Blessed Tags that carry those semantics you're doing Bad Design.

The rational evidence in favor of this claim has always been weak. The "div" tag basically finished it off. The people who use HTML "semantically" have always been dwarfed by the people just making it look good on the screen, and the number of applications that use those semantics has always been small and on the fringe for something so putatively important.

However, the idea persists to this day despite its near complete failure to pay off significantly in nearly twenty years, and I'm sure someone will angrily reply to this and list the incredibly useful semantic HTML features that they and fifteen other people have found to be just incredible. Perhaps we'll also get the traditional citation of the Google info boxes, which have nothing to do with the semantic web and everything to do with Google throwing a crapton of machine learning and humans at the problem of parsing distinctly non-semantic HTML until they cracked the problem.

(An honorable mention to screen readers, which sorta do benefit, but still nowhere near as much as you might casually expect.)

Today the reason not to use tables is more just that it's inconvenient to do things like have a mobile and desktop layout. I believe they've got all the tools nowadays to tear into a table-based layout, break the tables apart, and treat it like any other CSS-styled content, but that's relatively recent, and still a silly way to operate when you could just use normal layout elements ("div" if nothing else) like a sane person and not have to undo the table layout before you can manipulate them properly.


> honorable mention to screen readers

Too limited, and deserving of much more than an honorable mention.

Accessibility should be a fundamental consideration of any reasonably sized app, using <table>s to markup tables is part of that.

Assistive devices are not limited to screen readers, and it's just good practice to use tables for tables.

CSS Grid has landed in all major browsers, if you want a grid layout, use grids for layout.
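A minimal sketch of what that looks like (illustrative class names, not from any framework): a single column by default, with a sidebar column added on wider viewports.

```css
/* Mobile-first: everything stacks in one column. */
.layout {
  display: grid;
  grid-template-columns: 1fr;
  gap: 1rem;
}

/* Wider screens: sidebar next to main content. */
@media (min-width: 40rem) {
  .layout {
    grid-template-columns: 12rem 1fr;
  }
}
```

This is exactly the reflow that table-based layouts struggle with: the same markup collapses to one column on narrow screens with no changes to the HTML.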


> the idea persists to this day despite its near complete failure to pay off significantly in nearly twenty years

I am not clear on exactly what "the idea" refers to, perhaps you could clarify. Also, how has the idea "completely failed"? And what would complete success look like?


The idea is the "semantic web". Success would look like almost everyone here having to know a lot more about the "semantic web" to do their jobs, such that I wouldn't have to explain to anyone what it was because it would just be how things worked, because it would be that important, and they couldn't operate without it because they wouldn't be able to compete against other websites without the staggering benefits that super-careful, expert semantic design brings them. Rather than just learning the layout and adding a few extra accessibility tags as needed.

As it stands now, it's very practical to just slap some <div>s down and do some CSS and be done.


Obnoxiously bad take.

> Google info boxes[...] have nothing to do with the semantic web and everything to do with Google throwing a crapton of machine learning and humans at the problem of parsing distinctly non-semantic HTML until they cracked the problem

This is verging on /r/SelfAwarewolves material.


I'm pretty sure you're misinterpreting it. Google did not simply write a web scraper that pulls a <business_hours> or a <dc:business_hours> tag out of the web. They wrote a web scraper that super, super intelligently examines the HTML and looks for "anything that looks like business hours"; maybe it's in a table, maybe it's days of the week separated by &nbsp; and <br>, maybe it's in <div>s or <span>s with suggestive CSS class names, maybe it's just in a pile of other HTML. The exact promise of the Semantic Web was that we could just load up a page and get a <business_hours> out of it. Google had to extract the "semantics" with everything but the "semantic web", because the "semantic web" is a no-show. Throwing a crapton of machine learning and humans at extracting semantically useful information from a page is precisely what the Semantic Web isn't.

Which is why it is bizarrely unselfaware when Semantic Web advocates almost inevitably cite that as their biggest success. It isn't. It's their biggest failure.
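For contrast, here is roughly what author-provided machine-readable hours look like today via schema.org microdata (illustrative snippet; the <business_hours> tag above is hypothetical, but this vocabulary is real):

```html
<div itemscope itemtype="https://schema.org/LocalBusiness">
  <span itemprop="name">Example Café</span>
  <time itemprop="openingHours" datetime="Mo-Fr 09:00-17:00">Mon–Fri 9am–5pm</time>
</div>
```

The point of contention is how few pages actually ship markup like this, versus how many require scraping free-form HTML.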


> I'm pretty sure you're misinterpreting it

You should be more sure of the things you're pretty sure of before saying you're sure of them.

There was no misinterpretation—from this end, that is. Your comment wasn't particularly sophisticated. It didn't require explanation.

> Google did not simply write a web scraper that pulls a <business_hours> or a <dc:business_hours> tag out of the web. They wrote a web scraper that super, super intelligently examines the HTML and[...]

No shit. The value proposition of the semantic web follows from how the world would be much better off if that weren't necessary. It has always been the case that, without the "semantic" half of "semantic web", attaining Google-level mastery over the Web's messy inputs is really, really difficult and requires Google-level resources. This isn't news. Yet you presented it as if it were an insightful observation wrapped in sage wisdom.

In your attempt to "prove" by counterexample what's Wrong with the semantic web, you just end up undergirding its very premise.

> Which is why it is bizarrely unselfaware when Semantic Web advocates almost inevitably cite that as their biggest success.

You cited them. You are literally the only person who mentioned them here, at all. You brought them up.

Saddling someone who advocates for X with the burden of defending position Y that you yourself have pulled from thin air is a textbook example of a bad argument. If you defeat some easily take-downable opponent (a 6-year-old, let's say—and one who is made of straw, for good measure) and then plan to enter the ring in subsequent matches having only bothered yourself with the thought that you will face the threat of another strawchild, that's not wise. It's stupid.


The semantics being wrong is convincing enough to me. That’s fundamental to accessibility. But if that doesn’t convince you: they’re not even good at layout. It’s much harder to make a page even minimally responsive with a table than with more semantically appropriate markup; you’d effectively have to revert all of the table styling, at which point why bother?


https://eev.ee/blog/2020/02/01/old-css-new-css/ <-- Sounds like it depends on how complicated you want things to get within those containers; Eevee here mentions nesting three levels of tables, which... Ew.

Also, not being able to rearrange blocks for different size displays is kind of a non-starter relative to the mobile internet, which I'd guess was more important than frameworks.

In my own hobbyist stuff, though, there's just something a little gross about putting layout in the HTML -- I want HTML to represent semantic structure, in ways I'd be okay with Lynx displaying, and I want CSS to do all the lunatic nonsense to make me happy with how it looks on a modern browser. I wonder how much it's this aesthetic principle of separation that motivates others as well.


Tables don't lend to responsive layouts. If you've got a stereotypical layout with the middle row having a main content and sidebar column you can't really reflow that sidebar below the main content on mobile. With block elements (divs or semantic blocks) and CSS it's super simple to collapse multiple columns down to a single column for mobile. It's also simple to redo the same layout to handle super wide displays as well.

Tables for layout were fine back when everyone was browsing the web on SVGA, XGA, or even SXGA screens at 96dpi (72dpi on the Mac). Now a visitor might be on a high-DPI display in portrait orientation, full screen on a 4K monitor, or anywhere in between, and I think it's a bit disrespectful to visitors not to have a responsive page layout. Tables are a liability for responsiveness and should only be used for tabular data.


Doing layout with tables creates a mess of non-semantic cells, with spans everywhere. That is hard to read, hard to write, brittle on changes, and obscures the actual content. If you take a random page from the tables era, the odds are good that you won't be able to tell what text goes next to what other text.

Divs that follow the document's semantic hierarchy and are positioned on your CSS have none of those issues.

Anyway, a lot of ways to use Bootstrap and other grid-based frameworks introduce the same problems back. And if you want to really display things in a table, well, a table fits quite well your requirements.


you can still use table layout, just not tables for layout, via `display: table` and its many cousins (table-row, table-cell, etc.). It's a bit cumbersome, so not something to use everywhere like in the old days of tables plus spacer.gifs.

https://developer.mozilla.org/en-US/docs/Web/CSS/display#int...
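A quick sketch of the idea (hypothetical class names): the table behavior lives entirely in the CSS, so the markup can stay divs.

```css
/* Table-style layout without <table> markup. */
.grid { display: table; width: 100%; }
.row  { display: table-row; }
.cell { display: table-cell; padding: 0.5em; vertical-align: top; }
```

Since the styling is detached from the elements, swapping `.grid`/`.row`/`.cell` to flex or grid rules later requires no markup changes, which is the main advantage over literal tables.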


...and absolutely unreadable for anyone using a screen reader.


This was never true, I believe, and only a theoretical issue invented by the semantic-HTML obsessed.

From [1]: “It is sometimes suggested, even by some accessibility advocates, that layout tables are bad for accessibility. In reality, layout tables do not pose inherent accessibility issues.”

[1] https://webaim.org/techniques/tables/


yep, then no problem as a layout.


There was never a well-substantiated argument for the alleged harm of layout tables. Demonizing them mostly just stemmed from the cult of wanting to completely confine layout to CSS vs. expressing semantics with HTML. In the end that “CSS zen” was never really achieved, because the dependencies between HTML structure and styling are just too many and too strong.


> There was never a well-substantiated argument for the alleged harm of layout tables.

What was the best argument that you can recall? What were some of the bad ones? What does "harm" mean in this context?

> because the dependencies between HTML structure and styling are just too many and too strong.

Which dependencies? What would a structure/styling language combination look like that that lacked or had weak dependencies?


> What was the best argument that you can recall?

I mostly recall “layout belongs into CSS files” (so a matter of principle) and “layout tables are bad for accessibility”, which while in theory could be an important point, in practice screenreaders had already adapted, and had (still have) very practical heuristics to distinguish layout tables from data tables (see e.g. https://webaim.org/techniques/tables/).

The thing is, at the time, using CSS to achieve the equivalent of layout tables was an exercise in frustration and futility, in that the results were exceedingly brittle and very often broke when either the table content, the surrounding elements or the browser window size changed too much.

Nowadays we have CSS grid and flexbox of course, but I imagine that in some cases a layout table could still be the most straightforward solution today.

> Which dependencies?

The fantasy back then was that it would be possible to define the HTML content and structure completely independently from layout and styling considerations, and then a separate CSS file could be used to specify any conceivable styling and layout for that content. While that is true to a certain extent, it usually breaks down as soon as you need HTML elements to be in a different order or nesting relation, or when you need additional intermediate nesting DIVs, etc.

In reality the HTML structure and the CSS structure (bound to each other by IDs, class names, hierarchical selectors etc.) is so closely intertwined, and the mapping points (i.e. IDs, class name combinations, etc.) are so many that, for the most part, only superficial changes can be made to one side without having to also make some adjustment on the other side. Ideally, it would be possible for an HTML author and a CSS author for the same web page to mostly work independently from each other. In reality this is almost impossible, except for the case where the HTML remains basically unchanged and the CSS can change within the constraints of the existing HTML structure.

Banning layout tables was never going to be a major factor in coming substantially closer to the imagined ideal here.

> What would a structure/styling language combination look like that that lacked or had weak dependencies?

I think it’s inherently difficult, because you will always need to specify which styles/classes should apply to which element in a rather fine-grained manner, which just means there will always be a lot of dependencies between the two sides.

One thing you’d need in order to realize arbitrary layout is a way of mapping structured content into a different structure. That basically means having a functional programming language to define the mapping, if you want to have full flexibility.


Joking aside, tables are perfectly acceptable and actually the most appropriate markup for tabular data; in addition, accessibility tools know how to read them (IF they are coded correctly, but that goes for any HTML). I use tables where needed, but of course never for layout.


Except, you know, for actual tables. :-)


I like the aesthetic though I'm not sure how sustainable it is beyond basic content documents. On a side note though, I clicked around and big props to Aaron on the lofi.limo project, this is very cool.


Thank you for the kind words! I've been working on adapting this style for web apps, but I haven't got it figured out well enough to write an article about. Yet...

I wouldn't mind if we had a bunch more basic content documents on the web.


HTML can't be fixed with a small trick like that.

Just use templating engine like Pug and get away with most of the annoyances.

It's concise about what part of the text is covered by a certain tag due to forced indentation, not to mention you'll never need to close any tag and you never write "class=" but are all turned into CSS selector notation among many other tricks.

https://github.com/pugjs/pug#syntax
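For anyone who hasn't seen it, roughly what Pug looks like (a small sketch from memory; see the linked docs for exact syntax):

```pug
doctype html
html(lang="en")
  head
    title Example
  body
    h1.title Hello
    p No closing tags, no angle brackets: indentation defines the structure.
```

Classes become CSS-selector-style suffixes (`h1.title`), and nesting is purely by indentation.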

Unless the HTML I'm composing will be touched by people like designers who would get scared of new syntax, in which case I'll use Twig or Nunjucks, I'll never write plain HTML for myself.

There's also a very solid implementation in PHP as well.

https://github.com/pug-php/pug

You can either let server side (node.js or PHP) compile that on demand or let your editors compile them as you edit if you're working on a static file.

I really think the language humans write should deviate from the language the runtimes understand to get all the convenience while never breaking how runtimes/crawlers interpret your output. Same goes for Stylus against CSS.


> However, any content which cannot go in a p element (most other block-display elements, for example) implies the end of its content, so we can usually leave off the end tag.

Note, however, that this means the whitespace between paragraphs becomes part of the paragraph. This can be annoying: someone copying the text on your website gets an additional space after each paragraph, which wouldn't happen if you explicitly closed with </p> directly after the text.

Also, you should keep the opening <html> and specify the language of your document even for english since e.g. automatic hyphenation does not work if you don't specify a language.

Otherwise really like this condensed HTML style and have recently converted my personal website to it.
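Both points in one hypothetical snippet:

```html
<!doctype html>
<html lang=en>
<title>Example</title>
<p>Explicitly closed, so no trailing whitespace ends up in the copied text.</p>
<p>Implicitly closed: the newline after this sentence stays inside the paragraph node.

<p>Next paragraph.
```

The `lang=en` on the root element is what enables correct hyphenation, spellchecking, and screen-reader pronunciation, even for English.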


The lack of closing tags is giving me severe anxiety. I know it's valid non-xml syntax but all the hairs on my neck are at attention.


I agree, and unless someone has a better reason than the ones I have seen, (saving tiny amount of bytes, less keystrokes, dx) I am convinced it's a bad idea to omit the end tags.

It causes way more trouble than those benefits are worth


I use an aggressively minimal set of (valid) HTML because I prefer to write in HTML rather than Markdown-flavour-x.

Omitting the closing tags where possible is less about saving keystrokes than minimising interruptions to my writing flow.

But I wouldn't advocate it for published documents, just my local scribblings.


To solve your anxiety, may I suggest XHTML? I use it on my website in practice and it works really well.


If you must write HTML by hand, this seems nice. But I would never actually write HTML by hand anymore. For most web apps you write more tags than text. I love Slim because it was designed with that in mind: there is no overhead to writing tags, and just a little for writing text. Which is the right way to go for web apps.


Omitting <html> works fine in browsers but breaks a lot of other developer tooling in my experience. It's nice to save 6 bytes, I guess, but compared to the behemoth web app it's wrapping, it's not much of an optimization.


Why does it matter? A good HTML editor ought to be able to take in HTML, display it and edit it according to the user's preferences, and save it in a size-minimizing way. Why should we have to choose only one way?


Author: write HTML right.

Me: this green-on-black background is terrible to read, I'll use reader mode.

Chrome: this author did not write their HTML correctly, so there is no reader mode available.

How ironic.


Firefox's reader mode works just fine. You need a right browser for the right HTML.

... anyway, it bothers me sometimes that I'm not aware of any spec for "reader mode compatibility". Has anyone seen anything like that?


I use (an old version of) Firefox and can select "View > Page Style > No Style" to disable CSS, and this works OK for me (it is better than on some web pages, where this does not work very well, but on this one it works well).

I do not know what criteria are needed for the reader mode in Chrome. (The HTML code looks OK to me?)


I think Reader mode looks for a <main> section. When it's not present it either guesses or doesn't work at all.


Is there a tool to convert an existing HTML document into this style? E.g. strip out optional closing tags, without doing full minimisation/whitespace stripping.


I've been using https://github.com/terser/html-minifier-terser to get this kind of HTML for my personal site for a while. It passes the W3C validator, so I'm happy.

After reading the connected blog post http://perfectionkills.com/experimenting-with-html-minifier/
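For a quick-and-dirty version of just the optional-end-tag part, a regex sketch (toy code, not what the minifier does; real-world HTML with <pre> or <script> content needs a real parser):

```python
import re

# A handful of end tags that HTML allows you to omit in common content
# (partial list; the spec has precise conditions for each).
OPTIONAL_END_TAGS = re.compile(
    r"</(?:p|li|dt|dd|td|th|tr|thead|tbody|tfoot|option)>", re.I)

def strip_optional_end_tags(html: str) -> str:
    """Remove some optional end tags, leaving all other markup alone."""
    return OPTIONAL_END_TAGS.sub("", html)

print(strip_optional_end_tags("<ul><li>one</li><li>two</li></ul>"))
# <ul><li>one<li>two</ul>
```

Unlike a full minifier run, this leaves whitespace and everything else untouched.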


Slightly off topic, but I'd like to point out that paragraphs in HTML are grouping elements, not textual elements. They are like divs or headers, not like span or b.

They are mistakenly and traditionally associated with literature-style paragraphs, but that is not correct. You generally use them in forms to split different groups of inputs; that has nothing to do with paragraphs in written prose, and even less with textual paragraphs.

I think there is really a lot of confusion about them in this whole thread.


Although there are some other uses for <p>, it is perfectly valid to use <p> tags for textual paragraphs and that has been the main use for <p> for as long as HTML has existed. I'm not sure why you believe otherwise.

Take a look at the source code for http://info.cern.ch/hypertext/WWW/MarkUp/Future.html for instance, which was written by the creator of HTML, Tim Berners-Lee.

You can also look at the source code for any page of the current HTML spec (e.g. https://html.spec.whatwg.org/multipage/introduction.html) where, again, <p> is used for each paragraph in the text.


I didn't say it's not a valid use, I said that it's not its primary use.

Paragraphs relate to grouping content[1], not textual content. There's no logic in paragraphs.

I quote here the official spec, which makes various examples of how paragraphs are not related to logical paragraphs:

> The solution is to realize that a paragraph, in HTML terms, is not a logical concept, but a structural one. In the fantastic example above, there are actually five paragraphs as defined by this specification: one before the list, one for each bullet, and one after the list.

And I'll quote also the definition on MDN:

> The <p> HTML element represents a paragraph. Paragraphs are usually represented in visual media as blocks of text separated from adjacent blocks by blank lines and/or first-line indentation, but HTML paragraphs can be any structural grouping of related content, such as images or form fields.

Failing to realize that paragraphs are grouping rather than logical content leads to frequent misuses of paragraphs, and this comment section is literally filled with bad paragraph examples, which suggests the community is largely ignorant about HTML.

[1]https://html.spec.whatwg.org/multipage/grouping-content.html...


In this comment section? Are you talking about stuff like the example I used earlier?

    <p><div></div></p>
Yes, obviously this is bad and nonsensical HTML. Under no circumstances does it make sense to have a div inside a p. In fact, the above doesn't even work, being parsed as

    <p></p><div></div></p>
But the intention of this example is not to show good HTML. The point is that many people have only a very basic understanding of HTML syntax, under the impression that

    <foo><bar></bar></foo>
works for any elements, because there's a <foo> and a </foo> so clearly anything inside it must be inside the foo element, right? But this is not the case for all elements. HTML's syntax is more complicated than that. My example was only intended to correct this misconception, not to demonstrate semantically-correct HTML, and that goes for other similar examples made by other people in the comments too.


> gator

What do they mean here?


Less-than or greater-than signs (code points 0x3C and 0x3E in ASCII). A friend put me on to calling them that because they (sort of?) look like alligators with their mouths open.


My math teachers used an alligator analogy to remind us which symbol is greater-than and which is less-than: the 'mouth' of the gator is always eating the greater number.



> It used to be the case that URL parsers would remove newlines and tabs, so we could split long URLs across lines and even format their query parameters nicely with tabs. Unfortunately, this was taken advantage of for data exfiltration via HTML injection and we no longer have this nice thing as URL parsers have been made more strict to prevent this kind of attack.

Does anyone have a source/reference for this?


Why not just use groff/troff and output to HTML?


Figure 2, showing the "common style", is something I've never used or seen before.

What is the "right" way? Perhaps it is to use style from both of these extreme examples and write code that is easy to read and edit for the person that is working with it.

Or perhaps the right way is to never imply the way you are doing things is the only correct way and then try to pass it on as facts?


I too found out I'd been doing my shoelaces wrong. YouTube set me straight.

For HTML, these are good recommendations.

Sometimes, like for technical writing where there are various distinct and important formatting choices, it's just hard work to get it the way you want even with a WYSIWYG editor.


Closing li tags is the right thing to do! I always close the kitchen drawer too after putting the scissors back. But I rarely write HTML as content anyway, it's mostly templates for the CMS, where it's best to close the tags.


I too close my kitchen drawers. But not my li tags. Unless I'm using the bastardization known as jsx. The next li closes it automatically, as it's specified to do.


"everybody knows" doesn't scale, because

1) not everybody knows and

2) you're relying on memorization for people to read your code, which means you're smashing the ladder rungs behind you

Software on a team is a performance art. People are either watching you and copying your behavior, or watching you and getting confused.

And if you've ever felt overbooked on a project while other people are idle? It's stuff like that that put you into that situation. And since you're the one who did the 'stuff like that', it's at least partly your fault you're in this situation. Stop being a ball hog, and you'll get fewer bruises.


> "everybody knows" doesn't scale

Agreed. That's why I prefer to have things written down. In this case, WHATWG and W3C already did the work for us.

> And if you've ever felt overbooked on a project while other people are idle?

I've seen what you're talking about, but I'm not the one getting overbooked. I'm not generally the one fighting over this stuff. If I get feedback on a PR telling me to add li close tags, I'll probably just do it.

If you're using a technology on a daily basis, it will pay big dividends to spend a little time learning how it actually works.


While I'm no big fan of SEO and all that surrounds it: will this open-tag thing influence how crawlers handle your site and index/rank it?


I have no idea what Google does, but I expect their parsers to be quite robust. I tried doing some web scraping, and so many pages are not even valid HTML (most often invalidly nested tags, like a table inside a span; missing closing tags even when required; random unopened closing tags; ...). Not closing <p> and <td> tags is quite common; I have not yet seen omitted <html>, <head>, and <body>.


I don’t expect it to as long as the mark-up is valid. Perhaps someone with more SEO knowledge will stop by to correct me.


You're right. HTML5 does not work with DTDs anymore, so unclosed tags are not a violation of the document schema and therefore probably not "punishable" by search engines.


Implicit end tags as described in the article have been allowed by every HTML DTD not named XHTML.


One of the easiest ways to improve SEO is to just properly use existing HTML tags (instead of using a custom DIV for everything).


I'm curious if a more strict html parser would actually be faster.

Browsers are not really fast on my Android, and I wish they were fast.


I have yet to see a slow HTML-only website ;) (that isn't a 10 MB single-file spec or an entire book). Really, I don't think HTML parsing is a huge bottleneck, and these few parser exceptions don't seem that hard to implement: just close the open tag when one of a predefined list of tags is opened; no backtracking or anything else expensive.
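The rule described above can be sketched in a few lines. This is a toy illustration only (the function name and the abbreviated tag list are made up for the example; real browsers implement the full WHATWG tree-construction algorithm):

```python
# Tags whose start tag implicitly closes an open <p>
# (abbreviated from the spec's full list for this sketch).
P_CLOSERS = {"p", "div", "ul", "ol", "table", "h1", "h2", "h3", "blockquote"}

def with_implied_end_tags(tags):
    """Given a stream of start-tag names, return the event stream
    with the implied </p> closes inserted -- a single forward pass,
    no backtracking."""
    out = []
    p_open = False
    for tag in tags:
        if p_open and tag in P_CLOSERS:
            out.append("</p>")   # implied close triggered by the new tag
            p_open = False
        out.append(f"<{tag}>")
        if tag == "p":
            p_open = True
    if p_open:
        out.append("</p>")       # end of parent element also closes <p>
    return out

print(with_implied_end_tags(["p", "p", "div"]))
# ['<p>', '</p>', '<p>', '</p>', '<div>']
```

The point of the sketch: each token is handled once, in order, which is why implied end tags don't make a parser asymptotically slower.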


Depending on what sites you're mostly accessing, it may be worth experimenting with Firefox Mobile plus uBlock Origin plus perhaps one or more of the extra anti-(ad|bloat)ware extensions. Chrome is definitely faster in a straight line but once I've got Firefox configured it's (to me) significantly more pleasant to use (and I like the current UI better than Chrome's though that's -definitely- not a universal opinion, mileage may vary as ever).


Love the troff reference. I wrote my first CV in troff, mostly because it was available and working on my Linux machine.


In 2022 how often do you actually write text by hand in your HTML files? I find that besides the few buttons here and there (and that's if you don't have i18n), text is always going to be served by a server.

In 2022 we also all use text editors or IDEs that can collapse entire blocks of tags, to improve readability.

I'm not sure I can see a clear benefit here outside of very few edge cases, and I am sure it comes with its lot of disadvantages.


Static site generators (Jekyll, Hugo) are one example. Sometimes you can get away with markdown but often you end up marking up pages of text.


Even when you need to write actual HTML, you should still use shorthand tools like Emmet to write your markup faster and with fewer mistakes.


XML is beautiful and clean, and I prefer to write full closing tags.


It’s funny how people’s aesthetic sensibilities can differ. Making use of HTML’s standard features to drop unnecessary elements and closing tags is very much in line with my own idea of “beautiful” and “clean.”

Do you consider any table that doesn’t explicitly declare <tbody> “unclean”? That’s an implicit element in every <table>, according to the spec.


No, tbody is just an element. The power is in tags.


Of course, of course; but here they are talking about HTML (i.e., about HTML5), not about XML.


I've given up trying to educate XML heads that XML is just a proper subset of SGML, just as HTML originally was, and mostly still is, an SGML vocabulary. I don't know what people in this thread are talking about (it seems to be each one's personal preferences and wildly speculative assumptions about backtracking, when in reality both SGML and WHATWG parsing are deterministic); there is exactly one reference to WHATWG in the thread at this time.


HTML has a dialect in XML called XHTML. It is obscure but actually works. My website is a living example.
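Since XHTML is well-formed XML, any generic XML parser can consume it. A quick check with Python's standard library (the document below is a minimal made-up example):

```python
import xml.etree.ElementTree as ET

xhtml = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>Every tag is closed, even <br/> void ones.</p></body>
</html>"""

# fromstring raises ParseError on any unclosed tag, so simply
# parsing successfully demonstrates well-formedness.
root = ET.fromstring(xhtml)
print(root.tag)
# {http://www.w3.org/1999/xhtml}html
```

Note that browsers only apply strict XML parsing when the page is served as `application/xhtml+xml`; served as `text/html`, the same bytes go through the regular, forgiving HTML parser.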


This post needs an OCD trigger warning.


"OCD" as in "I don't like clutter" or real OCD as in "if I don't clear away the clutter my family will die in a car crash, I know that's illogical, and yet I'm still encumbered with the intrusive thought?"


OCD as in “not having closing tags matching open tags is driving me insane”. Maybe OCD isn’t the proper term, but I don’t know of a better one.


Respectfully, please try to refrain from using OCD casually. It's not like you're the only one, but it's a debilitating disease.


What is a better term to use that means when things are not perfectly matched it drives me so insane that I can't function until I go in there and fix it so that everything is exactly right?


Rite HTML Wright

(sorry)


Keep calm and Prettier on.


[flagged]


Implicit end tags, which are completely unambiguous, have been a feature of HTML literally since its inception.


You should try reading articles before you comment! The author explicitly states why this is perfectly "proper" and why they prefer it.



>A p element's end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, details, div, dl, fieldset, figcaption, figure, footer, form, h1, h2, h3, h4, h5, h6, header, hgroup, hr, main, menu, nav, ol, p, pre, section, table, or ul element, or if there is no more content in the parent element and the parent element is an HTML element that is not an a, audio, del, ins, map, noscript, or video element, or an autonomous custom element.

Whoops! Not in the spec!


What's not in the spec? Every example in the article is valid HTML, and the article itself, which is written in the same style, is valid as well:

https://validator.w3.org/nu/?doc=https%3A%2F%2Flofi.limo%2Fb...

> Document checking completed. No errors or warnings to show.

Or are you complaining that the rules are too complicated? It's very verbose and explicit because this is a specification, but the basic rule of thumb is that anything that would normally be a block element and thus doesn't make sense inside a paragraph will end that paragraph. In practice, this is not really an issue I run into.

Moreover, you need to know about this rule even if you don't omit </p>, because this is the list of elements that implicitly end a paragraph. For example, <p><div></div></p> is invalid HTML because <div> ends the paragraph implicitly, turning it into <p></p><div></div></p> and leaving that final </p> stray.

If you don't like that, then your problem is not with this particular code style but HTML itself, which is reasonable. HTML's syntax is very complicated due to its history and doesn't always make sense. But you still have to know how it works regardless of how you personally like to write it.


Read the quoted sentence again (from the source you brought here); none of those clauses apply to:

<p>

Block of text ...

Which is what they do in the article.


I don't understand what you mean. Please elaborate.

That quote says that </p> is not needed in many cases. When you say "none of those clauses apply to <p>", this is true, you can't omit <p>, only </p>... but the blog article doesn't advocate for omitting <p> at any point.


1. The article omits </p> liberally.

2. The spec details situations where the </p> tag may be omitted.

3. None of these (2) apply to what's going on at (1).


Okay, let's go over this then.

First, let's talk about the basic case where there's no whitespace between the two paragraphs.

    <p>Paragraph 1</p><p>Paragraph 2</p>
In this case, the first </p> can be omitted according to the rule "A p element's end tag may be omitted if the p element is immediately followed by an [...] p [...] element [...]", resulting in this code:

    <p>Paragraph 1<p>Paragraph 2</p>
If we assume that the body ends immediately after this (either because there's a </body> or because we've reached the end of the file, since </body> and </html> are optional tags) then we can remove the second </p> as well because of the rule "A p element's end tag may be omitted if [...] there is no more content in the parent element and the parent element is an HTML element that is not an a, audio, del, ins, map, noscript, or video element, or an autonomous custom element":

    <p>Paragraph 1<p>Paragraph 2
Now, let's get into the case where there is whitespace between the two paragraphs:

    <p>Paragraph 1</p>
    
    <p>Paragraph 2</p>
In this case, you can't remove the first </p>, because the rule is that it must be "immediately followed" by another p element. However, what if we start with this code?

    <p>Paragraph 1
    
    </p><p>Paragraph 2</p>
In this case, we can remove the first </p>, resulting in:

    <p>Paragraph 1
    
    <p>Paragraph 2</p>
and again, we can remove the last </p>, resulting in:

    <p>Paragraph 1
    
    <p>Paragraph 2
Now, this is different from what we started with. The whitespace is now inside the first paragraph instead of after it. But since HTML does not render this extra whitespace by default, it's of no real consequence.

And that leads us to this point: the HTML spec is specifying the exact circumstances where you can omit tags without changing the DOM. However, if we are okay with changing the DOM a bit, by moving that whitespace into the first paragraph, then we can simply pretend that we wrote

    <p>Paragraph 1
    
    </p><p>Paragraph 2</p>
from the beginning, and apply the rules to that instead.


> But since HTML does not render this extra whitespace by default, it's of no real consequence.

But it does render that extra whitespace (as a single space). Try selecting the text of the article and you see there is a trailing space after every paragraph.


You misunderstand the spec. Exactly what confuses you is hard to discern, but perhaps you misread “the p element” as referring to the <p> start tag, when in fact the element includes the start tag, text contents, and (if present) the end tag.


Similar mindset that drives people to create HTML-targeting template languages that use indentation rather than surrounding tags, to indicate nesting. Maybe fine in some limited cases but it'll bite you in the ass eventually.



