Why isn't the <html> element 100% supported on caniuse.com? (anderegg.ca)
300 points by GavinAnderegg 11 months ago | 87 comments



You'll notice that CanIUse has a "% of all" option right above the Global percentage. If you dig into the gear wheel on a feature you'll see more information about this:

Usage calculation:

( ) All Users: Any usage % of browsers not tracked by caniuse is considered unsupported.

( ) All tracked users: Excludes the usage percentage of browsers not tracked on caniuse.

This implies that CanIUse pulls in data about global browser usage, but does not cover all of the browsers mentioned in that usage data. By default it just assumes that browsers that it doesn't track any data on are not supported. If you change this to All Tracked Users you then get 98.71%.
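
A rough sketch of how the two settings could differ, assuming the percentage is simply supported usage divided by either all usage or tracked usage. The function name, browser names, and usage shares below are hypothetical, chosen only to make the effect visible; this is not CanIUse's actual code or data.

```python
def support_percentages(usage_by_browser, supported, tracked):
    """Return (all_users_pct, tracked_users_pct) for one feature.

    usage_by_browser: browser name -> share of global usage, in percent
    supported:        set of browsers known to support the feature
    tracked:          set of browsers the compatibility table has data for
    """
    total = sum(usage_by_browser.values())
    tracked_total = sum(v for k, v in usage_by_browser.items() if k in tracked)
    supported_total = sum(
        v for k, v in usage_by_browser.items() if k in supported and k in tracked
    )
    # "All users": untracked usage stays in the denominator, so it counts
    # as unsupported.
    all_users = 100 * supported_total / total
    # "All tracked users": untracked usage is dropped from the denominator.
    tracked_users = 100 * supported_total / tracked_total
    return all_users, tracked_users


# Hypothetical usage shares (not real data).
usage = {"browser_a": 60.0, "browser_b": 30.0, "browser_c": 8.0, "untracked": 2.0}
tracked = {"browser_a", "browser_b", "browser_c"}
supported = {"browser_a", "browser_b"}  # browser_c stands in for "unknown" support

all_users, tracked_users = support_percentages(usage, supported, tracked)
print(f"All users: {all_users:.2f}%")
print(f"All tracked users: {tracked_users:.2f}%")
```

With these made-up shares, switching to "All tracked users" raises the figure only slightly, and the "unknown" browser still keeps it below 100%.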

The sum total of the browsers listed in the chart which have unknown support (these are different than the browsers being referred to in the setting above) is 1.27%. It's worth noting that there's likely some rounding going on here, since 98.71% plus that 1.27% gives roughly 99.98% rather than 100%.

Pretty close though.
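
The sum above can be checked with exact decimal arithmetic (a plain binary float addition may show stray trailing digits). This is only a sketch using the two figures quoted in this comment; the variable names are illustrative.

```python
from decimal import Decimal

tracked_support = Decimal("98.71")  # "All tracked users" figure quoted above
unknown_support = Decimal("1.27")   # browsers listed with unknown support

combined = tracked_support + unknown_support
remainder = Decimal("100") - combined

print(combined)   # 99.98
print(remainder)  # 0.02
```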

So the real issue is indeed that those browsers show Unknown, when I think we can safely say they do in fact support the html element.


Given what the "not recognized as anything in particular" User-Agent strings in the request logs to our API SaaS look like, I have a feeling that 1.27% of "browsers" could very well actually be various kinds of special-purpose scrapers that have accidentally stumbled out of their domain of expertise.

And as such, they may not necessarily know how to parse <html>. They could be JSON scrapers!


Good point! Would we call them browsers? Well, no, but they do speak HTTP and are bound to get picked up in the data.

I suppose that would be the remaining 0.02% though, since the 1.27% is accounted for, if we trust the upstream browser usage data.


I think the word "browser" usually doesn't appear in the w3c documents. It's a "user agent". And as long as the scraper works for a user, it fits nicely.


In my usage over the past two decades, a browser is a user agent whose output is meant to be consumed by a human. Contrast with a script or scraper, which are user agents whose output is designed to be the input to another processing technique.


A bit late, but let me try to refine this definition a bit, because I find it very interesting.

To me, a "browser" is:

1. a User-Agent that interacts with the web through a HATEOAS model (i.e. it works in terms of hypermedia, not structured data);

2. with some external actor — the "user" — being "in the loop" for at least some of those HATEOAS interactions,

3. where the "user" is expected to be an intelligent, agile actor: one who can cope with changes in the possible HATEOAS interactions, or the insertion of novel HATEOAS interactions they weren't pre-trained on, etc;

4. and where the browser, as a User-Agent, is designed to offer such intelligent, agile actors an interactive tool for understanding and examining the current HATEOAS state of a web resource — a tool which enables such actors to not just receive a static description of a HATEOAS resource, but to probe the resource, and to observe changes in the resource (remember: hypermedia includes Javascript!), allowing the user to gradually refine a decision about how they will interact with the HATEOAS resource.

5. Finally, a browser then offers a "user" the ability to execute their decided-upon HATEOAS interactions through the same interactive representational model that enabled them to learn about the HATEOAS state. The same (usually visual) representations of links that can be hovered to see the URL or alt text, can be clicked to navigate to them; the same (usually visual) representations of form fields that can be examined for labels or placeholder text, can be click-focused and targeted with keyboard input; etc. When the user makes decisions about how to interact with the browser, they're making decisions about how to interact with a coherent Human-Computer Interface that browser is exposing to the user — one where the information and the affordances co-exist in the representational model.

Why so nit-picky? Because the edge-cases are weird. Here's what I mean:

• Chrome itself is, obviously, a browser. Chrome shows a human being (or a dog, or a robot's webcam) a visual rendering of a webpage on a screen. This "user", outside the computer, looking at the screen, can poke at the page with the mouse and keyboard to understand and interrogate the state of the HATEOAS resource at hand (a document with links; a form; some kind of CAPTCHA test; etc.) The human makes a decision about how to proceed given this understanding, and tells Chrome to do it (by clicking one of the visually-represented links; focusing and typing into the visually-represented form fields; etc.)

• w3m is also a browser. Same deal as above, even though it's a character-array representation rather than pixels.

• On the other hand, headless Chrome, when driven by a Puppeteer script that visits URLs and renders out screenshots to bake thumbnail previews for a social-media site — should not be considered a browser. There's no interactivity there, no "browsing"; it's just a dumb bot-agent using Chrome for its renderer.

• Headless Chrome, when driven by a scrapy-puppeteer to extract data from a website — is almost, but not quite, a browser. Scrapy "views" pages, and browses between them! It clicks links! It presses buttons! It fills out login forms! But Scrapy isn't an intelligent, agile agent — it can't cope with changes to the site's HATEOAS model. It isn't making decisions, and it can't take advantage of the "browser as HATEOAS-resource gestalt interrogation tool." In the end, it's just using Chrome to create a trail of authentic-looking network requests, and then parsing the results, outside of Chrome, using brittle, hardcoded logic. It's a bot.

• But what if, instead of Scrapy, we put ChatGPT in the driver's seat of a headless Chrome instance, by having it speak the Chrome DevTools protocol, and then giving it a prompt to solve some high-level problem "by accessing the web"? Well, ChatGPT is an "intelligent, agile actor" by my definition — it doesn't need pre-training on how to deal with a specific website; it can respond to its workflow changing over time. And ChatGPT can see and interpret images — so it can take advantage of the "browser as HATEOAS interaction-space interrogation tool", by doing things like scrolling the viewport or positioning the virtual cursor over things, then fetching screenshots and interpreting them. So headless Chrome, in this use-case, is (acting as) a browser.

• How about headless Chrome driven by a chatbot or Alexa skill, in turn being interacted with by a human? Well, that would seemingly depend on the level of HCI fidelity that is exposed through the bot-as-proxy. If the bot only knows how to do a few programmed-in commands — and it does them by scraping data using pre-programmed models, parsing the hypermedia into structured data, and then describing that structured data to you — then no, it's not a browser. (Even though a human kicked off these interactions, and will see the final result of these interactions, those interactions are being intermediated by a system that isn't itself an intelligent, agile actor.) On the other hand, if the bot is able to be told to navigate to arbitrary URLs; and describes them by fetching screenshots and feeding them to an ML image-to-text model to conjure a description; and allows the user to tell it how to interrogate the loaded resource with commands like "hover over the red button; what do I see now?" — then the system of headless-Chrome-plus-chatbot is a browser.


I like this distinction!


I've come across browsers embedded in another app. One of them was a coupon-app. Apparently, people open these for the coupons, and have to browse somewhere, but then keep browsing inside the app. It had a weird user agent. And of course the PlayStation browser, that was another weird one. So there might be real browsers in the rest group.


Using such application-embedded browsers is a very common technique to circumvent locally-enforced network access controls.


Should they be represented in that case? Scrapers can literally cover every variable within all programmatic languages.

Special purpose is specific, and therefore not standard. I can produce a web-centric language that excludes <html> but has zero use beyond being a jackass.

(Postscript: I don’t disagree with your point)


> when I think we can safely say they do in fact support the html element.

Not if it's included, but I use wget a lot. Checking src/html-url.c, it does not, in fact, support <html> at all (treating it as an unrecognized tag).
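
To illustrate what "treating it as an unrecognized tag" can look like, here is a rough Python analogue of a link-following tool. It is not wget's actual code; the class name and sample page are made up. It only reacts to <a> tags, so wrapping the page in <html> makes no difference to the result.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags; every other tag is ignored."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only <a> is "recognized"; <html>, <body>, etc. fall through.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(markup):
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links


body = '<a href="/one">1</a><p>text</p><a href="/two">2</a>'
with_html = extract_links("<html><body>" + body + "</body></html>")
without_html = extract_links(body)
print(with_html)     # ['/one', '/two']
print(without_html)  # ['/one', '/two']
```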


> Not [sure] if it's included


> MDN used to stand for Mozilla Developer Network. Now it’s just MDN. I spent a few minutes looking on the MDN site to see if I could find any mention of the full name, but I guess they’re just all in on “MDN” now.

Huh, yeah, nerd-sniped on that one; I can't find anything either.


We made that change around 2017. Mozilla Developer Network is somewhat ambiguous. In surveys and user interviews people were confused about the name. Web developers thought it might be a resource specifically for Mozilla developers, which to be fair was the focus in the earlier days. The web platform documentation was just one part of many Mozilla related things documented on MDN for some time. In 2017 though the web platform documentation had grown to make up 95% of MDN's traffic and it had become clear that it was not primarily a resource for Mozilla developers anymore, the name change to "MDN Web Docs" was intended to reflect that change in focus.


When the expansion of the first letter of an acronym becomes obsolete, I think it is a good opportunity to turn the acronym into a recursive acronym, e.g., MDN = MDN Developer Network.


This reminds me of the recursive XNA acronym, which stands for “XNA’s not acronymed”.


Nice one! Thanks for sharing!

One of my favourite recursive acronyms is XINU, which stands for "XINU Is Not Unix". The delightful thing about this acronym is that "XINU" is also the reverse of "UNIX".

On closer inspection, for a recursive acronym to proclaim that it is not W while simultaneously being the reverse of W, the word W must be of the form W = "?NI?", where each "?" denotes a distinct letter. Some fictitious examples:

* ANIL ⇒ LINA = LINA Is Not ANIL.

* KNIT ⇒ TINK = TINK Is Not KNIT.

Words of the form "?N?" also work if we are happy with a contracted "is" in the acronym. In fact we can get circular recursive acronyms in such cases:

* ANI = ANI's Not INA ⇔ INA = INA's Not ANI.

* ONE = ONE's Not ENO ⇔ ENO = ENO's Not ONE.

Both acronyms in each pair refer to each other thus making them circular while also being the reverse of each other! These could be useful names to express friendly banter between rival projects.
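
A small sketch of the rule above, for anyone who wants to test candidate words. The function name is made up; it reverses a word, builds the "X Is Not W" (or contracted "X's Not W") phrase, and checks that the initials of the phrase spell the reversed word.

```python
def reversed_acronym(word, contracted=False):
    """Build the "X Is Not W" expansion where X is W reversed.

    Returns None when the letters of the reversed word cannot serve as
    the initials of the phrase.
    """
    w = word.upper()
    x = w[::-1]
    if contracted:
        # "X's Not W": initials are X[0], N, W[0]
        phrase = [f"{x}'s", "Not", w]
    else:
        # "X Is Not W": initials are X[0], I, N, W[0]
        phrase = [x, "Is", "Not", w]
    initials = "".join(part[0] for part in phrase)
    return " ".join(phrase) if initials == x else None


print(reversed_acronym("UNIX"))                  # XINU Is Not UNIX
print(reversed_acronym("KNIT"))                  # TINK Is Not KNIT
print(reversed_acronym("ONE", contracted=True))  # ENO's Not ONE
print(reversed_acronym("ENO", contracted=True))  # ONE's Not ENO
print(reversed_acronym("LINUX"))                 # None
```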


How about the classic "I'm So Meta, Even This Acronym..." ?


Wow! I was an avid reader of the XKCD comics about 15 years or so ago but I somehow missed this one. For others who are wondering what this is, here's the link to the relevant XKCD comic: https://xkcd.com/917/

What is so clever about this phrase is that it naturally completes to a full sentence that contains itself as an acronym!

I'm So Meta, Even This Acronym IS META!

A wonderful tribute to Hofstadter's books that are full of such fascinating self-references.


That's Acronymiquine.


Ah, good ol' Dr. Comer. Back in school I worked on a project rewriting XINU in Rust. It was quite difficult in the early days of Rust, but it was a fun project to get insight into how XINU worked.


Or GNU which is a recursive acronym for "GNU's Not Unix!"

source: https://en.wikipedia.org/wiki/GNU#Name


Also, EINE = EINE Is Not Emacs, and its successor ZWEI = ZWEI Was Eine Initially.


or instead of being a troll, you could just remove the pointless extra letter:

DN = Developer Network


a) I bet you're no fun at parties.

b) That would be way too ambiguous and imply this is the only/main developer website when it's just relevant for web developers.


This is a super interesting anecdote, considering the name definitely confused me at the time - trying to learn web development with no context I did indeed think it was some kind of Mozilla-centric resource. All those times using w3schools over MDN are, in hindsight, a little sad aha.


Not as much as the names ExpertsExchange.com and PenIsland.com confused me!


> Mozilla Developer Network is somewhat ambiguous

MDN is less ambiguous?


Yeah. MDN is just a name, you learn what MDN means and you know it, like you learned what Apple means or what Windows means. "Mozilla Developer Network" sounds more like a description that you're supposed to interpret the meaning of, and one natural interpretation is the network for Mozilla developers, and another is that it's a network for people developing for Mozilla platforms, maybe to do with Firefox add-ons.

I'd maybe call it "misleading" more than "ambiguous" but meh.


> you learn what MDN means

"What's MDN stand for?" "Mozilla Developer Network"


That's the point: it doesn't.


I mean, even if you are not aware of that and you start wondering what MDN is an acronym for, that’s the obvious result to arrive at?


And so what? It's not like they can fix the fact that there are old websites with outdated information on them, so people who are looking for what MDN means will find some outdated sources, yeah. Changing an initialism to a proper name takes time, that's not surprising.


It is easier to displace than destroy previous meaning. That's what backronyms are for.


I was always surprised that the browser oligopoly together with the W3C already had started building a site for web platform docs - webplatform.org - and then just mothballed that in favour of MDN. Seemed weird.

Snapshot: https://webplatform.github.io


Is that supposed to be for all the browsers? I always thought it was specifically for chrome and sponsored just by google. But maybe the bias towards chrome is part of why MDN became more prevalent.


Scrolled through a few articles and it feels like a useless restyled copy of w3schools or a similar site, which are already less useful than MDN.


Huh, thanks for the explanation.

I always understood it by analogy with MSDN so it wasn't confusing to me at all.


This change happened right about August 15, 2017... which I know only because somehow I happened to notice at the right moment where the branding had been updated but the page <title> had not. Here's the bug report I filed: https://bugzilla.mozilla.org/show_bug.cgi?id=1390381



I like calling these "anachronyms," because they're sort of like anachronistic acronyms (yes, fine, this one's _technically_ an initialism). I wrote a blog post[0] about them.

[0]: https://simpsonian.ca/blog/anachronyms/


> yes, fine, this one's _technically_ an initialism

From the wikipedia article on acronyms[1], an initialism is a kind of acronym.

There are some definitions that specify that it must be pronounced as a word, although in common usage acronym includes initialisms, and what it means to be "pronounced as a word" is kind of imprecise anyway. Is CIA pronounced as a single word, but the pronunciation comes from the pronunciation of the letters? I'd argue that it is a single lexeme at least.

[1]: https://en.m.wikipedia.org/wiki/Acronym


ARM might be another good one. Acorn computers don't even exist anymore.


It’s no longer even an acronym, the official name is now Arm.


That's what I mean - that's how the parent defined an "anachronym". But yeah, there was that transitional phase where they first just redefined the acronym.


But also, even later they got rid of the all-caps (at least mostly). It went from an acronym of nothing to just a word.


Isn't it "Arm" because it's British and Brits typically write pronounceable acronyms in proper noun case (e.g. "Nato" vs. "UN")?


No, that's a media/journalist thing; companies themselves can do whatever. As far as I could tell (I worked at A(RM|rm) at the time, but not on this, no particular inside knowledge or anything) it may as well just have been to get people to stop saying 'ay-ar-em', since it was already not an acronym as said upthread; just a branding change, font change. In fact, to your point in particular: no, you can tell it's not that, because the 'a' is also lower case in the logo and anything styled.


Sun Microsystems - came from Stanford University Network

Apple's Siri - came from SRI, Stanford Research Institute


Wikipedia calls these "orphan initialisms", with a couple of citations (halfway down the page for Acronym): "an existing acronym is redefined as a non-acronymous name, severing its link to its previous meaning." But yours is catchier.

Oh, I see they have "anacronym" too, with a fine distinction of meaning. It's the difference between the word officially ceasing to stand for anything, and the public generally forgetting what the word stands for.


Oh I'm definitely going to start using that name.


There's a couple of mentions of it in side pages and in example code, but that's all I can find:

> will soon be a proud part of the Mozilla Developer Network (MDN)

https://developer.mozilla.org/en-US/blog/mdn-observatory/ (posted 25 October 2023)

> Their brilliant but offbeat idea grew into today's Mozilla Developer Network

https://developer.mozilla.org/en-US/docs/MDN/At_ten (last modified 25 January 2024)

> const heading = <h1>Mozilla Developer Network</h1>;

https://developer.mozilla.org/en-US/docs/Learn/Tools_and_tes...


I remember a few years ago that Microsoft docs started to point to Mozilla docs, maybe part of Edge rebrand? Perhaps renaming to MDN during this agreement could leave someone like "huh is it M(ozilla)DN or M(icrosoft)DN??". I have no clue.


But M(icrosoft)DN was MSDN.


This is the point in the discussion where I'll point out that .NET started out as ".NET Passport", i.e., what people now think of as a Microsoft Account.

See also Apple, who, after a relatively brief dalliance with the name iTools, rebranded its online service as ".mac".

Sigh...

I miss early Internet names like these, although I suppose the fact that the original dot com era didn't last and a bunch of early startups got wiped out - Pets.com, anyone? - gave it a stench that marketers were all too willing to run away from at the first possible opportunity.


.NET was terrible. .net already meant something else.


Obligatory Wikipedia article[0].

For the curious, MSDN is now called Microsoft Docs.

To the surprise of many, much of the documentation on the site is also open-source[1].

[0]: https://en.wikipedia.org/wiki/Microsoft_Developer_Network

[1]: https://github.com/MicrosoftDocs


What's the surprise about documentation being open source? The overwhelming majority of the source is going to consist of the text of the documentation, which is open anyway.


Your information is a little out of date, Microsoft Docs became part of Microsoft Learn, so MSDN is actually now Microsoft Learn, or part of it at least.


That’s a clever and thoughtful idea, but those responsible for branding at Microsoft would burn the entire place down if this was ever to be the case.


Searching for:

site:https://developer.mozilla.org/en-US/ "mozilla developer network"

yielded ~115 results on Google


It is mentioned by its old name in older blog posts, but there seems to have been a rebrand to just "MDN". For example, there's no explanation of what MDN stands for on the About page: https://developer.mozilla.org/en-US/about



Now you link it, I vaguely remember it being a big deal around the time they dropped Servo and a load of Rust people left (/were made redundant?), both as part of a broader shakeup/refocusing.


Now it stands for the Mweb Developer Network

xD


Mweb is the name of a South African ISP.

Interestingly, they were one of the very first ISPs that I can remember, certainly one of the, if not the, most advertised to consumers at the time and I think it is the only ISP which still exists today from those very early days.

I wouldn’t ever use them personally, an apt analogy might be that choosing them today is like choosing AOL in the US.


Mԍp Developer Network


Hint: check the blog posts. As recent as October 2023.


It's in the url bar.


No it isn't, there's no 'network' and the order is different, and that's not what's discussed anyway. Though I suppose if the URL really was mozilla.developer.network it would make it less interesting whether the pages ever spelled it or not.


Interestingly, it was also pointed out to me that the `a` and `p` elements have exactly the same 97.34% support. Both of these elements have data coming from MDN like the `html` element. See: https://caniuse.com/mdn-html_elements_a and https://caniuse.com/mdn-html_elements_p

I've updated the article to note this.


Probably worth breaking out a separate feature "html-manifest" on caniuse.

Also, I want to echo that MDN is a fantastic resource.


The original webpage doesn’t use the html element.

http://info.cern.ch/hypertext/WWW/TheProject.html


Is this related to the most minimal valid HTML? No HTML tag there.

    <!DOCTYPE html><title> </title>


It doesn't seem particularly related as <title> has about the same percentage: https://caniuse.com/mdn-html_elements_title. Rather it seems more to do with how it counts browsers it knows the usage of but doesn't have validation of support one way or the other for.

Trivia: That HTML is actually invalid, <title> </title> gets treated as a blank title, which is not valid. If you change it to <title>.</title> or similar it should be error free though.


Isn’t there an <html> tag there though? It is just untyped, and therefore inferred. If you open this page with any browser, it will contain an <html> tag.
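
One way to see that the inference happens during tree building rather than in the markup itself: a tokenizer-level parser such as Python's html.parser reports only the tags that literally appear in the source. A minimal sketch (the class name is made up):

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record the start tags that literally appear in the source."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


logger = TagLogger()
logger.feed("<!DOCTYPE html><title> </title>")
logger.close()
print(logger.start_tags)  # ['title']
```

A browser's parser, by contrast, creates the html element while constructing the DOM even though no such tag was typed.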


The difference is that it is actually valid according to the HTML spec. Browsers will render lots of invalid HTML which is shorter than that, but if you put such HTML in a validator such as [1], it will have errors.

Putting <!DOCTYPE html><title>.</title> in that validator returns no errors (it does have a warning though).

Although in response to OP's question, no, I don't think it is related.

[1] https://validator.w3.org/nu/#textarea


When framed in a protocol that can convey a title otherwise, such as email with its Subject header, <title> is optional. So the shortest valid HTML is just <!DOCTYPE html> and no more—it’s just only valid in certain contexts.


> a lot of the data on the site actually comes from MDN

Eh... not really.

The feature support matrix (as linked on CanIUse) comes from MDN's browser-compat-data repo. Here's the HTML element's source data: https://github.com/mdn/browser-compat-data/blob/main/html/el...

This doesn't contain the testing and usage info that CanIUse cites for support, though, just which browser versions included which features.

CanIUse also points to their own repo, which contains a lot of data: https://github.com/fyrd/caniuse

But I can't find an easy entry point to find where they're getting the numbers for a specific element. The data on there seems to be primarily for features.

So the more precise question is, where is CanIUse getting HTML element testing and usage numbers from? Because that seems to be the issue.


Shouldn't the html element have a support percentage at least as low as any other element?


Nah, the <html> element doesn't represent "all of HTML" or anything; it just means "the root element of the document", so it's just another element. As someone else noted, it isn't even required for a valid, error-free HTML document: only <!doctype html> and a valid <title> are.

An analogy: understanding and parsing the word "English" in an essay heading is unrelated to knowing what "fustigate" means, despite the latter being an English word.


> So yeah, I don’t have a great answer for this. If you do, please let me know!


We could come up with a fun explanation, like the "secret browser" [1] in Android being responsible for this seemingly un-trackable aberration. :-)

[1] https://news.ycombinator.com/item?id=39226754


What would it mean to NOT "support" the <html> element?

Is putting <html> on a page supposed to somehow alter the way the page is presented?


I think your question makes more sense if you broaden the idea of "browsers" a bit.

wget? It barely "supports" html; it has a parser because it can follow links, contrary to e.g. curl, for which "HTML" is just a stream of bytes really.

that tv-thing hanging in the underground or the bus rotating awkward advertisements, local weather and public transport info? It might very well use some minimal rendering engine that only knows barebones of HTML?

this thing in the set-top-box on the TV that gets "broken" HTML from some server and presents the buttons and such? same there.

I presume if you are liberal at what you consider "a browser" it makes sense that several will not support that element.


Ah I see now, they would just display the tag <html> on screen as is perhaps.


It's nice to have something that the humans know with absolute certainty so that we know how much to trust the machines.



