<i>: The Idiomatic Text element (developer.mozilla.org)
125 points by dimmke on Nov 10, 2022 | 208 comments



So a brief history:

- <i> makes text italic

- no, presentational elements are bad, use <em> instead

- okay, fine, you’re all still using “i” so we’ll put it back in the standard

- here’s a convoluted meaning to pretend that it’s in the standard for some reason other than “we tried to remove it and failed”


You skipped "fuck semantic HTML and CSS, we use Tailwind now."

<i> as italic was great like Tailwind is great: sometimes I just want to design something basic without having to context switch to CSS. I want italic, I use the <i> tag.

What has semantic HTML ever done for us? Does the browser care that I used <i> instead of <em> for italic? Do screen readers even care?


> What has semantic HTML ever done for us?

For us small things, like enable reader views. For people with disabilities, it lets them effectively use screen readers.


The accessibility aspect is huge and deserves more attention and kudos than it gets.

Also, from a purely selfish aspect, I vastly prefer documents with at least a minimum of semantic organization, even if it's just header, footer, main, and meaningful heading levels.

Without those, well, it's just a document full of undifferentiated text with various attributes to distinguish it visually or typographically, and usually inconsistently used. Is this piece of text styled "bold with font size 16" supposed to be a second- or third-level header?


What stops screen readers from treating the <i> tag the same way they treat the <em> tag? If <em> functionally ends up doing the same thing that <i> did, what's the point?


A screen reader that did this would need to determine heuristically whether the speech should be emphasized, because much of what is set in italics isn't meant to be read with emphasis.


The point being, if italics were already used to add emphasis and to separate an expression from the rest of the utterance, why was a dedicated tag required to express emphasis?

(The heuristics only apply now, as there are two kinds of emphasis, which introduces some uncertainty of how things should be expressed. Moreover, as type styles have moved to CSS, there's a chance that a screen reader presentation may miss an intended separation of text.)


In terms of typography, italics are also applied for titles, for quotes (in some contexts), for foreign expressions, sometimes for dialogs, and sometimes simply because they are easier to read. For a screen reader, you don't want to emphasize these use cases.


They are still used to separate that expression, to make the expression stand out from the rest of the text. Surely, if I set a quote in italics, I mean this to be read in a distinctive voice by an aural presentation. (This is the entire point of separating it by a distinctive type style. I'm actually somewhat alarmed that this isn't the case.)

Edit: There's a clear meaning to the use of distinctive type styles. I don't see an intrinsic value in pretending that there is no such intended meaning and that there should be an artificial separation, rather than generalizing on presentation styles.

Edit #2: In Antiqua, italics isn't just oblique text, it's an entirely different script. So what is the use case and the intended meaning of using a different script? Isn't this more conceptual than just using a fancy visual presentation style, which may be happily ignored in any other representation as there's probably no intended meaning to this?


<em> and <strong> have always been there as long as I recall, and are part of the 1995 HTML 2.0 spec. Before NCSA Mosaic, your hypertext was being rendered on text-based terminals. Most of them couldn't render italics, and for emphasis tended to invert foreground and background, use underlining, or color if you were lucky. The use of italics and bold to add emphasis started when people began using the first WYSIWYG HTML editors in the late '90s: people who had never learned HTML or even written hypertext before, concerned primarily with making things pretty. This problem drove the creation of CSS, which got people to stop using tables for layout but never quite got them to understand that <i> and <b> were exclusionary, except at the universities that had to support braille terminals and screen readers for visually impaired students.


Yes, <em> and <i> were introduced at the same time in HTML 2 (1995), as were <b> and <strong>. But for any practical purpose <em> and <strong> were ignored until HTML 4 / XHTML. I've been coding websites professionally since 1996 and I do not recall encountering these tags in the wild. (When I started, the use of head and body elements was the dernier cri – to be coded with <i>. :-) )

Fun fact: the original HTML definition [1] has "typewriter" as the only styling element and uses surrounding underscores ("_are_") for emphasis.

[1] http://info.cern.ch/hypertext/WWW//MarkUp/Connolly/complete....


<i> for things that should be italic for visual reasons. For example, the titles of books.

<em> for things that are emphasized, as in emphatic expressions within text. For example "You'll get nothing and you'll like it."
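A minimal sketch of the distinction (the sentences are made up; the second reuses the quote above):

```html
<!-- Conventional italics: a title, with no spoken emphasis intended -->
<p>They were reading <i>Moby-Dick</i> on deck.</p>

<!-- Emphasis: the stress changes how the sentence is spoken -->
<p>You'll get <em>nothing</em> and you'll like it.</p>
```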


TIL. ~15 years ago I read "don't use <i>, use <em>", and so I did. Now it seems I need to go and change a bunch of <em>s to <i>s.


*<cite>


<cite> is also pure joy. From MDN:

> It's worth noting that the W3C specification says that a reference to a creative work, as included within a <cite> element, may include the name of the work's author. However, the WHATWG specification for <cite> says the opposite: that a person's name must never be included, under any circumstances.


In short, if you want to avoid making errors, just don’t use <cite>.


I find this interesting, and wish the W3C could work this out more: In one possible future, 'cite' elements (and the absence thereof) could be used to inform/assist human and automatic fact checking.


It doesn't have clear semantics. <cite> is basically used as "I want something in italics, and it is the title of something" rather than "here is a reference to a creative work". And it gives the lie to semantic markup: we only ever have semantic markup for things we can reasonably expect to be visually set off. Something like a person's name or a cut-off phrase is never marked up semantically; there could be massive benefits from semantically attributing lists of city/state combinations in flowing text, but it just doesn't happen. What we have is semantically attributed visual markup rather than true semantic markup, and this kind of renaming of <i> is just going overboard.


There are also other things that are not emphasized, but require italics per typographical convention, <i lang="la">e.g.</i> foreign-language fragments.


Modern web developers don't use semantic HTML for screen reader support. They manually apply screen reader interpretation data the same way they manually apply Tailwind styles to every element. (That's what WAI-ARIA is for, most notably the "role" attribute.)


Most of the new HTML5 elements (main, aside, article, section, nav…) have an implicit ARIA role, no need to add one manually.
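For example, the landmark elements carry their roles on their own, so spelling the role out is redundant (a sketch; the <nav>/<main> mappings are the ones defined by the ARIA-in-HTML mapping):

```html
<!-- These carry role="navigation" and role="main" implicitly -->
<nav>…</nav>
<main>…</main>

<!-- Equivalent for assistive technology, but more verbose -->
<div role="navigation">…</div>
<div role="main">…</div>
```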


A lot of the old ones do too, but elements not named <div> are an elitist conspiracy to prevent people from becoming web developers.


Sounds like a rat's nest of fragility. :(



Once upon a time, HTML was a document format, and there was a dream that these documents would be interconnected with deep meaningful semantics that would be machine readable.

But then HTML became the rendering layer for a universal sandboxed VM, and all that went away.


> What has semantic HTML ever done for us?

Why, it has liberated us from European colonialism. <i> being no longer italic, you can wrap it around any script from the world and enjoy the semantics of it.


I'm confused, is this a joke? If it is, is there context here I'm missing?


The joke is that italic can also mean Italic, the ethnolinguistic group of those who spoke Italic languages, and why Italy is named so.

The phrase "What has semantic HTML ever done for us?" is a riff on the quote "What have the Romans ever done for us?" from the movie Life of Brian, about a Jewish-Roman man mistaken for another Messiah.

The Romans are, of course, Italic peoples.


No, that's not the joke, though for a brief split second, some moments after I wrote the comment, it occurred to me that the interpretation might arise.

Italic type is derived from a form of semi-cursive writing, and is entrenched in the European printing tradition. No punning with Italy intended.

So what does it mean if you have <i> around Chinese, or Devanagari or what have you? Do you just apply shear to make the text slanted?

I suspect that it bothered some "woke" types that HTML contains Euro-centric typographical directives, so they have been repurposed to have some sort of, culturally neutral semantics that is more inclusive of the planet's diversity (pardon me if I'm not nailing the terminology here).

In short, <i> no longer belongs to whitey and his writing system.


> So what does it mean if you have <i> around Chinese, or Devanagari or what have you? Do you just apply shear to make the text slanted?

Actually, yes! <i> is still defined in the standard to use the italic font style: https://html.spec.whatwg.org/#phrasing-content-3

And if there isn't an italic version of the font (often the case for Chinese etc), https://w3c.github.io/csswg-drafts/css-fonts/#font-style-pro... says the browser should programmatically shear it:

> If no italic or oblique face is available, oblique faces may be synthesized by rendering non-obliqued faces with an artificial obliquing operation.
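A sketch of how you can opt out of that synthesis if the sheared glyphs look wrong (font-synthesis is a standard CSS property; the class name is made up):

```html
<style>
  /* Allow synthetic bold, but suppress synthetic (sheared) italics */
  .no-fake-italics { font-synthesis: weight; }
</style>
<p class="no-fake-italics">
  <i lang="zh">世界</i> renders upright if the font has no italic face.
</p>
```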


While Italic type is a European thing, it is a very useful thing that does have some parallels in other scripts (Consider Rashi script for Hebrew, or how Japanese uses Katakana in a similar way). No need to get all 'anti-woke' about it: if it were the case that people had a problem with 'Italic type', the phrase 'oblique type' is right there for use.


Can the semantics of <i> stretch so far as to turn hiragana into katakana?


I thought cercatrova's interpretation was a lot funnier.


So what does it mean if you have <i> around Chinese

There's actually a tag for this. I can't remember what it is, but it puts dots above the ideograms, which serves the same function.


Speaking only of HTML tags, you might be thinking of (the HTML version of) "ruby" characters, which isn't dots, but does allow for annotation:

https://en.wikipedia.org/wiki/Ruby_character#HTML_markup

With CSS `text-emphasis` we can do the dots you are speaking of:

https://css-tricks.com/almanac/properties/t/text-emphasis/

Probably there are other useful bits of HTML/CSS for Asian languages. (right to left and/or vertical writing, anyone? I always wanted to fart around with pretty-printed Chinese poems and, like, calligraphy script fonts with procedurally generated jitter.)

Edit: there are also Unicode entities for these:

https://en.wikipedia.org/wiki/Emphasis_mark
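A sketch combining the two (the class name is made up; <ruby>/<rt> and text-emphasis are standard HTML/CSS):

```html
<!-- Ruby annotation: reading aid rendered above the base text -->
<ruby>漢字<rt>かんじ</rt></ruby>

<style>
  /* Emphasis dots over each character, instead of italics */
  .dots { text-emphasis: filled dot; text-emphasis-position: over right; }
</style>
<p>これは<span class="dots">大事</span>です。</p>
```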


[flagged]


Just saying slanted is not problematic. would be.


Italic --> Of Italy.


Not to be confused with idio[ma]tic, of idiots.


From my end, it's intended that way, and I hope it is so.


> You skipped "fuck semantic HTML and CSS, we use Tailwind now."

If you look at Tailwind's website and examples, you'll see that it's one of the very few CSS frameworks that properly uses semantic HTML everywhere.

And there's no such thing as "semantic CSS".


Most CSS best practices are based around semantic naming: you name your elements ".call_to_action" or ".sidebar". HTML5 introduced a lot of semantic elements such as <header> or <article>.

Whereas with Tailwind or HTML < 5 you name your elements based on what they look like: ".font-bold" or "<b>".


CSS purists will tell you that you shouldn't tie your CSS to specific structures in your code, and it should be as generic as possible :)

> Whereas with Tailwind or HTML < 5 you name you elements based on what they look like. ".font-bold" or "<b>"

No, you don't. With Tailwind you use the semantic elements in HTML, and you style them using generic primitives.


So with the @apply rule you can sketch things out in Tailwind and put them behind semantic names if you have PostCSS running.

It works quite nicely.
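A sketch of that workflow (.btn-primary is a made-up class name; the utilities after @apply are standard Tailwind classes):

```css
/* input.css, compiled by PostCSS with the Tailwind plugin */
.btn-primary {
  @apply px-4 py-2 rounded font-bold;
}
```

The markup then keeps a single semantic class name instead of a string of utilities.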


> And there's no such thing as "semantic CSS".

I think the parent meant “[semantic HTML] and CSS”, not “semantic [HTML and CSS]”.


Ah, I just decided to stop learning new things before tailwind came along


Wait isn't Tailwind pushing the vast majority of its users to produce much more accessible websites than they would otherwise?


It’s just another Bootstrap.


It definitely looks like Bootstrap in syntax. A lot of "CSS superfans" actually really enjoy Tailwind in use. It solves more problems than it causes for some people.


It’s basically a return to the days of HTML without CSS. CSS classes are not applied as an abstraction of different visual styles or components, but purely as tools to define visuals in situ.

I suppose it’s easier to reason about, but it feels like it was invented by that grug brain dude who likes solving lots of problems but hates thinking.


It’s really convenient for prototyping but best to refactor to stylesheets before pushing all that bulk to production


I wonder if AI screen readers will ever use the visual appearance of something to determine what it means. So even if you draw on a `<canvas>` it can still read the text. That would be sick.


In my view all canvas-related JS libraries should treat accessibility as a foundational requirement. It's not "too difficult" to implement.

Here's an explanation of the accessibility features I've added to my canvas library[1]. These features may not be the best approach, or even the most appropriate approach ... but at least they're there, and they can evolve into something better as/when people offer feedback.

[1] - https://scrawl-v8.rikweb.org.uk/learn/eleventh-lesson



This is fascinating. Do you know what ever happened to these projects?

Prefab [1] and especially Bubble Cursor [2] look like they'd be super useful additions for the flat-UI era.

[1] http://prefab.github.io/

[2] https://www.tovigrossman.com/BubbleCursor/


You have a fantastic idea there, and if you'd like to attempt to make it a reality, I'm happy to brainstorm it with you. You keep full ownership of the concept; I'm happy to contribute given the people it'd help.

You can contact me directly using my handle at my domain, which is 21337.tech


z3c0, much appreciated, but I am prioritizing other things at this time. Let me hit you up in the future if no one has built something with this feature by then!


Fine by me! We'll table this for then.


Better to use a div instead.


More than just a witty name, I see


I feel that this is where accessibility needs to go. We cannot rely on toolkits to implement support for accessibility standards, because there will always be people building apps in niche toolkits or homemade GUIs that don't support accessibility, especially as WASM becomes more common on the web.

We need to treat the screen like a PNG that an AI can parse to determine what the text is, what text is relevant to read to the user, where and what the buttons are, how to emphasize reading the text based on the visuals, etc. It's the only thing that will scale, and while difficult, creating something like this seems within reach.


No. That’s like saying the OS should use AI to try and figure out what buttons are to make them clickable, rather than developers actually building an interface for users.

All operating systems and platforms have pretty fluent and in-depth APIs for exposing an “accessibility tree” for assistive technologies to use. It’s your responsibility as a developer to build interfaces for actual users, and there’s very little reason not to.


> That’s like saying the OS should use AI to try and figure out what buttons are to make them clickable, rather than developers actually building an interface for users.

That's actually a really good idea.


Especially now that designers are making buttons/clickable areas hard to distinguish ... I could totally use AI to highlight all the things that are actually, you know, interactive.


You don't need an AI to do this, just make your own custom build of whatever operating system you use with styles that aren't crap. It's almost certainly a more tractable task than trying to make an AI work out what the interactive bits are - considering it seems to be a business objective for many companies nowadays to make it hard to work out what the interactive bits are.


I've seen systems at Boeing that understand CAD drawings semantically, like how arrows work and what regions are alternate views of what other regions, and whether a drawing is of the nose landing gear or the flap control, all based on scanned drawings from the 50's. So ... yes. Absolutely.


> all based on scanned drawings from the 50's

It's 2059. We're using browsers that render pages created with Dreamweaver and Frontpage using advanced machine learning techniques to determine that <blockquote><font size="+1"> is an entry in the table of contents, while <p><font size=+2> is a level 1 header.


That sounds SO COOL. Can you tell me more? Or do I have to go ask my friends at Boeing about it? :)


I agree in principle, but good luck with AI interpreting flat-UI controls visually, it’s confusing enough for sighted users.

This actually gives me a little hope that AI-interpreted UI could trigger a trend back to visually obvious and unambiguous controls.


YES. I have seen layout prototypes for this kind of thinking, and they're magical. It's not trivial, but will come more and more into reach. All of visual layout delivers a spatial visual language that all humans understand. We know what headers _are_ because we know what headers _look like_. And not for a single style of article, but for all articles.

I also used to believe that rigorous semantic tagging was The Way To Do Documents, and it was certainly a useful crutch, but we absolutely need to move beyond that.


I find that Tailwind makes it a lot easier to use semantic HTML because it discourages element styling and provides an aggressive browser style reset.

H2 or H3? A purely semantic distinction with Tailwind.

Use an OL for a custom list of items? You start with a clean slate, no overrides required.

And you are discouraged from writing custom CSS that causes unexpected styling as well.

Every HTML tag with an empty class attribute is a clean slate. You only have to care about the default display mode and about allowed descendants.
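For instance, with the reset in place, heading level and visual size become independent choices (text-2xl and font-semibold are standard Tailwind utilities; the headings are made up):

```html
<!-- Level chosen for the document outline, size chosen separately -->
<h2 class="text-2xl font-semibold">Pricing</h2>

<!-- Same styling on a different level elsewhere in the outline -->
<h3 class="text-2xl font-semibold">Regional pricing</h3>
```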


The screen reader cares about the big stuff, such as using headings, tables, and lists correctly. And of course, implementing ARIA attributes correctly. Other than that it doesn't matter too much.


> sometimes I just want to design something basic without having to context switch to CSS. I want italic, I use the <i> tag.

Just use the <em> tag.


What if I want text that’s italic but not emphasized?


Yes, e.g. what if you are quoting something with italics in the original text? For older texts in English, italics might easily indicate a word or phrase in a non-English language. Showing that with semantic "emphasis" might convey the wrong impression -- e.g. without a note such as "emphasis added".

The OP addresses "idiom in another language" as one case, but if it is within a scholarly quotation, can one change the typography to a different convention?


Then there's the matter of languages such as Japanese that use a different set of characters (katakana) both to indicate emphasis and to mark foreign loan-words. Would katakana characters still be enclosed in an <i> element? Would styling be modified accordingly?


Perhaps you could add emphasis dots (https://en.wikipedia.org/wiki/Emphasis_mark)!



but... what if the browser chooses to render emphasis in some fashion other than italic? How then could I force my design opinions on the reader, despite the contrary configuration of their user-agent? Surely we can't allow the end-user to control their experience when it's the designer's intention which really matters.


okay, fine, you’re all still using “i” so we’ll put it back in the standard

Going from HTML/XHTML to HTML5/XHTML5, several elements were redefined, including <i> and <b>.

It’s nothing new.

Here’s an article from HTML5 Doctor explaining this more than 12 years ago [1]. I hope web developers serious about their craft aren’t just now discovering this.

[1]: http://html5doctor.com/i-b-em-strong-element/


If they wanted people to use a different tag for such text, they should have created an idiomatic tag instead of attempting to repurpose existing ones. Adding alternative elements is usually the better solution.


If they wanted people to use a different tag for such text, they should have created an idiomatic tag instead of attempting to repurpose existing ones.

If you read the spec, it mentions several different uses for <i>, depending on the context. Otherwise, we’d need a different element for each context, bloating the spec by having special purpose elements instead of general ones.

For example, <i> is used for ship names:

    <p>They came over on the <i>Mayflower</i>.</p>
It wouldn’t make a lot of sense to have an element just for names of ships in a general purpose markup language like HTML.

Turns out, if you look at the Wikipedia article about the Mayflower, the name is marked-up with the <i> element [1].

[1]: https://en.wikipedia.org/wiki/Mayflower


I kind of like using <i> for other users' names that appear in a sentence. For example, if a joker calls themself "you piece of shit", it would be kind of unfortunate for another user to open up a dialog and it says: “Are you sure you want to delete the user you piece of shit?” Now with the user's name in italics, this is less ambiguous.

In general I think many UXs are too shy about using italics.


Dialog window:

“Are you sure you want to delete the user you piece of shit?”

Options:

> Yes, you worthless dildosaurus

> No! Stop! I hate you!


And the actual change was made at least 15 years ago: https://web.archive.org/web/20071001102153/https://www.w3.or.... (Can’t be bothered finding even earlier drafts to find when it actually happened, but it could be a few more.)


MDN's "interpretation" of <i> has, over that span, also changed between "alternate voice"(?), "interesting", and now, apparently, "idiomatic" text; I'm sure I've missed a few. I feel like it'd be a service to the world to just suck it up and let it be the "legacy italics" tag, though.


MDN's "interpretation" of <i> has…

It’s not an MDN interpretation; this has been part of the HTML5 spec since day 1.

I linked to a 12 year-old article earlier in this thread [1] describing this very thing when HTML5 was new.

[1]: https://news.ycombinator.com/item?id=33556432


As far as I know, the spec still calls it "alternate voice or mood"; it's MDN, not the spec, that now calls it "idiomatic" text. So I stand by it being the MDN interpretation that's changed, not the spec (to pretend that it's not "we tried to remove it and failed" by making it something that at least actually starts with an I now, I guess, twice).


idiomatic is certainly in the spec [1]:

    The i element represents a span of text in an alternate voice or mood, or
    otherwise offset from the normal prose in a manner indicating a different
    quality of text, such as a taxonomic designation, a technical term, an
    idiomatic phrase from another language, transliteration, a thought, or a ship
    name in Western texts.
[1]: https://html.spec.whatwg.org/multipage/text-level-semantics....



The entire HTML standard process has traditionally been dominated by people with Very Strong Opinions and little sense of practicality. It's a little better now though.

The practical thing to do was to just redefine <i> and <b> to mean what <em> and <strong> were defined as, then add additional elements for when more fine-grained meaning is needed. Instead, countless hours had to be lost on s/<b>/<strong>/g.


Thing is, not all uses of <b> are valid uses of <strong>.


But everyone just did s/<b>/<strong>/, so...

This is exactly the sort of "argument from purity" that lacks any practicality that I meant.


> here’s a convoluted meaning

I think the word you are looking for is “contrived”


indeed. thanks


To be fair, <em> is semantically not appropriate for the use cases that <i> is recommended for these days, such as H. sapiens and RMS Titanic. These are not about emphasis but about typographical convention.


We need to work on the following:

1. Add an "i" element to SVG for isometric path drawings

2. Update the HTML parser algo to special case this "i" for inline SVGs


Do you need to update the parser though? I kind of thought SVG elements were implicitly namespaced. For example, we have two <a> tags: one HTMLElement <a> and another Element <a> with the namespace "http://www.w3.org/2000/svg". I don't know of any meaningful difference between those two elements, except that if you try to insert the former into an SVG DOM tree, weird things might happen (though I haven't tested it).
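A sketch of the two namespaces side by side (the parser switches namespace at the <svg> boundary, so the two <a> elements get different DOM interfaces):

```html
<!-- HTML namespace: this <a> is an HTMLAnchorElement -->
<a href="/docs">Docs</a>

<svg viewBox="0 0 120 20" xmlns="http://www.w3.org/2000/svg">
  <!-- SVG namespace: this <a> is an SVGAElement -->
  <a href="/docs"><text x="0" y="15">Docs</text></a>
</svg>
```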


Yeah, only a terminally-dumb parser could confuse an SVG <i> tag with an HTML <i> tag. A proper HTML parser should ignore anything inside tags it doesn't understand - such as the <svg> tag.


Why isn't this deprecated and marquee is?


<marquee> gives you too much power, cannot handle.


3 fewer letters than "span".


This is quite annoying. There are a lot of web pages that use <i> to mean "italic"; possibly most of them. Many of those are not going to be updated just because of a change in the definition of the tag.

Instead, they should have introduced a new tag, e.g. <idiomatic>, while deprecating <i>.

Most internet standards are descriptive, not coercive. The semantic web's advocates have been coercive from the beginning, but despite the repeated failures of coercion, they keep on at it.

What does "idiomatic" mean anyway? If I say that idiomatic markup is a waste of time, do I have to put "waste of time" in an <idiomatic> element? How is text in "another language" idiomatic text? [Checks for another example] Oh, that's the only example they give of something that's "idiomatic".

It seems clear to me that "idiomatic" isn't what they mean at all; what they mean is some text fragment that should appear differently, because in some way the "mode of speech" is to be treated differently. It looks as if they've hijacked the <i> tag because they want it to stop meaning what it means; they've chosen a meaning that starts with the letter "i" for that reason.

The way I parse it, "idiomatic" text is any kind of text that would normally be set apart by being presented in italic. That is, it's not even semantic markup at all; it's a politically-correct gesture to the semantic markup die-hards.


Fret not. The term “idiomatic” is just MDN’s editorialised and poorly-chosen title for the underlying element, heavily incomplete and simplified past the point of conveying the meaning at all. They should have given up on choosing a single word for it (like they haven’t tried with <s>). The actual spec describes it much more reasonably:

> The i element represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose in a manner indicating a different quality of text, such as a taxonomic designation, a technical term, an idiomatic phrase from another language, transliteration, a thought, or a ship name in Western texts.


Next up: "<b> is for... beneficial text."


It's actually for "bring attention to" :)

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/b


> Do not confuse the <b> element with the <strong>, <em>, or <mark> elements. The <strong> element represents text of certain importance, <em> puts some emphasis on the text and the <mark> element represents text of certain relevance. The <b> element doesn't convey such special semantic information; use it only when no others fit.

This is really hard to take seriously, even as someone who advocates for semantic HTML.

Surely all these nuances mean something tangible, but I don't see it.

Compared to <b> vs <strong> vs <em>, <mark> and <i> seem comparatively sensible.


I agree. I like the idea of semantic HTML with close to no classes, but the standard is really failing here. I generally just use <em>, and very rarely <strong>.


So that's 'b' for 'bring'. But you could bring anything, couldn't you? E.g. "Party tonite; <b>your own bottle</b>".

It's a stretch.


Not to be confused with "<strong>: The Strong Importance element"


Also not to be confused with "<mark>" to indicate relevance. A use case might be if the respective part of the text is of strong relevance and you want to bring attention to it by emphasizing it.

But now I'm unsure if I should consider these distinctions <bullshit> or <horsecrap>?


Thing is, when you're composing prose in your native language, you don't generally parse it and re-parse it to figure out whether some term is "relevant", or requiring attention, or in an "alternative mood". A native speaker/writer isn't normally aware of the grammatical aspect of what they write; they know how they want the text to appear.

Trying to force writers to instead overlay a layer of meta-meaning on their prose, so that the meaning is expressed both in the text and the markup, is a fool's errand. It's a bit like a programming system that requires the developer to express his meaning in both Forth and Python; it's just asking for an author to write text that directly contradicts the markup.


Also "<b>: The Bring Attention To element"

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/b


I thought you were joking until I clicked the link.


Is it just Mozilla coming up with these un(?)-intentionally hilarious retcons?

Or are these actually becoming official standard names?

I'm truly enjoying the sheer absurdity of it all.


They are part of the changes to the WHATWG HTML specification, so it's not just Mozilla. MDN is just reflecting/documenting what is in the different standards.

From https://html.spec.whatwg.org/multipage/text-level-semantics.... (with _ added for the relevant section used by MDN):

> The b element represents a span of text _to which attention is being drawn_ for utilitarian purposes without conveying any extra importance and with no implication of an alternate voice or mood, such as key words in a document abstract, product names in a review, actionable words in interactive text-driven software, or an article lede.

From https://html.spec.whatwg.org/multipage/text-level-semantics.... (with _ added for the relevant section used by MDN):

> The i element represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose in a manner indicating a different quality of text, such as a taxonomic designation, a technical term, _an idiomatic phrase_ from another language, transliteration, a thought, or a ship name in Western texts.


The standard doesn't mention "bold" and offers no reason why <b> is <b>.

https://html.spec.whatwg.org/multipage/text-level-semantics....


From the HTML4 spec (https://www.w3.org/TR/html401/present/graphics.html#h-15.2.1):

> B: Renders as bold text style.

The element (and other style-related elements) were deprecated in favor of CSS. For b and i, the strong and em elements were created as semantic alternatives. Now the b and i elements have been given specific semantic usage and are no longer deprecated.


<b> and <i> have never been deprecated, in any version of HTML or XHTML. (<u> was deprecated in HTML 4, though.)


It's not Mozilla; it's WHATWG, who've been trying to subvert standards and social-engineer web developers since they started.


Just be thankful they didn't go with "bigly".


That's <big>

Deprecated, but it will be back.


<u> is perhaps the biggest stretch for "Unarticulated Annotation"


“Underscore” is used figuratively, so they could just stick with that.


Actually no, that’s the most accurate one of them (though <b> as “bring attention to” is fairly close too). The actual spec:

> The u element represents a span of text with an unarticulated, though explicitly rendered, non-textual annotation, such as labeling the text as being a proper name in Chinese text (a Chinese proper name mark), or labeling the text as being misspelt.



> Do not confuse the <b> element with the <strong>, <em>, or <mark> elements.

> The <strong> element represents text of certain importance, <em> puts some emphasis on the text and the <mark> element represents text of certain relevance.

This really confused me. Isn't this all basically the same?


<small> was permitted to become "the side comment element", but the gods turned their faces away from <big>.


https://developer.mozilla.org/en-US/docs/Web/HTML/Element/st...

<strong>: The Strong Importance element

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/hr

<hr>: The Thematic Break (Horizontal Rule) element


I had never heard this retcon.


This is what the WHATWG group has been spending their time on over the last decade.


No. This is what MDN chose to call it. The actual spec uses no such silly names. As for the actual semantics, which are WHATWG’s domain, that work was done over fifteen years ago.


Mozilla stopped spending their money on making a good browser, and instead started blowing all their Google cash on “inclusivity” and “diversity” efforts many, many years ago, firing technical staff who would not toe this new political line.

Looking at stuff like this, I think the results are pretty obvious. Despite the “diversity”, the value of what Mozilla offers hasn’t simply stopped increasing; it has now started decreasing, and in cases like this it actually brings negative value to the internet overall.

This is what firing/alienating the actual people who did the actual job does. And no amount of “diversity” can make up for it.

What a shame. Mozilla used to do so much good.


<b> is amazing if you need some titles in contexts where <h1>-<h6> doesn’t make sense (i.e. will confuse assistive technology). Pair it with aria-labelledby and you’ve created a title for things like <aside>, <nav>, etc. without confusing normal navigation.
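
A minimal sketch of that pattern (the id value and the text are made up for illustration):

```html
<!-- The <b> provides a visible title without adding a heading to the
     document outline; aria-labelledby names the <aside> with it. -->
<aside aria-labelledby="related-title">
  <b id="related-title">Related articles</b>
  <ul>
    <li><a href="/example">An example link</a></li>
  </ul>
</aside>
```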


So unfair to marginalized non-english speakers!


What's unfair?


Sorry, was bleeding over from another post.

I'm not really sure what unfair is, though, to answer your question.


I'm so confused. Am I speaking with GPT-3?


The explanation for the i element has had a rough history: I remember wording such as "the i element is for content that's normally put into italics" lol. This demonstrates that the whole concept of "semantic" HTML is flawed at a fundamental level: HTML is a vocabulary for casual academic publishing, with headings, footers, lists, etc. for structuring, and other very specific elements (such as <var>) typical of that use case. But what about other kinds of text, such as threaded discussions like this page? For some, using a vocabulary fit for a specific purpose, with a mapping to a rendering vocabulary, is the entire point of markup.

"Semantic" HTML seems to be pushed after the fact as a justification for CSS to have syntax completely separate from HTML, reserving attributes for "behavior". The idea that markup attributes shouldn't contain style info is laughable when they're there for the exact reason to associate typed properties with elements.


Surely we can come up with a semantic name for <blink>?

"Bring levity in kind"


Make A Really QUool EEffect


“Bi-legible ink”?


Bi-Luminous Information Node (type K)


marquee -- make almost really quite unilaterally everything elevate


Make A Really Quite Unreadable Eyesore, Eh?


Move Around, Rambling Quaintly Until Eventual Exit


Especially if it's an h1 in fuchsia and red text on a yellow background, blinking, with bold, italic, and underline. Something like "1995-12-10 Please read my website completely before emailing me directly: bob941 (at) compuserve.com . . . . . "


Of course it is funny to look at a committee trying to invent some grand vision for something that was simply a small set of options available on desktops around 1990.

The real problem, however, is that many writing systems have never had any kind of oblique style for emphasis. Or cursive (which is a different thing). Or they have cursive, and use it all the time in print, but technically that's a different font without any connections to the main one. So any person making some kind of template should be aware that the effect of <i>-as-italic can range from “obvious thing everyone expects and understands” to “does nothing at all” depending on language and font. I think they should have got straight to that point in the article.

It is pitiful how those unreflected small choices can put fences in one's mind. Everyone knows the font selection dialog in the browser, and probably thinks that the world is made of serif, sans-serif, monospace, and some Comic Sans. But that's nonsense, even for Latin.

Even more pathetic is the inability to use basic punctuation marks. Something as barebones as the Fixedsys font has had all the quotation marks and dashes since Windows 3.1, long before programs became Unicode-aware. All that time, absolutely nothing has been stopping you from pressing the key and getting the proper symbol just like you type any other… except the typewriter key labels hardware makers still use, and the default system layouts being just as stale.


> Of course it is funny to look at a committee trying to invent some grand vision for something that was simply a small set of options available on desktops around 1990.

I honestly find it embarrassing rather than funny.

"The foundations of the modern web were built by people who couldn't possibly have imagined the importance their inventions would play in the future, and the sheer breadth of applications they would be used for. One of the consequences of the web's convoluted history is that tag names like <i> are historical artifacts that don't make sense in the context of modern semantic elements."

Is it really so hard to admit that? Why make up bogus explanations?


They've clearly put the effort in, but I'm not sure it's a retcon I personally like.

I think I'll hang onto the original Season 1-4 interpretation for my headcanon.


<em> does have its uses. If you have an entire paragraph that's styled as italic and you want to emphasize something within it, usually what you do is make the emphasized span non-italic.

Italicized paragraphs or long spans are more common than you might guess. Sometimes they're used to indicate speakers in a dialog or to signify editorialized interjections. No idea if browsers are smart enough to render nested <em>'s this way now without specialized CSS.
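
Browser default stylesheets generally just set em to italic, so nested emphasis stays italic; a couple of CSS rules (a sketch, not a spec-mandated default) restore the traditional toggle:

```css
em { font-style: italic; }
/* Flip back to upright when emphasis sits inside an
   already-italic context, such as <em> nested in <em>
   or emphasis within an italicized <i> span. */
em em,
i em { font-style: normal; }
```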


I always use <em> for any kind of emphasis, and I'll even do <em class="italics"> and <em class="bold">, fight me.

I've seen <i> used for icons(!) but that's a bridge too far for me.


IIRC, <i> for icons was/is the markup FontAwesome suggested/suggests.


Of the available elements, which one is suitable for icons, especially when being used in text (just yesterday I learned that emojis are even used in code like css). Clearly they should be in an element to separate them from the text, but which one is appropriate?


<svg>
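
For instance, an inline SVG icon with its accessibility handled explicitly (the shapes here are placeholders, not a real icon set):

```html
<!-- Decorative icon: aria-hidden keeps it out of the accessibility tree. -->
<svg aria-hidden="true" width="16" height="16" viewBox="0 0 16 16">
  <circle cx="8" cy="8" r="7" fill="currentColor"/>
</svg>

<!-- Meaningful icon: role="img" plus an accessible name. -->
<svg role="img" aria-label="Warning" width="16" height="16" viewBox="0 0 16 16">
  <path d="M8 1 L15 14 H1 Z" fill="currentColor"/>
</svg>
```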


God, can we just kill the semantic web stuff already? It's like a bizarre religion that won't die.

By the time the web is finally "semantic" and everything is marked up appropriately, we'll have AIs that just infer the semantics and don't need all the meticulous tagging.


Doing whatever we want and just hoping AI will eventually fix the mess it is also how we got sub-par public transport but billions invested in self-driving cars.


I mean, the semantic web had a good try. The farce has been going on for 25 years now, it had its chance.

meanwhile, yes, google does have to use AI to figure out webpages because the semantic web is a failure.


Some people even tried RDF and triplestores to structure metadata and separate from presentation. XSLT was supposed to become an almost universal presentation layout language/transformation language. Nawh, we have REST APIs, ORMs, and HTML and JSX templates that present rendered views devoid of data and logic behind them.


A lot more money has been invested in public transport. Turns out, running buses and trains all the time gets expensive.


I don't doubt billions have been invested in public transport. I also don't doubt those billions have been (and will continue to be) more cost-effective than the billions invested in self-driving cars.


It's overly semantic.. but the core /document/ semantic outlining does seem useful to me, particularly when I started having to take accessibility into account when designing applications and pages.

I will spend time considering if something other than a <div> is more appropriate for an element. I'm not going to spend very much time pondering the difference between <b> and <mark>.


It's embarrassing that they do this without even apologizing.


I'd rather people just collectively give up on semantic HTML. What is even the source of hope for this cause? The rest of us gave up a long time ago. I'd much rather see energy spent on expanded ARIA support, something that actually moves the needle on accessibility.


Absolutely. Leave HTML as is and focus on actual accessibility features. They could even add screen reader specific elements and indicate that all other page elements should be ignored. So, you would show one presentation for visual purposes and the other for audible purposes. (Or even braille for feedback purposes.)

Add something to the document that tells it to ignore screen readers as long as a <screenreader> tag is present that puts everything in one place.


> What is even the source of hope for this cause?

Because nested layers of DIV are apparently "semantic"


Documents and humanity in general are complicated and various. You're never going to be able to fully encapsulate the complexities of a document into any (structured) DSL as a result. The recommended approach, failing all other options – even among the semantic people – is to use some standby element, such as DIV or SECTION. Unless you're including people who would say "just change your document to not do anything not representable by Semantic HTML". Add to that that the list of supported elements isn't even close to approximating what you'd write in 50% of documents.


Fun fact: in manuscripts, italic is represented by a single-stroke underline. An equivalent print representation is spaced text. These forms are semantically identical.

BTW: The <u> element has been rebranded as "Unarticulated Annotation". :-)

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/u

(So traditional emphasis is now both idiomatic and unarticulated.)


Ironically, in the list of examples of how to use <i> under Usage Notes, they actually use <em> instead.


Indeed: "<em>What is this writer talking about, anyway?</em>". I don't find that ironic, though; I think it bespeaks the carelessness with which the HTML "standard" is being jerked around by various unaccountable pressure groups.


Nonsense. This is MDN docs, not the spec. Don’t conflate or confuse them.

In this specific case, it’s probably just a bad mechanical translation, possibly from long ago as part of a batch i → em change, or possibly recent in their conversion from HTML to Markdown (might have done both em and i → *). Ill-conceived in either case, but not the end of the world.


You're right; I have conflated them, for many years. Fact is, MDN docs is a million times easier to read than the specs, which used to be clear, but are nowadays full of struckout text and references to other documents.

If MDN docs is autogenerated, that's regrettable. I swear I'm not going to fall back on W3Schools.

Incidentally, whatever happened to W3Schools? There was a time when every Google search for anything remotely related to web development came up with the top 5 links being to W3Schools. Nowadays not so much. Did Google finally decide that W3Schools content amounted to disinformation?


And then there’s me who uses nothing but div elements and hacks at them with css like a drunken butcher moonlighting as a surgeon until my webpage kinda looks correct on my browser and monitor.


Some days I think html should have no predefined nodes at all. just use whatever you think is best and use css to get it the way you want it to look.


So, custom elements?


Using CSS you can make most html tags behave like other html tags. I’ve made all the divs look and behave like strong tags and vice versa to demonstrate to people new to CSS that the default styling of tags is mostly artificial.
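
A sketch of that kind of demonstration (not the commenter's actual code):

```css
/* Make every <div> render like the default <strong>… */
div { display: inline; font-weight: bold; }
/* …and every <strong> render like the default <div>. */
strong { display: block; font-weight: normal; }
```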


Slightly odd that they explicitly say:

  these should include the lang attribute to identify the language
but then don't do that in the Example (with *veni, vidi, vici*)...


The mildly uncomfortable aspect of it is that <i lang="la">veni, vidi, vici</i> might change the font the text is rendered in, because the generic font families (e.g. serif, sans-serif, cursive, monospace) can vary by language. For me, for example, my usual serif is Equity A, but in Latin it changes to Noto Serif and I haven’t been bothered to patch this up in my Firefox config because it takes far too much effort, requiring changes in a number of languages (not such as French (fr), but yes such as Latin (la) and Māori (mi)). Or sans-serif: Concourse 4 becomes Noto Serif. This will honestly cause me to omit lang=la in some iffy cases where I would write lang=fr on a similar word or expression from French.
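
One way to patch that up at the page level, rather than in browser config, is to pin an explicit typeface for Latin-tagged text so the per-language resolution of the generic family never kicks in (a sketch; "Equity A" is the typeface named above, and the fallbacks are illustrative):

```css
:lang(la) {
  font-family: "Equity A", Georgia, serif;
}
```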


In human-machine interaction, "semantic" means "you know exactly what the computer will do with that". So the "<i>" and "<b>" tags have very clear semantics, while "<em>" and "<strong>" are less so, and things like "<article>" are totally vague.

Although this is not quite correct. "Semantic" is what makes sufficient distinctions for a given case. For example, assume I'm going to publish Ashby's "Introduction to cybernetics". It has an interesting structure: numbered chapters ("2"), smaller divisions within a chapter with a title but without a number, and yet more smaller units that are numbered like "2/17". Some of these smaller units have a title, some just a number. The point is that all this is rather unique.

So I'm about to mark up the text to indicate all this. A semantic way to do that would be to invent a notation that is as unique as this book. There will be "<ashby-chp>", "<ashby-div>", "<ashby-unit>" and such. We do not specify how this is going to be rendered, but at least we faithfully describe what we have without losing anything and without adding anything irrelevant.

Now we want to render it on a visual medium. Here we have a different notation that describes fonts, styles, spacing and so on. These are very different distinctions from the structure of the book, but are very appropriate for typesetting. We decide how we represent our "ashby" distinctions with these tools and write a transform from our notation into the visual one.

Same for a screen reader. Here we have a notation that describes pitch, speed, etc. No spacing or fonts, of course. We decide how we are going to represent our distinctions with these and come up with another transform.

At each step each element in our notations has a clear purpose. The "ashby" notation captures all the distinctions the author needs to make his point. The "visual" and "aural" notations are tools to express distinctions that can be made on a specific media. This is semantic. And this, by the way, is the original idea of XML (a multitude of notations) and XSLT (the notation transformer).

[And the description of the transforms also uses yet another notation with yet another clear purpose :)]
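
A sketch of what that could look like (all element names invented here, following the comment's naming):

```xml
<!-- A notation as unique as the book: structure only, no styling. -->
<ashby-chp n="2">
  <ashby-div title="Change">
    <ashby-unit n="2/17" title="Transformation">…</ashby-unit>
    <ashby-unit n="2/18">…</ashby-unit>
  </ashby-div>
</ashby-chp>

<!-- One possible visual transform, as an XSLT fragment (assumes the
     usual xsl namespace declaration on the enclosing stylesheet): -->
<xsl:template match="ashby-unit">
  <section>
    <h3><xsl:value-of select="@n"/> <xsl:value-of select="@title"/></h3>
    <xsl:apply-templates/>
  </section>
</xsl:template>
```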


I prefer ∠this⦢ way to mark text ∠italic⦢ and ⁎this* way to mark it as ⁎bold*.

These are the symbols:

Italic:

∠ ANGLE Unicode: U+2220, UTF-8: E2 88 A0

⦢ TURNED ANGLE Unicode: U+29A2, UTF-8: E2 A6 A2

Bold:

⁎ LOW ASTERISK Unicode: U+204E, UTF-8: E2 81 8E

* ASTERISK Unicode: U+002A, UTF-8: 2A

I just made this encoding up. Ah, it would be more correct to say I just made this markup up. I made it up and nobody understands it. …Or yeah, actually most people will understand it?


Similarly, <b> has become the “Bring Attention To” element: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/b


So have <b> and <u>. Years ago.


Yes, but I wanted to foster this exact comment thread so it worked perfectly.


Yeah, but your editorialised title suggested a recent change, whereas this change was made over fifteen years ago. (Can’t trivially find exactly when the change was made, but it’s already present in <https://web.archive.org/web/20071001102153/https://www.w3.or...>.)


Hilariously, the modern prescription for <b> ("text to which attention is being drawn for utilitarian purposes without conveying any extra importance") probably covers more uses of boldface than <strong> ("strong importance, seriousness, or urgency for its contents").


Absolutely not! We also need em and strong, just to be confusing and redundant like Perl.


Okay, I've got a question then.

If I have a label for some form element in italic, it would not be correct to write it as <label><i>Name:</i></label>, right?

There's no surrounding text it needs to be distinguished from, and the fact that it is a label should already tell screen readers that it's a label, while styling the label italic in CSS, without the i, should distinguish it visually. Is that correct?
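
That is, something like the following, with the styling carried entirely by CSS (class name illustrative):

```html
<style>
  label.fancy { font-style: italic; }
</style>

<label for="name" class="fancy">Name:</label>
<input id="name" name="name">
```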


Fun fact! Italic typefaces were originally designed not for emphasizing text but as a way to save paper, since the slanting allows the letters to sit tighter.


Another fun fact. In linguistics, italics is used to denote the utterance rather than its meaning. It's not emphasis, but a visual signal that says "this bit is in another language or a single syllable or whatever we're discussing".


This is useless insofar as in a rich-text editor, when the user clicks the “i” button or presses Ctrl+I, they still mean italics, and there is no separate “idiomatic” button, which if present would furthermore only confuse most users. Same for bold.

Also Markdown doesn’t support the distinction.

This is not to say that browsers should prevent styling <i> elements differently, but the intended meaning is still “whatever italics means”.


HTML is really tightly coupled to layout, even though it's a ridiculous format. And semantics isn't linear, like HTML is. If you want some vague form of semantic representation (because there's no good, universal semantic representation, and the rest is just data tagging), use something else, or add a new namespace if you want to abuse HTML even further.


I've been using <i> for things like inline simple math. I set it to display using serif font and in italic using CSS (while the surrounding text is not in italic and also using sans-serif fonts). The results is good visually, it works. And I felt like it was a good use of this tag. I'm now even more convinced! Thanks for sharing :).
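
A sketch of that setup (font names are illustrative, not the commenter's actual stylesheet):

```css
body { font-family: sans-serif; font-style: normal; }
/* Inline "math" in <i>: serif and italic, set off from the body text. */
i { font-family: Georgia, "Times New Roman", serif; font-style: italic; }
```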


An alternative that’s slightly more semantic would be Unicode’s Mathematical Alphanumeric Symbols block.


Under Usage notes it says:

- Ship or vessel names in Western writing systems

Am I the only one finding this a bit overly specific?


Someone is trying too hard to make something very normal seem niche.

They’ve probably gone to a school whose curriculum includes the terms “critical” and “theory” and “colonialism” more often than not.


>The <i> HTML element represents a range of text that is set off from the normal text for some reason, such as idiomatic text

This seems backwards to me. Doesn't "idiomatic" mean natural-sounding? Setting it off from the normal text seems kind of the opposite.


No. Idiomatic means "the normal way of doing things" basically. So in that context they are trying to say "et cetera" is the normal way to say "and so on" even though it's not English. It's a big stretch though. You wouldn't use "idiomatic" to describe that text unless you really wanted to find a word starting with i.


What you're saying seems to agree with my point. If something is "the normal way of doing things", why would it be "set off from the normal text". You're saying idomatic means normal, whereas MDN says it means "set off from the normal", aka abnormal.


I’ve seen people using <i> and <b> html elements to do whatever they need them to do, more than their old italic/bold uses.

Other than the fact that they aren’t semantic, it does make some sense to use small tags if you are concerned about saving bytes.


FontAwesome's documentation in the past contained examples using the <i> element. Example:

<i class="fa fa-facebook"></i> <!-- renders the Facebook logo or something like that -->

I believe they just use <span> now.


Conversations around semantic HTML make me wonder: asking devs / product teams to use semantic tags, ARIA attributes, etc. doesn't seem to work; perhaps we should focus on making screen readers and other assistive tools smarter?

Put another way, where's the assistive technology equivalent of OXO Good Grips? I'd imagine a tool that is part ad-blocker, part auto-summarizer, part navigation helper, part personal curation assistant; that does the Right Thing (tm) 90-95% of the time, even on highly dynamic web applications (since that's already _way_ better than the proportion of sites / applications built with a11y in mind); that people in general find to be a superior user experience, regardless of whether they have disabilities or not.


Too bad memes can't be included: "<i> is profanity. --- Change my mind."


What does <i> do for Han characters?

I read the spec, but there's not much of a hint there.


You mean Chinese characters? Probably nothing.

> idiomatic text, technical terms, taxonomical designations

These were usually put in quotation marks.

For book titles, wrap them inside 《 》


> Chinese characters

In Unicode, Chinese, Japanese, and even Korean and Vietnamese characters are combined into something called "CJK Unified Ideographs", or just "Han" [1][2]. So they all get the same treatment for something such as italics. Not sure what HTML currently specifies for that.

[1] https://www.unicode.org/reports/tr38

[2] https://en.wikipedia.org/wiki/Han_unification


Pour one out for the blink tag.

https://www.w3docs.com/tools/code-editor/13719


Who says sophistry is dead?


They pretend they want tags to be about semantics and not style. And yet, and yet, they still refuse to add the "sarcasm" tag.


Technically there is a sarcasm tag in the HTML specification; it's one of the tags that is specifically handled in https://html.spec.whatwg.org/multipage/parsing.html#parsing-... ('An end tag whose tag name is "sarcasm"').

However, a problem with the sarcasm tag is that it wouldn't really help accessibility compared to, say, writing "Sarcasm:" or something like "(The preceding remark was sarcastic.)".



