> To dramatically reduce ambiguities, we can remove the doubled character delimiters for strong emphasis. Instead, use a single _ for regular emphasis, and a single * for strong emphasis.
I would love to see * gone but I must note that _ is annoyingly hard to type on a screen keyboard.
Back in the days of USENET one common choice was using a / to delimit /emphasis/ - the usual reading was that this indicated words that would normally be rendered as /italics/. You'd often see it used to indicate the titles of books and movies too, since the typographical convention was typically that these were italicized - note that both <em> and <cite> typically render as italics, for instance. I have always disliked Markdown's choice to use * as a delimiter for both italics and bold; / always implied italics to me, and * always implied bold.
Anyway. I propose that / would be a much better delimiter for emphasis than _. On a US keyboard, it can be typed without a shift key. And on a US iOS screen keyboard, it is a simple swipe on B, versus shifting to the numeric entry page and swiping on &.
This is a real problem, considering that some programmers are just sloppy about code-quoting paths and the like. And it's debatable (stylistically) whether you should even code-quote paths.
I will say that forward slashes are still more common in regular English text, even among non-programmers, than underscores. For example, listing options a/b/c.
I don’t really think of Notion as “markdown” but I suppose you’re right since we support a bunch of markdown conventions. Some things are different though, like `> ` is a toggle block, and `" ` a quote block. Unfortunately we already abuse / for a command menu, which is by far my least favorite feature. I want to make a setting to disable it but it goes against our anti-settings philosophy.
Using slashes for a short/small/abbreviated list is relatively common. For example, the top post on the hiring thread right now is for a "Full-stack/frontend/product engineer". There's another one describing a "M/W/Th" hybrid schedule.
Slashes aren't even that common in file paths, unless you mean directory paths. Even then, a slash is only necessary if you really want to unambiguously indicate a directory in the path itself.
No, but most people definitely use links in their texts, and those have the same problem. / is also regularly used for fractions, or/and in situations where you could use two words.
I really like your proposal, but in the days of USENET the /slashes/ weren't interpreted by the machine but simply by our minds, just like * for bold. Would there be any extra issues caused by italics being / rather than *?
I'm honestly with you on this and I'm in the middle of building a huge Markdown site where I have the freedom to change the syntax now if I want.
*this is bold*
/this is italics/
_this is underlined_
Beyond simple conventions like this, I'd just as soon drop into HTML as deal with some other markup that ends up being just as complex. We don't need to allow permutations and combinations such as bold-and-italics, double-weight bold, etc.; these never occur in normal prose typesetting, and if you need them, just use HTML for those rare cases.
Underlining is an emphasis hack for mechanical typewriters or in handwriting. There's no reason to use it typographically in something which has all the layout possibilities of a modern computer or printer.
Except to indicate something new and modern which needs its own visual distinction… like hyperlinks. Using underlining as the default for hyperlinks was genius.
> Underlining is an emphasis hack for mechanical typewriters or in handwriting. There's no reason to use it typographically in something which has all the layout possibilities of a modern computer or printer.
The argument presented by that link is valid for paragraphs and for printed content.
On a website, underlining single words or short phrases doesn't make them less readable; it draws attention to them.
Like with hyperlinks; the displayed form of `[See here](http://here.com) for more info` is undeniably better than simply `see here for more info`, which leaves the reader guessing which of those words are plain text and which are a hyperlink.
The problem is exacerbated on mobile where the reader cannot hover a mouse over the words to determine which words are a link and which are not.
If you're writing a full paragraph like this:
"Our previous stories of FooBarFactory Inc were well-received by our readers. Investigative Journalism has always been a core principle of PotatoNews. The images and video that our beloved readers shared on Twitter are only a single component in the fight against big corps polluting our environment."
In the above paragraph, "previous stories" is a link, "FooBarFactory" is a link, "well-received" is a link, "images and video" is a link, "Twitter" is a link and "polluting our environment" is a link.
The advice from Practical Typography would render that entire paragraph free of any indication that there's more for the user to read.
This is what Org mode does. It's still very tied to Emacs, but there's an effort to standardize the Org format. Hopefully this will help its adoption outside of Emacs, it's a nice markup (and a lot more).
> Back in the days of USENET one common choice was using a / to delimit /emphasis/ - the usual reading was that this indicated words that would normally be rendered as /italics/.
Fuck no. Same idiocy as turning -- into an em dash; it makes writing any technical post mighty annoying.
Get a better screen keyboard. On mine, _ doesn't require shift, and neither does *.
The `_` and `-` keys are dead simple home-row keys on Dvorak keyboards. I’ve never switched layouts because most are focused on prose, but programming demands a lot of snake & kebab casing. …not that `/` is too far away.
> While Markdown’s syntax has been influenced by several existing text-to-HTML filters — including Setext, atx, Textile, reStructuredText, Grutatext, and EtText — the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.
I understand where the author is coming from and respect their contributions to CommonMark.
But...
There are tons of markup languages for prose that have well-defined specs.
So, why did Markdown win?
IMO, because it does not have a well-defined spec. It is highly tolerant of formatting errors, inconsistencies, etc. If an author makes a mistake when writing Markdown, you can always look at it in plain text.
Whereas a perfectly-spec'd markup language would probably evolve toward an unreadable-to-humans mess in the committee-driven pursuit of precision.
You see this theme in so many places in tech: "less is more", the Unix philosophy of everything-is-a-file, messy HTML5 over "XHTML", ML extraction vs. explicit semantic web, etc.
> IMO, because it does not have a well-defined spec.
Same reason that JSON won.
JSON and Markdown are base standards that grew out of a market need to simplify.
JSON won because it was not overly complex and there was some flexibility. If you need more, go YAML, or use JSON as a platform for more.
Every attempt to change JSON has been, and should be, shot down. JSON really just has the basic CS types: strings, numbers, booleans, objects, and lists. From there, any data or type can be serialized or filled in. With JSON you can do types via overloads/additional keys, you can add files by URL/URI or base64, and you can cover any additional needs using the basic JSON parts. Even large numbers can just be strings, with type defs as additional keys/patterns. Financial data can just use strings, or ints with no decimal, largely because that is the safest way to store financial data and prevent float issues.
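A minimal TypeScript sketch of those patterns (all field names here are hypothetical): money as integer cents, and a number too big for a double carried as a string next to a type-hint key.

interface Invoice {
  currency: string;
  amountCents: number;    // 1099 == $10.99; integers round-trip exactly
  bigValue: string;       // too large for a double, so kept as a string
  bigValueType: "bigint"; // the "additional key" carrying the type hint
}

const invoice: Invoice = {
  currency: "USD",
  amountCents: 1099,
  bigValue: "123456789012345678901234567890",
  bigValueType: "bigint",
};

const parsed = JSON.parse(JSON.stringify(invoice)) as Invoice;
console.log(parsed.amountCents / 100); // 10.99, computed only at the edge
console.log(BigInt(parsed.bigValue));  // exact, where Number() would round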
KISS is life and sometimes things are just done, no improvements needed. Now you can take JSON and add things on top of it if you want. Same with Markdown. The base doesn't need to change... ever.
Don't SOAP my JSON. Don't HTML my Markdown. Though you can add specs (JSON Schema/OpenAPI) and formatting tools on top in a processing step. For messaging and base content they are perfect: simple, clear, concise, and with no need to change.
I think JSON and Markdown are very different, in fact.
JSON is very strict. It won't let you have a comma after the last element of a list, for instance (which is very annoying in many cases). It won't let you add comments in any way, shape or form. It won't let you use single quotes instead of double quotes. Or omit quotes around keys. Or mess with the case of null / true / false. Or use NaN values.
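Every one of those rules is easy to demonstrate; here's a quick sketch where each string is rejected by any conforming parser:

// Each of these throws a SyntaxError in JSON.parse:
const rejects = [
  '[1, 2, 3,]',         // trailing comma
  '{"a": 1 /* hi */}',  // comment
  "{'a': 1}",           // single quotes
  '{a: 1}',             // unquoted key
  '{"a": True}',        // wrong case for true
  '{"a": NaN}',         // NaN
];
for (const bad of rejects) {
  try { JSON.parse(bad); } catch { console.log("rejected:", bad); }
}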
Markdown is ill-defined, and will happily let you do whatever the hell you want.
JSON is made for programs, and is a PITA to write as a human (for the reasons mentioned above). But a pleasure to parse and (to some extent) generate automatically. It's not very good with text.
Markdown is made for humans, and I'd hate to have to parse a markdown file and do something with its content other than basic formatting. It's bad at anything but text.
JSON won because parsing it in the browser was just a call to eval(), after which you access the object using normal JS conventions/syntax (e.g. data.foo[0].bar). Whereas XML required creating a DOM parser and document fragment, and then using cumbersome DOM methods like getElementsByTagName() to get each value (or worse, XPath). It totally sucked.
Native support for JSON parse and stringify helped when it came later. The Selectors API, which also came later, made XML parsing a little easier if you didn't want to use XPath, but by then most things were JSON anyway.
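Roughly the contrast being described, as a sketch (DOMParser is a browser API, and today you'd use JSON.parse rather than eval() on untrusted input):

const data = JSON.parse('{"foo": [{"bar": 42}]}'); // one call...
console.log(data.foo[0].bar);                      // ...then plain property access

const doc = new DOMParser().parseFromString(
  "<root><foo><bar>42</bar></foo></root>",
  "application/xml",
);
console.log(doc.getElementsByTagName("bar")[0]?.textContent);
// "42" -- as a string, and after considerably more ceremony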
> Every attempt to change JSON has and should be shot down.
I really wish JSON allowed for final trailing commas in arrays/objects.
It would make for more readable diffs, simpler text templating, easier writing/parsing for us humans, etc. I'd happily trade all of TOML, YAML, XML, and every other similar format in existence for that one change.
It makes generating from templates in certain (many!) instances needlessly difficult. I say needlessly, because the rule is seemingly arbitrary. I can't see what purpose it serves.
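The templating pain is easy to picture; a sketch:

const items = ["a", "b", "c"];

// The naive loop emits '["a","b","c",]' -- invalid JSON:
let out = "[";
for (const it of items) out += JSON.stringify(it) + ",";
out += "]";

// So everyone special-cases the last element or reaches for join():
const valid = "[" + items.map((it) => JSON.stringify(it)).join(",") + "]";
console.log(JSON.parse(valid)); // [ 'a', 'b', 'c' ]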
I completely agree. My favourite software is not just functional, it also is opinionated and expresses a philosophy on how to do something. Simply adding flexibility forever in a quest to be useful for everyone ends up making it useful for no-one.
This is perhaps correct only in that loosey-goosey proposals can spread farther because they seem simple to implement (fewer MUSTs and whatnot), and by the time you notice inconsistencies between implementations, the thing has reached a sort of critical mass already, and the implementations aren't that inconsistent, so you just shrug and say whatever.
But in the case of MarkDown the original implementation was just not that great. Which has nothing to do with being easier; MacFarlane’s Djot is an easier to implement and easier to describe language.
And of course your point about “committee-driven pursuit of precision” is just a made-up hypothetical which is not worth responding to. (The only committee has been on CommonMark, which is a definition of “MarkDown” (TM) that merely tries to deal with years of drift between different MarkDown implementations. With its famously long-winded spec-by-prose-enumeration style.)
Asciidoctor has a spec, reads pretty similarly to markdown, and is infinitely better IMO. And it (well, AsciiDoc) predated markdown!
I think markdown won because it was specifically made with HTML output in mind, instead of arbitrary output (docbook, in the case of AsciiDoc, which is pretty much infinitely malleable).
The Asciidoctor flavour of AsciiDoc doesn't have a specification. There is only a working group. The parsers are a mess composed of regular expressions.
There are in effect two different versions of AsciiDoc, because Asciidoctor people have appropriated the name while making their own changes to it and marking what they dislike as deprecated.
AsciiDoc cannot express all of DocBook, for example figures with multiple images.
While I despise Markdown, there isn't all that much to be a fanboy of. Just the syntax is overall saner.
Ah, DocBook //imageobjectco with something like calspair as well. I've been wanting it badly, but there's zero movement in the Asciidoctor group to try and tackle that beast.
With all due respect, and speaking as an amateur programmer, when it comes to lightweight markup, is there a better way to write a parser besides regular expressions? I suppose it's how the semantics are abstracted.
Asciidoc does get you conditionals and transclusion in the core spec, without needing to resort to extensions. This is what brought me over. That and the XML interoperability.
The Eclipse WG isn't published yet, but, in my opinion, it's a more stable surface to build on than the "many worlds" of Markdown.
Every time someone shows me a cool markdown trick, it requires me to pull something down from github and `npm-install` (or equivalent). But, well, that's kind of the point, isn't it? Markdown's ease of implementation allows a degree of glorious hackery that's just not possible otherwise. While Asciidoctor's great albatross - and its great asset - is Ruby... which inevitably involves Opal at some point.
You are completely right. The underlying theme here is that the requirements matter.
The requirement for Markdown is to be simple and easy. It's intended for use by people who are going to ignore whatever specs and documentation there are. They'll write a little comment, a bug ticket, or a readme and they might need things like links, bold, italic, etc. And the job is to turn that into some legible HTML. So most of its features are simple and easy to remember. Just add a blank line for a new paragraph, prefix your bullets with a -, and so on.
Markdown is undeniably simple and easy to learn. Which is why it got so popular. It has edge cases but they don't really matter. It has obscure features (e.g. tables) most people don't use, so those don't matter either. And there's a wide range of things it can't do that also don't matter. The job never was being a drop in replacement for more complex tools. It was removing the need to use those for the simple use cases and be simply good enough.
The alternatives each chase requirements that are important to their creators but not to most casual users, or indeed the people that integrate markup tools. And of course the more these alternatives differ from Markdown, the harder of a sell it becomes. And the more there are, the less likely it is for any of them to become more popular than markdown. At this point, markdown is a common default in things like issue trackers, readme's on Github/Gitlab, etc. Any tool integrating some kind of markup language support in their content management is more likely to be using markdown than anything else at this point.
The reason is simply that using anything else breaks the principle of least surprise for the user. Markdown is the largest common denominator. It's good enough and easy enough to deal with. So most new things favor it over anything else. It's a self-reinforcing thing.
This is how populist politics works. The thing that appeals to the most people isn't necessarily the thing we should be doing.
The internet and web appealed to a small percentage of people in the early 90s, and it was glorious. You had to put in effort to get anything out, which meant most people didn't bother, which meant it was a nice place. The music industry similarly had a high level of entry. Both are filled with crap now.
Elitist old man shouting at clouds? Maybe. Doesn't mean I'm wrong though.
These things don't win on engineering merits. Markdown wasn't better than others. It was like a bunch of others. It's just natural that one form of communication becomes a monopoly because people want to be able to talk to as many people as possible.
You only need to be good enough to enter this kind of competition... and win. The reasons you might win can be many arbitrary things, like someone deciding to adopt a practice in a large organization, or dedicating efforts to writing parsers in many languages etc.
> Whereas a perfectly-spec'd markup language would probably evolve toward an unreadable-to-humans mess in the committee-driven pursuit of precision.
Maybe, and I mean that sincerely...but are you just saying this must happen or can you actually point to where MacFarlane's proposals would make a significantly less pleasant language?
I couldn’t figure out what was meant by a sublist. Like any hierarchy? Or just list-in-paragraph-in-list, not list-in-list? That one could use some HTML disambiguation in the article.
> Whereas a perfectly-spec'd markup language would probably evolve toward an unreadable-to-humans mess in the committee-driven pursuit of precision.
This proposal shows us a clear step in that direction, going from something simple and easy for humans to understand, with complex implementation, to emphasize part of a word:
fan*tas*tic
To proposing a simple implementation that's... weird for humans:

fan{*tas*}tic
It seems like a minor concession since most uses of intra-word emphasis are more cutesy than communicative[1] (it is of course sometimes very useful when there is a subtle syllable emphasis, or a subtle typo that you want to point out).
[1] Maybe I’m being a hypocrite here? I definitely am in favor of a lot of “cutesy” ways to communicate (things that are more stylistic than necessary). But not intra-word emphasis, really.
I had a look at djot, which addresses all of the author's grievances, and I must say... I don't like it.
Sure, it probably is easier to parse, and maybe there are a few edge cases that it does better, but the goal of markdown is to have text that is:
A) human readable and looks good without parsing it
B) can be parsed and presented using different themes
In djot they sacrifice a lot of point A (e.g. we now have to insert empty lines in a nested list?!) for questionable gains at point B. Guess what I, as a user, care more about?
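If I'm reading djot's rule right (treat the exact syntax here as my assumption), where Markdown accepts a tight nested list:

- fruit
  - apple

djot wants a blank line before the sublist:

- fruit

  - apple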
Markdown accepting a wide range of inputs is not a mistake, it is a feature. If that makes parsing more complex that is an acceptable side effect not a mistake.
I agree that an empty line in front of a nested list is ugly. I very often make hierarchical descriptions of things like events or things to do or recipes, and that kind of thing would be annoying to have to deal with. I like my lists tight.
I would have tried harder to find some other way to make the grammar simple.
I haven’t seen anything else, though, that makes it less “human readable”.
I'd argue that it won the adoption it did in spite of parsing ambiguities and the lack of a spec. Not because of it. There are plenty of examples of well specified things that have gained mass adoption, so I think you are confusing cause and correlation here.
IDK, JSON? HTML and XML are markup languages also. There are obvious issues with markdown that were fixed/resolved in various markdown variants, and missing features as well; I don't think anyone could argue those helped adoption. Case in point: the most commonly used markdown flavor is GFM, because we all adopted GitHub and that's what it supports.
It is true that Markdown won by putting simplicity for the users in front of simplicity for the parsers. But since it became ubiquitous, there's a lot of value in codifying the standard to make sure that it doesn't diverge into different dialects.
Regarding the author's specific suggestions: he explicitly writes that he doesn't propose to implement them in the actual MD "standard", since backwards compatibility is more important. That said, there is value in making the markup less ambiguous while preserving the "writability", even if it's just a thought experiment.
It's not really a problem of being "perfectly specced" or not; it's just a matter of inertia.
If markdown had just used *bold* and _italics_ from the start, or needed a tag for HTML instead of passing it through as-is... it would be entirely fine and just as popular now. Or any other generally-agreed-upon "good" fix.
But inertia makes things like that near-impossible to change now. Only additions can sorta work, and even those are hard, as a critical mass of dialects needs to adopt them for it to work.
Nothing messy about HTML, whatever version. It just uses SGML features from a more civilized age, such as inferring tags not explicitly present when unambiguously required by the content model grammar.
Btw a large fragment of markdown can be implemented using SGML's SHORTREF feature, as can customizations such as GitHub-flavored markdown. John Gruber's markdown language is specified as a canonical rewriting into HTML with the option of inline HTML as fallback, making SGML SHORTREF a particularly fitting implementation model since it works just the same. It's quite striking how a technique for custom syntax invented in the 70's (however imperfectly specified, though not in a worse-is-better way lol) could foresee Wiki syntaxes and also determine the most commonly used markup language (HTML) fifty years later.
Agree with the gist of your post, though. As fantastic as MacFarlane's pandoc is, the idea to re-assign redundancies in markdown (e.g. interpret the minute presence/omission of space chars to mean something) was bound to fail, and that was very clear to me skimming only a few paragraphs of the CommonMark manifesto. When it was first discussed here back then, someone commented that this was bound to happen when a logician (MacFarlane) approached Wiki syntax.
If the rules are too complicated, then they are a challenge for all parties, both users and implementers. I think it is useful to be able to imagine at least on some higher level what a parser would do to the stuff I write, so everyone benefits from the ease of understanding that comes with simpler rules. The question is just how far we can simplify without reducing usability.
The rest of the article frequently takes the side of the users, and mentions how confusing certain existing rules are to them. I know I frequently don't know what to expect from Markdown in certain corner cases, and felt vindicated by the author calling them out here. Some of their ideas for simplification would surprisingly even let us do things that are currently not possible.
> If the rules are too complicated, then they are a challenge for all parties, both users and implementers
Not necessarily. Generics and/or C++ templates are a pain to parse because they're context-sensitive. But while reading/writing code it's typically obvious whether I'm writing a comparison or a generic/template.
Foo<Bar> foo;
// VS
Foo < Bar;
Likewise, in C++ you can end up with:
unordered_set<tuple<int, float>> mySet;
// >> is ambiguous here without a symbol table or context around the statement
Foo >> 5;
I think both of these are fairly obvious as a user of the language, but boy am I glad I don't have to parse that!
> If the rules are too complicated, then they are a challenge for all parties, both users and implementers.
You are still confounding rules for writing with rules for parsing. It's absolutely possible and easy to make rules that make writing easier but parsing harder.
For example, if you made a rule that formatting markers like ** and _ are order-insensitive (so **_word**_ formats the same as **_word_**), it would be much easier for the user, who no longer needs to remember the order in which the markers were opened, but harder to code (I assume).
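To make that concrete, here's a toy sketch (not any real parser's algorithm): a stack-based inline parser insists on proper nesting, which is exactly what order-insensitivity breaks.

type Mark = "**" | "_";

// True when every marker closes the most recently opened one.
function strictlyNested(tokens: Mark[]): boolean {
  const stack: Mark[] = [];
  for (const t of tokens) {
    if (stack[stack.length - 1] === t) stack.pop(); // closes the innermost open mark
    else stack.push(t);                             // opens a new one
  }
  return stack.length === 0;
}

console.log(strictlyNested(["**", "_", "_", "**"])); // **_word_** -> true
console.log(strictlyNested(["**", "_", "**", "_"])); // **_word**_ -> false
// Supporting the second form means tracking all open marks independently and
// deciding how their HTML tags should interleave -- harder to code.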
The problem is when it's too hard for the computers, then it negatively impacts the user experience.
There are cases that are 100% ambiguous in the spec, which means there can be no _right_ answer. Different users will have different (and both reasonable) expectations about what the same input will do. So, in these cases, "too hard" for the computer leads directly to a negative user experience. The language becomes more unpredictable.
I agree that we shouldn't _ever_ lose focus on the end user experience. But sometimes, you have to make the spec less ambiguous to improve the end-user experience.
I am flamfoozled by paragraph-in-list and list-indentation regularly in status quo markdown. Maybe it’s because the syntax is a little weird for the edge cases today? Or maybe I’m just a goof who needs to go read GitHub’s parser source.
There is something to be said for, not editing your old posts, but applying a preface that references later iterations on the idea. I wish I were better about that myself.
"In this article from 2017, I talk about dinglehoppers, which have since been improved by research from these three papers [1][2][3]. Here is where I revisit this topic in 2021."
After the controversy over naming CommonMark (where @jgm et al. caught flak over originally trying to name it Standard Markdown), I'm not surprised that he picked something totally unrelated. And really, it's not Markdown at all at this point, so being more clearly differentiated from Markdown / CommonMark seems like a plus to me.
Personally, I would like to see a markdown spec that eliminates parsing ambiguity by restricting the "edge-case" features that HTML is really much better at describing in a standard and structured way.
I think we could pick one way to handle emphasis, lists, and code blocks that covers a specific and predictable 80%.
Anything that becomes hard to describe without including additional notation to the grammar is probably best suited to be left as HTML, as was the intention behind markdown to begin with.
We are implementing markdown support in Zoho Writer (https://zoho.com/writer) and I can confirm how difficult it is to handle bold and italics.
It definitely is a weird choice to use *s for both bold and italics. Parsers could be implemented much more easily if the two had different delimiters, as mentioned in the post.
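To illustrate with a sketch (the delimiters here are hypothetical): if * only ever meant bold and / only ever meant italics, a pair of regexes would nearly suffice, whereas the shared * forces something like CommonMark's delimiter-stack algorithm just to decide what ***x*** means.

const strong = /\*([^*]+)\*/g; // hypothetical: * is always bold
const em = /\/([^/]+)\//g;     // hypothetical: / is always italics
const render = (s: string) =>
  s.replace(strong, "<strong>$1</strong>").replace(em, "<em>$1</em>");

console.log(render("*bold* and /italics/"));
// -> <strong>bold</strong> and <em>italics</em>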
The article reads very much like a list of problems important to an implementation author rather than a user. Except maybe the nested list thing, which does sound somewhat annoying. But also rare.
My thought is to represent complex things, use better prose or diagrams. Though José and some of his friends are warming me up to interactive tools. Livebook has some stuff I need to look at more. Currently mostly targeted at developers, of course.
I also made my own editor a long time ago, and used it for personal use and on the writing site roleplay.cloud. It had a lisp-like syntax with custom expansions. It also had some of these ideas, like reference links, and I could run code snippets with [python ...]. Normal HTML tags would also work, like [br] instead of <br>.
Asciidoc. Particularly if you 1) need XML interoperability, 2) complex print outputs, 3) complex tables, 4) transclusion (partial and otherwise) in core spec, 5) conditionals in core spec.
The AsciidocFX program is a good "starter's editor" for those unfamiliar with Asciidoc and lightweight markup in general - it includes a "boxed" DocBook-XSL pipeline as an alternative to the Ruby-based asciidoctor-pdf. For an actual production editor, Visual Studio Code with the Asciidoctor extension is very hard to beat. Github integration on top of VSC gives you some collaborative visibility, too.
On the PDF front, another interesting Asciidoc project is asciidoctor-web-pdf, which uses Paged.js and CSS to produce extremely complex PDFs using web technologies (Chromium + Puppeteer, I think). That, asciidoctor-pdf (Ruby/Prawn), and DocBook-XSL are the main PDF pipelines.
Making breaking changes to markdown is about as practical as doing it to HTML -- already existing content and mindshare give the current form massive inertia.
This is especially the case when it works for the vast majority of use cases (or can be hammered into them); ambiguities are very visible to implementers and detail-oriented folks, but most people never see these issues, or don't care about them.
And, while it sucks that it's complicated to implement, that burden is on relatively few people. See also: the HTML Priority of Constituencies.
Oh yes. I made the fun decision to write a markdown parser/contenteditable component for https://sqwok.im and ended up spending probably a month on it, largely writing endless unit tests and covering odd cases like that.
It's far from perfect and probably will still break on certain ambiguous inputs. I like his ideas for clarifying the language for the most general audience.
I have been trying to research my way out of having to write a markdown parser that disallows inline HTML, because I don't want to be a markdown parser author, but I categorically don't want people being able to inject things into the wiki(s) I need to create. In some languages it's a flag. In others, there's no flag.
This is like not using bind variables in your SQL library. I just don't understand it. I'm looking at you, Crockford.
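One common way out of the injection problem described above, assuming a JS stack in the browser (marked and DOMPurify are real packages, but treat this wiring as a sketch rather than the only option): render the markdown first, then sanitize the resulting HTML.

import { marked } from "marked";
import DOMPurify from "dompurify";

function renderWikiPage(untrustedMarkdown: string): string {
  const html = marked.parse(untrustedMarkdown) as string; // sync by default
  // Whatever raw HTML the author smuggled in is neutralized here:
  return DOMPurify.sanitize(html);
}

console.log(renderWikiPage('hi <script>alert(1)</script> *world*'));
// -> <p>hi  <em>world</em></p>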
This is why I like the way Racket does this with the Pollen language. You can use Pollen mark up and create your own tags and then decide how they are converted. It all becomes a list of X-expressions that can be manipulated in any form you like. But the tree nature of an X-expression means you don’t get issues like *strong* word*.
For example I can write ◊bold{strong* word} and it becomes (bold “strong* word”). It’s very clear how this should be rendered.
This makes parsing and rendering easier, but writing harder. Given the widespread adoption of Markdown I suspect this project to go absolutely nowhere, since it focuses on precisely the opposite thing that makes Markdown popular.
First Term
: This is the definition of the first term.
Second Term
: This is one definition of the second term.
: This is another definition of the second term.