Thoughts on Markdown (smashingmagazine.com)
147 points by ingve on Feb 21, 2022 | 159 comments



>That’s almost 20 years ago — yikes! What started as a more writer- and reader-friendly syntax for HTML

The author appears to have some recency bias that misses the point of Markdown. It didn't start when it was named and mapped to HTML. It was in use on usenet and in emails before HTML even existed. And most of its conventions came from typewriters before that.

You could print out a Markdown document today and jump in a time machine to 1970 and hand it to someone and they'd understand it. Chances are they still will in 2070.

In the meantime we've had countless proprietary binary word processor formats that couldn't be read before their program was released and most of which can't be read today. We've had bad attempts to replicate print publishing designs and unusable Flash websites. We've had a variety of markup formats including HTML, XML, etc. which are relatively a pain to read or write. We've had lousy WYSIWYG editors that never work correctly. And we've stored the documents split into chunks in databases. And now the suggestion is to write text as JSON data structures?

All that stuff has its place, but none of it has the timelessness and clarity of plain text and markdown when it comes to documents.

It's not particularly good for structured data. Nor multimedia (though you can reference it with links). But it's still quite good at what it is good at. Has been since before HTML and will be long into the future.


The author is also a Sanity employee and, while I like the ideas behind the Sanity CMS, if you ever create a schema with an RTF field, you'll see the opposite extreme when you read the text back via GraphQL. If Markdown was created to make the source readable, Sanity exists to make content unreadable (via expansive nested JSON for even simple tags like <b>). It's pretty easy for a paragraph of text to take 200 lines of Sanity JSON.

Like Sanity, this author has expanded three paragraphs of content into 600 lines of text. I'm an avid reader but that doesn't mean I want to reread the same reworded premise over and over.


> Like Sanity, this author has expanded three paragraphs of content into 600 lines of text.

*snort* this is mischievous but funny.


I share with the author of the article the need for a standard format for presentation-independent structured multimedia content, and that html+js+css is now too complex for this purpose.

But I think that such a format should keep rich text separate from structure, in the same way that CSS keeps style separate from content. Content structure in a web environment is inherently complex, so there should be a standardized way to link markdown-style simple "pages" that have linear or shallow tree structures, to build the superstructure of a complete web site.

In my opinion that lack of a specific format for structure is what is limiting the growth of common tools for rich content. Wikis are the only example I know of an attempt to create such a format, but it works by having basically no structure, just uniquely named pages and hyperlinks between them.


I was astonished at the length of that article as I kept reading and thinking the same, ‘isn’t this what he just said before???’ I’m happy to read long-form but I feel that piece would have benefited greatly from substantial editing


Sorry about that. You won't believe how long the first verison was ;)


It could also use some editing and running through spell check, like that comment.


Exactly, Markdown is a great example of the "pave the desire path" philosophy.

https://en.wikipedia.org/wiki/Desire_path


> It didn't start when it was named and mapped to HTML. It was in use on usenet and in emails before HTML even existed. And most of its conventions came from typewriters before that.

It used established syntax, but it didn't start there. There are also other markup-languages, which use other established syntax from the old times. A language is not just some random parts.

> You could print out a Markdown document today and jump in a time machine to 1970 and hand it to someone and they'd understand it.

Because it's human-friendly by design, not because its syntax already existed back then. There are limits to what people would understand in 1970 from the syntax.


> There are also other markup-languages, which use other established syntax from the old times.

Would you care to share some examples? I'm genuinely interested, while nothing obvious that would fit this description comes to my mind.


AsciiDoc, reStructuredText, Textile, org-mode: they all have such syntax to some degree. Or just read old Usenet archives, FAQs and such. It's natural to do it if your aim is to be lightweight and readable. You just happen to use the elements you have at hand. And people did that: they wrote all kinds of texts and documents in ASCII with syntax made up on the spot, for Usenet, e-mail, FTP servers, etc. It's hard to say that something started here or there if the path to it is so short.


Ah, I thought we're talking typewriter-old times. Yes, sure, I do remember the proliferation of weird wiki syntaxes, and I also remember seeing markdown as a breath of fresh air in its readability and intuitiveness. I do believe the same as the poster before me, that markdown is the closest among them to what could be seen either in typewriting, or on usenet. Asciidoc and rst have some of it, but break the illusion by introducing too much weirdness (see esp. the syntax for links and lists in both of them). Until proven otherwise, I can't believe those happening on usenet and not being ridiculed.


I would argue that rst is much closer to how things were done with typewriters. One example: I can not think of one document written on a typewriter that indicated a heading with #, but just a brief search for typewriter document images will get you ones with "--" underlines as headings. For me that is actually still the biggest difference between rst and Markdown, and I still think underlines look much more like headings, but I also find # much easier to write


Remember that Markdown supports both ATX (# header, ## header, ### header) and Setext (header\n======, header\n------) headers. I prefer ATX headers for writing since you can use all the supported hX levels, but if all you need is h1 and h2, then Setext headers are usable.
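To make the two heading styles concrete, here is a minimal Python sketch that tells them apart. It follows the CommonMark naming, where `#`-prefixed headings are ATX and underlined headings are Setext. The function name `find_headings` is made up for illustration, and the sketch deliberately ignores code fences, block quotes and thematic breaks, so it is not a real Markdown parser.

```python
import re

# ATX: one to six '#' characters, whitespace, then the title (optional closing '#'s).
ATX = re.compile(r"^ {0,3}(#{1,6})[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$")
# Setext underline: a line consisting only of '=' (h1) or only of '-' (h2).
SETEXT_UNDERLINE = re.compile(r"^ {0,3}(=+|-+)[ \t]*$")


def find_headings(text):
    """Return (level, title, style) tuples for the headings found in text."""
    lines = text.splitlines()
    headings = []
    for i, line in enumerate(lines):
        m = ATX.match(line)
        if m:
            headings.append((len(m.group(1)), m.group(2).strip(), "atx"))
            continue
        # Setext: a non-blank text line directly followed by an underline.
        is_text = bool(line.strip()) and not SETEXT_UNDERLINE.match(line)
        if is_text and i + 1 < len(lines):
            u = SETEXT_UNDERLINE.match(lines[i + 1])
            if u:
                level = 1 if u.group(1)[0] == "=" else 2
                headings.append((level, line.strip(), "setext"))
    return headings


sample = "# One\n\nTwo\n===\n\nThree\n---\n\n### Four\n"
for heading in find_headings(sample):
    print(heading)
```

Only the ATX form can express levels 3 through 6, which is the practical difference discussed above.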


More generally, markdown is but one (if the most successful) of the “lightweight markup languages” (https://en.m.wikipedia.org/wiki/Lightweight_markup_language).

Most of them have fallen into disuse following the ascendency of markdown, but anyone who was on the internet in the mid-aughts will remember textile and creole, they were major contenders in the CMS and wiki space respectively (IIRC).


DocBook…


Readable is not a proposition, it's a spectrum.

The beauty of JSON is that you can convert it to a more readable state. For example, an unjustify (pretty-printing) would insert the correct line breaks, already improving readability (I still find it unreadable, but OK). Converting it to YAML would, in my opinion, increase the readability even further, but YMMV.

With the binary word processors, we had vendor lock-in. Never again! If we have open standards, we can convert to/from another document format, including the Markdown language of your choice.
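A rough Python sketch of both steps, pretty-printing and converting to Markdown, might look like the following. The payload and its field names (`type`, `children`, `text`, `marks`) are invented for illustration and are not the schema of any particular CMS; `to_markdown` is likewise a hypothetical helper that only handles bold text.

```python
import json

# Hypothetical minified rich-text payload; the field names are made up.
minified = (
    '{"type":"paragraph","children":'
    '[{"text":"Hello, "},{"text":"world","marks":["bold"]}]}'
)

doc = json.loads(minified)

# Pretty-printing: same data, one key or element per line.
pretty = json.dumps(doc, indent=2, ensure_ascii=False)
print(pretty)


def to_markdown(node):
    """Render the hypothetical paragraph structure as a Markdown string."""
    parts = []
    for child in node["children"]:
        text = child["text"]
        if "bold" in child.get("marks", []):
            text = f"**{text}**"
        parts.append(text)
    return "".join(parts)


print(to_markdown(doc))
```

The pretty-printed form spans over a dozen lines for a two-word sentence, while the Markdown rendering fits on one, which is roughly the trade-off being argued about in this thread.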


Readable to me is that you don't need to convert it to another format to be able to read or make edits. A textbox input on a webpage is sufficient for me to read and write markdown.

And when you read my comment, you don't even have to know that it's markdown to be able to read it.

The moment you add a structure to it, you need to worry about having compatible parsers or it's going to forget to pull out some of your paragraphs and the like, and if there's a missing curly brace, your parser will throw an error. You'll still be looking at the obfuscated source data.

HTML is already the open standard, but people like Markdown instead.


Well, I have a somewhat fresh take on it, as complex algebra (and therefore mathematics and programming) gives me migraine-like symptoms. Sadly, if I might add. So I am definitely in the game for the simplest markup language possible (JSON isn't it, I know that much; it conveniently picked up hype because people who work in web design already knew JavaScript, which was and is popular among web designers, or frontend devs as they're called nowadays).

In the end, it's all structured. Even a letter is structured, but just with language and courtesy rules (grammar, for example). There are different levels of quality and complexity in that field, too. There's only so much you can do with ASCII characters to make text more structured. You quickly descend into the world of adding programming syntax to it (one asterisk for a bullet point or two asterisks for bold are already that). You can keep it very simple, but you still gotta know the markup rules, as simple as they are.

In gaming, there's an adage: easy to learn, hard to master. I believe a good language holds true to that. Mathematics certainly does (with exceptions).

With regard to JSON though, when you unjustify it and use syntax highlighting, is it really that bad? I find the only annoying thing is that when you add an entry, you have to ensure the comma is correctly there (or not), but syntax highlighting and habit help there.


The author here.

Yeah, I'm well aware that Markdown didn't happen in a vacuum and was inspired by plain text formatting conventions. But there is only so much you can cover in an already too long blog post. It wouldn't have changed my argument much either. It wouldn't be historically accurate to say that Markdown existed before Gruber/Swartz, just as it wouldn't be accurate to say that HTML existed before Tim Berners-Lee, even though SGML and such were a thing.

I'm not arguing for a more proprietary binary word processor format either. Or that we should type JSON. It's actually possible to have decent WYSIWYG editors that do their job. And they won't get better if we don't put our minds to them.

The point you're missing in this reply is the legitimately user-unfriendly experience of Markdown and the challenges it poses for developers who are trying to solve things for their teams and customers today.


I'll admit that a lot of users don't like it. And maybe someday there will be a WYSIWYG editor that lives up to the name. But after about 30 years of dealing with ones that don't, I might be a little jaded. Plain text conventions like Markdown may not be fancy, but they do at least work.


Huh? Markdown was created in 2004: https://en.m.wikipedia.org/wiki/Markdown

The tools you invoke, which predate Markdown considerably, and the mention of the 1970s don’t make sense in that light.


Anecdotal, but I was using many of the same formatting conventions on early IM and message board platforms in the late 90s/early 00s, even when they didn't actually do any formatting in the program. That's long before the introduction of Markdown. Markdown was an attempt to take all of those community-accepted formatting conventions and standardize it into a single format. If I still had access to any of my messages from those plaintext messaging days and fed them through a Markdown renderer, they would look remarkably similar to what I had meant to say at the time.

Anecdotal to the second degree: my mom saw me typing in the format at the time and remarked that she had learned to use similar formatting when she learned the typewriter in the 70s (coincidentally, she taught me to type on that very same typewriter).

Is it identical to what it was in the 70s? I don't know for sure, but I doubt it. Is it similar enough that someone versed in Markdown can read the 70s typewriter formats and vice versa? Most likely.


Those ways of styling were firmly enough established that a lot of people thought Markdown was violating them by having asterisks render as italic instead of bold. Italic was represented with slashes!


Emacs' Org mode gets this right.


You write:

> Huh? Markdown was created in 2004 …

Did you read OP?

> > It didn't start when it was named and mapped to HTML.

Markdown is a formalisation of conventions which were in common use in email & news long before HTML even existed.


Yes, but the point of Markdown is mapping those conventions to HTML. The mapping was the new thing that made it Markdown.


I think the point may have been more that conventions like asterisks or dashes for bulleted lists, angle brackets for quoting, etc. were already heavily used conventions predating markdown, the specification.


> We've had a variety of markup formats including HTML, XML, etc. which are relatively a pain to read or write.

Right, but Markdown relies on HTML (or whatever other markup it is pre-processing) for anything difficult.

I like Markdown[0] but surely our job as technologists is to find better solutions than a system that requires you to escape asterisks in certain situations, or rely on a preview pane to work out what it will do.

[0] but YAML front matter should burn in the fire


I would argue that it may be a case of "less is more". Yes Markdown is limited, but the limits make the language trivial to learn, modify and understand. It also means that the format can be trivially displayed almost anywhere and converted to all sorts of other formats.

There will always be a tradeoff between simplicity and the feature set, I find that for the vast majority of my use cases Markdown is simply good enough.

When I read markdown docs on github I usually get a pleasant sense of familiarity, everybody tends to gravitate towards the same basic layout because that's what Markdown does. That's a feature IMO, it means that you can focus on the content, not waste time parsing the quirky layout.

Monthly reminder that Gopher should've won btw.


Check out Gemini, its format is even more limited than Markdown, and it's TLS-native unlike Gopher.

Gopher might have won if students had ~/public_gopher directories, but it was not to be...


Gopher would not and could not have won, for two very good reasons:

(Would not) -- it was more complex to implement, being stateful (and it lacked the markup flexibility of HTML, which was kind of a big draw)

(Could not) -- the University of Minnesota announced their intention to collect licensing fees on their Gopher server (and client, I think), which was invented there.

This was a key moment in the web's development, because as a response CERN disclaimed ownership of web technologies -- under some pressure from IBM, as I recall, and IBM threw their research weight behind HTML/HTTP[0]

And that was that, basically. All happened in a few weeks in 1993.

(I can't find you a link to a discussion of IBM's action, but I can describe in detail the flashback to a freezing-cold, excessively-air-conditioned Sun SLC workstation lab at the University of Reading where I first read about it)

People crushing on Gopher these days do make me smile. Sure, it's like vinyl records.

[0] though this is hazy, I think they also made a promise not to assert relevant patents they held


Thanks a lot for expanding. I had no idea IBM was present at the creation, like some spirit of evil ;)

Netscape Navigator handling images was also a Big Deal(tm). Of course, there could have been a graphical (X) gopher client that showed images, but just like the people who like gopher and gemini now dislike images, I suspect it was the same then (think of the bandwidth!)

It's hard to convey how intensely cool it felt to be part of the early web, and images, differently sized and colored fonts were a big part of it.


> Thanks a lot for expanding. I had no idea IBM was present at the creation, like some spirit of evil ;)

It's not so much at the creation -- it's several years in (four years in for HTML/HTTP). Though it is about when I became aware of it.

What I recall of the time is that the web was just becoming a point of research interest, and big firms had product R+D interest.

Like I say, I can't find anything to prove my recollection now -- so it could well be wholly unreliable! -- but my specific recollection from damn near 29 years ago is that IBM's response (and likely the response of others) to the Gopher server licence situation was to seek clarification from CERN over their intentions with the web. IBM also had patents (concerning hypermedia) that were relevant but were never asserted.

It would always have been Tim Berners-Lee's intention to not assert patents but CERN obviously was his employer.

> It's hard to convey how intensely cool it felt

Were you in that same workstation lab in Reading, then? Seriously, you had to wear a coat and scarf if you were in one corner...


No, it was a lab in Stockholm filled with Sparcstations…


cool = hip


Why would we think there is a better solution? There's only so many symbols on the keyboard. There's only so many symbols in ASCII. That's how you end up having to escape things.

An asterisk literal usually belongs inside single or triple backticks IMO, and you won't have to escape it there.
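For anyone who wants to automate the escaping rather than explain it to users, here is a minimal Python sketch that backslash-escapes asterisks everywhere except inside backtick code spans. The name `escape_asterisks` is made up, and the code-span pattern is a simplification of the CommonMark rules (it does not know about fenced or indented code blocks), so treat it as an illustration rather than a robust tool.

```python
import re

# A run of N backticks, some content, then a matching run of N backticks.
CODE_SPAN = re.compile(r"(?<!`)(`+)(?!`)(.+?)(?<!`)\1(?!`)", re.DOTALL)
# An asterisk that is not already preceded by a backslash.
BARE_ASTERISK = re.compile(r"(?<!\\)\*")


def escape_asterisks(text):
    """Escape literal asterisks outside of backtick code spans."""
    out = []
    pos = 0
    for span in CODE_SPAN.finditer(text):
        # Escape the prose before the code span, keep the span untouched.
        out.append(BARE_ASTERISK.sub(r"\\*", text[pos:span.start()]))
        out.append(span.group(0))
        pos = span.end()
    out.append(BARE_ASTERISK.sub(r"\\*", text[pos:]))
    return "".join(out)


print(escape_asterisks("2 * 3 is written `2 * 3` in code"))
```

Inside the backticks the asterisk is left alone, which matches the point above: code spans need no escaping.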


Right, but then you have to explain to end-users why they need to escape an asterisk, or why they have to have their content on screen twice -- once in source and once in frustratingly flickering preview.

Personally I think it is our job as web developers not to encourage this for end users.

(Unless those end users are writing a book, in which case, definitely. Markdown as a source for Pandoc is fabulously better than a wordprocessor)


Users can avoid having to escape the asterisk by using Microsoft Word.

It seems to me that you expect some perfect solution to exist, and it is this solution that technologists should make it their mission to find.

I think there is no perfect solution and we should navigate the trade-offs holistically, rather than focusing on a single imperfection like suboptimal asterisks.


> It seems to me that you expect some perfect solution to exist

First, I don't mean in the Microsoft Word scenario, really (where I think that for very many applications, especially most books and most documentation, it's a very good tool).

I mean on the web. If you're putting markdown in front of end users on the web, or even content contributors on the web, you're failing.

Second, you're putting words into my mouth when you talk about a perfect solution because I said no such thing. I clearly said better solution. The absence of a perfect solution does not make Markdown equally as good as other possibilities.

(For example, I believe that if you had to deploy a text-only editor to web-based end-users, Asciidoc is a more appropriate solution, even if I also accept it has lost the war)


I write academic articles and like to use markdown for that, but you soon run into limitations that require non-standard tools. Precisely formatted tables… citations… line numbering… page numbering… inline maths… image placement… none are part of basic markdown. Outputting nice text is a big complex job, as the existence of TeX (ugh) shows.


True, different tools for different jobs.

That brings back repressed memories of the painful experience when a client came in with a Quark file (desktop publishing) that they wanted converted into a website. Getting the data out was difficult enough, actually formatting it and trying to replicate the layouts and styles and all was a nightmare. (This was back in the IE 5/6 days, btw - none of the modern CSS niceties.)

Just because you can do something (in desktop publishing software or otherwise) doesn't mean you should, and certainly doesn't mean a document optimized for print is good for the web.


what's worse: (a) a preview pane, or (b) a bloated editor (msWord) or mysterious web format (google docs)

also, there's many WYSIWYG editors that do an awesome job (Typora, Zettlr, etc)


I really like Typora as it goes, I've used it myself, and I've begun to recommend it to people who are writing long-form documentation or books (because I like pandoc a lot).

I have not used Zettlr but I will look at it; thanks for the recommendation.

But I still think Markdown has little place in content management systems for end-users; it's obtuse.


fair point; outside of its main use case (writing documents), we're better off using more sophisticated tools


I've said it so many times my throat is sore but here goes again...

As with literally everything, it depends. Markdown is lovely for tech documentation, notes and journaling. It's simple and beautiful and mostly independent of layout.

Want to author something rich and expressive which has a modern web audience in front of it (either end user or editor) and it's probably not the right tool.

I work with large numbers of - let me call them "creative editors and authors". They want to embed video, pullquotes, tables, interactives. They want to float images, include audio and much more.

Markdown is not for them. Something block based and visual, namely Gutenberg or similar - is.

As ever (voice grows increasingly hoarse...) it's dependent on needs.


I see this drive often: find the one true universal format, that can both provide documentation, while also providing an interactive 3D experience for a racing game, equipped with an internal web browser that also lets you listen to the radio

I think we just have to be okay with having to learn multiple formats in our lifetime; I don't see a way around it


> Markdown is lovely for tech documentation

That’s one of the weakest points of Markdown, because it has no support for any sort of cross-referencing so they have to be maintained by hand or through bespoke extension (often conflicting).


MyST and Sphinx are getting attention these days. As an AsciiDoc lurker who is fed up with rst, I'm very hopeful.


The Sphinx & rst combo is great in theory. In practice, it has a steep learning curve and the Python tooling is not the fastest option. I tried it once in my previous job, and people did not want to learn it or spend time on doc-related stuff.

That makes me appreciate Markdown a lot, as it is easier to pick up and has fantastic tooling. And there is at least one md-to-HTML project in your language of choice.


Yeah rST is theoretically nice but being both permissive and extremely whitespace-sensitive makes it very fiddly in practice, with inscrutable warnings and many mis-compiled documents with no warnings or information.

Some of the markup choices were really less than ideal in retrospect, especially indentation for quoting and definition lists (though it’s nice to have definition lists), even more so given the aforementioned whitespace sensitivity.

Really hard to beat as a thing to build around, but that also greatly limits the ability to process or reimplement it.


Hence MyST.


Yes, and:

For personal notes, I find Gemtext even better than Markdown, because it's even simpler.

=> https://web.archive.org/web/20220124235538/https://gemini.ci... Gemtext

I spend less time trying to remember syntax or decide on which syntax option to use, because there is so little syntax. It feels like writing plain text with just a handful of conventions.


I have never understood why Markdown has gotten so much attention over the last decade while ReStructured Text - which is a similar format but actually standardized and more developed than Markdown - gets so little.

Tools like DocUtils and Sphinx allow one to do quite a lot with ReStructured Text, and I believe Pandoc can convert it to Markdown without any issues.

If you aren't familiar with it, a good hard look at ReStructured Text may be worth your time.


I have never used reST; I took a look and feel that manual underlines for sections are very clunky. And tables, one of the parts that I think Markdown isn’t doing well, aren’t really better in reST.


Maybe it has something to do with the emergence and domination of platforms like GitHub? Just wondering…


The discussion reminds me a little of the one around LaTeX in the scientific community. LaTeX is extremely powerful but reading the raw source of a text is a pain. WYSIWYG is just so much more comfortable when writing, but MS Word sucks in its own specific and manifold ways (as does LibreOffice).

HTML is still not nice to read: too many boilerplate tags are required. Markdown source, however, is mostly readable in a plain-text editor and offers the basic formatting that serves 95% of the needs.

But the truth is: the ideal text layout platform for scientific writing does not exist.

Markdown+HTML+CSS could be getting close, but it currently still lacks LaTeX's tooling for citations and support for pagination.

MS Word's "track changes" is extremely useful when collaboratively editing (and no, Overleaf's version doesn't come close, unfortunately). But references and citations in Word still are no match for the flexibility of bibtex, even with Papers/Zotero/Mendeley. Figure placing can be a nightmare, although LaTeX is not infallible, either. And you need a Mac or Windows to use Word, and on a Mac it feels like it's running in an emulator.

I have given up on the belief that one tool could solve all this. There are a few good attempts like overleaf and the less-known authorea (which I liked more), but these are cloud apps which complicates writing offline.

The only solution is to choose whatever tool works for the specific use case, learn to live with its shortcomings, and never forget that in the end only the finished manuscript counts.


The problem is not writing or reading LaTeX, properly written LaTeX is very readable. It's easy to maintain and understand, and Bibtex for bibliographies works fine once you've gotten used to it.

The problems start once you have to convert this to MS Word, which is essentially impossible with any advanced LaTeX document which will invariably contain complex formulas and also include hacks and adjustments in the preamble. I've done it several times and it basically requires a rewrite. All of the existing tools fail. In fact, in a book I've once edited and made camera-ready in LaTeX, the LaTeX contributions were harder to integrate into the book than the Word files!

Recently I had to deliver a book I've written camera-ready, doing all the typesetting myself, because a large prestigious publisher I had a contract with turned out to be unable to deal with LaTeX. In a sense the problem really is MS Word, it's still the standard and I was astonished to learn about the publishing world in my postdoc time how surprisingly many publishers only deal with MS Word files and do not even use any special typesetting software - many use Word all the way down to creating the final PDF for printing! Looks horrible but I know of several major academic publishers who do it that way.


You're right: LaTeX is very readable. I've never understood the complaints about the system. What's so hard about writing `\section{Introduction}` or `$R \int_0^1 f dx$`?

After a bit of learning on your first document (perhaps requiring half the time of one of the first of dozens of edits you'll be making, if the document is important), what you get is (a) ease of transition between formats (article in journal becomes chapter in thesis in about 3 minutes work) and (b) beautiful output, not just in mathematics but in support for many language characters, hyphenation hints, etc.

On publishers, I wrote a book for Springer-Nature, and they wanted LaTeX. I wouldn't have published with them, if not.


The biggest misunderstanding about LaTeX stems from comparing it with Microsoft Word or any other word processor. LaTeX is a typesetter. You give it the text, the template and the page size, and it typesets it all. You nudge it with hints, and that's all. So, LaTeX is content first and layout second in a sense.

On the other hand, word processors work in absolute terms, layout first. Looking at LaTeX from this perspective distorts the vision a lot, and people get confused.

When one understands the idea of "LaTeX gets the content and fits to the constraints at hand", the rest is liberating.


> "LaTeX gets the content and fits to the constraints at hand"

I'll believe this when scientific papers start appearing in the form of reflowable HTML.


Most journals I read offer both reflowable HTML and PDF. I don't know anybody who prefers the former.

A PDF (and the paper copy it generates) is more convenient for markup, and for memory. I can look at papers I read decades ago, and know where to go to find things, because of what I might call positional memory. Somehow, my brain has information such as "The key Figure is at top of third page" or "that equation I think is wrong is at the bottom of the second-last page". I'm not alone in this. I suppose it's just how brains work (e.g. people who do memory tricks "store" the information in an imagined space).

When I look at reflowed text, I just get lost. I can't make notes that "stick" with the text if I enlarge the font. And memories don't form in the same way as for PDF/paper.

I suppose this might be field-dependent. I think in some fields the key point of a paper is a single sentence, which could be identified easily in reflowing text and then copied into a separate file. I don't read papers like that, though. That's why, in my line of work, PDF/paper is superior to reflowable text.


Flowed text doesn't mean that it won't be printed. It means that it could be printed the way you want it, not the way the author wanted.

This comes in handy when the author decided that 50% of the surface of the paper should remain blank, or to use a tiny font. They may find it helpful, others not. The same goes for the decision to have pages at all.

The only aspect in which I see someone else making an unappealable decision for you as superior is convenience, conditional on that other person being an expert in the field [of printing].


> I'll believe this when scientific papers start appearing in the form of reflowable HTML.

YES!

It's 2022 and in spite of the often-praised superiority of LaTeX, we are still getting paginated PDFs on ArXiv, instead of responsive text that reflows on different screen sizes. PDFs were great to read when I was still printing papers. For reading on screen they are quite inconvenient.

Many publishers have come around in the last decade and offer HTML versions of full papers (after you get through the paywall).


> It's 2022 and in spite of the often-praised superiority of LaTeX, we are still getting paginated PDFs on ArXiv, instead of responsive text that reflows on different screen sizes.

It's not about LaTeX per se. You can typeset EPUBs with it too, if you want. So the lack of reflowability is not a shortcoming of LaTeX.

Publishers send you a Word or LaTeX template, and require your manuscript as a PDF, conforming to that template. If it's LaTeX, generally there's a switch to render the text in "Review mode" with line numbers and similar additional details, and omitting author data if the judgement will be done blindly.

So, the tool itself, regardless of its brand is the proverbial wrong tree to fight with. Also, it's a reality that reading a 25 page manuscript is not very efficient on a screen, at least in my discipline. I have a dedicated "Paper printer" at home for printing such manuscripts and taking a stab at it with pens and highlighters.


> You can typeset EPUBs with it too, if you want.

In theory, yes. In practice, this only works when the document is using a restricted set of packages. I tried.

I'm fighting not so much with LaTeX, rather with ArXiv. They have the sources of LaTeX submissions, but do not provide other formats than PDF by default. Judging from my own experience in converting LaTeX to HTML, epub, etc., I assume these conversions simply create too much headache.

> reading a 25 page manuscript is not very efficient on a screen, at least in my discipline

The vast majority of papers I "read" I never read front to back; rather, I go Abstract - Discussion - Results - Introduction - Methods, bailing out at any of those points once I have the information I was looking for. I couldn't possibly keep up with the literature if I read every interesting paper completely.

Those that are really relevant to my research I will read in detail, and at that point a PDF is useful, I agree. But before that, I prefer reflown text & figures.

There was some innovation in that regard, e.g. eLife's Lens, which was specifically designed to support reading on a (large) screen. IEEE also has a good system that works on phones (well, almost; some UI elements clutter the text).


I don't think that'll happen anytime soon, because scientific papers are one of the rare species of publication that are primarily made to be printed, read and mangled with pens.

Even the templates from the publishers came with paper sizes and other crop marks coded inside them.

When publications start accepting reflowable formats and provide relevant templates, producing them is easy with LaTeX.


fwiw, I see a lot of papers on IEEE Xplore that are presented as reflowable HTML on the page with the option for a PDF, presumably generated from the same source.


I genuinely can’t tell if you’re being sarcastic.


The hard part isn’t \section{Introduction}, but the horrible hacks like \makeatletter that you need to do anything non-trivial.


> On publishers, I wrote a book for Springer-Nature, and they wanted LaTeX. I wouldn't have published with them, if not.

Do you know if they used LaTeX for the layout internally, or if they used some proprietary system that was fed the LaTeX source?


I think they used LaTeX internally, but I've no way to be sure. They provide latex style sheets and sample files, and so those were my starting point. Of course, they are a big organization, and latex is a simple system, so they might have a program that translates it to something else, e.g. using latex to create PDFs and something else for other formats. But at least at the galley-approval stage, it was still LaTeX.


> LaTeX is very readable. I've never understood the complaints about the system.

Here’s how you put an image on TeX:

    \begin{figure}
      \caption{A picture of a tucan.}
      \begin{center}
        \includegraphics{tucan.eps}
      \end{center}
    \end{figure}


Using Markdown with pandoc and pandoc-crossref is basically perfect as a substitute for LaTeX. It allows inline and standalone LaTeX, and citations with crossref.
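
A minimal sketch of what such an invocation could look like, wrapped in Python so it can be scripted. The file names are placeholders, and it assumes pandoc and pandoc-crossref are installed. `--filter`, `--citeproc`, `--bibliography` and `-o` are regular pandoc options; pandoc-crossref is listed before citeproc because its documentation asks for that order.

```python
import subprocess


def pandoc_command(source, output, bibliography=None):
    """Build the argv for converting a Markdown file with cross-references."""
    cmd = ["pandoc", source, "--filter", "pandoc-crossref"]
    if bibliography is not None:
        # citeproc comes after pandoc-crossref so cross-references resolve first
        cmd += ["--citeproc", "--bibliography", bibliography]
    cmd += ["-o", output]
    return cmd


def convert(source, output, bibliography=None):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, output, bibliography), check=True)
```

Calling `convert("paper.md", "paper.pdf", "refs.bib")` would then run the whole pipeline in one step; swapping the output name for `paper.docx` or `paper.html` targets another format.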


Pandoc completely solves any of the issues mentioned about markdown for scientific writing.

It works flawlessly, and it is unarguably way cleaner than writing in plain LaTeX.

How can anyone

- tell me

- that this

- list

Is harder to read:

\begin{itemize}

  \item than this

  \item unintuitive

  \item mess
\end{itemize}


With the former, can you have items spanning several paragraphs and the other complex things one sometimes needs in a complex or advanced document? So you can understand why LaTeX had to go a less naive way than Markdown did.


Of course the LaTeX syntax is more capable, but for the simple and common case, it's extremely fussy. The simple case should be effortless. With markdown it is, and in Pandoc you can easily drop to more complicated syntax should you need any of its capabilities.


Are there any tutorials you could recommend please?


The official manual is pretty complete: https://pandoc.org/MANUAL.html.


In my experience this has zero advantage over LaTeX since converting your markdown to formats other than LaTeX will be just as hard as converting from a LaTeX source. It's even harder because you have to deal with an eclectic mix of LaTeX and markdown, and various errors and restrictions in pandoc.


I use the system to write documents that I convert to PDF (through LaTeX), Word, plain text, and HTML, with equations, tables, and internal and external links preserved (citations in plain text turn into footnotes). It works brilliantly, and this would be impossible when starting with LaTeX source. You can embed the target formats directly into the markdown when needed, and have alternative versions for different targets. I do this for things like tables, where I want to directly format them for LaTeX and HTML, rather than rely on Pandoc’s automatic translation. You can write filters¹ that extend Pandoc/markdown to do any processing that you want, so you can create your own syntax with custom translations to any target format.

[1] https://lee-phillips.org/panflute-gnuplot/ [This is sort of obsolete now, as you should probably write filters with Lua. But it’s still a good illustration of the possibilities.]
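
To give a feel for what a filter is, here is a minimal sketch of a pandoc JSON filter using only the Python standard library (no panflute, no Lua). Pandoc pipes the document's AST to the filter as JSON, in which every element is an object with a type `t` and optional contents `c`. The transformation (turning emphasis into strong text) and the function names are made up for illustration.

```python
import json
import sys


def walk(node, action):
    """Apply `action` to every element of a pandoc JSON AST, depth first."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:
            replaced = action(node)
            if replaced is not None:
                node = replaced
        return {key: walk(value, action) for key, value in node.items()}
    return node


def emph_to_strong(element):
    """Example transformation: turn Emph elements into Strong elements."""
    if element["t"] == "Emph":
        return {"t": "Strong", "c": element["c"]}
    return None  # leave every other element unchanged


def run_filter(stream_in, stream_out):
    """Read the AST pandoc pipes in, transform it, and write it back out."""
    document = json.load(stream_in)
    json.dump(walk(document, emph_to_strong), stream_out)
```

Saved as a script whose last line is `run_filter(sys.stdin, sys.stdout)`, this could be used with something like `pandoc doc.md --filter ./emph_to_strong.py -o doc.html`. The same idea scales to custom syntax: match on a code block or span with a particular class and return whatever elements the target format needs.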


The problems in converting LaTeX usually come from third-party packages. When only a restricted set of packages is allowed then it is quite straightforward. This is the reason many publishers require latex, even if they don't use LaTeX internally (which is often the case).

They convert LaTeX to the XML dialect that their in-house layout system understands.


Yes, these are all fraught. Any X-to-Y format exchange is a brew/apt miracle in my experience. It's not necessarily important for a LaTeX document to be renderable in all media, but the maths disciplines, for example, should be asking themselves how they want to be able to search for things, including formulae and algorithms, and finding ways to guarantee that as many media as possible support search/find/read operations on their articles. It's not just about "how easy is it to input?" but equally about "how easy is it to get back out?" There has been a bit of a mismatch in maths regarding formula expressions and how those are represented visually vs. textually.


> Latex is extremely powerful but reading the raw source of a text is a pain. WISYWIG is just so much more comfortable when writing, but MS Word sucks

The Lyx editor solves both these problems for me. I'd probably be using LaTeX a lot less if I hadn't found Lyx (that quote about fingernail clippings in oatmeal doesn't apply to Lisp IMO, but certainly does to LaTeX).


I used Lyx back in university to transcribe the math lectures. It was good enough that the professor later asked me if I can send him the PDFs and source files ;)


Lyx is great for that exact purpose! I used it too as an undergrad. But it doesn't offer the full flexibility and toolset available in LaTeX, which limits its use for more complex documents.


> Markdown+HTML+CSS could be getting close, but it currently still lacks LaTeX's tooling for citations, support for pagination.

I don’t think the citation tooling is great, but for pagination even free tools work well with CSS Paged Media [0] [1]

[0]: https://www.w3.org/TR/css-page-3/

[1]: https://www.print-css.rocks/


> Latex is extremely powerful but reading the raw source of a text is a pain.

For the kind of text that you'd use Markdown for, i.e. text that is mostly just text and not math heavy, LaTeX should be almost as readable as Markdown. For example,

    \section{Introduction}
is almost as readable as

    ## Introduction


Yes, almost ;)


Plus, you can have nice things like:

    \section[short title in toc]{very very long title at page}
and the starred version. Those two things, which Markdown cannot do, show the power of that syntax over simple hash marks at the start of the line.


Takes a bit more typing, though, unless you have some macros dedicated to it.

Not hard to do but Markdown is easy to write anywhere.


I find this article to be sort of ridiculous. Sure, some of the critiques about the various flavors of Markdown might be valid (but I would posit that for the non-technical users the author claims to be so concerned about, the chances of them regularly having to interact with more than one flavor (and no, Slack and Reddit don’t count), is extremely low), but the whole piece seems to be an advertisement for the author's company's JSON-based rich text specification, which couldn’t be further away from the point of Markdown if it tried.

Markdown is for writers. The fact that it has been adapted and forked and flavored into something that many developers like is wonderful. But it is fundamentally for writers. And I’ve taught dozens, if not hundreds, of “normies” to use Markdown when writing for the web over the last 15 years. Some of them have hated it (which is totally fine), but many more have appreciated being able to write readable content for the web without having to do it in the CMS, which is always, always, always, a flat-out terrible idea. Or the other alternative, which is to write in Google Docs or Word or Notion or whatever and then paste it into the CMS for the hell that will ensue.

There is a certain irony in the author claiming to want to make a better authoring system for users, but only in a way that adds complexity, requires developers to build UIs around its spec, and has benefits that really only appeal to the people building the CMS or trying to sell people on why their version of TinyMCE is best (oh, but you have to build the TinyMCE UI yourself… this is just another overly complicated JSON spec that is probably only suited to whatever it is Sanity is trying to hawk).

Markdown isn’t perfect. And it doesn’t claim to be. But in terms of being a readable and easily usable way to write content for the web, it’s pretty damn great.


The author here!

I'm kinda bummed out that the post read as an ad. I was super nervous to publish this because I knew I was poking at something that's near and dear to a lot of devs (myself included). But I have experienced enough friction with Markdown in the real world and wanted to explore poking at what I see as the status quo. And it seems to have sparked some conversation, which was what I wanted. I'm truly not trying to “hawk” something, but I believe the thinking that went into Portable Text can provide value and I'm sort of putting it out there for people to make up their own minds.

To your point: I agree, Markdown is for writers, but only for some writers. Maybe I didn't get that across, but it is kinda amazing that it has become so ubiquitous in all its shapes and colors.

But it comes with these actual constraints and challenges that I feel a lot of developers aren't really being upfront or honest about. The simplicity comes with trade-offs that just add complexity elsewhere. Either for people who don't desire to learn specialized syntax or developers who have to spend a lot of time figuring out how to parse it in a sensible manner to get their job done.


Thanks for taking the time to engage. And I want to be clear, I think Portable Text could be something really excellent for its intended use case. I guess I just see that as fundamentally at odds with what Markdown is. I understand your point that Markdown is often used in places where it probably shouldn’t be, but I still think we’re talking about two different problems. For me, it isn’t that something like Portable Text doesn’t have a place, it’s that I don’t think it is best compared to something like Markdown.

I agree we should do a better job of showing what options are best for certain scenarios, but I feel like someone making a decision to adopt a really customized version of Markdown needs a different sort of intervention/push to a more desirable format (maybe Portable Text) than the people that actively choose Markdown BECAUSE it is a readable syntax for crafting HTML and their goal is to write HTML in a readable way.


As a developer that really likes to document stuff I don't like markdown. It's limited and as the article states too fragmented. I started using AsciiDoc a while back. It's like markdown but with more advanced options for true documentation stuff like charts, tables (formatted!), diagrams and more.


I share the same opinion. Its limits make it useful only for a very small set of use cases.

It's being used a lot for code documentation, but MD is a bad fit for documentation. Good documentation requires a lot more formatting and cross-referencing than you'd think. Every language I'm using that has been using MD (for embedded and online docs) added extensions, resulting in more fragmentation.

So in general I switch between 3-4 differently-flavoured markdowns constantly. The extended syntax used by various projects sucks. Where did the simplicity of MD go? It reads poorly when unformatted, which is how I read it 99.9% of the time.

I gravitate more towards RST as it's a bit more widely used, but also agree about AsciiDoc.


I agree with both these points. Inter-document references are bad in markdown and wiki style formats. Wikipedia is replete with overlapping and contradictory information that should be pulled from a single source.

Do you have any ideas re link-rot, or underlying assets changing and semantic links becoming invalid? How do you reference a code asset whose name could change, and what will replace all references not only in the code but also in the documentation?


Cross-reference checking is a huge problem on its own, almost unrelated to the formatting being used. Some language tooling does provide cross-ref checking, generally as an extra step in the documentation generation process. I don't have anything to bring to the table beyond what we do already...


For the "more advanced options for true documentation stuff like charts, tables (formatted!), diagrams and more." you might like Obsidian.

But that steers you back towards markdown. How deeply do you not like markdown?


I like Markdown, but I don’t like proprietary Electron apps.


I don’t want an “authoring experience.” I want a simple, durable text format that will still be around and useful in 10 years.

I like that markdown is cheap, free, and independent of any platform or language or tool.

When coupled with git, the editorial experience is fantastic through diff and merges.

It’s not for everyone, and the author can use whatever they like. But the author proposes something that is in no way as useful as markdown.


We have a binary and universal format for plain text (utf8) that is extended all the time with additional emojis. But it seems almost unimaginable that we ever get a binary text format that also allows simple semantic markup, like emphasis, tables, hyperlinks, amounts, dates, and phone numbers.

It's 2022, and really, we should be able to copy a paragraph from an email with a table in it into a chat app and have it just work. HTML is way too complex to use as data communication format between apps (vulnerabilities; every app allows a different subset). Markdown is made for humans to read/write, when what we need is a simple unambiguous binary format that is easy to parse.

We can have compound emojis where a polar bear is a bear + a snowflake joined by a zero-width joiner. But a datetime or number in text is too much to ask?


Well, that really seems to be a very hard problem to me.

Tables for instance get quite complex pretty fast -- joined cells, text alignment, etc.

A format that handles this (though it is not plain text) is rich text. It's been around for quite some time.

IMHO the problem is in the difference between presentation and data. One group of consumers want the "data" and parse it, the other group of consumers want to basically have it "look the same in this app as in the other app, and don't make me think."


Tables can be arbitrarily complex, just like html, so yes a line has to be drawn somewhere. Spreadsheets and slide presentations can be copy-pasted as file attachments. That's fine.

You can't have a table look the same in every app, nor is that desirable (you'd have to embed fonts and re-create postscript). What you can have is the ability to copy-paste a table (or a bulleted list) back and forth between apps without mangling the state in the process. Rich Text intertwines semantics and style and it's not a good solution.

Most of the time you don't need the layout precision of postscript. There is a huge difference between not having any kind of tables or semantic context for text and having some that cover the 95% of daily use.


One feature request from business that caught me by surprise when recently implementing an editing page was that it should discard all font styling when pasting from Word.

That is the #1 feature missing from all rich text editors: ability to restrict allowed formatting. No user specified fonts, no user specified font-size (all headings must be paragraph style), all lists must use standard style, etc...
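
A rough sketch of what "restrict allowed formatting" could mean for pasted HTML, using only Python's standard `html.parser`. The allow-list is hypothetical and would differ per site; tags outside it are dropped while their text is kept, and every attribute (fonts, sizes, inline styles) is discarded. It is an illustration, not a hardened sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list; a real site would choose its own.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "h1", "h2", "h3"}
DROP_CONTENT = {"script", "style"}  # remove these together with their contents


class PasteCleaner(HTMLParser):
    """Keep allow-listed tags (without attributes) and all visible text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.parts.append(f"<{tag}>")  # attributes are discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(escape(data))  # re-escape text on the way out


def clean_paste(html_text):
    """Return the pasted HTML reduced to the allow-listed structure."""
    cleaner = PasteCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    return "".join(cleaner.parts)
```

For example, a Word-style fragment such as `<p style="font-family:Arial"><font size="5">Hi</font> <b>there</b></p>` comes out as `<p>Hi <b>there</b></p>`.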

This is where Markdown/CommonMark shines: no hidden formatting that is difficult to get rid of.

The real problem with Markdown is its lackluster support for tables and for site-specific macros.


It also doesn't respect single line breaks and is rather limited/opinionated about lists and indentation in general.


While I've been a "geek" since I joined IBM in 1967, I've been primarily a writer for at least 35 years. In that time, I've used Wordstar, WordPerfect, literally every flavour of MS Word up to 2011, vi, iTerm and emacs.

I've converted files of one format to another to another to another, and finally to ASCII text and that is that. No more orphaning as one or another company goes out of business or decides to update their file format.

My editor now is Emacs and, depending on who/what I'm writing for I use plain text, or org or md if I need to include basic formatting. I've published enough to know that I'm a wordsmith, not an artist and going beyond basic formatting is a job for a typographic artist, which I'm not.

I know that makes me a grumpy old fart, as does my refusal to rely on The Cloud (i.e., Somebody Else's Computer) for backups, and so be it. Fool me once, shame on you, fool me twice (or more), shame on me.



I'm surprised that the article doesn't mention AsciiDoc at all. AsciiDoc is a perfect tool for complex text documents with tables, references, side notes etc. It is almost as easy as Markdown (if you compare only the features included in both), but offers way more possibilities. There are also good tools to convert it to different output formats like HTML and PDF.

A good starting point is: https://asciidoctor.org/


Isn't AsciiDoc even presentation independent?


Yes, it is. AsciiDoc was created to have a more usable syntax for the DocBook XML standard, which is about semantic markup for tech books (originally from O'Reilly).


I think the author misses the point completely.

Markdown is for writing not for editing. If you want to focus on writing, text, and content instead of on bells and whistles with formatting and typesetting it's really hard to go easier than Markdown.

Sure, Markdown is overused. Probably one shouldn't use it for heavily linked content, where one has to show examples, cross-link, check references, verify the sources etc. and on top of it follow very strict line/page breaking and paragraph rules. But it's not a huge issue, because if the end result is plain text itself, one can easily work with a pipeline. Text can be easily split, mixed and matched. One can figure out their own content and script it easily, assuming they are willing and tech-savvy enough.

Working with text first and then transforming it to LaTeX once the writing process is done is easy. Then you can make semantic adjustments and decide to either output as PDF, process in a complex pipeline or plug in one of the hundreds of solutions that exist today.

It might hold back "editorial" experience but it definitely enables "authoring" experience.


The author here!

> Markdown is for writing not for editing. If you want to focus on writing, text, and content instead of on bells and whistles with formatting and typesetting it's really hard to go easier than Markdown.

That's exactly my argument. I literally quote Gruber on it.


I like markdown. At the moment it is somewhat like HTML without CSS. Great for structuring documents. Not so good when it comes to formatting. There are extensions to it, but I like the clean flavor.


Referring to slack turning on markdown: "shows how deep the love for markdown is in the developer community."

I think I need to call BS here. Typing out a phrase, taking your hands off the keyboard to then fight with wysiwyg is a step back and unusable. What is a max typing speed doing this? I'm curious if anyone can break 20wpm doing that. The mouse controls for formatting are simply not usable for anyone wanting to effectively type above 30wpm

So that aside. I thought markdown was a unification of the absurd number of wiki languages that were coming out. There is tracwiki, wikipedia's, mediawiki, countless more with different variations of markdown. They all have HTML knock-offs because html is overly verbose for quickly typing out a simple wiki page that is mostly lists and headers.


I agree with a lot in this article. I too have seen 'non-technical' users struggle with Markdown. The popularity of Markdown among developers also reminds me about SSGs (Static Site Generators). Two technologies loved by developers - and no-one else.

"For limited use cases, like blog posts of simple rich text with images and links, markdown will get the job done."

I'd argue it's even less suitable than that. Developers never look beyond Markdown when writing documentation or articles and are thus blissfully unaware of the much richer writing possibilities of using HTML (which is simple to learn, unlike CSS).

Full HTML also gives readers a much better reading experience. In Markdown, even a simple image caption is not possible.


> Developers never look beyond Markdown when writing documentation or articles and are thus blissfully unaware of the much richer writing possibilities of using HTML (which is simple to learn, unlike CSS).

HTML is a wolf in sheep's clothing, though. It's deceptively simple to learn the basics, but the nested nature of it is something that will allow one to shoot their semantic or representational foot off, even before trying to apply CSS. Take, for example, the humble P tag. The thing that trips up most people is that ideally you'd need to contain text blocks in paragraphs, not just have them free-floating in the document. In theory, the P tag is simple. Except that the end tag is optional. When writing HTML, that is. But not always, as per the standard:

"A P element's end tag can be omitted if the p element is immediately followed by an address, article, aside, blockquote, details, div, dl, fieldset, figcaption, figure, footer, form, h1, h2, h3, h4, h5, h6, header, hgroup, hr, main, menu, nav, ol, p, pre, section, table, or ul element, or if there is no more content in the parent element and the parent element is an HTML element that is not an a, audio, del, ins, map, noscript, or video element, or an autonomous custom element."
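
To make the first half of that rule concrete, here is a small sketch built on Python's standard `html.parser`, which does no implicit closing itself. It only tracks whether a P is open and records which start tag ended it; the "no more content in the parent element" half of the rule and everything else a real HTML parser does are deliberately left out.

```python
from html.parser import HTMLParser

# Start tags that end an open <p>, copied from the passage quoted above.
P_CLOSERS = frozenset(
    "address article aside blockquote details div dl fieldset figcaption "
    "figure footer form h1 h2 h3 h4 h5 h6 header hgroup hr main menu nav "
    "ol p pre section table ul".split()
)


class ImpliedParagraphEnds(HTMLParser):
    """Record which start tags implicitly close an open <p>."""

    def __init__(self):
        super().__init__()
        self.p_open = False
        self.closed_by = []

    def handle_starttag(self, tag, attrs):
        if self.p_open and tag in P_CLOSERS:
            self.closed_by.append(tag)  # this tag stands in for a missing </p>
            self.p_open = False
        if tag == "p":
            self.p_open = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.p_open = False  # explicit end tag, nothing implied


def implied_p_ends(html_text):
    """Return the start tags that implicitly ended a paragraph, in order."""
    parser = ImpliedParagraphEnds()
    parser.feed(html_text)
    parser.close()
    return parser.closed_by
```

Feeding it `<p>one<p>two<ul><li>x</li></ul>` reports `p` and `ul` as the tags that silently ended a paragraph, while `<p>one</p><div>x</div>` reports nothing.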

But in XHTML the end tag is always required. And you shouldn't have block-level elements like images within P. And you can't nest them. When outputting a DIV with a block of text, that DIV should contain a P, not just text. DIVs are cool, you can nest DIVs. And they need an end tag. Simple? Well:

"Authors are strongly encouraged to view the div element as an element of last resort, for when no other element is suitable. Use of more appropriate elements instead of the div element leads to better accessibility for readers and easier maintainability for authors."

But who does that nowadays? It's DIVs all the way down usually. So no, HTML is not simple.

https://html.spec.whatwg.org/


The tag omission/inference rules for <p> come from SGML, and, as presented in WHATWG's HTML spec, seem arbitrary. But actually what they're saying is that a paragraph is automatically ended on a non-phrasing element. In earlier HTML versions, a definition of flow content was given like this:

    <!ENTITY % flow
     "%block;|%inline;">
where the flow, block, and inline parameter entities expand to strings like a|abbr|... such that they can be used in element declarations as follows:

    <!ELEMENT p
     (#PCDATA|%phrasing;)*
     -(%flow_only;|figcaption)>
meaning the content of p can be any phrasing element (and text content), except elements that are in the flow_only category or the figcaption element, capturing precisely HTML5's rules.

While Ian Hickson's initial (HTML5.0) rules were exactly corresponding to those of HTML4.01, later HTML5 revisions (^2) indeed failed to update the prose description for p-terminating elements, for the very reason that those weren't presented as a formal spec (such as a DTD).

In a presentation I held [1] I made exactly that point for the p element, and you might find the derivation of a new SGML DTD grammar for HTML5.x generally of interest.

[1]: http://sgmljs.net/docs/html5-dtd-slides-wrapper.html

^2: if WHATWG had bothered to version their specs, and W3C still would produce versioned snapshots of those, at least


XHTML was quietly led behind the shed and shot a decade ago.

I bumped into a site with that cute little XHTML approved logo, and checked if it verified. Of course it didn't.


If this is just a container format for storage / transformation (as I understand it, the author promotes front-end editing tools that hide the storage format), then they are just trying to reinvent structured text… SGML, XML and all their tools, schema description…


I wish libreoffice would have some sort of editor to edit a document using markdown.


Anybody remember setext, which was used in the original TidBITS newsletters back in the early 1990s? When Gruber introduced Markdown back in '04, it struck me as an enhancement on setext more than anything else.


> Portable Text isn’t design to be written or be easily readable in its raw form; it’s designed to be produced by an user interface, manipulated by code, and to be serialized and rendered where ever it needs to go.

Oh hell no. The most important aspect of markdown to me is that it is human readable in text file format. I can be quite certain that in 500 years (assuming humanity lasts that long and still has computers) that my notes in markdown will still be completely legible.


> the appeal of plain text files is understandable. But that era is pretty much gone with the emergence of backends as a service. Services and tools like Fauna, Firestore, Hasura, Prisma, PlanetScale, and Sanity’s Content Lake, invest heavily in developer experience.

Am I the only one who has never heard of any of these?


One minor note: jekyll definitely was not first to introduce front matter (though they may have introduced yaml front matter?).

I am sure that at least pyblosxom had it earlier: https://pyblosxom.github.io/Documentation/1.5/writing_entrie...

Here's the earliest surviving commit showing blog entry front matter in 2002: https://github.com/pyblosxom/pyblosxom/blob/f80cdb42eb37ee0e...

My recollection is that blosxom (a perl blogging engine) had front matter as well, but I can't find any evidence of it.


The author here!

Thanks for these links! I tried to track the origin down, but there's only so much digging one could do. But I think it's safe to say that Jekyll _popularized_ the YAML-style frontmatter.


Those who do not understand history are doomed to repeat it.

In the beginning, there was HTML, and it was good. Then "modern content creators" did not "want to learn syntax", created "tools that edited and rendered HTML differently and inconsistently", added hundreds of unnecessary tags in it, then started to get smart about "semantic information" and introduced HTML-5-style tags like <article>, which too few people use...

Techies felt that all to be unworkable and did a new thing. They remembered text files being good, and started to call a very minimal syntax "Markdown", with a common core large enough to be almost universally understood. And now we have this article again, which reads a lot like it was about HTML, ca. 1995.


Those who do not understand history are doomed to repeat it ;)

In the beginning, there were tens of text processors with rich text encoding syntaxes (troff, TeX, IBM pre-GML, DEC stuff, WordPerfect, etc.) Then came SGML with customizable meta syntax and vocabulary to unify those. A large part of HTML is actually contained in ISO 8879, as "folklore" elements <p>, <h1>-<h2>, and so on. Then came HTML as an SGML application, and later XML as an SGML subset. Then people forgot the aspects of SGML not in XML, and invented ad-hoc syntax such as Markdown and MediaWiki syntax (both of which can be partly represented as SGML SHORTREF syntax), and Ian Hickson (WHATWG) started to specify HTML parsing rules in an informal ad-hoc way, based on the tag omission/inference rules of W3C's HTML4, and what was called "real-world" HTML (nevertheless also inventing new elements <main>, <article>, <aside>). Then, due to presentation as prose and mere enumeration of element names, the basic construction of HTML as a formal DTD grammar was lost, and WHATWG added elements and rules willy-nilly, without being informed by a DTD grammar or other formal mechanism to ensure parse-ability.


As someone who regularly writes HTML manually, there are distinct advantages of markdown over html:

* The most common use case (paragraphs of text) needs no boilerplate or magical syntax

* A typical Readme can still be easily read and understood without markdown rendering (e.g. a # Heading is still understood as a heading, and _emphasized_ text is still understood as being emphasized, even by the uninitiated).

* Writing markdown can be faster than either HTML or a WYSIWYG editor, because the syntax is minimal

HTML is certainly more powerful, but sometimes you just want to chug out some text quickly and this is where markdown (IMO) shines


In the beginning was runoff. At least, runoff was around on DEC machines in 1975. I wrote a couple of term papers with it; Markdown is hardly different.


I once typeset an Occam assignment paper in groff, for kicks.


> In the beginning, there was HTML, and it was good.

No, it's really not. It is excessively verbose for human usage, and too loose for computer usage. It's a hack.

For human usage, Markdown is pretty decent. For computer usage, an S-expression–based HTML would have been preferable:

    (html
     (head (title "Hello world!")
           (meta ((http-equiv content-type)
           (content text/html)
           (charset utf-8))))
     (body ((bgcolor white))
           (h1 "Hello world!")
           (p "This is my first webpage.")))


I, for one, welcome our new semantic overlords


> Those who do not understand history are doomed to repeat it.

HTML5 was introduced in 2008, whereas Markdown was published in 2004.


The ideas behind Markdown are much older than HTML5 and all those tags you talk about.

Wikipedia with its MediaWiki syntax is from 2001, ReStructuredText is from 2002, Markdown from 2004, HTML5 from 2008. And I am sure you can find way older formats which Wikipedia and ReST used as inspiration.


You missed ASCIIDoc[0] (2002) from your chosen timeframe, but also some much older Lightweight Markup Languages[1] like setext[2] from 1992.

[0] https://en.m.wikipedia.org/wiki/AsciiDoc

[1] https://en.m.wikipedia.org/wiki/Lightweight_markup_language

[2] https://en.m.wikipedia.org/wiki/Setext


appreciate a lot of the points made in the post but

```

distinct( *["code" in body[]._type] .body[_type == "code"] .language )

```

is "trivial" for no human being


Your content editors would never write that. But if they asked for a feature where you can filter content by programming languages, GROQ would allow your engineers to build it very quickly.

I think it was probably a mistake for the article to go into GROQ, but the essence of the point is that content as structured data means content is queryable.


grep -rE '```\w+' |uniq seems like the same query across text files to me. it's not "trivial" but at least grep has a man page and people have used it reliably for decades now. Spinning up whatever it takes to get groq going is not the same thing as having widely available tools that just work.
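
For comparison, a sketch of the same "distinct code block languages" question answered with a few lines of Python over plain Markdown files. The one-liner above prints matching lines per file, and `uniq` only collapses adjacent duplicates, so this version collects a set instead. The function names and the `*.md` pattern are my own choices, and it only handles backtick fences.

```python
import re
from pathlib import Path

FENCE_MARK = "`" * 3  # a code fence marker, built this way to keep the example readable
OPENING_FENCE = re.compile(r"^\s*`{3}\s*([\w+-]+)")  # fence followed by a language name


def fence_languages(markdown_text):
    """Return the set of languages named on opening code fences."""
    languages = set()
    in_block = False
    for line in markdown_text.splitlines():
        if line.lstrip().startswith(FENCE_MARK):
            if not in_block:
                match = OPENING_FENCE.match(line)
                if match:
                    languages.add(match.group(1))
            in_block = not in_block  # every fence line opens or closes a block
    return languages


def languages_in_tree(root, pattern="*.md"):
    """Collect fence languages from every matching file under `root`."""
    found = set()
    for path in Path(root).rglob(pattern):
        found |= fence_languages(path.read_text(encoding="utf-8"))
    return found
```

Something like `sorted(languages_in_tree("content/"))` would then give the list to build a filter from; the directory name is a placeholder.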

Somehow structuring content in json is a new idea? There are a lot of html to json parsers out there already. jq and grep can do everything his post suggests and require no less or more investment from a dev perspective.
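
For anyone who would rather not rely on shell quoting, here is a rough Python sketch of the same "which languages appear in code fences" query. The function names and the *.md glob are my own assumptions, and it only looks at backtick fences that start at the beginning of a line.

```python
import re
from pathlib import Path

# Build the fence marker programmatically so this snippet can itself
# live inside a fenced block without confusing a Markdown renderer.
FENCE = "`" * 3

# An opening fence followed directly by a language tag, e.g. a python fence.
FENCE_LANG = re.compile(r"^" + re.escape(FENCE) + r"(\w+)", re.MULTILINE)


def fence_languages(text: str) -> set[str]:
    """Return the distinct language tags of fenced code blocks in text."""
    return set(FENCE_LANG.findall(text))


def languages_in_tree(root: str) -> list[str]:
    """Collect distinct fence languages from every .md file under root."""
    found: set[str] = set()
    for path in Path(root).rglob("*.md"):
        found |= fence_languages(path.read_text(encoding="utf-8"))
    return sorted(found)
```

Calling languages_in_tree(".") from the top of a content directory would give a sorted list of tags. Closing fences are ignored because they carry no tag.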


Correct me if I'm wrong — my use of Jq is limited to basic shell stuff — but I believe Jq does not have joins at all. You can do joining by writing your own functions, but it's not built into the syntax.

GROQ was designed to query and join multiple sources of data, and to make it easy to plan for efficient execution (e.g. on top of a relational database or search engine). For example:

    *[_type == "author"] {
      _id,
      name,
      "books": *[_type == "book" && author == ^._id] {
        title,
        year,
        publisher->,  // Join referenced publisher
        genres[]->    // and all the genres
      }
    }

This gives you all the authors with each author's books. You can do a lot of the same transformations that Jq does, but you can also do it across multiple collections of data; it's not a pipeline with an implicit input.
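
To make the shape of that result concrete, here is a small Python sketch of the same join done by hand over in-memory lists. The sample records and the authors_with_books name are made up for illustration, genres are left out for brevity, and a real query engine would plan this far more efficiently than a nested loop.

```python
# Hypothetical sample data; the field names mirror the query above.
authors = [{"_id": "a1", "name": "Ada"}, {"_id": "a2", "name": "Bo"}]
books = [
    {"title": "First", "year": 2001, "author": "a1", "publisher": "p1"},
    {"title": "Second", "year": 2005, "author": "a1", "publisher": "p1"},
]
publishers = {"p1": {"name": "Example Press"}}


def authors_with_books(authors, books, publishers):
    """Nest each author's books under the author and resolve the publisher reference."""
    result = []
    for author in authors:
        own_books = [
            {
                "title": book["title"],
                "year": book["year"],
                # Follow the reference, like publisher-> in the query.
                "publisher": publishers.get(book["publisher"]),
            }
            for book in books
            if book["author"] == author["_id"]
        ]
        result.append({"_id": author["_id"], "name": author["name"], "books": own_books})
    return result
```

An author without books still appears in the output, just with an empty "books" list.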

(Disclosure: I work at Sanity.)


I don't know if jq can do joins but getting a list of codeblock languages is a pretty simple task for almost anyone in a text file. For complex queries I agree that you typically need a more expressive query language.

I'm really not judging this syntax based on the 2 examples I've looked at, but I agree that I can mostly read all the words and stuff, but without the comments etc this is very dense and I don't think it's "intuitive". Compared to sql it's interesting, and I hope developers like it. I just am almost automatically revolted by the marketing hype train language of the last 15 years. None of the hype comes close to being accurate and it's just all so misleading.

AWS services are among the most notorious in this regard, having sold an entire class of devs on the notion that aws would "abstract away complexity" only to have devs learning about db indices not at the cost of a slow query execution time for some customers, but $50,000 row-scan bills.

AWS hype beasts have been "abstracting away complexity" for years just to tell the poor sap devs that they need intimate knowledge of query execution plans, for some poorly conceived and managed service aws put together after ripping off some os community.

And the connection to aws is not immaterial. Every query is potentially business critical and also susceptible to tanking the entire business. I don't think we should necessarily try to treat queries as "trivial" or whatever.


Actually, I'm not sure anything can be truly "intuitive". I think you're really talking about offering familiar concepts. In this case, the syntax might make more sense if you think of GROQ as a superset of JSON, because this is valid GROQ:

    {
      "a": 42,
      "b": [true, null]
    }

But we also have functions, subqueries, joins, and so on:

    {
      "allArticles": *[_type == "article"]
    }

Here, the asterisk simply means the array of "all objects", and [] is used to apply a filter predicate.

To me, I don't find Jq at all intuitive. The fact that filtering a collection is expressed this way, for example:

    .[] | select(.type == "article")

is not "intuitive", because it doesn't look like anything else. The pipe operator certainly behaves very differently from Unix shells; it's the "[]" part that causes iteration, not the pipe.
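
As a rough analogy (not a description of how Jq is implemented), the same filter can be written as a Python generator, where the loop plays the role of "[]" and the condition plays the role of select. The function name select_by_type is mine.

```python
def select_by_type(items, wanted):
    """Yield each object in items whose "type" field equals wanted."""
    for item in items:                   # the iteration that .[] performs
        if item.get("type") == wanted:   # the predicate that select(...) applies
            yield item
```

Like the Jq expression, this produces a stream of matching objects rather than a new array.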

I absolutely see your point about abstracting away complexity. Developers have a lot of knowledge in their heads about how SQL translates to execution complexity, and GROQ is more opaque. We do have an "explain" mode that shows the query plan, but we are still thinking about ways to make the execution model more transparent and understandable.


> To me, I don't find Jq at all intuitive.

Same here. Intuitive is probably the wrong word. I think I'm still looking for the right word to use to describe the idea. I think there was some progress from a fb team that translated english language queries into sql.

I hope there is more progress in the field so that the same people who write the content can search it and use it like a dev would. I think that given a stack of 5K blog posts, any dev can pump out a dozen charts and word graphs or whatever given some time, and it would all look pretty good but probably not have as much value as an editor's summary. But if the editor could also make use of the dev tooling to do more in-depth research or produce some new insights, or get to the "important" information more quickly, then that would represent a step in the right direction.


I did probably misstate the overall capabilities of jq and grep compared with groq. I wasn't intending to criticize the product and I'm excited about all the offerings in this space that find a niche or provide wider value. Joins across markdown is a great idea and I'm glad this exists.


Markdown is merely the lowest common denominator of what is essentially a jungle of weird and wonderful wiki languages. The reason it is so under-specified is that it emerged out of the Ruby community in a time where the whole point of Ruby was that it was a scripting language without a lot of appreciation for things like types or specifications. The first serious attempts to unit test and specify ruby itself actually came out of the JRuby community. Which had an obvious need to cover all the complicated stuff that Ruby did so they could handle all the edge cases in the same way. You see the same with other scripting languages where the obsession with specifying what things like Javascript or Python actually do came later in the life of the languages.

I think the main reason for the popularity of Markdown is that it is minimal, forgiving, and uses things that are kind of intuitive for basic formatting. * for bold, - for bullets, etc. Similar compatibility issues between various implementations exist for things like Json, Yaml, HTML, etc. Some of these things even have IETF or W3C specifications nowadays. The specifications usually come after things get popular; rarely before.

Whatever the ruby markdown library did was the specified behavior of Markdown for some time. And then of course it caught on and several alternate implementations emerged for different languages; each with their own features added or interpretations of features. Then Github adopted it, added a few features they needed and then it went mainstream in a big way by virtue of world+dog using Github.

At this point Markdown is a collection of features that may or may not work depending on which library and language you use; or even where you use it. The simple stuff works everywhere. But your mileage may vary with some of the more advanced stuff like tables for example. Mostly, the Github dialect seems dominant for the simple reason that that is a popular place to find a lot of markdown content.

The problem with alternatives is that they are comparatively less widely used and therefore a bit more obscure. You need good library support across multiple language ecosystems and a lot of the alternatives simply don't have that or enough of a community of users that care enough to fix that. People have their favorites but they can't seem to agree on a single one. So, Markdown it is for most of us. I'm not even sure that is a bad thing.

I kind of like it, actually. It's good enough for my needs. I use pandoc to convert markdown files to html for my website. It supports enough of the Github dialect that I don't have to obsess over what it supports or doesn't support too much. I'm unlikely to convert my content to another format at this point. I use it without thinking in Github issues, in readme's, etc.


> Whatever the ruby markdown library did ...

FYI John Gruber's original Markdown.pl is written in Perl [1], not Ruby.

> ... things like [...] HTML [...] even have IETF or W3C specifications nowadays.

HTML was originally designed as an SGML vocabulary [2], and had SGML DTDs by IETF and W3C for the longest time [3] (albeit somewhat defective from the beginning, and grew a number of warts over time).

The beauty of Markdown IMO is that it's specified as a short syntax for HTML, with the option to include inline HTML where the fragment covered by Markdown is insufficient. That's exactly the model supported by SGML's SHORTREF feature which lets you define context-dependent tokens SGML will replace into canonical angle-bracket markup. So if you're looking for a document preparation system that lets you write casual text a la Markdown (or your own customized or novel syntax), then can translate that syntax into predictable markup and goes all the way to support complex document integrations with established XML tool chains, etc. etc. and can also import/produce HTML, then there's always SGML, based on an ISO standard even.

[1]: https://daringfireball.net/projects/markdown/

[2]: http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html

[3]: https://datatracker.ietf.org/doc/html/rfc1866


John Gruber’s original markdown processor was written in Perl, not Ruby.


I never knew :-). Perl was already a bit obscure by the time I got active in the late nineties. But it makes sense as the first wikis would probably have been implemented in that rather than in Ruby. Likewise, php would have happened later.


I still use it on my blog (as a filter).


A good thing about it is that when a tool doesn't support a certain feature, it's usually graceful, because most syntax is designed to be usable as plain text, so it just falls back to that.


Discussion from the launch of Standard Markdown / CommonMark:

https://news.ycombinator.com/item?id=8264733


KeenWrite[1] is my text editor that takes a slightly different approach than MDX. Rather than include variable definitions within documents, variables are defined in an external file[2]. I find that when variables are embedded into documents, those variables often include controls for presentation logic. To me, any presentation logic meant to affect a plain text document's presentation does not belong in the document itself. Part 8 of my Typesetting Markdown series shows the power of separating content from presentation by leveraging pandoc's annotation syntax[3].

Annotated Markdown is sufficiently powerful to produce a wide variety of different styles. Here are a few such documents typeset using ConTeXt[5]:

* https://dave.autonoma.ca/blog/2020/04/28/typesetting-markdow...

* https://dave.autonoma.ca/blog/2020/04/28/typesetting-markdow...

* https://impacts.to/downloads/lowres/impacts.pdf

* https://github.com/DaveJarvis/keenwrite/blob/master/docs/scr...

What's bothersome is how some companies are setting de facto Markdown standards without considering the greater ecosystem. GitHub has done this by introducing the "``` mermaid" syntax, which creates some problems[6].

[1]: https://github.com/DaveJarvis/keenwrite

[2]: https://www.youtube.com/watch?v=u_dFd6UhdV8&t=18s (renamed afterwards)

[3]: https://dave.autonoma.ca/blog/2020/04/28/typesetting-markdow...

[4]: https://dave.autonoma.ca/blog/2020/04/28/typesetting-markdow...

[5]: https://wiki.contextgarden.net/Installation

[6]: https://news.ycombinator.com/item?id=30337894


What you have done is really interesting. You might need to explain how to use it in smaller chunks, though.

I watched part of the video and I understood the substitution but the next part just went over my head. Maybe it is like a double pointer.

Anyway, your editor and your idea are super interesting. They just need some explanation. Maybe you have already done that somewhere?


Thank you. See the following links for details:

* https://github.com/DaveJarvis/keenwrite/tree/master/docs

* https://github.com/DaveJarvis/keenwrite/blob/master/docs/var...

I visualize the interpolated variables as a tree with many branches due to their recursive relationship.
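
A toy Python sketch of that recursive relationship follows. It is not KeenWrite's actual code. The {{dotted.key}} reference syntax and the flat dictionary of variables are assumptions made purely for illustration.

```python
import re

# Hypothetical reference syntax: {{dotted.key}}
REFERENCE = re.compile(r"\{\{([\w.]+)\}\}")


def resolve(key, variables, seen=()):
    """Expand one variable, recursively expanding any variables it references."""
    if key in seen:
        # A variable that (indirectly) refers to itself can never be expanded.
        raise ValueError("cyclic reference: " + " -> ".join(seen + (key,)))
    value = variables[key]
    return REFERENCE.sub(
        lambda m: resolve(m.group(1), variables, seen + (key,)), value
    )


def interpolate(text, variables):
    """Replace every {{key}} in text with its fully expanded value."""
    return REFERENCE.sub(lambda m: resolve(m.group(1), variables), text)
```

Each call to resolve walks one branch of the tree, and the seen tuple is what stops a branch that loops back on itself.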


> If Not Markdown, Then What?

Textile!


First, many thanks to the author for a reasoned and researched post. The editing/mass-adoption sides of the text formatting story vis-a-vis markdown were well told. I think the story of text is important and misunderstood and evaluation brings better understanding. So please take the rest of this as a supplement more than an argument, even if a lot of it is a disagreement about the value and future of markdown and where its utility does and could exist.

I'm not partial to markdown or any flavor of markdown or any other plaintext schema. I appreciate org mode and almost every variety of "readable semantic text". These are all reasonable efforts that provide actual value to practitioners and neophytes alike. There are reasons why a tiny fractional percentage of the world uses org-mode. There are reasons why orders of magnitudes more people have come across markdown, to the point where today the overlap with HTML developers is almost, but obviously not fully, complete. On top of the HTML aware writer population, cms's have propagated a lot of incompatible and often poorly implemented "wysiwyg-md" textareas that often leave a lot of room for improvement, but still introduce the idea of semantic text, or even store that as html/markdown. In all these ways, markdown has been a bridge between the .txt and the .html for many people. It was always a simplification and an added complexity in that regard, and its sole points of reference were txt and HTML, so digital/web-oriented.

That said, HTML semantics differ from .doc(x) semantics primarily in philosophy and ergonomics. In HTML there is <h1>, which has rules and specifications about it. In a .docx document there is no such "understood hierarchy" philosophically speaking. This represents a major structural advantage from a user and machine standpoint that I believe is the single most important advancement represented by markdown. The ability to render an automatic table of contents based on hash-tags or gather/conform citations and references programmatically, has _never_ been a selling point of word or any other editor until recently when markdown started making this type of feature obvious. People will point out word-based variations on "auto-table-of-contents" but please find me one single school in America or anywhere that teaches students to write the "html/md" way by default, in terms of organization or structure. I believe the primary reason they don't is because MSword is a free-form typesetting machine, and not an editor's tool and not a research tool. I was in academic research and talked to dozens of professors publishing in every domain. None of them had a one-click answer to "this journal rejected me; I need to reformat for another journal". That was always done by hand, by every single academic I've ever met, including in maths and sciences and cs disciplines. At best they're using a latex document, but those are way too heavy for non-maths disciplines, and often people aren't or can't make full use of the features it has to offer and end up reformatting latex for various publications.
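
As a sketch of how little it takes to get that automatic table of contents out of "#" headings, here is a minimal Python version. It assumes ATX-style headings only, ignores setext underlines, and skips anything inside backtick fences. The function name is mine.

```python
import re

# One to six leading # characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def table_of_contents(markdown: str) -> list[str]:
    """Build an indented outline from #-style headings, skipping fenced code."""
    fence = "`" * 3
    in_code = False
    outline = []
    for line in markdown.splitlines():
        if line.startswith(fence):
            in_code = not in_code  # entering or leaving a code block
            continue
        match = None if in_code else HEADING.match(line)
        if match:
            level = len(match.group(1))
            outline.append("  " * (level - 1) + "- " + match.group(2))
    return outline
```

The output is itself a nested Markdown list, so it can be pasted back at the top of the document.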

> If you think about it, do you own your content less if it’s hosted in a database? ... And is it fair to say that proprietary database technology impinges on the portability of your content? ... > But anyone who has tried to move out of a mature WordPress install knows how little this helps if you’re trying to get away from WordPress.

The author addresses a critical aspect of the markdown document but avoids discussing the implications. Text in MSword is hard to get back out.[^1] Data recovery and searchability are critical. But even "pure html" is locked in a lot of un-indexable un-searchable code. Markdown, on the other hand, makes grep, or any indexer-search tool trivial to implement and use.
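
To show what I mean by trivial, this is roughly all a grep-like search over a folder of posts needs in Python. The search_markdown name and the *.md glob are assumptions, and it does plain case-insensitive substring matching with no index.

```python
from pathlib import Path


def search_markdown(root: str, needle: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, line) for every case-insensitive match under root."""
    hits = []
    wanted = needle.lower()
    for path in sorted(Path(root).rglob("*.md")):
        # errors="replace" keeps the search going over files with odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if wanted in line.lower():
                hits.append((str(path), number, line.strip()))
    return hits
```

Nothing here depends on the application that produced the files, which is the whole point.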

Imagine any company on earth having five-thousand blog posts worth of content. At the end of the day what is the only format that can be guaranteed "readable" in 5-10-20 years time, when some internal analyst wants to find out what people have been posting? Is it easier for the analyst to spin up a wordpress install from 20?? or read through plaintext files that can be searched any number of ways instantly from a kindle reader?

Content lock-in is only one side of the equation. What are you doing with all that content? What is its value? Are you deriving real value from all the content you are creating? Is that value-creation easier or harder because it's in txt or sql? These answers depend on who you hire and what you're trying to do. Mostly txt people have questions and sql people help them find answers, but sql people don't always know what questions to ask. So anyways, txt people need text to formulate questions for sql people to run. But neither of them can do anything with docx.

Which is why markdown is a loose lingua franca for the developer world, and it always had to start this way and be this way and pretending like markdown could ever exist for a txt market without a million developer tools and programs is a joke, but nvm. So anyways, markdown required fundamentally a huge buy-in from the dev community and it was well positioned to do so given its html inheritance, regardless of what latecomers to this domain say in public.

Now that there are tools that do a reasonable job translating markdown to pdf, docx, etc, the reality is that for most academics, markdown should represent a huge shift in writing, from a mostly formatting based experience, to a type-and-print model. The same exact .md file should be able to be used to generate the properly formatted and annotated text for any journal publication. This is still a dream mostly, but it could save researchers thousands of hours over the course of their careers. When considered together with enhanced searchability and citability, something like markdown or a "correctly" structured txt document would provide a lot of cognitive "unload".
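
For a sense of the type-and-print model, here is a small Python wrapper around pandoc that writes one source file out in several formats. It assumes pandoc is installed and on the PATH (PDF output also needs a PDF engine such as LaTeX). The function names are mine, and per-journal styling is out of scope for this sketch.

```python
import subprocess
from pathlib import Path


def pandoc_command(source: str, fmt: str) -> list[str]:
    """Build a pandoc invocation that writes source as <stem>.<fmt> next to it."""
    target = Path(source).with_suffix("." + fmt)
    # pandoc infers the output format from the target file's extension.
    return ["pandoc", source, "-o", str(target)]


def convert(source: str, formats=("html", "docx", "pdf")) -> None:
    """Run pandoc once per output format; raises if any conversion fails."""
    for fmt in formats:
        subprocess.run(pandoc_command(source, fmt), check=True)
```

convert("paper.md") would leave paper.html, paper.docx and paper.pdf beside the source, all from the same text.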

Anyways, the market is ripe for replacing word, there are hundreds of great text editing products in all the markets and people are piecing together writing systems that make sense for them. It never made sense for academics and students and business people to be typesetters, because that's a professional's job and requires mostly a designer's eye. People want to put important information in a retrievable format and get it back when they need it. pretty-printing for the teacher was always a dumb exercise in scholastic obeisance, but markdown makes a readable document almost by default.

[^1] A <title> tag is easy to use for a title of a document when searching, indexing, scanning, aggregating, etc. But Word defaults you to the filename.docx, which usually ends in DRAFT-FIANL(2001-232-23-).docx What is the title of a Word document? HTML has <title> and <h1> but each has a specification, and websites are free to conform to those.


I know I'm an outlier, but I started using "styles" in Word 5 for the Mac in high school around 1992 and tried to avoid writing papers without them. The trouble for teaching was with the tool being broadly available (through preinstallation or piracy), most people not having a manual, and none of the teachers being really familiar with it. With styles, Word could format a table of contents automatically without any issue. It could also format indexes and, I believe, cross references. (Interestingly, I think recovering MS Word for Mac documents before version 6 is difficult).

I could tell Microsoft was trying to come up with a way to encourage style usage as they enhanced Word for Windows, but I think most people still defaulted to pushing things around with spaces and setting their fonts manually.

HTML's default styles caused people to start with H3 or something like that, because H1 is usually frighteningly large. LaTeX came with much more reasonable default styles but nobody was going to be able to start using it without reading a bit about it and then coloring within the lines of someone else's document structure.

I really appreciate that Markdown makes the headings easy. It doesn't give you a good way of demarcating sections and has no standard way to give hints to further processing. But you can definitely split up the document into rational parts and know that you can apply the formatting later.


> Imagine any company on earth having five-thousand blog posts worth of content. At the end of the day what is the only format that can be guaranteed "readable" in 5-10-20 years time, when some internal analyst wants to find out what people have been posting?

I have over 3,000 posts for my blog [1] and I can answer that: HTML. All my posts are stored in HTML [2]. A few years ago I started using my own markup language (a mashup of Markdown and Org Mode) but I never store anything in that format; I only store the rendered HTML output. That way, I don't get stuck with a markup style that I no longer like.

Also, I don't see Microsoft Word going away anytime soon. Every other developer at $JOB writes documentation in Microsoft Word [3]. On a corporate managed Windows laptop. Joy.

[1] http://boston.conman.org/ I've been blogging since December 1999.

[2] Each post as a file. The storage format has not changed at all in 22 years.

[3] I inadvertently found myself working for an enterprise company when the company I was hired at got bought out.


>The ability to render an automatic table of contents based on hash-tags or gather/conform citations and references programatically, has _never_ been a selling point of word or any other editor until recently when markdown starting making this type of feature obvious.

This is not true. Word processors have had this feature almost since the beginning.

Microsoft Word autogenerated a TOC from text styled as a header all the way back in the nineties.

WordPerfect 4.2 (1988) could generate a table of contents via specially tagged text. It also supported paragraph numbering and auto footnote and endnote handling. This was specifically added to speed adoption in the legal market and it worked. Law offices are some of the last WordPerfect holdouts today.

I don’t believe WordStar had such a feature though.



