Math on GitHub: The Good, the Bad and the Ugly (nschloe.github.io)
249 points by s1291 on May 21, 2022 | 65 comments



Did Github ever solicit feedback from the community for this feature? Was there ever a beta before they rolled it out?

Because some of these critiques really should have been dealt with beforehand and I'm concerned that we're now stuck with lousy defaults that they won't ever be able to change.

The visual font size problem seems particularly disastrous:

> The math font is a really small,

> MathJax’s default font MJXTEX-I and GitHub’s default text Helvetica have a different x-height/cap-height ratio.

I don't understand how Github could launch this feature with this problem unaddressed. LaTeX isn't just about getting formulas right, it's about communicating ideas.


Small font size seems the easiest to fix. Kerning seems worse since people will manually tweak spacing with \, etc, so you can't just change it whenever you want.


Blog post author here. The kerning is probably a problem with their font config that can hopefully be fixed. I don't think -- and that's my personal estimation -- that people will write tons of `a\,=\,b` to work around the bad kerning, so that's a change I would still recommend making.


I thought kerning specifically referred to small adjustments in spacing between pairs of letters. This just seems to be a failure in honoring the spacing for the different TeX math mode symbol classes. The rel and op classes each have a certain amount of spacing they are supposed to have on either side (unless they aren't used as binary operators), and somehow this is broken in their implementation -- the spacing doesn't come from the font configuration.

For example, a\mathbin{foo}b is supposed to render as "a foo b", but on GitHub it comes out as "afoob".
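To make the symbol-class point concrete, here is a minimal LaTeX sketch of standard TeX behaviour in text and display style (nothing GitHub-specific; `foo` is just a placeholder operator name): `\mathbin` gets a medium space on each side, `\mathrel` a thick space, and `\mathord` none.

```latex
\documentclass{article}
\begin{document}
% \mathbin: binary operator, medium space (\medmuskip) on both sides
$a \mathbin{\mathrm{foo}} b$

% \mathrel: relation, thick space (\thickmuskip) on both sides
$a \mathrel{\mathrm{foo}} b$

% \mathord: ordinary atom, no extra space, so it runs together as "afoob"
$a \mathord{\mathrm{foo}} b$
\end{document}
```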

When I looked at the MathJax configuration GitHub uses, nothing pops out as being odd. It would be funny if a minifier totally messed up the part of the MathJax source code specifically for this.


It was actually a config issue that they have fixed now. Some issues remain, but mostly on MathJax's side (e.g., https://github.com/mathjax/MathJax/issues/2877).


I agree that kerning is also a problem, and I also agree that it can't practically be fixed.

However, even if it's theoretically possible to change the font size, would Github ever do so? My impression is that as an organization they place a high value on interface stability. Very little about how I interact with the site has ever changed.

I suppose that individual publishers might stick in a hack to increase the font size, but a bad default means that as a consumer, I'll be stuck looking at tiny math on Github for the vast majority of documents.


> Very little about how I interact with the site has ever changed.

GitHub tunes the UI all the time.


Author here.

> Did Github ever solicit feedback from the community for this feature? Was there ever a beta before they rolled it out?

There actually was a closed preview for this, but it wasn't long and the feedback I gave them (which is almost everything that's in the blog post) wasn't implemented.

> The visual font size problem seems particularly disastrous:

They made the decision to match the capital heights of the two fonts, not the x-heights. If they match the x-heights, the capitals in math mode will be too large.

Since the small letters are way too small, perhaps they'll increase the math font size a little bit in the future to balance it out.
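The trade-off can be written out in symbols. This is only a sketch in illustrative notation, with no measured font metrics: let $x_T, c_T$ be the x-height and cap-height of the text font, and $x_M, c_M$ those of the math font at the same nominal size.

```latex
% Matching cap-heights: scale the math font by s_c
s_c = \frac{c_T}{c_M}
\quad\Longrightarrow\quad
\text{math x-height} = s_c\, x_M = c_T \cdot \frac{x_M}{c_M}

% This is smaller than the text x-height x_T exactly when
\frac{x_M}{c_M} < \frac{x_T}{c_T}

% Matching x-heights instead: scale the math font by s_x
s_x = \frac{x_T}{x_M}
\quad\Longrightarrow\quad
\text{math cap-height} = s_x\, c_M = x_T \cdot \frac{c_M}{x_M} > c_T
\quad\text{under the same condition}
```

So whenever the math font has the smaller x-height/cap-height ratio, one of the two mismatches is unavoidable, and any intermediate scale between $s_c$ and $s_x$ splits the difference.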


This does seem like a remarkably poor implementation, especially given the alternatives: GitLab, as mentioned; StackExchange, which also has a pretty decent implementation; and the many Markdown-based static site generators that have good support.


> advantages of KaTeX:

> It’s faster.

> You can copy-and-paste math.

Someone mentioned that you can theoretically copy-and-paste KaTeX output in an earlier thread about GitHub's new math rendering too. But I think calling that an "advantage" is crazy.

LaTeX will transform your one-dimensional textual formula specification into a two-dimensional graphical formula. The concept of copying and pasting the output as text is a category error. It isn't text and if you try to paste it, you'll get something other than what you wanted.

Just to be sure I'm not crazy, I tried copying the output of the demo on the KaTeX homepage. Here it is:

> f(x)=∫−∞∞f^(ξ) e2πiξx dξ

This is much, much, much worse, if you want a textual representation, than copying the LaTeX source:

    \f\relax{x} = \int_{-\infty}^\infty
        \f\hat\xi\,e^{2 \pi i \xi x}
        \,d\xi
And it's worse even though the raw source includes the quirk that LaTeX isn't able to provide proper spacing for the differential over which an integral is being calculated, so you have to space it yourself with \, commands.


KaTeX also supports only a subset of MathJax's features, and some of the missing ones are really important. While speed is great, missing basic features is worse. I'm happy with GitHub's choice.


> \f\relax{x}

I know it's not your point, and the design choice is KaTeX's, not yours, but making a macro that has to be invoked as `\f\relax{x}` to substitute for `f(x)` is … kind of crazy.

(Of course, in regular TeX, at least, you could do `\f\relax x`, saving one token at the expense of looking even less like a function invocation.)


`f(x)` works just fine. As far as I can see, the entire point of their \f macro is to let you write `\f\hat x` instead of `\hat{f}(x)`.

Leaving aside whether that's a good idea, it's not at all clear why the example then goes on to use `\f\relax{x}` to display a function with no diacritic in its name. The diacritic was the only reason to use \f in the first place. And the advertised definition, `#1f(#2)`, doesn't only require \relax when you want to omit a diacritic. It also prevents you from doing perfectly normal things like `f\left( something-hairy \right)`.


A source block type would have been a very natural fit, it’s surprising that isn’t how they chose to go.

  ```equation
  e^{i\pi}+1\eq0
  ```
feels like what i’d try to do right off the bat.


A code block is used to display text literally, except adding syntax coloring. You would use it to display TeX source notation for example. A math block transforms TeX into something else, it's entirely different. I think it's correct that they did not use the code block syntax for this.


It's how Github already handles Mermaid diagrams, so there's precedent, although you're correct that logically there should be a distinction between "display highlighted Mermaid source" and "render Mermaid diagram".


Yes, and now they've reached a dead end, since they have no way to render Mermaid source code except as plain text or by introducing a new name like mermaid-src. It's minor but it's unsatisfying conceptually. I'd rather have them introduce a new "run delimited text through external program and replace by output" tag.


Author here.

The main idea of using code blocks for math is to protect its content from being messed with. Markdown parsers do that by default, so that's how it's "natural". As you mention, the drawback is that you can't have a codeblock with syntax highlighting for a language called "math" (should there ever be one), but that seems like a small price to pay.


I prefer the $ $ way, as it makes it possible to do inline equations, while keeping the source easily readable.


you can do both, the normal markdown way:

  `$a$` squared is `$a^2$`, which is good to know for the pythagorean theorem:
  ```equation
  a^2+b^2\eq c^2
  ```


That doesn't work because then how do you display $a$ as literal inline code?


What about:

    Inline code is ` $a$ ` automatically trimmed
But I think $`a+b`$ makes more sense (or even $$a+b$$; I mean this is markdown after all, not LaTeX).


> But I think $`a+b`$ makes more sense (or even $$a+b$$; I mean this is markdown after all, not LaTeX).

True, but it is TeX notation, and `$$ $$` for inline math goes deeply against the experienced TeXnician's intuition. Why intentionally use notation that violates some users' domain intuition when there's an alternative that's no worse?


Ah yes, agreed, then that does indeed seem like the optimal solution here.


They suggest this for inline math, using a combination of the code backtick and dollar syntaxes:

Inline math: $`a^2 + b^2 = c^2`$.


Especially since this approach has already been tried and proven by competitors like GitLab.


Yes, could use different tags for literal display and rendering eg ```tex and ```eq.


> The reason why I’m so excited about this feature is that, in combination with version control and the issues/discussions capabilities in GitHub, I can see tectonic changes in how we’re publishing science. At last, science can really reap the benefits of a connected internet by moving away from static PDFs to living, breathing repositories which render like PDFs and provide a central place where one can actually talk about the article. – And fix bugs!

I'm skeptical of this take. Gitlab has had math rendering support for quite a while now, so this is hardly novel and doesn't seem to have resulted in the utopia the author is hoping for.


I had the same feeling for a different reason: I don't think this reduces friction enough compared to sharing LaTeX files (on github or overleaf), which we can already do. So I don't think this will usher in a new era.

I realise this could very well become the next "dropbox comment" - and I'll be happy to be proven wrong.


It would be cool if they would just render entire LaTeX manuscripts that are part of the repo. It sort of sucks that you need to duplicate what you wrote for a paper submission in Markdown to get GitHub to render it.


They do this for org mode files, but that's a much simpler format than LaTeX.


Author here.

You're right, the big dreams always turn out smaller in real life when they come true, right? I'm hoping that the popularity of GitHub will aid a shift of publication style though. :)


One thing that the article gets wrong is accusing mathjax of being abandoned. Development has moved to a new repo for the next version.

https://github.com/mathjax/MathJax-src/graphs/contributors


Author here.

Thanks for the hint! I had indeed overlooked that. I have updated the article accordingly.


Note also that TeX math can contain \text{..} which can itself contain $-delimited TeX math, e.g. $x = \text{my $y$}$. This currently breaks the GitHub implementation.
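A rough Python sketch of why this case trips up simple delimiter matching. The function name `find_inline_math` and the brace-depth rule are illustrative assumptions, not GitHub's actual parser: a `$` only closes the span when it is not inside braces, so the `$y$` inside `\text{...}` is treated as nested.

```python
import re


def find_inline_math(src: str) -> list[str]:
    """Return top-level $...$ spans, treating a `$` inside braces as nested."""
    spans = []
    i, n = 0, len(src)
    start = None  # index of the opening `$`, or None when outside math
    depth = 0     # brace depth inside the current math span
    while i < n:
        ch = src[i]
        if ch == "\\" and i + 1 < n:
            i += 2  # skip a backslash plus the next character (\$, \{, \t...)
            continue
        if start is None:
            if ch == "$":
                start, depth = i, 0
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth = max(depth - 1, 0)
        elif ch == "$" and depth == 0:
            spans.append(src[start:i + 1])
            start = None
        i += 1
    return spans


def find_inline_math_naive(src: str) -> list[str]:
    """Naive matching for contrast: pair each `$` with the next `$`."""
    return re.findall(r"\$[^$]+\$", src)


example = r"Note $x = \text{my $y$}$ and $z$."
print(find_inline_math(example))
print(find_inline_math_naive(example))
```

The naive version splits the first formula at the inner `$`, which is the kind of breakage described above; the brace-aware version keeps it whole.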


I tried all the examples using

  pandoc --from markdown --to html --mathjax
i.e., using pandoc Markdown[1] with MathJax enabled, and it has none of the problems described in the article (see output[2]). The problem doesn't seem to be due to MathJax, and even using GitHub Flavored Markdown as the input format to pandoc produces the correct results.

[1]: https://pandoc.org/MANUAL.html#pandocs-markdown

[2]: https://gist.github.com/bewuethr/691b4870828d7b2261113f14eef...


The difference between GitHub and pandoc with GitHub-flavored Markdown as input is that pandoc doesn't sanitize as aggressively as GitHub does. For example, pandoc doesn't remove the backslash-escapes before non-letters as in `\{`.

Not sure if GitHub will compromise here, though.
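For illustration, a minimal Python sketch of CommonMark-style backslash unescaping, where a backslash before any ASCII punctuation character is an escape. It is an assumption that GitHub's pipeline behaves like this simplified version, and `unescape_markdown` is a hypothetical helper name.

```python
import re
import string

# CommonMark treats a backslash before any ASCII punctuation character as an
# escape; string.punctuation covers exactly that ASCII punctuation set.
_ESCAPE = re.compile(r"\\([" + re.escape(string.punctuation) + r"])")


def unescape_markdown(text: str) -> str:
    """Drop the backslash from every backslash-escaped punctuation character."""
    return _ESCAPE.sub(r"\1", text)


print(unescape_markdown(r"\{a, b\}"))  # braces lose their backslashes
print(unescape_markdown(r"\alpha"))    # letters are not escapable, unchanged
print(unescape_markdown(r"a \\ b"))    # a TeX line break loses one backslash
```

If this step runs over math content before MathJax sees it, `\{` silently becomes a grouping brace and `\\` becomes a single backslash, which is why protecting the math span from the Markdown parser matters.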


I completely agree with the conclusions, GitHub repositories can really be the focal point of scientific publishing in years to come.

I always wondered why it took so long to implement math. There are better implementations out there for sure, it's weird they waited for so long and implemented something that is below the standard of alternatives.

The next step is to render citations from .bib files. I hope they get that right in the future.


I’m very happy they went with simple $..$ and $$…..$$

This is much more natural for anyone doing math or science and also makes it easy to copy and paste math from elsewhere

It’s better that the parser works harder than to force an inconvenient syntax on the user.


The problem with adding so much complexity to the parser is that it's now impossible to predict how any given Markdown is going to render on GitHub. If GitHub's parser were open-source, you could at least look at the source to figure out how this stuff is handled, but it isn't, so the only alternative is tedious experimentation.

One of the main draws of Markdown was that it was so simple you were rarely left guessing how any given input would be handled (though the lack of standardization hampered that).


Its simplicity is also its crutch. It lacks so many niche things (like math) that every implementation has its own syntax.


The problem with heuristic parsers is that you can't learn the rules, so the only way to use it now is with live preview.


Author here.

One problem is that the parser doesn't work harder yet, and it will be difficult even for GitHub to make it so. The $-syntax is familiar to TeXies indeed, but if the cost is that math won't ever work properly, I'd rather have a markdowny syntax.


The author fails to mention MathML as an alternative choice. The options aren’t just MathJax and KaTeX, but also raw MathML. MathML gives you extremely fast rendering, more font choices, copy-paste, a11y, etc. out of the box. The only downside is that Chrome is lagging behind in implementation. For that they could use MathJax as a polyfill, since MathJax understands and is able to transform MathML, and allow Safari and Firefox users the benefits of using browsers that can render math natively.


It isn't mentioned because they aren't really the same thing. MathML is a web-native math markup language and, being XML, is meant to be written by machines rather than humans[^0]. TeX is a markup language that MathJax and KaTeX render to a web-suitable format, but it is meant to be written by humans[^0]. Both MathJax and KaTeX have support for rendering TeX to MathML.

[^0]: Compare Pythagorean theorem written in MathML and TeX.

MathML:

  <math>
  <mrow><msup><mi>  a </mi><mn>2</mn>
  </msup><mo>+ </mo><msup><mi>b </mi><mn>2</mn>
  </msup><mo>= </mo><msup><mi>c </mi><mn>2</mn>
  </msup></mrow></math>.
TeX:

  $$
  a^2 + b^2 = c^2
  $$
I'd be surprised if someone preferred to write the first rather than the second.


Nobody writes the first example by hand[^1]. They write it in TeX (or some other easy to write dialect, or use a graphic WYSIWYG editor) and then use a tool to translate it to MathML.

What I’m advocating here is that GitHub translates TeX to MathML on their servers and serve us the MathML, as opposed to leaving the TeX as is and ship a JavaScript library to render it after it reaches our browsers. Chrome users won’t see a difference as they don’t support MathML, so they need MathJax as a polyfill. But there is a world of difference for us Firefox and Safari users.

Another benefit is that you could allow us to include the pure MathML (by hand or authored by some other tool) if we preferred that.

^1: and if they did write it by hand it would be written as

    <math display="block">
      <msup>
        <mi>a</mi>
        <mn>2</mn>
      </msup>
      <mo>+</mo>
      <msup>
        <mi>b</mi>
        <mn>2</mn>
      </msup>
      <mo>=</mo>
      <msup>
        <mi>c</mi>
        <mn>2</mn>
      </msup>
    </math>


>What I’m advocating here is that GitHub translates TeX to MathML on their servers and serve us the MathML

I see. What confused me was the "options aren’t just MathJax and KaTeX, but also raw MathML" since those tools can be used server-side and they output some HTML or MathML or images. MathJax moreover besides TeX also takes MathML and AsciiMath input.

>Another benefit is that you could allow us to include the pure MathML

I guess they went with what most people work with since rendering TeX in Markdown complements their Jupyter notebook rendering.

>and if they did write it by hand it would be written as

Maybe. I copied my example from MDN docs git repo.


Author here. Thanks for the comment!

The main problems pointed out in the blog post are problems in _parsing_ the Markdown+TeX pages. The output could be MathML indeed, but this is an issue they could always fix later.


My theory is that there’s already a lot of existing content using $ and $$ that GitHub wants to start rendering without requiring any changes.

I agree the code block approach would be less ambiguous, but there is an advantage in going where people already are.


The pick of $ and $$ as delimiters seems rushed, to be frank. Although I’m not a big fan of mixing LaTeX in Markdown, I understand that choice (alternatively you could go with an AsciiMath-like syntax, which mixes way better with Markdown IMO). But $ and $$ don't make much sense other than that LaTeX does it this way. It would have been easy e.g. to use $$ for inline math and $$$ + newline or ```math for block, and that would have gotten rid of many of the warts of mixing Markdown and LaTeX.

In my opinion the familiarity of $ and $$ is sacrificing a lot for not much benefit.


Given that the main point of using LaTeX in Markdown is familiarity of users, using $ and $$ is actually the ONLY proper choice. But yeah, it leads to problems, which is why I would not use Markdown in the first place, but some Markdown inspired format which mixes better with $ and $$.


This is why I prefer AsciiDoc. It's consistent because there's only one implementation, it's less ambiguous, and more predictable. Although it takes a bit longer to remember all the syntax, it's not difficult, especially if you're only going to use the same subset of features that markdown supports since it supports most of the markdown syntax as well. I also much prefer the flexibility with tables compared to markdown. I just wish there were more parsers/converters other than the main ruby one and the transpiled JS one, although I know there's work being done on other language implementations.

As an example for math/equations, inline math is stem:[sqrt(4)], which defaults to AsciiMath, but can be changed with a page attribute. To specify the notation inline, LaTeX is latexmath:[\sqrt{2}] and AsciiMath is asciimath:[sqrt(2)].

For blocks (which you can replace stem with either latexmath or asciimath to specify),

  [stem]
  ++++
  sqrt(2)
  ++++


Looks crazy verbose compared to $ and $$, especially if you're actually using it intensively.


True, it's more verbose, but I'd take increased verbosity and standardization over increased ambiguity and inconsistency since every markdown parser and renderer translates things a little differently, which is why there's things like Babelmark[0]. That verbosity also provides consistent, more powerful features like multi-line table cells, table cell spanning, table nesting, sidebars, admonitions, footnotes, table of contents, image embedding, cross-doc references, latex-like includes, etc. that all follow a similar inline and block syntax and are rather clear from a glance.

It's certainly not perfect, but I much prefer it for the flexibility and consistency to the dozens of markdown implementations that all do things a little different and not needing to drop down into HTML when I need to do something just outside of markdown's capabilities.

0: https://babelmark.github.io/


Author here.

I've been a managing editor for scientific journals for a number of years, and I can tell that -- while $ is still popular -- (almost) nobody uses $$ anymore. So I wouldn't say this is "where people already are".


If you use a proper markdown plugin to parse math instead (such as https://github.com/goessner/markdown-it-texmath), then the problems pointed out in this blog post go away.


Author here.

That is interesting! Have you tried it on the examples in the blog post? I'd be curious to see.


Unfortunately, the only way to really solve this is to generate a PNG for every equation you have on GitHub and insert it into the file until GitHub fixes the various issues.


This article, which has at least 2 fatal errors in its argument, should not be taken seriously.

First about the choice of delimiter. It argues for the use of \(…\) and \[…\] instead of $…$ and $$…$$. They don’t even explain why, but cite an external link to, guess what, a reason why the former is better for LaTeX, not markdown.

Then they made a claim about why putting math inside code delimiters is better.

Detouring a little bit, my guess is that GitHub chose this delimiter because they previously collaborated with the author of pandoc to create the CommonMark spec. Their GFM parser is an extension of a CommonMark parser.

And since pandoc chose the dollar sign as the math delimiter, it is natural for them to choose it too, just to have fewer variants.

Then why did pandoc choose $ instead of \(…\) and \[…\]? And would that go against the best practice of LaTeX?

The answer to the latter question is no, because the $ is the markdown delimiter, and when transpiled to LaTeX, it becomes the \(…\) and \[…\].

Answer to the first question is related to the fact that at least in pandoc, \[ means escaping the [. So to type \[ in markdown, it should be \\[, which is cumbersome. (There’s a special flag in pandoc that you can use bare \(…\) and \[…\] though.)

And since all this parsing is happening at the parser level, the code-delimiter argument is invalid. They are thinking more in terms of hacking an existing parser and rendering math as a post-processing step.

The second fatal error is comparing MathJax to KaTeX by showing the commit freq. and claim that MathJax is dead.

This is a really amateurish argument seen occasionally elsewhere in the open source community that if a software has no recent commit history it is dead.

In this particular instance, KaTeX is far behind MathJax in terms of coverage of rendering valid LaTeX math. This alone means they have a lot of catching up to do, and it is no wonder that a less mature piece of software has more commit activity than a more mature one.

MathJax is backed by a lot of organizations/universities and they have commitment to maintain it. It is far from being dead.

KaTeX aims at being fast (MathJax 3 reclaims that territory a bit, but AFAIK KaTeX is still king in that respect). If the math you use can be rendered by KaTeX, then use it. But it has happened to me that I had to change a config to use MathJax instead of KaTeX because sooner or later I was going to use something KaTeX doesn’t support.

As GitHub doesn’t give you a choice, choosing a default that has better coverage is a no-brainer.


Author here.

Thanks for all the great input!

I don't mean to argue for \(...\) and \[...\] in Markdown; it was just a side note I had added to satisfy the LaTeX purists such as myself. :) I hope I've clarified this in the text now.

> And since all this parsing is happening at the parser level, the code-delimiter argument is invalid.

Well, you're right in that they have to modify their parser to protect the $-$$ blocks. The issue is that they don't do that yet, and the fact that they didn't have that ready at release date makes me think it's hard to do. A smart way around all these problems would be to use code blocks for math, since you don't have to modify the parser for this at all. That's what GitLab does anyway.

Let's just hope they'll adapt the parser.

> The second fatal error is comparing MathJax to KaTeX by showing the commit freq. and claim that MathJax is dead.

Thanks also for this comment! Another user pointed it out to me as well, and I've adapted the text accordingly. Having worked extensively with both MathJax and KaTeX, I found KaTeX has many advantages still. Some of those stem from the fact that it's far younger and built upon younger design principles, leading to better modularity for example. Anyway, apart from load times this doesn't affect the user so ultimately it's GitHub's choice.

Thanks again for taking the time for such a thorough review.


Do \left and \right work for parens?


> Its main advantage over MathJax is that it isn’t dead. Check out the repo activity on the two projects:

I know this is a common view, but this is such a strange mindset to me. Unless you add scope, a project is eventually just mostly done. It's not that MathJax has zero commits, it just has fewer. That _in itself_ isn't a bad sign to me. It could just mean it's a mature project.


Per jaltekruse it’s inaccurate; work has moved repos.

https://news.ycombinator.com/item?id=31457631

I do tend to think that any web technology that isn’t changing is dying. Browser, HTML, are admittedly both much more stable than they used to be, but the world keeps moving.

(Edit: fixed spelling of @jaltekruse)


Completely tangential, but any recommendations on online complex math courses for engineers who haven't been in school for 15+ years? :D


What do you want to use it for? There’s complex math for physicists (differential equations and linear algebra), and there’s complex math for mathematicians (i.e., complex analysis, functional analysis, etc.).

It’s all pretty much the same thing, but the emphasis is different depending on if the objects of your study are physical or non-physical.



