Rewriting LaTeX in Pure Rust (github.com/crlf0710)
248 points by xvilka on Dec 7, 2020 | 138 comments



Personally, I think this approach -- which as far as the engine is concerned looks like a largely-automated "translation" of xetex from C/C++ to Rust -- is misguided.

The xetex code is a byzantine tangle of disparate pieces that evolved over the course of a number of years and several changes of direction; it originally started as a personal tool to address one individual's use case for a "Unicode-capable TeX", and grew from there. While it generally works really well, it's not a solid piece of engineering on which to build the future.

The resulting code doesn't benefit from being Rust, except for the added buzzword-ness. It's simply a translation of the C code, littered with unsafe blocks, as it has not been architected to work with Rust's ownership semantics, borrow checker, etc.

What XeTeX (or LuaTeX, though I don't find the Lua integration important -- but that depends heavily on your use cases) needs is a rewrite that preserves backward compatibility for documents, while re-architecting the engine using a modern language such as Rust. Simply wrapping the 1980s-era code in a "skin" of Rust syntax brings little value.


The advantage of this approach (as opposed to a rewrite) is that you reproduce all the bugs and misfeatures that people in the wild depend on. You can then add a test suite and start refactoring and gradually move to a codebase you're happy with, while breaking few or no users on the way.

Another approach is to split it into blocks and replace parts of the system, but whether that is feasible depends on the software and how modular it is.

Finally, you could try a clean-room rewrite, but that could take years without visible results and is hard to find the motivation for if existing software works ok.


But the advantage of avoiding C-specific bugs is shallow, because the Rust code contains lots of unsafe blocks. Bugfixes made to the original source are not automatically ported to the Rust code base.

From my point of view the chosen porting "strategy" doesn't make much sense. It is more of a toy project to see what's possible.

What would really make sense would be starting with an extensive test suite and trying to build a properly architected Rust implementation according to it.


Your suggestion would be the start of a clean-room or fresh rewrite - option 3 above - but those are often surprisingly long and unrewarding endeavours, particularly for old software like this, with lots of users who are used to its quirks and bugs and who disagree about what the spec should be.

Automated translation is obviously not an end goal in itself and doesn't improve code quality, but it could be the start of a successful rewrite as it does at least let them replicate exactly what the old program does, which is very important for end users.

A good example of this strategy being used successfully is the use of the tool c2go to convert the Go runtime and compiler from C to Go. That involved quite a specific tool tailored to the codebase in question, and a lot of manual cleanup afterwards.

https://docs.google.com/document/d/1P3BLR31VA8cvLJLfMibSuTdw...


The unsafe blocks can be removed one by one as time goes on. It's no different to any other legacy refactoring project: get the old code on a new platform, instrument/add unit tests, refactor piece by piece until the end result is acceptable.
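
A minimal sketch of what one of those piece-by-piece steps could look like (hypothetical code, not taken from this project): a raw-pointer accessor of the kind a machine translation tends to emit, next to a safe replacement, with a unit-test-style check pinning down that the behaviour is unchanged.

    // Hypothetical example of shrinking the unsafe surface during refactoring.

    // What machine-translated code often looks like: a raw pointer plus an index,
    // with the caller responsible for upholding the bounds invariant.
    unsafe fn node_width_raw(nodes: *const i32, index: usize) -> i32 {
        unsafe { *nodes.add(index) }
    }

    // A safe replacement: the slice carries its own length, so an out-of-bounds
    // access becomes a panic instead of undefined behaviour.
    fn node_width(nodes: &[i32], index: usize) -> i32 {
        nodes[index]
    }

    fn main() {
        let nodes = [10, 20, 30];
        let a = unsafe { node_width_raw(nodes.as_ptr(), 1) };
        let b = node_width(&nodes, 1);
        assert_eq!(a, b); // the "behaves the same as before" check
        println!("{a} {b}");
    }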


> It's no different to any other legacy refactoring project

There are two approaches to a project like that: as you say, take something which works and iteratively make it better, or derive a specification from the project which works and create a new from-scratch implementation to meet that specification.

My experience with the "from-scratch" approach is that it is very easy to miss details in the specification that will only be found out later, so it is very easy to underestimate the amount of work required. Ironically, that contributes towards making it easier to kick off the project, as it looks like it will be easier and cheaper. Especially if there is a view to drop features in the new version, which is fine until the actual users find out about the plan.

Another issue is that the old system which is being improved is often actually still in use, and the users still want new features even while the new system is being developed. Either those requests can be rejected, or implemented twice (once in the new system, once in the old). When incrementally improving the current system instead, those new features may end up touching areas of the code that have already been improved, making them cheaper to implement, not more expensive.

Basically, I think you're right. Keep the current system working, and improve it without breaking it.


The more I read about legacy (and actively maintained) project refactoring, the more firmly I find myself agreeing that the gradual replacement is the right way to go about things in nearly every case.

Whether we like it or not, the old system is a source of truth about how things are done, so the only way to preserve this knowledge fully is to copy the whole thing as-is and then "restate" parts of that knowledge in a more organised/modern way by refactoring, leaving the rest in place.


If you’re interested in this topic, you should read “Working Effectively with Legacy Code” by Michael Feathers.


It's been on my reading list for a while, I'll take this as a reminder to read it :)


That looks really interesting, thanks for recommending it here.


Everyone has a different way of handling this, but if I had to, I would also take the approach described in this project: a variant of the strangler pattern of refactoring.

However, I don't know of a single one of these C-to-Rust translation projects that has succeeded. Then again, I don't know of any large C-to-Rust rewrite project of any kind that has succeeded, so that doesn't help judge between rewrite patterns. It may just be that rewrites don't yield results at all.


It is not clear if all the bugs will manifest in the same manner after the translation.


How do you not break people who depend on misfeatures, while eliminating misfeatures?

Hey, I think Windows drive letter names are a misfeature.

Let's have a Windows-in-Rust and they will soon be a thing of the past.


Let's say you have 200 misfeatures, most of which you don't even know exist. You might want to fix 10 of them and not care about the rest, but if you start by breaking all 200 of them users will be mad and stop using your port.


I don't agree.

Even if an automated translation isn't using the new language in a "proper"/idiomatic way, this can still be a good base to start working from, while ensuring compatibility. From there one can start to extract and rewrite individual pieces and do lots of refactoring.

It is a long and tedious process, but one which has a chance of leading to fewer bugs than a rewrite from scratch, by working on smaller pieces which can be verified one at a time, even if no proper test suite exists.


If I recall correctly, they did a machine-assisted conversion of the Go compiler from C to Go using a similar technique. They wrote a compiler from C to Go, only bothering to handle the particulars of that one codebase, and changing the C if required. This emitted very un-idiomatic Go code, which they could then clean up as required.

What they got immediately was a few classes of bug gone. No more memory leaks, and NPEs and out-of-bounds errors are now defined to fail in a nicer way. Then they could spend time making their new code more idiomatic.


I absolutely agree, but would go even further.

When you say Xetex is "not a solid piece of engineering on which to build the future", I think the same thing also applies to the original Tex engine written by Knuth. By modern software engineering standards, the original Tex implementation is a nightmare. It's enormously difficult to extend or add new features, and this has resulted in (1) comparatively few extensions being made and (2) when such extensions have been made (e.g. Xetex), they are very difficult technically.


I agree, but one also can't blame Knuth: he wrote the program the way he knew best (he's a machine-code programmer at heart), under the constraints at the time (portability at various academic sites circa 1980 practically dictated Pascal, then Pascal's limitations required a preprocessor like WEB, etc). In fact, the earlier (TeX78) implementation in the SAIL language was written less monolithically, as a bunch of separate modules.

He also did his best to make the implementation and source code understandable, publishing the program in print as an extensively documented/commented book (another reason for WEB), gave a workshop of 12 lectures about the implementation of the program, even had a semester-long course at Stanford with that book (program source code) as textbook (with exercises and exam problems). He also wrote TeX with hooks and some of its core functionality written as extensions using those hooks, hoping it would show others how to extend it. He has multiple times expressed surprise that more people didn't write their own versions of TeX. “Rewriting a typesetting system is fairly easy.” He seems to have overestimated the ability of others to read his code.

If anything, I think a lesson from the TeX situation is that one's work can be too good: if he had simply published the algorithms at a high level (only the Knuth-Plass line-breaking algorithm was published as an independent paper) then maybe others would have implemented/combined them in interesting ways, but by publishing the entire source code and offering rewards for bugs etc, TeX got a (deserved) reputation as a very high quality stable and bug-free codebase and everyone wanted to use literally TeX itself. What's worse is that for a few years after it was created, TeX was possibly more widely available and more portable (what with its TRIP test and all that) than any single programming language (one had a much higher chance of TeX macros working consistently everywhere TeX was used, than code written in say C or Pascal): so it must have seemed natural to write large things like LaTeX entirely in TeX macros. As “TeX macros” wasn't designed or intended as a full-fledged programming language, we can see the effects today.


> As “TeX macros” wasn't designed or intended as a full-fledged programming language, we can see the effects today.

Making them Turing complete was a conscious decision, though a reluctant one:

> Guy Steele began lobbying for more capabilities early on, and I [Knuth] put many such things into the second version of TEX, TEX82, because of his urging.

http://maps.aanhet.net/maps/pdf/16_15.pdf


Oddly, I disagree with this take. TeX stands as a ridiculously stable codebase - something that is valued by nearly no one in industry today - which suggests that the things we think make good engineering are standing on empirically weak arguments.

Now, aesthetically I absolutely agree. It is an ugly language by most standards. But, if folks tried more extension and less porting to a new language, I'd wager they could get far. Instead, we seem to mainly get attempts at trying an extension by first establishing a new base language. Every time.


> But, if folks tried more extension and less porting to a new language, I'd wager they could get far.

This is what happens - pdfTex, Xetex, Luatex: these are all extensions of the core Tex implementation. The problem is that making these extensions is extraordinarily difficult given the software architecture of Tex. And, notoriously, sharing improvements between extensions is also very very difficult. The end result is that we have comparatively few improvements and complete stagnation in some aspects of the typesetting space (for example, no alternative/improvement to Tex's pagination algorithm).

The problem is that Tex is architected as a monolithic application which you can't easily plug extra stuff into. All of the extensions to Tex have worked by forking the source code entirely, which I think is not a great model.


> Simply wrapping the 1980s-era code

I just checked the Wikipedia page on XeTeX (not knowing what the heck that is at all). It's actually something fairly recent in terms of the TeX timeline; it was released in 2004. It has Unicode support, which seems to be the big thing.

That is more recent than, I think, the last time I used TeX, which worked absolutely fine.

Someone rewriting TeX in Rust should work with Knuth's original Pascal sources, in my opinion, not some knock-off (and look at XeTeX behaviors and documentation in order to do the Unicode stuff in a compatible way).


> the last time I used TeX, which worked absolutely fine.

I've heard that original TeX is the only software in the world that doesn't have any bugs (none found so far).


That's certainly not correct; while it is much closer to bug-free than most software, there have been numerous fixes since the original release (as well as a few enhancements).

See https://ctan.org/tex-archive/systems/knuth/dist/errata, particularly the file "tex82.bug".

Knuth will be reviewing bug reports and potentially issuing additional fixes again next year (see http://www.tug.org/texmfbug/).


I found one years ago that I didn't report. When I issued Ctrl-D on the interactive TeX prompt to bail out, it failed to issue a newline, leaving the operating system prompt juxtaposed to the right of the TeX prompt.

According to ISO C, "[w]hether the last line [of a text stream] requires a terminating new-line character is implementation-defined", so terminating the program without the last character written to stdout (a text stream) being a newline is not maximally portable.

That's a peculiar and possibly unique situation in the standard: whether or not a requirement exists is implementation-defined. Logically, that is as good as it being required, since any implementation can make it required. Those not making it required are just supplying a documented extension in place of undefined behavior.


> Simply wrapping the 1980s-era code in a "skin" of Rust syntax brings little value.

That's right for TeX, but XeTeX was first released in 2004.


Yes, but the bulk of its code is the original TeX code, from 1984. It did not attempt to reimplement or modernise the core code. Until a few years ago, it was even still built from tex.web plus a set of change-files plus some C/C++ libraries. Nowadays, the main change-files have been merged into the WEB source, for easier management, but it's still the old TeX code at heart.

Actually, there was another intermediate stage: XeTeX is in effect a descendant of TeXgX, an extended version of TeX that integrated with the now-discontinued QuickDraw GX graphics and font technology on classic Mac OS. But anyhow, it's still a direct descendant of Knuth's code. (No criticism intended: TeX was -- and still is -- a fantastic piece of work, but its code is from a different era and was shaped by constraints that are irrelevant today.)


> littered with unsafe blocks

Good points. It's unfortunate that headlines rarely distinguish safe Rust from unsafe, when so much of the advantage of Rust depends on it. You may even get a hostile response for asking about it ( https://news.ycombinator.com/item?id=24141493 )


I think you're getting your hostile answers for implying that unsafe Rust is equivalent to C. It's most definitely not. Using unsafe grants you certain powers, but it does not disable all of Rust's features - for example the borrow checker is not turned off by unsafe.

See https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#unsa...
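
To make that concrete, here's a tiny, hypothetical illustration (nothing to do with the project being discussed): `unsafe` unlocks a handful of extra operations, such as dereferencing a raw pointer, while ordinary references stay subject to the usual borrow rules, inside or outside the block.

    // Hypothetical illustration of what `unsafe` does and does not allow.
    fn main() {
        let mut value = 42_i32;

        // Creating a raw pointer is allowed in safe code...
        let raw: *mut i32 = &mut value;

        unsafe {
            // ...but dereferencing it requires an unsafe block.
            *raw += 1;
        }

        // The borrow checker is still active, unsafe or not: the commented-out
        // lines below are rejected (two overlapping &mut borrows), and putting
        // them inside an unsafe block would not change that.
        // let a = &mut value;
        // let b = &mut value;
        // *a += *b;

        println!("{value}"); // 43
    }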


Safety doesn't just come from language features, it also comes from the language disallowing dangerous actions. Rust's unsafe mode opens the door to undefined behaviour, of the sort that plagues so many C/C++ codebases (buffer overflows etc). A program written in safe Rust offers far better assurances than a program making heavy use of unsafe Rust: safe Rust is unable to result in undefined behaviour. (Bugs in the compiler and standard library may still cause mischief, but that's another matter, the intent of the safe Rust subset is to be guaranteed free from UB.)

You may be right that a program written in 100% unsafe Rust might still be less prone to undefined behaviour than a program written in C, but that's not my point. Excessive use of unsafe features undermines the considerable safety advantages that Rust offers over C, and it's regrettable when this is disregarded.


> Safety doesn't just come from language features, it also comes from the language disallowing dangerous actions. Rust's unsafe mode opens the door to undefined behaviour, of the sort that plagues so many C/C++ codebases (buffer overflows etc). A program written in safe Rust offers far better assurances than a program making heavy use of unsafe Rust: safe Rust is unable to result in undefined behaviour.

Safety is not an absolute, it's a spectrum. No one denies that safe rust is better than unsafe rust on the safety scale.

> You may be right that a program written in 100% unsafe Rust might still be less prone to undefined behavior than a program written in C, but that's not my point. Excessive use of unsafe features undermines the considerable safety advantages that Rust offers over C, and it's regrettable when this is disregarded.

It's not disregarded. The point you are disregarding is that when porting a C application to rust, unsafe rust is a step up from C, not a step down from safe rust. Unless you choose to rewrite from ground up (which is infeasible in many places), you'll need unsafe rust, either for binding or by using tooling that converts the C sources to rust. But once you have unsafe rust, you already get all the help that the borrow checker brings and you can gradually shrink the unsafe code. It's a matter of practicality; you seem to be advocating for absolutes, and I think that's earning you the downvotes you're receiving.


> Safety is not an absolute, it's a spectrum.

Use of the safe subset of Rust means the compiler and standard-library offer you a guarantee of the absence of undefined behaviour. That's an absolute guarantee of safety, under Rust's understanding of the word.

It doesn't give you a guaranteed absence of memory-leaks. It certainly doesn't give you a guarantee of whole-program correctness, as Rust isn't a formal verification framework. Both these properties are beyond the scope of 'safety' as Rust uses it.

> It's not disregarded.

It is. Projects are described as written in Rust, treating safe Rust and unsafe Rust equally.

> The point you are disregarding is that when porting a C application to rust, unsafe rust is a step up from C, not a step down from safe rust.

I explicitly acknowledged this.

> Unless you choose to rewrite from ground up (which is infeasible in many places), you'll need unsafe rust, either for binding or by using tooling that converts the C sources to rust.

Sure, no disagreement there.

> But once you have unsafe rust, you already get all the help that the borrow checker brings and you can gradually shrink the unsafe code.

Sure, and that's a good use of Rust's unsafe features.

> you seem to be advocating for absolutes

Not really. If I had meant to argue that Rust shouldn't have unsafe features, I'd have done so.

In an ideal world all code would be written in a way that completely closes the door on undefined behaviour, but we agree there are good reasons Rust includes its unsafe features, and there are good practical reasons to use them. I'm advocating for a greater emphasis on the use of the safe subset of Rust.

Written in Rust tells me something about the software. Written in 100% safe Rust tells me much more about the software. That's essentially my point.

This distinction doesn't arise with languages like C and JavaScript. All C is unsafe, and all JavaScript is safe. For languages like Rust and D, there's value in being upfront about the use of their safe subsets.


This looks amazing and like everything I always wanted.

Sadly, I think basing off XeTeX and not LuaTeX is a mistake. Certainly renders it unusable for me. Having Lua integration is just great.

Also, `lualatex` does not have some of the limitations of `xelatex` (memory limitations, `contours` package, ...), but I guess this XeTeX reimplementation can work on removing those limitations, so that only the lack of Lua integration remains.

Also, like another person said, not having biber breaks my workflow as well, which specifically tries to leverage the "latest and greatest" of what LaTeX has to offer [0]: `pdflatex` is obsolete, so `lualatex` it is. `nomencl`, `makeindex` etc. are obsolete, so `glossaries-extra` it is. `bibtex` is obsolete, so `biber` it is. Throw in `latexmk` for automatic compilation (which the tool presented here does too, which is a biggie! [1]) and CI/CD and you have a 1970s tool in 2020s attire. Lua rounds off the picture.

Among other things, this gives Unicode-native (gasp) code/documents, and great automation capabilities (`latexmk`, CI/CD, Lua).

I think a modern TeX engine reimplementation should support all of the above, which are arguably the best modern options there are.

[0]: https://collaborating.tuhh.de/alex/latex-git-cookbook

[1]: I wonder if the logs are available though? aux, blg etc. are important for debugging and shouldn't be dropped outright.


I don't understand either; I thought that all development effort went into LuaTeX now and that XeTeX had been obsolete for, like, 5-6 years.


> Also, `lualatex` does not have some of the limitations of `xelatex` (memory limitations, `contours` package, ...)

Are there some more details of the memory limitations you can share with us?


LuaLaTeX allocates memory as needed; see section 3.4.1 in the manual [0] (and comments/answers in this thread [1]). Base TeX has an arbitrary memory limit that is low by modern standards, leading to a whole class of errors plaguing unsuspecting users [2] and spawning entire extensions to deal with these limitations [3].

This is simply an artefact of times past and has no technical relevance nowadays. LuaTeX allows dynamic allocation, with the available system RAM as the upper limit (so effectively, no limitations in everyday usage).

Now, I could not find a mention of memory handling in the XeTeX reference manual [4]. People are using tricks like `tikzexternalize` with xelatex [5, 6]. Especially the first point makes me think XeLaTeX inherits base TeX memory handling/limits, but I cannot confirm this.

I just know that all my problems disappeared when switching from XeLaTeX to LuaLaTeX.

Lastly, see here [7] for a comprehensive (albeit somewhat anecdotal) list of advantages of LuaTeX over XeTeX. Of that list, `microtype` is another significant functionality I rely on.

[0]: http://www.tug.org/texlive//devsrc/Master/texmf-dist/doc/con...

[1]: https://tex.stackexchange.com/q/7953/

[2]: https://tex.stackexchange.com/search?q=tex+capacity+exceeded

[3]: https://tex.stackexchange.com/a/482560/

[4]: http://mirrors.ctan.org/info/xetexref/xetex-reference.pdf

[5]: https://tex.stackexchange.com/q/438131/

[6]: https://tex.stackexchange.com/q/334250/

[7]: https://tex.stackexchange.com/q/126206/


I’ve been a luatex advocate in the past¹, but I use xetex instead, unless I need the Lua integration. The memory handling is the reason. I find that for documents with a lot of fonts, luatex eats all the memory available and then crashes, taking a huge amount of time to do so, whereas xetex just breezes through the same document.

[1] https://lwn.net/Articles/731581/


What's troublesome for me is that I have been using

* xetex when I needed a font that was not easily achievable in pdftex over the past decade

* pdftex for everything else because microtype(TM) just works(TM) (even though kerning can be done using fontspec and font features in xetex).

I've tried luatex multiple times over the past decade, it was mostly just too slow. Now luatex is fast. But I have no idea if I now "should" use luatex over pdftex for best out-of-the-box results or not.

Unfortunately, switching to luatex is not zero-effort (moving to polyglossia, using fontspec, maybe removing some magic in long private templates, and so on).

For all I know, because I'm always curious and peek at PDF file properties as a hobby (if only to check which cool font that is), basically every scientific paper I read is set using pdftex. luatex usage in the wild is, as far as I perceive it, nil, outside of enthusiast luatex user spheres. I don't think this will change unless texlive drops pdftex (as it still ships ptex and even uptex, it probably won't for a very long time).


> basically every scientific paper I read is set using pdftex

That is only because their templates are years behind the curve and they are slow to update. It is not an argument for the advantages of pdftex, aside from its stability, gained over many decades.

LuaTeX has been nothing but stable for me, so from a technical standpoint, there is no reason not to switch.

As far as scientific papers go, the publishers and editors probably value stability and backward-compatibility (I would).


Officially, luatex is the future. ConTeXt is based on it. I’ve heard that the kinds of problems I’m having are caused by its font-loading routines, and not the core parts of luatex, but without further research that doesn’t really help me.


I'm not sure rewriting Latex in XYZ is a good idea. The whole thing is a mess and a pain to use (and I use it multiple times a week). Sure the documents often look good, but when they don't or there is an error, fixing it just becomes a nightmare. I wish somebody would design a modern typesetting system that would accept, say, latex equations but without 50 years of cruft and levels of macros upon macros.


People have tried. I knew someone who did a term paper on typesetting systems - wish I could find it - and the conclusion was basically "TeX sucks in a lot of ways, but the attempted replacements have always been worse." Knuth actually got a lot of things right and hence TeX's staying power.


So it's the academic scholar's equivalent of CMake?


Matthew Butterick is working on the Quad document processor, which is intended to "modernize the good ideas in LaTeX," but I'm unsure what its equation typesetting is like. It's written in Racket.

https://docs.racket-lang.org/quad/


The final note in the documentation on this software is a quote very fitting of a project re-thinking digital typesetting:

“A way of doing something original is by trying something so painstaking that nobody else has ever bothered with it.” — Brian Eno


Sile appears to still be under active development: https://github.com/sile-typesetter/sile


> I wish somebody would design a modern typesetting system...

In theory yes, but I think the problem is that it will never be adopted. Too much of academia is built on the deep assumption that LaTeX is the only possible format that people write documents in.

I think one possible way out is to start a new from-scratch TeX engine that is built in such a way that a new typesetting language (or "front end" if you like) can be added to it. This may lead to some reasonable slow migration strategies.


> In theory yes, but I think the problem is that it will never be adopted. Too much of academia is built on the deep assumption that LaTeX is the only possible format that people write documents in.

You're not wrong, but then again, this is not the biggest thing wrong with a lot of academic publishing, and it wouldn't be the biggest tower that a lot of academics are trying to topple. Indeed, breaking away from TeX and breaking away from the traditional journals and the toxic model around publishing in them might be happy partners.


I don’t like Word but it does accept Latex math and is easy to use.


Word does not use plain text input, and its output looks objectively terrible compared to LaTeX. These two things make it completely inadequate as a competitor.


You've been able to use plain text input for equations since Word 2007 at least: https://www.wikihow.com/Insert-Equations-in-Microsoft-Word


That's kinda like saying that anything typed on a keyboard is plain text. I'm referring to the document source code.


The port was done with the help of c2rust and subsequent refactoring. See more information at https://github.com/tectonic-typesetting/tectonic/issues/459


I really love Rust (I believe it's the best systems programming language out there), so I checked out tectonic a while ago. In many ways it represents precisely the direction I think TeX-based typesetting should go.

Sadly I can't use it though as it doesn't support biber which would be needed for Unicode author names: https://github.com/tectonic-typesetting/tectonic/issues/35


I'm no expert on typesetting systems, but I've noticed this on the homepage:

> Thanks to the power of XeTeX, Tectonic can use modern OpenType fonts and is fully Unicode-enabled.

...which seems to contradict your statement and the linked issue. Can you explain, please?


That refers to including Unicode in the document text directly, which does work, but for references, bibtex is part of the pipeline. You create a .bib file containing info about the references, and then use commands like \cite{} to cite one of them. If you have Unicode in any of your references (author name, paper name, etc), you have a problem. The solution is to replace bibtex with biblatex and biber, but that requires biber support which currently doesn't exist :(.

Example error that bibtex users can run into: https://superuser.com/questions/60432/unicode-characters-in-...


Crystal clear. Thanks!


Biber is a reference manager, so it's orthogonal to XeTeX or general Unicode support.


I wonder

Does this still depend, even indirectly, on the WEB files written by Knuth?

It has the strange license that says "Copying of this file is authorized only if (1) you are D. E. Knuth, or if % (2) you make absolutely no changes to your copy."

http://mirror.las.iastate.edu/tex-archive/systems/knuth/dist...


…Those version numbers… I love it!

    % Version 0 was released in September 1982 after it passed a variety of tests.
    % Version 1 was released in November 1983 after thorough testing.
    % Version 1.1 fixed ``disappearing font identifiers'' et alia (July 1984).
    % Version 1.2 allowed `0' in response to an error, et alia (October 1984).
    % Version 1.3 made memory allocation more flexible and local (November 1984).
    % Version 1.4 fixed accents right after line breaks, et alia (April 1985).
    % Version 1.5 fixed \the\toks after other expansion in \edefs (August 1985).
    % Version 2.0 (almost identical to 1.5) corresponds to "Volume B" (April 1986).
    % Version 2.1 corrected anomalies in discretionary breaks (January 1987).
    % Version 2.2 corrected "(Please type...)" with null \endlinechar (April 1987).
    % Version 2.3 avoided incomplete page in premature termination (August 1987).
    % Version 2.4 fixed \noaligned rules in indented displays (August 1987).
    % Version 2.5 saved cur_order when expanding tokens (September 1987).
    % Version 2.6 added 10sp slop when shipping leaders (November 1987).
    % Version 2.7 improved rounding of negative-width characters (November 1987).
    % Version 2.8 fixed weird bug if no \patterns are used (December 1987).
    % Version 2.9 made \csname\endcsname's "relax" local (December 1987).
    % Version 2.91 fixed \outer\def\a0{}\a\a bug (April 1988).
    % Version 2.92 fixed \patterns, also file names with complex macros (May 1988).
    % Version 2.93 fixed negative halving in allocator when mem_min<0 (June 1988).
    % Version 2.94 kept open_log_file from calling fatal_error (November 1988).
    % Version 2.95 solved that problem a better way (December 1988).
    % Version 2.96 corrected bug in "Infinite shrinkage" recovery (January 1989).
    % Version 2.97 corrected blunder in creating 2.95 (February 1989).
    % Version 2.98 omitted save_for_after at outer level (March 1989).
    % Version 2.99 caught $$\begingroup\halign..$$ (June 1989).
    % Version 2.991 caught .5\ifdim.6... (June 1989).
    % Version 2.992 introduced major changes for 8-bit extensions (September 1989).
    % Version 2.993 fixed a save_stack synchronization bug et alia (December 1989).
    % Version 3.0 fixed unusual displays; was more \output robust (March 1990).
    % Version 3.1 fixed nullfont, disabled \write{\the\prevgraf} (September 1990).
    % Version 3.14 fixed unprintable font names and corrected typos (March 1991).
    % Version 3.141 more of same; reconstituted ligatures better (March 1992).
    % Version 3.1415 preserved nonexplicit kerns, tidied up (February 1993).
    % Version 3.14159 allowed fontmemsize to change; bulletproofing (March 1995).
    % Version 3.141592 fixed \xleaders, glueset, weird alignments (December 2002).
    % Version 3.1415926 was a general cleanup with minor fixes (February 2008).
    % Version 3.14159265 was similar (January 2014).


For those unfamiliar, Knuth's versioning scheme approaches pi and therefore never reaches 4.0 [0]

> In his seminal text layout system, TeX, and his equally brilliant typeface design system, METAFONT, Donald uses a versioning number system that asymptotically approaches perfection. The version numbers of TeX approach π (the current version is 3.14159265) and the version numbers of METAFONT approach e.

[0] http://sentimentalversioning.org/


You’re leaving out the best part! When he passes (hopefully not anytime soon[a]) the version number of TeX will become pi. Any and all bugs will become “features” at that point. Same with METAFONT; it’ll become e.

[a]: I’d love to get a full and complete set of The Art of Computer Programming (all volumes) written first (on his own time, of course), but I doubt that’ll actually happen. According to Wikipedia, there are supposed to be 7 volumes, and he’s still working on the second half(?) of the 4th.


Someone may change his name to D. E. Knuth and continue to improve the code. The license allows that.


Why link to a fork instead of the main project? https://github.com/tectonic-typesetting/tectonic


The reason is that the main project is written in C whereas this fork uses Rust instead as explained in https://github.com/tectonic-typesetting/tectonic/issues/459


How much of that is auto-generated and pre-existing C code though? My dusty recollection from a while ago is that most of Tectonic's development is in Rust. It looks like the fork is trying to get rid of all the remaining C code?


Here are the percentages of different languages in the original tectonic repo: C 80.5%, TeX 7.8%, Rust 7.5%, C++ 4.0%, Other 0.2%

And here is the forked version: Rust 85.2%, TeX 7.3%, C 6.2%, C++ 1.2%, Shell 0.1%

So it looks like about 80% more Rust in this one (auto-generated and refactored).


That fork is quite curious, it seems like they've really gone out and created quite an active little fork, but haven't spent the time to explain what is different and interesting about their fork in the readme.


The about summarizes it:

> Experimental Oxidization of Tectonic the TeX/LaTeX engine.


Does "oxidization" mean "a port to the rust language" ?


Yes, "oxidation" as in "rusting" as in "converting to rust". It's a bit of fun wordplay. :)


...but it's forked from an existing Rust port of Latex.

That's the bit that intrigues me; what is the difference between this port and the original?


It seems to me that upstream is C. https://github.com/tectonic-typesetting/tectonic/



It seems you're linking to wrappers to the underlying https://github.com/tectonic-typesetting/tectonic/tree/master...


As I understand/recall, the intention was always a Rust rewrite; the starting point was 'let's take all the C and wrap it in `unsafe`'.

The issue linked up-thread introduces this fork (~ a year ago) titling it '[convert to rust] everything' (emphasis mine), and going on to talk about 'pure rust' (again).

I think the main repo's issue at least would've been a clearer submission than the fork; I initially had the same reaction as above.


This “x implemented in y” is a curious thing we are seeing with growing frequency nowadays (imo). On the one hand, good for these folks, to be rewriting something old in a new way (be it language, library, or framework). On the other hand, could this be an indicator that we’ve reached a certain level where it is easier to rehash the ideas of the past than create new work or invent new ideas or applications? Maybe someone with a more classical education can comment on how this manifests in the arts?


Yeah, roughly something like doing cover versions of older songs but with modern, state of the art music arrangement. From what I know people do seem to love that.


The best way to learn a language is to port an existing tool to that language. Whether the rewrite is better depends on how you measure 'better', but what's certain is that someone gets to put 'I know Rust' on their CV.


What’s next is obviously JS or even WASM to typeset on the client side! /s


That should be fairly easy in Rust.


The title is a bit misleading.

LaTeX is a macro package written using the TeX macro system, so I doubt that LaTeX itself is being rewritten in Rust. TeX, created by Donald Knuth, is now over 40 years old. It has a number of remarkable characteristics:

* TeX is essentially bug free. It can be a challenging program to use, but in my experience I was always consoled by the knowledge that if something wasn't working it was my fault (in contrast to my use of other document processing systems and word processors). See [1].

* The TeX program is extraordinarily well documented. Knuth has an entire book which is the literate program that comprises TeX. See [2].

* TeX was originally written in a version of Pascal; the source code was written in a literate style and processed using tools that Knuth wrote (Tangle and Weave). This was later machine-translated into C.

* TeX hasn't changed. It reached version 3 in 1989 and Knuth essentially froze it at this point. One of its great values is that documents written in TeX will always produce the same output, whether written last century, this century, or next century. We will always be able to see what the original author wanted.

* TeX is unusually capable. Knuth is obviously a perfectionist. Unhappy with the state of digital typography when he wrote TeX, he designed an entire toolchain for creating fonts called MetaFont. I used it to design my company's first unique logo back in the 90's. (I only needed five characters.) See [3]. Knuth did this before PostScript was invented.

* TeX is public domain. Before open source was a thing, Knuth put TeX in the public domain.

* TeX is the Lingua Franca used by Mathematicians, Physicists, Computer Scientists and others for books, academic papers, and even presentations; nothing else comes close.

* TeX has a number of widely used forks, notably XeTeX and LuaTeX. XeTeX added Unicode support and LuaTeX is an attempt to make TeX and its companion LaTeX more easily programmable. See [4] and [5].

* TeX has a powerful macro system. This is how packages are added to TeX. The most widely used such package is LaTeX itself. LaTeX is not written in C or Pascal, it is written as a complex set of macros.

* Tex's macro system is great for its intended purpose, but not so great for how it is currently used. Knuth originally used the macros to define layouts of books and papers. Things like line spacing and font choice for page numbers and such can be easily encapsulated as TeX macros. TeX's macro system is powerful enough that even complex packages like LaTeX could be written in it. Today, there are packages like TikZ/Pgf that are very sophisticated and complex programs written entirely in TeX macros. TikZ/Pgf is a powerful graphics, drawing, and even animation system. Its manual is 1271 pages long. See [6]

Where does this leave the originally linked project called Tectonic? It's not obvious from the github repository, but the user web site is a bit more informative, see [7]. This "rewrite" seems to rely on XeTeX and provides some command line operations that streamline the invocation of XeTeX. I don't understand its value proposition.

For me, what's really needed is a better language than TeX's macro system for writing extensions. A few years ago I was writing TeX code (macros) to format spine labels for the books in my personal library. I used the number of pages in the book to adjust the size and layout of the call number and 2D barcode printed on the label. All of this was quite challenging in TeX's macro system, so I wrote a simple Python program that generated the LaTeX output that resulted in a page of labels that were easy to print. Maybe I could have done it in LuaTeX using Lua, but at the time LuaTeX was so poorly documented (how ironic) that I couldn't figure out how to do it.
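
For what it's worth, here's roughly what that kind of generator looks like (a simplified, hypothetical sketch in Rust rather than the Python I actually used; the real labels also carried a 2D barcode):

    use std::fs::File;
    use std::io::{BufWriter, Write};

    // Hypothetical stand-in for the real label data.
    struct Book<'a> {
        call_number: &'a str,
        pages: u32,
    }

    fn main() -> std::io::Result<()> {
        let books = [
            Book { call_number: "QA76.73.R87", pages: 560 },
            Book { call_number: "Z253.4.T47", pages: 240 },
        ];

        let mut out = BufWriter::new(File::create("labels.tex")?);
        writeln!(out, "\\documentclass{{article}}")?;
        writeln!(out, "\\begin{{document}}")?;
        for book in &books {
            // Pick a size from the page count (i.e. spine width) -- exactly the
            // kind of arithmetic and branching that is painful in TeX macros.
            let size = if book.pages > 400 { "\\large" } else { "\\small" };
            writeln!(out, "\\noindent{{{size} {}}}\\par", book.call_number)?;
        }
        writeln!(out, "\\end{{document}}")?;
        Ok(())
    }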

What's needed is the adoption of some programming language that could guarantee backward compatibility indefinitely, and a software package written in it that is compatible with LaTeX2e or LaTeX3 and could replace them. This language should not be TeX's macro system. The right language, perhaps something like ML or Scheme, would make programmed typesetting so much easier.

[1] https://onlinelibrary.wiley.com/doi/abs/10.1002/spe.43801907..., behind a paywall but copies are easily available.

[2] https://www-cs-faculty.stanford.edu/~knuth/abcde.html

[3] https://en.wikipedia.org/wiki/Metafont

[4] https://en.wikipedia.org/wiki/XeTeX

[5] https://en.wikipedia.org/wiki/LuaTeX

[6] http://mirrors.ibiblio.org/CTAN/graphics/pgf/base/doc/pgfman...

[7] https://tectonic-typesetting.github.io/en-US/

See Also:

* Wikipedia entry for TeX: https://en.wikipedia.org/wiki/TeX

* Wikipedia entry for MetaFont: https://en.wikipedia.org/wiki/Metafont

* Wikipedia entry for LaTeX: https://en.wikipedia.org/wiki/LaTeX


Currently using this to build my Master's thesis. It's nice because it detects all the packages used in the tex source and downloads them automatically.


What do you need that isn't there already with TeXLive? I haven't downloaded a package in many years, despite many reinstalls and a lot of varied Latex use.


You are not having any issues because TeXLive is everything. The `full` scheme in `install-tl`, or `texlive-full` on Debian, takes up around 7 GB last time I looked, 2 to 3 of which are PDFs for package documentation.

That is, by a long shot, the largest install a lot of people have on their system. Nowadays, no biggie, but systems like MiKTeX allow installing packages on demand, which of course lets you shrink your installation significantly.

Full TeXLive contains a number of packages that the average "Western hemisphere scientific author", arguably the largest LaTeX demographic, will never need: Eastern and other languages, music, humanities.


You don't need to install the bits you don't want though, like languages you don't know, documentation (online is easier), music, and humanities. My installs are a bit over 2GB, which seems comparable to a minimal Android Studio install (and a maximal TeXLive install is much less than a large Android development install, if I'm reading things right), and full TeXLive is less than full Qt.

I use the installer from TeXLive itself rather than the distro packages, as you have more control that way. Not only that, you can share everything but binaries between multiple OSes, like Windows and Linux (via WSL).


That means spending time to figure out what you don't need. It's far better to just spend those extra 5 gigs than waste time on this.


Your time must be pretty valuable! It only takes a minute or two, as it's pretty clear which of the ~30 things to deselect.


Unless you deselect too many, in which case you have to go back and try again until you have all the packages you used. It's not much time, but it counts to me.


You don't play very many new games do you?


Being on HN and talking about TeXLive, I was thinking in terms of Linux, for which those 90GB blockbuster games don't exist (?). So I did a mental jump there, because to me TeXLive is the "native" Linux TeX distribution, and Miktex the "native" Windows distribution.


> I was thinking in terms of Linux, for which those 90GB blockbuster games don't exist (?)

https://www.protondb.com/


TeXLive has to me been the obvious choice on Windows for years. Miktex always seemed to be fussy and missing things the few times I tried to use it.


I also switched to TeXLive on Windows recently, because lualatex runs significantly faster there.


XCom 2 runs natively on Linux (Thanks Feral!). Needs > 70GB of disk space with all extensions installed.


I use inotify-wait + tectonic + a PDF viewer instead of a TeX editing tool.

Simple and I can use my own editor.


That sounds like an absolute nightmare.


texliveonfly can do this, too.


Previous discussion:

1 year ago: https://news.ycombinator.com/item?id=21172964

3.5 years ago: https://news.ycombinator.com/item?id=14450448

The project seems pretty active, though, so some of that may now be irrelevant.


Why might it be irrelevant that the project is active? I'm not sure I understood the point behind your last sentence. Can you explain, please?


I think the intent is: the previous discussions were some time ago and may be out of date, so those links may not be relevant.


"That" refers to the previous discussion just mentioned, and "now" confirms it by contrasting with 1 or 3.5 years ago. The point is that it is possible some of the criticisms and problems discussed back then could have been fixed since then (though I don't know, hence "may").


Please do not downvote questions like this (the comment is faded out as I'm writing this). It's a genuine question from a new user, possibly a non-native English speaker, trying to understand someone's comment.


Thank you for the support. Yes, it was a genuine question. I am not a native English speaker and was trying to understand the comment.


Probably more outdated than irrelevant.


Getting LaTeX running through wasm would be great. Tools like this, which mostly just interact with the filesystem, are a great example of what non-web wasm can do by simplifying the distribution process.


You're in luck (if a year behind :-) https://news.ycombinator.com/item?id=21710105


While the name “Rewriting LaTeX in Pure Rust” is very exciting, there are two problems:

• “LaTeX” is a set of macros (ab)using the macro/text-expansion feature of an underlying TeX engine (LuaTeX, XeTeX, pdfTeX, original Knuth TeX, or this one), to provide things like cross-references, section numbering, etc. When you as a typical user use “LaTeX”, you're actually using these macros, along with a huge variety of macro “packages” written by many authors, that together add up to orders of magnitude more lines of code than the TeX engine itself (about 25000 lines originally: https://tex.stackexchange.com/a/505664). Most of the incompatibilities and error messages a typical user encounters when using LaTeX are caused not by the TeX engine but by these packages or LaTeX, which this project doesn't touch.

• This project comes about as:

•— tex.web, Knuth's original source code, written in his literate-programming system WEB (a preprocessor on top of Pascal),

•— extended to xetex.web (XeTeX), to provide Unicode support etc,

•— automatically tangled to Pascal code (macros expanded and constants replaced),

•— automatically translated to C code,

•— now automatically translated to Rust code, by wrapping the C code in `unsafe` Rust blocks.

So it has gone through multiple rounds of machine translation. Actually, I'm impressed that this project has made many improvements in undoing and restoring some of the WEB macros, since the last time this was mentioned here. (Compare a random comparison from 1 year ago: https://gist.github.com/627399d0150e66d211a264bc05b33beb with the same comparison today: https://gist.github.com/71e47f2276f9c6c030efe0e3357ef3bf) — two of the four differences I had pointed out in my comment then (https://news.ycombinator.com/item?id=21177367) are gone, and the only ones to mention are:

• The comments from the original are gone.

• Something like “cur_if:=subtype(p);” becomes “cur_if = MEM[p].b16.s0 as i16;”
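
To make the gap concrete, here is a purely hypothetical contrast (simplified, invented types, not the actual xetex/tectonic data layout) between the machine-translated style of field access and what a hand-written "idiomatic" engine might do instead:

    // The style the mechanical translation produces: one big array of untyped
    // "memory words", with field access by offset and cast.
    #[derive(Clone, Copy, Default)]
    struct B16 { s0: u16, s1: u16 }

    #[derive(Clone, Copy, Default)]
    struct MemoryWord { b16: B16 }

    fn cur_if_translated(mem: &[MemoryWord], p: usize) -> i16 {
        mem[p].b16.s0 as i16 // the reader must already know what s0 "means" here
    }

    // The style a from-scratch rewrite could use: the meaning lives in the types.
    #[derive(Clone, Copy, Debug)]
    enum IfKind { Char, Cat, Num, Dim, Odd }

    struct IfNode { kind: IfKind }

    fn cur_if_idiomatic(node: &IfNode) -> IfKind {
        node.kind
    }

    fn main() {
        let mem = vec![MemoryWord::default(); 4];
        let node = IfNode { kind: IfKind::Odd };
        println!("{} {:?}", cur_if_translated(&mem, 2), cur_if_idiomatic(&node));
    }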

So there's still a long way to go. But even with all that, what would be more useful IMO are (in order of difficulty):

• Translate directly from WEB (xetex.web) to Rust. The original uses only a subset of Pascal so IMO it should be possible to write an automatic translator that preserves more of the original. Something along these lines has been done for translating WEB to CWEB by Martin Ruckert (see https://w3-o.cs.hm.edu/users/ruckert/public_html/web2w/index... and http://tug.org/TUGboat/Contents/listauthor.html#Ruckert,Mart... and his demo from TUG 2020). Alternatively, one could give up on XeTeX and start with the LuaTeX source code (in C).

• Understand how the TeX program is written/works, and write a more idiomatic translation in Rust: something along these lines is in progress as a one-person hobby project by Emily Eisenberg (one of the original developers of KaTeX): see https://github.com/xymostech/XymosTeX This one is most exciting to me, as I also have (not very useful so far) grand plans to understand TeX and to make it more understandable for everybody (https://shreevatsa.net/tex/program).

• As most of the code a user deals with is in the macro/LaTeX layer, modify TeX macro expansion to make it more “debuggable” (https://tex.stackexchange.com/a/384881, https://cstheory.stackexchange.com/a/40282): show full stack traces beyond what one gets with \errorcontextlines=\maxdimen (so file names and line numbers, by tracking the provenance of token lists), show arguments scanned (“passed in”) and expected/found types thereof, cleanly separate the macro expansion layer from the typesetting (line-breaking / page-breaking) layer, maybe even (JIT?) “compile” some of the lower-level or frequently-used macros — the eventual goal would be for the user to completely understand what is going on when they compile their LaTeX document, so that (1) they are less surprised by errors and know what to do, (2) they may be induced (if they're programmers) to do less in the (unsuitable) macro layer of the TeX engine and do more at the appropriate layer: either do it in a preprocessor that runs on the .tex file, or do it at the typesetting layer via something like LuaTeX's hooks (pre_linebreak_filter etc).

Anyway, this project could still end up getting there, so let's see! Interacting with the TeX community (the TeX StackExchange, mailing lists, write an article for TUGboat, etc) might also be useful.


But what is the license? Checking the repo, I see the license is MIT, but I'm not quite satisfied with it. I mean, does LaTeX permit rewrites like this?


The project appears to be based on XeTeX, which is covered by the MIT license.

I don't know the full history of XeTeX, but if it was indeed a re-write of TeX/LaTeX from the ground up (and not based on the TeX source code), then the original TeX/LaTeX license wouldn't necessarily apply. But if it borrowed actual source code, templates, etc., from other projects under different licenses, then it probably gets more complicated.


I think that Rust is a perfect open source language. For the longest time the open source community has shown that it has an issue with innovation; however, it is great at recreating existing ideas and products. I think that this strength can be combined somehow with the "rewrite in Rust" trend and get great synergy.


> For the longest time the open source community has shown that it has an issue with innovation

How so? A bunch of my favorite software has no real closed-source equivalent, sometimes precisely because it’s open source.


The vast majority of open source software is a clone of a closed-source alternative. I'm currently having trouble thinking of widely adopted, consumer-oriented software that has no closed-source alternative. What is your favorite software that has no real closed-source equivalent?


I feel you are shifting the goalposts by adding “consumer-oriented”. There are plenty of areas where open source is what advances the state of the art, e.g. programming languages (almost all are open source), compilers (LLVM and GCC are where a lot of innovation happens), GIS databases (PostGIS is the leader, the commercial ones follow), and operating system kernels, where Linux is probably the most advanced while innovation at the same time happens in experimental kernels which are also open source. Plus some of the most complex beasts in software, web browsers, are all open source. Your snark against open source feels so 2005.

Open source dominates quite many areas but has admittedly some issues producing consumer facing products (but there are exceptions like Blender and vlc).


> has admittedly some issues producing consumer facing products

Not just those. The open source community has issues with complex software that needs a substantial amount of R&D, UI/UX design, manual QA, hardware design, or pretty much anything else that's not coding.

Multimedia codecs aren't user facing, yet only PNG was invented as open source; the rest of the widespread ones (GIF, JPEG, MP3, MPEG-2, H.264, H.265, AAC, etc.) were made by companies. Open source developers made Vorbis, Theora and others, but they were not good enough.


TeX... ?


QEMU, maybe?


How is this widely adopted? Think on the scale of Linux or GIMP.


In addition to being useful in its own right, QEMU powers or has powered a number of virtualization projects, including VirtualBox's software-based virtualization, parts of Xen, Android Studio's emulator, …


we are literally discussing LaTeX which definitely has no relevant closed-source equivalent.


vim?


OTOH, there's the hypothesis that people who're in it primarily to advance their programming language or other tool are bound to never advance the state of the art, or really create something original. I think this phenomenon is cast well in terms of the Keirsey temperaments [1], used to this day by partnership agencies. In particular, the different characterization of The Engineer vs The Operator (or The Architect and the Artisan) comes to mind.

There are obviously other psychological and sociological phenomena associated with adolescence, such as peer group pressure [2] and self-righteousness [3], at play as well.

It's nothing new; the "100% Pure Rust" rhetoric is strikingly similar to 90s/00s Java developer attitudes.

[1]: https://en.m.wikipedia.org/wiki/Keirsey_Temperament_Sorter

[2]: https://en.m.wikipedia.org/wiki/Peer_pressure

[3]: https://en.m.wikipedia.org/wiki/Self-righteousness


Will the project offer a matching program to https://en.wikipedia.org/wiki/Knuth_reward_check? :)


Could this unify LaTeX and MathJax? I feel like any rewrite of LaTeX that could address portability would be a huge plus.


Would you care to elaborate on the point of your suggested integration? Are you (seriously) proposing the inclusion of a JavaScript backend into a *TeX distribution?


\insteadof re-write \latex, it \might be \more \interesting to \change this \backslash \filled \syntax.


Does this mean that at some point maybe it'll be possible to just drop it into webassembly?


Is the rendering speed better, or probably not?


[flagged]


>Does anyone know if the Rust community has ever done anything original, though?

Yes, they made an original, better, programming language.


> Does anyone know if the Rust community has ever done anything original, though?

Depends on what you call "original". Is the Servo layout engine _not_ original because layout engines already exist? :)

Moving away from the rhetorical question, I think there are several reasons to rewrite something in Rust, the first and foremost being the huge improvement in memory safety/management. The other reasons are typically related to speed: C wasn't built with concurrency in mind, and trying to develop multi-threaded C applications that run in a cross-platform manner is not straightforward. On the other hand, Rust has the following:

1) Memory safety during concurrency [1] (see the small sketch at the end of this comment).

2) Multi-threading in the standard library [1].

3) An amazing package ecosystem revolving around "crates", several of which are built specifically to make concurrency safe and easy to implement. [2]

[1] https://doc.rust-lang.org/stable/book/ch16-00-concurrency.ht...

[2] https://crates.io/categories/concurrency
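
As a tiny, hypothetical illustration of points 1 and 2 (standard-library threads, with the compiler forcing shared mutable data behind Arc/Mutex):

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        // Shared counter: Arc for shared ownership across threads,
        // Mutex so mutation is provably synchronized.
        let counter = Arc::new(Mutex::new(0_u64));

        let handles: Vec<_> = (0..4)
            .map(|_| {
                let counter = Arc::clone(&counter);
                thread::spawn(move || {
                    for _ in 0..1_000 {
                        *counter.lock().unwrap() += 1;
                    }
                })
            })
            .collect();

        for handle in handles {
            handle.join().unwrap();
        }

        // Forgetting the Mutex (or the Arc) here would be a compile error,
        // not a silent data race.
        let total = *counter.lock().unwrap();
        println!("{total}"); // 4000
    }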


Rust seems to be a lot safer than C or C++ but I don't really understand why a documentation system must be written in a low level language.

Why not use C#, Java, Go or something similar? Most users don't really need extreme performance.


I'm quite sure anybody who has written substantial documents with *TeX disagrees that "extreme performance" isn't needed.


The average document size hasn't increased in the last 30 years, though, has it? But hardware is several orders of magnitude faster.


I guess we're doing more with graphics, which tend to require more fiddling back and forth to make it look alright, as well as having much higher expectations about interactivity than in the past.


In TeX? My impression was that it's still mostly used for papers, and the output is still normally static DVI/PS/PDF.


LaTeX is a typesetting system, not a documentation system. Doing typesetting of large documents is performance critical.


If typesetting is performance critical, what would count as a non-performance critical program?


Wow, this looks like a pretty impressive rewrite!



