Fast Typesetting with Incremental Compilation [pdf]

bradrn · on Nov 18, 2022

I find it interesting that, for every page in this PDF which includes an image, that page is rendered as a raster image itself, including the text, which then becomes unselectable (e.g. pages 12, 14, 15). This seems like a pretty severe bug in Typst to me.

reknih · on Nov 19, 2022

Author here: this is the fault of Adobe Acrobat unfortunately. I converted the thesis to PDF/A using that software and redacted my student registration number. Acrobat apparently did not like my file, therefore it converted many pages to images.

bradrn · on Nov 19, 2022

Thanks so much for explaining! I was wondering what kind of bug could possibly cause this… in any case, it’s nice to know it’s not Typst’s fault. Pity I can’t edit my post any more.

reknih · on Nov 19, 2022

No worries!

tln · on Nov 19, 2022

Agreed, that's not a feature!

It's not every page with an image; e.g. pages 1, 5, and 11 are set as text.

Interestingly, the rasterized text seems to use Windows-style subpixel anti-aliasing, with red and blue fringes on the text.

orangepanda · on Nov 19, 2022

Interesting. Page 5 also includes an image but text is selectable there.

bradrn · on Nov 19, 2022

I think the diagram on Page 5 is created within Typst itself, rather than being included from elsewhere — note that it’s a vector rather than raster image.

rustEU · on Nov 18, 2022

Works for me

bradrn · on Nov 19, 2022

Weird… I tried it in Firefox, SumatraPDF and Acrobat Reader, and it worked in none of them for me. (In case you didn’t notice, it’s only specific pages which have the bug, namely those with raster images.)

codetrotter · on Nov 19, 2022

Keep in mind that some PDF readers are able to OCR text in images and thus make even bitmapped text selectable and copyable. It’s possible that this is why the parent of your comment was able to do so.

bmc7505 · on Nov 18, 2022

https://typst.app/

ivan_ah · on Nov 19, 2022

^ This is the software described in the thesis

Very impressive. Both the markup language and the performance.

a9h74j · on Nov 18, 2022

> (p.7) This thesis was written in Typst.

Looks good.

> The Typst language is Turing-complete, dynamically typed, and not macro-based. It provides capabilities for user-defined, possibly variadic functions, variables, arrays, dictionaries (hash maps), and a module system. A Typst file specifies a document that the compiler should set. Typst combines typical markup and scripting/programming elements into a single context-sensitive grammar.

JadeNB · on Nov 18, 2022

> The Typst language is Turing-complete, dynamically typed, and not macro-based.

Knuth did a lot right, and some things wrong, when designing TeX, but I think he has said that he regrets allowing Guy Steele to convince him to make TeX Turing complete, and that makes sense to me. Sometimes I want lots of computing power when typesetting a document, but more often I want to know that I can TeX a document sent to me without worrying that I will be opening myself up to unforeseen exploits. There's a reason that good security involves turning off VBA macros in Office documents from an external source!

tiagod · on Nov 19, 2022

Something being Turing complete doesn't mean much. VBA isn't dangerous because it's Turing complete, it's dangerous because it has APIs that allow it direct access to the host system. You could perfectly well design a system that is not Turing complete that would still expose the same kind of vulnerabilities.

JadeNB · on Nov 19, 2022

> Something being Turing complete doesn't mean much. VBA isn't dangerous because it's Turing complete, it's dangerous because it has APIs that allow it direct access to the host system. You could perfectly well design a system that is not Turing complete that would still expose the same kind of vulnerabilities.

You are right that I was emphasising the wrong part of it, in the sense that Turing completeness and security are partially orthogonal: you can be highly non-Turing complete and totally insecure. But you can not be Turing complete and totally secure (essentially Rice's theorem), and that's the point that I meant to make.

My post also may have misrepresented Knuth's concern (sadly, he didn't explain it to me personally, so I don't know). A more practical concern with a Turing-complete markup language, even on an airgapped machine whose security is of less significance, is that you can't guarantee halting, and it doesn't matter how fast your markup language compiles if you feed it a document on which it doesn't halt ….

zokier · on Nov 19, 2022

> But you can not be Turing complete and totally secure (essentially Rice's theorem), and that's the point that I meant to make.

What do you mean by "secure" here?

> A more practical concern with a Turing-complete markup language, even on an airgapped machine whose security is of less significance, is that you can't guarantee halting, and it doesn't matter how fast your markup language compiles if you feed it a document on which it doesn't halt ….

In practice that is less of a problem; you just run the program with some resource limits and a timeout, which admittedly turns the system into a finite-state machine from a purely theoretical viewpoint.
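
As a rough illustration of the "resource limits and timeout" approach, here is a minimal Python sketch for Unix-like systems. The helper name `run_limited` and the default limits are assumptions made for this example; nothing here is taken from the thesis or from Typst.

```python
import resource
import subprocess
import sys


def run_limited(cmd, wall_seconds=5.0, cpu_seconds=1):
    """Run cmd under a CPU-time limit and a wall-clock timeout (Unix only).

    Returns the exit code, or None if the wall-clock timeout expired first.
    A negative exit code means the child was killed by a signal.
    """

    def apply_limits():
        # Runs in the child just before exec. Lower the soft CPU limit and
        # keep the existing hard limit; the kernel sends SIGXCPU once the
        # soft limit is exceeded, which terminates the child by default.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        if hard == resource.RLIM_INFINITY:
            soft = cpu_seconds
        else:
            soft = min(cpu_seconds, hard)
        resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))

    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=wall_seconds,
            preexec_fn=apply_limits,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child itself when the timeout expires.
        return None
    return proc.returncode
```

Calling `run_limited([sys.executable, "-c", "while True: pass"])` should come back after roughly one second of CPU time instead of hanging. A real deployment would probably also cap memory and output size, which this sketch leaves out.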

JadeNB · on Nov 19, 2022

> > But you can not be Turing complete and totally secure (essentially Rice's theorem), and that's the point that I meant to make.

> What do you mean by "secure" here?

Whatever it means to you, literally—you can't decide any non-trivial property of a document in a Turing-complete markup language.

zokier · on Nov 19, 2022

The point I was trying to make is that the types of attacks that can be made purely by computation are very limited, and generally fall in the category of excessive resource consumption, which is something we have fairly decent tools to manage. Or put another way, no amount of computation is going to get a program access to my emails or install a rootkit (or whatever) if the runtime does not provide APIs to do so.

JadeNB · on Nov 21, 2022

> The point I was trying to make is that the types of attacks that can be made purely by computation are very limited, and generally fall in the category of excessive resource consumption, which is something we have fairly decent tools to manage. Or put another way, no amount of computation is going to get a program access to my emails or install a rootkit (or whatever) if the runtime does not provide APIs to do so.

My point in turn was that it doesn't matter what APIs the runtime intends to expose, as long as it can be abused accidentally to allow that behaviour. But I guess that, if one doesn't trust the API, then an intended guarantee of circumscribed behaviour by the program can't be relied upon either.

(But I think that resource consumption shouldn't be underestimated as an attack! Of course, as you say, it can be mitigated by imposing artificial resource limitations, but, as you said earlier, that's not really a solution so much as a renunciation of Turing completeness.)

zokier · on Nov 22, 2022

> as long as it can be abused accidentally to allow that behaviour

That is a big if there. A trivial example: the canonical brainfuck runtime provides one input and one output stream for programs; you can easily say with confidence that no brainfuck program is going to open network connections, simply because the runtime has no facilities to do so.
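
To make the brainfuck example concrete, here is a minimal interpreter sketch in Python. The name `run_brainfuck`, the step budget, the wrap-around tape, and treating end of input as zero are all choices made for this sketch, not part of any canonical specification. The point is that the only things a program can touch are the tape, the input bytes, and the output buffer.

```python
def run_brainfuck(program, input_data=b"", max_steps=100_000):
    """Interpret a brainfuck program; returns the bytes it wrote.

    The program's only capabilities are reading input_data and appending
    to the output buffer. max_steps bounds the running time.
    """
    # Pre-compute matching bracket positions.
    jump = {}
    stack = []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            if not stack:
                raise ValueError("unmatched ']'")
            start = stack.pop()
            jump[start] = pos
            jump[pos] = start
    if stack:
        raise ValueError("unmatched '['")

    tape = [0] * 30_000
    ptr = 0      # data pointer
    pc = 0       # program counter
    in_pos = 0   # next unread input byte
    out = bytearray()
    steps = 0

    while pc < len(program):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("step budget exhausted")
        ch = program[pc]
        if ch == ">":
            ptr = (ptr + 1) % len(tape)
        elif ch == "<":
            ptr = (ptr - 1) % len(tape)
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(tape[ptr])
        elif ch == ",":
            if in_pos < len(input_data):
                tape[ptr] = input_data[in_pos]
                in_pos += 1
            else:
                tape[ptr] = 0  # end of input reads as zero in this sketch
        elif ch == "[" and tape[ptr] == 0:
            pc = jump[pc]  # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jump[pc]  # go back to the loop start
        pc += 1  # any other character is ignored as a comment
    return bytes(out)
```

For instance, `run_brainfuck("++++++++[>++++++++<-]>+.")` returns `b"A"`, and a non-terminating program such as `"+[]"` raises `RuntimeError` once the step budget runs out rather than looping forever.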

azalemeth · on Nov 19, 2022

A very large number of academic publishing sites execute arbitrary (La)TeX as part of their online submission process. The general consensus is that it's safe if you disable a number of primitive commands that give broader OS API access, such as "write18" [1], a TeX primitive that gives you shell access. The good thing about pdf(La)TeX is that it's so old that it's pretty damn battle-hardened and I'm unaware of any recent exploits that use it. If you blow up the stack, it notes that you've done so, and dies. I think it has canaries built in, too, but I might be wrong.
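
A hedged sketch of what such an invocation might look like from Python, assuming a TeX Live style `pdflatex` on the PATH. The helper names are made up for this example, and a real submission system would almost certainly add further sandboxing on top of these flags.

```python
import subprocess


def build_pdflatex_command(tex_file, output_dir):
    """Build a pdflatex invocation with shell escape turned off."""
    return [
        "pdflatex",
        "-no-shell-escape",          # disable \write18 shell access
        "-interaction=nonstopmode",  # never wait for input at an error prompt
        "-halt-on-error",            # stop at the first error
        f"-output-directory={output_dir}",
        tex_file,
    ]


def compile_untrusted(tex_file, output_dir, timeout_seconds=30):
    """Compile a submitted file; returns True if pdflatex exited cleanly."""
    try:
        result = subprocess.run(
            build_pdflatex_command(tex_file, output_dir),
            capture_output=True,
            timeout=timeout_seconds,  # guards against documents that never halt
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

The timeout covers the non-halting case discussed elsewhere in the thread; the flags only address shell access and interactive prompts.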

[1] https://tex.stackexchange.com/questions/20444/what-are-immed...

a9h74j · on Nov 18, 2022

Very good point. From the website it appears their editing/live-view environment will be browser-based, so that might help.

rustEU · on Nov 18, 2022

Sounds like a good alternative to Overleaf. I think the future is Jupyter-style notebooks with formatting and a rich text mode. However, the developer should look into Python interop, at least for the computed expressions part. It would be awesome to generate rich text documents programmatically!

PaulHoule · on Nov 18, 2022

Not kidding about fast... If I read it right, it takes less than 2 ms to parse and lay out a 20-page document!

jahewson · on Nov 19, 2022

I think you’re reading it wrong. Figure 17 says it’s 2.9 seconds, not milliseconds. Slower than pdfTeX which was 2.2 sec. But their incremental mode is faster at 0.6 sec.
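
For a quick sense of the ratios, here is the arithmetic on the three timings quoted in this comment. The numbers are as stated here and have not been re-measured; the variable names are just labels for this sketch.

```python
# Timings in seconds, as quoted from Figure 17 in the comment above.
typst_full = 2.9
pdftex = 2.2
typst_incremental = 0.6


def ratio(slower, faster):
    """How many times longer the slower run takes than the faster one."""
    return slower / faster


print(f"Typst full vs pdfTeX:        {ratio(typst_full, pdftex):.2f}x")             # 1.32x
print(f"pdfTeX vs Typst incremental: {ratio(pdftex, typst_incremental):.2f}x")      # 3.67x
print(f"Typst full vs incremental:   {ratio(typst_full, typst_incremental):.2f}x")  # 4.83x
```

So on these figures a full Typst compile is about a third slower than pdfTeX, while an incremental recompile is roughly 3.7 times faster than pdfTeX.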

kibwen · on Nov 19, 2022

The source code for Typst: https://github.com/typst

chrismorgan · on Nov 19, 2022

Except that Typst itself is not public yet—that’s only a few library pieces so far.

teleforce · on Nov 19, 2022

Yay, a typesetting system that uses familiar programming constructs instead of hard-to-understand macros [1].

[1] Commentary on "Sile: A Modern Rewrite of TeX"

https://news.ycombinator.com/item?id=33461930