I find it interesting that, for every page in this PDF which includes an image, that page is rendered as a raster image itself, including the text, which then becomes unselectable (e.g. pages 12, 14, 15). This seems like a pretty severe bug in Typst to me.
Author here: this is the fault of Adobe Acrobat unfortunately. I converted the thesis to PDF/A using that software and redacted my student registration number. Acrobat apparently did not like my file, therefore it converted many pages to images.
Thanks so much for explaining! I was wondering what kind of bug could possibly cause this… in any case, it’s nice to know it’s not Typst’s fault. Pity I can’t edit my post any more.
I think the diagram on Page 5 is created within Typst itself, rather than being included from elsewhere — note that it’s a vector rather than raster image.
Weird… I tried it in Firefox, SumatraPDF and Acrobat Reader, and it worked in none of them for me. (In case you didn’t notice, it’s only specific pages which have the bug, namely those with raster images.)
Keep in mind that some PDF readers can OCR text in images and thus make even bitmapped text selectable and copyable. It’s possible that this is why the parent to your comment was able to do so.
> The Typst language is Turing-complete, dynamically typed, and not macro-based. It provides capabilities for user-defined, possibly variadic functions, variables, arrays, dictionaries (hash maps), and a module system. A Typst file specifies a document that the compiler should set. Typst combines typical markup and scripting/programming elements into a single context-sensitive grammar.
> The Typst language is Turing-complete, dynamically typed, and not macro-based.
Knuth did a lot right, and some things wrong, when designing TeX, but I think he has said that he regrets allowing Guy Steele to convince him to make TeX Turing complete, and that makes sense to me. Sometimes I want lots of computing power when typesetting a document, but more often I want to know that I can TeX a document sent to me without worrying that I will be opening myself up to unforeseen exploits. There's a reason that good security involves turning off VBA macros in Office documents from an external source!
Something being Turing complete doesn't mean much. VBA isn't dangerous because it's Turing complete; it's dangerous because it has APIs that allow it direct access to the host system. You could perfectly well design a system that is not Turing complete that would still expose the same kind of vulnerabilities.
> Something being Turing complete doesn't mean much. VBA isn't dangerous because it's Turing complete; it's dangerous because it has APIs that allow it direct access to the host system. You could perfectly well design a system that is not Turing complete that would still expose the same kind of vulnerabilities.
You are right that I was emphasising the wrong part of it, in the sense that Turing completeness and security are partially orthogonal: you can be highly non-Turing complete and totally insecure. But you can not be Turing complete and totally secure (essentially Rice's theorem), and that's the point that I meant to make.
My post also may have misrepresented Knuth's concern (sadly, he didn't explain it to me personally, so I don't know). A more practical concern with a Turing-complete markup language, even on an airgapped machine whose security is of less significance, is that you can't guarantee halting, and it doesn't matter how fast your markup language compiles if you feed it a document on which it doesn't halt ….
> But you can not be Turing complete and totally secure (essentially Rice's theorem), and that's the point that I meant to make.
What do you mean by "secure" here?
> A more practical concern with a Turing-complete markup language, even on an airgapped machine whose security is of less significance, is that you can't guarantee halting, and it doesn't matter how fast your markup language compiles if you feed it a document on which it doesn't halt ….
In practice that is less of a problem; you just run the program with some resource limits and a timeout, which admittedly turns the system into a finite-state machine from a purely theoretical viewpoint.
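To make "run it with a timeout" concrete, here is a minimal Python sketch. The helper name `run_with_limits` and the default of 5 seconds are my own assumptions, not anything Typst or TeX ships. It only enforces a wall-clock limit; capping memory or CPU needs OS-specific mechanisms that I have left out.

```python
import subprocess
import sys


def run_with_limits(cmd, timeout_s=5.0):
    """Run cmd as a child process and give up after timeout_s seconds.

    Returns (finished, returncode). On timeout the child is killed and
    the result is (False, None).
    """
    try:
        # subprocess.run kills the child itself when the timeout expires,
        # then raises TimeoutExpired.
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, None
    return True, proc.returncode


if __name__ == "__main__":
    # A stand-in for "a document on which the compiler doesn't halt".
    looping = [sys.executable, "-c", "while True: pass"]
    print(run_with_limits(looping, timeout_s=1.0))
```

The caller cannot tell whether the program would eventually have halted; it only learns that it did not finish within the budget, which is usually all you need when compiling someone else's document.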
The point I was trying to make is that the types of attacks that can be made purely by computation are very limited, and generally fall in the category of excessive resource consumption, which is something we have fairly decent tools to manage. Or put another way, no amount of computation is going to get a program access to my emails or install a rootkit (or whatever) if the runtime does not provide APIs to do so.
> The point I was trying to make is that the types of attacks that can be made purely by computation are very limited, and generally fall in the category of excessive resource consumption, which is something we have fairly decent tools to manage. Or put another way, no amount of computation is going to get a program access to my emails or install a rootkit (or whatever) if the runtime does not provide APIs to do so.
My point in turn was that it doesn't matter what APIs the runtime intends to expose, as long as it can be abused accidentally to allow that behaviour. But I guess that, if one doesn't trust the API, then an intended guarantee of circumscribed behavior by the program can't be relied upon either.
(But I think that resource consumption shouldn't be underestimated as an attack! Of course, as you say, it can be mitigated by imposing artificial resource limitations, but, as you said earlier, that's not really a solution so much as a renunciation of Turing completeness.)
> as long as it can be abused accidentally to allow that behaviour
That is a big if. As a trivial example, the canonical brainfuck runtime provides one input stream and one output stream to programs; you can say with confidence that no brainfuck program is going to open network connections, simply because the runtime has no facilities to do so.
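For anyone who hasn't seen one, a brainfuck runtime fits in a few dozen lines, which is what makes the argument easy to check by reading. This is a sketch in Python; the name `run_bf`, the 30000-cell wrapping tape, and "EOF reads as 0" are conventions I picked, not part of any standard. The interpreted program can only touch the tape, the input bytes, and the output buffer.

```python
def run_bf(program: str, input_bytes: bytes = b"") -> bytes:
    """Interpret a brainfuck program.

    The program's only I/O is input_bytes (for ',') and the returned
    bytes (from '.'). Nothing else is reachable from inside the loop.
    """
    # Precompute the position of each bracket's partner.
    jump = {}
    stack = []
    for i, ch in enumerate(program):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            if not stack:
                raise ValueError("unmatched ']'")
            j = stack.pop()
            jump[i], jump[j] = j, i
    if stack:
        raise ValueError("unmatched '['")

    tape = [0] * 30000          # assumed tape size, wrapping at both ends
    ptr = 0
    pc = 0
    inp = iter(input_bytes)
    out = bytearray()
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr = (ptr + 1) % len(tape)
        elif ch == "<":
            ptr = (ptr - 1) % len(tape)
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(tape[ptr])
        elif ch == ",":
            tape[ptr] = next(inp, 0)  # assumed convention: EOF reads as 0
        elif ch == "[" and tape[ptr] == 0:
            pc = jump[pc]             # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jump[pc]             # go back to the loop start
        pc += 1                       # any other character is a comment
    return bytes(out)
```

For example, `run_bf(",[.,]", b"hi")` echoes its input back. The program can still loop forever, so this shows the "no APIs, no access" half of the argument, not the halting half.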
A very large number of academic publishing sites execute arbitrary (La)TeX as part of their online submission process. The general consensus is that it's safe if you disable a number of primitive commands that give broader OS API access, such as "write18" [1], a TeX primitive that gives you shell access. The good thing about pdf(La)TeX is that it's so old that it's pretty damn battle-hardened and I'm unaware of any recent exploits that use it. If you blow up the stack, it notes that you've done so, and dies. I think it has canaries built in, too, but I might be wrong.
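As a rough sketch of what such a locked-down invocation can look like from Python: the flags below (`-no-shell-escape`, `-interaction=nonstopmode`, `-halt-on-error`, `-output-directory`) are standard pdfTeX command-line options, but the function names, the 60-second timeout, and the overall setup are my assumptions, not what any particular publisher runs. A real service would add sandboxing on top of this.

```python
import subprocess


def pdflatex_command(tex_file: str, out_dir: str) -> list[str]:
    """Build a pdflatex command line with shell escape turned off."""
    return [
        "pdflatex",
        "-no-shell-escape",          # disables \write18 shell access
        "-interaction=nonstopmode",  # never wait for terminal input
        "-halt-on-error",            # stop at the first error
        f"-output-directory={out_dir}",
        tex_file,
    ]


def compile_untrusted(tex_file: str, out_dir: str, timeout_s: float = 60.0) -> bool:
    """Hypothetical helper: compile a submitted file, True on success."""
    try:
        proc = subprocess.run(
            pdflatex_command(tex_file, out_dir),
            capture_output=True,
            timeout=timeout_s,  # assumed budget; guards against endless loops
        )
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0
```

Passing the command as a list rather than a shell string also keeps the submitted file name from being interpreted by a shell.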
Sounds like a good alternative to Overleaf. I think the future is Jupyter-style notebooks with formatting and a rich text mode. However, the developer should look into Python interop, at least for the computed-expressions part. It would be awesome to generate rich text documents programmatically!
I think you’re reading it wrong. Figure 17 says it’s 2.9 seconds, not milliseconds. Slower than pdfTeX which was 2.2 sec. But their incremental mode is faster at 0.6 sec.