After developing the initial prototype you see on the webpage, I went back to the drawing board. I'm now working on a firmer foundation for a number of open issues.
I still very much believe in the high-level philosophy, but Nota will look very different within ~6 months. In the meantime, the single coolest development in the document language space is Typst, which I encourage you to check out: https://typst.app/
Also: the next version of Nota will be written 99% in Rust :-)
Not sure the language you choose matters as much as making the API usable by a wide audience. Sure if performance is a real issue then rust makes more sense than JS but I’m not sure that’s going to be hugely meaningful in most use cases.
I’ve never been a fan of LaTeX despite writing some mammoth documents over the years. LaTeX always felt like a beast for academics, not for business. Yet there are often things I've wanted to do consistently in Word etc. that have never been easy.
Styles can easily become a muddle. Having consistent numbering and bulleting is a pain and errors can easily creep in.
Tracking changes becomes a real problem once you get into many revisions, and it almost always ends up relying on a level of trust between parties not to override the tracking. I think there's a killer app in just fixing this issue: a product that guarantees all changes are properly shown, from the start of the process to its full approval by all parties.
Businesses, lawyers etc would love that stuff. Heck if you sprinkle blockchain in you might even get easy funding but I think it’s more of a basic cryptography thing than a blockchain thing - at least it doesn’t need that level of complexity.
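The "basic cryptography thing" can be sketched in a few lines: a hash chain in which every recorded change commits to the entire prior history, so silently rewriting an earlier revision becomes detectable. (A Python sketch; the clause texts are made up.)

```python
import hashlib
import json

def seal(history, change):
    """Append a change whose digest commits to the entire prior history."""
    prev = history[-1]["digest"] if history else ""
    payload = json.dumps({"prev": prev, "change": change}, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    history.append({"change": change, "digest": digest})
    return history

def verify(history):
    """Recompute every digest; any edited earlier entry breaks the chain."""
    prev = ""
    for entry in history:
        payload = json.dumps({"prev": prev, "change": entry["change"]}, sort_keys=True)
        if hashlib.sha256(payload.encode()).hexdigest() != entry["digest"]:
            return False
        prev = entry["digest"]
    return True

history = []
seal(history, "clause 4.2: payment terms 30 days -> 45 days")
seal(history, "clause 7: add arbitration venue")
print(verify(history))   # True
history[0]["change"] = "clause 4.2: payment terms 30 days -> 90 days"
print(verify(history))   # False: the earlier edit is detectable
```

No blockchain needed: as long as all parties keep a copy of the final digest, nobody can quietly rewrite an earlier step.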
A moment of imagining how that would influence the market value of certain skillsets should easily cure you of that surprise ;)
On the other hand, legal systems have effectively been doing the equivalent of git since basically forever. There have been very few law books written from the ground up. All other law authoring, be it by kings, priests, dictators or parliaments, was in the form of diffs to an existing codebase.
The Common Paper app[1], though not quite a git workflow, has always struck me as being pretty close to how a software engineer might approach contracts:
1. An immutable set of standard terms, with variable references.
2. A collection of cover page variables, that modify the standard terms by reference.
3. A structured negotiation workflow, where users "propose changes" to the cover page variables with automatic "diff-ing" (redlining).
It's not a product targeted to software engineers, but has always appealed to me as a way to sneak in some engineering best-practices into the world of lawyering :)
Nisus Writer Pro [0] has been around for 40 years this year IIRC (IANANWP) and has a user base who can vouch for many of the features that HN readers want a tool like this to offer.
One of the interesting things I discovered while working on some legal papers with a lawyer was that legal documents don't have copyright protection. Lawyers regularly copy and paste from other lawyers' work. I suppose that, since legislators most often have backgrounds as lawyers, they legislated rules for themselves that are not the same as the rules the rest of us have to follow.
I am not a lawyer so I don't know that anything in the previous paragraph is true; it's just based on a recollection of something I was told once a long time ago.
1. That's an overstatement: For "original works of authorship," copyright happens automatically upon "fixation" in a "tangible medium of expression" (e.g., saving to a file, maybe even just typing). [0] And it doesn't take much "original ... authorship" to qualify for copyright protection.
2. Here's a hypothetical example: Alice drafts a contract from scratch as Version 1 and saves it to a file. It's copyrighted; on these facts, Alice owns the copyright. [0] Then Bob takes Alice's Version 1 and modifies it to create Version 1.1: Bob's "original" contributions to Version 1.1 are themselves protected by copyright, which Bob owns, but with two caveats:
(a) Bob has no claim to copyright in Alice's Version 1; and
(b) Bob's own contributions to Version 1.1 won't be protected unless one or both of the following is true: (1) Bob had Alice's permission to base his "derivative work" on Alice's Version 1; [1] and/or (2) Bob's use of Version 1 qualified as "fair use" (a complicated question in itself). [2]
Style and substance separation is easy and should be a requirement. Legal is pure text "programming" and what I mean is that the style of the text has zero bearing on the judicial process.
The benefits of working at the proper level of abstraction compound. It enables tech like diffs and git, which then nicely solves a bunch of other problems as well. Using Word completely side-steps all those benefits. Sure, you get a few nice buttons, but that's literally it. You are trapped forever with no way forward.
This feels like actually programming in Word and manually highlighting comments to be green or something. It's a travesty IMO.
Of course this isn't and hasn't been true for quite some time. I'm the first to blast MS Word for being a total disaster (esp. templates, ie. style/substance separation, are bad) but it is no longer a locked-in platform. Even the docx format is only a zipped XML file. If you want, you can unpack the document file and put it into git. Thank you Open Document Foundation!
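You can check the zipped-XML claim with nothing but the standard library; here's a toy in-memory archive standing in for a real .docx (which contains several more parts than shown):

```python
import io
import zipfile

# A .docx is just a zip archive of XML parts; the body text lives in
# word/document.xml. Build a minimal stand-in archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", "<w:document>Hello</w:document>")

# "Unpacking for git" is nothing more than reading the members back out,
# which is all that `unzip` does with a real file on disk.
with zipfile.ZipFile(buf) as z:
    names = z.namelist()
    body = z.read("word/document.xml").decode()

print(names, body)
```

The same trick works in reverse: re-zip the directory and Word opens it again.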
On top of that, all contemporary word processors I'm aware of have, of course, versioning with diffs. It is just different from git (or other programmer tools). Just as you are using the tools of your trade and don't know much about MS Word, lawyers use the tools of their trade and don't know much about git. It's like saying that editing PO files is superior to Trados: for a programmer it is, but a professional translator will tell you a different story.
(Of course, everybody everywhere should be using LaTeX for fine-looking documents in all circumstances. No argument here ;))
My point is not that. Sure, you can go from Word to OpenOffice. Great, now you manually highlight your code in that..
It’s a deeper thing. You can hack Word and related tools for coding and eventually it is acceptable I guess, but it’s starting from the wrong foundation.
This ladder will never reach the moon.
Word’s diffs are not “just different”. They are objectively inferior in many ways. I personally witness daily the travesty of government staff’s handling of information.
Word is a fancy digital typewriter and IMO it’s the wrong abstraction for this day and age and cultural issues are the only thing keeping us back. As always.
Edit: academic papers looking like they were written on a 19th-century typewriter… I don’t get this fascination with style, from scientists of all people. Lay down the info, provide the data. Kerning your fonts properly… oh my god, I need to cool down. I am a hot-headed type of guy, sorry about that.
Hey, thanks for the reply. From what I read, I think you see language use as a kind of coding with words, where you have a deterministic relation between input and output. You seem to treat the semantic content of a statement as if it were somehow static, objective, and observable. I don't think it is, and I'm in good company on that matter.
That being said, that's just my reading of your comment and I could be wrong, which is kind of my argument here. If I'm right lawyers don't care about your notion, they use language for something different than mere information encoding. Therefore they need tools that support their use case. MS Word (or word processors in general) might not be the best tool for that job, but it is good enough. Integrating a well trained ChatGPT into MS Word will help lawyers much more than any structured entry form ever could.
BTW, the LaTeX quip was intended to make light of the idea of separating content and style, which goes way back; consider TeX's age. Your reaction tells me you think LaTeX is a styling tool (which in a sense it is) and that styling is what it's about (which it is not). Hordes of scientists (and type-setting professionals) argue in favor of LaTeX (or other type-setting systems) because you just write the content in plain text; LaTeX takes care of the style. TeX files are also just markup and easily git'able. It does make life easier, but it is not as important as some people make it out to be.
Thanks for indulging me. I know I am yelling at the clouds.
I also know people usually misunderstand me because I am a “programmer” and all I see is “code”. I guess that’s fair enough, but I fully understand legal being of a completely different nature from Rust.
What I also understand is that no matter how long everyone argues about it, the only thing that matters about legal is the text. The font, the styling, etc is all secondary. It might be important, but it’ll never be primary. Unless courts start judging differently based on page margins I guess.
The same goes for science. Publishing “Attention Is All You Need” in an 8-bit NES font might not be fashionable, but it does not and cannot detract from the discovery within it. LaTeX produces the exact same documents (I know it is configurable, but we are going for a certain style) and that’s what this is about. Not how the tools work, but that we fundamentally even care about this instead of focusing on the primary issues like correctness, openness, accessibility. I’d like academic papers to be APIs, actually.
Again I see the importance of styling and appearance in general. It’s just that we start with that and I think that’s problematic and actively harms our progress.
Also, to conclude, I am a nitwit. This is just my take.
Edit: A man can dream, right? If a paper were plaintext I could typeset it last minute in 8-bit NES fonts if I were so inclined. I hate y’all deciding how everything looks and works. I know that’s technically challenging, but to me that’s where the progress is. An academic paper like, say, a Jupyter notebook would be awesome, no? Would you give up your fancy typesetting? I would!
If you are a nitwit, I'm one, too. Don't worry. I think I get your take. You say the important part of legal and scientific texts is their content, not their form. And I agree. But that is not where we started. We started with (paraphrasing) "programmer tools like git are superior to MS Word, therefore lawyers should use git." There, I disagree.
This 100%. It does get interesting when you get into non-plaintext things that have to somehow integrate into plaintext systems (git managed codebases). We've kind of left it up to CMS systems to handle the non-plaintext bits but this leads to many more orthogonal process problems.
IMO, I think it really comes down to finding a universal mechanism for diffing and 3-way merging things that aren't plain text (document diffing). I think distributed version control can be universal (at least on a data level), how an application renders a meaningful diff for a specific task is incredibly subjective to the document type and task at hand. My point being that I completely agree that plaintext makes a whole lot of sense for programmers and pretty much nobody else. However, distributed version control does not have to be confined to plaintext, it's just tricky to see when all the version control systems we're familiar with are plaintext ones.
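As a sketch of what version control "at the data level" can mean, here is a key-level 3-way merge over dicts instead of lines: take whichever side changed a value, and flag a conflict only when both sides changed it differently. (Toy contract fields, not any real system.)

```python
def three_way_merge(base, ours, theirs):
    """Key-level 3-way merge of flat dicts against a common ancestor."""
    merged, conflicts = {}, []
    for key in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            merged[key] = o          # both sides agree (or both unchanged)
        elif o == b:
            merged[key] = t          # only theirs changed this field
        elif t == b:
            merged[key] = o          # only ours changed this field
        else:
            conflicts.append(key)    # genuine conflict: both changed, differently
    return merged, conflicts

base   = {"term": "30 days", "venue": "NY", "cap": "1x fees"}
ours   = {"term": "45 days", "venue": "NY", "cap": "1x fees"}
theirs = {"term": "60 days", "venue": "DE", "cap": "1x fees"}
merged, conflicts = three_way_merge(base, ours, theirs)
print(merged, conflicts)   # venue follows theirs; term is a conflict
```

The merge semantics here are per-field rather than per-line, which is exactly the "schema decides how things merge" idea.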
Git is popular because it's linear, and the linear paradigm usually translates well to serial things such as programs, instructions, document sets, etc.
It's actually bad at non-linear stuff, which you will have noticed if you have ever worked with hierarchical formats such as XML or nested JSON.
Word is bad for a whole litany of reasons, but the reason it can't be easily versioned (atop the format being a literal Rube Goldberg machine requiring inane transforms to process properly) is that it encodes a bunch of non-linear formatting instructions. Sure, we can sort of reason about this stuff, e.g. with a hierarchical css+html+js structure, but without a way to render that, I challenge you to simply diff that information. Seeing "bold" or "blue" seems simple enough, as long as you also know which elements it applies to and in what layout. So, suddenly you can't reasonably diff the css file without also diffing the html.
For programmers, we are used to reducing things by their dimensions into fairly linear spaces; this then helps us reason fairly linearly about changes, but doing this from any other context is challenging. Lawyers, e.g., perhaps focus on the relations between various clauses, so linearizing their document flow is not very important to them, at least when there exist methods to diff the general textual content without investing much in how they are doing that.
As programmers we see the similarities to editing a code base and that excites us; however, we do have a tendency to go off and write frameworks to parse and simplify these things, without ever actually bothering to learn to apply them. This is not without value, but it's a different focus, which maybe explains why lawyers are not in the habit of using git.
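The hierarchy problem is easy to demonstrate: two documents that are identical as data can still produce a noisy line-based diff once serialized, because line tools see layout, not structure. (A stdlib-only Python sketch.)

```python
import difflib
import json

# Two semantically identical documents, serialized with different key order.
doc  = {"clause": "payment", "style": {"bold": True, "color": "blue"}}
same = {"style": {"color": "blue", "bold": True}, "clause": "payment"}

a = json.dumps(doc, indent=2).splitlines()
b = json.dumps(same, indent=2).splitlines()

# Count the actual +/- change lines, ignoring the ---/+++ headers.
noise = [
    line for line in difflib.unified_diff(a, b, lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(doc == same, len(noise))  # equal as data, yet a non-empty line diff
```

A structure-aware diff would report "no change" here; a line diff reports nearly every line as changed.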
> Sure, we can sort of reason about this stuff, e.g. with a hierarchical css+html+js structure, but without a way to render that, I challenge you to simply diff that information. Seeing "bold" or "blue" seems simple enough, as long as you also know which elements it applies to and in what layout. So, suddenly you can't reasonably diff the css file without also diffing the html.
We’re in complete agreement. But you can do this, you just need to provide a “renderer” and a schema that describes how your tree structure should merge or conflict. If you want to test out a weird version control for structured data, my email is in my bio.
You are correct. But this is a culture issue. Culturally legal folk don’t see what they do as programming so they use different tools and work their processes their way. This is advantageous to outsiders who see through this! ;)
I think the main benefit would be being able to represent yourself in court... otherwise there currently exist certain practical and ethical hurdles to capitalizing on this, such as passing a bar exam (non-trivial), providing credentials, operating in the best interest of your client/society... etc.
We used to have word processors that exposed mark up. I wrote immense amounts of documentation in Wordstar on 8 bit machines and it was definitely more efficient than the WYSIWYG word processors that came later and faster even when the newer ones were running on much faster hardware.
Something like Wordstar would be better than Markdown.
Markdown isn’t detailed enough for legal work. Internal references, tables, and complex section numbering require extensive post-processing or simply don’t work. You quickly wind up with a lot of hidden magic that frustrates people used to Word.
Last time I lost patience with doing legal work in Word and evaluated alternatives, I was most optimistic about AsciiDoc. Unfortunately the ecosystem was relatively anemic… the strong syntax was limited by the tooling.
Looks like there’s been some improvement, maybe I’ll try again. There’s a nice new homepage at least: https://asciidoc.org/
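For concreteness, this is roughly the kind of thing Markdown lacks and AsciiDoc has built in: stable IDs, automatic section numbering, and internal cross-references (the clause names here are made up):

```asciidoc
= Master Services Agreement
:sectnums:

[#termination]
== Termination
Either party may terminate for the breaches listed in <<liability>>.

[#liability]
== Liability
Caps on damages survive termination under <<termination>>.
```

Renumbering sections or reordering clauses updates every reference automatically, with no post-processing.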
The IntelliJ AsciiDoc plugin is a little jewel with all the bells and whistles: syntax highlighting, preview, structure view, even refactoring of references. We use it together with Antora.
Lawyers have gotten rid of secretaries and so spend time (and billing) futzing around with document formatting, fonts, margins, bullets, numbering, autonumbering and the like.
To me, that just calls for a sensible UI with attractive styling and interaction over a git backend for the heavy lifting of tracking changes through time.
A lot of successful products have been built in this way. I've seen developers get upset with Apple for making successful products out of just giving a nice UI to a piece of open source tech that does the heavy lifting. Like it's cheating.
You should research what happened with distros and UI systems... open source was building lots of nice UIs, and you could even have them on Windows/Mac, but there was constant drama over the "right" way to code a UI framework... which led to a ton of fracturing and leaning back towards minimalism (because the nice stuff was always very heavy).
This even happened with Microsoft: they had so many false starts and changes in messaging that they killed their own portfolios. I suspect that is at least part of why they "embraced Linux": it was excellent at web, and web wasn't busy changing every month (it has been, but that's a different story).
Apple introduced Swift but besides new Xcode versions I get the general impression their tooling has been far more stable.
Actually, the little-known "Review" feature of Word allows you to visually track, approve, reject, and comment on collaborative changes over a document in a really user-friendly way; no need for git here.
The standard for legal docs is to redline changes with an additional tool, because you don't necessarily trust the other contributors. They have decent tools for this, and the system works OK I suppose. Editing tends to be in tick-tock fashion anyway, so I guess it works. You could do something like this with git and a markup language, but I don't know that you'd convince many lawyers the juice was worth the squeeze.
I don't know the names. They all seem to have the pro Acrobat stuff, but more often use something also bundled with search tools, perhaps? Communication between lawyers on opposite "sides" often seems to be by PDF, not source (although sometimes that too), so I imagined they both keep working docs separate because they don't want to share some of the markup/comments. I asked one of them about that, and they claimed that using clean PDF output (no metadata or history) was worth the extra hassle, as it avoided costly errors.
Anyway that's my limited experience having dealt with a bunch of them - no expert.
It's never pdf. You can't easily make corrections on a pdf, never mind major revisions (such as moving sections around). If someone sends me a pdf I ask for a Word document, or convert the pdf to Word myself. Sending someone a pdf is a little like saying "fuck you."
Confiteor.
Collegial lawyers don't send each other pdf's. They are impossible to mark up. One "innovation" I have seen is with banks. They do not want their employees to be creative; edge cases don't exist, everything is binary.
So the banks issue grids/tables containing a list of questions. The answers are found in the corresponding place on the table. Imagine 7 columns, all containing binary answers: "yes" or "no."
Except, everything is not binary.
So the 8th column contains, I dunno, 500 words, a minitable, etc., running over page after page. The other columns on these runover pages are blank.
And these are all pdf's.
> Not sure the language you choose matters as much as making the API usable by a wide audience.
Fully agree with this, and having typeset my master's thesis and later my resume in LaTeX, I think the “authoring experience” is definitely the place to focus on improving; LaTeX just takes too damn long to get something good.
If you’re interested in the “markup to document publishing” space, you might also be interested in the open-source report publishing tool I’m now working on, Evidence.dev (https://github.com/evidence-dev/evidence).
It’s similarly based on markdown, though uses code fences to execute code, HTML style tags for charts and components, and {…} for JavaScript, i.e.
---
title: Lorem Ipsum
description: dolor sit amet, consectetur adipiscing elit
---
```sql petal_vs_sepal
SELECT
  petal_length,
  sepal_length
FROM iris_dataset_table
ORDER BY 1 DESC
```
<ScatterPlot
title="Petal vs Sepal Length"
data={petal_vs_sepal}
x=petal_length
y=sepal_length
/>
The longest petal in the dataset is {petal_vs_sepal[0].petal_length}.
Our design philosophy here is that the rendered documents should be beautiful by default, but highly configurable so you can get pixel perfect results.
We’re also aiming for first class output options for desktop, mobile, PDF and image export.
I always disliked that it was so difficult to interact with Word if you wanted to create automated documents. Instead I'd love a developer-first experience for creating standardised documents, from nice-looking participation certificates, invoices, and memos up to documentation and multi-tome histories.
As an academic, 99% of my time is spent doing two things:
1. Writing statistical computations using a language like R or python.
2. Writing English text.
The most important thing about a document language is that it should prioritize those things. For example, here's why Rmarkdown/Quarto is better than TeX. My Quarto document starts:
---
title: "Natural selection in the Health and Retirement Study"
author: "XXX"
abstract: |
  I investigate natural selection on polygenic scores in the
  contemporary US, using the Health and Retirement Study. Results
  partially support the economic theory of fertility as an
  explanation for natural selection: among both white and black
  respondents, scores which correlate negatively (positively) with
  education are selected for (against). Selection coefficients are
  larger among low-income and unmarried parents, but not among
  younger parents or those with less education. I also estimate
  effect sizes corrected for noise in the polygenic scores.
date: "September 2023"
---
You are comparing apples and oranges, at least a bit. The LaTeX equivalent is:
\documentclass{article}
\title{Natural selection in the Health and Retirement Study}
\author{XXX}
\date{\today}
\begin{document}
\begin{abstract}
I investigate natural selection on polygenic scores
in the contemporary US, using the Health and Retirement
Study. Results
partially support the economic theory of fertility as
an explanation for natural selection: among both white
and black respondents,
scores which correlate negatively (positively) with education are
selected for (against). Selection coefficients are
larger among low-income
and unmarried parents, but not among younger parents or those with less
education. I also estimate effect sizes corrected for noise in the
polygenic scores.
\end{abstract}
...
\end{document}
Everything else you have there in your preamble is about either adding capabilities or changing formatting, you don't show how that is achieved in the other markdown.
I think I get your point, but in practice that part doesn't really get in the way, and if you are doing the same thing over and over (e.g. for the same publication) it's just a template anyway.
I don't love Tex/Latex, but most of the other markdown comparisons that emphasize "it's simpler" are because they can't do as much. Which is fine until you need some of that capability.
It's absolutely true that you may need to customize things. And then you are stuck with the big quarto disadvantage: debugging a toolchain that typically looks like
quarto -> knitr -> markdown -> pandoc -> [tex -> pdf | html]
and not knowing exactly where the error came from.
At the same time, the markdown defaults produce a nice, readable paper. The TeX defaults get you something that reminds you of Rubik's Cube and Duran Duran.
Is Lyx still around? I remember it had good defaults. Haven't used in ages and it had some installer issues but I got fairly comfortable writing latex papers without learning a ton of latex...
Ofc that was a major downside, something other markdown editors figured out - if you give people buttons that make it easy and you make it easy to learn, they will learn what they need.
I don't get the problem. If 99% of your documents need the same packages and formatting, then all you need to do in LaTeX is create a template (eg via Yasnippet in Emacs) or dump it all in a LaTeX class file and then import it in your frontmatter, and Bob's your uncle. There are many frustrating things about LaTeX but I don't see how this is one of them.
Probably that's true. But first, I don't know how to create a class file, or a template (if that is a LaTeX thing). And since I've never seen anyone else do this, I guess that most academics don't either.
Second, my point isn't just about the specific issue, it's that this issue reveals how TeX thinks about the world. It thinks you want to spend your time writing TeX. No, I want to spend my time writing English. Here's another example. This is how you embed an image in quarto - it's just markdown:
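In Quarto that is a single line of standard markdown (file name and caption hypothetical):

```markdown
![Selection coefficients by income group](figures/selection.png)
```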
I understand your frustration. Maybe it helps to know where this problem comes from.
TeX is extremely powerful and lets you create arbitrary documents. This is the first time I heard of quarto, but apparently it makes a lot of choices for you that you understandably don't really care about.
Instead of developing quarto, one could have simply written a LaTeX class that defines a function like so:
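A sketch of what such a class-provided command might look like; the hard-coded `t` placement and the path-derived label are deliberate simplifications, and `\img` is a hypothetical name:

```latex
% In the class file: one-line figure inclusion, markdown-style.
% Usage: \img{plot.png}{Caption text}
\newcommand{\img}[2]{%
  \begin{figure}[t]%         placement fixed to `t`
    \centering
    \includegraphics[width=\linewidth]{#1}%
    \caption{#2}%
    \label{fig:#1}%          label derived from the file path
  \end{figure}}
```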
Of course, it is now much less flexible, as you cannot define a custom label or different placement instructions. But that is the price you pay for short and memorable syntax.
By the way, developing a LaTeX class is not necessarily hard. It is more or less a file whose name ends in `.cls` with all the commands that you typically put in your preamble. It just needs a header of three lines that define some meta data and also supports options. See here for an example: https://github.com/latex-ninja/colour-theme-changing-class-t...
You put it in the same directory as your main tex file, or in the system-wide TEXMFLOCAL or user-specific TEXMFHOME tree.
I keep a directory called LaTeX inside my home directory. Inside that I keep a file with all my frontmatter, myfrontmatter.sty (technically a package rather than a class), and also my biblatex file and a scan of my signature for signing letters. When I start a new LaTeX document I add the line \usepackage{/home/nanna/LaTeX/myfrontmatter} to the top (note, no .sty). This keeps my frontmatter minimal and tidy.
Inside myfrontmatter.sty:
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{/home/nanna/LaTeX/myfrontmatter}[2015/01/01 by me]
\RequirePackage{amsmath} % Just replace `usepackage` with `RequirePackage`
\RequirePackage{amsthm}
...
\addbibresource{/home/nanna/LaTeX/biblatex.bib}
...
%% Macros like for inserting my signature
\newcommand{\mysignature}{\noindent\includegraphics{/home/nanna/LaTeX/signature.png}}
...
\endinput % Marks the end of the package file; LaTeX ignores anything after it
And that's it. I never have to worry about a package I've forgotten to add in. Granted a journal might not accept my custom package but I can always just copy and paste it all into my frontmatter, minus the top two lines and replace all the RequirePackages with usepackages.
That is just an expression of the LaTeX problem, though.
People (as in, the majority of people) will not be comfortable using a tool so unintuitive and hard to use that you need an AI to help you write with it.
Writing a document is not supposed to be hard or to require assistance.
The problem is that your goals and skills don't match the purpose and capabilities of the tool, not that the tool is insufficiently "intuitive".
Manuscript composition used to be: write your document by hand or with a typewriter, handwrite some notes in the margin, throw in some pages with your figures on them, then let a professional typesetter take care of all of the technical details of making a typeset document for printed output. This was a whole separate career, and the typesetter would sink almost as much time into making your document look pretty as you put into writing it.
If you are using LaTeX, you are taking on the role of the professional typesetter yourself, and you need to make some specific technical choices to get some output from it. This can be a problem if you are inexperienced and don't know which choices to make or in a hurry and don't want to make any choices, but is also good insofar as it lets you actually produce a professional quality document if you have the time and expertise to do so. The difficulty involved is at least an order of magnitude less than doing composition of metal type.
If you are using markdown (or whatever), you are just punting on having a professional document at the end, and/or letting a system make all of the choices for you (often badly), or perhaps expecting to still hand off your document to a professional at the end for proper typesetting.
Thanks for the help and I can feel the enthusiasm. I have to tell you, my hatred for TeX is profound and goes far beyond this one point. But if I start ranting, I'll never stop.
Didn't you say with quarto you had to debug a 5 layer pipeline? I wonder if it's not biasing you here a bit...(stuck fighting "arcane" latex syntax somewhere at 3 in the morning).
I'm not saying you should love TeX, but it's a bit like saying you hate assembly language - if you have the wrong abstractions (writing a 3D game or a web page using assembly language) of course the experience will be beyond frustrating. I don't hate assembly language, but I generally don't need to touch it because higher order abstractions generally suffice. If I am optimizing my compiler output, though, then it's a tool I can use.
Ofc if I have the wrong or missing tools while using assembly language, or any other TBH (python, html, etc), that is also a source of considerable frustration. Not sure where the "hatred" comes from, but perhaps you encountered a poorly done package or editor?
* "It gets in the way. If I open my article in a text editor, I want to see the title, author, abstract and first paragraph."
* You feel forced to use it. "One idiot reviewer told me, as a supposedly legitimate critique of my work: “this wasn’t written in TeX”. PhD students are forced to learn it"
Some of your points about escape codes etc. seem contrived, and the alternatives are just more buttons hidden in menus or a bunch of searching. I'm not defending \'e but TBH that seems pretty darn logical compared to, e.g., hitting a key combo and searching through a million characters in a Unicode table.
The first is a matter of a good editor, the second is true of any system you can be asked to use. I've been forced to use some pretty miserable programs in the past, so I can commiserate.
Hmm. As a PhD student you feel constrained to do something in a way you don't want to. I don't think that's a good enough reason to say something sucks, though. Why do PhD students use it? I'm not one, but I can suspect an answer - print journals. You can/could submit a TeX article and have it relatively seamlessly typeset into a for-print journal.
I don't work with print; I imagine if I did I might seriously consider TeX, and if parts of it legitimately sucked, seek to replace or modify those. Markdown is good enough for the majority of what I need to do. Microsoft Word is my "useless disease program" that all good "computer workers" are forced into using, and they become proud they know how to use templates and bullet formats. For what I typically use documents for, I would tell someone to use TeX before I told them to use Word, a quirky mess of a program filled with bugs from a toxic work culture (and I don't need TeX, so I don't use it).
Because doing something like this is opinionated. The LaTeX developers try to do the opposite: they want to provide packages that cover as many usecases as possible.
And users of LaTeX are probably not knowledgeable enough or too busy to publish their opinionated subset of LaTeX as a class. I don't know for sure. There is no central body that has an interest in removing barriers, so you might as well ask me or yourself why I or you haven't published anything.
I developed something similar to this at my company, because we write lots of LaTeX documents and need shortcuts like this not only for brevity but also so that we achieve some consistency across the entire team. It's only for internal use though and thus not public.
As I said: Maybe you want to specify a different placement option like `b` instead of `t`. Maybe you don't want the label equal to the file path, in particular if you happen to include an image twice.
Or you don't want it centered. Or you want several images in one figure environment arranged in some way. Or you want to scale the included image to fit only 70% of the page width. Or maybe you want to rotate the included image by 90 degrees. Many more use cases are imaginable.
You could define the macro to cover all of this, but then you might as well just write the original code.
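To make that trade-off concrete, here's a rough sketch (the macro name and parameter choices are my own invention; requires graphicx) of what such a catch-all macro might look like. Note that by the time it handles placement, width, path, and caption, the call site is nearly as verbose as the raw figure environment:

```latex
% Hypothetical catch-all figure macro.
% #1 (optional) = placement, #2 = width fraction, #3 = path, #4 = caption
\newcommand{\quickfig}[4][t]{%
  \begin{figure}[#1]%
    \centering
    \includegraphics[width=#2\textwidth]{#3}%
    \caption{#4}%
    \label{fig:#3}% label derived from the path, which breaks
                  % if you include the same image twice
  \end{figure}%
}
% Usage: \quickfig[b]{0.7}{path/to/image}{My caption}
```

And this still doesn't cover rotation, multi-image figures, or custom labels, which is exactly the point being made above.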
And as I replied: how does any of this *prevent* you from "cover[ing] as many usecases as possible" given that writing the original code is always an option?
The first (namespaces) contrasts speed (brevity) with specificity (verbosity). Namespacing (long names) is great when you don't have a good idea of the full scope - putting your library in a namespace means you generally never have to worry that terms will overlap.
Macros tend to be the opposite - short, in global namespace, with little flexibility. They constrain the possible output by focusing on a certain goal - this is the point of the macro usually, to avoid some common pitfall without being overly verbose.
If you combine the two, you have a problem - either the macro dictates your preference, or many different macros must exist which, usually being brief or in pursuit of similar goals, will result in namespace collisions.
So of course you can define them, but generally this is better suited towards a specific use case or organization.
Or at least that's the idea. I'm not sure some standard, approachable macros are a bad thing, as long as you can redefine them later or pick and choose, but I do get the point of "why not": they don't want to constrain the language itself. The irony is that other systems do this for them, TeX becomes more of an outsider thing, and systems like Markdown are far more constrained...
On the other hand, there is no way to make everyone happy, and I respect the TeX author's decision to opt for a less-is-more strategy: offering only building blocks, not templates.
You are of course free to build your template library and make it popular, but be forewarned that whatever it becomes popular for (say, writing science articles) will likely shape what TeX is primarily used for. Which is the "why".
This is not the same thing! The LaTeX equivalent to your markdown would be
\includegraphics{path/to/image.png}
which is arguably simpler and cleaner than the markdown. The figure environment is unnecessary when you just want to put a figure right there. You only need the figure environment when you want your image to "float" to a random place in your page.
> You only need the figure environment when you want your image to "float" to a random place in your page.
Which is also something the Markdown version can't do at all (give fine control over how the image is positioned). You have to use raw HTML plus probably some CSS if you want that.
I can tell you as another academic that I almost always get links and images wrong in Markdown: which of the title and URL goes in square brackets and which in round ones, forgetting the !, the conventions around file paths (some Markdown processors need file://...). Admittedly, if I always wrote in one Markdown dialect I would get used to it; on the other hand, I never get it wrong in LaTeX. And let's not even talk about how to do different alignments, captions, and labels in Markdown.
> I don't know how to create a class file, or a template (if that is a LaTeX thing). And since I've never seen anyone else do this, I guess that most academics don't either.
Exactly, academics usually don't do that - they write the text with appropriate markup, then put it into the publisher's template, and the formatting follows the appropriate standards automatically. You can write your own template, but usually you use someone else's, with the big benefit that you can generally move your content to a very different template from a different publisher with minimal or no changes to your actual writing.
Now how would I do that in quarto - what (and how much) would I need to write to ensure that, for example, the captions for all the images and all the references to the images are all formatted in a specific manner? Because for quarto I would need to make my own template specifying the exact formatting and layout, and a quick browse of its documentation didn't lead me to any examples on how I would control that.
In sane environments there is a split between text and formatting; however, the formatting part has to be sufficiently powerful to meet the various requirements, so there is a certain, quite high, minimum bar to meet there. LaTeX works because I can rely on being able to easily get my markup laid out exactly as required by arbitrary standards. For any markdown-type standard I need some assurance that this will be possible and easy, and that I won't need to (for example) go over all my references and do something to them.
Again, apples and oranges. Yes, it's more markup than e.g. Markdown (which is fundamentally less capable). But how do you do the equivalent of the [t] and \centering in the latter on a per-figure basis? What about scaling one figure differently from the others in your doc, or embedding a reference in a caption with a particular style?
For that matter your equivalent is still one line, it's just \includegraphics{path}. The figure environment is just adding extra capabilities.
I agree not everyone needs to do this, but the trade offs you are illustrating are not "X is better than Y" so much as "X is simpler than Y, and can't do as many things"
For you that trade-off makes sense, great. But I wouldn't generalize it into a judgment about the value of the tool. I know plenty of academics who are quite proficient in TeX, let alone the simpler LaTeX, and find that it lets them generate the content they want easily enough, given its power.
This isn't just mathematicians either, though most of the people I know using it came to that out of a need to do math typesetting properly. How would you for example generate a mixed language document with both left-to-right and right-to-left languages formatted correctly?
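For reference, a minimal sketch of one common answer in the TeX world: polyglossia under XeLaTeX or LuaLaTeX (the font name here is an assumption; substitute whatever Hebrew-capable font is installed on your system):

```latex
\documentclass{article}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage{hebrew}
\newfontfamily\hebrewfont[Script=Hebrew]{David CLM} % assumed installed
\begin{document}
English runs left to right, while \texthebrew{שלום עולם}
is typeset right to left inline, with direction handled automatically.
\end{document}
```

Getting the equivalent bidirectional behavior out of Markdown means dropping down to raw HTML `dir` attributes and hoping your renderer cooperates.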
LaTeX's real problem isn't the syntactic load (easily handled with a decent editor); it's the package system. It can be abused to, e.g., generate conference posters well, but it gets hairy once you go into the details.
That's a great example of the tradeoff. On the surface the LaTeX version looks harder, but you can specify a caption, how the figure floats with other items, how it's justified, the zoom level; you can add a reference label that you can hyperlink to from elsewhere in the doc, etc. etc.
The markdown one you get what you get. Maybe that’s fine. If it isn’t you are out of luck.
The latex one requires more of you but gives you much more functionality in return.
Which is better is going to be entirely situational/personal preference.
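For anyone who hasn't seen it, a sketch of the LaTeX side of that trade-off, showing the knobs mentioned above (the values are arbitrary; requires graphicx, plus hyperref if you want the reference hyperlinked):

```latex
\begin{figure}[t]          % placement preference: top of the page
  \centering               % horizontal justification
  \includegraphics[width=0.7\textwidth]{path/to/image.png} % scaling
  \caption{A caption that can be cross-referenced.}
  \label{fig:example}      % referenceable (and hyperlinkable) label
\end{figure}
% Elsewhere in the document:
As Figure~\ref{fig:example} shows, ...
```

The Markdown equivalent of the first three lines is `![caption](path/to/image.png)`, and that's also where Markdown stops.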
\documentclass{article}
\title{Natural selection in the Health and Retirement Study}
\author{XXX}
\date{September 2023}
\begin{document}
\maketitle
\begin{abstract}
I investigate natural selection on polygenic scores in the contemporary US, using the Health and Retirement Study.
Results partially support the economic theory of fertility as an explanation for natural selection: among both white and black respondents, scores which correlate negatively (positively) with education are selected for (against).
Selection coefficients are larger among low-income and unmarried parents, but not among younger parents or those with less education.
I also estimate effect sizes corrected for noise in the polygenic scores.
\end{abstract}
\end{document}
Makes me wonder: if Nota did work with Rust, then we could move the dependencies into a Cargo.toml file and compile our documents. That would enable declarative macros for document generation at compile time, as well as interactive Rust code inside your document at reading time. Plus, you could refer to dependencies by their package name, like amsmath::line or something.
It would be interesting to see if Nota could solve one problem that TeX and LaTeX, while theoretically capable, don't really solve in practice. Namely, the ease of styling according to dumb external requirements. Just to give you several examples:
* Very tight, but very loaded layouts like A0 conference posters
* Apply a national standard, such as, for example, post-Soviet GOST documentation styling standards
* All combinatorial explosion of bibliography styling requirements in different international traditions
* Make the documents look like a default style in so and so MS Word version
* Precise positioning of one picture upon another, or text upon a picture for quickfixes in the papers
* Be able to consciously tweak any of the above
The problem with TeX-universe solutions here is that, while technically all of the above is possible, in practice it requires some black magic far deeper than a lay person (even one with a scientific degree) wishes to dive into.
Wow, Nota looks good. I created something similar I call "Literate Markdown"[1], a play on Knuth's "Literate programming" concept. My focus from the beginning was interleaving computation with explanation. It's been a lot of fun to work with for the last few months, for example to explore ideas around SVG animation[2] or flesh out a novel data structure/algorithm[3]. Also, it's about 500 lines of legible Node code that make up the entire server and Markdown processor, with very few dependencies, no transitive dependencies, and no compilation step required. By default the server allows no third-party resources, or cookies, of any kind, because why not. It's MIT licensed on GitHub. I want to move it to an organization repo instead of my personal repo and clean up the repo in the process, which is happening today.
Curiously, all these problems, and more, have been reasonably solved in Org Mode. Sadly, too few people know about it because too few people use Emacs.
Can you add a subheading in an Org mode document, then pop out of that subsection to get back to the previous heading? E.g.:
* Foobar
Foobars are great.
** Warning
Foobars are not to be used with Booms.
Foobars are great for reading, writing, and flying. This text is outside the Warning subsection.
You can, just like you could do in Markdown. You can use inline TeX and get the right overall format with a template. I haven't actually done this for a paper, but I used Pandoc to typeset a textbook with lots of math and code and it worked well. We could have used org-mode just as easily, but Markdown was already familiar to my non-Emacs-using coauthor. (Hey, we all have our faults!)
Yes, and you can use Pandoc to export to different formats, including Epub, HTML, Docx, etc. You can embed LaTeX and customize in multiple ways using Lua filters for Pandoc. I've found Pandoc to be more powerful than Emacs when it comes to writing documents in Org mode, while Emacs is more powerful when you use Org mode to do literate programming. YMMV
If you want another data point, I made my own markup language a while back that aims for markdown-like simplicity while making it easy to define simple macros and operators: http://breuleux.github.io/quaint/
It's pretty extensive. I still use it for my own writing, although I'm probably the only one.
Do you have an example of a document where its special features are more heavily used? Most of the posts I saw were basically just paragraphs and italicized sections.
You should definitely look more into reStructuredText, btw. It lets you build documents for many different formats; it has a nice way to reference sections of documents, adds code support, and seems to have all the basic features you need. It is very similar to Markdown, but writing something in reStructuredText means you can output just about any document format you need (it's much better than Markdown or HTML, imo).
Your question about content and computation is difficult. When I was writing docs for my side project, I would have liked to do something similar: have an interpreter run in the page itself, with interactive code you can play with. But such an approach wasn't quite practical (I've seen some top-tier docs do this, though!). I ended up writing all my code examples in such a way that they're tested in the unit tests, so I at least know if anything breaks.
I really don't like Nota, but you get my upvote because:
1. Typst is amazing and
2. you are open to criticism and are looking to redesign it, and I am hopeful you'll create something good
While LaTeX is cool, and I use it extensively, I personally feel that it has not adapted quickly to various use cases. It is not _easy_ to compile into different formats for consumption, and sometimes the layout issues are quite hard to debug. Efforts such as these, even if they do not take off, might give the LaTeX community enough to think about what to focus on for improvement...
> - How do different syntaxes make different document tasks easy, hard, or impossible?
That's a good idea; it would be nice to optimize that instead of sticking with Markdown's poor decision to use double asterisks for the more common bold formatting, while also wasting the _ symbol in the process.
I think the big difference between quarto and typst is the scope. quarto is a tool for combining prose and computation to generate many different output types (HTML, PDF (via latex), PDF (via typst), PPT, ...) through the power of markdown/pandoc. typst is a typesetting system for turning plain-text markup into beautiful PDFs.
I think you're much more likely to want to write typst by hand than latex, but out of the box it doesn't provide any tooling for combining writing and computing (if that matters to you).
I used typst for a university project, and what I really liked about it is that it is as programmable as LaTeX but with a language that's much more intuitive (no macro weirdness, but actual functions/if/loops etc. etc.). It feels like a better version of TeX rather than a different way of writing documents.
I haven't really used quarto, so take this with a grain of salt, but from what I can see it is much more declarative: you just declare the content of your document and pick a template to show it. It feels simpler, but at the same time, what if I need to customize something here and there? It looks like there are extensions that can be programmed, but they feel more like second-class citizens that you are not supposed to use normally.
(disclosure: I'm a quarto dev. But I'm also a big fan of typst)
You're right that typst is _very good_ at extensions, and likely will always be superior to quarto when it comes to that. The fundamental advantage typst has is that it's a "greenfield" project, and a very well-designed one at that, especially when compared to TeX.
> Looks like there are extensions that can be programmed, but they are more like second class citizens that you are not suppose to use normally.
We take quarto extensibility pretty seriously! "Simple" customization is available without need to program extensions, mostly through metadata configuration and classes and attributes in the document. This covers the basics like CSS, layout, document listings, etc.
For slightly more sophisticated extensions, you can create "filters" that operate directly on the document AST, either using the built-in Lua extension API or reading/writing a JSON representation (these are both built on top of Pandoc's capabilities, which quarto leverages extensively).
For reusable, packageable functionality, the extension system as it exists today is simple but certainly meant to be used "normally". It's how custom formats (the common, concrete use case is to provide different styles for particular academic journals) are defined and used.
I'm also curious. As always on the arXiv, there is LaTeX source code for the paper available at the "Other formats" link, and I'm having a hard time telling if it's been auto-generated from Nota, but I presume it has, given the visual similarity.
Rust makes tremendous sense for this and I really like your borrowing syntax in Nota. Keep it up. This reminds me of MDX, another project I find inspiring and use a lot
> There are two main mediums for digital documents: PDFs and web pages. PDFs were designed to mirror physical documents, so they impose the real-world constraints of paper: page breaks, fixed width, and immutable styling. Web pages, by contrast, provide an essential dynamism. Web pages are undeniably the future of digital documents.
I actually don't agree with this. I think _not_ having "essential dynamism" where it's not needed is actually a feature, not a bug.
OK but then what is the thing in the middle, between documents and applications?
There are fantastic, beautiful interactive experiences (for lack of a better word) that are obviously not documents (they can't be represented on paper, there is code running), but they're not really applications, either (they are fully offline and self-contained, with state that's only evident on the page).
This is what I think the future of textbooks and presentations should be. But I think part of the problem is that not only do we not have tools geared toward creating them, we don't even have a name for these things. If we say "document", they flunk the pdf test. If we say "web application", they are lumped into the same lumbering category as office docs and enterprise software.
Maybe Nota is a step in the right direction. But it'd be an even better step if it didn't call itself "21st century documents", if for no other reason than to defend against the valid criticism you levied against it.
When I was a kid, we called it "multimedia". Encarta being the example that comes to mind. My school library had some other interactive encyclopedia that came on a 6-CD-ROM magazine changer (the Pioneer style, if you remember music CD changers). I remember being absolutely flabbergasted at the sheer amount of megabytes sitting in that unassuming disc magazine.
That term is really quaint these days, and it doesn't fully capture what you're talking about. IIRC, it was more prepackaged animations, photos, video, and music; rather than "dynamic code running on paper"
"Dynagraph"? Sorry, that's lame, but there's my entry.
Some of these interactive simulations for learning are called “Explorable Explanations” [1], coined in 2011 by Bret Victor, the author of your second link (worrydream.com), which also talks about “Explorable Explanations”.
For more examples, see [2].
Very much agree that this is what explanations and presentations should be in the modern age. I think a documentation language (what Nota and Typst aim to be) is still needed in this age of Large Language Models, when the ideas are more complex than those expressible by natural languages.
Ha, I was totally thinking about the watch blog too!
I think that one is rather a special case. You could print it out with the first still of each bit and probably get 90% of the content/context, or modify the document to print a few stills from each interactive part and get the same intellectual content. In its case, the interactivity is superfluous; it's there to spark joy. It's really an example of a document with outstanding figures/visual aids - the interactivity is just a bonus.
Similarly, the space elevator could be a picture book/pdf. The interactive bits there also spark joy.
The tenbrighterideas page mostly annoyed me. It's just bastardizing structure to be "interactive", which is to say most of the information is hidden away behind a bunch of clicks and it could have been a document, with one page dedicated to each idea.
I strongly disagree with the idea that the interactive elements are only there to "spark joy" - in both of the cases you mentioned, the interactive elements are pretty fundamental. Their purpose is to let you get somewhat hands-on with the concepts the text is discussing - to allow you to take apart the watch yourself, at your own pace, and understand what all the pieces are doing.
I'm sure you can convey the same information in text format (like you say, you could just print out the page), but these particular sites would be a lot weaker, because part of their explanatory power is the interactivity.
The original quote was that dynamism was "the last thing I want in a document", and I think these interactive diagrams and explainers directly show how useful dynamism can be in conveying information.
That's not to say that all dynamism is good - I don't usually want you to use Javascript to just load a new page, my browser can do that just fine - but every medium can be abused. That doesn't mean that medium is bad!
I'm not saying they're only to spark joy, but to me at least, their contribution is mostly in that category. I didn't really find it central to the content in any of the examples.
As for my point that it's "the last thing I want in a document", I stand by it. What I mean is that I should be able to print it and lose nothing central to the content. Yes, digital offers features which may enrich the experience/use/navigation, but at the same time there are questions of accessibility and ease of parsing. If I _have_ to interact with things to get the information/content out, it's effectively a web app, not a document, and if done wrong it actively interferes with my ability to absorb the information. IMO the brighter ideas page firmly falls under that last point.
The watch page is wonderful, and the visuals and interactivity are done masterfully, in such a way that they're unobtrusive and not _required_ to understand the document.
I'd classify the elevator page as a web app, but there's really nothing keeping it from being a document/children's book.
And I just really, really, did not like the brighter ideas page. I think the content is good, but the execution got in the way instead of adding to the experience.
I guess what I don't quite understand in your comment is why these two categories of "web app" and "document" are so fundamental, in the sense that we can divide all web content into either document or web app. Is this a useful categorisation, or is it just applying concepts from existing forms of media to a medium where those concepts don't really fit that well?
For example, with the watch page, if we're defining document by printability, it makes a poor document - while you can print it out, what you'll end up with is a document with lots of static pictures and a bunch of (now useless) text referencing how you can move the pictures to see different things. If I wanted a fully printable document, I'd find a different one that was written with the expectation of being printed - maybe a book about watches, or an entirely static page. That will suit the print medium significantly better than this interactive page.
It makes me think a bit of a science museum, in the sense that most science museums will have a lot of text written around that explains all the concepts they want to discuss - this is how a pivot works, that's what a cow's digestive system looks like, here's a description of a space ship or whatever. And you could collect all this text and turn it into a book, and it would be an informative book that you could read and thereby learn something.
But the value of a museum is that it doesn't just have to be text. You can put a pivot into your visitors' hands; you can show food moving between different parts of a cow's digestive system in real time; you can show genuine pieces of real rockets and discuss what journeys they've been on. The medium allows you a huge amount of extra freedom, and a good museum curator will use that freedom - wisely - to produce an experience that allows visitors to get more insight than they would have if they'd just "printed out" the museum's text and read it all.
That's not to say that written text doesn't have its own advantages - you don't need to visit a book every time you want to get information from it, for example! What's key is that by tailoring the content to the medium that we're employing, we can produce a better result than by trying to apply the norms of a different medium. If we'd built our museum like a book, it would have been a bad museum.
I think a similar principle applies to the web - it comes with its own set of tools and features that differentiate it from books or print media. Some of those are fairly subtle - the ability to reference different pages and sites using hyperlinks, for example - but part of that is the interactivity. And not all sites need interactivity at all times, and the best sites use interactivity only when it adds to the experience (just like the best museums - not the ones that surround you with flashing lights and noise just to distract your attention). But the use of interactivity can elevate a simple text far beyond what a print document can do. I think the watch site is a really good example of what happens when you don't see web pages as "just" documents, and rather embrace their unique qualities.
That's why I don't think it's always helpful to make this category distinction between "document" and "web app", particularly when "document" just means "uses the norms of a different form of media", because the whole point of the web is that it is a new media form, with its own features and capabilities.
Reputable scientific journals now post videos online alongside their articles. Interactivity is even better for understanding. But permanence is an issue I suppose.
I'm not saying it's all bad. Digital augmentation can be handy (love the hell out of document search), but honestly their examples aren't compelling.
I think accessibility is a major issue, where the text doesn't make as much sense without the interactive bits, and often the text itself isn't substantial enough to be stand alone.
But also, it's starting to cross the line between web app and document. I can print out a pdf and I just lose peripheral QoL benefits like document search. However, if I try to print a web app I usually lose a lot of the content/context.
Edit: as far as supplemental material goes, I'm all for it. People learn differently, so video, audio, web app, whatever are all great supplemental materials, but a good document should be able to stand by itself.
There are some genuine benefits that come along with. I mean, if I could have auto expand inline footnotes/references in a document, I'd be a happy camper.
I think some basic dynamism is still necessary to make reading PDFs on small screens comfortable (e.g. mobile). On mobile, I vastly prefer reading webpages over PDFs, because most reasonable webpages can fit a mobile screen.
Sadly, if you have an iPhone I don't think you can easily read Nota docs currently. The article that introduces Nota [1] has only been tested on Chrome. I tried multiple browsers on iOS to no avail (likely since they all use the same underlying rendering engine).
If you follow the comment trail back up, in the context of this conversation, static means PDF, hard constraints of physical page layouts; dynamic means HTML, digital, no hard layout constraints.
I had read all those comments. What I'm saying is that re-flowable text is in a very different class from web pages that auto-translate the text, have animations, or run arbitrary code.
No, Nota is not just typesetting and document layout. Things like dynamical code examples, auto-translation, advanced tool-tips, and reader-customizable notation are intended to set it apart from predecessors. In other words, it is different precisely because it does things that would be considered dynamic to a front-end dev.
PDFs constrain information by limiting the way it can be reproduced digitally. There is no such thing as "just" copy/pasting from a PDF - weird errors abound. The text has to be extensively reformatted or run through special software.
A format is needed that encodes information visually and digitally. The digital layer doesn't have to be visible by default, just accessible when needed.
>I actually don't agree with this. I think _not_ having "essential dynamism" where it's not needed is actually a feature, not a bug.
Yeah.
To the author's surprise, Adobe's PDF spec supports JavaScript execution[1]. And interactive 3D graphics [2][3]. Not to mention audio and video [4].
And "Liquid Mode" for responsive-layout PDF documents [5].
Of course, these "features" were considered bugs by the ISO PDF/A spec (archival, i.e. future-proof), so they were all stripped out [6].
The point being: sometimes a document should be a document.
As for science papers: LaTeX is written by humans, for humans. Custom latex commands and packages allow one to write a plaintext document that is as easily read as the paper it generates.
Which is great for accessibility, among other things.
The static issue also permeates to webpages and other formats though. Although this is now just yet another competing method for documentation or creation, the restrictions caused by using TeX or LaTeX over more dynamic approaches are not insignificant.
One unfortunate problem is that nobody bothered setting the measure for legibility. On my display the text block is far far too wide. Cf. https://en.wikipedia.org/wiki/Line_length (while we're talking about typography, the fonts for body copy and code are mismatched in size in a distracting way)
As far as formulas/notation is concerned, the notation used in this paper is targeted only at experts in theoretical computer science, approximately the level of advanced grad students or above, who also happen to be pretty familiar with Rust and C++. The gimmicky popups are probably not meaningfully helpful for such an audience, and in my opinion don't really make the notation any more accessible to people without the extremely steep prerequisite expertise (e.g. I don't think this paper is going to be at all accessible to the vast majority of working programmers or computer science undergraduate students).
If you really want to make the paper more accessible, it would be better to focus on reducing the reliance on formulas, reducing the amount of jargon involved, and explaining the concepts and techniques using plain English targeted at a broader audience, rather than trying to add extra colors, click targets, or popups. (A research paper may alternately want to just target experts; that can also be fine. Even for experts this paper is pretty dense though.)
They were helpful for me, an ex-grad-student who read some type theory years ago. I think anyone breaking into the topic would appreciate that.
Requiring authors to publish two versions to make a paper accessible, when the motivated reader just needs a little comfort, is too high a bar. Let them write densely for their primary audience (and thus pass peer review) and still give affordances for everyone else.
As for width, font, etc, a stylesheet can fix that. I'm assuming they allow stylesheets to format for each venue appropriately.
This is a good point. Can a similar markup mechanism help make accessibility easier? Equations are compact to the eye but can be really unpleasant read out loud.
It is needed. I hate reading pdfs on my 24" or 32" monitor. I hate reading them on my Phone. The main thing that my father complained about when switching from Blackberry to iPhone, was missing PDF reflow feature. Basically the only screen where I find pdfs comfortable to read, is on the 12.9" iPad, and only if the author has the same font-size preferences as me.
* More in the space of LaTeX than Markdown (but with elements of each).
* Written in JavaScript (so lots of people can contribute in a language they already know).
* MIT license.
Nice! I don't know that I have an immediate use for it today, but this looks super nifty. If I did want to write something that needed some LaTeX-y features, and wasn't aiming for publication in a place that required it, I'd give Nota a shot. While I think Knuth is basically a demigod, it's not like he descended from on high, gave us TeX, and said "thou shalt never try anything new ever again".
TeX was definitely groundbreaking, but I consider it a product of its time. Far too much cleverness in macro expansions and weaving tricks to keep memory usage in check for what are now laughable limits.
Missing far too many niceties in comparison to modern languages with more guardrails to protect yourself from silly mistakes. The only way I can write Latex is to heavily rely upon \input{} segments to keep isolated blocks in case I break something through a missed escape.
I keep yearning for a modern take, but it feels like we are stuck in a local optimum from which there is no escape. A new platform has to fight the decades of accumulated inertia and packages that exist in TeX.
There are plenty of these markup languages. The reason none of them really challenges TeX/LaTeX in its own space is that they don't aim to do what TeX/LaTeX does.
LaTeX is "typographically-complete". Markdown and friends are explicitly not. HTML+CSS is. But what LaTeX has is a reasonable enough syntax that a human can write it by hand, unlike HTML+CSS. Moreover, the syntax, though clunky [1], is designed, as much as possible, not to interfere with the content that the human is writing.
For instance, LaTeX uses curly brackets {} for macro arguments because they are the least-used brackets in content. So when you are reading a LaTeX source, you know that () and [] are content, and only {} is ambiguous [2]. Nota uses a mix of all three brackets for its syntax, causing additional pain for the person reading/writing the source.
The replacement for TeX/LaTeX is never going to be a simpler language. It is going to be a language just as complex as LaTeX. But it can definitely be cleaned up and sped up compared to LaTeX. IMHO, somebody should rewrite TeX from scratch, improve its syntax, but otherwise keep it largely unchanged. Basically, any plain LaTeX source using some of the popular packages should continue to compile and give the same output. That is the only reasonable way out.
[1] A typographically-complete language will never have a non-clunky syntax.
[2] Escaped brackets \{1,2,3 \} are literal curly brackets. Personally, I only use them for mathematical sets and have defined a macro \set, so in my documents {} are 99% not ambiguous.
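A minimal sketch of the kind of \set macro described, assuming a plain LaTeX setup (the exact definition used in the comment is not given, so this is one plausible form):

```latex
% Hypothetical definition of the \set macro mentioned above:
\newcommand{\set}[1]{\left\{\, #1 \,\right\}}
% Usage: $\set{1, 2, 3}$ -- after this, literal \{ \} almost never
% appear in the source, so {} can be read as pure macro syntax.
```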
> what latex has is a reasonable enough syntax that a human can write it by hand, unlike HTML+CSS. Moreover, the syntax, though clunky [1] is designed, as much as possible, to not interfere with the content that the human is writing
I could not disagree more. LaTeX syntax is not 'clunky', it's a mess, and it has intentionally been engineered right from the start to be clever rather than consistent. And it's not only the syntax: the obvious mess that is LaTeX's surface goes right on, right to the heart ("the guts", as TeXnicians prefer to say) of the machinery, where no concern is dealt with separately, and anything can influence and break everything else.
Hell, you don't even get a semblance of sane text (string) processing or decent numerical computation. Yes, you can do it, in the way you could use a toothbrush or wet wipes to paint your house.
> Latex is "typographically-complete"
Yes, as long as one is ready to ignore the fact that quite a few simple things are quite difficult to achieve in LaTeX, e.g. keeping lines the same height and keeping register instead of jumping around whenever a superscript is encountered.
> The replacement for TeX/latex is never going to be a simpler language. It is going to be a language just as complex as latex.
The complexity of LaTeX is just in part due to the complexities of typesetting. It is complex because of an endless litany of bad design choices. HTML+CSS+JS gets a lot of flak for being too complex, but they pale in comparison. For example[1]:
In order to use numerical codepoints to write 東京, you can write any of:
^^^^6771 ^^^^4eac
\char"6771 \char"4EAC
The space between the entities is used to signal the end of the codepoint number; hence to write 東 京 with a space you must use tricks, one of:
\char"6771{} \char"4EAC
\char"6771\ \char"4EAC
In this system, ^^5c represents the backslash. But, unlike in reasonable systems (of which TeX is not one), using the numerical reference doesn't deactivate the backslash's special role as command indicator.
Compare this to the XML / HTML `&#x6771;&#x4EAC;`, which is a much more reasonable syntax, not any harder to write, and uses an explicit end-of-command marker instead of the 'clever' space, which is highly problematic.
Or better still, use XeLaTeX. But that's not the point. The point is that (1) sometimes you don't want the literal codepoint but a numerical reference in your source code; a use case for this would be `&#x3000;` instead of a literal ideographic space, which might be useful to prevent it from being accidentally elided at the end of a line.
(2) irrespective of whether you want to use numerical references or not, the example shows that apparently the authors of (La)TeX are unable to use sane syntaxes for their stuff. It's just a very bad idea to terminate your variable-length commands with a space when a space in the output could possibly follow. Same with identifiers: only letters are allowed, no underscores, no digits. You then get names like `\fooBarBazVI` instead of `\foo_bar_baz_6`, which many would prefer. These are all trifles to be sure, but they're legion, so you get software that seemingly takes Death by a Thousand Papercuts as a positive design maxim.
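For comparison, a minimal XeLaTeX sketch (the font name here is an assumption; any installed CJK font works):

```latex
% With a Unicode engine (XeTeX/LuaTeX), the literal characters work
% directly, and numeric \char references are rarely needed at all.
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Noto Serif CJK JP} % assumed font name
\begin{document}
東京 \quad \char"6771\char"4EAC % both forms produce 東京
\end{document}
```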
LaTeX definitely has many messy parts that need to be cleaned up. Native support for unicode characters and bidi text (which is somewhat implemented by xetex), is mandatory in new-latex.
The TeX engine will obviously need to be rewritten completely from scratch, for the reasons you suggest.
For me, there is one fundamental issue that makes me want to switch from LaTeX -- it can't produce accessible documents (and good HTML would do just fine as an accessible format). LaTeX is making good progress in this area.
Amazingly (to me) it seems typst is doing even worse than LaTeX, while starting much later! I'm happy to be told they have succeeded in this area of course.
While accessibility is an important area, it affects a minority of people.
You need to get the core functionality first before you can spend resources on accessibility.
Latex is very old and has the features; they can focus on accessibility now.
No, I completely disagree. You need to design accessibility in from the start, it's almost impossible to retrofit. Very few systems manage to add high-quality accessibility later on.
I'm one of the Typst devs and I do agree with you here. LaTeX has a lot of trouble with accessibility because it's hard to retain semantic information through layers of macros. However, I think we are in a better starting position because Typst is designed to revolve around semantic elements that the compiler can actually understand. We haven't gotten to it yet (there's lots to do), but we want to use this information both to output Tagged PDFs and for semantic HTML export. I guess we'll see how it turns out!
I wish you the best of luck. I don’t have any time to get involved in any more open source projects, but I consider the lack of common accessible publishing formats for science one of the biggest embarrassments of academia — for a field that claims to be open, we sure seem to love churning horridly inaccessible PDFs (and yes, I’m as guilty as anyone else here).
> I keep yearning for a modern take, but it feels like we are stuck in a local optimum from which there is no escape. New platform has to fight with the decades of accumulated inertia and packages which exist in Tex.
I believe the issue is that a better-than-LaTeX language needs to be not just better, but so much better that all the tooling and extensions for LaTeX get ported to it. Until then, it won't be better than LaTeX in practice. So it's a kind of chicken-and-egg issue.
The problem with any new initiative in this space is that it enters a field that is more than well populated. It doesn't matter if existing solutions are less than perfect.
It's like walking into a historic European city that has architecture going back millennia and arguing for a great new building design. Greenfield space is scarce and people will not just demolish old structures to try something new. They need to sense an overwhelming advantage.
The analogy gives some hints as to what needs to happen for a new approach to take hold. In building construction, massively better use of space was one example: For better or worse, use of steel and reinforced concrete opened the vertical dimension and the rest is history.
Is there such an unexplored dimension that could entice people into yet another document format to "improve" on ASCII, reStructuredText, wikitext, Markdown, TeX/LaTeX, AsciiDoc, HTML, etc.?
The stock answer is some sort of semantic hypertext infrastructure. The original vision is still unfulfilled. If we assume that the walled gardens of today are just a bad nightmare that will pass away, in a re-decentralized web one would need modern, user-friendly and empowering document writing infrastructure.
But there might be other dimensions that would elevate document writing and sharing to new heights. The beauty of innovation is that it is not bound by conventional rules and pre-existing wisdom.
If you want a decentralized web, a "Userweb" (maybe to parallel a bit Usenet), in my view you need a BitTorrent-like protocol.
You need efficient document formats, which means formats that don't try to do everything, which means that you have to segregate different document types into different files - e.g. tables in some standard spreadsheet format - in contrast to HTML, which tries to do tables, graphics (SVG), etc. on its own.
Separating each thing into its own file lets the user choose which programs they want to view/manipulate documents with (rather than the one-size-fits-all browser), helps with distribution across the network, and helps in low-storage/low-bandwidth situations (or lets us do more with the hardware we have right now).
So from my perspective, the answer is negative: there's nothing new to invent. Specialize and refine what was already invented.
If you follow the links, I think https://willcrichton.net/nota/ makes a compelling argument for the benefits of Nota at least in the niche of its intended use-case: making programming language academic papers more understandable.
As mentioned in another comment, rendering to high-quality PDF is an obvious need/question: can it do that?
(Perhaps) better would be rendering to LaTeX for compatibility with existing systems.
Basically this is markup + code (JavaScript), which is a combination targeted by MDX. The last time someone mentioned this, the author said Nota is geared towards documentation, while MDX is geared towards web sites: https://news.ycombinator.com/item?id=31349579
If that is so, I think someone needs to wrap Nota into a product for it to take off. Because while the results look great, fiddling with Node.js to build a document is too much work -- it's like LaTeX all over again. Most people will prefer to use Notion or a word processor.
LaTeX, or more generally, the TeX family, never went away. It's still the go-to toolkit for writing journal papers in most of the STEM disciplines.
Moreover, Nota is not a direct competitor to the TeX family because Nota generates something for web browsers to read, whereas xxxTeX generates something for PDF readers to read. (And yes, I know xxxTeX can generate other output formats but most people use it to generate PDF.)
The fact that Nota comes with a default style (some kind of article style) already puts it in a completely different league than MDX. MDX is ready, willing, and able to use whatever React/Vue/etc framework you use, but that means it doesn't come with any opinions out of the box.
Nota feels to me like it could be plugged in to a static site generator. Except then you have to get the two to cooperate. Still, I think that's an achievable hope for someone a little more dedicated than I am.
Yeah, this looks like a sensible extension to Markdown in a way I quite like the look of. Might be worth spending some time plugging it into Nikola (which does one thing I like really well - lets me blog with IPython notebooks - but that's not always appropriate).
Why do we need a language that creates a text page by compiling to JS? The same set of goals is handled by AsciiDoc. Real publishers (e.g. O'Reilly) use AsciiDoc as their input format to create books and websites.
I can second AsciiDoc. It hits the sweet spot between Markdown (arguably more suited for short-ish content) and LaTeX (full-blown academic papers with citations, formulas, etc.).
I have been using AsciiDoc for the past few years and loving it, only falling back to Markdown in places where AsciiDoc is not (yet?) available. GitHub and GitLab, for example, support rendering AsciiDoc. PyPI unfortunately does not support it, but more people seem to be looking into it [1], which is great.
AsciiDoc is my favourite too, thanks to Asciidoctor. I agree, AsciiDoc hits the sweet spot as a format. I have been frustrated by the tooling lately, though. I can see the huge effort put into Asciidoctor, and am thankful for it, but there are still big downsides, e.g. no semantic HTML5 output, difficult (or at least more difficult than necessary) integration with image generators, and heaviness (it's the only Ruby dependency on my entire machine). I imagine this just needs more time and resources put into it, as all these issues (except the Ruby one) are open on GitHub.
I was just thinking about what the easiest way would be to create a blog with mostly static content but with room for interactive graphs or arbitrary customization if I want it. Is this a good option?
Another option I saw was Quarto [1]. Maybe even a simple static-site blog like Jekyll could be used as well, where I just edit the output HTML as needed? What do you all recommend?
I use XHTML, mixed with a little custom XML, and processing with XSLT to produce final XHTML [1]: all of HTML is usable that way, the generation process is customizable, the dependencies are relatively easily manageable, the involved formats are standardized.
I would very highly recommend Astro [0]. Astro lets you write React-style components that compile to plain HTML/CSS (unless you _actually_ need JavaScript). My personal site and blog [1] is built with Astro
You could also go down the route of using shortcodes in Hugo (I think there is something similar as well for Jekyll?) and use the output HTML file as the input for a shortcode from data visualization libraries such as Vega-Altair (Python) or plotly.
If it did not compile to JS (needing a browser, or at least a JS engine, to display) and did not depend on npm and Node.js, it would have a higher chance of being adopted into my workflows. I am also still wondering what it does that I would need in a blog or even an academic paper, that I could not express using reStructuredText or the Org format. Especially reStructuredText is quite powerful, and even more so when you add custom directives to it. I built a rudimentary wiki on top of that, with hypertext linking between pages, a table of contents, and so on. Custom directives could be used for citations, maybe.
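As a sketch of that last point, a custom reST citation directive might read like this (the `cite` directive name and its option are hypothetical; `note` is a real built-in):

```rst
.. cite:: knuth1984
   :style: apa

.. note::
   Built-in directives already cover admonitions, tables, and includes;
   a ``cite`` directive like the one above would be registered through a
   small docutils extension.
```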
- Make & Python as glue and helpers for compilation
I manage my references using Zotero like any other academic writer. The configuration is less than 100 lines and I can get a pretty solid result using only basic HTML/CSS skills intertwined with Markdown. You sometimes end up with weird formatting issues but there is nothing you can't fix using HTML/CSS/JS. My manuscript has images, figures, tables, code, etc...
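A minimal sketch of that kind of glue, assuming Pandoc as the converter (the file names are placeholders; `--citeproc`, `--bibliography`, and `--csl` are standard Pandoc flags):

```make
# Markdown + Zotero-exported BibTeX -> standalone HTML manuscript.
manuscript.html: manuscript.md references.bib style.csl
	pandoc manuscript.md \
	  --citeproc --bibliography=references.bib --csl=style.csl \
	  --standalone --css=style.css -o manuscript.html
```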
It's good to see people trying to tackle the problem of formatting documents again. LaTeX is good but not for everything and the ecosystem is extremely hard to understand. Word, Pages and other similar tools are... proprietary. What would be a game changer for my use case is to see something like Scrivener with more formatting/layout options: https://www.literatureandlatte.com/scrivener/overview
> Existing document tools like LaTeX, Pandoc, Markdown, and Scribble can (for the most part) only generate static web pages.
I feel like Nota is underselling itself here, or at least not properly arguing for why a new language was created. If LaTeX were a perfectly fine document language, then surely we could extend e.g. Pandoc to provide these dynamic features when rendering LaTeX to a web page.
But instead, a new document language was created. Why?
The trouble with the JS used then becomes whether the browser has implemented whatever function Nota expects in the same way as other browsers, or implemented it at all. Even in modern web development there are situations where things don't work in Firefox or, say, the KDE browser. This leads to the usual notice, put up by lazy or incompetent developers, about viewing in Chrome/Safari only.
A quick reading tells me this is Pandoc [1] but in JavaScript. Interesting; I would love to see a bunch of examples, such as a static site generator, to make it easy for people to relate to it quickly and try it out.
An aside: the font on that website is gorgeous...and perfectly crisp and readable on a screen despite having serifs. I wonder why it looks so much sharper than some (serif) fonts I see on the web.
I like the idea! So basically, a better and modern LaTeX. I've had the same idea for a while now, for example the ability to reference definitions, etc.
But I think Nota goes about this a bit too heavy-handed:
% let nota = @Smallcaps{**Nota**}
.@Definition[name: "nota", label: nota]{
#nota is a language for writing documents, like academic papers and blog posts.
}
That's 4 mentions of `nota` to introduce a definition, 5 if we count `Nota` as well. Come on. Also, when referencing, instead of `&nota` maybe just allow `[[Nota]]` and `[[nota]]`?
You can just replace the name and label with whatever you want. Also in general you’ll be referencing an item much more frequently than you’ll be defining it.
Is there an advantage over AsciiDoc? In my opinion AsciiDoc is really powerful and offers everything an author of academic papers (and blog posts) would need. The output is also not limited to HTML.
> A Nota document compiles to a JavaScript program, meaning it's easy to:
> * View documents on any device that has a web browser.
Ah yes, a JS runtime and a browser, 2 things which are feasible to develop and definitely not massive black boxes.
I love the idea.
I just think it would be better being a format unto itself, or at least not requiring JS and/or a browser. Decoupling from these at least permits other language implementations of viewers/editors; browsers are already basically unimplementable by anyone without massive commercial backing.
I would be interested in a comparison between this and Jupyter notebooks. Notebooks seem like they'd be easier to use than a specific language for most users.
Can’t help but contrast it to ObservableHQ (https://observablehq.com/), which is basically front end notebooks with reactivity and other niceties. Leans right into “dynamism” while also allowing for simple document authoring.
Though it’s a platform, there’s a runtime and some open substitutes for the bits not open IIRC.
Give me an inverted IPython and I'll be happy. By inverted I mean: I write code and from time to time I want to document it with rich information (plots, equations, a bit more structured text). So I would love to have powerful comments in my regular code. I'm not much interested in bits of code in a single document.
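One existing approach along these lines is the "percent" cell format (supported by Jupytext, VS Code, and Spyder): the file stays ordinary runnable Python, and the rich documentation lives in comment cells. A minimal sketch (the function and its parameter values are illustrative):

```python
import math

# %% [markdown]
# ## Logistic growth
# The population follows $P(t) = K / (1 + A e^{-rt})$; the editor renders
# this cell as markdown + LaTeX, while plain `python` simply ignores it.

# %%
def logistic(t, K=100.0, A=9.0, r=0.5):
    """Logistic growth curve; parameter values are illustrative."""
    return K / (1 + A * math.exp(-r * t))

print(logistic(0.0))  # 10.0
```

The inversion is exactly as requested: the source of truth is the code file, and the notebook view is derived from it rather than the other way around.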
I'm not sure it is safe to build documentation on Node.js-based software.
In my humble opinion, the JavaScript world evolves so fast, and obsoletes so many things, that writing reliable software (in terms of availability and sustainability) is a very, very hard task.
> The goal of Nota is to bring documents into the 21st century.
I think the modern approach might be to recognize that all readers are different, and that they would use an LLM to transform the document into the form that is suitable for a particular reader.
Please, please, please, make sure that it uses a context-free grammar. For the love of Knuth, correct that one stupid mistake he made and allow other people to parse your syntax. Good things will follow!
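For context, the reason TeX's grammar is not context-free is that tokenization itself depends on mutable runtime state, the category codes:

```latex
% A parser cannot tokenize TeX without executing it: catcodes are state.
\catcode`\@=11 % '@' is now a letter, so \foo@bar is a single control word
\catcode`\|=0  % '|' is now an escape character, so |bye means \bye
```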
I am not a really intelligent person, but having the ability to use small caps while already in small caps (that's how the word Nota is written in the example) is kind of too Xzibit-like.
After developing the initial prototype you see in the webpage, I've since gone back to the drawing board. I'm working on developing a firmer foundation for issues like:
- How do you interleave content and computation? See: https://arxiv.org/abs/2310.04368
- How do different syntaxes make different document tasks easy, hard, or impossible? See: https://github.com/cognitive-engineering-lab/doclang-benchma...
I still very much believe in the high-level philosophy, but Nota will look very different within ~6 months. In the meantime, the single coolest development in the document language space is Typst, which I encourage you to check out: https://typst.app/
Also: the next version of Nota will be written 99% in Rust :-)