The Craft of Text Editing (1999)

cmyr · on Jan 30, 2017

I've been learning a lot lately by following along with the development of xi[1], a new text editor written in Rust. Through reading that project's RFCs I've come across other interesting projects, like swiboe[2] and wi[3].

What are the other canonical resources on this topic? It feels like tons of the interesting thought is scattered around various blogs and usenet posts and the like. I'd love to create a nice collection of good writing on text-editing / tools, but I'm not sure where to start.

[1] https://github.com/google/xi-editor

[2] https://github.com/swiboe/swiboe

[3] https://github.com/wi-ed/wi

omtose · on Jan 30, 2017

Kakoune[1] has been posted recently here on HN to great reception. As a C++ developer, I think it has a very high quality codebase, especially considering how non-trivial it is. As a user, it's been my main text editor for over a year. It also has a vibrant community; jump in on IRC if you have questions or ideas.

[1] https://github.com/mawww/kakoune

zellyn · on Jan 30, 2017

I think getting Kakoune UI support into Xi would be a very interesting project…

omtose · on Jan 30, 2017

Are you talking about Xi as a frontend for kakoune? If it's the UI part that you're interested in, there is also kakoune-qml[1] made by one of the regular users.

[1] https://github.com/doppioandante/kakoune-qml

zellyn · on Jan 30, 2017

I think creating a Xi frontend that mimicked Kakoune's modal keybindings and highlighting would be an interesting exercise, and probably stretch the set of supported ideas in Xi in a good way.

jasonm23 · on Jan 30, 2017

REmacs is a very interesting development...

https://github.com/Wilfred/remacs/blob/master/README.md

nextos · on Jan 30, 2017

It'd be amazing if it crystallizes in a few years. Improving the old parts of Emacs, especially GUI code and low level things is a must. Rust seems like an ideal replacement for C. Great performance, much better safety guarantees.

Equally important is perhaps improving elisp concurrency. The jury is still out on whether this will happen by migrating to Guile Scheme [1].

[1] https://www.reddit.com/r/emacs/comments/4zttlt/guileemacs_st...

_pfxa · on Jan 30, 2017

I believe Rust is too young a language to replace C in Emacs. I'd be happier if it was a standardised, time-tested language with a large user base and lots of docs.

fridsun · on Jan 30, 2017

Rust is standardized with lots of docs. The other two will come with time.

brians · on Jan 30, 2017

It's still moving fast and breaking things in ways that---while necessary---aren't reasonable choices for programs like Emacs or TeX. These programs need to run the same way in 20 years, and be runnable the same way in 50.

gwright · on Jan 30, 2017

Sam comes to mind as an interesting project in this area:

http://doc.cat-v.org/plan_9/4th_edition/papers/sam/

0x445442 · on Jan 30, 2017

Too reliant on the mouse.

jhallenworld · on Jan 30, 2017

There is some theory of operation stuff for JOE:

https://sourceforge.net/p/joe-editor/mercurial/ci/default/tr...

Todd · on Jan 30, 2017

A paper that covers some of the data structures used in editors is:

https://www.cs.unm.edu/~crowley/papers/sds.pdf

The gap buffer, in particular, is a great example of a simple yet powerful idea that is perfectly suited to the problem domain.
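To make the idea concrete, here is a minimal gap buffer sketch in Python. It is only an illustration, not code from the paper or from any particular editor; the class name `GapBuffer`, its method names, and the growth policy are all assumptions chosen for the example. The text lives in one array with a hole (the gap) at the edit point, so typing at the cursor fills the gap without shifting the rest of the text.

```python
class GapBuffer:
    """Text stored in one array with a movable hole (the gap) at the edit point."""

    def __init__(self, text="", capacity=16):
        # Hypothetical layout: text first, then `capacity` free slots.
        self.buf = list(text) + [None] * capacity
        self.gap_start = len(text)      # first free slot
        self.gap_end = len(self.buf)    # one past the last free slot

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def move_gap(self, pos):
        """Slide the gap so that it starts at text position `pos`."""
        if not 0 <= pos <= len(self):
            raise IndexError(pos)
        while self.gap_start > pos:     # move characters from before the gap to after it
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:     # move characters from after the gap to before it
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self):
        """Open a fresh run of free slots when the gap is exhausted."""
        extra = max(16, len(self.buf))
        self.buf[self.gap_end:self.gap_end] = [None] * extra
        self.gap_end += extra

    def insert(self, pos, text):
        self.move_gap(pos)
        for ch in text:
            if self.gap_start == self.gap_end:
                self._grow()
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos, count):
        """Delete `count` characters starting at `pos` by widening the gap."""
        self.move_gap(pos)
        self.gap_end = min(self.gap_end + count, len(self.buf))

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Consecutive inserts at the same spot never call `move_gap` for more than zero steps, which is why the structure suits ordinary typing so well.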

polm23 · on Jan 30, 2017

The text in that PDF didn't show up correctly for me, but there's an HTML version of the paper here:

https://www.cs.unm.edu/~crowley/papers/sds/sds.html

bernardlunn · on Jan 30, 2017

Was irony intended?

dws · on Jan 30, 2017

Fun blast from the past. The original version of this shipped with Mark of the Unicorn's Mince/Scribble package for CP/M. (Mince = Mince Is Not Complete Emacs)

ScottBurson · on Jan 30, 2017

Glad to see someone remembers that! (I was a cofounder.)

The sources for the original version of Mince were lost long ago. It's too bad; they were very clear (thanks to the skills of Jason Linhart, the primary author, as well as Craig) and would have made a great example for study.

kabdib · on Jan 30, 2017

I used MINCE quite a bit, and it was great. MINCE was what you used for Emacs if you couldn't get to an ITS machine :-)

Thanks, it was a really nice editor.

AlexanderDhoore · on Jan 30, 2017

I once built a text editor using a rope[1] data structure where every line was a node. The tree was augmented[2] with information about line numbers, titles in the document... for very fast navigation. I don't think primitive data structures like a gap buffer are useful anymore. They come from a time where saving on memory was more important than it is now.

EDIT: I forgot it was also a self balancing tree! Very cool stuff.

[1] https://en.wikipedia.org/wiki/Rope_(data_structure)

[2] https://en.wikipedia.org/wiki/Interval_tree#Augmented_tree
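A rough sketch of the augmentation idea, assuming one line per node and cached subtree totals. This is not the editor described above: the names `Node`, `build`, `line_at` and `offset_of_line` are made up for illustration, and the tree is built balanced once rather than rebalanced on every edit.

```python
class Node:
    """One line of text plus cached totals for the subtree rooted here."""

    def __init__(self, line, left=None, right=None):
        self.line = line
        self.left = left
        self.right = right
        # The augmentation: each node remembers how much text is below it.
        self.lines = 1 + count_lines(left) + count_lines(right)
        self.chars = len(line) + count_chars(left) + count_chars(right)


def count_lines(node):
    return node.lines if node else 0


def count_chars(node):
    return node.chars if node else 0


def build(lines, lo=0, hi=None):
    """Build a balanced tree from a list of lines (middle line becomes the root)."""
    if hi is None:
        hi = len(lines)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(lines[mid], build(lines, lo, mid), build(lines, mid + 1, hi))


def line_at(node, n):
    """Return line number n (0-based) in time proportional to the tree height."""
    while node:
        left = count_lines(node.left)
        if n < left:
            node = node.left
        elif n == left:
            return node.line
        else:
            n -= left + 1
            node = node.right
    raise IndexError("line number out of range")


def offset_of_line(node, n):
    """Return the character offset at which line n starts."""
    offset = 0
    while node:
        left = count_lines(node.left)
        if n < left:
            node = node.left
        elif n == left:
            return offset + count_chars(node.left)
        else:
            # Skip the whole left subtree and this node's own line.
            offset += count_chars(node.left) + len(node.line)
            n -= left + 1
            node = node.right
    raise IndexError("line number out of range")
```

Usage would look something like `root = build(text.splitlines(keepends=True))` followed by `offset_of_line(root, 2)`. Other per-node summaries (headings, fold state, and so on) could be cached the same way.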

geocar · on Jan 30, 2017

A dissent: Saving memory isn't strictly orthogonal to editing performance.

A paged gap buffer (as described with an array index) remains ideal when you need to make a small number of surgical changes (insertions, deletions, etc) to a very large file, especially given the fact that all modern systems have page mapping hardware, so anything you implement is effectively on top of a paged gap buffer anyway. What we're really searching for is a better program-visible structure.

To that end, the biggest difficulty in writing an efficient text editor is that most text editing is (ahem) textbook, and neglects the fact that fork() copies on write, making most real operations asynchronous, and that combining writev() and mmap() can produce whatever memory layout you want (a plain old stupid byte array if that's convenient). The kernel will memcpy your page table for you, so there's no sense in also doing it in user-space. And so on.

Consider at which point a write() plus an mmap() (or on OSX a mach_vm_remap()) will be faster, and just how much faster it will be. Imagine programming something as simple as a plain byte array but with instant inserts (memmove) across multi-gigabyte buffers. Then consider the cost of a write()+mmap() syscall combination in the worst case (a couple hundred micros?) and you'll never use a complicated (linked list) data structure again.

maxbrunsfeld · on Jan 30, 2017

How would a gap buffer handle distributed editing operations like search-and-replace, or multi-cursor typing? The data structure seems optimized for editing in one place at a time.

geocar · on Jan 30, 2017

There aren't a lot of structures that have an amortised cost-per-edit; you're really only ever considering the cost of doing a single insert/deletion operation, and you're really only ever trading that performance against the complexity.

A paged gap buffer is actually more like a tree, so instead of a single gap in the middle of a file, you have a gap in the middle of a block, and one block mapped per modified space. Cost of insert/delete is limited to the cost of a memmove within a block (cheap!), and your extent map never grows beyond 2x the number of changes. These upper bounds are incredibly good for edits, and there are only pathological cases that do better.
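A very simplified sketch of the "edit cost is bounded by the block size" property, in Python. This is not the mmap-backed design described here: it is a plain list of small byte blocks, the name `BlockBuffer` and its methods are hypothetical, and the linear `_locate` scan stands in for a real extent map.

```python
class BlockBuffer:
    """Bytes stored as a list of small blocks; an edit only shifts bytes in one block."""

    def __init__(self, data=b"", block_size=4096):
        self.block_size = block_size
        self.blocks = [bytearray(data[i:i + block_size])
                       for i in range(0, len(data), block_size)] or [bytearray()]

    def _locate(self, pos):
        """Map a buffer position to (block index, offset inside that block)."""
        if pos < 0:
            raise IndexError(pos)
        for index, block in enumerate(self.blocks):
            if pos <= len(block):
                return index, pos
            pos -= len(block)
        raise IndexError("position past end of buffer")

    def insert(self, pos, data):
        index, offset = self._locate(pos)
        block = self.blocks[index]
        block[offset:offset] = data            # shifts bytes inside this block only
        if len(block) > self.block_size:       # keep blocks bounded by splitting
            size = self.block_size
            self.blocks[index:index + 1] = [block[i:i + size]
                                            for i in range(0, len(block), size)]

    def delete(self, pos, count):
        while count > 0:
            index, offset = self._locate(pos)
            block = self.blocks[index]
            if offset == len(block):           # at a block boundary: continue in the next block
                if index + 1 == len(self.blocks):
                    return                     # nothing left to delete
                index, offset = index + 1, 0
                block = self.blocks[index]
            removed = min(count, len(block) - offset)
            del block[offset:offset + removed]
            count -= removed
            if not block and len(self.blocks) > 1:
                del self.blocks[index]         # drop blocks that became empty

    def to_bytes(self):
        return b"".join(self.blocks)
```

However large the file, an insert or delete here moves at most about one block's worth of bytes; the rest of the data is never touched.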

But what about search?

Search is faster too! Because virtual memory has your entire file contiguous, a search is as fast as a scan[1], which might embolden you to try indexing your file, which might really impress your users.

[1]: https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm
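For reference, the Rabin-Karp scan linked in [1] can be sketched roughly like this over a contiguous byte buffer. The function name, the base and the modulus are illustrative choices, not anything prescribed in this thread.

```python
def rabin_karp(haystack: bytes, needle: bytes, base: int = 256,
               mod: int = 1_000_000_007) -> int:
    """Return the index of the first occurrence of needle, or -1 if absent."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    if m > n:
        return -1

    high = pow(base, m - 1, mod)     # weight of the byte leaving the window
    target = window = 0
    for i in range(m):               # hash the needle and the first window
        target = (target * base + needle[i]) % mod
        window = (window * base + haystack[i]) % mod

    for start in range(n - m + 1):
        # Compare bytes only when the hashes agree, to rule out collisions.
        if window == target and haystack[start:start + m] == needle:
            return start
        if start + m < n:
            # Roll the hash: drop haystack[start], append haystack[start + m].
            window = ((window - haystack[start] * high) * base
                      + haystack[start + m]) % mod
    return -1
```

Each step of the scan does a constant amount of arithmetic, so the search is one linear pass over the buffer plus the occasional byte comparison.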

teddyh · on Jan 30, 2017

> I don't think primitive data structures like a gap buffer are useful anymore. They come from a time where saving on memory was more important than it is now.

I beg to differ. I routinely open multi-gigabyte log files and SQL dumps in text format, and would not like to have to resort to “sed” to edit them.

AlexanderDhoore · on Jan 30, 2017

I don't disagree. Although a good rope implementation can handle that. Ropes are nice because it doesn't matter where you edit. Gap buffers have to copy around a lot of data if you are editing in different places.

Btw the memory overhead is not that much. It's just the memory needed to keep the pointers between the nodes. So I'm talking 10% increase or something.
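A back-of-the-envelope model of that copying cost, in Python. It assumes a single gap and edits small enough that positions barely shift; the function name `gap_move_cost` is made up for the example.

```python
def gap_move_cost(edit_positions, start=0):
    """Total characters a single-gap buffer copies to follow a sequence of edits.

    Assumption: moving the gap from position p to position q copies |p - q|
    characters, and each edit is small enough not to shift later positions.
    """
    moved = 0
    gap = start
    for pos in edit_positions:
        moved += abs(pos - gap)   # characters shuffled across the gap
        gap = pos
    return moved
```

Under this model, typing in one place is nearly free (`gap_move_cost([10, 11, 12])` is 12), while alternating between the two ends of a million-character file (`gap_move_cost([0, 1_000_000, 0, 1_000_000])`) copies three million characters.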

edmccard · on Jan 30, 2017

>I don't think primitive data structures like a gap buffer are useful anymore.

Emacs seems to do just fine with gap buffers. And thinking about algorithms with greater complexity as being less primitive can obscure the fact that the complexity isn't always worth the tradeoff; see "Gap Buffers, or, Don't Get Tied Up With Ropes"[1]

[1] http://scienceblogs.com/goodmath/2009/02/18/gap-buffers-or-w...

gf263 · on Jan 30, 2017

I never understood why these webpages can't have like, 4 lines of CSS to make them much more readable. Preserve the older aesthetic I guess?

grzm · on Jan 30, 2017

As pointed out elsewhere, the post is from 1999. You're effectively asking the author, apart from creating the content, to also maintain it over the years to someone else's arbitrary satisfaction. I'm not sure if that's fair. The author likely has other projects they're working on, and maintaining the look of something they wrote 18 years ago isn't a priority.

Consider what would have happened if they had just published a book. Likely the book would be out of print and no longer available. This is definitely a step up from that.

If this is something that you truly care about, you may want to reach out to the author and offer to provide a maintenance update on your own. That said, be prepared for the author to perhaps consider even the little time it would take coordinating with you and performing any updates not worth their time. Then again, they may welcome your interest and willingness to help.

massysett · on Jan 30, 2017

Maybe because the author's expertise is something other than writing HTML, so he picked up an old book on HTML, marked it up, and that's it. That the browser can render HTML written 20 years ago is quite a virtue. If it only takes 4 lines of CSS to make it more readable, then the page's lack of readability is more an indictment of the browser (which could do this tidying itself) rather than of the author, who should not have to continually update HTML so it renders well on recently-invented devices.

stinkytaco · on Jan 30, 2017

> then the page's lack of readability is more an indictment of the browser (which could do this tidying itself)

I strongly disagree with this. The browser should do nothing it is not explicitly made to do, which is one of the reasons it can still render HTML from 20 years ago. We used to have browsers that tried to do that kind of thing and we're only now extracting ourselves from that mess.

It is 100% on the website author to make their page more readable.

massysett · on Jan 30, 2017

The whole point of vanilla HTML is that it has few presentation details. It has some headings, bold, italic, and such. If the page does not specify margins or font size, the browser absolutely should set these so it is most readable on the device. If I write plain HTML today I am not optimizing for some VR headset that will be used twenty years hence. The headset should render the plain HTML in a manner faithful to the semantic markup, not so it looks the same way it looked on Netscape with a VGA screen.

gf263 · on Jan 30, 2017

If you have something worth saying, you can take the 5 minutes it takes to make it readable.

massysett · on Jan 30, 2017

Learning CSS does not take 5 minutes.

moron4hire · on Jan 30, 2017

Looks great on my phone. Semantic HTML is responsive by default.

axiomabsolute · on Jan 30, 2017

It's a bit awkward to read on large screens. Long lines and whatnot

zeveb · on Jan 30, 2017

> It's a bit awkward to read on large screens. Long lines and whatnot

For years and years and years I always had a half-screen-width browser window, precisely because it's easier to read text that way. But then site authors started assuming that I'd have a full-width window, and using CSS to waste half the window width.

I still think that the correct response to 'window too wide' is 'shrink the window,' but it's a losing battle.

albedoa · on Jan 30, 2017

Wait, why wouldn't the solution be "define a max width"? Site authors should not be wasting your window width, windows should not be "too wide" for text, and you shouldn't have to choose between the two solutions you mentioned.

zeveb · on Jan 30, 2017

Only because it seems to me that if the user really wants a terribly wide window … that's his choice. Who am I to prevent someone from doing something which seems stupid to me? Maybe he wants a half-inch high window at the bottom of his screen where he can scroll through my text slowly, or something. His call, not mine.

throwanem · on Jan 30, 2017

But, in demonstration of another virtue inherent in this sort of simplicity, reader mode handles it beautifully.

moron4hire · on Jan 31, 2017

One may find it useful to employ CTRL+PlusKey, CTRL+MouseWheel, pinch-to-zoom, or snap-to-screen-edge.

daveguy · on Jan 30, 2017

What 4 lines would those be?

axiomabsolute · on Jan 30, 2017

Inspired by https://bestmotherfucking.website/

  body {
      margin: 1em auto;
      max-width: 40em;
      font: 1.2em/1.62 sans-serif;
  }

That seems like a good start

daveguy · on Jan 30, 2017

Nice! Thank you!

erikb · on Jan 30, 2017

> In its most general form, text editing is the process of taking some input, changing it, and producing some output.

Funny how similar that definition is to the "programming" one.

fmoralesc · on Jan 30, 2017

That's because that is just what a Turing machine does.

z3t4 · on Jan 30, 2017

As someone currently working on a code editor I love this stuff, but there's usually more focus on the technical part than the human part. With today's hardware we can do millions of stupid things every second and it will still feel snappy. We should spend more time trying to optimize for the humans instead of their computer.

jwhitlark · on Jan 30, 2017

I liked it so much I bought a hard copy a couple years ago. Lots to learn in that book.