Boehm, Atkinson and Plass (1995) proposed ropes as an alternative to strings for...

malthejorgensen · on June 1, 2021

My impression is that most editors already use this or similar optimized data structures for representing the text internally.

KirillPanov · on June 2, 2021

Emacs uses a much less advanced data structure called a "gap buffer":

https://www.gnu.org/software/emacs/manual/html_node/elisp/Bu...

https://news.ycombinator.com/item?id=15199642

It's basically just two stacks of lines, where a line is an array of characters. One stack has the lines before the cursor, the other has the lines after the cursor (in reverse order).

I use emacs, but ropes are a much better way to go if you're starting from scratch.

eddieh · on June 2, 2021

Is it though? After about 20 years of Emacs I have never thought it was slow regardless of anything found in etc/JOKES

usr1106 · on June 2, 2021

The last 20 years have been OK. Before that not really so.

I remember starting emacs on a VAX-750 with 4 MB main memory and 8 users in the 80s.

Even in the 90s with Suns and competitors in single user usage emacs could bring things to a halt. Especially when starting a certain messaging client (hej LysKOM).

Well, back then emacs was a resource hog like today VSCode using 8GB and swapping as someone wrote here. (I would not touch VSCode with a stick). Times have changed, but emacs rather little. So today it can nearly be considered lightweight and efficient.

gpderetta · on June 2, 2021

opening any file with extra long lines (for example minified js files which somehow end up littering any codebase) in emacs is still bring emacs to its knees, especially if you have line wrapping enabled.

eddieh · on June 2, 2021

Ah, I don't have line wrapping enabled by default.

KirillPanov · on June 2, 2021

I wish there were a name for the generic type Foo<T> such that a rope is exactly a Foo<Char>. In other words, a rope-of-non-character primitives (like bytes or floats).

Generalizing even further, I wish there were a name for the generic type Bar<T,M> which is a b-tree of T-values, plus a monoid [1] M over T, where each non-leaf b-tree node stores the M-sum of its descendents [2]. This lets you quickly compute the M-sum of arbitrary ranges or search for an element by the M-sum of its predecessors. Ropes are essentially Bar<Char,(0,+)> so the monoid gives you the number of characters in a range.

If you let M=(0,max) or M=(0,min) you can get the min/max value in arbitrary ranges in O(log n) time. This is extremely useful for all sorts of things, from algorithmic trading (order books) to VLSI (plotting waveforms from gigantic simulations). If M1 and M2 are monoids, then $M1\cross M2$ is a monoid, so these "range summaries" are very modular -- you can mix and match them.

In the functional programming community there is a data structure called a finger tree [3] which are a specific case of Bar<T,M> implemented as an immutable data structure. Immutable data structures are nifty, they have benefits, but also costs. I don't know what to call Bar<T,M> implementations which are non-immutable (i.e. ephemeral or update-in-place) data structures.

If anybody knows of generally-accepted names for these (or wants to suggest some) please reply. I plan to add this functionality into (a fork of) LMDB.

[1] https://en.m.wikipedia.org/wiki/Monoid

[2] actually you want to store the M-sum of the descendents of each child node alongside the pointer to that child node

[3] https://en.m.wikipedia.org/wiki/Finger_tree

gnull · on June 1, 2021

Ropes is a performance improving technique, not a conceptually new model to think about strings, isn't it?

nielsbot · on June 1, 2021

Depends if you're talking from the perspective of the API client or the string storage implementor's.

giovannibonetti · on June 2, 2021

Interesting, could you elaborate on that?

bjoli · on June 2, 2021

I was in a discussion about this recently. I would probably use an RRB tree, which would give me virtually free tree-based undo/redo, an a simple way to implement multiple cursors.

bruce343434 · on June 1, 2021

Can’t find a download link of that article on that page nor another article explaining what “ropes” are. Can somebody tell me what a rope is?

josephg · on June 1, 2021

A rope is a data structure for big strings. Its a string where instead of storing the characters in an array, the characters are stored in a different data structure to make common operations faster. Ropey stores a string where the characters are kept in a b-tree. This allows log(n) time complexity for inserts or deletes anywhere in the string.

So you can have a string with millions of characters (a loaded document) and you can edit it efficiently.

celeritascelery · on June 2, 2021

It seems the biggest issue with ropes is search. The rust regex engine (which this editor uses) expects an array. In the worst case scenario you forced to copy the entire document into an array to run search on it. Only to throw that away when you are done. That offsets a lot of performance gains ropes are supposed to provide.

https://github.com/xi-editor/xi-editor/issues/1192#issuecomm...

archseer · on June 2, 2021

Yeah, the search implementation right now is kind of thrown in there just to get out of the way of my editing.

There's a way to search through rope chunks by using the low level regex-automata: https://github.com/rust-lang/regex/issues/425#issuecomment-7...

Alacritty currently does that to search over it's cells, I'd like to look into it.

a1369209993 · on June 2, 2021

> That offsets a lot of performance gains ropes are supposed to provide.

It sounds like that offsets lot of performance gains the rust regex engine is supposed to provide, actually.

celeritascelery · on June 2, 2021

Search is pretty fundamental to text editors. Getting fast insertion with slow search is not always a desirable trade-off.

a1369209993 · on June 2, 2021

If the regex engine doesn't support indexing into a arbitrary (indexable) data structure, that's a deficiency in the regex engine, not in one particular such data structure. You can tell this isn't a deficiency of ropes, because the exact same thing would happen if you used[0] hash tables or b-trees.

Search might be slower with a more complicated lookup (versus a probably-inlined-and-otherwise-optimized pointer-plus-index addition), but that wasn't the problem described; the problem decribed was that non-array lookups weren't supported at all, and search required copying the data over to a array before the regex engine would deign to touch it.

0: for a example, not a suggestion that they would be a good fit for a text editor

psykotic · on June 2, 2021

It's not widely known but PCRE (which I assume is still the most widely used regex library) supports partial matching which you can use to interoperate with any piecewise linear string representation (ropes, gap buffers, etc).

brundolf · on June 1, 2021

When you think about it, keeping editor text in a naive array-based string would be absolutely nutty

banana_giraffe · on June 1, 2021

It's a b-tree for strings, optimizing for editor tasks that are traditionally inefficient if you keep a text document as a simple string.

https://iq.opengenus.org/rope-data-structure/