Boehm, Hans-J., Russ Atkinson and Michael Plass (1995) "Ropes: an alternative to strings", Software—Practice & Experience 25: 1315–1330, DOI: 10.1002/spe.4380251203, https://dl.acm.org/doi/10.1002/spe.4380251203 (cited 2021-06-01).
It's basically just two stacks of lines, where a line is an array of characters. One stack has the lines before the cursor, the other has the lines after the cursor (in reverse order).
I use emacs, but ropes are a much better way to go if you're starting from scratch.
The last 20 years have been OK. Before that not really so.
I remember starting emacs on a VAX-750 with 4 MB main memory and 8 users in the 80s.
Even in the 90s with Suns and competitors in single user usage emacs could bring things to a halt. Especially when starting a certain messaging client (hej LysKOM).
Well, back then emacs was a resource hog like today VSCode using 8GB and swapping as someone wrote here. (I would not touch VSCode with a stick). Times have changed, but emacs rather little. So today it can nearly be considered lightweight and efficient.
opening any file with extra long lines (for example minified js files which somehow end up littering any codebase) in emacs is still bring emacs to its knees, especially if you have line wrapping enabled.
I wish there were a name for the generic type Foo<T> such that a rope is exactly a Foo<Char>. In other words, a rope-of-non-character primitives (like bytes or floats).
Generalizing even further, I wish there were a name for the generic type Bar<T,M> which is a b-tree of T-values, plus a monoid [1] M over T, where each non-leaf b-tree node stores the M-sum of its descendents [2]. This lets you quickly compute the M-sum of arbitrary ranges or search for an element by the M-sum of its predecessors. Ropes are essentially Bar<Char,(0,+)> so the monoid gives you the number of characters in a range.
If you let M=(0,max) or M=(0,min) you can get the min/max value in arbitrary ranges in O(log n) time. This is extremely useful for all sorts of things, from algorithmic trading (order books) to VLSI (plotting waveforms from gigantic simulations). If M1 and M2 are monoids, then $M1\cross M2$ is a monoid, so these "range summaries" are very modular -- you can mix and match them.
In the functional programming community there is a data structure called a finger tree [3] which are a specific case of Bar<T,M> implemented as an immutable data structure. Immutable data structures are nifty, they have benefits, but also costs. I don't know what to call Bar<T,M> implementations which are non-immutable (i.e. ephemeral or update-in-place) data structures.
If anybody knows of generally-accepted names for these (or wants to suggest some) please reply. I plan to add this functionality into (a fork of) LMDB.
I was in a discussion about this recently. I would probably use an RRB tree, which would give me virtually free tree-based undo/redo, an a simple way to implement multiple cursors.
A rope is a data structure for big strings. Its a string where instead of storing the characters in an array, the characters are stored in a different data structure to make common operations faster. Ropey stores a string where the characters are kept in a b-tree. This allows log(n) time complexity for inserts or deletes anywhere in the string.
So you can have a string with millions of characters (a loaded document) and you can edit it efficiently.
It seems the biggest issue with ropes is search. The rust regex engine (which this editor uses) expects an array. In the worst case scenario you forced to copy the entire document into an array to run search on it. Only to throw that away when you are done. That offsets a lot of performance gains ropes are supposed to provide.
If the regex engine doesn't support indexing into a arbitrary (indexable) data structure, that's a deficiency in the regex engine, not in one particular such data structure. You can tell this isn't a deficiency of ropes, because the exact same thing would happen if you used[0] hash tables or b-trees.
Search might be slower with a more complicated lookup (versus a probably-inlined-and-otherwise-optimized pointer-plus-index addition), but that wasn't the problem described; the problem decribed was that non-array lookups weren't supported at all, and search required copying the data over to a array before the regex engine would deign to touch it.
0: for a example, not a suggestion that they would be a good fit for a text editor
It's not widely known but PCRE (which I assume is still the most widely used regex library) supports partial matching which you can use to interoperate with any piecewise linear string representation (ropes, gap buffers, etc).
hx uses the Ropey crate (Rust library) to implement ropes: https://crates.io/crates/ropey
Boehm, Hans-J., Russ Atkinson and Michael Plass (1995) "Ropes: an alternative to strings", Software—Practice & Experience 25: 1315–1330, DOI: 10.1002/spe.4380251203, https://dl.acm.org/doi/10.1002/spe.4380251203 (cited 2021-06-01).