Hacker News new | past | comments | ask | show | jobs | submit login

Boehm, Atkinson and Plass (1995) proposed ropes as an alternative to strings for use in text editors.

hx uses the Ropey crate (Rust library) to implement ropes: https://crates.io/crates/ropey

Boehm, Hans-J., Russ Atkinson and Michael Plass (1995) "Ropes: an alternative to strings", Software—Practice & Experience 25: 1315–1330, DOI: 10.1002/spe.4380251203, https://dl.acm.org/doi/10.1002/spe.4380251203 (cited 2021-06-01).




My impression is that most editors already use this or similar optimized data structures for representing the text internally.


Emacs uses a much less advanced data structure called a "gap buffer":

https://www.gnu.org/software/emacs/manual/html_node/elisp/Bu...

https://news.ycombinator.com/item?id=15199642

It's basically just two stacks of lines, where a line is an array of characters. One stack has the lines before the cursor, the other has the lines after the cursor (in reverse order).

I use emacs, but ropes are a much better way to go if you're starting from scratch.


Is it though? After about 20 years of Emacs I have never thought it was slow regardless of anything found in etc/JOKES


The last 20 years have been OK. Before that not really so.

I remember starting emacs on a VAX-750 with 4 MB main memory and 8 users in the 80s.

Even in the 90s with Suns and competitors in single user usage emacs could bring things to a halt. Especially when starting a certain messaging client (hej LysKOM).

Well, back then emacs was a resource hog like today VSCode using 8GB and swapping as someone wrote here. (I would not touch VSCode with a stick). Times have changed, but emacs rather little. So today it can nearly be considered lightweight and efficient.


opening any file with extra long lines (for example minified js files which somehow end up littering any codebase) in emacs is still bring emacs to its knees, especially if you have line wrapping enabled.


Ah, I don't have line wrapping enabled by default.


I wish there were a name for the generic type Foo<T> such that a rope is exactly a Foo<Char>. In other words, a rope-of-non-character primitives (like bytes or floats).

Generalizing even further, I wish there were a name for the generic type Bar<T,M> which is a b-tree of T-values, plus a monoid [1] M over T, where each non-leaf b-tree node stores the M-sum of its descendents [2]. This lets you quickly compute the M-sum of arbitrary ranges or search for an element by the M-sum of its predecessors. Ropes are essentially Bar<Char,(0,+)> so the monoid gives you the number of characters in a range.

If you let M=(0,max) or M=(0,min) you can get the min/max value in arbitrary ranges in O(log n) time. This is extremely useful for all sorts of things, from algorithmic trading (order books) to VLSI (plotting waveforms from gigantic simulations). If M1 and M2 are monoids, then $M1\cross M2$ is a monoid, so these "range summaries" are very modular -- you can mix and match them.

In the functional programming community there is a data structure called a finger tree [3] which are a specific case of Bar<T,M> implemented as an immutable data structure. Immutable data structures are nifty, they have benefits, but also costs. I don't know what to call Bar<T,M> implementations which are non-immutable (i.e. ephemeral or update-in-place) data structures.

If anybody knows of generally-accepted names for these (or wants to suggest some) please reply. I plan to add this functionality into (a fork of) LMDB.

[1] https://en.m.wikipedia.org/wiki/Monoid

[2] actually you want to store the M-sum of the descendents of each child node alongside the pointer to that child node

[3] https://en.m.wikipedia.org/wiki/Finger_tree


Ropes is a performance improving technique, not a conceptually new model to think about strings, isn't it?


Depends if you're talking from the perspective of the API client or the string storage implementor's.


Interesting, could you elaborate on that?


I was in a discussion about this recently. I would probably use an RRB tree, which would give me virtually free tree-based undo/redo, an a simple way to implement multiple cursors.


Can’t find a download link of that article on that page nor another article explaining what “ropes” are. Can somebody tell me what a rope is?


A rope is a data structure for big strings. Its a string where instead of storing the characters in an array, the characters are stored in a different data structure to make common operations faster. Ropey stores a string where the characters are kept in a b-tree. This allows log(n) time complexity for inserts or deletes anywhere in the string.

So you can have a string with millions of characters (a loaded document) and you can edit it efficiently.


It seems the biggest issue with ropes is search. The rust regex engine (which this editor uses) expects an array. In the worst case scenario you forced to copy the entire document into an array to run search on it. Only to throw that away when you are done. That offsets a lot of performance gains ropes are supposed to provide.

https://github.com/xi-editor/xi-editor/issues/1192#issuecomm...


Yeah, the search implementation right now is kind of thrown in there just to get out of the way of my editing.

There's a way to search through rope chunks by using the low level regex-automata: https://github.com/rust-lang/regex/issues/425#issuecomment-7...

Alacritty currently does that to search over it's cells, I'd like to look into it.


> That offsets a lot of performance gains ropes are supposed to provide.

It sounds like that offsets lot of performance gains the rust regex engine is supposed to provide, actually.


Search is pretty fundamental to text editors. Getting fast insertion with slow search is not always a desirable trade-off.


If the regex engine doesn't support indexing into a arbitrary (indexable) data structure, that's a deficiency in the regex engine, not in one particular such data structure. You can tell this isn't a deficiency of ropes, because the exact same thing would happen if you used[0] hash tables or b-trees.

Search might be slower with a more complicated lookup (versus a probably-inlined-and-otherwise-optimized pointer-plus-index addition), but that wasn't the problem described; the problem decribed was that non-array lookups weren't supported at all, and search required copying the data over to a array before the regex engine would deign to touch it.

0: for a example, not a suggestion that they would be a good fit for a text editor


It's not widely known but PCRE (which I assume is still the most widely used regex library) supports partial matching which you can use to interoperate with any piecewise linear string representation (ropes, gap buffers, etc).


When you think about it, keeping editor text in a naive array-based string would be absolutely nutty


It's a b-tree for strings, optimizing for editor tasks that are traditionally inefficient if you keep a text document as a simple string.

https://iq.opengenus.org/rope-data-structure/




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: