An editor for composed programs (tratt.net)
91 points by ltratt on Aug 20, 2014 | 35 comments



Emacs's structured editing, in the style of paredit[1], sounds very similar to a text-based version of syntax-directed editing (SDE). Designed for Lisp and s-expressions, it lets you edit and navigate nested expressions at a syntactic rather than textual level. As you type, the correct level of nesting is maintained, and there are commands for moving and editing code in terms of s-expressions rather than characters.

In practice, it's very easy to start using and very powerful... but requires using Lisp. It'll seem odd if you're not regularly a Lisp user.

Happily, this idea generalizes well even to distinctly non-Lisp languages. Structured Haskell mode[2] does this for Haskell, which has a less regular, sparser syntax than s-expressions. The documentation has a bunch of handy animations showing the various features, giving you an idea of how it works without having to install it yourself.

This doesn't address the meat of the blog post—composing languages and grammars by taking advantage of structured editing—but it shows that the necessary foundation is possible with a perfectly normal text editor. It would be interesting to see if some of the ideas from Eco could be added to structured editing modes like this without requiring you to use an external, self-contained editor.

[1]: http://www.emacswiki.org/emacs/ParEdit

[2]: https://github.com/chrisdone/structured-haskell-mode

Also, here's a great video of Paredit in action:

https://www.youtube.com/watch?v=D6h5dFyyUX0


> with a perfectly normal text editor

That's... a bit of an unfair characterization of Emacs. It is built for extensibility, as a development platform for Elisp packages - and that's not how most editors are designed and coded.

Anyway, it's probably entirely possible to add many or most of the proposed features to Emacs, but it could be prohibitively hard to add them to some other editors.


There is web-mode for Emacs [1] for manipulating files that mix different languages. It is oriented toward editing templates, but something more general should be possible.

Currently I am using it a lot to write React's JSX, which combines JavaScript and XML.

[1] http://web-mode.org/


... and of course, in Lisp you can embed other languages in terms of their syntax trees, so the composed language problem goes away. Your HTML just looks like (html (head ...) (body (h1 ...))) or whatever; no "language box for HTML" needed.


>> Your HTML just looks like (html (head ...) (body (h1 ...))) or whatever; no "language box for HTML" needed.

In your example, the language composition is smooth because there's only one language there: Lisp. HTML has capitulated.


That's the point: HTML is just a hobbled lisp with an awkward syntax. If you're building a template engine in a lisp, it's much nicer to use the native lisp syntax instead.


HTML is a subset of lisp with a different syntax. The rest of your point holds without lashing out at HTML.

In different environments, different syntaxes show their strengths. What happens if you misplace a ) vs. misplacing an </tag>?


What happens when you misplace a tag, and it's around 1995, is that browsers somehow parse it without complaining and render the page anyway. Users then believe that this is the correct syntax: closing tags are optional, as is the order in which they are closed. Twenty years later, you end up with a mess of a situation where every browser has to support not only its own historic buggy behaviors but those of other browsers.

You cannot "misplace" a closing parenthesis, at best you can have too many of them not enough. Of course, you can enclose the wrong amount of material, but to see what is being enclosed, you just need parenthesis matching support in the editor. This is more common than XML tag matching support. E.g.

http://stackoverflow.com/questions/500989/jump-to-matching-x...
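
For illustration, the matching itself is nearly trivial to implement. A hypothetical sketch in Python (not what any particular editor actually does):

    # Return the index of the parenthesis matching the one at `pos`,
    # or None if text[pos] isn't a paren or is unbalanced.
    def match_paren(text, pos):
        if text[pos] not in "()":
            return None
        step = 1 if text[pos] == "(" else -1
        depth = 0
        i = pos
        while 0 <= i < len(text):
            if text[i] == "(":
                depth += step
            elif text[i] == ")":
                depth -= step
            if depth == 0:
                return i
            i += step
        return None

    print(match_paren("(a (b c) d)", 3))  # -> 7, the ')' closing "(b c)"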


> "In a sense, when programming in a text editor I think in trees; mentally flatten that tree structure into sequences of ASCII characters that I can type into the text editor; and then have the parser recreate the tree structure I had in my head when I wrote the program."

This is quite an amazing article in that the author really "gets" Lisp and yet at the same time doesn't get it.

(It isn't mentioned; don't waste time looking.)

If you admit you're already thinking in terms of that tree structure, just freakin' use it instead of this charade of encoding it in some completely rearranged way that has to be unraveled back to what you were originally thinking of anyway (or something else).


The same thing is true for lisp. Writing a tree-based interaction mode like paredit that doesn't die horribly on malformed text is a non-trivial problem (http://mumble.net/~campbell/emacs/paredit.el - 2625 lines of emacs lisp just to edit a tree).


The trend, as you move from simple sequences such as binary numbers, through state machines, toward trees and general graphs, is that problems rapidly become less tractable. The situation degrades strikingly fast: numbers have sophisticated algebras and regular languages are fairly comprehensible, but at the other end, graphs are replete with NP-complete (intractable) problems. Intractable problems aren't something you have to solve, obviously, because you won't get anywhere. But I think that not only are the hard problems on graphs intractable; the hardness of graphs carries over to the simple problems too. Nothing is simple on graphs, actually, and trees are not far from graphs. So if unstructured text seems almost too simple, we should feel glad, because trees are already nettlesome even to write straightforward code for. I feel that hierarchies, as seductive as they seem, almost invariably bite; I think it's a good principle to rule out all flat solutions before considering hierarchies as a solution to any problem.


I feel the same way.

I had to write a library to transform JSON from one shape into another, carrying over the values but placing them at different nodes in a new tree. I found that thinking in terms of tree transformations was incredibly hard. The way I solved it was by flattening the JSON, transforming the flattened version, and, when done, inflating it back up.

In other words, working with the simpler, "2D" version of a JSON was much simpler than its "3D" tree structure, and writing flatten/unflatten independent from the transforming code was a nice modularity bonus.

It's really all about decomposing the problem and solving its parts.
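
A minimal sketch of that approach in Python (hypothetical code, much simpler than what I actually needed):

    import json

    # Flatten nested JSON into {path-tuple: leaf-value} pairs.
    def flatten(node, path=()):
        if isinstance(node, dict):
            items = node.items()
        elif isinstance(node, list):
            items = enumerate(node)
        else:
            return {path: node}
        flat = {}
        for key, child in items:
            flat.update(flatten(child, path + (key,)))
        return flat

    # Rebuild a nested structure from the flat form (for brevity this
    # sketch rebuilds lists as dicts keyed by index).
    def unflatten(flat):
        root = {}
        for path, value in flat.items():
            node = root
            for key in path[:-1]:
                node = node.setdefault(key, {})
            node[path[-1]] = value
        return root

    doc = json.loads('{"user": {"first": "Ada", "last": "Lovelace"}}')
    flat = flatten(doc)
    # The transformation is now a flat mapping over paths:
    moved = {("name",) + path[1:]: v for path, v in flat.items()}
    print(unflatten(moved))  # {'name': {'first': 'Ada', 'last': 'Lovelace'}}

The transform in the middle is just a comprehension over a flat dict, which is the "2D" simplicity I was after.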


If you think in terms of the operations that perform the tree transformations, of course you will have difficulty, because you're using a different tree: the syntax tree representing the code of those transformations! Not the syntax tree of the source and destination structure. So you are "once removed" from the real problem. Moreover, since both domains are trees, it's confusing.

It sounds like you could have benefited from a way to express pattern matching using JSON syntax, for destructuring a JSON object, and to represent the synthesis of new JSON using a "JSON quasiquote".

E.g. Lisp transformation with classic destructuring-bind and backquote:

    (destructuring-bind (a (b (c &rest d) e) f (g h)) some-obj
      `((,a ,b) ,c (,@d 1 2 3 ,e) (,f ,g ,h)))

There are pattern-matching libraries nowadays that do a lot more than destructuring-bind, but this illustrates the basic point.

The destructuring-bind macro writes the code to pull apart some-obj according to the tree picture with embedded variables. The backquote syntax generates the code whose evaluation synthesizes the new tree object according to a template, with the values of expressions indicated by , and ,@ substituted and spliced into the template.


I never thought of using destructuring-bind together with quasiquoting in such a direct way. It does seem simpler than what I have done; I'll have to think about it some more. Though, to be fair, I had to write this in Ruby, where I don't have access to CL niceties. :)

It did get me thinking, though, so I appreciate the advice. I'm wondering now how to port that idea to Ruby.
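
Actually, the shape ports fairly directly to any language with nested destructuring: the unpacking pattern plays the role of destructuring-bind, and a literal plays the role of the backquote template. A hypothetical sketch in Python mirroring the Lisp example above (Ruby's parallel assignment with splats can destructure the same way):

    # Destructure with nested unpacking, rebuild with a literal --
    # the same transformation as the destructuring-bind example.
    def transform(obj):
        a, (b, (c, *d), e), f, (g, h) = obj
        return [[a, b], c, d + [1, 2, 3, e], [f, g, h]]

    print(transform(["a", ["b", ["c", "d1", "d2"], "e"], "f", ["g", "h"]]))
    # -> [['a', 'b'], 'c', ['d1', 'd2', 1, 2, 3, 'e'], ['f', 'g', 'h']]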


More commonly, this destructuring is done in macros, by their "macro lambda lists". destructuring-bind is just a binding gadget which performs almost the same destructuring as macro lambda lists.

(Why almost: because "destructuring lambda lists" don't support the environment parameter of macro lambda lists!)


The editor in http://en.wikipedia.org/wiki/Interlisp did work directly on list structure instead of on top of a character-string editor. (I never studied it carefully since, at least at first, it worked through a tty interface, and that's pretty inherently a pain, like Unix ed.) I think I read somewhere that Deutsch's PDP-1 Lisp pioneered that editing model, though I may be completely misremembering.


When looking at pictures of the PDP-1 (http://de.wikipedia.org/wiki/PDP-1), one may have a hard time believing anyone thought of structure-based editing at all.

PS: I rarely hear about L. Peter Deutsch (http://en.wikipedia.org/wiki/L_Peter_Deutsch), but his work is impressive.


It seems to make sense: the interpreter already stores list structure to interpret, there's no space in RAM for a redundant second representation, and going to external storage for the text would practically force batch mode when the store is paper tape or punch cards. That's similar to 70s/80s home computers keeping BASIC code in RAM in tokenized form, except there you had to type a line over again to edit it.

(But on larger machines in the 70s I get the impression Interlisp stayed with structure editing for its advantages, not just inertia. It was killed off by Common Lisp, which owed more to the east-coast Lispers who gave us Emacs.)


Good point. Nowadays the constraint of communicating between processes makes text serialization valuable, whereas back in the day direct manipulation (I believe there were no processes and no isolation) was the norm?

I thought Emacs's text-buffer genes came from its being implemented on and for *nix...


I think the point is that there is only the tree structure in Lisp. Even though you can embed things with reader macros, most of the time this is frowned upon, specifically because it hides the actual structure.

There are some difficulties in this, of course. Sometimes I don't think in terms of trees of information. This is relatively rare, though.


Interesting idea, but the code example used completely puts me off - it's literally the opposite of well-factored code, with HTML, Python and SQL all smeared together.


Here are the slides [1] from ECOOP 2014 presenting his research. It was a really interesting talk. He also did a demo which was cool.

[1] http://ecoop14.it.uu.se/programme/ecoop_ss_tratt.pdf


Additionally, they published this paper last year, with an early demo of the editor linked on YouTube. I've posted it here a few times, but it hasn't had much response.

http://soft-dev.org/pubs/pdf/diekmann_tratt__parsing_compose...

https://www.youtube.com/watch?v=LMzrTb22Ot8


Here's how Visual Studio 2013 handles HTML, C# and JavaScript syntax coloring in one Razor view: http://i.imgur.com/he1ieG9.png


Interesting idea, and any research into getting away from text editors is positive.

However, in order to compete with text, which has a bountiful number of editors available, you need to beat each programmer's favorite editor at writing the program. Given the size of the competition, doing this is incredibly hard.

While I think the idea has merit at a high level, poor implementation isn't the biggest thing holding back non-text approaches. Having to be the best editor for everyone has been.

On a related note, is there a reason they didn't go with some form of escaping? Similar to CDATA in XML.


>> any research into getting away from text editors is positive.

Any research at all? I'm curious: what is it that's unworkable about text, in your view?


> Any research at all?

Other people spending money to figure out if an alternative idea will work is good for me.

> what it is that's unworkable about text in your view?

Nothing, it just feels like a local maximum. Typing is incredibly efficient, so I don't think it is going anywhere.

There are plenty of ways of showing that in a certain situation non-text can be more efficient, so I believe there is a way to introduce non-text that can make the general case more efficient.


The issue with escaping is that the inner language might have escape tokens of a similar form. Though if they're mechanically inserted, I'm sure _some_ sequence of characters will be obscure enough to be confident of lack of conflict (except maybe for APL). Something like $@#/></!* might work.


The issue isn't just which characters are obscure enough to use - it's that they must be unique for each combination of languages, and we could have potentially infinite combinations. Using the CDATA example, ]]> is pretty rare and we're unlikely to see it used in other languages - the problem is that it is used in one language: XML itself. If you happen to have some XML data embedded in a CDATA section, and that XML also has CDATA of its own, you've broken the parser, because it will match the inner ]]> as the closing delimiter of the outer CDATA section. To solve this, we'd either need to modify the parser for the inner XML to use something other than ]]> for its CDATA, or we'd need a parser which ensures these escaping delimiters are evenly balanced - and our regular parsing algorithms are unsuitable for that, so we'd need SDE.
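
You can see the breakage concretely with a quick hypothetical check using Python's built-in XML parser:

    import xml.etree.ElementTree as ET

    # Nested CDATA: the parser takes the *inner* ]]> as the end of the
    # *outer* CDATA section, so the leftover </inner> no longer nests.
    doc = '<doc><![CDATA[ <inner><![CDATA[ x ]]></inner> ]]></doc>'
    try:
        ET.fromstring(doc)
    except ET.ParseError as e:
        print("parse error:", e)  # mismatched tag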

Of interest are languages like Nemerle, which allow you to define additional syntax for the language through macro systems - however, there are dedicated delimiters which can be used to enclose sequences to ensure unambiguity. (Nemerle uses a PEG-based parser.)

Another interesting approach is taken by Wyvern, which was posted here recently - it uses different indentation levels to disambiguate different languages in the same text file.

What's interesting about Tratt and Diekmann's model (language boxes) is that they do not specify a storage format. Although they use a tree-based format in the implementation of Eco, one could in theory spit out plain text, as languages like Nemerle and Wyvern do - the editor can do the job of selecting the right escape delimiters for each embedded language, and the result is just a plain text file which can be accepted by the normal compilers. But that's still open for research.
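
That delimiter-selection job is simple in principle. A hypothetical sketch of one scheme (numbered, heredoc-style fences; not what Eco actually does):

    import itertools

    # Wrap an embedded-language payload in a fence that provably does
    # not occur inside it, by numbering the fence until it is unique.
    def wrap(lang, payload):
        for n in itertools.count():
            end = "@end-%s-%d@" % (lang, n)
            if end not in payload:
                return "@begin-%s-%d@\n%s\n%s\n" % (lang, n, payload, end)

    print(wrap("xml", "<a><![CDATA[ @end-xml-0@ ]]></a>"))
    # Picks the n=1 fence, because the payload happens to contain n=0's.

The reading side just scans for the matching numbered end fence; since the fence cannot occur in the payload, ordinary parsing suffices.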


You could try supporting a subset of languages to fit your requirements. For instance, disallowing tab characters and using them for control.

Additionally, replacing problem characters isn't so bad if it is machine-controlled, since the replacement scheme can be more complex; and making it human-editable isn't a concern, since your current format isn't human-editable anyway.


It is effectively just a form of magic bracket that never needs to be escaped.


XML is about as good a format as any could be for this kind of application. XML, after all, stands for eXtensible Markup Language, and brackets that provide side-channel data on programming language are nothing if not markup. It would be a mistake to abandon text as a storage format when suitable formats are not only popular but ubiquitous.
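
For instance, a language-box file might be stored as something like this (hypothetical element names, not a real schema):

    import xml.etree.ElementTree as ET

    # A hypothetical serialization of nested language boxes as markup.
    outer = ET.Element("box", lang="python")
    outer.text = "for row in "
    inner = ET.SubElement(outer, "box", lang="sql")
    inner.text = "SELECT name FROM users"
    inner.tail = ": print(row)"
    print(ET.tostring(outer, encoding="unicode"))
    # <box lang="python">for row in <box lang="sql">SELECT name FROM users</box>: print(row)</box>

Escaping of the embedded text (the < and & characters) then falls out of XML's ordinary escaping rules.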


I've been using Org Mode with Emacs for writing literate programs. You could write a simple front end to Org Mode to decrease the clutter... but the hierarchical nature of Org Mode files tends to reduce the issue of clutter already, so it may not be worthwhile.

http://orgmode.org/manual/Working-With-Source-Code.html#Work...
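
For the unfamiliar, an Org file interleaves outline-structured prose with source blocks that org-babel can execute or tangle out to files. A minimal example:

    * Fetching the data
      Prose goes in the outline; code goes in source blocks:

      #+BEGIN_SRC python :tangle fetch.py
      print("hello from a literate program")
      #+END_SRC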


I will almost certainly be giving this a try tonight.


nice.



