S-Expressions (1997) (csail.mit.edu)
125 points by panic on Feb 5, 2018 | 45 comments



Is it impossible to improve on s-expression notation?

https://srfi.schemers.org/srfi-110/srfi-110.html#cant-improv...


The problem with these schemes is that they're at odds with metaprogramming. One huge advantage of s-expressions is that the code is expressed using data literals. You can take any piece of code, transform it as you would any regular data, and then run it.

The biggest practical improvement that I've seen was Clojure's notation, which expands the number of data literals to include vectors, maps, and sets. I think this greatly helps with readability as it provides additional visual cues, while still preserving homoiconicity.
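
To make the first point concrete: because code is a data literal, you can rewrite it with ordinary list operations and then run it. A minimal Common Lisp sketch (the particular forms are only illustrative):

  (defvar *code* '(+ 1 2 4))          ; code held as an ordinary list
  (eval *code*)                       ; => 7
  (eval (substitute '* '+ *code*))    ; rewrite the data, then run it => 8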


> The problem with these schemes is that they're at odds with metaprogramming.

Sweet-expressions (SRFI-110) support metaprogramming just fine. Sweet-expressions are general and homoiconic (just like S-expressions are), and they're backwards-compatible with traditional S-expressions as they are normally used.

You're right that many other "readable" notations don't support metaprogramming, and I completely agree that most schemes are failures because of it. But once you agree that generality and homoiconicity are necessary, it's quite possible to create improved notations. I also think backwards compatibility is a (practical) must, but that is possible too.

For more, see: https://readable.sourceforge.io/


> The biggest practical improvement that I've seen was Clojure notation that expands the number of data literals to include vectors, maps, and sets.

Common Lisp has literal syntax for vectors (and it wasn't the first Lisp to). Its standard representation of a set is just a list, but if someone were to add proper sets to the language, they could easily add read syntax for them.

CL also has read syntax for maps (as does every Lisp) in the form of alists; it doesn't have literal hash tables, but those aren't terribly useful. Alists perform better for lookup with fewer than 20 or so keys (their writes are always constant-time), and they aren't going to be a bottleneck until you have at least several hundred keys, at which point you probably won't be writing those as literals in your source code. As well, alists have several other nice properties over hash tables, like not being tied to any equality predicate.

Alists are a standard Lisp map with literal syntax that's been around since the beginning and are often preferable to hashes, but if you really did want hash literals specifically, it's a trivial read macro.
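
For what it's worth, here is a rough sketch of such a read macro (the #{...} syntax and the reader-function name are made up for illustration; keys and values are taken as read literals, not evaluated):

  ;; #{k1 v1 k2 v2 ...} => a fresh EQUAL hash table (illustrative sketch)
  (defun |#{-reader| (stream subchar arg)
    (declare (ignore subchar arg))
    (let ((table (make-hash-table :test #'equal))
          (items (read-delimited-list #\} stream t)))
      (loop for (k v) on items by #'cddr
            do (setf (gethash k table) v))
      table))

  (set-dispatch-macro-character #\# #\{ #'|#{-reader|)
  (set-macro-character #\} (get-macro-character #\) nil))

  ;; #{"a" 1 "b" 2} now reads as a hash table with two entries.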


Sweet-expressions are supposed to map easily onto lists, no?

Anyways, there are other ways to do metaprogramming. Haskell manages without homoiconicity.


As far as I know, Haskell's metaprogramming uses template haskell which works but isn't a super elegant/beautiful solution relative to lisp?


Metaprogramming is trivial when you have a) lazy evaluation, b) a very smart optimizing compiler.

That said, I do love me the Lisp macro system, but homoiconicity alone is not that useful if the representation mechanism makes it difficult to denote lots of implied things (like type inferencing).


> The problem with these schemes is that they're at odds with metaprogramming. One huge advantage of s-expressions is that the code is expressed using data literals.

I don't think it is necessary for there to be a single uniform syntax for code to be able to fully leverage metaprogramming.

Take, for instance, Mathematica (barring complexities of the evaluator itself...). The underlying type for almost everything in the language is the "expression," essentially a tagged list. The front-end language, called "standard form," can be converted to and from expressions. Expressions can more-or-less be directly serialized as "full form," which looks like McCarthy's M-expressions, and standard form extends this notation with operators and precedence rules. (Technically, standard form is given in a 2d graphical notation that is parsed into expressions, and "input form" is linear text. All the forms are interconvertible.)

Programming in Mathematica seems to amount to writing lots of transformation rules for expressions, and it usually works out to just writing the code you want to match against, but with pattern variables as holes. You are also free to write patterns in full form if it helps make intent more precise.

I think many cases of code transformation involve a pattern to match against. A language with few syntax classes and with metavariables can support metaprogramming easily enough. There are a few examples I've seen, and, although I can't remember the names, I think MetaOCaml was one.

I guess what I'm saying is, so long as you make it so there is a "code literal," no matter the purported "homoiconicity" properties, you're good for metaprogramming.

> while still preserving homoiconicity

I think Alan Kay using the word "homoiconic" for Lisp was a mistake. The syntax of Lisp really is not the same as the internal representation, unlike Tcl (which I understand actually executes by manipulating strings). This is a reason structured editors for Lisp don't seem to work out.

Here is a new word: autoiconic ("self representing"). A language is autoiconic if there are composable code literals. It is not necessary for an autoiconic language to have "eval," but it is necessary to be able to read in code and then write out equivalent code. Furthermore, the code literals must allow for metavariables and composability: replacing a metavariable with a code literal must result in syntactically valid code when written. This excludes Javascript even though it has template literals.
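
In Common Lisp terms, for example (just a sketch; WRAP-IN-LOOP and BODY are made-up names for illustration):

  ;; BODY acts as a metavariable; splicing a code literal into the
  ;; template yields code that is again syntactically valid when written.
  (defun wrap-in-loop (n body)
    `(dotimes (i ,n) ,body))

  (wrap-in-loop 3 '(print i))   ; => (DOTIMES (I 3) (PRINT I))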

It is questionable whether Clojure is actually homoiconic, but it is arguably autoiconic since it has quasiquotation. (However, the way the quasiquoter works, a read/write cycle will result in code with a bunch of list manipulation functions. The supposed code literal disappears!)


The problem with "homoiconic" is that it is about a storage mechanism: it says that function definitions are input into the image in a textual format and are stored in more or less that format so that they can later be recalled and edited.

"homoiconic" languages can regurgitate their definitions. E.g. the set command in the POSIX shell, or LIST in classic BASIC.

Those features are helpful interactively in that they allow a human to transform the code right inside the image by recalling, changing and reinstalling definitions.

In Lisp, a data structure can be regurgitated, rather than definitions. That is helpful in interactively debugging automated, rather than manual, transformations.

Moreover, the data structure is not regurgitated from a homoiconic form. It is isomorphic to the text in some way so that the data structure can be visualized from the text easily and vice versa. The programmer writes transformations which manipulate the data. These transformations take place in pipeline-like successive steps, where it is useful to intercept the process at different stages.

The programmer of a homoiconic language doesn't do this; the programmer manipulates only the text. A BASIC programmer never works with the internal tokenized lines, for instance, in which a word like PRINT is condensed down to one byte. There is no access to that representation via any API from the language.

Homoiconic languages don't place any transformation between the programmer input of a definition and the definition's storage; to do so would mean not to be homoiconic!

ANSI Lisp has very vaguely defined support for homoiconicity: the function ed, which is intended to recall a function definition and allow the programmer to edit it. This feature is not very prominent; it is not the basis of Lisp interactivity. Almost every aspect of the function is implementation-dependent. To support ed, an implementation's defun operator has to store the un-expanded form somewhere.


This is a reasonable characterization of homoiconicity.

Interestingly, by this definition, v8's implementation of Javascript is homoiconic since functions can be converted to strings, recovering the actual source text for its definition.

I once took advantage of this to make a Smalltalk-like source editor for Javascript: http://tmp.esoteri.casa/grip/ (things might no longer work since I last was using it four years ago; I lightly edited it tonight to get it to start up without a file server. Only tested in Chrome!)

It's based on a class-based object system I made that can serialize all of its class definitions. This drives both the class browser and the code serializer/saver.

(Right now it has two code browsers because I was in the process of making a nicer one. Somehow, I had the idea that I was going to program a whole webapp with this thing, but I lost motivation.)


> I think Alan Kay using the word "homoiconic" for Lisp was a mistake. The syntax of Lisp really is not the same as the internal representation, unlike Tcl (which I understand actually executes by manipulating strings). This is a reason structured editors for Lisp don't seem to work out.

Indeed. Christophe Grand had a talk at EuroClojure 2012 on this very topic: http://confreaks.tv/videos/euroclojure2012-not-so-homoiconic


> I think Alan Kay using the word "homoiconic" for Lisp was a mistake. The syntax of Lisp really is not the same as the internal representation, unlike Tcl (which I understand actually executes by manipulating strings). This is a reason structured editors for Lisp don't seem to work out.

A Lisp interpreter executes internalized s-expressions.


If that's not observable to the program, so what? At a math lecture I once went to, the speaker said "sure, define it that way, but show me the theorems!" Everyone seems to have their own idea for what homoiconicity means, but I've yet to see anyone draw a compelling inference from such a distinction. There are vague suggestions that it's important for metaprogramming, but in my previous comment I was instead suggesting that "autoiconicity" is more definable and useful.

So "s-expression" means two things to Lispers:

1) the textual notation, which the reader converts into

2) an inductively defined tree form.

The original McCarthy paper calls the second s-expressions. It'd be possible to program entirely in the tree form with the appropriate editor, but this is unpopular to do. You would miss out on numerous features of the reader:

1) comments

2) quasiquotation (for Common Lisp; in Scheme this is better represented)

3) writing numeric literals with the syntax you want

4) writing symbols in the form you want (that is, symbol or package synonyms)

Anyway, the point is that Lisp source code is a sequence of characters, not a tree. Sure the parsing routine is slight, but why does everyone want to confuse the issue and say that they are literally the same thing? (Or, at the least, insist that it is special that the AST reflects the structure of the source code. Is Java's AST really so different from Java source?) Lisp doesn't need to be homoiconic to be the powerful language it is.
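
A quick illustration of the character-level features that disappear once the reader has done its work (Common Lisp):

  (read-from-string "#x10")               ; => 16, the hex spelling is gone
  (read-from-string "'foo")               ; => (QUOTE FOO)
  (read-from-string "(a #|comment|# b)")  ; => (A B), the comment is gone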

Alan Kay is quite a thinker and visionary, though, and he is on to something, but this particular idea needs some more refinement.

(Thought experiment: it's not impossible to make an interpreter for most languages that evaluates code by directly manipulating a string containing the original source file. Doesn't that make every language homoiconic? Code is data!)


The interface for EVAL and COMPILE is code as data, not text.

The macro facility operates on code as data.

The pretty-printer operates on code as data.

Tools like code-walkers and macroexpanders operate on code as data.
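
For example (Common Lisp; the exact expansion of WHEN varies by implementation, but it is always returned as plain list data):

  (macroexpand-1 '(when hungry (eat) (nap)))
  ;; => (IF HUNGRY (PROGN (EAT) (NAP))), T   [a typical expansion]
  ;; The result is ordinary data: it can be inspected, rewritten with
  ;; list functions, pretty-printed, or walked like any other list.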

Interpreters operate on code as data. This enables sophisticated tracing and stepping (stuff which I would need to instrument in compiled code), self-modifying and inspecting code, easy access to source forms in the debugger, etc.

There are many ways Lisp implementations make use of code as data; editing, for some reason, does not seem to be especially attractive, though structure editors exist, like the later one for Medley/Interlisp-D. Comments are then part of the s-expressions - they are not textual comments outside them.

In Common Lisp, quasiquotation is not fully a feature of the reader, since the representation of quasiquote expressions is undefined. A Common Lisp implementation is free to preserve quasiquote expressions, and some actually do that.

The code as string idea is possible, but the general meaning of code as data (or the less useful idea of homoiconicity) means code as data, but not as a string. Strings are the trivial uninteresting case.

No, s-expression-based Lisp code is not what I would consider to be an AST.


I have to say, I do not understand what you are addressing in my comment: whether you are rebutting, agreeing, expanding, informing, thinking out loud, or simply expressing what you think is special about Lisp.

> The interface for EVAL and COMPILE is code as data, not text.

Is that essential? or is it just convenient? I'm not aware of anything in Common Lisp (other than the spec) that requires eval to accept s-expressions rather than strings. I'm assuming objects have string forms that when read give an object that is eql.

An aside: one could say eval and compile are hacks to deal with the fact that in Lisp, code is merely a kind of data. In a language like Mathematica, data is also a kind of code (there is no difference between code and data). This has logical issues, however, so I wouldn't take code=data as a design goal.

> The macro facility operates on code as data. ... code as data ... code as data

Sure, Lisp has a data type for code and it has an expressive language to manipulate code (what I was calling "autoiconicity"). Lisp also has staged metaprogramming that seamlessly invokes such code manipulators before evaluation/compilation. Many languages, with less expressivity, can also load, manipulate, and write code -- with the code represented as structured data of course.

> (or the less useful idea of homoiconicity) means code as data, but not as a string

But if you read my comments, you'd know I think "homoiconicity" does not mean "code as data," but "essentially the same representation internally and externally." It's a property of an implementation of a language and not the language itself. Alan Kay defines the word offhandedly and asserts Lisp is homoiconic in his thesis, and, except for some implementations of Lisp, it doesn't apply in general. This is the point of my thought experiment: every language has some implementation that is homoiconic. (And so what? That's also my point.)

> Interpreter operate on code as data. This enables sophisticated tracing and stepping (stuff which I would need to instrument in compiled code), self-modifying and inspecting code, easy access to source forms in the debugger, etc.

SBCL compiles everything to machine code, even with eval. It is not essential for an interpreter to directly operate on s-expressions, and whether one does or not is not observable. These things can all be implemented by a transformation of the code after macro expansion and before compilation. Or perhaps JIT compilation like PyPy through RPython.

On the other hand, we can take most languages and create interpreters that evaluate the AST, giving us such tracing, stepping, and access to source in a debugger.

(By the way, do you really mean self-modifying and self-inspecting code? Can you give a useful example of this in Common Lisp? I wasn't aware that Common Lisp requires an implementation to be able to give the definition of a function. I thought it was a property of the implementation and not the language.)

> No, s-expression-based Lisp code is not what I would consider to be an AST.

Great. Now would you care to define AST in a way that excludes s-expressions (and in a way that isn't ad-hoc)? It seems to me that s-expressions are a tree representing syntax abstractly; I'm not sure how it's not, or what you gain (inference-wise or practically) by excluding s-expressions. Are you drawing the distinction that it is only when an s-expression is paired with an interpretation that you get an AST? (By the way, this is a powerful idea that's used for nanopass compilers: lots of stages each with a slightly different interpretation for the constituent s-expressions.)


> Homoiconicity ... same representation internally and externally

That's why it makes little sense. Lisp does not use the same representation internally and externally. Externally, programs may be characters in a file or on a display. They are interned by reading them into data structures. Running programs may use different representations, and one popular choice is the internalized representation - which is used by Lisp interpreters.

> SBCL compiles everything to machine code...

If you use its compiler. If you use its interpreter, it doesn't.

http://sbcl.org/manual/index.html#Interpreter

Sure, using an interpreter is observable. That's the reason why they exist. They offer different services to the user. Many Lisp systems offer hooks into the interpreter, which let you watch it working.

> s-expressions are tree representing syntax...

Actually s-expressions are a data format, not a format of the programming language Lisp. S-expressions know very little about Lisp - basically they provide an elaborate/hierarchical form of tokenizer output. The reader parses numbers into number objects, symbols into symbol objects, etc.

But the reader does not know anything about the syntax of Lisp, which is defined on top of s-expressions. Thus the s-expression does, for example, NOT represent any syntactic information about the source code: after READ it is not known what a symbol actually stands for. Is it a data object, a function, a special operator, a macro, a variable, a local function, a local variable, a package name, a goto tag, ...? The s-expression does not tell us. We also don't know which forms are function calls and which are not. Etc.
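
For example (Common Lisp):

  (read-from-string "(let ((x 1)) (f x))")
  ;; => (LET ((X 1)) (F X))
  ;; READ returns a plain nested list.  Nothing in it records that LET is
  ;; a special operator, that X is a variable binding, or that (F X) is a
  ;; function call; that interpretation is layered on top later.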

From an AST I would expect that it represents syntactic information of the programming language Lisp.

To construct this, one needs to traverse the s-expressions: using compilers, interpreters and/or code walkers (tools which can traverse code, collect information, and maybe transform it).

The ANSI CL spec even lacks a way to access the syntactic information in macros. The facility for that didn't make it into the standard, though implementations provide a quasi-standard way.

http://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node102.html#S...


I wasn't aware of SBCL's interpreter mode, but it does say "in SBCL interpreted code is not safer or more debuggable than compiled code." I haven't used it myself yet, though, so it's possible there are interesting differences to the user. But to be clear, when I said "observable," I meant "observable to the running program." The standard I'm using: if an implementation detail is not observable to the program (with some restrictions, like a program can't just measure how long it took to execute), it's not a property of the language itself.

> That's why it makes little sense. Lisp does not use the same representation internally and externally. Externally programs may be characters in a file or on a display.

It's a bit confusing to have you reiterate what I've said in this thread without your saying you are doing so and are in agreement. Let's go back to the very beginning of our discussion:

> > I think Alan Kay using the word "homoiconic" for Lisp was a mistake. The syntax of Lisp really is not the same as the internal representation

> A Lisp interpreter executes internalized s-expressions.

"Syntax" as in "what the user of the language types." At the least, I meant sequence of characters vs s-expressions. But, I also did the confusing thing of saying "the internal representation" to mean a property that all Lisp implementations share. Some Lisp interpreters directly examine s-expressions, sure, but, given the context, I assumed that your pointing this out meant you thought it was evidence that I was mistaken to think Alan Kay was mistaken.

> Actually s-expressions are a data format, not a format of the programming language Lisp.

I take it the answer is "yes" to my question "Are you drawing the distinction that it is only when an s-expression is paired with an interpretation that you get an AST?"

> Thus the s-expression does for example NOT represent any syntactic informations about the source code

In Lisp, there are two levels of syntax: syntax classes from the source code, like symbols, numbers, lists; and syntax classes from the s-expressions, like what you go on to say.

(Meta-question: am I coming across as someone who's ignorant about the design and implementation of Lisp? My experience observing lispers communicate is that they seem to assume nobody else is familiar with the details of Lisp and how it works!)


> "in SBCL interpreted code is not safer or more debuggable than compiled code."

It's mentioned there because that would be the usual expectation: an interpreter has different features, it usually enables stepping and tracing for example.

> observable to the running program

The interpreter is a part of the running program. In many Lisp systems the interpreter has slightly different semantics which are easy to observe: for example, the interpreter usually macroexpands code at each macro call, whereas the compiler does it once, BEFORE the code runs. The interpreter usually also offers hooks, which allow code to be executed that changes the environment of each evaluation - which may also be easy to observe from the program. For example, the interpreter could collect call statistics while running the code, and the code could itself access them.
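
A concrete (implementation-dependent) way to observe that difference: a macro whose expander has a side effect.

  (defvar *expansions* 0)

  (defmacro noisy-inc (x)
    (incf *expansions*)        ; runs each time the macro is expanded
    `(+ ,x 1))

  (defun f (n) (noisy-inc n))

  ;; If F is compiled, the macro was expanded once, before F runs, and
  ;; calling F repeatedly leaves *EXPANSIONS* unchanged.  A simple
  ;; interpreter may re-expand at every call, so *EXPANSIONS* keeps
  ;; growing -- a difference the program itself can observe.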

> it's not a property of the language itself

Lisp is as much a runtime as it is a language. It's a combination of the two. This makes the interpreter and compiler part of the execution environment and allows the program to do introspection, runtime code generation, observing its own execution in the interpreter, and so on.

Common Lisp has a standard language spec, but it spans a range of very different implementations. The standard is very vague about execution: it defines a bunch of interfaces (EVAL, ...) but says very little about their implementations. Garbage collection, a basic feature of almost every implementation, isn't even mentioned.

The concrete language I'm using is not ANSI CL, but Clozure Common Lisp, CLISP, LispWorks, etc. - each with different implementation strategies. One might not use an interpreter; the next one has the interpreter more prominently. If you type at the SBCL REPL, everything by default gets compiled. If I type at a LispWorks REPL, everything by default gets interpreted.

> Some Lisp interpreters directly examine s-expressions

If they don't use s-expressions, they are not a Lisp interpreter. Running off internalized s-expressions is the definition of a Lisp interpreter.

> I take it the answer is "yes" to my question "Are you drawing the distinction that it is only when an s-expression is paired with an interpretation that you get an AST?"

A Lisp interpreter does not need to use an explicit AST as an actual tree. It may just traverse the code according to the hard coded rules for the various built-in language constructs (function calls, sequence of calls, non-local transfers, quotation, ...) and build some data structures it needs.

> In Lisp, there are two levels of syntax: syntax classes from the source code, like symbols, numbers, lists; and syntax classes from the s-expressions, like what you go on to say.

In Lisp there are two levels of syntax: s-expressions as a data syntax. Numbers, symbols, lists, arrays.

Then there is the syntax of the programming language Lisp and its language constructs.

For example the EBNF-like syntax for the special operator LET is this:

  let ({var | (var [init-form])}*) declaration* form* 
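
A form matching that grammar, for illustration (a small sketch; the declaration is optional):

  (let ((x 1)                   ; (var init-form)
        y)                      ; a bare var, initialized to NIL
    (declare (type integer x))  ; declaration*
    (list x y))                 ; form* => (1 NIL)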

> My experience observing lispers communicate is that they seem to assume nobody else is familiar with the details of Lisp and how it works!

I had the impression that you don't know what a Lisp interpreter actually does, how it is implemented and that it is a part of a Lisp system - and thus of the running program.

Take for example the LispWorks interpreter/compiler.

  CL-USER 1 > (defvar *hooklevel* 0)
  *HOOKLEVEL*

  CL-USER 2 > (defun hook (x) 
                (let ((*evalhook* 'eval-hook-function)) 
                  (eval x)))
  HOOK

  CL-USER 3 > (compile 'hook)
  HOOK
  NIL
  NIL

  CL-USER 4 > (defun eval-hook-function (form &rest env) 
                (let ((*hooklevel* (+ *hooklevel* 1))) 
                  (format *trace-output* "~%~V@TForm:  ~S" 
                          (* *hooklevel* 2) form) 
                  (let ((values (multiple-value-list 
                                 (evalhook form 
                                           #'eval-hook-function 
                                           nil 
                                           env)))) 
                    (format *trace-output* "~%~V@TValue:~{ ~S~}" 
                            (* *hooklevel* 2) values) 
                    (values-list values))))
  EVAL-HOOK-FUNCTION

  CL-USER 5 > (compile 'eval-hook-function)
  EVAL-HOOK-FUNCTION
  NIL
  NIL

  CL-USER 6 > (hook '(cons (floor *print-base* 2) 'b))

    Form:  (CONS (FLOOR *PRINT-BASE* 2) (QUOTE B))
      Form:  (FLOOR *PRINT-BASE* 2)
        Form:  *PRINT-BASE*
        Value: 10
        Form:  2
        Value: 2
      Value: 5 0
      Form:  (QUOTE B)
      Value: B
    Value: (5 . B)
  (5 . B)

Thus I can write a user level program, which uses the Lisp interpreter to observe the program running. It tells me what forms it is running and the result values.

The program itself could now for example observe its evaluation level:

  CL-USER 9 > (hook '(list *hooklevel* (list *hooklevel*)))

    Form:  (LIST *HOOKLEVEL* (LIST *HOOKLEVEL*))
      Form:  *HOOKLEVEL*
      Value: 2
      Form:  (LIST *HOOKLEVEL*)
        Form:  *HOOKLEVEL*
        Value: 3
      Value: (3)
    Value: (2 (3))
  (2 (3))


I have the impression that you feel like you need to correct what I say because I am not saying things exactly the way that you think about them. Please forgive me: I am not interested in having this kind of discussion.


It's kinda perfect. Sexps are readable, very easily and quickly navigable, and mobile (i.e. you can move expressions around in the program source) with something like paredit; they are very easy to partially evaluate, easy to parse, and easy to write. The complexity of the program is almost entirely contained within the tree of function calls instead of partly in the syntactic representations of the operations and literals.

It's also very easy to teach: we have lists; they contain symbols (i.e. names), numbers, strings (i.e. text), and other lists; symbols stand for themselves when quoted, or are treated as variable names if not and evaluate to the values they name; if we don't quote a list, it is a function call, the initial item must then be the name of a function, and the rest are arguments to that function. The entire program is made up of function calls, some evaluated at compile time, some at run time. That is a fairly complete definition of a Scheme- and CL-like syntax.

Furthermore, I find the Polish-notation way of writing down mathematical expressions as sexps (albeit I seldom use them) superior to infix notation, because any possibility of ambiguity is effortlessly dealt with. Basically, there's no reason to further complicate a simple and useful thing when the added complexity does not offer substantial benefits while keeping those that the simpler solution offers.
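
In code, roughly (Common Lisp flavored; Scheme reads the same way):

  '(1 two "three")   ; a quoted list: just data, evaluates to the list itself
  (+ 1 2 3)          ; an unquoted list: a call to the function +  => 6
  (* 2 (+ 3 4))      ; prefix notation, no precedence rules needed  => 14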


> any possibility of ambiguity is effortlessly dealt with

"(+ 1)" is rather ambiguous and evaluates to 1, compared to "1 + ", which is very straight forward-- it's an incomplete expression and no evaluation can be done.


The use of + as shorthand for sum (Σ would be clearer, but alas, not ASCII) is perhaps not ideal, but that's not an S-expression problem, but a function naming problem.


I'm curious how it is ambiguous. What two meanings do you see?


Perhaps the complaint is that there's a sort of cold-start problem -- it's not obvious that we're starting from zero. With addition, it's hard to imagine what else to start from, but perhaps starting from 1 would make sense in some contexts, where `(+ 1)` could evaluate to `2`.


In addition/summation the base case is zero. And I find it quite apt that the plus operator stands for summation in a language where operator precedence is not a debate.


I've written a bit of Scheme in production. What I've found is that S-expressions are unambiguous and easy to learn, but Scheme doesn't railroad you with them. There are other options as well.

A more recent project used the traditional syntax for macro files. We separated those out, a bit like C header files, because it was easier for a macro-traceback library we created.

Most of the main files were in Wisp, which decreased the learning time for our Pythonistas.

Finally, our hot paths tended to be C-ish. That is, they were compiled C at release, but Scheme with a C reader macro during development. (Based around Racket's c-lang.)

Of all of these, I prefer Scheme's traditional syntax, maybe with the curly infix extension for dealing with OO when I have to (like interfacing with Java).

I know a lot that swear by Wisp, however.

Finally, a lot of people like C's syntax.

I think syntax is something that people look for familiarity in, just as much as usability.

Reducing cognitive load is important, and background is highly influential.

For me, I've yet to see something beat S-Expressions. But I don't expect that can be true of everyone.

I wish pluggable syntax appeared in a wider range of languages.


  This is a weak argument. Practically all languages have
  compound symbols made from multiple characters, such as
  >=; there is no shortage of symbols. Also, nearly all
  programming languages have a function-call notation, but
  only Lisp-based languages choose s-expressions to notate
  it, so saying “we need function call notation” does not
  excuse s-expressions.

I'm ironically going to complain that this is a weak rebuttal.

For one thing, I find it hilarious to see how well this predicted the explosion of silly operators in some of the current languages. (Haskell and Scala, I'm looking in your general directions...)

Second, the point is that all function calls now look the same, both in serialized form and in parsed form. That is, the homoiconic nature is the point, no? To treat that as a mere byproduct seems odd.

They then go on to posit the use of meaningful whitespace. Something that is quite disgusting to me. It is dreadfully easy to mistakenly read whitespace. To the point that I do think programs are textual art, and care should be used to structure them in a pleasant and easy to read way. However, I cannot bring myself to like meaningful whitespace. YAML has given me enough headaches in life in a scant year, that I am pretty firmly on the opposite side of that fence now.


> They then go on to posit the use of meaningful whitespace. Something that is quite disgusting to me.

Clearly not everyone agrees with this viewpoint, since Python depends on it and is extremely popular.

But if you don't want meaningful whitespace, that's fine. We actually devised three notations, each one a superset of the previous ones:

* curly-infix-expressions (c-expressions)

* Neoteric-expressions (n-expressions)

* Sweet-expressions (t-expressions)

Only the last one (sweet-expressions) is whitespace-sensitive.

More info here: https://readable.sourceforge.io/


I'd be surprised if Python is winning folks over because of the significant whitespace; more likely, it's just not a deciding factor. (Note, I'm specifically not saying it is popular despite the whitespace.)

That is to say, yes, I recognize my opinion as just an opinion. Most of it is colored by bad experiences with large YAML files, I confess.

My problem with the code is precisely that it tears down my notion of the code being nothing more than a list of data. As an example, in:

  (define (factorial n)
    (if (<= n 1)
      1
      (* n (factorial (- n 1)))))

Everything that goes together is in the same list. Some things branch off because they are part of another list. But all subexpressions are easily visible as just lists. I can take one out and try it somewhere else. How do I know they are lists? Because they look like every other list in the program: they begin and end with a paren.

The "improved one":

  define factorial(n)
    if {n <= 1}
      1
      {n * factorial{n - 1}}

How do I know it is over? How do I trace back to what part of the code is finished when I evaluated {n * factorial{n - 1}}? Hopefully, the code is short enough that these are obviously answered. However, I am well aware that sometimes this doesn't work. And, quite frankly, the brain power needed to find the matching paren is much less than the brain power needed to determine what lexical unit the compiler would have been in, and to scan up for the matching lexical unit.

Then there is the contrarian in me that is probably just getting annoyed that they labeled the proposal as the "improved" piece of code. :D


> How do I know it is over?

In an indentation-sensitive syntax, you simply look at the next line and see if it has more, less, or equal indentation. Then you're done. That turns out to be extremely fast, because the eye/brain combination is exceedingly good at determining if something is left or right of something else.

In traditional s-expressions, to match things up you have to count the parentheses, which the brain is extremely bad at. In practice, the brain is so bad at this that we have to depend on complicated tooling to do the formatting for us. In a typical editor, that means that we depend on things like left-right distance anyway. Since people depend on indentation in the first place, making it syntactically relevant simply applies what people already do to the syntax itself.


You somewhat skip my point. You agree that you know it is over by parsing the structure, which you have to do in both cases. In one, though, the parser rules are more than "scan for the matching paren." It gets more complicated once you have stuff like the <* symbols resetting indentation, the blank-line rules, or the $.

Not to say it can't be done. But I could easily argue the brain is bad at juggling rules. The more rules, the more likely mistakes. Since I am typically building rules in the logic, having extra rules in parsing seems to require more of where I will likely make mistakes.

To be clear, I am presenting my preference. I am not trying to claim superiority.


> But I could easily argue the brain is bad at juggling rules. The more rules, the more likely mistakes.

The brain isn't great at LEARNING rules, but the vast majority of programming languages use more rules than Lisp, and since most people use those other languages instead, it's clearly possible to learn more rules than the ones Lisp provides. Indeed, Lisp's syntax currently insists that you not use infix notation, which means in practice people have to learn a new syntax. Practically every other language (with the exception of Forth) supports infix notation.

The "$" has essentially the same meaning as Haskell, so if you know Haskell's rule, you already know it. The "blank line ends the block" is what most people would assume anyway. The <* ... *> ones are unusual, but they don't take long to learn or master, and counter a complaint about indentation ("stuff marches off to the right").

It's hard to argue preferences. Ideally we'd have some scientific studies really examine these kinds of issues! Sadly, those kinds of studies are in short supply.

Thanks for the commentary.


> Then there is the contrarian in me that is probably just getting annoyed that they labeled the proposal as the "improved" piece of code. :D

Well, we think it's improved. Of course, what matters is "how you measure improvement" :-). Like anything else in engineering, there are always trade-offs, and you have to figure out what you value in the trade.

That said, here are a few ways I'd argue it is better:

1. It uses standard infix notation, instead of non-standard prefix notation. The standard worldwide notation for mathematics uses infix, as do practically all programming languages. That means practically all math books use infix, and most adults have over a decade of training in infix. The majority of programmers are never going to switch to a language that doesn't support infix notation.

2. Similarly, it uses a more-standard notation for function calls, e.g., "factorial(n)" instead of "(factorial n)". I don't think that's as big a deal, but it helps. It doesn't use "," for parameter separation (it could, but the extra commas are unnecessary and dangerously hide other uses of ",").

3. You don't need to figure out "did I match all the parens" when reading the code - instead, the visual layout tells you everything. Many developers don't like all the parens and find them to be an impediment to understanding (Google "Lots of Irritating Silly Parentheses"). If you don't like the whitespace layout, fine, you can use n-expressions instead of sweet-expressions.

The broad impact of these changes is that these notations retain all of Lisp's power (macros, quasiquoting, etc.), but look much more like a modern programming language. That matters, because software development today is normally done in groups - anything that makes code harder to read is a serious problem.


I agree in general that sexps provide the best editing experience currently, but that's just a matter of tooling; it's not an inherent property of sexps vs infix or whitespace-dependent syntax.

Check out [0], a structured editing mode for Haskell, a whitespace-sensitive language.

Incidentally, while I have no problem with Lisps (I've used Clojure for about a year), I do prefer infix notation, precisely because it's not homogeneous.

[0] https://github.com/chrisdone/structured-haskell-mode


You likely prefer infix because it is familiar. It is amazing how much familiarity influences preference.

And yes, tooling can help, for either case. I'm leaning towards simpler tooling, which leans me towards s-exps - mainly because a lot of time has passed, and the evidence is that structured editing just doesn't take off in other languages. And I don't see requiring tooling as an improvement.


Yes, familiarity influences preference, and infix is familiar.

Almost all adults who develop software have had 12 to 20 years of practice using infix notation as a standard part of mathematics, and it's also a standard part of practically all programming languages except Lisp and Forth. It's possible to notate calculations without infix (obviously!), but that doesn't mean that people WANT to do it. It's absurd to fight against the worldwide standard of infix notation for standard math operations like "+".

For most people, {2 * {3 + 4}} is easier to understand than (* 2 (+ 3 4)). If a Lisp reader simply treats {...} as a list where the operator is in the even positions, then you have the best of both worlds: standard infix notation combined with Lisp's ability to do quasiquoting, macros, etc.
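
A rough sketch of such a reader in Common Lisp (simplified: it only handles the plain {a op b op c} case and doesn't check that the operators all match, unlike the full SRFI-105 rules):

  (defun curly-infix-reader (stream char)
    (declare (ignore char))
    (let ((items (read-delimited-list #\} stream t)))
      (if (and (>= (length items) 3) (oddp (length items)))
          ;; {a op b op c ...} => (op a b c ...)
          (cons (second items)
                (loop for x in items by #'cddr collect x))
          items)))

  (set-macro-character #\{ #'curly-infix-reader)
  (set-macro-character #\} (get-macro-character #\) nil))

  ;; (read-from-string "{2 * {3 + 4}}")  => (* 2 (+ 3 4))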


This looks really nice! Are there any largish examples written with sweet-expressions? (Seeing large chunks of code could help with forming a gestalt impression.)

Edit: found one - https://sourceforge.net/p/readable/code/ci/master/tree/src/s...


>However, I cannot bring myself to like meaningful whitespace. YAML has given me enough headaches in life in a scant year, that I am pretty firmly on the opposite side of that fence now.

The only time YAML gives me a headache with respect to whitespace is when I'm using block literals which have trailing whitespace. I don't think any other serialization format really deals with that problem well though - either you have a long, nicely readable block of text that looks ambiguous with trailing whitespace or you have a nasty escaped "long string with \n\n \n\'s" everywhere.

  Block literals: |
    are a killer feature, too.
    They are so damn useful when you have strings
    spanning multiple lines.

Meaningful whitespace means that syntactic clutter can be removed. That makes the markup much more readable because there's less distracting syntactic noise. YAML diffs are very readable, for instance, but JSON diffs (as well as diffs for every other kind of serialization format without meaningful whitespace) are very much not.


I can't remember the parts that pained me most. Some were in somewhat odd sounding "lists of single items" versus "list of single dict items" and such. In particular, the vast difference between:

  baz:
  bat: 
    - foo: bar

and

  baz:
    bat: 
    - foo: bar

is not at all obvious at a glance to me. If I convinced everyone to move to 4 or 8 character indents, it would stick out a little more, but I don't see myself winning that fight.

Worst of all, this mistake happened when I was moving some template from one place to another in a different file. Should be easy: copy, paste, fix whitespace. The problem is that "fix whitespace" has to know about the copy/paste. You can't just reformat after the fact. Yes, tooling could "paste with indentation for current cursor", but you literally cannot "paste, then fix whitespace." It is like the worst of Outlook's "paste with formatting".


I actually wrote a YAML parser that closed this loophole by raising an exception on YAML files that have inconsistent indents. Somebody reported exactly that as an issue on my issue tracker:

https://github.com/crdoconnor/strictyaml/issues/23

I'm aware that there are a bunch of annoyances like that, but I'm still convinced the core is good and making a new standard which is stricter solves most of these issues.

FWIW, I haven't run into this problem personally but it's pretty obvious that it is a problem.


Rebol/Red have more readable s-expressions as their syntax but the drawback is more complex evaluation rules.


Seems like this definition is tied to US-ASCII and that we'd need another spec (even if trivial) for any S-expressions that would intend to allow raw Unicode.


Looks awfully similar to Mark


Canonical forms seem a big win. And teeny parsers.


There's always a trade-off. These parsers aren't very large anyway, in part because they are general (sweet-expressions, like S-expressions, aren't tied to any particular semantic).



