> The problem with these schemes is that they're at odds with metaprogramming. One huge advantage of s-expressions is that the code is expressed using data literals.
I don't think it is necessary for there to be a single uniform syntax for code to be able to fully leverage metaprogramming.
Take for instance Mathematica (barring complexities of the evaluator itself...). The underlying type for almost everything in the language is the "expression," essentially a tagged list. The front-end language, called "standard form," can be converted to and from expressions. Expressions can more or less be directly serialized as "full form," which looks like McCarthy's M-expressions, and standard form extends this notation with operators and precedence rules. (Technically, standard form is given in a 2D graphical notation that is parsed into expressions, and "input form" is linear text. All the forms are interconvertible.) Programming in Mathematica seems to amount to writing lots of transformation rules for expressions, and it usually works out to just writing the code you want to match against, but with pattern variables as holes. You are also free to write patterns in full form if it helps make intent more precise.
I think many cases of code transformation involve a pattern to match against. A language with few syntax classes and with metavariables can support metaprogramming easily enough. There are a few examples I've seen, and, although I can't remember the names, I think MetaOCaml was one.
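As a minimal sketch of the pattern-plus-metavariables idea (this matcher is made up for illustration, not taken from any particular library): the pattern is ordinary code, with symbols starting with ? acting as holes.

    (defun metavar-p (x)
      (and (symbolp x)
           (plusp (length (symbol-name x)))
           (char= (char (symbol-name x) 0) #\?)))

    (defun match (pattern form &optional (bindings '()))
      "Return an alist binding metavariables in PATTERN to pieces of FORM,
    or :FAIL if PATTERN does not match."
      (cond ((eq bindings :fail) :fail)
            ((metavar-p pattern) (acons pattern form bindings))
            ((atom pattern) (if (eql pattern form) bindings :fail))
            ((atom form) :fail)
            (t (match (cdr pattern) (cdr form)
                      (match (car pattern) (car form) bindings)))))

    ;; (match '(if ?test ?then ?else) '(if (> x 0) x (- x)))
    ;; binds ?TEST to (> X 0), ?THEN to X, and ?ELSE to (- X)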
I guess what I'm saying is, so long as you make it so there is a "code literal," no matter the purported "homoiconicity" properties, you're good for metaprogramming.
> while still preserving homoiconicity
I think Alan Kay using the word "homoiconic" for Lisp was a mistake. The syntax of Lisp really is not the same as the internal representation, unlike Tcl (which I understand actually executes by manipulating strings). This is a reason structured editors for Lisp don't seem to work out.
Here is a new word: autoiconic ("self representing"). A language is autoiconic if there are composable code literals. It is not necessary for an autoiconic language to have "eval," but it is necessary to be able to read in code and then write out equivalent code. Furthermore, the code literals must allow for metavariables and composability: replacing a metavariable with a code literal must result in syntactically valid code when written. This excludes Javascript even though it has template literals.
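A quick Common Lisp illustration of the composability requirement:

    (let* ((test  '(> x 0))              ; a code literal
           (whole `(if ,test x (- x))))  ; another literal, with a hole filled
      (write whole))
    ;; prints (IF (> X 0) X (- X)), which reads back as equivalent code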
It is questionable whether Clojure is actually homoiconic, but it is arguably autoiconic since it has quasiquotation. (However, the way the quasiquoter works, a read/write cycle will result in code with a bunch of list manipulation functions. The supposed code literal disappears!)
The problem with "homoiconic" is that it is about a storage mechanism: it says that function definitions are input into the image in a textual format and are stored in more or less that format so that they can later be recalled and edited.
"homoiconic" languages can regurgitate their definitions. E.g. the set command in the POSIX shell, or LIST in classic BASIC.
Those features are helpful interactively in that they allow a human to transform the code right inside the image by recalling, changing and reinstalling definitions.
In Lisp, data structures can be regurgitated, rather than definitions. That is helpful in interactively debugging automated, rather than manual, transformations.
Moreover, the data structure is not regurgitated from a homoiconic form. It is isomorphic to the text in some way so that the data structure can be visualized from the text easily and vice versa. The programmer writes transformations which manipulate the data. These transformations take place in pipeline-like successive steps, where it is useful to intercept the process at different stages.
The programmer of a homoiconic language doesn't do this; the programmer manipulates only the text. A BASIC programmer never works with the internal tokenized lines, for instance, in which a word like PRINT is condensed down to one byte. There is no access to that representation via any API from the language.
Homoiconic languages don't place any transformation between the programmer input of a definition and the definition's storage; to do so would mean not to be homoiconic!
ANSI Lisp has very vaguely defined support for homoiconicity: the function ed, which is intended to recall a function definition and allow the programmer to edit it. This feature is not very prominent; it is not the basis of Lisp interactivity. Almost every aspect of the function is implementation-dependent. To support ed, an implementation's defun operator has to store the un-expanded form somewhere.
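A sketch of what that storage might look like (DEFUN$ and SOURCE-OF are hypothetical names, not part of any implementation):

    (defmacro defun$ (name lambda-list &body body)
      `(progn
         ;; stash the un-expanded definition on the symbol's plist
         (setf (get ',name 'source)
               '(defun ,name ,lambda-list ,@body))
         (defun ,name ,lambda-list ,@body)))

    (defun source-of (name)
      (get name 'source))

    ;; (defun$ double (x) (* 2 x))
    ;; (source-of 'double)  => (DEFUN DOUBLE (X) (* 2 X))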
This is a reasonable characterization of homoiconicity.
Interestingly, by this definition, V8's implementation of Javascript is homoiconic, since functions can be converted to strings, recovering the actual source text of their definitions.
I once took advantage of this to make a Smalltalk-like source editor for Javascript: http://tmp.esoteri.casa/grip/ (things might no longer work, since I last used it four years ago; I lightly edited it tonight to get it to start up without a file server. Only tested in Chrome!)
It's based on a class-based object system I made that can serialize all of its class definitions. This drives both the class browser and the code serializer/saver.
(Right now it has two code browsers because I was in the process of making a nicer one. Somehow, I had the idea that I was going to program a whole webapp with this thing, but I lost motivation.)
> I think Alan Kay using the word "homoiconic" for Lisp was a mistake. The syntax of Lisp really is not the same as the internal representation, unlike Tcl (which I understand actually executes by manipulating strings). This is a reason structured editors for Lisp don't seem to work out.
A Lisp interpreter executes internalized s-expressions.
If that's not observable to the program, so what? At a math lecture I once went to, the speaker said "sure, define it that way, but show me the theorems!" Everyone seems to have their own idea for what homoiconicity means, but I've yet to see anyone draw a compelling inference from such a distinction. There are vague suggestions that it's important for metaprogramming, but in my previous comment I was instead suggesting that "autoiconicity" is more definable and useful.
So "s-expression" means two things to Lispers:
1) the textual notation that the reader parses, and
2) an inductively defined tree form.
The original McCarthy paper calls the second s-expressions. It'd be possible to program entirely in the tree form with the appropriate editor, but this is unpopular. You would miss out on numerous features of the reader (a small example follows the list):
1) comments
2) quasiquotation (for Common Lisp; in Scheme it has a defined representation)
3) writing numeric literals with the syntax you want
4) writing symbols in the form you want (that is, symbol or package synonyms)
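For instance, a read/write round trip through the reader erases comments and literal spellings (a standard Common Lisp illustration of items 1 and 3):

    (with-input-from-string (in "(+ #x10 ;; sixteen, in hex
                                  #b101)")
      (write (read in)))
    ;; prints: (+ 16 5)  -- the comment and the radix spellings are gone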
Anyway, the point is that Lisp source code is a sequence of characters, not a tree. Sure, the parsing routine is slight, but why does everyone want to confuse the issue and say that they are literally the same thing? (Or, at the least, insist that it is special that the AST reflects the structure of the source code. Is Java's AST really so different from Java source?) Lisp doesn't need to be homoiconic to be the powerful language it is.
Alan Kay is quite a thinker and visionary, though, and he is on to something, but this particular idea needs some more refinement.
(Thought experiment: it's not impossible to make an interpreter for most languages that evaluates code by directly manipulating a string containing the original source file. Doesn't that make every language homoiconic? Code is data!)
The interface for EVAL and COMPILE is code as data, not text.
The macro facility operates on code as data.
The pretty-printer operates on code as data.
Tools like code-walkers and macroexpanders operate on code as data.
Interpreters operate on code as data. This enables sophisticated tracing and stepping (stuff which I would need to instrument in compiled code), self-modifying and inspecting code, easy access to source forms in the debugger, etc.
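For example, each of these interfaces consumes s-expressions directly (standard Common Lisp; the exact shape of the WHEN expansion is implementation-dependent):

    (eval '(+ 1 2))                                  ; => 3
    (funcall (compile nil '(lambda (x) (* x x))) 5)  ; => 25
    (macroexpand-1 '(when hungry (eat)))
    ; => e.g. (IF HUNGRY (PROGN (EAT)))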
There are many ways Lisp implementations make use of code as data; editing, for some reason, does not seem to be an especially attractive one. Though structure editors exist, like the later one for Medley/Interlisp-D. Comments there are part of the s-expressions - they are not textual comments outside of them.
In Common Lisp, quasiquotation is not fully a feature of the reader, since the representation of quasiquote expressions is undefined. A Common Lisp implementation is free to preserve quasiquote expressions, and some actually do that.
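One way to see the difference (what gets printed is implementation-dependent):

    (quote `(a ,b))
    ;; SBCL, which preserves backquote structure, prints it back as `(A ,B);
    ;; an implementation that expands backquote at read time might instead
    ;; print list-building code, e.g. (LIST 'A B)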
The code-as-string idea is possible, but the general meaning of code as data (or the less useful idea of homoiconicity) is code as structured data, not as a string. Strings are the trivial, uninteresting case.
No, s-expression-based Lisp code is not what I would consider to be an AST.
I have to say, I do not understand what you are addressing in my comment: whether you are rebutting, agreeing, expanding, informing, thinking out loud, or simply expressing what you think is special about Lisp.
> The interface for EVAL and COMPILE is code as data, not text.
Is that essential, or is it just convenient? I'm not aware of anything in Common Lisp (other than the spec) that requires eval to accept s-expressions rather than strings. I'm assuming objects have string forms that, when read, give an object that is eql.
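Indeed, a string-accepting eval is a trivial wrapper, which suggests the s-expression interface is a convenience rather than an essence:

    (defun eval-string (s)
      (eval (read-from-string s)))

    ;; (eval-string "(+ 1 2)")  => 3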
An aside: one could say eval and compile are hacks to deal with the fact that in Lisp, code is merely a kind of data. In a language like Mathematica, data is also a kind of code (there is no difference between code and data). This has logical issues, however, so I wouldn't take code=data as a design goal.
> The macro facility operates on code as data. ... code as data ... code as data
Sure, Lisp has a data type for code and it has an expressive language to manipulate code (what I was calling "autoiconicity"). Lisp also has staged metaprogramming that seamlessly invokes such code manipulators before evaluation/compilation. Many languages, with less expressivity, can also load, manipulate, and write code -- with the code represented as structured data of course.
> (or the less useful idea of homoiconicity) means code as data, but not as a string
But if you read my comments, you'd know I think "homoiconicity" does not mean "code as data," but "essentially the same representation internally and externally." It's a property of an implementation of a language and not the language itself. Alan Kay defines the word offhandedly and asserts Lisp is homoiconic in his thesis, and, except for some implementations of Lisp, it doesn't apply in general. This is the point of my thought experiment: every language has some implementation that is homoiconic. (And so what? That's also my point.)
> Interpreter operate on code as data. This enables sophisticated tracing and stepping (stuff which I would need to instrument in compiled code), self-modifying and inspecting code, easy access to source forms in the debugger, etc.
SBCL compiles everything to machine code, even with eval. It is not essential for an interpreter to directly operate on s-expressions, and whether one does or not is not observable. These things can all be implemented by a transformation of the code after macro expansion and before compilation. Or perhaps via JIT compilation, like PyPy through RPython.
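For instance, tracing can be grafted on by rewriting the source before compilation. A toy sketch of such instrumentation, assuming the form is already macroexpanded and contains only atoms, QUOTE, and plain call forms (a real code walker would have to handle all special operators):

    (defun instrument (form)
      (cond ((atom form) form)
            ((eq (car form) 'quote) form)
            (t `(progn (format *trace-output* "-> ~S~%" ',form)
                       (,(car form) ,@(mapcar #'instrument (cdr form)))))))

    ;; (eval (instrument '(+ 1 (* 2 3)))) traces both calls and returns 7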
On the other hand, we can take most languages and create interpreters that evaluate the AST, giving us such tracing, stepping, and access to source in a debugger.
(By the way, do you really mean self-modifying and self-inspecting code? Can you give a useful example of this in Common Lisp? I wasn't aware that Common Lisp requires an implementation to be able to give the definition of a function. I thought it was a property of the implementation and not the language.)
> No, s-expression-based Lisp code is not what I would consider to be an AST.
Great. Now would you care to define AST in a way that excludes s-expressions (and in a way that isn't ad hoc)? It seems to me that s-expressions are a tree representing syntax abstractly; I'm not sure how they're not, or what you gain (inference-wise or practically) by excluding s-expressions. Are you drawing the distinction that it is only when an s-expression is paired with an interpretation that you get an AST? (By the way, this is a powerful idea that's used for nanopass compilers: lots of stages, each with a slightly different interpretation for the constituent s-expressions.)
> Homoiconicity ... same representation internally and externally
That's why it makes little sense. Lisp does not use the same representation internally and externally. Externally, programs may be characters in a file or on a display. They are interned by reading them into data structures. Running programs may use different representations, and one popular choice is the internalized representation - which is used by Lisp interpreters.
> SBCL compiles everything to machine code...
If you use its compiler. If you use its interpreter, it doesn't.
Sure, using an interpreter is observable. That's the reason why they exist. They offer different services to the user. Many Lisp systems offer hooks into the interpreter, which let you watch it working.
> s-expressions are tree representing syntax...
Actually s-expressions are a data format, not a format of the programming language Lisp. S-expressions know very little about Lisp - basically they provide an elaborate, hierarchical form of tokenizer output. The reader parses numbers into number objects, symbols into symbol objects, etc.
But the reader does not know anything about the syntax of Lisp, which is defined on top of s-expressions. Thus the s-expression does, for example, NOT represent any syntactic information about the source code: after READ it is not known what a symbol actually stands for. Is it a data object, a function, a special operator, a macro, a variable, a local function, a local variable, a package name, a go tag, ...? The s-expression does not tell us. We also don't know what is a function call and what is not. Etc.
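A small example of the point: the symbol LIST occurs three times below, as a variable, a datum, and a function, and nothing in the s-expression itself marks which is which.

    (let ((list '(list)))
      (list list))
    ;; => ((LIST))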
From an AST I would expect that it represents syntactic information of the programming language Lisp.
To construct this, one needs to traverse the s-expressions: using compilers, interpreters and/or code walkers (tools which can traverse code, collect information, and maybe transform it).
The ANSI CL spec even lacks a way to access the syntactic information in macros. The facility for that didn't make it into the standard, though implementations provide a quasi-standard way.
I wasn't aware of SBCL's interpreter mode, but it does say "in SBCL interpreted code is not safer or more debuggable than compiled code." I haven't used it myself yet, though, so it's possible there are interesting differences to the user. But to be clear, when I said "observable," I meant "observable to the running program." The standard I'm using: if an implementation detail is not observable to the program (with some restrictions, like a program can't just measure how long it took to execute), it's not a property of the language itself.
> That's why it makes little sense. Lisp does not use the same representation internally and externally. Externally programs may be characters in a file or on a display.
It's a bit confusing to have you reiterate what I've said in this thread without your saying you are doing so and are in agreement. Let's go back to the very beginning of our discussion:
> > I think Alan Kay using the word "homoiconic" for Lisp was a mistake. The syntax of Lisp really is not the same as the internal representation
> A Lisp interpreter executes internalized s-expressions.
"Syntax" as in "what the user of the language types." At the least, I meant sequence of characters vs s-expressions. But, I also did the confusing thing of saying "the internal representation" to mean a property that all Lisp implementations share. Some Lisp interpreters directly examine s-expressions, sure, but, given the context, I assumed that your pointing this out meant you thought it was evidence that I was mistaken to think Alan Kay was mistaken.
> Actually s-expressions are a data format, not a format of the programming language Lisp.
I take it the answer is "yes" to my question "Are you drawing the distinction that it is only when an s-expression is paired with an interpretation that you get an AST?"
> Thus the s-expression does, for example, NOT represent any syntactic information about the source code
In Lisp, there are two levels of syntax: syntax classes from the source code, like symbols, numbers, lists; and syntax classes from the s-expressions, like what you go on to say.
(Meta-question: am I coming across as someone who's ignorant about the design and implementation of Lisp? My experience observing lispers communicate is that they seem to assume nobody else is familiar with the details of Lisp and how it works!)
> "in SBCL interpreted code is not safer or more debuggable than compiled code."
It's mentioned there because that would be the usual expectation: an interpreter has different features; it usually enables stepping and tracing, for example.
> observable to the running program
The interpreter is a part of the running program. In many Lisp systems the interpreter has slightly different semantics which are easy to observe: for example, the interpreter usually will macroexpand code on each evaluation of a macro form, where the compiler will do it BEFORE the code runs, and will do it once. The interpreter usually also offers hooks which allow code to be executed that changes the environment of each evaluation. This may also be easy to observe from the program. For example, the interpreter could collect call statistics while running the code, and the code could itself access them.
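A sketch one could use to observe the macroexpansion difference (the exact behavior is implementation-dependent; the standard permits either expansion strategy for interpreted code):

    (defvar *expansions* 0)

    (defmacro counted ()
      (incf *expansions*)   ; side effect at macroexpansion time
      nil)

    (defun f () (counted))

    ;; Under an interpreter that expands on each evaluation, calling F
    ;; repeatedly keeps incrementing *EXPANSIONS*; after (compile 'f),
    ;; the expansion has happened once and the counter stops moving.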
> it's not a property of the language itself
Lisp is as much a runtime as it is a language. It's a combination of the two. This makes the interpreter and compiler part of the execution environment and allows the program to do introspection, runtime code generation, observing its own execution in the interpreter, and so on.
Common Lisp has a standard language spec, but it spans a range of very different implementations. The standard is very vague about execution; it defines a bunch of interfaces (EVAL, ...) but says very little about their implementations. Garbage collection, a basic feature of almost every implementation, isn't even mentioned.
The concrete language I'm using is not ANSI CL; it is Clozure Common Lisp, CLISP, LispWorks, etc. - each with different implementation strategies. One might not use an interpreter; the next one has the interpreter more prominently. If you type to the SBCL REPL, everything by default gets compiled. If I type to a LispWorks REPL, everything by default gets interpreted.
> Some Lisp interpreters directly examine s-expressions
If they don't use s-expressions, they are not a Lisp interpreter. Running off internalized s-expressions is the definition of a Lisp interpreter.
> I take it the answer is "yes" to my question "Are you drawing the distinction that it is only when an s-expression is paired with an interpretation that you get an AST?"
A Lisp interpreter does not need to use an explicit AST as an actual tree. It may just traverse the code according to hard-coded rules for the various built-in language constructs (function calls, sequences of calls, non-local transfers, quotation, ...) and build whatever data structures it needs.
> In Lisp, there are two levels of syntax: syntax classes from the source code, like symbols, numbers, lists; and syntax classes from the s-expressions, like what you go on to say.
In Lisp there are two levels of syntax: first, s-expressions as a data syntax - numbers, symbols, lists, arrays.
Then there is the syntax of the programming language Lisp and its language constructs.
For example the EBNF-like syntax for the special operator LET is this:
let ({var | (var [init-form])}*) declaration* form*
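For example, a LET form exercising each production of that grammar:

    (let (a                  ; var
          (b)                ; (var)
          (c 3))             ; (var init-form)
      (declare (fixnum c))   ; declaration*
      (list a b (+ c 1)))    ; form*
    ;; => (NIL NIL 4)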
> My experience observing lispers communicate is that they seem to assume nobody else is familiar with the details of Lisp and how it works!
I had the impression that you don't know what a Lisp interpreter actually does, how it is implemented, and that it is a part of a Lisp system - and thus of the running program.
Take for example the LispWorks interpreter/compiler.
Thus I can write a user-level program which uses the Lisp interpreter to observe the program running. It tells me what forms it is running and the result values.
The program itself could now, for example, observe its evaluation level:
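A toy sketch of the idea (this is not the actual LispWorks interface; just a self-contained mini-evaluator that exposes its own evaluation level to the code it runs):

    (defvar *eval-level* 0)

    (defun toy-eval (form)
      (let ((*eval-level* (1+ *eval-level*)))
        (cond ((eq form 'eval-level) *eval-level*)  ; code observing the evaluator
              ((atom form) form)
              ((eq (car form) 'quote) (cadr form))
              (t (apply (car form) (mapcar #'toy-eval (cdr form)))))))

    ;; (toy-eval '(list eval-level (list eval-level)))  => (2 (3))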
I have the impression that you feel like you need to correct what I say because I am not saying things exactly the way that you think about them. Please forgive me: I am not interested in having this kind of discussion.