Sweet.js: Hygienic Macros for JavaScript

saurik · on Sept 23, 2012

Despite spending a lot of time both in the design Wiki and in the talk discussing the importance of being able to determine whether a / indicates a regular expression literal or a division operator entirely within the lexer (as opposed to using the parser, which is how JavaScript is generally defined), the algorithm that this developer implemented does not actually work.
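
To make the lexer-only problem concrete, here is a deliberately naive sketch that guesses from the previous token alone whether a / starts a regular expression literal. It is not the algorithm from the talk or from sweet.js, and `slashStartsRegex` is a hypothetical helper invented for illustration. It has to give up on a closing brace, which can end either a block or a function or object literal.

```javascript
// Hypothetical helper, not part of sweet.js: guess from the previous token
// (given as a string) whether a "/" begins a regular expression literal.
// Returns true (regex), false (division), or null (cannot tell).
function slashStartsRegex(prev) {
  // Start of input: nothing to divide, so it must be a regex.
  if (prev === undefined) return true;
  // Keywords that expect an expression next.
  if (/^(return|typeof|void|delete|in|instanceof|new|throw|case)$/.test(prev)) return true;
  // Identifiers and numeric literals end an expression, so "/" divides.
  if (/^[A-Za-z_$][\w$]*$/.test(prev)) return false;
  if (/^\d/.test(prev)) return false;
  // ")" and "]" usually end an expression. This sketch ignores the case
  // where ")" closes a statement head, as in `if (x) /re/.test(s)`.
  if (prev === ")" || prev === "]") return false;
  // "}" may close a block (regex follows) or a literal (division follows).
  if (prev === "}") return null;
  // Remaining punctuators such as ( , = : [ ! & | ? { ; expect an expression.
  return true;
}

console.log(slashStartsRegex("a"));      // false
console.log(slashStartsRegex("return")); // true
console.log(slashStartsRegex("}"));      // null
```

Resolving the `null` case requires knowing what the brace closed, which is information a parser normally has and a bare lexer does not.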

First off, an example where it works:

    a
    /5/
    7

If you run this through sjs you get:

    a / 5 / 7;

This is because, in JavaScript, a statement continues across line boundaries until it is either explicitly terminated by a semicolon or runs into a syntax error, in which case the parse is retried at that point as if a semicolon had been provided. Nothing forces a break here, so we have a single statement that is a division over these three expressions: a, 5, and 7.

However, let's take a more difficult case:

    a = function() {}
    /5/
    7

This is also a single statement: you are entirely allowed to attempt to divide a function literal by a number, you will simply get the value NaN as output. If you take this file and run it through node, adding a "console.log(a)" to the end, that is in fact what you will get: NaN. However, when first run through sjs, you instead get "[Function]".

The reason is that sjs translated the code to:

    a = function () {
    };
    /5/;
    7;

This is incorrect, and demonstrates how difficult some of these underlying issues are when parsing languages that have intertwined lexer and parser state. :( Attempting some other test cases involving regular expressions (but not semicolon insertion) also failed: it seems a lot more work will need to be done on this before it will be able to process general input (and it is not 100% clear to me that the shortcut required is even possible: I haven't thought enough about it yet to say for certain, however).
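
The difference between the two parses can be checked directly in node. This is a minimal sketch; the variable names `correct` and `emitted` are made up for the comparison, and the second half simply reproduces the statement boundaries from the sjs output quoted above.

```javascript
// Correct parse: one statement, a function literal divided twice.
var correct = function () {} / 5 / 7;

// The statement boundaries sjs emitted: the assignment ends at the brace,
// and the next two lines become a regex literal and a number on their own.
var emitted = function () {
};
/5/;
7;

console.log(correct);        // NaN
console.log(typeof emitted); // "function"
```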

(I work on the JavaScript parser for a compile-to-JS language used by people doing jailbroken-iOS development for live introspection of running processes, and thereby that was the first thing I was interested in: how well the parser worked. ;P I have intentions to add reader macros, and then replace all of the extra Objective-C syntax I added with them, but I haven't gotten around to it yet. FWIW: I actually found and fixed a bug in my parser while writing this comment. ;P)

disnet · on Sept 23, 2012

Yeah there are certainly a few bugs remaining in the reader :)

It actually does the right thing if the function is named:

    a = function foo() {}
    /5/
    7

correctly translates to:

    a = function foo() {
    } / 5 / 7;

But clearly I missed the unnamed case. You mentioned finding a few other bugs? Would you mind submitting a bug report on github? I would love to fix those too!

saurik · on Sept 23, 2012

Sorry, was distracted by the other conversation with dherman, and really shouldn't be spending any time on this anyway, but I've verified your operator associativity is wrong.

To start with some example code:

    for (var a = 7 in /7/ in 9 in b);

If I run that in node I get:

    TypeError: Cannot use 'in' operator to search for '/7/' in 9

That is because this code is roughly equivalent to:

    var a = 7; for (a in ((/7/ in 9) in b));

However, it gets converted by sjs to:

    for (var a = 7 in (/7/ in (9 in b)));

That's obviously different, and gives this error instead:

    ReferenceError: b is not defined
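
The two groupings can be compared in isolation, without the for-in wrapper. This is a minimal sketch: `errorName` is a hypothetical helper, and `b` is intentionally left undeclared, as in the report.

```javascript
// Hypothetical helper: run a thunk and report the name of whatever it throws.
function errorName(thunk) {
  try {
    thunk();
    return "none";
  } catch (e) {
    return e.name;
  }
}

// Left-associative grouping: `/7/ in 9` is evaluated first and throws
// because 9 is not an object; `b` is never looked up.
var left = errorName(function () { return (/7/ in 9) in b; });

// Right-associative grouping: `9 in b` is evaluated first, and looking up
// the undeclared `b` throws.
var right = errorName(function () { return /7/ in (9 in b); });

console.log(left);  // TypeError
console.log(right); // ReferenceError
```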

(I really should get back to actually doing my job now, though; if dherman responds again I'll totally notice and follow up: that conversation is really interesting to me.)

disnet · on Sept 23, 2012

Thanks for taking the time to write these up! I'm tracking them here [1]. The first two should be fixed and I should have the third ready soon.

[1] https://github.com/mozilla/sweet.js/issues/18

saurik · on Sept 23, 2012

function a() { return /7/; }

saurik · on Sept 23, 2012

There seemed to be some confusion during the question and answer segment regarding the relative hygiene of macros in Clojure near the end of the motivation and design talk; while it was totally off-topic for the video (and it thereby made sense to take it offline), I personally wish I had been around afterwards to ask the guy who seemed so adamant that syntax-case was fundamentally better than the Clojure solution (which he claimed didn't do it correctly) why that was the case.

I'm totally willing to believe it, but based on my understanding (which sadly is somewhat limited for Scheme, but fairly in-depth for Clojure) it isn't intuitive to me: it would seem like the way you escape hygiene in Clojure (which by default achieves correct hygiene by attaching namespaces to symbols read for macros or inside of quasi-quote) is quite similar in semantics--but simpler in practice due to being exceedingly less verbose--than using syntax->datum and datum->syntax.

dherman · on Sept 23, 2012

That was me. :) Caveat: I know more about hygienic macros than I do about Clojure, so I'm not in a position to critique Clojure specifically.

Hygiene is (roughly speaking) about getting scope right by default but has never been about forcing it on the programmer. Moreover, there are two components to it, only one of which is easy to achieve in an unhygienic system. It's easy to ensure your macro renames introduced /bindings/ by using gensym. But if your macro introduces /references/ to existing variables, it's very hard to protect against those references getting captured at the site where clients call your macro.
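
The first, easy half can be sketched in JavaScript with a purely textual expander. This is an illustration only, not how sweet.js works; `expandSwap`, `expandSwapFresh`, and `gensym` are hypothetical helpers, and `new Function` stands in for compiling the expanded code.

```javascript
// Hypothetical, deliberately unhygienic "macro": plain text substitution
// that introduces a binding named `tmp`.
function expandSwap(x, y) {
  return "var tmp = " + x + "; " + x + " = " + y + "; " + y + " = tmp;";
}

// Hypothetical gensym: produce a name no caller is likely to have used.
var gensymCounter = 0;
function gensym(base) {
  return base + "$" + (gensymCounter++);
}

// Same expansion, but the introduced binding gets a fresh name.
function expandSwapFresh(x, y) {
  var t = gensym("tmp");
  return "var " + t + " = " + x + "; " + x + " = " + y + "; " + y + " = " + t + ";";
}

// The caller's own variable is called `tmp`, so the introduced binding
// collides with it and the swap silently fails.
var captured = new Function(
  "var tmp = 1, other = 2; " + expandSwap("tmp", "other") + " return [tmp, other];")();

// With a fresh name the same call site swaps correctly.
var fixed = new Function(
  "var tmp = 1, other = 2; " + expandSwapFresh("tmp", "other") + " return [tmp, other];")();

console.log(captured); // [ 2, 2 ]
console.log(fixed);    // [ 2, 1 ]
```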

I believe in Clojure they get around this for some cases by letting you fully qualify a reference to a library binding, for example. But what if your macro wants to refer to a variable that's local to it? Such as an unexported library function, or simply a local variable. Again, I don't know if Clojure has an answer to this.

One concrete example: write a `define-inline` macro-defining-macro. At the call site, a user might write

    (define some-local-variable /* something */)
    (define-inline (foo x)
      (+ x some-local-variable))

In an unhygienic system, they should first of all fully-qualify the `+` to be safe (yuck!) whereas a hygienic system just gets that right by default. But more critically, how can they be sure that some client of `foo` won't write:

    (define some-local-variable "something else")
    (foo 42)

Not only will that break, it's not clear how to fix it. A hygienic system gets this right.
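
The same failure can be sketched in JavaScript, again with a textual expander rather than a real macro system. The names `scale`, `expandScaled`, `plainCaller`, and `shadowingCaller` are hypothetical, and direct `eval` stands in for expansion at the call site. Renaming does not help here, because the problem is an introduced reference rather than an introduced binding.

```javascript
// The macro author intends `scale` to mean this definition-site variable.
var scale = 10;

// Hypothetical unhygienic "macro": expands to text that refers to `scale`.
function expandScaled(x) {
  return "(" + x + " * scale)";
}

// Call site without a local `scale`: the reference resolves as intended.
function plainCaller() {
  return eval(expandScaled("4"));
}

// Call site with its own `scale`: the introduced reference is captured
// by the caller's local variable.
function shadowingCaller() {
  var scale = 2;
  return eval(expandScaled("4"));
}

console.log(plainCaller());     // 40
console.log(shadowingCaller()); // 8
```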

I do agree that the state of the art in hygienic macros is too complex. I just haven't seen another system that makes this kind of thing work. But I would like to experiment with Clojure's macros more to see if they have an answer.

Dave

saurik · on Sept 23, 2012

Clojure does not have these issues: when the macro is called, the symbols are already attributed with the full namespace qualification, and usage of quasi-quote inside of the macro definition will also apply namespace qualification to variables local to the definition of the macro; you have to go out of your way to break this. You should spend more time looking into it before claiming to people that it doesn't work correctly; you could easily have just said "that's a good question, we'll look into that after the talk" rather than telling the person that Clojure wasn't as good.

dherman · on Sept 23, 2012

> Clojure does not have these issues: when the macro is called, the symbols are already attributed with the full namespace qualification, and usage of quasi-quote inside of the macro definition will also apply namespace qualification to variables local to the definition of the macro; you have to go out of your way to break this.

Again, not a Clojure expert, but a namespace is coarser-grained than individual local scopes, right? The problem I'm talking about is when you have a local variable inside a nested scope (e.g. inside a `let`). If this is not named by a namespace, then you would still get collisions.

Regardless, Clojure's approach seems to be much closer in spirit to a hygienic macro system: it attempts to get scoping correct by default, and allows you to intentionally capture.

> You should spend more time looking into it before claiming to people that it doesn't work correctly; you could easily have just said "that's a good question, we'll look into that after the talk" rather than telling the person that Clojure wasn't as good.

Fair enough as far as it goes. I did react snappily, but you didn't hear the offline conversation (this whole thing was a dialog at my office with a friend and colleague, incidentally) where I said "I'm not entirely sure, but I believe there are things you just can't express with systems like Clojure's." And we concluded, just as you reprimanded me to do, that we should look into it further when we have time.

Dave

saurik · on Sept 23, 2012

Correct: I did not hear the private conversation. I only heard the talk that was made public along with this project that was posted here, and which was recommended as an information source, and pretty much serves as the web page and documentation for this project ;P.

> Again, not a Clojure expert, but a namespace is coarser-grained than individual local scopes, right? The problem I'm talking about is when you have a local variable inside a nested scope (e.g. inside a `let`). If this is not named by a namespace, then you would still get collisions.

Ok, so are you concerned with the case where the person defining the macro uses a symbol from the namespace of the person using the macro but that symbol has been rebound by the user inside of a let surrounding the aforementioned usage of the macro?

If so, that requires a cyclic module dependency, which isn't allowed (as the namespace from which you are getting the symbol would need to be required, but it would have to require back to get access to the macro: it does eager name binding, so that can't happen).

If not, and you are just talking about the simpler and more obvious case of a let shadowing a binding inside of a larger scope used by a macro, that works fine. The following code prints "1100", despite the macro expanding to multiple uses of the same symbol "t".

    (def t 1000) 
    (defmacro run [x] 
        `(+ ~x t))
    (let [t 100] 
        (prn (run t)))

You might then wonder (as I have) whether this is implemented by simply renaming the variables bound by let to something random: that would be sufficient to implement this. However, if I go out of my way to unquote an unqualified symbol, I can capture: the following code prints "1200".

    (def t 1000)
    (defmacro run [x]
        `(+ ~x t ~'t))
    (let [t 100]
        (prn (run t)))

dherman · on Sept 23, 2012

> Ok, so are you concerned with the case where the person defining the macro uses a symbol from the namespace of the person using the macro but that symbol has been rebound by the user inside of a let surrounding the aforementioned usage of the macro?

> If so, that requires a cyclic module dependency...

Not necessarily. For example (forgive the Scheme syntax), all in one module:

    (define thing "outer thing")
    ;; define-inline is the above macro-defining-macro
    (define-inline (foo prefix)
      (string-append prefix thing))
    (let ((thing "inner thing"))
      (foo "should say outer thing: "))

saurik · on Sept 23, 2012

That seems to be my "if not," case, which I provided some examples for; if this is different, can you please be more explicit? It seems like your "thing" is my "t" and your "foo" is my "run": the only difference is then that I went out of my way to make it more complex by passing the inner thing through the macro to demonstrate it wouldn't get mangled.

    (def thing "outer thing")
    (defmacro foo [prefix]
        `(str ~prefix thing))
    (let [thing "inner thing"]
        (prn (foo "should say outer thing: ")))

"should say outer thing: outer thing"

(edit:) Alternatively, maybe you are focussing on the define-inline "macro-defining macro"; you mentioned it here again as "the above", and you had used it above, but as it wasn't defined it didn't seem important. I tried to go ahead and implement it, although to be honest I feel like I did it wrong (spending more time thinking about it, I believe it is correct, modulo your definition of "inline"); that said, it "worked".

    (defmacro def-inline [[name & args] code]
        `(defmacro ~name ~(apply vector args)
            ~code))

    (def thing "outer thing")
    (def-inline [foo prefix]
        (str prefix thing))
    (let [thing "inner thing"]
        (prn (foo "should say outer thing: ")))

"should say outer thing: outer thing"

dherman · on Sept 23, 2012

Interesting. I don't see how this works. I wonder, is it different if you do this?

    (let [thing "outer thing"]
      (defmacro foo [prefix]
          `(str ~prefix thing))
      (let [thing "inner thing"]
          (prn (foo "should say outer thing: "))))

Dave

saurik · on Sept 23, 2012

That is not possible as written, because the "defmacro" is not executed to define the macro until after the outer let is already executing, which is after macro expansion of that form (and thereby its children), as it has already been read: so what I get for that is a really weird error that I'm passing too many arguments to "foo", as if it were a function (which it is not; albeit I'm not certain what it is ;P).

However, I can use the def-inline that I wrote in the edit to my earlier reply to demonstrate that if you reorganized this code in a way that was semantically equivalent but hoisted the macro, it would work the way you think it should: the definition of the thing from the let surrounding the macro-ish definition is used, not the one from the call site (or the global one in the namespace).

    (defmacro def-inline [[name & args] code]
        `(defmacro ~name ~(apply vector args)
            ~code))

    (def thing "outer thing")
    (let [thing "middle thing"]
        (def-inline [foo prefix]
            (str prefix thing)))
    (let [thing "inner thing"]
        (prn (foo "should say middle thing: ")))

"should say middle thing: middle thing"

(edit:) Oh, that wasn't semantically equivalent, as the second let is not inside of the first. However, if I do that, I get the same behavior as I get in the other case (that it doesn't actually expand the macro at all and treats the form as a function call), as I'm obviously just defining the macro again inside of the same already-read form in which I'm using it.

(further:) Okay, and the reason why that is working is that the way I wrote def-inline inlined the code from def-inline into the macro itself. That is probably not what you wanted from def-inline: this is more like def-const (or def-static or something). I thereby tried doing this instead:

    (let [thing "middle thing"]
        (defmacro foo [prefix]
            `(str ~prefix thing)))
    (let [thing "inner thing"]
        (prn (foo "should say middle thing: ")))

This, in fact, does not return the "correct" string: instead, it fails to work at all, as the "thing" used inside of the macro is supposedly not defined (and worse, if I have a global def for "thing", I get that value). So, this is a case where the macro is unable to use bindings that are local to the time when the macro definition is executed: it can only deal with global-ish names.

For the record, I think that is unrelated to what I normally think of with relation to hygiene: the macro is able to modify the code using it without accidental capture, but humorously it, itself, is unable to take advantage of symbols that have been bound locally around it. I totally accept that this is probably a flaw (I only say "probably" as I'm willing to believe someone from Clojure can convince me otherwise; it certainly seems like a flaw, though).

(more:) I am increasing the weight of that "probably", as I'm noting that the person calling this code has absolutely no way to refer to the thing that I have access to: there is no path no matter how complex or awesome that would let it refer to my "middle thing". I can inline it with ~, but then it isn't a binding anymore; however, that's actually equivalent, as Clojure does not have setf: "thing" is a constant, and so even if I had a function inside of this let closed over that thing, I couldn't modify its value.

dherman · on Sept 23, 2012

I'd forgotten about the whole Lisp tradition of expanding + evaluating one form at a time -- once again betraying my Scheme biases. :) That's making it hard for me to think about how to compare the two. I'm really not familiar enough with the one-form-at-a-time approach.

Anyway, my takeaways here are:

- I shouldn't've said anything about Clojure specifically, because a) the design space is different and b) they have some form of hygiene-like something-or-other that I don't know enough about. Gotta go study them!

- I still don't know how to do hygiene in Scheme-like languages any other way than the approaches I've seen, but I do agree they're more complicated than I wish they were.

Dave

SchizoDuckie · on Sept 23, 2012

OK, either your description page doesn't get your point across at all, or you're missing a definition somewhere.

'Wish the function keyword in JavaScript wasn't so long? What if you could define functions with def instead?'

Erh, no? Why in the lord's name would I? Is that your big selling point? 'I don't like function() because it's too long?'

'Want a better way to make "classy" objects?'

Why would you want to make JavaScript less like JavaScript, introduce a JavaScript dependency that can read your language, and then compile back into JavaScript in real time, in a way that will obviously make debugging nearly impossible (like CoffeeScript)?

Am I the only one that doesn't understand the use case of this? Or should this have been presented as just another lexer/parser?

disnet · on Sept 23, 2012

> OK, either your description page doesn't get your point across at all, or you're missing a definition somewhere.

The description page (like the rest of the project at the moment) is definitely a work in progress. Sorry about the confusion!

> 'Wish the function keyword in JavaScript wasn't so long? What if you could define functions with def instead?'

> Erh, no? Why in the lord's name would I? Is that your big selling point? 'I don't like function() because it's too long?'

This is just a really basic example of what you can do. Obviously you could have the same effect with sed, but we need something simple to show what basic macro definitions look like.

> 'Want a better way to make "classy" objects?'

> <provides example that looks like php vomited on Javascript after mating with ruby>

I'm assuming you refer to the macro definition with $ and whatnot. If you have ideas for better syntax please contribute! The notation is by no means final. That said I think it's pretty good for what it needs to do, matching syntax patterns and transforming them into new bits of syntax. Certainly more readable than a full parser/compiler.

> Am I the only one that doesn't understand the use case of this? Or is it just another lexer/parser?

The idea is to provide a middle way between no syntax extensions and writing your own full compile-to-JS language. Languages like CoffeeScript allow you to add any kind of syntax you want, but at the expense of being able to compose extensions. If you like the class notation of CS you must buy into all the syntax choices of CS, whereas macros allow you to add just the syntax you want.

Whether you want new syntax is of course a different question. Other languages like Scheme and Clojure have found macros useful, so sweet.js is an experiment to find out if the same syntactic extensibility that they provide is useful in JavaScript.

Hope that helps!

SchizoDuckie · on Sept 24, 2012

Okay, now that's a story that makes me understand what this is about, thanks!

How about a clearer description like the above on your page, with some further reading on macros and some more complicated examples that tell me as a programmer: "Yes, I dó need macros!", and that I can use them for these and these cases?

Also, don't underestimate how smart your audience is going to be by giving them just 3-line examples. I always find it clearer when there are some actual use cases on the wiki pages. How about a proper class, or a TodoMVC-style demo, so that people can actually imagine what kind of possibilities this would unlock?

saurik · on Sept 23, 2012

Simple examples are good for people to understand; however, the reason people want macros is usually for much more complex situations where you otherwise end up with a lot of boilerplate or repetitive code: a macro is most simply a way for code to manipulate and generate other code.

Regardless, if you don't like macros (and your arguments are common among people who don't; that said, they are also common arguments against using higher-level languages or even functions), then you won't like this project or any other project attempting to provide similar functionality. ;P