
I hereby inaugurate Ron's third law: for any perceived shortcoming of Common Lisp you can write a macro that fixes it for less effort than it takes to complain about it.



It’s a romantic idea but it’s simply not true. I have two examples.

The first example is easy: Implement an O(1) jump table in Common Lisp that respects the lexical environment. (You can’t, unless CASE is itself optimized to a jump table, which it usually isn’t.)

The second example: Consider the SAME-FRINGE [0] problem. You’ll have a hard time macro-ing your way out of that unless your Lisp implementation has a built-in API for coroutines [1], or you write extraordinarily difficult macros to CPS-convert the entirety of Common Lisp to allow for CALL/CC. The latter is itself a hefty project to do well.

This is not to say SAME-FRINGE can’t be solved in Lisp. It can. The page I linked has several solutions in Common Lisp. But idiomatic solutions are consy and inefficient. For instance, the usual SICP hack of writing lazy sequences with thunks “works”, but doesn’t integrate deeply with the rest of Lisp. And coroutine/CPS libraries in Lisp often have a bag of gotchas (e.g., they don’t work with higher-order functions like MAPCAR).
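For reference, the thunk approach looks something like this; a minimal sketch, with names of my own choosing (LAZY-FRINGE, LAZY-SAME-FRINGE-P), not a standard API:

  ;; A lazy fringe is either NIL or a pair (leaf . thunk-for-the-rest).
  (defun lazy-fringe (tree &optional (rest (lambda () nil)))
    (cond ((null tree) (funcall rest))
          ((atom tree) (cons tree rest))
          (t (lazy-fringe (car tree)
                          (lambda () (lazy-fringe (cdr tree) rest))))))

  (defun lazy-same-fringe-p (t1 t2)
    (loop for f1 = (lazy-fringe t1) then (funcall (cdr f1))
          and f2 = (lazy-fringe t2) then (funcall (cdr f2))
          do (cond ((and (null f1) (null f2)) (return t))
                   ((or (null f1) (null f2) (not (eql (car f1) (car f2))))
                    (return nil)))))
It works, but the laziness is bolted on with explicit thunks and FUNCALLs instead of meshing with the sequence functions you already have.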

While I understand this is subjective, the most natural solution to this problem uses a control mechanism that just doesn’t exist in Common Lisp.

[0] http://wiki.c2.com/?SameFringeProblem

[1] Many Lisps used to, but it has fallen by the wayside since Lisps began to support native OS threads.


> The first example is easy: Implement an O(1) jump table in Common Lisp that respects the lexical environment. (You can’t, unless CASE is itself optimized to a jump table, which it usually isn’t.)

Perhaps I don't understand your objection: you can certainly store any object in an array and funcall it, which would be O(1); the object could be a lambda that captured the lexical environment. I was doing that 35 years ago on the Lisp Machine.

Interestingly, I did this also on the D-machine Interlisp, where each closure forked the stack and ground the machine to a halt; that design bug was later fixed. Both examples I'm talking about predate Common Lisp standardization.


For the true jump-table experience, tagbody/go would also work just fine for this. You can put the ‘go’s in lambdas in an array and funcall them. This sounds like fun.
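Something like this, say (a quick sketch; note that the vector of GO-closures gets consed on every call):

  (defun go-table (i)
    (tagbody
       (funcall (aref (vector (lambda () (go a))
                              (lambda () (go b))
                              (lambda () (go c)))
                      i))
     a (print "A!") (go end)
     b (print "B!") (go end)
     c (print "C!")
     end))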


Not so. The table of lambdas has to be built at runtime, making it O(N). You can’t use LOAD-TIME-VALUE because that doesn’t respect the lexical environment, which TAGBODY tags live in.


Well, you have to have the GO form in the lexical scope of the TAGBODY in order for it to work... that seems reasonable enough to me. You specifically stated:

> Implement an O(1) jump table in Common Lisp that respects the lexical environment.

You don't know the lexical environment until there's a lexical environment to know; you can't have your cake and also eat it (or not have your cake and also know it).

Interestingly, catch/throw also solves this problem fairly elegantly without distastefully consing up a list or array at run-time.

  (defun jump2 (i)
    (let ((location nil))
      (tagbody
         (catch 'a
           (catch 'b
             (catch 'c
               (catch 'd
                 (setf location (elt #(a b c d) i))
                 (throw location nil))
               (when location
                 (print "D!")
                 (go end)))
             (when location
               (print "C!")
               (go end)))
           (when location
             (print "B!")
             (go end)))
         (when location
           (print "A!")
           (go end))
       end)))
I'll leave the relevant macro to the reader; it shouldn't be that difficult... (I'm hoping I didn't just do someone's homework).
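Roughly, it might look like this (a sketch only; the name, the 0-based interface, and the one-form-per-clause restriction are my own choices):

  (defmacro catch-case (index &body clauses)
    ;; Throw out of nested CATCHes, as in JUMP2 above; clause K runs for
    ;; INDEX = K. The tags are gensyms collected into a literal vector.
    (let ((tags (loop repeat (length clauses) collect (gensym "TAG")))
          (end  (gensym "END")))
      (labels ((nest (tags* clauses*)
                 (if (null tags*)
                     `(throw (svref ,(coerce tags 'vector) ,index) nil)
                     `(progn
                        (catch ',(first tags*)
                          ,(nest (rest tags*) (rest clauses*)))
                        ,(first clauses*)
                        (go ,end)))))
        `(tagbody
            ,(nest tags clauses)
          ,end))))
For example, (catch-case i (print "A!") (print "B!") (print "C!") (print "D!")) behaves like JUMP2.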

edit: removed some cruft from experiments and fixed formatting


Well, here you can replace the tagbody with a block and replace the (go end) with an appropriate (return) instead, so you can get results out of your jump table.

But that’s beside the point, because in practice this is probably O(n), not O(1). I think most CL compilers are going to need to push things onto the control stack for each catch. Even if the compiler is clever enough to eliminate the catches, it might still implement the computation of which catch catches as O(n). So even in the best case of a very smart compiler, your code is basically equivalent to:

  (case i
    (0 (print "A!"))
    (1 ...))
Which isn’t guaranteed to be O(1).

The point the GP is making is that the only solution which looks like it might be O(1) (and indeed is O(1) if there is no (or just one) lexical scope to capture) is the array-of-lambdas-doing-go, but this isn’t guaranteed by the language to be O(1) when the lexical scope changes each time, because in a naive evaluation the array of lambdas must be consed before jumping, and that is O(n).

The reason that it is reasonable to complain about this is that what one needs for a jump table is weaker than this. The lambdas do capture the lexical scope, but they only have dynamic extent and don’t extend the scope at all, so a lambda of the form (lambda () (go foo)) has to be consed afresh each time you hit the jump table, because you can only reference foo once you are inside the tagbody. However, the code to actually jump is likely to be static for most compilers.
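To illustrate the dynamic-extent point (a sketch; DYNAMIC-EXTENT is only a permission, not a guarantee of stack allocation):

  (defun dispatch-dx (i)
    (tagbody
       ;; The GO-closures never escape the TAGBODY, so a compiler is
       ;; allowed (but not required) to stack-allocate them.
       (let ((table (vector (lambda () (go a))
                            (lambda () (go b)))))
         (declare (dynamic-extent table))
         (funcall (aref table i)))
     a (print "A!") (go end)
     b (print "B!")
     end))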

For guaranteed O(1) performance (which doesn’t rely on the cleverness of the compiler) you’d need to introduce a new special form, e.g. (jump n tag1 tag2 ... tagn) which evaluates n and jumps to the nth tag in its arguments in O(1) time.


This is an excellent breakdown and is exactly right. Paul Khuong, an SBCL developer, took exactly this approach you mentioned in a purely experimental implementation of a computed-GO; he essentially added a special form to the guts of the compiler. But that required exactly what you might think: hand-coded assembly and new nodes to SBCL’s IR.

I think this is a fine demonstration of not being able to “macro your way” out of a problem.

As for a sufficiently smart compiler, it probably can’t optimize CATCH/THROW, since those establish exit points of dynamic extent that are visible to all callees. The compiler will certainly not be able to prove that a function won’t possibly throw a tag from somewhere deep in the call chain.
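A trivial illustration (function names mine): in general the compiler cannot assume that MAY-THROW, which might be redefined or call arbitrary code, will never throw the tag, so the catch frame has to be established at runtime.

  (defun may-throw (x)
    ;; Anything CALLER calls might throw the tag at runtime.
    (when (oddp x) (throw 'deep-tag :odd)))

  (defun caller (x)
    (catch 'deep-tag
      (may-throw x)
      :even))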


Regarding catch/throw, I don’t believe any current CL implementation could optimise this but:

- if the prints were changed to go outside of the catches before executing (i.e., so the compiler could know that they wouldn’t throw), then I think a sufficiently smart compiler could prove that nothing else could throw here (it would be allowed to know that elt wouldn’t throw)

- If I recall correctly, there are Java compilers which can optimize throw-inside-catch code to not need the catch on the control stack (i.e., the throw is just compiled into a goto)

- some languages allow one to throw things that are unique to the particular function call, which gives a better chance of each throw matching only its intended catch.


Btw C++ has the same catch/throw issue. Herb Sutter has proposed a clever approach (essentially using tag bits in the return value) to mitigate the cost.


O(n) and O(1) are an inappropriate and incorrect way of describing what you guys are taking issue with. These functions all run in constant O(1) time. The argument being made by you all is extremely confusing because of the incorrect terminology. What you all are actually complaining about is the branching complexity and performance of the underlying assembly code.

This is further confused by the inclusion of 'lexical scope' as a priority. Lexical scope is a thing, and of course the jump table will have to do more work if handling lexical scope is involved. If you aren't managing the stack, you aren't dealing with lexical scope appropriately. You have incompatible requirements here.

If you were simulating the equivalent of a C jump table, you would simply have an array of functions, and you'd call the function that you want based on a reference.

This is very easy to do in common lisp, trivial in fact, so I'm a little confused about what the commotion is about, in that case.

  (defvar *jump-table* (vector (lambda () 'a) (lambda () 'b) (lambda () 'c) (lambda () 'd)))
   
  (defun jump-table-caller (i)
    (funcall (aref *jump-table* i)))
   

  (let ((jump-table (vector (lambda () 'a) (lambda () 'b) (lambda () 'c) (lambda () 'd))))
    (defun jump-table-caller2 (i)
      (funcall (aref jump-table i))))
Is the issue then, that it doesn't produce the correct machine language instructions? That seems to be an implementation detail of the compiler honestly more than anything else.


Neither of your examples allow the branch bodies to capture the lexical environment seen at the time of the jump. Nobody disputes that it’s possible to make a vector of function pointers.

Write your second example where the lambda bodies depend on the function parameter i, or indeed any other binding in the scope of the “jump”. This is a very normal and natural thing to do in other languages, such as C. Forget big-O, complexity, whatever. In a language such as C, you incur no runtime penalty for using local scope in the branches.
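Concretely, something like this (a sketch; the point is that the closure vector has to be rebuilt on every call just to see the current i):

  (defun jump-capturing (i)
    (funcall (aref (vector (lambda () (list 'a i))
                           (lambda () (list 'b i))
                           (lambda () (list 'c i))
                           (lambda () (list 'd i)))
                   i)))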

In Common Lisp, you incur a runtime penalty dependent on the size of the table. This penalty rears its head either in the number of comparisons that must be made to find the branch (dependent on the size of the table), or in terms of memory allocation (allocating closures for each branch). Either of those penalties can be expressed as a linear (or at least non-constant) function in the size of the table.

You are incorrect that lexical scope is at odds with a jump table. This can be formally proven, but it has also been empirically shown in actual (non-portable) implementations of a jump table.


Here you go.

  (defvar *my-variable* nil)
  
  (let ((jump-table (vector (lambda () (list 'a *my-variable*))
                            (lambda () (list 'b *my-variable*))
                            (lambda () (list 'c *my-variable*))
                            (lambda () (list 'd *my-variable*)))))
    (defun jump-table-caller3 (i)
      (let ((*my-variable* i))
        (funcall (aref jump-table i)))))
It's getting late for me so I'm headed to sleep, but this has been interesting, thanks!


This is dynamic scope, not lexical. They are not equivalent. You might further attempt to solve this by locally binding a lexical variable of the same name to the special variable, in order to “recreate” the lexical environment. (A quicker way to do this would be to pass these values to the branch functions directly.)

You will quickly find that this also will not work, for if a closure was made inside of each branch, the environment would not be shared among them.

The original intent of this exercise was to demonstrate you can’t macro your way out of all deficiencies in Lisp. And certainly in this case, even if we were to accept this dynamic variable rigamarole, we would find that we could not write a macro that, e.g., replaces CASE.


Do note that THROW/CATCH are dynamically established, and each CATCH setup + THROW invocation take a small amount of runtime. THROW usually takes linear time in the number of enclosing CATCH forms. So, despite how it looks, this solution is still linear in both space and time in the number of branches. (This is the same if you were to bind a special variable N times. The setup and tear down aren’t statically or lexically determined.)


The lambdas, a set of lexical closures, have to be built at runtime in order to capture the environment, and then stored in the array. That’s an O(N) operation.


No, it's O(1). Time complexity is about how the execution time grows with the size of the input, not the size of the program. In this case, each of those lambdas is a piece of the program text; the size of the jump table is therefore a constant for any particular program. The time complexity here is the time taken for one jump through the table as a function of the number of jumps, and clearly that's a constant also.

In fact, it would still be a constant if the 'case' were implemented as a chain of 'if's. Time complexity, since it concerns asymptotic growth of runtime as a function of input size, is not the correct notion to invoke here. You're just interested in performance in the ordinary sense: you want your program to run as fast as reasonably possible, and you know that a chain of 'if's leaves something to be desired in that regard.

Fair enough. But let's look at the array-of-closures implementation again. Assuming that the number of jumps made through one of these tables, once you've created it, is much larger than the number of times you have to create a table, it's a perfectly workable implementation.

I think your second point is more substantive. I too have come to the conclusion that coroutines were an unfortunate omission from the CL spec.


I stand by my claim. The size of the jump table of N entries takes O(N) time to construct at runtime. This construction cannot be done at compile time if the lexical environment must be accessible. A linear chain of N if-expressions constitutes an O(N) dispatch at runtime as well.

The point of a jump table is to compile N branches, laying them out in memory at addresses known at compile-time, and allowing a fixed jump to occur during program execution, equivalent to incrementing a program counter by some static amount. This cannot be done portably in ANSI Common Lisp.

The best you can do is construct a binary tree of IF branches to achieve O(log N) time.
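To make that concrete, here is a sketch of such an IF-tree dispatcher (macro name and interface are mine; each clause is a single form):

  (defmacro binary-dispatch (index-form &body clauses)
    ;; Emit a balanced tree of IFs over a 0-based index; clause K runs for
    ;; index K. The clause bodies are spliced in place, so they see the
    ;; surrounding lexical environment; dispatch costs O(log N) comparisons.
    (let ((idx (gensym "IDX")))
      (labels ((build (lo hi)
                 (cond ((> lo hi) nil)
                       ((= lo hi) (nth lo clauses))
                       (t (let ((mid (floor (+ lo hi) 2)))
                            `(if (<= ,idx ,mid)
                                 ,(build lo mid)
                                 ,(build (1+ mid) hi)))))))
        `(let ((,idx ,index-form))
           ,(build 0 (1- (length clauses)))))))
No closures are allocated and no CATCH frames are pushed, but you still pay a number of comparisons that grows with the number of branches.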

You cannot construct the vector of N function entries without (1) constructing the vector itself (which can be done at compile time), (2) constructing N closures representing the code you’d jump to (which must be done at runtime to capture the lexical environment), and (3) populating the vector (which must also be done at runtime, since the closures it holds were constructed at runtime).

Once the jump table is constructed, the actual vector dereference and call is O(1), but that’s beside the point if each invocation requires linear construction.

The number of possible inputs is proportional to the size of the table (i.e., the number of entries) it can discriminate between. If the input is an integer of unbounded length (say, K bits), the table may need up to N = 2^K entries, so the cost is O(2^K) = O(N), not O(1).

If you don’t believe me, please show me how you would implement this.


> Once the jump table is constructed, the actual vector dereference and call is O(1), but that’s beside the point if each invocation requires linear construction.

Not necessarily. What matters is the ratio of calls through the table to the number of times the table is constructed. If that ratio is bounded, for any size input, then you have a point. My response is simply that programs for which such bounds exist are relatively rare in practice; the common case is that such a table will be called through many more times than it is constructed, and furthermore that the ratio of calls to constructions will increase without bound as the size of the input increases. Then the fraction of time that the program spends constructing the tables asymptotically approaches zero.

Admittedly, this argument is of little comfort if you're actually writing a program that constructs many such tables, only to call through each a small (and bounded) number of times. But the force of that fact as a criticism of a language has to take into account the likelihood of needing to write such a program in the first place.


> I too have come to the conclusion that coroutines were an unfortunate omission from the CL spec.

There was no consensus on multiprocessing at the time of standardization and language support was very uncommon.

But still it’s weird: coroutines were over 20 years old by then, and newer ideas like the condition system and CLOS were included. And while nowadays the CL runtime is considered small, at the time it was criticized for being too large.


Coroutines are not quite threads, but then, they don't require a scheduler. They're also not quite first-class continuations, but then, they're simpler to implement, simpler to use, and don't have the same implications for the rest of the language and implementation design. Still, they can do some of the things you might otherwise use threads or first-class continuations for; and as reikonomusha points out, those things can be quite difficult to do otherwise in a clean way.

I'm sure some of the implementors (though not Symbolics, of course) would have been unenthusiastic about their inclusion, but still I wish it had been done.


Coroutines and first-class continuations make the implementation of non-local transfer of control much more complicated.

Their inclusion would have made staples like unwind-protect, the condition system, and the catch/throw underlying them much more difficult to implement, if not nearly impossible to get right.

I personally think that those language features are worth it and that this particular trade-off was well made.


I think it’s pretty obvious what the GP means, and quibbling about big-O notation changes the argument. The GP didn’t really mean “it’s impossible to build an n-branch case which captures the lexical scope and operates in constant time” because you can do that in any language. The GP means “it’s impossible to build an n-branch case which captures the lexical scope where the time taken to choose a branch doesn’t depend (modulo processor caches) on the number of branches.”

Therefore to reply as if the GP were making a statement about O-notation is to reply to a point which was not made.

What GP means is slightly fiddly to define formally as you need to formalise the jump operation that processors can do in constant time, but informally I think it’s completely obvious what this means. It’s also obvious you can do this in C (with the extension that lets you make jump tables) or assembly for basically any modern ISA, so I think it is reasonable to complain that you can’t do it in CL.


> The GP means “it’s impossible to build an n-branch case which captures the lexical scope where the time taken to choose a branch doesn’t depend (modulo processor caches) on the number of branches.”

But the array-of-closures implementation satisfies that requirement! reikonomusha's objection to that proposal is that even though the time for one branch is constant, there is a setup time dependent on the number of branches. This is true. The essence of my reply is that it is, however, unlikely to matter much in practice, because the whole thing still runs in amortized constant time [0] per jump.

If you're doing something that's really so performance-sensitive that amortized constant time per operation isn't good enough for you, then you're probably going to have to work in C or assembler, because any higher-level language — even C++ — will have constructs with linear setup times (consider 'make-array' in CL, or 'std::vector' in C++).

> It’s also obvious you can do this in C (with the extension that lets you make jump tables)

Complaining that CL is deficient because a nonstandard extension exists in some other language that allows you to do something, or that you can do it in assembler, seems like a rather weak criticism. Besides, this seems to me more like a quality-of-implementation issue: as reikonomusha mentioned, it's entirely possible for a CL compiler to compile 'case' into a jump table, given reasonable constraints on the case values. I don't know which ones do this, if any, but it seems like the kind of thing the SBCL maintainers could be persuaded to do if someone had a clear need for it.

[0] https://stackoverflow.com/questions/200384/constant-amortize...


I think that simulating coroutines by explicitly building the call stack they would have had isn't too ugly. This solution only conses to build those call stacks:

    (defun leaves (tree)
      "Return a function that returns the leaves of TREE one by one when called repeatedly."
      (let ((work (list tree)))
        (flet ((schedule (x) (when x (push x work))))
          (lambda ()
            (loop for task = (pop work)
                  until (atom task)
                  do (schedule (cdr task))
                     (schedule (car task))
                  finally (return task))))))

    (defun same-fringe-p (t1 t2)
      (loop with l1 = (leaves t1) and l2 = (leaves t2)
            for x1 = (funcall l1) and x2 = (funcall l2)
            while (or x1 x2) always (eql x1 x2)))
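For instance (my own test data; leaves are non-NIL atoms):

    (same-fringe-p '((1 2) (3 (4))) '(1 (2 3) 4))  ; => T
    (same-fringe-p '((1 2) 3) '(1 (3 2)))          ; => NIL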


> (You can’t, unless CASE is itself optimized to a jump table, which it usually isn’t.)

That's surprising. Even the brain-dead compiler in my TXR Lisp (a dialect that has some things in common with Common Lisp) does it.

  1> (disassemble 
        (compile-toplevel
          '(caseql x (1 'a) (2 'b) (3 'c) ((4 5 6) 'd) (7 'e) (8 'f) (9 'g))))
  ** warning: (expr-1:3) unbound variable x
  data:
      0: #:nohit
      1: a
      2: b
      3: c
      4: d
      5: e
      6: f
      7: g
  syms:
      0: x
      1: integerp
      2: <=
      3: sys:b-
  code:
      0: 04020002 frame 2 2
      1: A0800000 getlx v00000 0
      2: 30810400 movsr v00001 d000
      3: 24010003 gcall t003 1 v00000
      4: 08000001
      5: 48000026 if t003 38
      6: 00000003
      7: 38050004 movrsi t004 1
      8: 38250005 movrsi t005 9
      9: 24030003 gcall t003 2 t004 v00000 t005
     10: 00040002
     11: 00050800
     12: 48000026 if t003 38
     13: 00000003
     14: 38050005 movrsi t005 1
     15: 24020004 gcall t004 3 v00000 t005
     16: 08000003
     17: 00000005
     18: 54090004 swtch t004 24 26 28 30 30 30 32 34 36
     19: 001A0018
     20: 001E001C
     21: 001E001E
     22: 00220020
     23: 00000024
     24: 30810401 movsr v00001 d001
     25: 44000025 jmp 37
     26: 30810402 movsr v00001 d002
     27: 44000025 jmp 37
     28: 30810403 movsr v00001 d003
     29: 44000025 jmp 37
     30: 30810404 movsr v00001 d004
     31: 44000025 jmp 37
     32: 30810405 movsr v00001 d005
     33: 44000025 jmp 37
     34: 30810406 movsr v00001 d006
     35: 44000025 jmp 37
     36: 30810407 movsr v00001 d007
     37: 30030801 movsr t003 v00001
     38: 4C00002A ifq v00001 d000 42
     39: 08010400
     40: 30020000 movsr t002 nil
     41: 4400002B jmp 43
     42: 30020801 movsr t002 v00001
     43: 10000002 end t002
     44: 10000002 end t002
  instruction count:
     32
  #<sys:vm-desc: 8522ca8>

And, oh, the work is done by ... the case macro!

  2> (macroexpand '(caseql x (1 'a) (2 'b) (3 'c) ((4 5 6) 'd) (7 'e) (8 'f) (9 'g)))
  (let ((#:test-0034
         x)
        (#:swres-0036
         '#:nohit))
    (and (integerp #:test-0034)
      (<= 1 #:test-0034
          9)
      (set #:swres-0036
        (sys:switch (- #:test-0034
                       1)
          #(('a) ('b) ('c)
            ('d) ('d) ('d)
            ('e) ('f) ('g)))))
    (if (eq #:swres-0036
            '#:nohit)
      (progn) #:swres-0036))
The sys:switch special form is then spun into the jump table code.


I have a coroutine implementation that GB and I wrote for CCL. It runs on top of native threads and is rather kludgy. It's not nearly as elegant as an implementation using stack-groups. Now I prefer to solve problems like same-fringe with general composable streams (like SICP lazy sequences but using CLOS rather than closures).


You could do the jump tables if your CL gives you some access to the lexenv at compile time.

What you do is modify the individual cases so that instead of accessing the surrounding runtime lexenv directly, they go through reader and writer thunks. These thunks are then passed in as separate arguments to the lambda for each case. The table mapping case values to these lambdas is a compile-time constant.
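A hand-written sketch of the idea (a real macro would need implementation-specific lexenv access to do this rewrite automatically; the names and the two-thunk interface are mine). Only the two small accessor closures are made per call; the case table itself is built once:

  (defparameter *cases*
    (vector (lambda (get-x set-x) (funcall set-x (* 2 (funcall get-x))))
            (lambda (get-x set-x) (funcall set-x (1+ (funcall get-x))))))

  (defun thunk-dispatch (i x)
    (funcall (aref *cases* i)
             (lambda () x)             ; reader thunk for the lexical X
             (lambda (v) (setf x v)))  ; writer thunk for the lexical X
    x)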

You might complain that this takes time proportional to the number of accessed objects in the lexenv, to set up the thunks. And that's true, but it's constant in the number of cases.


Yes, the lack of something with power equivalent to call/cc is the feature I miss most in CL. I've considered implementing green threads for SBCL, as that's fairly straightforward to do at the runtime level.

There is a working delimited continuation library, but it's not exactly trivial.


Won't tagbody/go with an array of symbols work for jump tables (assuming GO is O(1))?

    (tagbody
      (go (aref #(a b c d) N))
      a
      ...
      b
      ...
      c
      ...
      d)


Alas, no; 'go' is a special form — its argument is unevaluated.


Indeed it appears so. Strangely enough, the hyperspec doesn't specify that it's not evaluated.


For special forms I think it’s implied that the arguments aren’t evaluated (as they are special) unless stated otherwise.


At least some special forms say explicitly whether their arguments are evaluated or not.



