
> The first example is easy: Implement an O(1) jump table in Common Lisp that respects the lexical environment. (You can’t, unless CASE is itself optimized to a jump table, which it usually isn’t.)

Perhaps I don't understand your objection: you can certainly store any object in an array and funcall it, which would be O(1); the object could be a lambda that captured the lexical environment. I was doing that 35 years ago on the Lisp Machine.

Interestingly, I did this also on the D-Machine Interlisp, and each closure forked the stack, so it ground the machine to a halt; a design bug that was later fixed. Both examples I'm talking about predate Common Lisp standardization.




For the true jump table experience, TAGBODY/GO would also work just fine for this. You can put the ‘go’s in lambdas in an array and funcall them. This sounds like fun.
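A minimal sketch of that idea (names are made up; note that the vector of closures gets consed inside the TAGBODY on every call, which is what the reply below objects to):

  ;; Hypothetical sketch: the lambdas close over the TAGBODY's tags, so
  ;; the vector has to be rebuilt on each call.
  (defun dispatch (i)
    (tagbody
       (funcall (aref (vector (lambda () (go a))
                              (lambda () (go b))
                              (lambda () (go c)))
                      i))
     a (print "A!") (go end)
     b (print "B!") (go end)
     c (print "C!")
     end))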


Not so. The table of lambdas has to be built at runtime, making it O(N). You can’t use LOAD-TIME-VALUE because that doesn’t respect the lexical environment, which TAGBODY tags live in.


Well, you have to put the GO form in the lexical scope of the TAGBODY in order for it to work... that seems reasonable enough to me. You specifically stated:

> Implement an O(1) jump table in Common Lisp that respects the lexical environment.

You don't know the lexical environment until there's a lexical environment to know; you can't have your cake and also eat it (or not have your cake and also know it).

Interestingly, CATCH/THROW also solves this problem fairly elegantly without distastefully consing up a list or array at run-time.

  (defun jump2 (i)
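    ;; THROW unwinds to the matching CATCH; the WHEN just after that CATCH
    ;; prints the chosen branch, and GO then jumps (unwinding any remaining
    ;; CATCHes) to END.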
    (let ((location nil))
      (tagbody
         (catch 'a
           (catch 'b
             (catch 'c
               (catch 'd
                 (setf location (elt #(a b c d) i))
                 (throw location nil))
               (when location
                 (print "D!")
                 (go end)))
             (when location
               (print "C!")
               (go end)))
           (when location
             (print "B!")
             (go end)))
         (when location
           (print "A!")
           (go end))
       end)))
I'll leave the relevant macro to the reader; it shouldn't be that difficult... (I'm hoping I didn't just do someone's homework).

edit: removed some cruft from experiments and fixed formatting


Well, here you can replace the tagbody with a block and replace the (go end) with an appropriate (return) instead, so you can get results out of your jump table.

But that’s beside the point, because in practice this is probably O(n), not O(1). I think most CL compilers are going to need to push things onto the control stack for each catch. Even if the compiler is clever enough to eliminate the catches, it might still implement the computation of which catch catches as O(n). So even in the best case of a very smart compiler, your code is basically equivalent to:

  (case i
    (0 (print "A!"))
    (1 ...))
Which isn’t guaranteed to be O(1).

The point the GP is making is that the only solution which looks like it might be O(1) (and indeed is O(1) if there is no (or just one) lexical scope to capture) is the array-of-lambdas-doing-go, but that this isn’t guaranteed by the language to be O(1) when the lexical scope changes each time, because in a naive evaluation the array of lambdas must be consed before jumping, and this is O(n).

The reason that it is reasonable to complain about this is that what one needs for a jump table is weaker than this. The lambdas do capture the lexical scope, but they only have dynamic extent and don’t extend the scope at all; a lambda of the form (lambda () (go foo)) only has to be consed each time you hit the jump table because you can only reference foo once you are inside the tagbody. However, the code to actually jump is likely to be static for most compilers.

For guaranteed O(1) performance (which doesn’t rely on the cleverness of the compiler), you’d need to introduce a new special form, e.g. (jump n tag1 tag2 ... tagn), which evaluates n and jumps to the nth tag in its arguments in O(1) time.


This is an excellent breakdown and is exactly right. Paul Khuong, an SBCL developer, took exactly the approach you mention in a purely experimental implementation of a computed GO; he essentially added a special form to the guts of the compiler. But that required exactly what you might think: hand-coded assembly and new nodes in SBCL’s IR.

I think this is a fine demonstration of not being able to “macro your way” out of a problem.

As for a sufficiently smart compiler, it probably can’t optimize CATCH/THROW, since catch tags have dynamic extent and are visible to all callees. The compiler will certainly not be able to prove that a function won’t possibly throw a tag from somewhere deep down.


Regarding catch/throw, I don’t believe any current CL implementation could optimise this but:

- if the prints were moved outside of the catches before executing (i.e. so the compiler could know that they wouldn’t throw), then I think a sufficiently smart compiler could prove that nothing else could throw here (it would be allowed to know that elt wouldn’t throw)

- if I recall correctly, there are Java compilers which can optimize throw-inside-catch code to not need the catch on the control stack (i.e. the throw is just compiled into a goto)

- some languages allow one to throw objects that are unique to the particular function call, which gives the compiler a better chance of matching a throw to its catch.


Btw C++ has the same catch/throw issue. Herb Sutter has proposed a clever approach (essentially using tag bits in the return value) to mitigate the cost.


O(n) and O(1) are an inappropriate and incorrect way of describing what you guys are taking issue with. These functions all run in constant O(1) time. The argument being made by you all is extremely confusing because of the incorrect terminology. What you all are actually complaining about is the branching complexity and performance of the underlying assembly code.

This is further confused by the inclusion of 'lexical scope' as a priority. Lexical scope is a thing, and of course the jump table will have to do more work if handling lexical scope is involved. If you aren't managing the stack, you aren't dealing with lexical scope appropriately. You have incompatible requirements here.

If you were simulating the equivalent of a C jump table, you would simply have an array of functions, and you'd call the function that you want based on a reference.

This is very easy to do in common lisp, trivial in fact, so I'm a little confused about what the commotion is about, in that case.

  (defvar *jump-table* (vector (lambda () 'a) (lambda () 'b) (lambda () 'c) (lambda () 'd)))
   
  (defun jump-table-caller (i)
    (funcall (aref *jump-table* i)))
   

  (let ((jump-table (vector (lambda () 'a) (lambda () 'b) (lambda () 'c) (lambda () 'd))))
    (defun jump-table-caller2 (i)
      (funcall (aref jump-table i))))
Is the issue, then, that it doesn't produce the correct machine-language instructions? That honestly seems to be an implementation detail of the compiler more than anything else.


Neither of your examples allow the branch bodies to capture the lexical environment seen at the time of the jump. Nobody disputes that it’s possible to make a vector of function pointers.

Write your second example where the lambda bodies depend on the function parameter i, or indeed any other binding in the scope of the “jump”. This is a very normal and natural thing to do in other languages, such as C. Forget big-O, complexity, whatever. In a language such as C, you incur no runtime penalty for using local scope in the branches.

In Common Lisp, you incur a runtime penalty dependent on the size of the table. This penalty rears its head either in the number of comparisons that must be made to find the branch (dependent on the size of the table), or in terms of memory allocation (allocating closures for each branch). Either of those penalties can be expressed as a linear (or at least non-constant) function in the size of the table.
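For instance (a sketch, with a made-up function name), what’s being asked for is something like this, where the closures over the parameter i have to be allocated on every call:

  ;; Hypothetical sketch: the branch bodies capture the local binding I,
  ;; so the vector of closures must be consed on each invocation.
  (defun jump-table-caller-capturing (i)
    (funcall (aref (vector (lambda () (list 'a i))
                           (lambda () (list 'b i))
                           (lambda () (list 'c i))
                           (lambda () (list 'd i)))
                   i)))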

You are incorrect that lexical scope is at odds with a jump table. This can be formally proven, but it has also been empirically shown in actual (non-portable) implementations of a jump table.


Here you go.

  (defvar *my-variable* nil)
  
  (let ((jump-table (vector (lambda () (list 'a *my-variable*))
                            (lambda () (list 'b *my-variable*))
                            (lambda () (list 'c *my-variable*))
                            (lambda () (list 'd *my-variable*)))))
    (defun jump-table-caller3 (i)
      (let ((*my-variable* i))
        (funcall (aref jump-table i)))))
It's getting late for me so I'm headed to sleep, but this has been interesting, thanks!


This is dynamic scope, not lexical. They are not equivalent. You might further attempt to solve this by locally binding a lexical variable of the same name to the special variable, in order to “recreate” the lexical environment. (A quicker way to do this would be to pass these values to the branch functions directly.)

You will quickly find that this also will not work, for if a closure was made inside of each branch, the environment would not be shared among them.

The original intent of this exercise was to demonstrate you can’t macro your way out of all deficiencies in Lisp. And certainly in this case, even if we were to accept this dynamic variable rigamarole, we would find that we could not write a macro that, e.g., replaces CASE.


Do note that THROW/CATCH are dynamically established, and each CATCH setup + THROW invocation take a small amount of runtime. THROW usually takes linear time in the number of enclosing CATCH forms. So, despite how it looks, this solution is still linear in both space and time in the number of branches. (This is the same if you were to bind a special variable N times. The setup and tear down aren’t statically or lexically determined.)


The lambdas, a set of lexical closures, have to be built at runtime in order to capture the environment, and then stored in the array. That’s an O(N) operation.


No, it's O(1). Time complexity is about how the execution time grows with the size of the input, not the size of the program. In this case, each of those lambdas is a piece of the program text; the size of the jump table is therefore a constant for any particular program. The time complexity here is the time taken for one jump through the table as a function of the number of jumps, and clearly that's a constant also.

In fact, it would still be a constant if the 'case' were implemented as a chain of 'if's. Time complexity, since it concerns asymptotic growth of runtime as a function of input size, is not the correct notion to invoke here. You're just interested in performance in the ordinary sense: you want your program to run as fast as reasonably possible, and you know that a chain of 'if's leaves something to be desired in that regard.

Fair enough. But let's look at the array-of-closures implementation again. Assuming that the number of jumps made through one of these tables, once you've created it, is much larger than the number of times you have to create a table, it's a perfectly workable implementation.

I think your second point is more substantive. I too have come to the conclusion that coroutines were an unfortunate omission from the CL spec.


I stand by my claim. A jump table of N entries takes O(N) time to construct at runtime. This construction cannot be done at compile time if the lexical environment must be accessible. A linear chain of N if-expressions constitutes an O(N) dispatch at runtime as well.

The point of a jump table is to compile N branches, laying them out in memory at addresses known at compile-time, and allowing a fixed jump to occur during program execution, equivalent to incrementing a program counter by some static amount. This cannot be done portably in ANSI Common Lisp.

The best you can do is construct a binary tree of IF branches to achieve O(log N) time.
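For concreteness, something along these lines (a sketch; the macro name is made up, and it assumes the index form is a simple variable, since the expansion evaluates it more than once):

  ;; Hypothetical sketch of the O(log N) fallback: expand the branches
  ;; into a balanced tree of IFs on an integer index.
  (defmacro binary-dispatch (index &rest bodies)
    (labels ((build (lo hi)
               (if (= lo hi)
                   (elt bodies lo)
                   (let ((mid (floor (+ lo hi) 2)))
                     `(if (<= ,index ,mid)
                          ,(build lo mid)
                          ,(build (1+ mid) hi))))))
      (build 0 (1- (length bodies)))))

  ;; (binary-dispatch i (print "A!") (print "B!") (print "C!") (print "D!"))
  ;; expands into two levels of IF rather than a linear chain of tests.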

You cannot build the table of N function entries without first constructing the vector (which can be done at compile time), second constructing the N closures representing the code you’d jump to (which must be done at runtime, to capture the lexical environment), and third populating the vector (which must be done at runtime, since the data being populated was constructed at runtime).

Once the jump table is constructed, the actual vector dereference and call is O(1), but that’s beside the point if each invocation requires linear construction.

The number of possible inputs is proportional to the size of the table (i.e., the number of entries) among which it can discriminate. It may be that the input is an integer of unbounded length (say, K bits), which means that the table size, and hence the construction time, is O(2^K) = O(N), not O(1).

If you don’t believe me, please show me how you would implement this.


> Once the jump table is constructed, the actual vector dereference and call is O(1), but that’s beside the point if each invocation requires linear construction.

Not necessarily. What matters is the ratio of calls through the table to the number of times the table is constructed. If that ratio is bounded, for any size input, then you have a point. My response is simply that programs for which such bounds exist are relatively rare in practice; the common case is that such a table will be called through many more times than it is constructed, and furthermore that the ratio of calls to constructions will increase without bound as the size of the input increases. Then the fraction of time that the program spends constructing the tables asymptotically approaches zero.
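(To put rough numbers on it: constructing a table of N branches once and then jumping through it M times costs O(N + M) in total, i.e. O(1 + N/M) per jump, which approaches a constant as M grows relative to N.)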

Admittedly, this argument is of little comfort if you're actually writing a program that constructs many such tables, only to call through each a small (and bounded) number of times. But the force of that fact as a criticism of a language has to take into account the likelihood of needing to write such a program in the first place.


> I too have come to the conclusion that coroutines were an unfortunate omission from the CL spec.

There was no consensus on multiprocessing at the time of standardization and language support was very uncommon.

But still it’s weird: coroutines were over 20 years old by then, and newer ideas like the condition system and CLOS were included. And while nowadays the CL runtime is considered small, at the time it was criticized for being too large.


Coroutines are not quite threads, but then, they don't require a scheduler. They're also not quite firstclass continuations, but then, they're simpler to implement, simpler to use, and don't have the same implications for the rest of the language and implementation design. Still, they can do some of the things you might otherwise use threads or firstclass continuations for; and as reikonomusha points out, those things can be quite difficult to do otherwise in a clean way.

I'm sure some of the implementors (though not Symbolics, of course) would have been unenthusiastic about their inclusion, but still I wish it had been done.


Coroutines and first-class continuations make the implementation of non-local transfer of control much more complicated.

Their inclusion would have made staples like unwind-protect, conditions, and, underlying them, catch/throw, much more difficult to implement, if not nearly impossible to get right.

I personally think that those language features are worth it and that this particular trade-off was well made.


I think it’s pretty obvious what the GP means, and quibbling about big-O notation changes the argument. The GP didn’t really mean “it’s impossible to build an n-branch case which captures the lexical scope and operates in constant time”, because you can do that in any language. The GP means “it’s impossible to build an n-branch case which captures the lexical scope where the time taken to choose a branch doesn’t depend (modulo processor caches) on the number of branches.”

Therefore to reply as if the GP were making a statement about O-notation is to reply to a point which was not made.

What the GP means is slightly fiddly to define formally, as you need to formalise the jump operation that processors can do in constant time, but informally I think it’s completely obvious what this means. It’s also obvious you can do this in C (with the extension that lets you make jump tables) or assembly for basically any modern ISA, so I think it is reasonable to complain that you can’t do it in CL.


> The GP means “it’s impossible to build an n-branch case which captures the lexical scope where the time taken to choose a branch doesn’t depend (modulo processor caches) on the number of branches.”

But the array-of-closures implementation satisfies that requirement! reikonomusha's objection to that proposal is that even though the time for one branch is constant, there is a setup time dependent on the number of branches. This is true. The essence of my reply is that it is, however, unlikely to matter much in practice, because the whole thing still runs in amortized constant time [0] per jump.

If you're doing something that's really so performance-sensitive that amortized constant time per operation isn't good enough for you, then you're probably going to have to work in C or assembler, because any higher-level language — even C++ — will have constructs with linear setup times (consider 'make-array' in CL, or 'std::vector' in C++).

> It’s also obvious you can do this in C (with the extension that lets you make jump tables)

Complaining that CL is deficient because a nonstandard extension exists in some other language that allows you to do something, or that you can do it in assembler, seems like a rather weak criticism. Besides, this seems to me more like a quality-of-implementation issue: as reikonomusha mentioned, it's entirely possible for a CL compiler to compile 'case' into a jump table, given reasonable constraints on the case values. I don't know which ones do this, if any, but it seems like the kind of thing the SBCL maintainers could be persuaded to do if someone had a clear need for it.

[0] https://stackoverflow.com/questions/200384/constant-amortize...



