I believe that unless you put an `inline` declaration on the Lisp function, the compiler isn't allowed to optimize the recursion.
In general, Common Lisp doesn't favor recursive functions the way Scheme does, and as far as I know, most implementations don't put a huge amount of effort into optimizing them.
I'd be interested to see whether the same algorithm written with the loop (or iterate:iter) macro would still be as slow.
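For comparison, here's a sketch of the same accumulator algorithm written with the loop macro (nth-fibonacci-loop is a name I'm making up for illustration):

```lisp
;; Iterative Fibonacci using the LOOP macro: same O(n) accumulator
;; algorithm as the tail-recursive version discussed in this thread,
;; but with explicit mutation instead of recursion.
(defun nth-fibonacci-loop (n)
  (let ((a 0) (b 1))
    (loop repeat n
          ;; PSETF assigns in parallel, using the old values of A and B.
          do (psetf a b
                    b (+ a b)))
    a))
```

Since it's the identical algorithm, any speed difference against the tail-recursive version would come down to whether the implementation actually eliminates the tail call.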
If you were a non-LISP programmer and thought "Fibonacci using recursion", you might expect something like fib(a) = fib(a-1) + fib(a-2), scoff at its likely performance, and wonder how that could possibly be tail call optimized. But as a LISP programmer, you'll think of this the way a Java programmer thinks of a loop, and just treat it as counting over two values n times, adding along the way. And this is why I don't like LISP: I have programmed assembly and built up from there, and doing loops via tail call recursion seems like a lie. It is a faulty abstraction. It is a loop. You'd be better off writing it as a loop in any sane language, and if you do it right in LISP, the compiler will generate a loop in assembly anyway. Just the fact that this thread devolved into how to specify compiler options so that it actually generates that loop shows how absurd the whole thing is.
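To make the contrast concrete, this is the naive doubly recursive definition straight from the recurrence (naive-fib is a made-up name; it's not the version anyone in this thread is timing):

```lisp
;; Naive Fibonacci, directly from fib(n) = fib(n-1) + fib(n-2).
;; Neither recursive call is in tail position (their results still
;; have to be added after they return), so tail call optimization
;; cannot turn this into a loop; it runs in exponential time.
(defun naive-fib (n)
  (if (< n 2)
      n
      (+ (naive-fib (- n 1))
         (naive-fib (- n 2)))))
```

The accumulator version discussed elsewhere in the thread avoids this by making the single recursive call the last thing the function does.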
Do "for" and "while" constructs also seem silly? After all, they "compile to a loop" as well.
The purpose of all these loop constructs is to place constraints on some essential aspect of the loop up front, thus narrowing the possible range of effects of that wicked goto, by encoding common patterns. "For" loops guarantee the number of iterations. "While" loops guarantee the exit condition. And tail recursion guarantees what state gets modified in the loop.
It is strange the way you say "better off doing it in a loop". As you say, it is a loop. What construct would you prefer? GOTO?
I think the appeal of expressing loops with recursion is that it lets you avoid mutation. But I agree with the point that Rust folks frequently make, that being able to mutate things safely is a lot nicer than avoiding all mutation.
WEB> (time (nth-fibonacci 9999))
Evaluation took:
  0.005 seconds of real time
  0.004520 seconds of total run time (0.000168 user, 0.004352 system)
  100.00% CPU
  13,120,881 processor cycles
  4,716,256 bytes consed
Yep, that's how you'd do it. So long as your CL implementation supports tail call optimization, that will be on par with the same algorithm using loop or another looping construct.
If I recall correctly, you have to set the optimization level before compiling the function, using `(declaim (optimize xxx))` - where xxx is something I've forgotten. Perhaps someone can come along and point out what xxx should be?
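From memory, so treat the exact qualities as a guess: the debug level is the part that matters, and a declaration along these lines before (re)compiling the function should do it.

```lisp
;; Ask for speed over debuggability before compiling the function.
;; SBCL's manual says tail-call merging is disabled when the DEBUG
;; quality is 3, so keeping DEBUG at 2 or below is the important part;
;; the specific numbers here are just one plausible combination.
(declaim (optimize (speed 3) (debug 1) (safety 1)))
```

A `(declare (optimize ...))` form inside the individual defun should also work if you don't want to change the global policy.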
Looks like it's doing tail call optimization to me; this is without doing anything special with declaim. Note that the return to the top of the function is a JMP, not a CALL.
Ah, so as long as the debug optimization quality is 2 or less, you should get tail call optimization. I found when I tried it (several years ago) that the default was to optimize for debugging.
I wonder if different installs (or perhaps it's Slime) set this value to different levels.
WEB> (defun nth-fibonacci (n &optional (a 0) (b 1))
       (if (= n 0)
           a
           (nth-fibonacci (- n 1) b (+ a b))))
WARNING: redefining LOBE/SRC/WEB::NTH-FIBONACCI in DEFUN
NTH-FIBONACCI
WEB> (time (nth-fibonacci 9999))
Evaluation took:
  0.002 seconds of real time
  0.001887 seconds of total run time (0.001887 user, 0.000000 system)
  100.00% CPU
  5,476,446 processor cycles
  4,715,760 bytes consed
I haven't looked into it and don't really have time to, so I don't know, but as a general rule you can make anything in Lisp very, very fast (at least with compiled SBCL).
If I absolutely had to write a recursive solution, I'd use something like this (it should have all recursive calls in tail position, and may require trading debuggability for speed):
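Presumably something along these lines, the same accumulator shape as the version timed earlier in the thread:

```lisp
;; Accumulator-passing Fibonacci: A and B carry the running pair of
;; values, so the single recursive call is the last thing the function
;; does. A compiler that performs tail call optimization compiles it
;; as a jump rather than growing the stack.
(defun nth-fibonacci (n &optional (a 0) (b 1))
  (if (= n 0)
      a
      (nth-fibonacci (- n 1) b (+ a b))))
```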