Show HN: Python decorator that enables arbitrarily-deep tail/non-tail recursion (github.com/tylerhou)
119 points by tylerhou on Dec 20, 2021 | 21 comments



Hi HN, I wrote this.

Admittedly Fiber is not the best name — in the current implementation we don't yield to the scheduler except for recursive function calls, and our "scheduler" only runs a function to completion. These things could be fixed of course, but I didn't have time.

I called it Fiber because I thought coroutine + scheduler is approximately equal to a fiber implementation, and other schedulers could be added in the future with more clever behavior. Plus, you could change the yielding behavior to match what is necessary: for example, to yield at an arbitrary point one could modify the decorator to emit "pause" returns whenever a special `yield_()` function is called. Or you could create a list of special functions (e.g. functions associated with normal syscalls) that cause the coroutine to yield, and have a scheduler that will resume the function when the syscall completes.
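
To make "coroutine + scheduler" concrete, here is a minimal sketch of the idea; this is a hand-written illustration, not fiber's actual rewrite or API (the decorator does the generator transformation automatically via the AST rewrite):

    # Hand-written stand-in for a rewritten function: recursive calls
    # become yields that ask the scheduler to make the call for us.
    def rsum(n):
        # deep non-tail recursion: sum 1..n
        if n == 0:
            return 0
        rest = yield ("call", rsum, (n - 1,))  # was: rest = rsum(n - 1)
        return n + rest

    # Trivial "scheduler": runs one coroutine tree to completion using an
    # explicit heap-allocated stack instead of the C stack.
    def run(fn, args):
        stack, value = [fn(*args)], None
        while stack:
            try:
                _op, callee, callee_args = stack[-1].send(value)
                stack.append(callee(*callee_args))  # push the recursive call
                value = None
            except StopIteration as done:
                stack.pop()
                value = done.value  # return value flows back to the caller
        return value

    print(run(rsum, (50_000,)))  # 1250025000, far past CPython's ~1000-frame limit

A "pause" would just be another message type, handled by parking the coroutine somewhere instead of pushing a callee.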

One point behind this project is to demonstrate that if you want to pause and resume functions, you don't actually need runtime (kernel, VM) support; some hacky AST rewrites are also a feasible (if slow) implementation. Note that our overhead is mostly per pause/resume, so the "sum a list" benchmark is close to the worst case. If you have a recursive function that does a lot of non-recursive work that needs to be paused and resumed (hint hint, React rendering and reconciliation), the performance gap will be much narrower.


I don't generally write Python, so maybe that's it, but shouldn't the first example:

> print(trampoline.run(cache, [1010]) % 10 * 5)

reference `fib` somehow, rather than `cache`?


You're right, fixed.


Maybe compare it to tco, which is not as general.


neat idea, but i'm guessing that a rewritten function would be nightmarish to debug... or no?

wonder if it would be possible to do automated rewrites of recursive functions as iterative ones. specifically for cases where stack memory that grows in a recursive implementation can be replaced by a few long lived temporaries... like the good ol' fibonacci function.


Fib is a weird case because it's the poster child of dynamic programming. The challenge in this case is that each call spawns multiple recursive calls, and you're ultimately just adding 1's and 0's (so your runtime is comparable to the number you compute). Meanwhile, the DP / iterative approach understands the repeated structure of your recursion and can use that to eliminate redundant computation.

Compare this to tail-recursive functions, where you can basically update the stack variables and jump back to the beginning of the function.

(Note that you can approximately achieve the performance of the iterative solution by memoizing the results if your functions are pure and your input space is small, e.g. via @functools.cache, but you're paying for it with space, whereas the iterative solution understands when you can discard the intermediate results.)
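
Concretely, the trade-off (a sketch; the names are just illustrative):

    import functools

    @functools.cache  # O(n) time, but also O(n) space: every fib(k) stays cached
    def fib_memo(n):
        return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

    def fib_iter(n):  # same O(n) additions, O(1) space: two live temporaries
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    assert fib_memo(90) == fib_iter(90) == 2880067194370816120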


>Fib is a weird case because it's the poster child of dynamic programming.

It's actually got a closed-form solution, so finding the nth fib number is O(1) (assuming O(1) arithmetic). https://en.wikipedia.org/wiki/Fibonacci_number#Closed-form_e...
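
A sketch with 64-bit floats (only exact while fib(n) fits in the 53-bit mantissa, roughly n <= 70):

    import math

    def fib_binet(n):
        # Binet's formula; float rounding error makes this wrong for large n
        phi = (1 + math.sqrt(5)) / 2
        return round(phi ** n / math.sqrt(5))

    print(fib_binet(70))  # 190392490709135, still exact
    print(fib_binet(100) - 354224848179261915075)  # nonzero: precision has run out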


Technically not correct, because when you get to large numbers you have to take into account that multiplying those numbers takes O(n log n) time at best. Fib grows exponentially, so fib(n) takes O(n) bits to write. Python uses Karatsuba's algorithm, which multiplies two n-bit numbers in approximately n^1.59 time.

Second, to compute the result accurately, you would have to compute many, many digits of floating-point precision with the closed-form solution. That is more expensive than the alternatives.

The fastest freely available, practical implementation that doesn't have floating-point error is probably in libgmp, which uses fast powering (which can be derived from exponentiating the Fibonacci matrix by repeated squaring). https://gmplib.org/manual/Fibonacci-Numbers-Algorithm

This takes something like O(multiplication * log n) time. With Karatsuba's, approximately O(n^1.59 * log n).
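
In Python, the fast-doubling approach looks roughly like this (a sketch; GMP's actual formulas differ slightly):

    def fib_pair(n):
        # Fast doubling: O(log n) big-integer multiplications, exact.
        #   F(2k)   = F(k) * (2*F(k+1) - F(k))
        #   F(2k+1) = F(k)^2 + F(k+1)^2
        if n == 0:
            return (0, 1)  # (F(0), F(1))
        a, b = fib_pair(n >> 1)
        c = a * (2 * b - a)
        d = a * a + b * b
        return (d, c + d) if n & 1 else (c, d)

    print(fib_pair(100)[0])  # 354224848179261915075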

(Now I see you said assuming O(1) arithmetic after I wrote this post, but that’s not a reasonable assumption…)


> Technically not correct, because when you get to large numbers you have to take into account that multiplying those numbers takes O(n log n) time at best.

This seems disingenuous. You could make the same claim about any algorithm that involves multiplication.


Yes, you could. For a fixed-width register machine you can go even further and say that non-trivial constant-time algorithms don't exist at all (it takes at least logarithmic time just to read the entire input). This includes bread-and-butter classics like hashtables.

In practice we usually hand-wave those details away because they don't apply in the domain of interest; rarely is somebody meaningfully exceeding a 64-bit space for their hashtables. You can formalize that a bit with something like the "Word RAM" model of computation -- assume that you have fixed-width registers as wide as your inputs.

For the Fibonacci sequence those details are a little harder to hand-wave away because of how quickly the output grows. As soon as the output no longer fits in a 64-bit word (around n = 94) you'll start to see the effects of that linear factor. On most clusters you'd have to stream Fib(1<<64) just to see all the bits, and it'd take ages.

The reason that idea doesn't (usually) get applied to algorithms involving multiplication is that we're usually operating, implicitly, in a model that includes constant-time multiplication of "big enough" numbers. In cryptography and other fields operating on large integers, though, that analysis absolutely does come into play.

The overarching theme is that reality is hard to capture both succinctly and accurately, and you want as simple a model as possible that adequately describes the behavior you're curious about. Often that includes constant-time multiplication, but that'd seriously deceive you in an analysis of Fibonacci runtime.


No. Specifically, in this case we are multiplying exponentially large numbers, meaning they have linearly many bits with respect to the input, so you must account for the multiplication cost; it is not insignificant (e.g., O(n^1.59)).

For normal computations, you usually deal with numbers that have logarithmically many bits with respect to the input, so the arithmetic cost is negligible (something like O(log^1.59(n))). It's technically not constant, but it's so close to the time it takes just to read the input (log(n) or n) that we can treat it as one.


That's unconvincing. You could say the same thing about matrix multiplication with huge elements.


Yes, you could. But normally when people analyze the runtime of matrix multiplication, they assume that the elements have constant size (hence constant-time arithmetic) and that only the matrix size scales. You could also write a runtime that scales with the size of the elements; assuming integer elements of size at most M, your runtime would be something like O(n^3 * log^2(M)) in the naive case.

For Fibonacci it is not valid to assume that your arithmetic is constant time, because you are doing arithmetic on integers that grow exponentially. In general, it is not valid to assume that you can do constant-time arithmetic on all numbers, because then you could solve many, many problems much faster than is possible on a word-RAM machine by encoding your problem as a real number and multiplying/adding.


> In general, it is not valid to assume that you can do constant-time arithmetic on all numbers, because then you could solve many, many problems much faster than is possible on a word-RAM machine by encoding your problem as a real number and multiplying/adding.

i miss theory of computation. the reasoning was so simple and the proofs were so fun.


Yeah, debugging the rewritten function is not easy, but that could be improved by adding the normal tools for rewrites (source maps). Also, you can put print/logging statements inside the function; the rewrite preserves those.

In theory the rewrite shouldn't change behavior, so you could also unit-test both the decorated and undecorated functions.

> possible to do automated rewrites of recursive functions as iterative ones.

This is basically what we're doing, except the stack is inside the trampoline. If you inlined the trampoline into the function itself then that is exactly a recursion -> iteration transform.
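
For example, inlining the trampoline into naive fib by hand gives something like this (a sketch, not what the decorator actually emits):

    def fib_explicit_stack(n):
        # The call stack becomes a heap-allocated list of frames. This is
        # still the exponential-time algorithm, just without the C stack.
        stack, results = [("call", n)], []
        while stack:
            tag, v = stack.pop()
            if tag == "call":
                if v < 2:
                    results.append(v)
                else:
                    stack.append(("add", None))  # resume point: combine subresults
                    stack.append(("call", v - 1))
                    stack.append(("call", v - 2))
            else:  # "add"
                results.append(results.pop() + results.pop())
        return results.pop()

    print(fib_explicit_stack(25))  # 75025; depth now bounded by memory, not the C stack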


> This is basically what we're doing, except the stack is inside the trampoline. If you inlined the trampoline into the function itself then that is exactly a recursion -> iteration transform.

yeah but aren't you basically creating a stack on the heap, and then running the recursive algorithm there?

i mean more like rewriting the unbounded-space recursive version of the fibonacci function as the constant-space, two-temporary iterative version.


Ah, this is definitely possible as well; you could statically analyze the function and then topologically sort the recursive calls. But that falls into an optimization category which requires some sophisticated analysis.

That type of optimization is trivial for fib, but it's really difficult in general. Also, it only easily works when the changing recursive parameter is an int/index. You could also make it work with trees/graphs, but that requires some substructural analysis...


> wonder if it would be possible to do automated rewrites of recursive functions as iterative ones.

Of course.

> specifically for cases where stack memory that grows in a recursive implementation can be replaced by a few long lived temporaries... like the good ol' fibonacci function.

Rewriting a recursive Fibonacci function to use a loop with an explicit stack doesn't do anything about the growth of memory. You'll need just as many stack elements; it's just an explicit stack stored in a variable rather than the function call stack.


>Rewriting a recursive Fibonacci function to use a loop with an explicit stack doesn't do anything about the growth of memory.

Sort of. It means the amount of memory you can use is much bigger, and not constrained by the stack size.


> wonder if it would be possible to do automated rewrites of recursive functions as iterative ones. specifically for cases where stack memory that grows in a recursive implementation can be replaced by a few long lived temporaries.

Yes. Some Lisp implementations do this automatically. See https://en.wikipedia.org/wiki/Continuation-passing_style .

Note that this differs from OP's POC: there, only one function is defined, and it is transformed into a tree of if-statements that determine where in the code execution should resume. In CPS, it's an anonymous function in tail position being called with all the context still relevant to it; the tail call can then be optimized out.

Note that anonymous functions can be seen as just regular top-level functions with a hidden parameter containing the context relevant to them.
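
In Python syntax, a CPS-converted fib looks something like this (a sketch; CPython doesn't eliminate tail calls, so this only pays off in an implementation with TCO):

    def fib_cps(n, k=lambda x: x):
        # Every call is a tail call: "what to do after the recursion
        # returns" lives in the continuation k instead of on the stack.
        if n < 2:
            return k(n)
        return fib_cps(n - 1, lambda a: fib_cps(n - 2, lambda b: k(a + b)))

    print(fib_cps(20))  # 6765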


> i'm guessing that a rewritten function would be nightmarish to debug... or no?

I'm going to guess that the rewritten control flow is likely irreducible! So yes, a problem for both human and machine understanding!



