It's not really tail recursion, it's just similar in that you get the recursion into a particular form in order to enable an optimization that avoids the need for intermediation.
For tail recursion, that form is the last thing executed in a recursive method should be "return recursive_call(...)". It allows you to tell the sub-call to give its result directly to the current method's caller, avoiding the need for the current method to act as an intermediary bouncing the result from its callee to its caller.
In this paper the special form is "output += recursive_call(...)", where output is a mutable reference to the location where the result should be stored. This also allows the current method to avoid acting as an intermediary between its callee and its caller, since the callee will take a mutable reference to (a subsection of) the output and directly mutate it.
If I'm understanding you correctly (and I haven't RTFA yet, sorry) this is a bit like Operational Transform, allowing you to run some or all steps in parallel and achieve eventual-consistency. As opposed to TCO, which is simply restating recursion as iteration, but still requires the operations to run in sequence.
The thing I'm describing doesn't allow the steps to run in parallel. In fact it prevents it, because each of the three recursive calls require exclusive access to overlapping regions of the output.
For tail recursion, that form is the last thing executed in a recursive method should be "return recursive_call(...)". It allows you to tell the sub-call to give its result directly to the current method's caller, avoiding the need for the current method to act as an intermediary bouncing the result from its callee to its caller.
In this paper the special form is "output += recursive_call(...)", where output is a mutable reference to the location where the result should be stored. This also allows the current method to avoid acting as an intermediary between its callee and its caller, since the callee will take a mutable reference to (a subsection of) the output and directly mutate it.