the call is still in tail position whether or not it reuses the stack frame. there are also more involved ways to do tail call optimization than a direct single-jump compilation when you leave ret behind entirely, such as in forth-style threaded interpreters
You don’t need recursion to make use of tail call elimination. In Scheme and SML all tail calls are eliminated. GCC also does it, but less often. Still, it’s not recursion that triggers it.
i only meant that "optimized/eliminated tail call" is more useful terminology than saying an uneliminated tail call doesn't count as "a tail call". i find this distinction useful when discussing clojure, for instance, where self-recursive tail calls need an explicit recur and mutually recursive ones have to be explicitly trampolined, so there is a real difference between an eliminated tail call and a call in tail position which is merely eligible for TCO
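to make the distinction concrete, here's a rough sketch in python, which (like the jvm) leaves calls in tail position uneliminated. the names trampoline, is_even and is_odd are made up for illustration and this is not clojure's actual implementation, just the same general idea: the function hands back a thunk instead of making the tail call itself, and a driver loop does the "elimination" by hand

```python
def trampoline(f, *args):
    """Call f, then keep calling whatever it returns until the result is not callable."""
    result = f(*args)
    while callable(result):
        result = result()
    return result


# Naive versions: each call is in tail position, but nothing eliminates it,
# so every call still takes a fresh stack frame.
def is_even_naive(n):
    return True if n == 0 else is_odd_naive(n - 1)


def is_odd_naive(n):
    return False if n == 0 else is_even_naive(n - 1)


# Trampolined versions: same tail position, but the call is returned as a
# thunk, so the Python stack never gets deeper than the trampoline loop.
def is_even(n):
    return True if n == 0 else (lambda: is_odd(n - 1))


def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))
```

with cpython's default recursion limit, is_even_naive(100000) should raise RecursionError, while trampoline(is_even, 100000) runs in constant stack depth. same calls in the same tail positions, only one of them is eliminated.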
i'm not sure how commonly tail calls are eliminated in other forthlikes at the ~runtime level since you can just do it at call time when you really need it by dropping from the return stack, but i find it nice to be able to not just pop the stack doing things naively. basically since exit is itself a threaded word you can simply¹ check if the current instruction precedes a call to exit and drop a return address
in case it's helpful this is the relevant bit from mine (which started off as a toy 64-bit port of jonesforth):
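# fetch the next word's address into %rax (advancing %rsi) and jump through its codeword
.macro STEP
    lodsq
    jmp *(%rax)
.endm

INTERPRET:
    mov (%rsi), %rcx     # word that follows this call in the caller
    mov $EXIT, %rdx
    lea 8(%rbp), %rbx    # return stack pointer with the top entry dropped
    cmp %rcx, %rdx       # tail call?
    cmovz (%rbp), %rsi   # if so, we
    cmovz %rbx, %rbp     # can reuse
    RPUSH %rsi           # ret stack
    add $8, %rax         # skip the codeword
    mov %rax, %rsi       # %rsi now points at the first word of the body
    STEP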
¹ provided you're willing to point the footguns over at the return stack manipulation side of things instead
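if the asm is hard to follow, here's a toy python model of the same check. everything in it (VM, WORDS, COUNTDOWN, ?EXIT0) is hypothetical and made up for the sketch, and it simplifies one thing: the asm drops the caller's return address and pushes it straight back, while this just skips the push on a tail call, which should leave the return stack in the same state

```python
EXIT = "EXIT"


class VM:
    def __init__(self, words, tco):
        self.words = words    # name -> primitive (python function) or thread (list of names)
        self.tco = tco        # enable the "is the next word EXIT?" check
        self.data = []        # data stack
        self.rstack = []      # return stack of (thread, ip) pairs
        self.max_depth = 0    # deepest the return stack ever got
        self.thread, self.ip = None, 0
        self.running = False

    def do_exit(self):
        # Return to the caller, or stop when the return stack is empty.
        if self.rstack:
            self.thread, self.ip = self.rstack.pop()
        else:
            self.running = False

    def run(self, name):
        self.thread, self.ip = [name, EXIT], 0   # tiny bootstrap thread
        self.running = True
        while self.running:
            tok = self.thread[self.ip]
            self.ip += 1
            if tok == EXIT:
                self.do_exit()
                continue
            target = self.words[tok]
            if callable(target):
                target(self)                     # primitive word
                continue
            # Entering a threaded word: if the caller's next word is EXIT,
            # this is a tail call and the caller's return address can be reused.
            is_tail = self.thread[self.ip] == EXIT
            if not (self.tco and is_tail):
                self.rstack.append((self.thread, self.ip))
                self.max_depth = max(self.max_depth, len(self.rstack))
            self.thread, self.ip = target, 0


def exit_if_zero(vm):
    if vm.data[-1] == 0:
        vm.do_exit()


def decrement(vm):
    vm.data[-1] -= 1


WORDS = {
    "?EXIT0": exit_if_zero,
    "1-": decrement,
    # counts the top of the data stack down to zero via a call in tail position
    "COUNTDOWN": ["?EXIT0", "1-", "COUNTDOWN", EXIT],
}


def return_stack_depth(n, tco):
    vm = VM(WORDS, tco)
    vm.data.append(n)
    vm.run("COUNTDOWN")
    return vm.max_depth
```

in this model return_stack_depth(1000, tco=True) stays at 0 while tco=False grows with n. the check never looks at which word is being called, only at whether EXIT comes next, so it works for any call in tail position, not just recursive ones.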