I'm reminded of the Philosophy of Computer Science entry in the Stanford Encyclopedia of Philosophy [0], which briefly considers what it means for two programs to be identical.
"... it has been argued that there are cases in which it is not possible to determine whether two programs are the same without making reference to an external semantics. Sprevak (2010) proposes to consider two programs for addition which differ from the fact that one operates on Arabic, the other one on Roman numerals. The two programs compute the same function, namely addition, but this cannot always be established by inspecting the code with its subroutines; it must be determined by assigning content to the input/output strings"
"The problem can be tackled by fixing an identity criterion, namely a formal relation, that any two programs should entertain in order to be defined as identical. Angius and Primiero (2018) show how to use the process algebra relation of bisimulation between the two automata implemented by two programs under examination as such an identity criterion. Bisimulation allows to establish matching structural properties of programs implementing the same function, as well as providing weaker criteria for copies in terms of simulation."
(Of course, it isn't surprising that this would be relevant, because proofs and programs themselves are isomorphic).
This criterion seems rather stricter than what Gowers has in mind, but it's a helpful baseline.

0. https://plato.stanford.edu/entries/computer-science/
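To make Sprevak's example concrete, here is a toy sketch of my own (not from the SEP entry or from Sprevak's paper): two C functions that both compute addition, one on machine integers, the other on Roman-numeral strings. Inspecting either body in isolation doesn't establish that they compute the same function; that judgment requires interpreting the input/output strings as numerals.

    #include <stdio.h>
    #include <string.h>

    /* Addition on Arabic numerals (machine integers). */
    int add_arabic(int a, int b) { return a + b; }

    /* Value of a single Roman digit. */
    static int roman_digit(char c) {
        switch (c) {
            case 'I': return 1;    case 'V': return 5;
            case 'X': return 10;   case 'L': return 50;
            case 'C': return 100;  case 'D': return 500;
            case 'M': return 1000; default:  return 0;
        }
    }

    /* Parse a Roman numeral (subtractive notation) into an int. */
    static int from_roman(const char *s) {
        int total = 0;
        for (size_t i = 0; s[i]; i++) {
            int v = roman_digit(s[i]);
            int next = s[i + 1] ? roman_digit(s[i + 1]) : 0;
            total += (v < next) ? -v : v;  /* IV = 4, IX = 9, ... */
        }
        return total;
    }

    /* Format an int (1..3999) as a Roman numeral into buf. */
    static void to_roman(int n, char *buf) {
        static const int vals[] = {1000,900,500,400,100,90,50,40,10,9,5,4,1};
        static const char *syms[] = {"M","CM","D","CD","C","XC","L","XL","X","IX","V","IV","I"};
        buf[0] = '\0';
        for (int i = 0; i < 13; i++)
            while (n >= vals[i]) { strcat(buf, syms[i]); n -= vals[i]; }
    }

    /* Addition on Roman numerals: same function, different representation. */
    void add_roman(const char *a, const char *b, char *out) {
        to_roman(from_roman(a) + from_roman(b), out);
    }

    int main(void) {
        char buf[64];
        add_roman("XIV", "VII", buf);
        printf("%d  %s\n", add_arabic(14, 7), buf);  /* prints: 21  XXI */
        return 0;
    }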
I think it's also important to distinguish between a pair of programs that compute the same function using identical amounts of space and time, and a pair that compute the same function using different amounts of space or time (or both). Two programs might compute the same function, and be considered formally identical in that sense, yet sit in radically different complexity classes [O(1) vs O(n) vs O(n^2) vs O(2^n)].
Formally we may not be interested in this distinction but practically we definitely are. One program may be extremely practical and useful whereas the other might not finish computing anything before the heat death of the universe on anything but trivial-sized inputs.
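As a throwaway illustration of that distinction (my example, not anything from the thread), here are two C functions that compute the same function, Fibonacci, in very different complexity classes:

    #include <stdio.h>

    /* Naive recursion: exponential time, O(n) stack. */
    unsigned long fib_slow(unsigned n) {
        return n < 2 ? n : fib_slow(n - 1) + fib_slow(n - 2);
    }

    /* Iteration: O(n) time, O(1) space. Same function, very different program. */
    unsigned long fib_fast(unsigned n) {
        unsigned long a = 0, b = 1;
        for (unsigned i = 0; i < n; i++) {
            unsigned long t = a + b;
            a = b;
            b = t;
        }
        return a;
    }

    int main(void) {
        /* Identical outputs for every n, wildly different running times. */
        for (unsigned n = 0; n <= 30; n++)
            if (fib_slow(n) != fib_fast(n)) return 1;
        printf("agree on 0..30\n");
        return 0;
    }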
On the other hand, compiler tricks like tail call optimization can, for example, turn an O(n) algorithm into an O(1) one. Is it a "different program" if the same source code is compiled with a new compiler?
Yes, same source code; different program. The point of compilers is to output a different program that (hopefully) has similar enough I/O but is better suited to the hardware. The program that runs when a human has to add 1+1 is still largely unknown to us, but the source '1+1' isn't.
Why use a new compiler if your program isn't meaningfully changed by it? I'd consider running on two completely different machines to already constitute "meaningfully different".
Tail call elimination is not an optimization because it changes the semantics of the program. The feature can take a program which would previously fail to terminate due to a stack overflow and cause it to terminate without error.
Perhaps TCO is better thought of as a language extension.
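To make that semantic difference concrete, here is a sketch (whether a C compiler actually eliminates the tail call is implementation-dependent; gcc and clang typically do it at -O2 for a self-call like this, but nothing in the C standard requires it):

    #include <stdio.h>

    /* Tail-recursive countdown. Without tail call elimination this
       pushes one stack frame per step and overflows for large n;
       with it, the call is compiled as a jump and always terminates. */
    long countdown(long n, long acc) {
        if (n == 0) return acc;
        return countdown(n - 1, acc + 1);  /* call in tail position */
    }

    int main(void) {
        /* ~100 million frames: crashes without the elimination,
           prints 100000000 with it. Same source, different behaviour. */
        printf("%ld\n", countdown(100000000L, 0));
        return 0;
    }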
That seems no different from any other optimization: plenty of optimizations directly reduce stack usage, which would then change a given input from a stack overflow to a successful execution. Similarly, anything that reduces heap memory usage or code size would do the same.
That's irrelevant - your assertion was that a change that changes semantics by preventing stack overflow cannot be called an optimization. The commenter showed that is false, and gave reasons why.
Whether the tail call goes from O(n) to O(1) doesn't change any of the above.
> Computing f(0)=0; f(n)=f(n-1) is O(n log n) without tail calls because you need O(log n) addresses to hold your stack frames.
There are two principal ways of applying asymptotic analysis to algorithms: time or memory used. In both, your procedure is O(n) without TCO. With TCO it is O(n) for runtime (though further optimization would reduce it to O(1) since it's just the constant function 0, but TCO alone doesn't get us there) and O(1) for space since it would reuse the same stack frame.
What O(log n) addresses do you need to hold the stack frames when there are O(n) stack frames needing O(n) addresses (without TCO, which, again, reduces it to O(1) for memory)?
Also, regarding "without tail calls", your example already has tail calls. What do you mean by that?
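For what it's worth, here is that transformation written out by hand, as a sketch of what a tail-call-eliminating compiler effectively emits for f(0)=0; f(n)=f(n-1) (not any particular compiler's actual output):

    #include <stdio.h>

    /* Direct transcription of f(0) = 0; f(n) = f(n - 1).
       Without TCO: n nested frames, so O(n) time and O(n) stack. */
    unsigned long f_recursive(unsigned long n) {
        if (n == 0) return 0;
        return f_recursive(n - 1);
    }

    /* What tail call elimination effectively turns it into:
       still O(n) time (n iterations), but O(1) space. */
    unsigned long f_loop(unsigned long n) {
        while (n != 0) n = n - 1;
        return 0;
    }

    int main(void) {
        printf("%lu %lu\n", f_recursive(100000), f_loop(100000));
        return 0;
    }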
If we don't treat almost all integers in an algorithm as fixed size then the analysis gets really messy and annoying in a way that has nothing to do with real computers.
And if the algorithm actually did anything with the value that made it grow or shrink with the recursion, the TCO version would stop being O(n) under such a framework. This only works because it passes 0 around on every iteration, and the same issue probably already applies to the TCO version's flow control, depending on how you expect to run it.
Regardless, the comment I replied to is fundamentally confused: it presented a tail-recursive algorithm and said it didn't have tail calls, and it presented a linear algorithm that uses a linear amount of memory and claimed it is O(n log n), with no clarification of whether that's in time or space. I'd rather hear from the person I responded to than whatever guesses the rest of us can come up with, because it needs several points of clarification to be understood.
> If we don't treat almost all integers in an algorithm as fixed size then the analysis gets really messy and annoying in a way that has nothing to do with real computers.
I guess nobody does numerical computing or distributed systems with real computers anymore, huh.
Wrong way around. Almost all the numbers in those systems are fixed-size. There's a few types of variable width payload, but all the little ancillary types are 64 bits or whatever.
> There are two principal ways of applying asymptotic analysis to algorithms: time or memory used.
It is usually both, but I meant time, because to be able to address your stack frames you need* a stack pointer that can take as many distinct values as your nesting depth, so it must have Ω(log n) width.
It is easy to dismiss this as irrelevant because your integers are usually fixed-width, but then you'd need to parameterize your algorithm on the size of input you're willing to handle (at which point you're no longer doing asymptotic analysis), since arbitrary-precision arithmetic really does just work this way normally.
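Spelling out that accounting (my own back-of-the-envelope version of the argument, in the bit-complexity model rather than the word-RAM model):

\[
  T(n) \;\approx\; \underbrace{n}_{\text{stack frames}} \cdot \underbrace{\Theta(\log n)}_{\text{bits per address}} \;=\; \Theta(n \log n) \ \text{bit operations},
\]

versus \(\Theta(n)\) in the word-RAM model, where every pointer manipulation counts as a single \(O(1)\) step.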
> Also, regarding "without tail calls", your example already has tail calls. What do you mean by that?
I mean "without tail call optimization", or if you're particularly keen on getting into a pedant-off, "with the function call in the tail position not implemented as a tail call".
From complexity analysis we can adopt the concept of polynomial-time reducibility and might define a type of equivalence relation where two algorithms are equivalent only if both are pt-reducible to each other. Intuitively it’s not a sufficient condition for "sameness" – otherwise, for example, all NP-complete problems are the "same" and solvable with the "same" algorithm – but it’s arguably a necessary one.
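Written out (my notation, not the commenter's), the relation would be:

\[
  A \equiv_p B \;\iff\; A \le_p B \ \text{and} \ B \le_p A .
\]

This is reflexive, symmetric, and transitive, so it partitions problems into equivalence classes; every NP-complete problem lands in a single class, which is why mutual reducibility can only be a necessary condition for "sameness", not a sufficient one.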
In the most general case there is no technique that can determine if two programs are equivalent other than running both programs on some set of inputs and verifying that the outputs (after termination) are the same. Every other technique must cut out all possible sources of non-termination to get around the halting problem in order to make the resulting equivalence relation on the set of programs effectively computable and constructively provable.
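A trivial sketch of that "run both and compare" check (entirely my own example; it only certifies agreement on the inputs actually tried, which is exactly the limitation being pointed out):

    #include <stdio.h>
    #include <stdlib.h>

    /* Two candidate implementations of the same function (squaring). */
    static int square_a(int x) { return x * x; }

    static int square_b(int x) {
        int a = abs(x), s = 0;
        for (int i = 0; i < a; i++) s += a;  /* squaring by repeated addition */
        return s;
    }

    int main(void) {
        /* "Equivalence" is checked only on the inputs we try; this says
           nothing about inputs outside the tested range. */
        for (int x = -1000; x <= 1000; x++) {
            if (square_a(x) != square_b(x)) {
                printf("disagree at %d\n", x);
                return 1;
            }
        }
        printf("agree on [-1000, 1000]\n");
        return 0;
    }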
You've misunderstood the kind of "sameness" those two works are trying to address. Knuth is not using a notion of "sameness" equivalent to the one discussed above. Two people, one going by the Philosophy of Computer Science entry and the other by the Art of Computer Programming exercise's notion, would not always agree on whether two programs are "the same".
It's also a bridge too far to make a blanket statement that Knuth "already answered this" (and it's not accurate to attribute the idea to Knuth in the first place; Church, for example, worked on exactly this decades earlier). Meaningful discourse and mathematical analysis require nuance and degrees of equivalence, and mixing them up makes things needlessly confusing and difficult.
Sameness as "same input same output" is NOT the same as the "isomorphic equivalence" which is more difficult, strict, and abstract. In this sense they are the same if they follow the same "steps" (also isomorphic between programs), i.e. they are different representations of the same algorithm.
> NOT the same as the "isomorphic equivalence" which is more difficult, strict, and abstract.
This is what the exercise is about. So I would recommend reading before making assumptions.
Of course there is more than one definition of equivalence and isomorphism, but the one explored there is just as interesting as the one in the comment I'm replying to.
drpossum, can you stop with these gotcha posts? This is our 3rd encounter.
"... it has been argued that there are cases in which it is not possible to determine whether two programs are the same without making reference to an external semantics. Sprevak (2010) proposes to consider two programs for addition which differ from the fact that one operates on Arabic, the other one on Roman numerals. The two programs compute the same function, namely addition, but this cannot always be established by inspecting the code with its subroutines; it must be determined by assigning content to the input/output strings"
"The problem can be tackled by fixing an identity criterion, namely a formal relation, that any two programs should entertain in order to be defined as identical. Angius and Primiero (2018) show how to use the process algebra relation of bisimulation between the two automata implemented by two programs under examination as such an identity criterion. Bisimulation allows to establish matching structural properties of programs implementing the same function, as well as providing weaker criteria for copies in terms of simulation."
(Of course, it isn't surprising that this would be relevant, because proofs and programs themselves are isomorphic).
This technique seems rather stricter than what Gowers has in mind, but it seems helpful as a baseline.
0. https://plato.stanford.edu/entries/computer-science/