How to compare two functions for equivalence, as in (λx.2*x) == (λx.x+x)? (stackoverflow.com)
176 points by tosh on Sept 14, 2017 | 63 comments



See also Morte: an intermediate language for super-optimizing functional programs

"Now suppose there were a hypothetical language with a stronger guarantee: if two programs are equal then they generate identical executables. Such a language would be immune to abstraction: no matter how many layers of indirection you might add the binary size and runtime performance would be unaffected. Here I will introduce such an intermediate language named Morte that obeys this stronger guarantee."

http://www.haskellforall.com/2014/09/morte-intermediate-lang...

https://en.wikipedia.org/wiki/Superoptimization


Morte isn't actually a superoptimiser, it's a supercompiler.

A supercompiler performs as much evaluation as it can at compile-time; it can be thought of as inlining every function call, and constant folding every literal.

A superoptimiser is a search procedure which generates programs, e.g. starting with individual machine instructions, then pairs of machine instructions, and so on, and checks whether any of them behave the same as the given program; if so, they're used as the output.

Supercompilation is 'top down': we refine the given program again and again until there's nothing more to do. Superoptimisation is 'bottom up': we build up from nothing until we have a binary which implements the given program.
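
A toy illustration of the "inline everything, fold every constant" idea in Haskell (a made-up expression type, not Morte's actual algorithm):

  data Expr = Lit Int
            | Var String
            | Add Expr Expr
            | Let String Expr Expr        -- let x = e1 in e2
            deriving (Eq, Show)
  reduce :: Expr -> Expr
  reduce (Add a b) = case (reduce a, reduce b) of
    (Lit x, Lit y) -> Lit (x + y)         -- constant folding
    (a', b')       -> Add a' b'
  reduce (Let x e body) =                 -- inline the binding, keep reducing
    reduce (subst x (reduce e) body)
  reduce e = e
  subst :: String -> Expr -> Expr -> Expr
  subst x v (Var y)     = if x == y then v else Var y
  subst x v (Add a b)   = Add (subst x v a) (subst x v b)
  subst x v (Let y e b) = Let y (subst x v e) (if x == y then b else subst x v b)
  subst _ _ e           = e
  -- reduce (Let "x" (Lit 2) (Add (Var "x") (Var "x")))  ==>  Lit 4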


So then the task is to transpile the source functions into Morte, compile them, and see if the executables are equal; and somehow we have solved an undecidable problem (?) There must be a catch somewhere.


Morte isn't Turing complete.


Morte by itself also doesn't support arithmetic.


Urbit does this cool trick where you use pure lambda calculus arithmetic (church encoding), use this theoretically correct impl as a basis of your proofs and reasoning, and then at runtime provide a polyfill that uses hardware arithmetic operators. The runtime impl is thus not equal since it can for example overflow, but in practice that doesn't inhibit our ability to make useful programs.

https://en.wikipedia.org/wiki/Church_encoding
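
For flavour, a minimal Church-numeral sketch in Haskell (not Urbit's actual Nock/Hoon code); the polyfill/jet idea is that the runtime recognises these definitions and substitutes hardware arithmetic:

  {-# LANGUAGE RankNTypes #-}
  -- A Church numeral n is "apply f to z exactly n times".
  type Church = forall a. (a -> a) -> a -> a
  czero :: Church
  czero _ z = z
  csucc :: Church -> Church
  csucc n f z = f (n f z)
  cadd, cmul :: Church -> Church -> Church
  cadd m n f z = m f (n f z)
  cmul m n f   = m (n f)
  toInteger' :: Church -> Integer
  toInteger' n = n (+ 1) 0
  -- toInteger' (cmul (csucc (csucc czero)) (csucc (csucc (csucc czero))))  ==>  6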

Facebook does this also with GraphQL - GraphQL queries are theoretically dynamic and defined by the client, but you can bake them when you ship, and then provide optimized implementations of the query that also have knowledge of the UI and database that they aren't supposed to otherwise have.


I am reliably informed by many Internet commenters... and can there be any higher authority than that?... that programming doesn't actually involve "math" very often at all, so I'm sure languages that don't support that ivory tower "arithmetic" stuff will be suitable for a wide variety of useful every day tasks, such as scraping web APIs and re-implementing Reddit in a weekend.


Unfortunately, beta-eta reduction on Church-coded terms (as in Morte) is much weaker than what is commonly understood as supercompilation, and it is false that "no matter how many layers of indirection you might add the binary size and runtime performance would be unaffected". This kind of statement can only theoretically hold for something like simple type theory with strong finite products and coproducts, for which (quasi) normalization has been recently shown in

https://arxiv.org/abs/1610.01213


It was my understanding that you just had to provide a safe language without infinite loops (so no ⊥) - is this not true?

https://en.wikipedia.org/wiki/Bottom_type


Undecidable in general, but there are two approaches I've seen work based on AST manipulation and comparison:

1. Solvers that search for applicable re-write rules to transform X into Y, such as Cosette for SQL. These may not terminate because undecidability lol http://cosette.cs.washington.edu/

2. Canonicalization of the AST. This is a form of #1 but much more restricted, and the hope is that functions that are equivalent end up canonicalized in the same way. LLVM and GCC do this for a variety of reasons. In the example given, you'd hope that both functions get canonicalized to either the left or right hand side. https://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations... http://llvm.org/docs/MergeFunctions.html
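
A toy version of approach #2 in Haskell (made-up rewrite rules, not LLVM's or GCC's actual canonicalizations): normalise both ASTs, then compare them syntactically.

  data Expr = X | Lit Integer | Add Expr Expr | Mul Expr Expr
    deriving (Eq, Show)
  -- Pick a canonical form: rewrite multiplication by 2 as repeated addition.
  canon :: Expr -> Expr
  canon (Mul (Lit 2) e) = let e' = canon e in Add e' e'
  canon (Mul e (Lit 2)) = let e' = canon e in Add e' e'
  canon (Add a b)       = Add (canon a) (canon b)
  canon (Mul a b)       = Mul (canon a) (canon b)
  canon e               = e
  equivalent :: Expr -> Expr -> Bool
  equivalent a b = canon a == canon b
  -- equivalent (Mul (Lit 2) X) (Add X X)  ==>  True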


One way is to use Functional Extensionality, which is to say that two functions are equal if for all possible inputs they return the same value.

In Coq, for instance:

  Axiom functional_extensionality: forall {X Y: Type} {f g : X -> Y},
    (forall (x: X), f x = g x) -> f = g.

  Theorem x2_eq_xplus:
    (fun x => 2 * x) = (fun x => x + x).
  Proof.
    apply functional_extensionality. intros.
    destruct x.
    - reflexivity.
    - simpl. rewrite <- plus_n_O. reflexivity.
  Qed.


Does that work for X being the real numbers?


Yes. When we see a universal quantifier (e.g. 'forall (x : X)...') it's tempting to think that we're talking about the thing being quantified over (i.e. 'X'), but in fact what we're saying is that the quantified thing is irrelevant.

In the case of something like `\x. 2 * x` and `\x. x + x`, we're saying "forget about what 'x' might be, because it's irrelevant; look, I'll show you that we can use semantics-preserving rewrite rules to turn one of these syntax trees into the other".

The resulting theorems work for all 'x' precisely because the proof doesn't need to know or care what the value of 'x' is, since it's just shuffled around as a symbol (part of the syntax tree).


I haven't got up to proving things about floating point numbers in Coq. But my guess would be probably not.

But Haskell's quickCheck doesn't seem to find a problem.

  λ quickCheck @(Float -> Bool) $ \x -> 2 * x == x + x
  +++ OK, passed 100 tests.
  λ quickCheck @(Double -> Bool) $ \x -> 2 * x == x + x
  +++ OK, passed 100 tests.
It doesn't like associativity though.

  λ quickCheck @(Double -> _) $ \a b c ->
    (a + b) + c == a + (b + c)
  *** Failed! Falsifiable (after 6 tests and 4 shrinks):
  20.0
  3.5741489348898856
  2.894651135185324


OK, so those techniques would work equally well on arbitrarily complicated and intractable code examples -- which is useful in practice but not in the spirit of a formal determination of equality.

From the perspective of an Agda-phobe: The "2*x vs x+x" example could be a case of a general question about arithmetic expressions, or it could even just be about multiplication and addition. Since multiplication of integers can be defined as repeated addition, proving equality in that particular case for any numeric type just takes a rewrite of both sides in terms of addition only. If the coefficient ("2") was not a natural number, things would be a little more complicated (as other comments mention, you'd have to introduce some way of getting things into a canonical form). I guess that would be an "intensional" approach.

The best story I have about extensional definitions is a true one. At a class on bike repair, somebody asked what a fixed-wheel bike was. The instructor started to give an extensional definition: "It's like... a unicycle." Presumably he could have followed this with other examples such as a Penny Farthing -- but the audience seemed satisfied. It would obviously have been more helpful to give an intensional definition ("no gears".)


> I haven't got up to proving things about floating point numbers in Coq. But my guess would be probably not.

It would work with Coq's real number type, but since the same properties wouldn't always hold for hardware float types, you might have trouble running the program if you extracted it to e.g. Haskell (as your example shows).

The QuickCheck tests would pass if you allowed a small difference between the two sides to account for floating point inaccuracies, I imagine.
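
Something like this sketch, say (tolerance chosen arbitrarily, and scaled by the operands since cancellation with huge values can defeat any fixed bound):

  import Test.QuickCheck
  -- Associativity up to a relative tolerance.
  approxAssoc :: Double -> Double -> Double -> Bool
  approxAssoc a b c =
    abs ((a + b) + c - (a + (b + c)))
      <= 1e-9 * maximum [1, abs a, abs b, abs c]
  main :: IO ()
  main = quickCheck approxAssoc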


In IEEE arithmetic 2 * x == x + x (for any non-NaN x; a NaN never compares equal to anything), which follows from the basic premise that these operations are correctly rounded to the nearest float.


It works in the sense that there are a finite number of them on a finite-size machine, but exhaustion would be infeasible in almost all possible cases.


This is one of the core reasons dependently typed languages are difficult to make practical, and I find it never gets mentioned. Getting your program to type check requires proving that arbitrary properties hold, like (λx.2*x) == (λx.x+x); since this is undecidable it cannot be automated, meaning the user has to help by providing the proof, which can be arbitrarily difficult (think of pen-and-paper maths proofs, but more rigorous, and it's a very different skill from programming).


I disagree. In Coq (without axioms), it is not the case that (λx.2*x) == (λx.x+x), and programs type check fine anyway.

Speaking more generally, one of the desiderata when designing a dependent type system is to set things up so that type checking is a simpler problem than unrestricted theorem proving. Of course, if you want to prove an arbitrary theorem that can be arbitrarily hard, but that has nothing to do with using dependent types.


> I disagree. In Coq (without axioms), it is not the case that (λx.2*x) == (λx.x+x), and programs type check fine anyway.

Type checking may be decidable in Coq but now the user has to do the work of putting things into a form that type checks. You're just pushing the problem elsewhere. Even for trivial Coq programs you have to write a large number of proofs compared to how much code you wrote.

> Speaking more generally, one of the desiderata when designing a dependent type system is to set things up so that type checking is a simpler problem than unrestricted theorem proving.

Do you know of any examples of this? I know DML limits you to linear arithmetic so type checking is decidable but that's a big limitation. All other languages like Coq, Idris and Agda require you to write proofs.


You do not even need to use dependent types. You could just restrict yourself to polymorphic types if you wanted to. And you could use dependent types just for the things that don't require any "proving" work. Returning multiple types from a function, implementing type-safe `printf`, checking out-of-bound errors at compile time, things like that.

Dependent types allow you to express things such as `a == b` as a type, and proving that requires some work, but you don't need any of that at all. Idris is a superset of Haskell, you may say.
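
For the out-of-bounds example, here is roughly what that looks like (a Haskell/GADTs sketch; the Idris version with Vect reads almost the same):

  {-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
  data Nat = Z | S Nat
  -- A list whose length is part of its type.
  data Vec (n :: Nat) a where
    VNil  :: Vec 'Z a
    VCons :: a -> Vec n a -> Vec ('S n) a
  -- Taking the head of an empty vector is a type error, not a runtime crash.
  vhead :: Vec ('S n) a -> a
  vhead (VCons x _) = x
  -- vhead VNil             -- rejected by the type checker
  -- vhead (VCons 1 VNil)   ==>  1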


> checking out-of-bound errors at compile time

Even that quickly becomes undecidable as non-linear arithmetic is undecidable.


Proponents of this approach would probably argue that non-linear arithmetic doesn't show up frequently in index computation. I don't know whether this is true or not, would be curious to see data. Based on my own experience I'd agree with it, with caveats for things like a[i%a.length] but you could probably do something there like replace % with an approximation that is linear and sound.


Even if it was infrequent (I'd be interested in data too!), what do you do when it does come up? It's very difficult to constrain yourself to decidable domains when you want to write arbitrary programs and capture arbitrary program properties. You could let the proof goal go through without a proof, but then you'd have to replace it with a runtime check; otherwise, if the goal happened to be false, all your static guarantees are out of the window (like doing a bad type cast in C++, for instance).


For C the first approach I would try is just using a general software verification tool. Write a test driver which feeds the two functions the same non-deterministic inputs and asserts that their results are the same. A verification tool will then attempt to prove that the assertion cannot be violated. The yearly software verification competition (SV-COMP) [1] has some very capable tools for verifying C programs. If I remember correctly, SMACK [2] and Ultimate Automizer [3] both do very well overall.

[1] https://sv-comp.sosy-lab.org/2017/index.php

[2] https://github.com/smackers/smack

[3] https://monteverdi.informatik.uni-freiburg.de/tomcat/Website...


I actually did some research in this area for my thesis. Obviously algorithm equivalence is largely undecidable, but it turns out that determining whether two SSA representations of some function (maybe one is pre-optimization and the other is post) are equivalent can be reduced to a graph isomorphism problem.


If you have a system that allows you to write down the semantics of these expressions, you can easily prove equivalence. You can use many of the techniques posted by others to construct such a proof; functional extensionality, for example, is one method.

This is why dependent type theory is useful for reasoning about program behavior and program equivalence. For example, it becomes possible to show that a compiler is semantics-preserving by demonstrating that the input and output programs are "equivalent" in this way.

One issue with blindly translating a subset of expressions to SMT is that you may give them behavior different from that of the original source program. SMT solvers like Z3 have built-in semantics for the logic, which may or may not reflect the source program you wrote. Verification languages that use SMT solvers for automated reasoning must be very careful about how they "compile" the source code into SMT queries to ensure that each query actually corresponds to the original program.
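
As a small illustration with the Haskell sbv bindings to Z3 (assuming the sbv package and Z3 installed): the "same" property is a theorem over Z3's unbounded integers but falsifiable over machine words, which is exactly the sort of gap a careless translation opens up.

  import Data.SBV
  main :: IO ()
  main = do
    -- Over mathematical integers this is a theorem...
    print =<< prove (\x -> x + 1 .> (x :: SInteger))
    -- ...over 8-bit machine words it is falsifiable (x = 255 wraps around).
    print =<< prove (\x -> x + 1 .> (x :: SWord8))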


Depending on the context, it might be a good idea to use the probabilistic approach: feed it a predefined number of variables and see if the results match.

A similar method (Freivalds' algorithm) is used with matrices to check AB =? C in O(n^2) time per trial (you multiply both sides by a few random vectors). You get O(n^2.something) if you do the actual multiplication.
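
A minimal sketch of that check in Haskell over plain lists (assuming the random package; square matrices assumed for brevity):

  import Control.Monad (replicateM)
  import System.Random (randomRIO)
  type Matrix = [[Integer]]
  matVec :: Matrix -> [Integer] -> [Integer]
  matVec m v = [sum (zipWith (*) row v) | row <- m]
  -- One O(n^2) trial: test A*(B*r) == C*r for a random 0/1 vector r.
  freivaldsTrial :: Matrix -> Matrix -> Matrix -> IO Bool
  freivaldsTrial a b c = do
    r <- replicateM (length c) (randomRIO (0, 1))
    pure (matVec a (matVec b r) == matVec c r)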


From 2014: "There are only 4 billion floats - so test them all!" https://randomascii.wordpress.com/2014/01/27/theres-only-fou...
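
In that spirit, exhausting every 32-bit pattern for the 2*x == x+x question fits in a few lines of Haskell (castWord32ToFloat needs GHC 8.4 or later; it runs for a few minutes):

  import Data.Word (Word32)
  import GHC.Float (castWord32ToFloat)
  main :: IO ()
  main = print (all ok [minBound .. maxBound :: Word32])
    where
      -- NaN bit patterns are skipped: NaN == NaN is always False.
      ok w = let x = castWord32ToFloat w
             in isNaN x || 2 * x == x + x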


Short functions can be equivalent for all but a vanishingly small subset of inputs:

  f1(x) = (x==7239829489) ? 0 : x
  f2(x) = x
so randomized tests would probably say they are equivalent.


This is the same problem for unit tests even with 100% code coverage.


very good counterexample, thank you!


One of the key results on the role of randomness in polynomial identity testing is what's called the Schwartz–Zippel lemma.[1]

As the wiki article states, "Currently, there is no known sub-exponential time algorithm that can solve this problem deterministically. However, there are randomized polynomial algorithms for testing polynomial identities."

[1]https://en.wikipedia.org/wiki/Schwartz–Zippel_lemma

(Note that this is in the context specifically of polynomial function equivalence.)


To adapt the joke[1]:

That's fine if you just need a high enough confidence for military encryption or financial transactions. But what if you're trying to prove a theorem about them, and just can't take any chances?

[1] Believe it was in Aaronson's Quantum Computing Since Democritus


The way I see it there's a whole spectrum of methods for checking program correctness.

On the one end are "point tests", manually written tests that check correctness in several predetermined cases. That's what most software engineers mean by "testing".

Then comes randomized tests/fuzz testing/QuickCheck where properties are defined to hold for all values of some variables and then can be checked on randomly generated values. This offers better correctness guarantees than manually created "point tests" but might still miss failing cases since it's sampling and not exhaustively searching through the whole space.

Finally, on the far end are proofs, where one formally proves that the formerly mentioned properties always hold. I cannot imagine a stronger guarantee that something holds than an actual mathematical proof of it.

Dependently typed languages (Agda, Idris, Coq, Lean, etc.) push software development into that far end where correctness can actually be proved. Imo that's where the future of software lies for certain classes of programs where correctness is a must.

Here's a proof of the original question in Lean:

  def foo : (λ x, 2 * x) = (λ x, x + x) :=
  begin
    apply funext, intro x,
    cases x,
    { refl },
    { simp at *,
      dsimp [has_mul.mul, nat.mul],
      have zz : Π a : nat, 0 + a = a := by simp at *,
      rw zz }
  end


>Depending on the context, it might be a good idea to use the probabilistic approach: feed it a predefined number of variables and see if the results match.

For any input space in which a finite number of checks has a greater-than-zero (i.e. not measure-zero) chance of being representative, this problem is trivial, since such a space must be finite.


Trivial in the math sense. Not trivial in the real-world programming sense. Very different things.


This is an important point: even a finite search space quickly gets out of hand in practical terms. I've worked a bit on bounded verification which comes down to comparing loop-free programs with finite inputs for equality. It is not easy in practice! Modern approaches use sophisticated proof search algorithms (ie SMT solvers) and still struggle to scale beyond relatively small and simple programs. I don't know the numbers off-hand because verification was not the bottleneck in my application, but I suspect that even sophisticated modern methods scale poorly beyond a few hundred instructions or for large inputs (ie algorithms with complex behavior over arrays).

If you have a function that takes a single 32-bit floating point input and runs in, say, 100 cycles, you can test all the possible inputs in a couple minutes or so. If you have multiple floats as inputs or the function is more complex, you can exhaustively check it with an SMT solver or the like. If your function depends on hundreds or thousands of inputs and has complex looping logic, you probably can't exhaustively check it with any automated solution.

Real programs, of course, are orders of magnitude more complex than even the last example. Verifying them automatically is effectively impossible even if they have finite inputs, guaranteed bounds on loops and are "trivial" in the mathematical sense.


Depending on the language, this is undecidable. See the function equality post on cs-syd.eu



So it could be [efficiently?] decidable if you forced the function specification to adhere to certain rules that ensured a standard, easily-comparable format?


It depends on how restrictive your "certain rules" are. It's quite possible to restrict things so much that functions will be comparable in terms of both structure and functionality. But usually that's way too much restriction.

Consider the example of writing a function that computes the Collatz sequence and another function that just returns one. How would you compare them? A comparator would most likely report them as equal, but even the greatest mathematicians can't say for sure whether they really are.
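
Concretely (a Haskell sketch): whether these two agree on every positive integer is precisely the open Collatz conjecture.

  collatz :: Integer -> Integer
  collatz 1 = 1
  collatz n
    | even n    = collatz (n `div` 2)
    | otherwise = collatz (3 * n + 1)
  one :: Integer -> Integer
  one _ = 1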

Now consider another example where you restrict the function definition to only include simple arithmetic (no control flows i.e. no conditionals no loops no function calls). A good CAS system can already simplify these functions and you can compare them quite easily (though not trivially).

Or you can omit arithmetic from your language and use a strongly normalizing language like [Morte](http://www.haskellforall.com/2014/09/morte-intermediate-lang...).


Not if the language was (still) Turing-complete.

On the other hand, if the rules are restrictive enough to stop the language from being Turing-complete, you could make it work.

There's an interesting design problem there: how do you make a language that's powerful and expressive enough to be useful but simple and restricted enough to solve problems like this reliably?


> There's an interesting design problem there: how do you make a language that's powerful and expressive enough to be useful but simple and restricted enough to solve problems like this reliably?

See theorem proving systems like Coq, which use languages that are total. The majority of loops you write while coding obviously terminate (for item in items...), and Coq lets you prove that more complex forms of looping terminate, so it's not as restrictive as you'd think.


Coq and similar languages are definitely quite expressive, but they're still complex enough to involve non-trivial manual intervention. Verifying something with Coq is possible but it's hard; whole PhD theses have been written around verifying code for a single program. I know there is some proof-search-based automation available (ie auto), but I understand it's still pretty limited.

Coq, Agda and similar languages are restricted enough to make verification possible, but complex—and expressive—enough to make it very difficult to automate.

The opposite extreme would be domain-specific languages that are very restrictive but still powerful enough for their domain. Designing languages like that would let us verify non-trivial properties fully automatically and could save a lot of work for the user, but we end up needing to design a language for each class of task that we care about.

As I said: it's an interesting design problem.

One recent addition to this is the Lean theorem prover, which tries to be as powerful and general as Coq but with more provisions for automation via the Z3 SMT solver. I haven't played with it myself, but from what I've heard it's a very promising development in making formal verification more accessible to non-experts.


> Coq, Agda and similar languages are restricted enough to make verification possible, but complex—and expressive—enough to make it very difficult to automate.

Pretty much every theorem proving system out there is expressive enough to state whatever program property you want. You can't escape having to automate proofs though.

> Coq and similar languages are definitely quite expressive, but they're still complex enough to involve non-trivial manual intervention. Verifying something with Coq is possible but it's hard; whole PhD theses have been written around verifying code for a single program.

Yes, it's the most challenging aspect of programming I've ever tried. It's completely impractical and too expensive for most projects right now outside of domains where mistakes cost even more like aviation, medical devices, operating systems, CPU etc.

> The opposite extent would be domain-specific languages that are very restrictive but still powerful enough for their domain.

Well, see DML where you can only write linear arithmetic constraints so it's limited but automated: https://www.cs.bu.edu/~hwxi/DML/DML.html

I'm not really sure what the sweet spot is, to be honest, or if there is one when you require full automation. Even non-linear arithmetic is undecidable (basically arithmetic where you're allowed to multiply variables together), so you quickly get into undecidable territory for what seem like basic properties. No clever language design idea is going to let you side-step the fact that proof automation is hard; it has been worked on for decades. Even writing program specifications requires a skillset most programmers will have to learn.



When you add restrictions to make equalities like this decidable, it usually means fewer things are equal to each other, if that makes sense, so the problem is being pushed elsewhere somewhat.

For example, Coq has decidable definitional equality, which is crudely described as "simplify both sides as much as possible and if they're syntactically the same then they're equal". So this could be equal depending on how you define plus:

     plus(1, plus(1, x)) = plus(2, x)
but this wouldn't be:

     plus(1, plus(1, x)) = plus(x, 2)
Long story, but "plus" is usually defined by recursion on the first argument, and when you put "x" as the first argument the type checker cannot simplify "plus(x, 2)" any further, so it isn't able to simplify both sides to the same form; therefore they're not equal. So this makes equality decidable, but now things you intuitively think of as equal aren't equal.


Thank you! That's what I was looking for!


I'm pretty sure you can define commutative and associative relations in Coq.


What do you mean? You can write theorems that show plus is associative and commutative for example but that doesn't make the example I gave "equal" in Coq in the sense you normally mean. As I said, it's not intuitive as we're usually very loose in what we mean by "equal".


Isn't this basically the halting problem?


Almost. Its answer is a direct consequence of Rice's theorem ("A Turing machine can't decide any non-trivial semantic property of the Turing machines given as input"), which is itself proven using the undecidability of the Halting Problem ("A Turing machine can't decide whether the Turing machines given as input halt.").


Sure: two partial functions are equivalent over a set d iff they halt on the same subset of inputs and have equal outputs whenever both terminate. Now, suppose you could decide equality of two functions f, g over a finite domain d: let g be a function that loops infinitely on any input, and check equality of f and g over a singleton containing some input.

If f doesn't halt, then they're equal (since they both don't halt over the same set) and if f halts then they're not equal since the sets they halt on don't agree.


... why is a question I asked 4 years ago on the first page of Hacker News?


Because someone found it interesting :)


Does someone know a generic way to do this, perhaps with wolframalpha.com? I was thinking of having it re-solve for a certain variable and inspecting the results visually, but there may be no guarantee that it comes up with the same simplified form for both equations.



By providing a heuristic that doubling can be defined as either x+x or 2*x: that one plus one is the same as one taken two times.


Undecidable in general, but you can compile many to SMTLIB and let Z3 crunch them.
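
For the question in the title, something like this sketch with the Haskell sbv bindings (assuming the sbv package and Z3 on the PATH; note the proof is over fixed-width machine words, not unbounded integers):

  import Data.SBV
  -- Ask Z3 whether (\x -> 2*x) and (\x -> x+x) agree on every 32-bit word.
  main :: IO ()
  main = print =<< prove (\x -> 2 * x .== x + (x :: SWord32))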


Another interesting thought:

Don't x+x and x*2 compile to different CPU instructions?


An optimizing compiler will most likely compile both to the same instruction. (If my intuition is correct, a bitshift.)



