Functional Algorithms, Verified (functional-algorithms-verified.org)
167 points by haskellandchill on Oct 15, 2021 | 43 comments



How does one learn and get involved with formal verification outside academia? What sort of mathematical background is required?

I always get lost with pure functional languages and the related subjects, even in the introductory material. I've been able to understand the usefulness of both some functional concepts (through playing with various impure languages like F# and Erlang) and of formal verification (from looking at SPARK with Ada), but all the notation, vocabulary, and such of languages like Isabelle's HOL and Idris confuses me. I even find myself lost among Scala developers.


My advice: swing wide. If you don't know where to start, then judge a book by its cover and read it until you hit a knowledge gap, then slither down that gap and keep digging for resources until you either get bored or achieve something.

I recommend starting with geometric algebra or topology. The aspiring self-starter has a bevy of great options. Sidney A. Morris' "Topology Without Tears" is fine andragogy, as is the widely-acclaimed undergrad textbook "Introduction to Topology: Pure and Applied" by Colin Adams.

The biggest thing, in my opinion, is to realize that there are precious few things you need permission to think, mathematically.

The fundamental mathematical advantage is abstraction. If you can manage your abstractions and approach the math as nothing more than especially pedantic philosophy, it helps.


I'm sorry, but how are geometric algebra and basic topology even relevant to formal verification? Your suggestions sound more related to your own interests.

The commenter is better served browsing through some introductory formal verification material like Concrete Semantics by one of the same authors as the linked document.


Not sure about geometric algebra, but topology comes up because it studies the abstract formal structure that connects your problem with other (possibly solved) problems.

For example, a couple weeks ago I was musing about optimal vertex renumberings so that "the (0/1) adjacency matrix of a non-densely-connected acyclic digraph's transitive closure" scores well according to a domain-specific compressibility metric.

Specifically, the entry at column i and row j is 1 if and only if (using the post-renumbering indices) vertex i was reachable from vertex j in the original input graph.

The compression results from only storing start and end indices of contiguous runs of 1s in a row. [Edit2: the score of a specific vertex renumbering is the number of such runs for the complete matrix. Only uninterrupted runs of 1's count.] It is easy to see that a renumbering that results in a triangular matrix won't be terrible, and simple deepest-branch-first depth-first enumeration of the input DAG's undirected "version" (one just ignores edge direction for this step) is at least fairly good, at sub-quadratic cost in the vertex count (assuming an average degree much smaller than the number of vertices).

[Edit1: I forgot to point out that this is extremely close to plain run-length-encoding of the matrix in row-major order, except for the fact that RLE cares about a row-crossing run and I don't.]

But exhaustive renumbering testing is O(n!) for n vertices.
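
For what it's worth, here is the metric as I read it, plus that O(n!) brute force, as a Haskell sketch (score, bestRenumbering, and the argument order of reach are my own inventions, not from any literature):

    import Data.List (group, minimumBy, permutations)
    import Data.Ord (comparing)

    -- total number of maximal runs of 1s, counted per row
    -- (row-crossing runs are not merged)
    score :: [[Int]] -> Int
    score = sum . map (length . filter ((== 1) . head) . group)

    -- the O(n!) brute force: try every renumbering p (p !! new = old);
    -- reach u v means "v is reachable from u" in the original numbering
    bestRenumbering :: Int -> (Int -> Int -> Bool) -> [Int]
    bestRenumbering n reach =
      minimumBy (comparing scoreOf) (permutations [0 .. n - 1])
      where
        scoreOf p = score [ [ if reach (p !! j) (p !! i) then 1 else 0
                            | i <- [0 .. n - 1] ]
                          | j <- [0 .. n - 1] ]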

I think topology could help me, but I'd need to know some topology to find my way around the literature.


I think combinatorics might be more relevant here.

I'll note that I mentioned basic topology in my reply because topology does indeed have deep connections with the type theory that most formal verification systems are based on, interpreting types / theorems as spaces and terms / programs / proofs as continuous functions within them.[1] But basic undergrad topology typically studies more well-behaved (read: more like the real number line) spaces than this.

[1]: https://www.andrew.cmu.edu/user/avigad/Talks/fields_hott.pdf


Sorry, you're correct, and I should have made it clearer. This is general advice for getting into math, which is what I took the parent to be seeking advice for.


This is really bad advice. Geometric algebra and topology have nothing to do with proving algorithms correct. Better advice would be to learn Agda and read examples of proving things in Agda.


1) For the "how to get started with formal verification outside academia", I'll first get the case of "in/for production" out of the way:

a) "use Z3[0] with some binary/source-language-of-choice/LLVM -lifting to handle relatively simple situations that need to": i) guarantee some assert (often, but not always index-out-of-bounds checks, like non-trivial Rust slice indexing) statements are unreachable ii) some bit hacks compute the same thing as the non-hacked implementation

b) "use TLA+[1] to verify concurrent stuff with race conditions": i) in lock-free data structures that don't come with a proof that the algorithms do the right thing (in those cases, just verify three implementation truly matches the proven algorithm) ii) in (network, or sufficiently network-like) protocols that aren't 100% exactly transcribed ports of existing proven algorithms or so simple a finite state machine can (and will) be used

2) For doing it more for fun than short-term gains:

a) For getting a handle on pure functional program design, I recommend going through the relevant track of HtdP[2] (Prologue until just before "Arithmetic and Arithmetic" -> I -> II -> III -> Intermezzo 2: Quote, Unquote -> V {-> [if you want to write remotely-performant pure functional code, follow up with:] VI}).

b) For learning how to exploit a proof assistant for fun and applied computer science, I'll refer you to the guidance provided by Isabelle/HOL. Please use the Isabelle Zulip[3] if you are confused about how exactly to best approach Isabelle/HOL in your case.

[0]: https://github.com/Z3Prover/z3
[1]: https://lamport.azurewebsites.net/tla/tla.html
[2]: https://htdp.org/2021-5-4/Book/index.html
[3]: https://isabelle.zulipchat.com/


I am further along in the journey: I have been learning formal verification, type theory, and logic on my own.

I think these two are excellent books: SF: Logical Foundations for the basics, and SF: Verified Functional Algorithms.

Unfortunately there is no royal road in type theory. This subject is incredibly demanding yet satisfying.

The Software Foundations volumes are excellent books to read even if you aren't interested in verification/program proofs.


Mathematical notation can be a bit odd and is often not wholly consistent. Wikipedia has a good page listing mathematical notation, which can be helpful while you're reading a paper.

There are also some mathematical concepts you should have a basic understanding of: functions and sets. These were taught to me in basic algebra. It's actually mostly super straightforward.


Type theory is like this in particular. My background is mostly in classical math, and I was initially thrown off by the number of symbols with special meaning used in CS papers. An important realization was that most of the definitions behind these symbols (many, I think, borrowed from logicians) are actually quite straightforward, and the papers are not necessarily conceptually difficult so much as layered in their use of abstraction.

I'll mention that Nederpelt and Geuvers' Type Theory and Formal Proof: An Introduction was recommended to me for this topic. I haven't had a chance to study it yet, but I'd say it's worth checking out.


When I was a student at university, we used formal language and automata theory to prove that algorithms are correct. But we studied computer science, not math.


I really like the book Software Abstractions


Small fact about Isabelle: that's where the pipeline operator |> comes from. Here's a post about it with a few sources: http://mamememo.blogspot.com/2019/06/a-brief-history-of-pipe...

"also" isn't an excellent name for it, but I like the idea of giving names to operators instead of using things like >>=, <$>, <*>. I have a hard time reading code with it, as I always pause to think "okay, how should I read that?". I wonder if that's something you get used to, or a difference in how people think.


This is really cool! Seeing more examples of formal verification in the wild is always exciting for me.

I'm slightly let down by how they formally prove time complexity bounds. I was excited because proving performance bounds is a pretty tough nut to crack, but looking through the sort algorithms (and their overview in Section 1.5), it looks like they don't prove the runtime of the function directly. Rather, they create a whole new cost function, assert without proof that the cost function accurately represents the original function, and then prove that the cost function has certain bounds.
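
In Haskell terms, the shape of that approach is something like this (hypothetical names, not the book's actual Isabelle definitions):

    -- the function being analyzed
    insert :: Ord a => a -> [a] -> [a]
    insert x [] = [x]
    insert x (y:ys)
      | x <= y    = x : y : ys
      | otherwise = y : insert x ys

    -- a parallel cost function counting comparisons; nothing ties it to
    -- insert except the author's diligence, which is the gap noted above
    insertCost :: Ord a => a -> [a] -> Int
    insertCost _ [] = 0
    insertCost x (y:ys)
      | x <= y    = 1
      | otherwise = 1 + insertCost x ys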

I would be interested to hear if anyone has experience with formal verification frameworks that can be used to prove time and space complexity bounds of arbitrary code.


A typical way to prove an algorithm uses fewer than N steps is to include N as a "fuel" parameter to the algorithm and decrement it by 1 at each recursive call; if N reaches 0, the algorithm has run "out of gas" and fails. Then you need some proof terms to verify that N never reaches 0.
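
A minimal Haskell sketch of the shape (Collatz picked only as a familiar recursion):

    -- Nothing means the fuel ran out before the computation finished
    steps :: Int -> Integer -> Maybe Integer
    steps 0 _ = Nothing                          -- out of gas
    steps _ 1 = Just 1
    steps fuel n
      | even n    = steps (fuel - 1) (n `div` 2)
      | otherwise = steps (fuel - 1) (3 * n + 1)

Proving that steps n x never returns Nothing is then exactly an upper bound of n recursive calls.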

Space complexity is different and functional programming historically hasn't paid such close attention to it, since running out of memory can be seen as an external event like a hardware crash. In other words, idk ;).

You might also like Concrete Semantics, a related book: http://concrete-semantics.org/


Yeah I've heard of the fuel param (although I've only ever used fuel params as a hack to brute force "prove" termination). I'm interested in whether there are frameworks that don't require you to change the function signature. I've skimmed (very very quickly) through Concrete Semantics in the past, but never actually sat down to read it. Does it have anything on proving time or space complexity?


Concrete Semantics is yet another book that looks very good that I want to read someday but haven't. So I don't know if it says anything about either time or space complexity. I believe I learned about using a fuel parameter in an old Edward Yang blog post (http://ezyang.com/) but I don't have a direct link.

Chris Okasaki's well-known book "Purely Functional Data Structures" (highly recommended) does some complexity analyses of operations on functional data structures. They're not formalized, but maybe they could be.


I've read and worked through Concrete Semantics and I cannot recall any complexity proofs. It's mostly about program semantics, static analysis, abstract interpretation and type/effect systems.


I’m finishing up Concrete Semantics right now. No mention of time complexity that I saw, but even without that it’s become one of my top programming books of all time.


Try playing with Why3. http://why3.lri.fr/ It's slightly idiosyncratic, and it has its own programming language that is ML-like. But it allows you to write arbitrary code in that language, and then prove statements about that code using SMT solvers as helpers. I haven't used it in practice yet, just played with an example, but it felt quite powerful if you get to know its capabilities.


I haven't used Why3, but I have used other SMT solver-based frameworks such as Dafny and TLAPS for TLA+ and they all seem focused on proving function behavior, rather than function runtime or memory usage. I suspect there's a way to encode runtime and memory usage theorems, but I don't have a clear conception of the best way of doing so without polluting function definitions.


If you are already familiar with Coq, or even with OCaml/Haskell-style programming, I really believe volume 3 of the SF series is a more polished and more approachable book.

https://softwarefoundations.cis.upenn.edu/vfa-current/index....

I am halfway through the first book (doing ALL the exercises); I hope to finish vol 3 by the end of next year.

This Isabelle book seems to cover more ground, though.


"Functional Algorithms" is a strange topic for me since algorithms seem very imperative, compared to the declarative nature of FP.


If you're into the idea, I'd recommend Purely Functional Data Structures by Okasaki and Pearls of Functional Algorithm Design by Bird, both really interesting books IMO.


Bird's book and its prequel, Thinking Functionally with Haskell, are outstanding.

His approach of calculating algorithms by performing algebraic transformations is very interesting.

It attracted some interest during the 80s and 90s, but it has since faded away, as it's too laborious for a human to follow by hand. I wonder if it could benefit from some automation, e.g. by using modern synthesis techniques.
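
To give the flavor in Haskell (a standard foldr/map fusion, not an example taken from Bird's book):

    -- calculate a one-pass function from a two-pass specification:
    --   sum (map (2 *) xs)
    --     = foldr (+) 0 (map (2 *) xs)      -- definition of sum
    --     = foldr (\x r -> 2 * x + r) 0 xs  -- foldr/map fusion law
    doubleSum :: Num a => [a] -> a
    doubleSum = foldr (\x r -> 2 * x + r) 0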


Computation is the creation and execution of algorithms. Even if an algorithm is expressed functionally, it progresses through a sequence of states when executed.

Check out Computation and State Machines by Leslie Lamport: https://lamport.azurewebsites.net/pubs/state-machine.pdf. Specifically, in Section 3, he shows how iterative and recursive definitions of the same program actually execute the same underlying algorithm, even though their source code looks drastically different.

Syntax is superficial, and declarativeness is really just an illusion, or maybe a convenience. A program steps through a sequence of states no matter what paradigm it's expressed in. Functional programs are still executed.

In this respect, you’re talking about a very superficial aspect of algorithms.


Additionally, lambda calculus, general recursive functions, combinatory logic, Turing machines, and a number of other systems are equivalently powerful[0], so any computable function can be expressed in any of the systems. It's therefore fine to speak of purely functional algorithms or of purely state-machine algorithms, since the statements are equivalent.

[0] https://en.m.wikipedia.org/wiki/Church%E2%80%93Turing_thesis


Different formalisms can describe the same concept, but they don’t always make the same analyses easy.

E.g., one of the main things we do with algorithms is analyze their time and space complexity, and it's often easier to do this in a state-machine model.


It might be more correct to say that a program is the instructions, which lead the machine to transition through some sequence of states.


Any imperative algorithm admits functional mathematical interpretations. While humorous, the observation that programming in C is purely functional[1] isn't wrong. Also consider, for example, that the formalism for the Turing machine, the archetypal imperative automaton, is purely functional.

More seriously, the semantics of an imperative algorithm are perhaps always best understood as mathematical functions. Predicate transformer semantics reduce verifying an imperative program to proving a first-order formula. Leslie Lamport, whom most people doing so-called "formal verification"[2] know for TLA+, is also a shining light in this area of research. The two are observably complementary cognitive tools for the task of writing correct programs, even if one is attracting more grants. The trick is learning to write programs that reduce to formulas that can practicably be solved, automatically or not.

Granted, for many programs that first-order formula might be difficult to solve, but that doesn't change the fact that there exists a functional description of the procedure's semantics, even if the process the hardware actually executes isn't so analyzed[3]. But then that's true for Haskell too. It's just that Haskell and its cousins do a comprehensive job of beating you over the head with the fact that you're reasoning about program semantics and not processes.
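
To make the predicate transformer idea concrete, here is a toy Haskell sketch over a miniature single-variable language of my own devising (loops omitted, since they need invariants):

    data Stmt = Assign (Integer -> Integer)    -- x := f x
              | Seq Stmt Stmt                  -- s1; s2
              | If (Integer -> Bool) Stmt Stmt -- if b then s1 else s2

    type Pred = Integer -> Bool

    -- wp s q = the weakest precondition guaranteeing q after running s
    wp :: Stmt -> Pred -> Pred
    wp (Assign f)   q = q . f
    wp (Seq s1 s2)  q = wp s1 (wp s2 q)
    wp (If b s1 s2) q = \x -> if b x then wp s1 q x else wp s2 q x

Verifying {P} s {Q} then reduces to showing that P x implies wp s Q x for all x.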

[1] http://conal.net/blog/posts/the-c-language-is-purely-functio...

[2] Correctly it should be called automatic verification. Formal reasoning predates the automatic computer and automatic verification is a subset of mathematically rigorous formal program verification.

[3] Not to say it necessarily can't be.


Seems the implementation is Haskell; it would've been much cleaner in OCaml. Apparently it's a grad course in functional algorithms, but I can't even see a time-complexity analysis of heapify.


From section 1.1:

> The programs in this book are written in Isabelle’s functional programming language

From what I can tell, Binomial Heaps also have a runtime analysis. Not sure what you’re referring to with “heapify.”

I would encourage others to ignore this snide comment. I have read other books from the authors, particularly Concrete Semantics, and it’s honestly incredible what can be achieved with Isabelle. Obviously it’s not going to be interesting if formal verification isn’t your thing, but if it is, the content is priceless.


This is not Haskell. Everything is implemented in Isabelle/HOL.


Maybe someone here knows: is there a standard transformation from sequential logic to combinational logic? I'm thinking something like that would immediately give me the 3SAT representation of an algorithm that has an iteration step.


3SAT is not Turing-complete, since every 3SAT instance is finite but Turing machines have infinite tapes. Combinatory logic[0] is Turing-complete, but combinational logic[1] is not, for the same reason.

[0] https://en.m.wikipedia.org/wiki/Combinatory_logic [1] https://en.m.wikipedia.org/wiki/Combinational_logic


Ya, you're right, I've confused complexity with universality. But it's not like there actually exist TMs with infinitely long tapes, so my question still makes some kind of sense: how do I express a finite program (that I constrain to not loop infinitely) as a boolean or combinational circuit?


My understanding is that you don't get a choice. You either allow loops, and accept that they might go infinite, or you axe them altogether. I believe this is a direct corollary of the halting problem, but please correct me if I'm wrong.


The halting problem is solvable for Turing machines with a finite tape. A simple encoding of a finite Turing machine into SAT is as follows:

1) Let n be the size of the tape.

2) Consider all binary strings of length n

3) For each string {A_i}, run the finite Turing machine until it either terminates or repeats a state. Since this model has a finite tape, it is guaranteed to do one of these. If it terminates, consider its output to be 0 prepended to the final tape state: 0|{B_i}. If it loops, consider its output to be the length-(n+1) string containing all 1s.

In this way, the Turing machine is defined exactly as a function f : {0,1}^n -> {0,1}^(n+1).

All such functions can be trivially encoded into a boolean circuit. And all such circuits can be reduced to an instance of 3SAT.

Granted, this encoding is exponential with respect to n, so it isn't useful if you want to reason about complexity, but it does show that this is not computationally impossible, as the general case of the halting problem is.
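
A Haskell sketch of the truth-table half of that argument (trueRows is my name; the function is treated as a black box):

    import Control.Monad (replicateM)

    -- any total function on n-bit inputs is a finite truth table, and
    -- its true rows are exactly the clauses of a DNF formula for it;
    -- this is the (exponential) trivial encoding mentioned above
    trueRows :: Int -> ([Bool] -> Bool) -> [[Bool]]
    trueRows n f = filter f (replicateM n [False, True])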


Thank you! That was really intuitive; it's nice to have a little midnight thought snack before bed. That mapping strikes me as integral-like in its description. I'll read into SAT; any resources you'd recommend for topics like this (or related)?


Does a constructivist model allow for infinite tapes?


Loop unrolling.


That only works if there are no branches




