New Paper: Theory of Programs (bertrandmeyer.com)
63 points by fermigier on July 6, 2015 | 31 comments



> Programming, wrote Dijkstra many years ago, is a branch of applied mathematics.

This is as effective as saying "X is a branch of philosophy", where X is any discipline. It's probably true for every discipline, but what's the point?

Also every X is a branch of physics, because we live in a physical universe and everything is inside it (so even pure mathematics is a branch of physics).

And finally, X is a branch of computer science, because computation and information processing sometimes appear to be more fundamental than everything else. E.g., physics cannot answer the question "are we living in a matrix-like simulation?", but if we really are, the whole physical universe is mere computation happening in some higher dimension.

I like to think of mathematics, computer science, physics, and philosophy as deep and insightful realms of human thought; they have many connections but also many unique attributes.


I disagree, and I think you are interpreting "applied mathematics", "physics", and "computer science" overly literally, rather than as labels for fields of study.

The point in the article is that programming is no longer just applied mathematics; it is also very much a discipline of engineering. Anyone who studies these fields should understand the difference.

You are right that these terms can apply to deep realms of human thought, but I think they are less useful terms under that definition.


If subject X is fundamentally about underlying subject Y, then reasoning using the tools of subject Y might help you to have powerful (fundamental) insights into subject X.

I take your point, nonetheless.


Programming is like mathematics, except all proofs are omitted.


By Curry-Howard, the program is the proof, albeit usually not of an interesting proposition.


Bottom is a member of every type, so all Haskell programs correspond to proofs of trivial propositions. Or that is my understanding anyway. And something similar should be true for common imperative languages.


Not quite. Bottom as a member of every type means that you cannot trust my proof (x : A) without examining it, and that your examination may be non-terminating. That said, if you examine a realizer and terminate, then you have a genuine proof of a potentially non-trivial theorem.

The reason most Haskell theorems are trivial is that, lacking dependent types, we cannot ask for interesting theorems. `3 :: Int` is a perfectly nice proof that "integers exist", which is a little silly.
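
For example, a minimal sketch in plain Haskell (the names here are mine, for illustration):

    -- By Curry-Howard, this is a proof of "A implies A":
    identity :: a -> a
    identity x = x

    -- ...but bottom inhabits every type, so this non-terminating
    -- definition also type-checks as a "proof" of the same proposition,
    -- despite never producing any evidence:
    bogus :: a -> a
    bogus x = bogus x

So a Haskell type signature only becomes a trustworthy theorem once you have run the term to a value.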


In a rich type system, a program is the proof of its own specification. Which is usually as interesting as the program itself.
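
A sketch of what this can look like in GHC Haskell (with extensions standing in for full dependent types; all names are illustrative): a length-indexed append whose type states, and whose definition proves, that the output length is the sum of the input lengths.

    {-# LANGUAGE DataKinds, GADTs, TypeFamilies #-}

    data Nat = Z | S Nat

    -- Type-level addition of natural numbers.
    type family Add (m :: Nat) (n :: Nat) :: Nat where
      Add 'Z     n = n
      Add ('S m) n = 'S (Add m n)

    -- A list whose length is tracked in its type.
    data Vec (n :: Nat) a where
      Nil  :: Vec 'Z a
      Cons :: a -> Vec n a -> Vec ('S n) a

    -- The type is (part of) the specification; the definition is the proof.
    append :: Vec m a -> Vec n a -> Vec (Add m n) a
    append Nil         ys = ys
    append (Cons x xs) ys = Cons x (append xs ys)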


Sure, but as of July 2015 the programming languages (e.g. Agda, Idris) that can give full specifications of their own programs are experimental, not mainstream.


«Which is usually as interesting as the program itself.» And ten times as hard to write. I'm not just saying that: I spend most of my time in Agda, and even simple things are rather difficult. But I think this will improve as we learn the right way to look at things.


Here's an interesting one:

Mathematicians think programming is mathematics. Programmers think they're simulating the mind of a mathematician!


A program is a proof of the result (theorem) computed from the inputs (axioms) supplied.


At least since 1935 [1, 2] people have been trying to formalize all of mathematics with set theory. I always thought this was fairly interesting, similar to how the real numbers can be modeled in a dependently typed language like Coq.

However, I always found Hoare logic, and its concurrent extension rely-guarantee from Jones, to be quite easy to understand. The more interesting part is how to do this automatically for a user. Abstract interpretation is one way to do this, but it necessarily requires mapping some programming language to your mathematical model. However, determining the formal semantics of mature programming languages, even C, is still open research (e.g., see the semantics papers at PLDI 2015 [3]).
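
For readers who have not seen the notation: a Hoare triple {P} C {Q} claims that if precondition P holds and command C terminates, then postcondition Q holds afterwards. A standard textbook sketch (the variables are illustrative):

    {x = N}  x := x + 1  {x = N + 1}      -- assignment axiom instance

    {P} C1 {Q}    {Q} C2 {R}
    ------------------------              -- sequential composition
        {P} C1; C2 {R}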

TLDR: verification is hard.

[1] https://en.wikipedia.org/wiki/Nicolas_Bourbaki

[2] Goldblatt, Robert. Topoi: The Categorial Analysis of Logic. http://digital.library.cornell.edu/cgi/t/text/text-idx?c=mat...

[3] http://conf.researchr.org/track/pldi2015/pldi2015-papers#pro...


    At least since 1935 people have been trying to formalize all of mathematics with set theory.
Cantor invented set theory for this purpose in the 1870s. By 1935, Goedel's incompleteness theorems had shown the limitations of this (and any other) foundation of mathematics.

Hoare logic is not a foundation of maths, but an approach towards specification and verification of programs. As such it is in the same space as Curry-Howard based programming languages with dependent types, like Idris and Agda. The main differences between the two are that (1) Hoare logic is not restricted to constructive reasoning, and (2) Hoare logic is not integrated with the programming language, so programming and proving are distinct activities. Depending on your philosophical outlook, either (1) or (2) or both can be seen as a major advantage or disadvantage.

    how to do this automatically
What do you mean by "this"? Automatically specifying program semantics, or automatically proving program properties?

   determining the formal semantics ... is still open research 
This is somewhat misleading. Every compiler gives a perfectly formal semantics to the programming language it compiles. What you probably mean is something like nice abstract accounts of memory models for C that at the same time capture all the optimisations modern C compilers want to do, while still being implementable on existing CPUs. That is indeed an open problem [1].

[1] https://www.cl.cam.ac.uk/~pes20/cpp/c_concurrency_challenges...


    What you probably mean is something like nice abstract accounts of memory models for C that at the same time capture all the optimisations modern C compilers want to do, while still being implementable on existing CPUs. That is indeed an open problem [1].
True, although CompCert [1] is a nice effort toward that goal: a formally verified C compiler that covers almost all of C99, with many optimizations implemented and proved sound.

[1] http://compcert.inria.fr/compcert-C.html


CompCert uses tame optimisations and compiles to mainstream architectures, nothing exotic. Most of all, CompCert does not deal with concurrency as far as I'm aware, and concurrency is where memory models matter.

The problem with memory models is not so much verifying them in a mechanised way, but inventing something suitable at all.


    The problem with memory models is not so much verifying them in a mechanised way, but inventing something suitable at all.
Agreed.

I'm aware of some work done a few years ago by people working on weak memory models and extending CompCert with some concurrency primitives: http://www.cl.cam.ac.uk/~pes20/CompCertTSO/doc/


Thanks for the interesting history on set theory. However, if you read [1] or the first chapter of [2] from my reply, you'll see that the strides to formalize mathematics started in 1935, so something does not add up.

I did not intend to say that Hoare logic is a foundation of mathematics; this was unclear in my post since I abruptly changed topics. However, I would be delighted to read an attempt to do just that.

What I intended to say is that the posted paper is basically a formulation of Hoare logic using set theory.

By this I meant both specifying program semantics and proving program properties. As in, the automated construction of Hoare pre/post annotations, or, for concurrency, rely-guarantee annotations. To me, this brings languages without dependent types closer to similar kinds of correctness guarantees.

Thanks for the clarifications on semantics. You hit the nail on the head.


I went ahead and skimmed the first chapter of [2] (again; I own it) and don't see it justifying the claim that this stuff began in 1935. Indeed, it points out that a major difficulty of the program to formalize all of math in set theory, Russell's Paradox, was discovered in 1901, and then immediately notes that set theory itself was begun by Cantor decades before.

It also then mentions how set theory was formalized and fixed through the introduction of formal classes, which resolved the size concerns discovered by Russell by outlawing them. (Russell himself proposed a system which did exactly this, his "ramified hierarchy", almost immediately after discovering his paradox; it is the foundational model of the Principia Mathematica, published in three volumes in 1910, 1912, and 1913.) These classes were in use by the mid 1920s.

In fact, the first mentions of dates in the 1930s occur with Hilbert's finitist attempts to prove consistency and Godel's destruction of that program.

What you're probably referring to is Bourbaki's work. This wasn't an initial attempt by any means, but instead a broad-reaching program to popularize the methods of sets that had been previously established. This was the moment of catastrophic phase change, when all old mathematical concepts were suddenly shown to be suspect if they could not be axiomatized set-theoretically, so it is certainly when the bulk of the work occurred.

But all said, it is certainly the case that the set-theoretic foundation of mathematics was an 1870-1930 affair.


What I couldn't find was an answer to the question "why?"

I could see this being an interesting basis for teaching Computer Science in high schools, where a text like this functions in a way similar to Euclid in high school Geometry courses. I'm pretty sure there aren't any non-exceptional examples of high school Computer Science courses (e.g., the AP CS course is better described as an Intro to Computer Programming course).

And in any case this is a nice exposition.

But aside from that, I'm not sure I see any new insights here about CS/verification, nor any suggestions for research directions that aren't already extensively explored. Perhaps I'm missing something, though.

(Edit: There's a list of suggested future work at the end of the paper. I guess I get it now; although all of these things have been done in verification/PL -- and even by non-type-theorists -- they almost always involve the development of a new logic, and aren't done in pure set theory. So certainly there's a lot of work to do if you want to do things in this style. But I'm still trying to see the benefit of this style, aside from pedagogic or philosophical benefits. Is my inexperience in this area blinding me from some obvious potential? I don't know much about non-high-school set theory.)


Building a unified framework for all of programming theory is useful, even if it provides no new insights. It provides a clearer basis for thinking about what we already know, and thereby makes the new insights easier. So even if the new insights aren't here yet, it makes it more likely that they eventually come.


I guess my question is, why set theory, as opposed to building [new] logics on top of other semantic models (e.g. operational semantics or reachability relations)?

(I don't doubt there are compelling reasons; I just don't know enough about set theory or programming theory to know what they are, other than the clear benefit of this approach over others in "elementary" educational settings, e.g. US high schools.)


Lowest total cognitive load?

That is: If I build my theory on a complicated foundation, then you have to learn the complicated foundation before you can even start to learn my theory.

On the other hand, if I build it on a simple foundation, but that simple foundation means that the theory itself has to jump through a bunch of hoops because the foundation is too simple, that can also make the total (foundation + theory) harder to learn and understand.

So the sweet spot is to use the simplest foundation that does not unduly complicate the theory. (And that may change, depending on target audience.)

Is set theory the best answer? I have no idea, but all of programming in 28 pages, built on a foundation only of set theory, is very impressive.


Thanks for your responses. I think I'm expecting interesting and impactful results out of an exposition/proposal, which is probably unreasonable.

> but all of programming in 28 pages, built on a foundation only of set theory, is very impressive

Yes, it is :-)


Anyone find the definition of `functional` a little bit odd?

Is there a reason why a "no-op" or identity function should be considered imperative? Or why a program must be imperative if at least one post-execution state is also a valid pre-execution state?


Agreed. Some definitions are really bad. I guess the cause is noble, but it is executed very poorly. (The definition of functional is also used nowhere in the document, although it is mentioned.)

Another oddity: "object oriented" programs are defined as functions {0,...,n} -> O, where O is "a set of objects". Yet a program is defined to be an endo-relation S <-> S; accordingly, we have O = {0,...,n}.

I don't see how this definition gives any insight on the nature of object oriented programming.

Now for the strangest thing: the definitions of "object oriented" and "procedural" involve differentiating between possible candidates for S. Yet the only property a set without any context really has is its cardinality.

So we can rephrase the definitions of "object oriented" as follows:

An "object oriented" program is a set S together with a total function S -> S with S finite. ([total function] AND [S finite])

Negating, we arrive at the definition of "procedural program":

A "procedural" program is either a relation S <-> S that is not a total function (no restrictions on the cardinality of S) OR any relation S<->S but S must be infinite. ([not a total function] OR [not S finite])

I don't see how this definition captures the notions of object oriented programming and procedural programming in any meaningful way.

But maybe the author will appear here to clarify his ideas.


Just skimming the definitions, they seem more than a bit naïve. A relation is really a more general thing than a program, since it need not be computable in any way.

The real mathematical models of programming are not all that difficult to understand. The most famous, the Turing machine, is really just a finite set of states and a function on it. The complications which arise later are real complications, describing things such as side effects and complexity; it's not trivial stuff!
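
To make that concrete, here is a minimal sketch in Haskell (the representation and names are mine, not from the paper): a single Turing machine step is just the transition function applied to the current configuration.

    data Move = L | R

    -- One step of a Turing machine. A configuration is
    -- (state, left tape reversed, symbol under the head, right tape);
    -- `blank` pads the tape when the head walks off either end.
    step :: s                          -- blank symbol
         -> ((q, s) -> (q, s, Move))   -- finite transition function
         -> (q, [s], s, [s])           -- configuration
         -> (q, [s], s, [s])
    step blank delta (q, ls, x, rs) =
      case delta (q, x) of
        (q', x', L) -> let (l, ls') = pop ls in (q', ls', l, x' : rs)
        (q', x', R) -> let (r, rs') = pop rs in (q', x' : ls, r, rs')
      where
        pop []       = (blank, [])
        pop (y : ys) = (y, ys)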


Meyer is trying to move up one level of abstraction; the `programs` he is talking about are synonymous with `specifications`, so right away those words are being used in an atypical fashion.

Calling them `progspecs` would be gross and newspeak-y and calling them `program specifications` would be verbose. Regardless, the theory of these `programs` may indeed be useful.

Also, he's not saying that all relations are progspecs, but defining that progspecs are pairs of (relation, set).
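
In Haskell-ish terms, that reading is roughly the following (a sketch with hypothetical names; the paper itself works in bare set theory):

    import Data.Set (Set)

    -- A relation on states: a set of (before, after) pairs.
    type Relation s = Set (s, s)

    -- A Meyer-style "program": a relation paired with a set of states.
    type ProgSpec s = (Relation s, Set s)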


Yes: the theory of relations/Chu spaces is used in computer science.

See for example [ http://chu.stanford.edu/guide.html ]


Mathematical definitions are expected to be complete, in that if I define an A to be a pair of a B and a C, with no further restrictions, I will accept any such pair as an A.

As an example of a more abstract way of looking at programming, take a look at domain theory. A domain is a set with a relation on it. But as opposed to the current topic, there are more restrictions, making a beautiful theory that captures things such as lazy evaluation.


I'd say formalizing programming is desirable, but harder than you would suspect. I'd take a look at denotational semantics and Homotopy type theory for some existing formalisms.



