Coming Soon: Machine-Checked Proofs in Everyday Development (ccc.de)
185 points by fuklief on Dec 30, 2017 | 75 comments



My current reading list:

- Fundamental Proof Methods in Computer Science

- Handbook of Practical Logic and Automated Reasoning

- Software Abstractions

- The Little Prover

Yes, currently I have Athena, OCaml, Alloy, ACL2 (via Dr Racket/Dracula) all up and running to work through the exercises in these books. I'm very skeptical anything will come of this, but it interests me and I can see the value.

Hopefully the speaker's next book Formal Reasoning About Programs (https://github.com/achlipala/frap) will be a nice next step, I have Certified Programming with Dependent Types but have not cracked it yet. Also keeping an eye on Verified Functional Algorithms (https://softwarefoundations.cis.upenn.edu/vfa-current/index....).



+1 for Software Foundations


> The Little Schemer

Do you mean The Little Prover?


Yes! Thanks.


Mathematical proofs of code are not suitable for most systems.

Even unit tests which don't involve proofs can be a problem sometimes because they lock down API inputs and outputs. It's already a major problem when a developer wants to change some code and has to spend hours updating unit tests every time. In small teams, heavy 100%-coverage unit testing slows down productivity, possibly by 5x or 10x.

I imagine that adding proofs as part of the unit test suite would further increase the productivity cost overhead by a massive factor.

In software development, you should avoid treating the code as if it's special or perfect, because it's not, and it will need to be changed in the near future when requirements change (and they will always keep changing). So locking it down into a specific state with proofs all around it is a bad idea for the vast, vast majority of cases.

We need more wisdom in software development, and that means taking a step back, looking at the big picture, and asking questions like: what are the negative side effects of each new methodology we introduce into our projects? There are always downsides, and we're crazy to pretend that every fancy new methodology is a silver bullet.


Test primarily against external APIs. Those should be stable against change anyway, so that external consumers can rely on them.

Generally don't test against internals much, except if an area is shown to be buggy and hard to exercise externally.

Prefer (well-tested) libraries for generic functions/algorithms. If you have to implement one, put tests on it as if it were an external API. Consider publishing it independently to make it external. If needs change, just drop the dependency.


It's hard to know the intent of a developer by reading their code. You could always add a bunch of comments, which would help, but in the long run you risk them getting out of date.

My unit tests are basically "comments" for my code that show me how the code is supposed to work. When I make changes, the tests will tell me if I broke the intentions of the initial developer.

Sure, when doing big changes, just updating and maintaining the tests is a lot of work, but so would maintaining documentation and comments be. So we have less documentation and fewer comments, and more tests.
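
To make that concrete, here is a minimal sketch of the idea (slugify and the tests are invented for illustration); the test names read as documentation of the original intent:

    import re

    def slugify(text: str) -> str:
        # Hypothetical helper: lowercase, dash-separated, URL-safe slug.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # The tests double as executable "comments" about the intended behaviour:
    def test_slugify_lowercases_and_joins_words_with_dashes():
        assert slugify("Hello World") == "hello-world"

    def test_slugify_strips_characters_that_are_unsafe_in_urls():
        assert slugify("50% off!") == "50-off"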


I find integration tests more useful for documentation because they show you how the system works at a higher level which is precisely what new developers need to understand.

If the code itself is written properly then the individual classes and objects used throughout should have high cohesion and be easy to read without needing any comments or tests.

There are always cases where unit tests are important though like the parts of the code that deal with money.

But there is no benefit in adding unit tests if you're building a standard chat system for example; integration tests are sufficient in that case.

Also getting 100% test coverage is a complete waste of time if you don't actually test the right kinds of input and output parameters.


> But there is no benefit in adding unit tests if you're building a standard chat system for example

Disagree. Finding the right amount of unit testing is an optimization problem. Some amount is nearly always useful.

For starters: Integration tests will not necessarily exercise a full range of inputs for the low-level functions. For a chat system, I would think hard about how to test low-level functions with a wide range of Unicode input.

A chat system must survive in the harsh environment of the public internet. It needs to be hardened against unexpected user input, either incidental or malicious.


>For starters: Integration tests will not necessarily exercise a full range of inputs for the low-level functions. For a chat system, I would think hard about how to test low-level functions with a wide range of Unicode input.

I don't see any problem at all with writing integration tests that exercise a chat system with a wide range of unicode input.

I'd view it as irresponsible not to, in fact.


Low-level functions may be reused in combinations which limit what gets fed to them by higher-level tests. Perhaps there are bugs in some text processing widget, which only happen on small blocks -- and small blocks only get fed to low-level functions on disconnection from the service, which would be too costly to repeat exhaustively in integration tests.

The most reliable way to address this kind of problem is to match the tests to the function. Look at what each function does, anticipate edge cases, contemplate the core purpose, then write what you believe are an appropriate number of unit tests.

You can choose not to prioritize the problems which are revealed by unit testing but not integration testing, but you can't say that such problems don't exist.


>on disconnection from the service, which would be too costly to repeat exhaustively in integration tests.

Um, disconnection from the service is the exact kind of scenario I'd want to repeat exhaustively in an integration test.


This reply doesn't address the principle that there must be issues which can be addressed using unit tests but not integration tests. Even if you take every example I give and claim "doesn't count, I'd test for that", you're just writing unit-tests-once-removed using an integration testing framework.


>This reply doesn't address the principle that there must be issues which can be addressed using unit tests but not integration tests.

Well, that's asking me to disprove a negative. All I can say is that I don't think that principle is true.

One counter-example of a scenario that is impossible to integration test but possible to unit test would be sufficient to disprove my hypothesis, but I don't think it exists and I don't think you can provide it.

>Even if you take every example I give and claim "doesn't count, I'd test for that", you're just writing unit-tests-once-removed using an integration testing framework.

I'm really not. I can test disconnection scenarios in an integration test without even coupling to the language under test by running a service that intercepts network traffic. It's obviously not a "unit test once removed" if I can literally swap out the entire language under test and not have to change the test.


I already provided such an example: the disconnection. The key aspect of that theoretical example was that it would be "too costly to repeat exhaustively in integration tests". You simply wished away that trait, avoiding the issue.


No, I can test that.


I think a part of the difficulty is in the language of coverage.

"100% test coverage" usually means statement coverage due to the nature of popular tooling.

If you deeply care about correctness you are usually more interested in path coverage which is more tied to things like heap shape or state space... (which depending on your system design might encompass stuff in your database). Also yeah, technically the exceptions which can be thrown at each point in your software if you work in a language with those.

Unfortunately there's not much tooling for that so it's not visible in the same way as the more naive metrics.

Anyway, my point is that you can waste time pushing for "100% statement coverage" as measured by naive tools but still not exercise important paths, which I think is your point.
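
To illustrate the gap (a minimal sketch; apply_discount and the test are made up):

    def apply_discount(total: float, loyal: bool) -> float:
        discount = 0.0
        if loyal:
            discount += 0.10
        if total >= 100:
            discount += 0.05
        return total * (1 - discount)

    def test_loyal_big_order():
        # This single test already gives 100% statement coverage...
        assert apply_discount(200, loyal=True) == 200 * 0.85

    # ...yet three of the four paths (loyal/small, new/big, new/small) never ran,
    # so a bug that only shows up for, say, loyal customers under 100 would go
    # unnoticed. Path coverage counts those combinations; statement coverage doesn't.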


I want to clarify that the whole point of unit tests is to have 100% coverage, that all code paths are exercised by the tests. If you do unit testing and have code that is not covered it's either a bug or dead code that should be removed.


Integration tests are much more brittle in my experience. A simple frontend/backend integration test driven by WebDriver can easily break due to race conditions or changes in other parts of the system. Such tests are great for detecting that something is wrong, but it can be very hard to figure out what's wrong and where things went wrong.

Had a lot of "fun" rewriting a bunch of integration tests for Angular, because the framework used a lot of Promises, which caused race conditions. Those tests were much harder to write than unit tests and were much easier to mess up.


They're not necessarily brittle but I know what you mean. When the integration tests reach across multiple services then things tend to get brittle but this can be fixed by stubbing out the external services.

Stubs are not just useful for unit tests. Service stubs are often much easier to write than module stubs because services often communicate via string based protocols which are easy to simulate.
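
A minimal sketch of that kind of service stub in Python (the route and payload are made up; the point is that speaking HTTP/JSON is enough, with no knowledge of the real service's internals):

    import json
    import threading
    from http.server import BaseHTTPRequestHandler, HTTPServer

    CANNED = {"/v1/profile": {"id": 42, "name": "test-user"}}  # hypothetical endpoint

    class StubHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = json.dumps(CANNED.get(self.path, {})).encode()
            self.send_response(200 if self.path in CANNED else 404)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # keep test output quiet
            pass

    def start_stub():
        server = HTTPServer(("127.0.0.1", 0), StubHandler)  # port 0 = pick a free port
        threading.Thread(target=server.serve_forever, daemon=True).start()
        return server  # point the system under test at server.server_address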


>Integration tests are much more brittle in my experience.

In my experience brittleness in tests is caused by bugs in tests or bugs in code.

Most of these things can be fixed, e.g.

* Add an ORDER BY to SELECT statements that otherwise return data in an indeterminate order.

* Pin the interpreter/database version you use to run your code rather than relying upon whatever version the package manager has installed.

* Stub out code that gets dates/times/random numbers (see the sketch after this list).

* Stub out external APIs which go down on Tuesdays.

* Fix the bugs in your code.

* Abandon the code you rely upon which is buggy (I'm just sayin', Angular might not be the ideal framework).
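
A minimal sketch of the date/time/random stubbing point, assuming pytest and its monkeypatch fixture (report_id is a made-up function under test):

    import random
    import time

    def report_id() -> str:
        return f"{int(time.time())}-{random.randint(0, 9999)}"

    def test_report_id_is_deterministic_once_time_and_random_are_stubbed(monkeypatch):
        # Pin down the two sources of non-determinism the function depends on.
        monkeypatch.setattr(time, "time", lambda: 1700000000)
        monkeypatch.setattr(random, "randint", lambda a, b: 42)
        assert report_id() == "1700000000-42"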

Avoiding writing these tests because you know that "they'll be too buggy" doesn't exactly send out a great signal.

>Such tests are great for detecting that something is wrong, but it can be very hard to figure out what's wrong and where things went wrong.

This is alleviated by integrating debugging tools into your test environments.


Recently I found a get_info(doc) which was hard to understand. It should have been get_xsrf_token(web_page), which would have been self-explanatory.

I mostly develop backends of web apps. In my experience, not every web app has code that needs to be explained. Picking the correct names is usually enough.

That said, unit tests really do serve the purpose you describe: providing examples, use cases and guidance.


I find the best approach is to keep the code base small enough that you can keep everything in your head: knowing that if you change something you also have to update this and that, and make sure features x, y, and z still work (which could be done with automated tests).


The presentation addresses this in a way. The argument is that in Coq you can write proof _scripts_ ("repeat this simplification operation until it does nothing", "pattern match on this definition and apply this operation to every subcase", "proof tactics" and so on), and in this way the same proof script can prove different theorems.

So there is a possibility that changing the code only forces you to change theorem statements (that is, the spec), allowing reuse of the proof script itself. Now you are doing software engineering on proof scripts, and you face the same generality/flexibility problems you would face when writing unit tests. A well-written proof script would be flexible enough, and a hacky one would lock things down too much.
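
To make the "same script, different theorems" point concrete, here is a minimal Coq sketch (the lemma names are mine; both statements are standard list facts): an identical tactic script discharges both, so if the spec changed from one statement to the other, the proof text would survive.

    Require Import List.
    Import ListNotations.

    Theorem my_app_nil : forall (A : Type) (l : list A), l ++ [] = l.
    Proof. intros; induction l; simpl; [ reflexivity | rewrite IHl; reflexivity ]. Qed.

    Theorem my_app_assoc : forall (A : Type) (l m n : list A),
      (l ++ m) ++ n = l ++ (m ++ n).
    Proof. intros; induction l; simpl; [ reflexivity | rewrite IHl; reflexivity ]. Qed.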


That part made me wonder if a day will come when some generic script would be able to work with any theorem on any code, maybe with the help of some kind of introspection (at this level of abstraction I'm not even sure introspection is the right concept).


Short answer: no. Long answer: https://en.m.wikipedia.org/wiki/G%C3%B6del's_incompleteness_...

There cannot be a program that can prove (or even help a human to prove) any true statement about programs written in a Turing-complete language.


As always with this theorem, it's pretty hard to be sure whether the kind of non-provable statements it describes happen often enough to be a concern in practice.


But most code isn't using the features that make Turing-complete languages messy, i.e. evolving the executable code as execution progresses. Most real-world code exists in a subset of Turing-complete languages where generic theorem proving could probably work.


Re-think your assumptions about "mathematical proofs". Compiling software with types is a mathematical proof. We get certain guarantees "for free" without even thinking about it. It's immensely useful and flexible. We can build better type systems than are in common use. So much more is possible than just making sure we don't stick an int where a string is supposed to go. Moving forward is about raising the bar on what we get "for free".

We have a lot of experience building programs that help us program (e.g. compilers, IDEs, build systems). We can move forward without falling into traps (low performance, high programmer knowledge required, overly rigid, more work required than gained in productivity).

There are no silver bullets but we can continue to improve.


> Mathematical proofs of code are not suitable for most systems.

It depends on what kind of math and what it's doing, though. Systems like Coq give you much more trustable code to put in your system, and higher degrees of trust for clients. But systems like Idris actually let you write code and use the proof engine to write trustworthy code.

Most of that "math", from the perspective of a programmer, is a small subset of graph theory, formal logic, and category theory useful for proving relationships and reflexivity so that the machine can help you.

> Even unit tests which don't involve proofs can be a problem sometimes because they lock down API inputs and outputs.

This is a much bigger problem than "hand written mocks are bad" (which they are, I agree). But it's quite out of scope to complain that, because API endpoints are often sloppily deployed without versioning or introspectability, all formal methods are wrong.

What's more, and I really want to push this point so I'll emphasize it: quickly producing broken code helps almost no one. If you're an artist playing with making a synthesizer whine? Maybe okay. If you're writing software intended to face the world and do real work? We can't just keep pretending we're all smart enough to do this without mechanical aid; we're NOT smart enough to do this without mechanical aid anymore. Programs integrate too many libraries and APIs.

> I imagine that adding proofs as part of the unit test suite would further increase the productivity cost overhead by a massive factor.

With Idris, the tooling can actually write code for you. I've been learning it over the last 2 weeks and I'm totally fascinated by how much drudgery it takes out of total programming.

> In software development, you should avoid treating the code like if it's special or perfect

This is precisely what formal methods do! You challenge every piece of code you can and say, "Prove to me that what you've written doesn't have an obvious flaw!" You rewrite until both the human and the proof engine cannot see anymore. It is not sacred at that point (no one believes this), but at least you've acknowledged that you can write errors and started to mitigate them.

> There are always downsides, and we're crazy to pretend that every fancy new methodology is a silver bullet.

Actually, the entire industry has started to demand that developers do a lot more computation on their end to make better software. The "prove more of your program and write total code" approach is just one side of the "do more work on your end" school. Another is, "Your software is in fact the output of this algorithm training from a dataset on 4 GPUs." We probably need a similar step forward for APIs (where APIs are invalid and unusable without a spec that governs them).

It is a change, but most commercial software engineers are handling user data that can be leveraged in disastrous ways every day. As you say, we need the wisdom to realize that it's difficult to guess the use case or security impact of the code we write on a daily basis, and we need to start taking that fact seriously. These heady days of being able to deny responsibility and shove out any software we want without consequence need to end.


While automated reasoning tech is getting better, the barriers have been more on the demand side. Until buggy vulnerable software makes you poor and visibly correct software makes you rich, we'll keep getting the vulnerable sort. IOTA, which rolled their own hash function and then blustered about the problem, right now has a market value of ten billion dollars.

We're starting to see a change in the economics. There's quite a way to go -- I hope we get there intelligently and not as some kind of thrashing reaction to giant disasters.


In one sense, the repeated and total fiascos of Ethereum and IOTA (which has a hell of a lot more problems than just an ad hoc hash function) are good because they make the costs of our industry's lazy and self-indulgent "artist" approach to software very obvious.


I don't think those two projects belong in the same category. I got into Ethereum tech in 2016 and found it so immature that I sold most of my ether. I'd expected problems and was still disappointed. But it looks to me like Vitalik was brilliant and inexperienced then, while IOTA has basically nothing to recommend it. I'm not looking to start another cryptocurrency argument, just noting my disagreement.

Yes, real money in smart contracts is part of the economics shift I brought up.


A fiasco is a fiasco. Vitalik is still around because he at least owns his fiascos. The IOTA folks probably won't be at this rate.


>But it's quite out of scope to complain that because API endpoints are often sloppily deployed without versioning or introspectability, that all formal methods are wrong.

It's not so much that they're "all wrong", but when the majority of bugs you face are not logical or algorithmic in nature but driven by incorrect specifications or problems dealing with the outside world (including but not limited to sloppy APIs), then formal methods will impose a stupidly high cost for almost no benefit.

To take the famous Mars Climate Orbiter bug as an example - had the software been formally proven then it still would have gone down in a ball of fire because the software was built upon a faulty presumption. An integration test that exercised both the data and the code in a realistic environment would have saved it though, as would better spec communication between the two teams (e.g. some form of BDD). The chief investigator, Arthur Stephenson, even explicitly called out the lack of sufficient end to end testing.

I'm sure formal methods are great for proving that, e.g. a particular sorting algorithm always works, but frankly, sorting algos are the type of thing I import and just assume works and I've yet to see a bug in the wild that invalidates that assumption, whereas a faulty interaction between two poorly specified modules is a problem I see in teams on a near daily basis.


> It's not so much that they're "all wrong", but when the majority of bugs you face are not logical or algorithmic in nature but driven by incorrect specifications

But automation could verify this. Unit tests don't do anything if they aren't run. Specifications don't do anything if they aren't checked. Do you have CI/CD? Use it!

> then formal methods will impose a stupidly high cost for almost no benefit.

Except that lots of business logic errors, resource handling errors, and mathematical errors still exist in modern code at all levels. And you're also assuming the "cost" is "stupidly high" which people in this thread are telling you is false.

Finally, please forgive me for being so blunt... If you're delivering code you haven't tested to hit a timetable, you're part of a scam. People don't want broken code. People aren't buying security holes. People aren't asking for things that almost work but then dump user data by the score. No one wants this, but developers like us keep insisting it's fine because they're not personally responsible.

I wish that you or I could be fined if we didn't show due diligence. It'd be good for the industry.

> To take the famous Mars Climate Orbiter bug as an example - had the two individual modules of software been formally proven then it still would have gone down in a ball of fire.

Cool, except that the module integrating these, if validated, would have caught this. Which brings me to my second point:

> An integration test that exercised both modules would have saved it though,

I'm not sure why you think using a computer to write proofs about your software somehow magically excludes integration tests. Excluding integration testing is something humans already do well.

If anything, running a type checker and using pre-existing proofs on a new module that integrates them is much more than most people do.

> as would better spec communication between the two teams (dare I say it? BDD).

The irony here: BDD is a less formal but similar technique to writing proofs. BDD could be integrated directly into the development pipeline (as it is with Idris) and the compiler could tell you when you missed a spot, or forgot to handle something.

It seems you only dislike it when the compiler does it.

> I'm sure formal methods are great for proving that, e.g. a particular sorting algorithm always works, but frankly, sorting algos are the type of thing I import and just assume works

What is the difference between checking if a program properly closes a socket or if a sorting algorithm works? From a proof checker's perspective: it's very little.

> sorting algos are the type of thing I import and just assume works

And this is an example of the mentality that ensures that despite the fact that we've seen the O(n log n) speed limit for sorting shattered, no one knows or cares.

> whereas a faulty interaction between two poorly specified modules is a problem I see in teams on a near daily basis.

Why don't you want to make it easier to write better modules, and make it easier to verify the integration of modules? "It's hard so I won't even try," seems to be your reasoning here despite several people saying it's possible.


>But automation could verify this.

Yes, that's what I was arguing.

>Except that lots of business logic errors, resource handling errors, and mathematical errors still exist in modern code at all levels. And you're also assuming the "cost" is "stupidly high" which people in this thread are telling you is false.

My experience with formal proofs is that they impose a high cost. I don't think that there are people in this thread who believe that you can impose formal proofs on a system without incurring a significant cognitive overhead (and possibly other costs too). If they exist, they are simply wrong.

>I'm not sure why you think using a computer to write proofs about your software somehow magically excludes integration tests.

I'm not saying that it does. I'm saying that the kinds of bugs unveiled by formal methods are, in most applications, rare and obscure and the bugs unveiled by integration tests are, in most applications, common and important.

When I have limited resources (everybody has limited resources), I tend to take the approach that has the best ROI in finding bugs. For most applications, that isn't formal proofs.

>Cool, except that the module integrating these, if validated, would have caught this.

You still aren't getting it. The module was built upon faulty presumptions. Peter Norvig himself said that no language, no matter how type strict, would have enforced this convention.

>The irony here: BDD is a less formal but similar technique to writing proofs.

I'm sorry, but writing user stories on the back of a napkin (yes, that's BDD too) isn't a similar technique to formal proofs in any way, shape or form.

BDD is simply a means of communicating example driven specs that may or may not be turned into executable user stories.

Automated tests (a common side effect of BDD) are not really similar to proofs either.

>What is the difference between checking if a program properly closes a socket or if a sorting algorithm works?

The likelihood of bugs in the definition is significantly higher the less abstract your code gets. To wit: maybe your presumption that the socket needs to be closed is wrong.

>Why don't you want to make it easier to write better modules, and make it easier to verify the integration of modules?

I absolutely do, that's why I use the right tool for the job.


> My experience with formal proofs is it imposes a high cost.

What is that experience? Is it from within the last 5 years? Does it include Idris, Agda, and Coq?

> I don't think that there are people in this thread who believe that you can impose formal proofs on a system without incurring a significant cognitive overhead

The idea is that by asking humans to make small proofs, the remainder of the majority of additional cognitive overhead can be assumed by the machine.

This conversation has a curious pre-assumption that this cognitive load is "new". In fact, it's not. It's an alternative approach to working with the complexity already apparent in our domain.

> You still aren't getting it. The module was built upon faulty presumptions. Peter Norvig himself said that no language, no matter how type strict, would have enforced this convention.

Why is Peter's comment relevant here?

> I'm sorry, but writing user stories on the back of a napkin (yes, that's BDD too) isn't a similar technique to formal proofs in any way, shape or form.

It is if you care about the results. A lot of modern BDD process has humans as verifiers. And no one's saying that has to go away. But we might need it LESS for certain parts of the system.

Certainly in my user-facing fintech applications I have wished for tools that could do this. Sadly a lot of my work in that field predates the usability of a lot of these tools.

> I absolutely do, that's why I use the right tool for the job.

And the right tool for the job is evidently not a computer?

What I think you're missing is that these formal proof systems come with an implicit power: the ability to extend a more generic type system at build time to reason about your domain.

This application is as radical in nature as it is classical. Everyone in every domain has suggested for half a century that we work to bring our code closer to our domain and further from the minutiae of machine implementation. Modern systems seek to do this by creating stable abstractions to build upon.

This takes 2 shapes: trained by data (classifiers, regressions, and statistical methods) or refined by logic (proofs, dependent types, type inference and verification). The line between these types is somewhat transient, and over time it's almost certain it will blur. Already we see F# using "Type Providers" that let you use samples derived directly from an API (or specs, or both). These can be obtained as part of the build process, and then the program can be checked against the spec.

There is only one real downside to this approach: it makes development require even more computational power. But given that we're solving a much harder problem than the users of our software, the asymmetry seems reasonable.

I think you and I agree in the general strokes but you don't like the word "proof", if I'm being honest. I don't know why that word bothers you so much. Did you watch the video? Have you tried Agda, Coq and Idris?


Here's my hypothesis: The problem with these static verifiers is that they are applied at the wrong stage. The verification logic would be more useful in more cases if it were applied at the optimization stage.

Right now the benefit of static verifiers is rather binary. Either your program (or code unit) is successfully verified or it isn't. And every time you make a change to verified code you risk losing its verified status.

Imagine instead writing code that is made "safe" using run-time asserts everywhere. Unoptimized, this code might be slow. And unreliable in the sense that a run-time assert could fail, requiring some kind of recovery or abort action. But it would still be "safe".

The run-time asserts are effectively the "program specifications", and the discarding of those asserts at the optimization stage is basically equivalent to traditional static verification. The difference being that not quite fully optimized "safe" code is generally more valuable than not quite fully verified (and therefore not fully "safe") code.
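
Python happens to have a crude built-in version of exactly this split: assert statements run normally, and the -O flag compiles them out. A minimal sketch (binary_search and its contract are illustrative only):

    def binary_search(xs: list, target: int) -> int:
        assert xs == sorted(xs), "precondition: input must be sorted"
        lo, hi = 0, len(xs)
        while lo < hi:
            mid = (lo + hi) // 2
            if xs[mid] < target:
                lo = mid + 1
            else:
                hi = mid
        assert 0 <= lo <= len(xs), "postcondition: valid insertion point"
        return lo

    # python app.py     -> the asserts act as a run-time specification
    # python -O app.py  -> the asserts are discarded, analogous to dropping checks
    #                      once (or if) they have been verified statically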

> These heady days of being able to deny responsibility and shove out any software we want without consequence need to end.

If the static verification community is imploring the development community to, with some effort, change its practices to improve code safety, would that community be willing to, with some effort, move the application of their verification logic to the optimization stage where it would be more practical for a wider range of applications?


> Here's my hypothesis: The problem with these static verifiers is that they are applied at the wrong stage.

They can be applied at lots of stages. Why not apply them to all of them?

> Either your program (or code unit) is sucessfully verified or it isn't.

This is demonstrably false. It can help you write better, safer code incrementally. Every approach that is discussed here has a "refinement" plan.

> Imagine instead writing code that is made "safe" using run-time asserts everywhere.

I can imagine this, it's the C/C++ style of code circa 2010. It's awful for users. But it's not what we're discussing here.

> If the static verification community is imploring the development community to, with some effort, change its practices to improve code safety, would that community be willing to, with some effort, move the application of their verification logic to the optimization stage where it would be more practical for a wider range of applications?

Yes. You're already seeing two major examples of this: Type Providers and derivations of dependent types fall into one camp, dependent type implementations (of which the most prominent is Idris) fall into the other. All of them offer the ability to extend the type system and make proofs useful to developers even as they also help make code total and enforcible.

I think it's further along than you seem to suggest. Can I suggest you grab a copy of Atom, Atom-Idris, and go through some Idris tutorials? You can also try F# and use Json samples pulled from an API to generate types that inform your tooling and improve code quality.


As KirinDave said, we can apply our tech to as many stages as we like. If applied redundantly, it has the extra benefit of verification at one stage catching problems in an earlier one. As far as propagating verification conditions or checking in the middle goes, you might find these concerns addressed by proof-carrying code plus equivalence checks on untrusted optimizations, as in VeLLVM. There's also work that equivalence-checks transformations in assembly code.

https://news.ycombinator.com/item?id=16041110


We already have a basic form of machine checked-proofs in everyday code: type systems. Type systems encode and prove simple theorems like "f(x: int) -> int returns an int, if x is an int". Typescript and Mypy show us that these proof checkers don't need to be built into a language. Even better, they show that proofs can be over subsystems of a program, since Typescript and Mypy both allow for partial/incremental typing. We just need to extend these systems to support more complex theorems.
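
A minimal sketch of that in Mypy's gradual style (the file and function names are invented): annotating a single function is enough for the checker to prove its small theorem, while unannotated code stays dynamic.

    def double(x: int) -> int:
        return x * 2

    def untyped_caller(value):   # no annotations: Mypy leaves this dynamic
        return double(value)

    # "mypy example.py" reports something like:
    #   error: Argument 1 to "double" has incompatible type "str"; expected "int"
    # for the line below, even though CPython would happily run it
    # (str * int just repeats the string).
    bad = double("oops")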


Incremental type systems are super interesting. Lots of work being done in the last 10 years that blurs the lines between static and dynamic typing (monkeytype, reticulated Python), types and values (via the triumph of inferred types), kinds and types (powerful trait systems like Scala's), and even values and code (code synthesis like magichaskeller, or excel's flashfill).

A bit of self promotion, I touched on this in a recent blog post if you're interested in this sort of stuff: https://performancejs.com/post/hde6a33/JavaScript-in-2017:-Y...


and you end up in the neighbourhood of Idris which is, I find, a pleasant experience when doing development


Machine-checked, usable compilers exist (CakeML, CompCert). Usable kernels exist (seL4). A few people officially work as proof engineers.

The future is here, but not evenly distributed. ;)


Were you able to find a position? You are qualified, but it's very competitive: few jobs and many PhD-level candidates with experience.


Did not try. I'm only looking in southern Germany and afaik nobody does that here.


Cryptocurrencies and digital contracts are going to pump an insane amount of money into this.


I actually genuinely agree. Trustless computing is a very strong use case for systems where a lot more verification happens up front, because the downsides of messing up are so obvious and immediate.

It's worth noting that Ethereum's "fuel" model has a 1:1 correspondence with how Idris's mostly-total IO monad works. The "infinite fuel" bit can just be turned off and some (substantial and tricky, I admit) changes written to keep top-level state around, and bam: timeboxed, resumable IO.
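
Not Idris, but a minimal Python sketch of the fuel idea being referred to (all names invented): a step function that spends one unit of fuel per transition, so every run either finishes or stops with a resumable state.

    from dataclasses import dataclass
    from typing import Optional, Tuple

    @dataclass
    class State:
        counter: int

    def step(state: State) -> Optional[State]:
        # One small-step transition; None means the program has finished.
        if state.counter >= 10:
            return None
        return State(state.counter + 1)

    def run(state: State, fuel: int) -> Tuple[str, State]:
        while fuel > 0:
            nxt = step(state)
            if nxt is None:
                return ("done", state)
            state, fuel = nxt, fuel - 1
        return ("out_of_fuel", state)  # caller can resume later with more fuel

    print(run(State(0), fuel=4))     # ('out_of_fuel', State(counter=4))
    print(run(State(4), fuel=100))   # ('done', State(counter=10))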


I'm not so sure to be honest. I've been super surprised at just how few high profile hacks there have been against cryptocurrency protocols so far even when you've got clients implemented in C++. I wish I could explain why as well because you'd think all you'd need is a small flaw to attack the whole network when most nodes are running the same software.


Normal security practices are applicable to the node/wallet software. Smart contracts are another story -- think of the DAO heist. Short programs whose behavior determine the fate of $$$ are exactly the kind of thing where verification excels.


I was thinking the same thing. I just got an Ethereum certification, but I don't want to develop Solidity contracts on an hourly basis only to have them hacked after I leave.

Would the resources in the original post be worth looking into to build secure contracts?


I can only ask a question, since I've got no know-how about this.

How would such a proof checker know the intent of the code, i.e. that it should not have a backdoor or other insecurity? How do you define security in a logical sense?


> How would such a proof checker know the intent of the code, i.e. that it should not have a backdoor or other insecurity? How do you define security in a logical sense?

As you surely know, there is no such thing as simply secure/insecure, but rather a set of properties that are considered in IT security, which might not even all be fulfillable at the same time. The exact words vary depending on the book, but a popular classification in a German standard reference on IT security is:

General protection goals:

- Integrity

- Confidentiality

- Availability

Further goals are

- Authenticity

- Accountability

- Non Repudiation

- Sometimes Anonymity

So you first begin to formulate your protection goals. Next you can begin to formulate these in a logical sense. Again, there can exist quite different axiomatic implementations of a general protection goal.

But since you want to see examples:

A common model in literature for integrity is the Biba model: https://en.wikipedia.org/wiki/Biba_Model

Another common model for integrity is the Clark-Wilson model: https://en.wikipedia.org/wiki/Clark%E2%80%93Wilson_model

The mathematical dual of the Biba Model is the Bell–LaPadula Model for ensuring confidentiality: https://en.wikipedia.org/wiki/Bell%E2%80%93LaPadula_model

A model that ensures that the access rules will be free of conflicts of interest (this of course involves privacy and integrity aspects) is the Chinese Wall model: https://www.cs.purdue.edu/homes/ninghui/readings/AccessContr...
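
To show how one of these models turns into something a checker can enforce, here is a minimal sketch (not from any of the references above) of Bell–LaPadula's two rules over a totally ordered set of confidentiality labels; real systems use richer lattices:

    from enum import IntEnum

    class Level(IntEnum):
        PUBLIC = 0
        CONFIDENTIAL = 1
        SECRET = 2

    def may_read(subject: Level, obj: Level) -> bool:
        return subject >= obj   # simple security property: no read up

    def may_write(subject: Level, obj: Level) -> bool:
        return subject <= obj   # *-property: no write down

    assert may_read(Level.SECRET, Level.PUBLIC)       # reading down is allowed
    assert not may_read(Level.PUBLIC, Level.SECRET)   # reading up is forbidden
    assert not may_write(Level.SECRET, Level.PUBLIC)  # writing down would leak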


Thanks! This was exactly what I was looking for!


There's a lot of work in CompSci on this that predates all the blockchain-related stuff. They usually call it "security-typed languages" or "information flow security." Here's a paper with a lot of prior work in that plus an example tool and language:

https://www.cs.cornell.edu/andru/papers/jsac/sm-jsac03.pdf

https://www.cs.cornell.edu/Projects/fabric/

The concept was taken to the binary with Necula's Proof-Carrying Code. Work like Kumar's and Davis' eliminated most of the TCB of the provers, too.

https://www.utdallas.edu/~hamlen/Papers/necula97proofcarryin...

http://www.cse.chalmers.se/~myreen/runtimes.html

In parallel, there's always work in capability-based security applied to programs or OS's. Here's an old and new one using that model.

http://erights.org

https://www.ponylang.org


The main assumption is that the specification (the proposition you want to prove) is (much) simpler than the proof. This means that the user/customer can read the specification, and understand what it means, even if the proof is not comprehensible to them. Of course, the proof must then be machine-checkable.

This makes the clarity of specifications into a market-driven problem. What do you do if you have a formally proven contract, but nobody can understand what the spec is? Well, you probably won't be able to sell it, because nobody will trust you. Instead, you try to simplify the specification, so you have something that people will trust. This is very similar to the established practice of mathematics, where getting a proof accepted is really a social process, in which you try to convince your fellow mathematicians that the proof is correct.


On a somewhat related topic... let's have a decentralized market for proofs!

I wrote up what this could look like: https://pastebin.com/raw/ZAys0mYp


In ten years.

That's where I thought we were in 1983 - ten years away.


> In ten years.

> That's where I thought we were in 1983 - ten years away.

Obligatory xkcd:

> https://xkcd.com/678/


> Obligatory xkcd

God I cringe every time someone posts that. It's like the "Big Bang Theory" of internet comics.


I think ten years is an extremely conservative estimate, since he's giving real-world examples in the talk.


They had real world examples of formally verified code in 1983 too.


I think the classic ironic demonstration of the limits of proof systems is the fact that CoqIDE 8.7.1 crashes immediately on OS X if you try to cut or paste anything.

The problem, it turns out, is that they included a new version of OCaml which was incompatible with the older version of GTK that was being used. Result: crash due to a segfault.

We really need both proofs and pragmatics. Proofs can give us confidence in the fundamentals when used appropriately for the processor, or for consensus algorithms like Raft, or key parts of the OS. Tests and statistical analysis can give us confidence that we haven't overlooked something crucial.

Some systems obviously aren't very appropriate for formal methods (take machine learning, for instance: there can be no formal spec for what fraud looks like). There, statistics and testing will have to suffice, because we have clear value on average even if we cannot formally guarantee the system.

Other systems are much more appropriate for formal methods. I should hope that a heart monitor, CPU, voting machine or aircraft flight management system will have substantial amounts of formally proved code (I know that I am dreaming with the voting machines; let's settle for getting it down on paper). I don't want to collect a statistical sampling of whether an airplane works as we iteratively work out the bugs and specs in an agile fashion.


Good talk!

So, one question: Adam gives one example of a proof where the initial automation fails because it does not consider a theorem that was previously proved. He also gives an example of some complex code where the proof is too tedious to do without lots of automation. What happens when your complex code changes in such a way that you need some extra lemmas but don't know what they are?


It's manageable in practice I believe, and Adam quickly shows a feature related to this in the demo.

First, if some new features are added with associated properties, then some theorems/lemmas related to these properties will have to be added and proven. I don't think that part is problematic, because it's a "local" decision. This work is focused on the change.

The possible problems come from the impact of these changes on existing results. If you need to go and make other changes all over the place it wouldn't be scalable. But then automation and Coq hint databases can help. In the demo example you referred to, Adam adds the lemma as a rewrite hint, and then it's picked up by Coq automation elsewhere. That's how local changes can automatically be leveraged globally by proof automation in existing proofs, with no changes (OK, not always, but that limits the impact).
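
A minimal Coq sketch of that mechanism (the names are mine): register a lemma as a rewrite hint once, and proofs elsewhere that already call the automation pick it up with their scripts unchanged.

    Require Import List.
    Import ListNotations.

    Lemma app_nil_hint : forall (A : Type) (l : list A), l ++ [] = l.
    Proof. intros; induction l; simpl; [ reflexivity | rewrite IHl; reflexivity ]. Qed.

    Hint Rewrite app_nil_hint : my_simpl.

    (* An existing proof written against the hint database needs no edits: *)
    Goal forall (l : list nat), (l ++ []) ++ [] = l.
    Proof. intros; autorewrite with my_simpl; reflexivity. Qed.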


What progress has been made recently in making writing proofs easier? Are there any research projects targeting mainstream programmers?


Check out Ur/Web (http://www.impredicative.com/ur/).

It has been used by a "mainstream programmer" to build an RSS feed reader.


Plenty of advances in logic and category theory have enabled dependently typed languages to be both theorem prover and practical implementation language.

See Lean.


I'm interested, but "Lean programming language" and its variants are difficult to Google. Would you mind posting a complete reference?


I think this is a good application paper for Lean showing how you can use it to develop machine learning algorithms.

https://arxiv.org/abs/1706.08605


Sorry I forgot the link!

https://leanprover.github.io/


We already have them. They're called compiler type checks. Though having an SMT solver in your unit tests is nice too.
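
For instance, a minimal sketch of an SMT check inside an ordinary unit test, assuming the z3-solver Python package (the bit-twiddling identity is just an example property): UNSAT on the negation means the property holds for every 32-bit input, which no finite table of example cases can give you.

    from z3 import BitVec, Solver, unsat

    def test_clear_lowest_set_bit_identities_agree():
        x = BitVec("x", 32)
        original = x & (x - 1)       # classic "clear lowest set bit"
        claimed = x - (x & -x)       # alternative formulation
        s = Solver()
        s.add(original != claimed)   # search for a counterexample
        assert s.check() == unsat    # none exists: they agree on all inputs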


Type checking is (usually) intraprocedural and flow insensitive. That's what makes it easy. The other stuff is harder.



