ZZ is a modern formally provable dialect of C (github.com/aep)
266 points by the_duke on Feb 5, 2020 | 153 comments



I find the basic idea of this project to be very compelling - I was thinking aloud on HN recently and arrived at roughly the idea this project is implementing. [0]

With that said, I really dislike the way they're describing their project.

When I read "safe dialect of C", I first assumed they meant they had developed a safe subset of C, or perhaps a very similar language, like OpenCL C [1]. Instead, they developed a new language which isn't C at all. Nothing wrong with that, but if I don't instantly recognise the syntax as C, I wouldn't call it a C dialect.

They also describe it as a "formally provable dialect of C". Their language compiles to C code which is guaranteed to be free from undefined behaviour. This is not the same thing as a language where hard guarantees can be made about program behaviour, such as SPARK [2] or Dafny [3].

If the authors are reading this, I urge you to improve your project summary. Your language does not allow me to prove program correctness; instead, it protects me from C's undefined behaviour. That's still a great idea! Please make this clear!

[0] https://news.ycombinator.com/item?id=22102658

[1] https://en.wikipedia.org/wiki/OpenCL#OpenCL_C_language

[2] https://en.wikipedia.org/wiki/SPARK_(programming_language)

[3] https://en.wikipedia.org/wiki/Dafny


Author here. Fully agree that the description is vague and doesn't really tell you where it stands versus something like F*, SPARK, etc.

That's partially because, frankly, I don't know yet. We'll have to discover slowly how far the first-class proof expressions can be pushed.

the word "safe" here is actually gone now, because it indeed says something different. thanks for the feedback.


My pleasure. I've bookmarked the project - I intend to keep an eye on its developments.

Am I right in thinking that, as it stands today, ZZ can be used as a rock-solid protection against C's undefined behaviour (in my own code at least), and as a protection against accidentally writing non-portable C code? That's a great starting point, if so.


(A now-deleted comment said it's disappointing that ZZ doesn't support whole-program correctness-proofs. I'll hang my reply here rather than delete it.)

I should point out it's possible I completely misread the project summary. Perhaps it does support that after all. [0]

I still like the idea though, and might try it out at some point. There's a lot of value in guaranteeing no undefined-behaviour in my code. Imagine how much more secure our systems would be if they had these guarantees. There's also a lot of value in transpiling to portable, standard-compliant C code.

If you want a language that supports full proof-of-correctness, we do already have SPARK and Dafny, but they're no walk in the park. I'm an optimist about this though - I think they'll continue to slowly get more approachable.

A brief aside: SPARK strikes me as already being more approachable than the Event-B formal specification language, which starts with the formal specification in math (set theory) and ends in imperative code (after 'refining' the model into an implementation). That's despite Event-B being a more approachable derivative of B-Method, itself a more approachable alternative to Z Notation. Annoyingly I couldn't find a great one-page summary to give the flavour of Event-B (there are a lot of beautifully crafted PDFs, as it's from academia); the best I can do is [1].

[0] https://news.ycombinator.com/item?id=22249135

[1] https://www3.hhu.de/stups/handbook/rodin/current/html/tut_bu...


I think NVidia's adoption of SPARK for security-critical firmware is a victory, especially since they also evaluated Frama-C and Rust as part of their selection process.

Rust probably would have made it in the future, but it is still not mature enough for the domains NVidia intends to use SPARK in.

https://blogs.nvidia.com/blog/2019/02/05/adacore-secure-auto...


Rust isn't aiming for formal verification, is it?


It’s something we’d like and are vaguely pursuing but not at the expense of other things.


Neat, thanks.


No problem. “RustBelt” and “Miri” are the terms to search for if you want to learn more.


> Your language does not allow me to prove program correctness

Isn't that the whole point of the SMT solver?

What about this example (that doesn't compile)?

  fn bla(int a) -> int
      model return == 2 * a
  {
      return a * a;
  }
Isn't that verifying program correctness? A sibling of this comment claims that "the only thing proven is memory access validity" but, again, this example takes that down.


I agree that it is. Looking at the page, I can't see how far this goes.

Was my earlier comment completely wrong? Does ZZ allow the programmer to express a formal specification, e.g. to verify a sort function? If so, their examples are selling their language very short.


Proving algorithms is possible as long as there's a known method of doing so in SMT. In practice that means that if someone has written a paper on formally proving an algorithm in SMT, you can mostly copy-paste the proof.

zz is developed in parallel with a large project that uses zz, and new syntax-sugar features will surface slowly as they become practically useful.

That being said, it will never replace external formal verification with something like Coq. They serve a different purpose.


What's the large project using ZZ? Or is it behind closed doors? This appears to be an entirely different ZZ programming language: https://scratch.mit.edu/discuss/topic/80752/


It is https://devguard.io/, which is being rewritten from Rust to ZZ in this branch: https://github.com/devguardio/carrier/tree/zz


Very interesting.

Can you tell us a little more about the reasons for the rewrite from Rust?


I'm not involved, but one obvious answer is: broader portability than Rust (this is explicitly called out in the ZZ article). Clearly devguard is targeting a broader set of devices than the limited set Rust currently targets (x86; arm, mips, riscv in tier 2, with various caveats for bare-metal targets).


Thanks!


> > program correctness

How would this language (or any other) go about verifying the correctness of:

  fn subtract(int a, int b) -> int
      model return == a + b
  {
      return a + b;
  }


This defect is obvious to anyone reviewing the code, and it will be even more obvious when the function is used by another function with its own contracts.

In practice the issue with formal modelling is that it's very hard to scale up. Modelling the correct behaviour of traffic-lights is practical, and allows us to build completely bug-free traffic-light software, but these methods don't work so well for developing a word processor. Competent practitioners don't tend to mess up their formal models though. I've not heard an example of that causing defective software, but it's a valid question to ask.

Even without going the whole way to whole-program correctness guarantees, this kind of approach can be used to provide solid guarantees against bugs like buffer-overflows, without using runtime checks. This is something the SPARK language has supported for a long time.
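
(A rough Rust-flavoured analogue of "no runtime checks", under the assumption that the length can be carried in the type; it's a much weaker mechanism than SPARK or ZZ contracts, but the shape matches the `where len(a) == 3` example that appears later in this thread.)

    // The length is part of the type, so the compiler already knows the
    // index is in range: an out-of-range constant index is rejected at
    // compile time, and no check needs to survive into the binary.
    fn third(a: &[i32; 3]) -> i32 {
        a[2]
    }

    fn main() {
        let xs = [10, 20, 30];
        println!("{}", third(&xs));
        // third(&[10, 20]); // does not compile: expected `&[i32; 3]`
    }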


Seems like they are using a narrow definition of "mathematically provable" -- the only thing that appears to be proven is that "all memory access is mathematically proven to be defined". Which just seems like a fancy way of achieving the same thing as a type checker? Either way, cool project, but is there anything else that ZZ can prove?


In the README there's an example of proving a simple state machine (open > read > close) [1]

Just read the docs™

[1] https://github.com/aep/zz#theory


So it has typestates. (I recommend [0] and its HN thread [1] to anyone wanting more detail.) I think that's a great language feature for helping to avoid bugs, especially the kind that often happen in C, but I don't imagine it's anywhere close to proving whole-program correctness, is it?
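
(For anyone who hasn't met the term, here's a minimal Rust sketch of the typestate idea, with made-up types: each state is its own type, so misuse is a compile error rather than a runtime check.)

    struct Closed;
    struct Open;

    impl Closed {
        fn open(self) -> Open { Open }      // consumes the Closed state
    }

    impl Open {
        fn read(&self) -> u8 { 42 }         // only callable while Open
        fn close(self) -> Closed { Closed } // consumes the Open state
    }

    fn main() {
        let file = Closed;
        let file = file.open();
        let _byte = file.read();
        let _file = file.close();
        // file.read(); // does not compile: `file` was moved by close()
    }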

For the curious, here's an official example in the SPARK language that gives a proven-correct sort function (a complete proof of correct program behaviour, not just absence of undefined behaviour). As you can see, proving correctness is a challenge that permeates every inch of the code. [2][3]

Microsoft's Dafny language is similar. [4]

[0] https://yoric.github.io/post/rust-typestate/

[1] https://news.ycombinator.com/item?id=21413174

[2] https://github.com/AdaCore/spark2014/blob/76e08d279c/docs/ug...

[3] https://github.com/AdaCore/spark2014/blob/76e08d279c/docs/ug...

[4] https://en.wikipedia.org/wiki/Dafny#Imperative_features


I imagine that whole-program correctness can be represented, but it is not enforced in the case of ZZ. Developers are not obligated to declare typestates for their own code, nor can they rely on other libraries being fully typed.


> not enforced in the case of ZZ

I don't think it's really even possible for a language to force the programmer to give a proper formal specification.

We may want the formal spec to grant some leeway, i.e. not to specify the exact output/behaviour, but instead to express certain important constraints. For instance, we may not care which traffic-light turns green first, so long as the system preserves the important properties about safety and liveness.

If we wish to support such flexibility, we can't stop the programmer from expressing all output states are valid.

If we don't wish to support such flexibility, what we're really doing is comparing against a reference implementation.
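
(To make that concrete with something less safety-critical than traffic lights, here's a quick Rust sketch using "return the index of some maximal element" as the spec: several answers can satisfy the loose version, whereas the tight version really is just a second implementation.)

    // Loose spec: any index of a maximal element is acceptable.
    fn satisfies_spec(input: &[i32], idx: usize) -> bool {
        idx < input.len() && input.iter().all(|&x| x <= input[idx])
    }

    // Tight "spec": pins down one particular answer among the valid ones,
    // which makes it effectively a reference implementation in disguise.
    fn matches_reference(input: &[i32], idx: usize) -> bool {
        if input.is_empty() {
            return false;
        }
        let mut best = 0;
        for i in 1..input.len() {
            if input[i] > input[best] {
                best = i; // ties keep the earliest maximum: an arbitrary choice
            }
        }
        idx == best
    }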


> I don't think it's really even possible for a language to force the programmer to give a proper formal specification.

The language, probably not. The standard library can certainly require detailed-enough formal preconditions to make its code defect-free, meaning that any remaining bugs can then be directly traced to the programmer's code.


You mean that, for instance, a standard-library generic sort function, might require that your comparator function behave correctly when the arguments are reversed?

That's an interesting idea. Like Java interfaces, but with formal contracts.

I doubt that putting these contracts into the standard-library would give you good proof-coverage though.
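
(Something like this is the informal version of the idea in Rust terms; the contract here is only documented and spot-checked, where a SPARK-style library would actually demand a proof. The names are made up for illustration.)

    use std::cmp::Ordering;

    /// Contract (documentation only): cmp must be antisymmetric, i.e.
    /// swapping the arguments must reverse the result.
    trait Comparator<T> {
        fn cmp(&self, a: &T, b: &T) -> Ordering;
    }

    // A debug-build spot check standing in for a real proof obligation.
    fn check_antisymmetry<T, C: Comparator<T>>(c: &C, a: &T, b: &T) {
        debug_assert_eq!(c.cmp(a, b), c.cmp(b, a).reverse());
    }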


That is correct. However, zz does enforce API contracts, which can be arbitrary expressions within the QF_UFBV theory.

For example, check out the err::checked() theory, which enforces that a function call must be followed by err::check.

Similarly, I expect the community will come up with other standard contracts, such as must_not_copy() for secret values that may not be copied.


Ah ok, not sure how I missed that. That still seems to be within the realm of a type system, no? Or would you consider type systems a subset of mathematically provable systems?


Well, type theory is actually one of the proposed foundations of mathematics and computation [1]. Wikipedia gives a basic definition of it. [2][3]

[1] https://en.wikipedia.org/wiki/Typed_lambda_calculus

[2] https://en.wikipedia.org/wiki/Type_theory

[3] https://en.wikipedia.org/wiki/Foundations_of_mathematics


Dependent type systems certainly are!


It seems like the pre- and post-conditions could be used to prove program correctness?


Yes, see a3p's comments. I was mistaken.


> where we still program C out of desperation

I agree it's the standard and the only thing that actually works (author's words), but it's still a pleasure for me to write and have to deal with C (for embedded). I'd be desperate if I were forced to deal with hugely different paradigms because of pointer problems or insert-your-C-rant-here. C is not going to be replaced on embedded any moment soon.


I’m a C developer working in the SP routing space.

Personally, I do not enjoy debugging memory corruption issues. I especially would not enjoy doing this under customer pressure...

Of course, in some spaces, C is basically the only option. But when you have options, and you’re working with a large enough codebase, C is a terrible choice.


I have an anecdote to share. I work in embedded space for a living. A web developer (which is quite funny) from another team somehow convinced our director to use Rust for a critical process that involved a lot of concurrent processing. Ok, I said, and began developing that process in Rust.

I estimated that it took me about 10x the time to implement something compared to what it would have taken in C. The reason for that could well be that I'm not a Rust expert at all. That's fine though, because I will happily invest more time in programming if it saves me hours of debugging later. Plus, the claim that "if it compiles, it works" was enough for me to fight the compiler for hours, as long as it gave me no trouble at runtime.

Fast forward a week, we have that binary deployed to a non-critical set of our field devices. A few days later, we get reports of that particular binary crashing. I logged into one of those devices and fetched the logs. The binary was reporting a panic on an MPSC channel's tx send. There was nothing useful in that error message that would point me to the root cause. I had no tools to attach to a running process because I'm on a device with an ancient kernel.

To fix the situation "temporarily", because the customer was getting infuriated, we redeployed our old C binary. It has been there since. I basically now say "fuck off" to anyone who tells me to use Rust, because it does not just work if it compiles.


"If it compiles it works" is just not true, just a very bad description for a true phenomenon. If there weren’t people repeating that seriously, you’d be attacking a strawman. People do say that, but that’s on those people, not Rust.


> but it's still a pleasure for me to write and have to deal with C

But wouldn't you love a C with things like first-class support for arrays and support for namespaces / modules, etc.?


Over the years you put together a Swiss-army toolset of libraries to deal with all that (arrays, strings, etc.).

You know how they work, how far you can push them, the overhead, and all.

Do I really know how many cycles or how much stack it takes to do std::sort(a.begin(), a.end()); on that specific platform? No, so I cannot trust it.

I know it's reinventing the wheel, but I am sure that there are known simple libraries out there one can use. I use mine.

What I would like is a better preprocessor, e.g. assigning dynamic values to constants. But I won't change the language for that.


>Do I really know how many cycles or how much stack it takes to do std::sort(a.begin(), a.end()); on that specific platform? No, so I cannot trust it.

I also don't know how many cycles it takes for my implementation of quicksort apart from checking the output of a specific compiler and counting instructions. C is not, was not and will never be a portable assembler.


> I also don't know how many cycles it takes for my implementation of quicksort apart from checking the output of a specific compiler and counting instructions.

On any modern out-of-order CPU, that doesn't get you close to determining the dynamic performance. Even with full knowledge of private microarchitectural details, you'd still have a hard time due to branch prediction.


The embedded/micro space is not out-of-order, broadly. Memory access latency may be variable, or not, but I would charitably assume OP knows their subject material and is either using cycles as a metaphor, or actually works with tiny hardware that has predictable memory access latency.


On most (small, I'm not talking about mini linux) embedded systems, instruction counting will tell you how long something takes to run. In fact, the compilers available are often so primitive that operation counting in the source code can sometimes tell you how long something will take.


Depends a lot on the compiler and target arch. You'll miss out on a lot of stack accesses, or add too many. You don't get around looking at the final executable if you want good results.

For more complex targets, in the end you need to know what the pipeline does and how the caches behave if you want a good bound on the cycle count. That's assuming you're on anything more complex than an ATmega, for which op counting might be enough.

I work in the domain; lots of people do measurements, which only give a ballpark but are bad since you might miss the worst case (which is important for safety-critical systems, where a latency spike at the wrong moment might be fatal). Pure op counting is bad since the results grossly overestimate (e.g. you always need to assume cache misses if you don't know the cache state, or pipeline stalls, or DRAM, or...). Look at the complexity of PowerPC; that should give you a rough idea of what we're usually dealing with (and yeah, I'm talking embedded here).

To me that "sometimes" feels like "I can wrestle some bears with my bare hands, e.g.a Teddy bear" ;-)


On most embedded ARM it won't (out of order execution, caches, alignment), and they don't have to run Linux, likewise on common MIPS implementations.

Simple predictability ends at 16-bit CPUs generally, and even those can be tricky if it's say m68k.


"Most" embedded ARM? Cortex-A8 and smaller do not have OoO execution. Cortex-A9 is a 32-bit up-to-quad core CPU with clock of 800MHz-2GHz and 128k-8MB of cache. That's pretty big. I guess a lot of this is subject to opinion, but I don't think smartphones with GBs of RAM when I think of embedded systems.


Even e.g. A53 is in-order, and it's a 64 bit, pretty quick CPU (as seen in Raspberry Pi 2/3).


When I hear "embedded ARM", I hear "Cortex M".

All of Cortex M is in-order and only M7-- still somewhat exotic-- has a real cache (silicon vendors often do some modest magic to conceal flash wait states, though).

Alignment requirements are modest and consequences are predictable. Etc.

About the most complicated performance management thing you get is the analysis of fighting over the memory with your DMA engine. And even that you can ignore if you're using a tightly coupled memory...


Doesn't C model the PDP-11 more so than ARM or x86?


I would. I am keeping an eye on the zig programming language, exactly because of these reasons: https://ziglang.org/


There is a language called D


D -betterC


> C is not going to be replaced on embedded any moment soon.

Why? Doesn't for example Rust without stdlib already cover the use cases? Note I'm not experienced in embedded.


It's slowly getting there, but no, right now there is a very limited number of supported target platforms, and the ecosystem is still very small and immature. You have to remember that there is a huge number of microcontrollers out there, with exotic architectures you have never heard of and that LLVM definitely doesn't support. I'm not sure embedded Rust would even meaningfully exist without the work of Jorge Aparicio.


Yeah. Even on the best-supported platforms it is very immature. If you want to use embedded Rust you kinda have to let the language drive your choice of platform, which isn't ideal.

I wouldn’t use it at work yet, but it’s coming along.


C's strength is the ecosystem:

* Your microprocessor has a C compiler and standard library, as does every processor you might ever switch to. All the hardware documentation that isn't tables in a PDF will be in C.

* Your target's static analysis tools and interactive debuggers will all support C.

* Every RTOS and embedded library/filesystem/whatever will support (and likely be written in) C.

* All experienced embedded developers are fluent in C.

* Nobody ever got fired for choosing C for an embedded project.

The disadvantages of C are many and well known.


> The disadvantages of C are many and well known.

Which, in a way, is an advantage. I know (much of) what to look out for. There are tools that can help me with some of those issues. There are techniques that avoid some of them, and there are people who are expert in many of them.

But if I pick some other language, it won't have those problems. It will have other problems. (There is no language that does not have problems.) I won't know what to avoid doing. There may not be tooling to help with them. The techniques for avoiding them may not be widely known. I may not be able to find people who know how to handle them.

To me, "well known problems" may be better than "not well known problems". More predictable, at least. The "not well known" problems have to be significantly better to be worth it. They probably have to be proven significantly better. That means either that someone else has to prove them better, or else I have to have a project that doesn't matter much that I can use as a testbed.


My personal opinion is that it's basically for 3 reasons:

1) There is no need to. C already has all we need to build our systems. The rest is seen as overhead/over-complication.

Regarding Rust, I try to keep up to date about its progress. Unfortunately most of the diseases that Rust cures are not much trouble in embedded. My last UB, memory leak, or loose pointer happened years ago. It will happen again, and when it happens, I'll debug it. That's it. I'm not afraid of UB or dealing with pointers, even if there are 10K Rust users trying to FUD me. I know what I'm doing. Every embedded/kernel/driver developer knows what they are doing. When shit happens, that's it, no big deal. You plug in your debugger and solve the issue. It's not a nightmare that chases us in the middle of the day.

FUD alone is not enough for switching. So, why should I start thinking that a variable cannot change, because it's a constant (wasn't it a variable?), unless it can mutate, so it's a constant variable that can change because now it is mutable?

Or constrain myself into borrow-checker torture for a thing I can do in a couple of instructions?

WHY?

2) This is not a language problem; it's about trying to solve a programmer problem with a language. If C = math, then you don't get to do math simpler/better just because today's mathematicians are sloppier, or because bosses pressure people to deliver crappy products.

I'm not a genius. I'm far from it, and if any seasoned C programmer challenges me I'll probably run away. But embedded/kernel/driver development is harsh, so if a developer thinks that he/she cannot make it because of the language, then it's mostly about searching for an excuse. Time to change jobs.

The key is to remember that a lot of people did (and do) a lot with so much less, for 40 years now. It's not a language problem. People have to learn to deal with it.

I was there too 20 years ago, when every C++ developer was afraid that they would lose their jobs because of C#. It never happened. C++ is still one of the most used languages.

3) In my case, there are official libraries from manufacturers you have to use. Sometimes receiving customer support depends on if and how you use those libraries. All those libraries are in C. All the support is in C. All the examples are in C.

Yes, I know there is that engineer who has a GitHub repo with a library that works fine with that STM32 for that specific language, which is now getting support for embedded, so in 5 years we could maybe put something in production. But not for now.

Sorry for the length. Edited some typos.


"Every embedded/kernel/driver developer know what they are doing"

If that were true, then we wouldn't see more memory bugs found every time academics test a new analyzer or testing tool on open code written in C or C++. Microsoft said 70% of the problems they saw were memory-safety issues. Linux has a ton of them. Even OpenBSD has many security fixes for memory safety. Your claim is mythical in the general case, even if some individuals working on small codebases can pull it off.

https://www.zdnet.com/article/microsoft-70-percent-of-all-se...

https://events19.linuxfoundation.org/wp-content/uploads/2017...

https://www.collicutt.co.uk/notebook/openbsd_bugs.html


> Every embedded/kernel/driver developer knows what they are doing.

Yet we still see security issues in all of those. I'm not saying that everything should be re-written in a different language, but C developers saying "I'm a good developer, all those safety mechanisms would hold me back" doesn't hold water considering all the security vulnerabilities we see that would have been prevented if they had used a language with better safeguards.


> Yet we still see security issues in all of those

That's the typical excuse. That's the FUD I was talking about. Bugs will keep existing, and so will security issues, no matter the language you use.


But if we get rid of a whole class of bugs and vulnerabilities, the number of bugs will go down, no?


This is a common way of thinking about it, but nobody really knows. It's purely a conjecture. There is no data to back it up, and it just appeals to common sense.

Should everybody drop C/C++/whatever and rush into the Rust train because Rust people have a conjecture?

I ask the opposite question. What would happen if the only programming language left is C? Wouldn't we become better programmers and raise the bar so high that the bug count drops to 0?


> What would happen if the only programming language left is C? Wouldn't we become better programmers and raise the bar so high that the bug count drops to 0?

Given that C was the dominant programming language for UNIX applications for over a decade, I think we can look to history for an answer to this question. And I believe the record shows that the answer is "no."


>Should everybody drop C/C++/whatever and rush into the Rust train because Rust people have a conjecture?

The way these things usually work in practice, the evidence that a new paradigm improves things usually builds up slowly. There will never be a point at which someone proves mathematically that C is obsolete. Instead, the gentle advantages of other options will get stronger and stronger, and the effectiveness of C programmers will slowly erode compared to their competition. At first only the people who are really interested in technology will switch, but eventually only the curmudgeons will be left, clinging to an ineffective technology and using bad justifications to convince themselves they aren't handicapped. Is Rust the thing that everyone except the curmudgeons will eventually switch to? Who knows, but if you don't want to end up behind the industry then it might pay to try it out in production to see for yourself. If you don't make room for research and its attendant risks you will inevitably fall behind.


I'm sorry you see things in terms of competition and curmudgeons. But I see your point; it's just another type of FUD: adopt Rust, or you'll be left behind as a curmudgeon.


Not exactly, I'm saying that if you stay behind and don't adopt something, where that something is whatever the industry switches to after C, you will eventually be left behind. Of course it is also possible (and likely for many people) to die or retire before that happens. It's not like C is going away any time soon.


> What would happen if the only programming language left is C? Wouldn't we become better programmers and raise the bar so high that the bug count drops to 0?

Is this a rhetorical question? Clearly the answer is no.


Not a Rust programmer.

It seems like there is a big difference between a "mathematical certainty that certain bugs will not occur" if you play by the rules, and just being a better programmer so that you don't write errors. Not that you won't have bugs in Rust, but it seems like we should move towards having our tooling do more of the heavy lifting in ensuring correctness. I don't believe you will ever have the bug count drop to zero. I do believe in mathematical certainties though.


Making cars safer has dramatically reduced death and injury, and has not caused the number of accidents to spike. So... no?


We need regulation to make companies liable for CVEs, then this will stop being FUD, easy.


It says something of our profession that we have all of our products come with a legal document that effectively says it's not our fault if the product doesn't do what it's supposed to do.


We should just outlaw those damn statements (and binding arbitration for consumers) and see how the market naturally evolves. Don't force standards on programmers, don't regulate in some novel way, just get rid of the stupid disclaimers and let the old-fashioned legal framework return to primacy.


A language is a tool and if you're saying the problem is with people using the tool wrong, it's only half true. New tools are being invented to handle problems both old and new and discrediting them with what amounts to 'get off my lawn' is counter-productive. There are problems that are impractical to solve without tools designed for them and safe pointers are a great example. Yes, you'll hook up your debugger and yes, you'll fix the problem. I once fixed a dangling pointer problem after two 12h debugging sessions and while it felt great, I'd rather not do that again. I know there are people hunting these for months, so you could say I haven't seen anything yet, and I'll agree - but it's an argument in favor of better tools, not the other way around.


The issue is social, not technical.

There are several alternatives to C, when a team is open minded.


I think most users are going to wait until CMSIS or their local equivalent is available: https://github.com/ARM-software/CMSIS_5

Besides, it's a big retraining effort.


Rust is not mature, and people don't like Mozilla now


Of course it isn't. It's portable (-ish) assembler. The only thing replacing C will be even more in the C spirit. Adding more checking in system-programming-friendly ways is the only way I can think of for improving the situation.


I know people often refer to C as portable assembly, but really it's not. That sounds like the sort of thing that gets repeated by people who know neither C nor assembly. C is incomparably higher-level than assembly, and it lacks the quirks modern instruction sets have, which are focused on specific odd things modern processors do well.


Bit like chicken & egg though, with processors chasing better implementations of C rather than better implementations of assembly:

https://queue.acm.org/detail.cfm?id=3212479


I refer you to MASM macros in the late 1990s.

If you want the same runtime environment of asm without the tediousness of asm development, and a reasonable optimizer to save you more tediousness, you end up at C. I count that as close enough.


While C isn't going to be replaced on embedded any moment soon, there are alternatives out there, when the team is open-minded.

C++, Pascal, Basic, Java, Ada, Oberon all have mature toolchains available from companies that have been in business for the last couple of decades.

As per several C++ retrospective talks, the focus on C is more a social issue than anything else.


> As per several C++ retrospective talks, the focus on C is more a social issue than anything else.

The social issue may be that many embedded programmers prefer C to C++, for what they believe to be legitimate reasons.


Most of which seem more like cargo cult than anything else.


Does Oberon have any major commercial usage? I read up on it a while back and thought it was a neat experiment.


Not that I know of. Relevant to the C space, there is an embedded platform for it:

https://astrobe.com/Oberon.htm


One should note that this company has been in business for around 20 years now, so they get by, even if they aren't SV-rich.


I hope that with IoT we get some commoditization and standardization in this space. Though ARM doesn't give me much hope. I don't get why there couldn't be 1-2-3-5 standard architectures and 10-20-50-100 hardware configurations that cover the full spectrum of embedded configurations.

We've had 32-bit x86 CPUs since 1985; surely we could produce a 100 MHz one cheaply enough that nobody would need to use 8-bit ones in 2020? I know that I'm just daydreaming and companies are stingy...


People have more or less converged on ARM for large things and 8051 for tiny things. I suspect the remainder are hiding in culturally isolated niches; I don't really know why people are still targeting PIC16, for example.

There's also the DSP subset where people need to write inner loops in highly platform-specific assembler. e.g. my employer Cirrus has our own "Coyote" architecture for which we have a C compiler: https://statics.cirrus.com/pubs/proDatasheet/CS4970x4_F1.pdf

"Cheap" isn't just about BOM price, the power consumption matters too.


ESP32 is a very good example.

We already had plenty of high level languages running on CP/M, MS-DOS, Amiga, Atari, ... class hardware.

Those languages would already fly on ESP32.


The answer is size, power consumption and most of all, price.

I know I can do a wrist watch with a Raspberry Pi, but will it be profitable/convenient?


Well, let's say I take the Intel P54 (https://en.wikipedia.org/wiki/List_of_Intel_microprocessors#...) as an example.

So basically an Intel Pentium from 1994, so from 26 (!) years ago. Size: 90 mm2, maximum power consumption: 10W, price at launch: $699.

Buuuut: production process 600nm.

Today every Intel CPU uses a 14nm production process, but let's go for a cheaper option and "use" the 22nm.

So the gate size is ~30x smaller today. And I know that the relation between the gate size and the die size is not linear (2x smaller gate size usually leads to a die size which is smaller by more than 2x), but let's go with the 30x, to keep things simple.

For the power I have no idea, but I'd assume that the power consumption would go down at least 30x.

Price: hard to say, but considering how hardware evolves, definitely more than 30x cheaper :-)

So a Intel Pentium with modern technology could look something like this:

Size: 3mm2 (probably less), maximum power consumption: 0.3W (probably way less), price at launch: $30 (I'd argue that it would be closer to 0.3$ :-) ).

I could be way off in the weeds here...

My guess is that it's just a matter of existing tooling, expertise, human resistance to change, companies wanting to a) not risk anything and b) to nickel and dime everything.

I'm arguing for an x86 because that way you could throw any kind of modern tooling at it. ARM or MIPS would probably be better candidates.


Then you need external RAM, storage, a video card or companion chip for a display, a chip for holding the BIOS and a power supply (if not many) for all that. x86 as is, is not good enough for my wrist watch.

You can use a small ARM with integrated flash, RAM and LCD controller. Does it need MMU for Linux? There is family of products for that. Or use another family for bare-metal systems.

You can see how the thing starts to complicate regarding architectures/platforms.


Well, I just used the Pentium example because it was easy to source the data :-)

But my main point was to dump super old and limited architectures when these days we can economically use modern architectures we use everywhere else (x86/ARM/MIPS, whatever). If we wanted to, we could literally hoist designs from 20+ years ago and use newer production technologies to make them embeddable.

Using mainstream tech stacks is very empowering. Updated compilers everywhere, many programming languages and stacks with huge communities everywhere, modern debugging tools at low/no cost, fast debug cycles, etc.

But there's little interest in this because which self-respecting hardware maker would commoditize its own products? :-D


Hoisting old designs is unnecessary when you have x86 processors like Intel Atom, which are exactly that, and all the versions of ARM Cortex-R3 and R0.

The problem is not the processor, it's the lack of an MMU and/or special DMA or interrupt engines and special GPIO. No kernel is likely to be ported to such custom infrastructure, and if you have a very tiny flash, few of the ones that could be will fit. (Say, targeting L4 by porting Fiasco or OKL4.)


Ok, but that doesn't mean that such designs couldn't be made and then standardized, right?


You are very nearly talking about the Intel Quark. They weren't very popular and Intel have since discontinued the line.


Oh, I must have missed that. They seem a bit too expensive for what I'm saying, I imagine these things have to cost cents to be worthwhile (50-70 cents or so).


Every single one of your projections is still way off compared to the lowest-cost embedded systems. $0.30 is expensive when people are really, literally, counting pennies.


Or anywhere else.

Lots of people don't enjoy using C for various reasons, but generally programmers don't get to choose what language they work in, they get paid to work in whatever language is needed.

Whether programmers like it or not for non personal projects doesn't actually matter too much. If you don't like using C, then don't take jobs programming in C. Just don't whine when that makes it more difficult to get a job.


It does matter if the reasons why they do not like it are good. Decision makers should listen to such input instead of deciding on programming language by fiat or convention. Including hardware choices. Of course this should be weighted by market availability and cost of both hardware and programmers.

Typically the reason C is used because there's no other toolchain available for said embedded device, except assembler, and they do not wish to invest in building or extending one. Those devices often do not have a kernel with POSIX-like syscalls to adapt an existing full blown libc, nor provide one.

Some embedded chipset libraries are also written in C sprinkled with copious assembly, so the language that's used with these has to be very easily interoperable.


>It does matter if the reasons why they do not like it are good. Decision makers should listen to such input instead of deciding on programming language by fiat or convention.

This is rarely the case. Almost all programming tasks aren't free choices out of thin air except for start-ups or small software companies. Far more often than not you have to stick with the existing language for compatibility, or to minimize maintenance costs, or in short because someone else already decided on the language before you arrived.

>Including hardware choices. Of course this should be weighted by market availability and cost of both hardware and programmers.

In those rare cases when you get an opportunity to make such a choice, sure. However, those aren't the only criteria either.


This is entirely orthogonal to the fine article and basically just a statement of personal preference. Having preferences is fine, and sure, if you like C you can take umbrage at some of the more flowery characterizations in the README, but it's not really the point of the language. There are lots of neat ideas in TFA that would be interesting to discuss; no one is interested in litigating the fate of C on embedded for the Nth time.


> I agree it's the standard and the only thing that actually works (author's words),

C++ also "actually works" and, thanks to its much greater ability to support fluent abstractions, ends up being safer in practice than raw C code. You can use C++ for low-level system components. Did you know Android's libc is actually written in C++?


Windows' new one as well.


This is a great achievement in making formal methods accessible in pragmatic terms - something that actually works and can be used by normal humans.

Would be nice to have an actual microcontroller example.

> The standard library is fully stack based and heap allocation is strongly discouraged

:/ - I can see why this is done, as it's hard, so banning it to make the problem tractable works. But it's also quite inconvenient. On the other hand, "MISRA C:2004, 20.4 - Dynamic heap memory allocation shall not be used."


There's a third sort of allocation that's often used in embedded systems - preallocated buffers, which are neither stack nor heap but are reserved in the .bss section of the executable.

These allow you to have very predictable memory usage.
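
(In Rust terms, since Rust keeps coming up in this thread, that third kind looks roughly like this; the buffer name and size are made up, and on real hardware you'd guard access with a critical section or a safe wrapper.)

    use core::ptr::addr_of_mut;

    // A zero-initialised mutable static is reserved in .bss: no allocator,
    // no stack usage, and the footprint is visible in the linker map.
    static mut SCRATCH: [u8; 4096] = [0; 4096];

    fn fill_scratch(byte: u8) {
        // Assumes a single-threaded / interrupt-free context, as is common
        // on small bare-metal targets.
        unsafe {
            let buf = addr_of_mut!(SCRATCH);
            for b in (*buf).iter_mut() {
                *b = byte;
            }
        }
    }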


the standard library does not allocate any heap memory, but heap modelling will be added to the prover eventually.

note that there are convenience tools to deal with heap-free targets (microcontrollers) such as tail variants (statically enforced flexible arrays):

https://github.com/aep/zz#metaprogramming-or-templates-tail-...


Unless it has changed, SPARK also forbids dynamic allocation.



Pre- and post-condition annotations (`where` and `model`, respectively) are great. It's nice to see those in a new language.

Syntactically, this seems more like a Rust dialect than a C dialect. The primary relation to C seems to be portability (transpiler) and integration (ABI). This is true of most C-transpiled languages, though.

It's certainly cute, and potentially useful if your program is small enough to be solved by a SAT solver. Ideally relatively quickly, or those compile times will be poor. I wonder how it deals with machine registers, which are often something you would be using in embedded C.


I thought this would be more like Frama-C; in that spirit, it doesn't look like C to me.

So I'm not sure about possible adoption among C devs, even though the idea looks quite good.


Right, I wonder why they didn't build it with a C-like syntax?


Very cool idea! What I love about C is that all sorts of programming languages can talk to its ABI and that it fits well with low-level programming.

However it is a cumbersome and unsafe language. This seems like a very nice solution. You can write in a much safer language while producing C code which does not look too alien relative to what you wrote

I have been interested in Rust but think it looks a tad too complicated. This may be a happy in-between.


I'd expect that a formally provable language would be more complicated to write in practice than Rust. Take this example for instance:

>you must tell the compiler that accessing the array at position 2 is defined. quick fix for this one:

    fn bla(int * a)
        where len(a) == 3
    {
        a[2];
    }
In Rust you don't need the where clause, the `a[2]` operation will just panic at runtime if the array is too short. You don't have to prove to the language that the access is correct.


Except the checks in ZZ are at compile time and have no runtime overhead.


Right, but the question was about which language was more complicated (with the implication being, complicated to write). In that sense I think my GP comment is probably correct.


I've not looked at ZZ yet, but when we talk about formal proofs, we're usually proving much more interesting properties than what a type system typically proves.


> operation will just panic at runtime if the array is too short

Exactly, Rust doesn't provide a good solution to this problem at all. Panics in Rust are an escape hatch used to ensure the language stays "safe" in situations where the compiler cannot prove a given behaviour at compile time, but where it would have made the language too ugly if you had to wire through Result types for all the trivial operations like adding two numbers.

In my experience, panics in Rust have been a major source of pain. In contrast to an exception, which you can catch, a panic behaves more like an abort. At least it has been that way in the past. Now, with a lot of libraries using panics to signal runtime errors, coding in Rust has at times felt like I was using a bunch of badly written C libraries that internally call "abort()" and kill the process when something goes wrong that would have been totally handleable without killing the whole process. That's the benefit of using a safe language, right?

I think lately the Rust "community" has become aware of this issue, and IMO the way things are going is that panics, as they are designed, should basically not be used. But, without proper exceptions, that brings you back to the situation where an operation as trivial as adding two numbers either produces a return code that must be explicitly checked or may silently fail and produce an "undefined" result in some cases.
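
(For the addition case specifically, the menu in current Rust looks like this; none of it is UB, but only the Option-returning form forces the caller to check.)

    fn add_demo(a: i32, b: i32) {
        // Explicitly checked: Option forces the caller to handle overflow.
        match a.checked_add(b) {
            Some(sum) => println!("sum = {}", sum),
            None => println!("overflow"),
        }

        // Defined-but-silent alternatives: never UB, but easy to ignore.
        let _wrapped = a.wrapping_add(b);               // two's-complement wrap
        let _clamped = a.saturating_add(b);             // clamps at i32::MIN/MAX
        let (_sum, _overflowed) = a.overflowing_add(b); // wrap plus a flag

        // Plain `a + b` panics on overflow in debug builds and wraps in
        // release builds (unless overflow checks are enabled); never UB.
    }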


A panic in Rust nicely captures the category of errors which the programmer asserts should be impossible, which generally cannot be recovered well from. A panic is kind of catchable, but it only works in the sense of killing only part of the process rather than the entire process (like abort does).

If errors are expected, then you should use Result instead of a panic. If people are using panics instead of Result for these kinds of errors, then the library is wrong. I'm curious what examples you have where this is happening.
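
("Kind of catchable" concretely means std::panic::catch_unwind, which turns an unwinding panic into an Err at a boundary you choose; note it does nothing for builds compiled with panic = "abort".)

    use std::panic;

    fn main() {
        // The panic unwinds up to this boundary and becomes an Err,
        // so the rest of the process keeps running.
        let result = panic::catch_unwind(|| {
            let v: Vec<u32> = Vec::new();
            v[0] // index out of bounds: panics
        });
        assert!(result.is_err());
        println!("still alive");
    }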


What libraries are using panics for runtime errors?


If you like this, it might be worth looking at ATS: http://www.ats-lang.org/


I'm a fan of ATS, really I am, but recommending ATS to someone who said Rust looks too complicated made me literally laugh out loud. Like I said, I like ATS and don't particularly care for Rust, but ATS is in no way less complicated than Rust. But I really do feel like the laugh I had is going to make my day better, so thanks.

Just a note: the above is not sarcasm


I like how they incorporate an SMT solver. They claim: “all code is proven”. What does that mean? What is proven about the code? Absence of memory bugs, or actual correctness of algorithms?


"Beware of bugs in the above code; I have only proved it correct, not tried it."


That the code is provably free of C undefined behavior


Looking at the example code, [0] it's clear you're correct.

As I just rambled about in another comment in this thread, their project summary isn't clear about this. Still a great idea for a language though.

[0] https://github.com/aep/zz/tree/master/examples/hello/src


+1

What does it even mean that the language is formally provable ?!


To me it means that there is a formal semantics (for example in Coq), and that we can formally prove properties of the code. There does not seem to be a semantics here.


One question that isn’t obvious from the overview:

This language compiles to C and asserts that your program will never exhibit undefined behavior; have they proved that ZZ is correct, i.e. that it will definitely never output C code that exhibits undefined behavior?

If you really care about correctness, that seems important. I absolutely love the idea in general though.


This seems similar to bootstrapping theorem provers. Here is a HN discussion on one: https://news.ycombinator.com/item?id=21358674


It's always concerning to see a compiler not being written in the compiled language.


So it's more like a C transpiler than a dialect of C, IMO (it effectively looks more Rust-like than C-like). I'm not just saying that to nitpick the obviously vague definition of "dialect", but rather because it's a very important feature IMO, and one that actually makes this project potentially more useful to me.

I'd be very wary of switching my toolchain to an experimental one in production, especially if I'm targeting some niche DSP. On the other hand generating C and then compiling it as usual seems less of a hurdle to me. They actually point that out in the intro but I thought it might be worth mentioning it here.

Now, I just quickly skimmed the README, but do they explain how they deal with interfacing with standard, non-ZZ C code? I assume they need some sort of "FFI" bindings, like Rust, to make the code safe.
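
(By "FFI bindings like Rust" I mean something of this shape on the Rust side; presumably ZZ needs an equivalent trusted boundary for plain C code it didn't check.)

    use std::ffi::CStr;
    use std::os::raw::c_char;

    // The foreign function is declared in an extern block, and every call
    // site is `unsafe`: the compiler can't vouch for code it didn't check,
    // so the obligation is pushed into a small, reviewable wrapper.
    extern "C" {
        fn strlen(s: *const c_char) -> usize;
    }

    fn c_strlen(s: &CStr) -> usize {
        unsafe { strlen(s.as_ptr()) }
    }

    fn main() {
        let s = CStr::from_bytes_with_nul(b"hello\0").unwrap();
        assert_eq!(c_strlen(s), 5);
    }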



It will always emit C into a C compiler which will then emit the binary.

Presumably we can dump the C before compilation?


Why?

For this purpose, C is a very convenient and very portable assembly language. You could use a verified C compiler like CompCert to convert it to machine code.

In an ideal world, there’d be no C compiler in the loop at all, sure. But in practice it’s not causing any trouble at all in a ZZ -> C -> machine code workflow. Quite the reverse, targeting C has some major benefits. (There are C compilers that are very fast, very highly optimizing, very portable, and/or verifiably correct.)

Edit to add: just realised I might have totally misunderstood your comment, sorry! Apologies for jumping the gun if so.

If you just mean can we save the generated C to disk instead of compiling it, yes, I would hope so too.


Absolutely! The zz export command just dumps the C and SMT code along with makefiles for common build systems and stops there. Very handy for using it within other toolchains.


This is a very interesting idea. The world needs more languages that make formal provability practical.

And I think I found a typo: thery is_open(int*) -> bool; -- should be "theory".


> Its main use case is code close to hardware, where we still program C out of desperation, because nothing else actually works.

I think I'll stick with Ritchie's language over this fly-by-night invention.


> Checking is done by executing your code in SSA form at compile time within a virtual machine in an SMT prover. None of the checks are emitted into runtime code.

Cool !


VCC [0] was a project by Microsoft Research that was used to successfully prove Hyper-V correct, but unfortunately it seems to have been abandoned.

[0] https://github.com/microsoft/vcc


No mention of unions, discriminated or otherwise. Sums/unions are a pretty important abstraction, and a lot of C's unsafety can sneak in if you don't enforce discriminated unions.


This is gold, if it actually works.

I will cross my fingers and try it.


Way cool. But in my testing, Yices blows away Z3 by a factor of ~10 (see solvecomp.sh in one of the example ssa dirs).


Love this project. Super pragmatic (use an existing solver) and small scope (no template functions).


I love this idea. Does anyone know what the compile-time performance is like?


Is the name supposed to be a French dick reference like Coq?

https://en.wiktionary.org/wiki/zizi#Noun


Okay, it's provable... what value do we really get out of it? Isn't the whole reason we still use C that it gets so close to resolving (or at least acknowledging) hardware or platform-level implementation problems? We can continue to reinvent the language in isolation, but you can't replace a hardware-level programming language in this manner.


It is surprising that the syntax is so different for a “dialect” of C, but apart from that --

If the semantics are very close to C, so you can do basically all the same stuff, but you also get guarantees that your code will absolutely never hit any undefined behavior, that’s great and tremendously useful. It absolutely could replace C in applications where C is still the most useful and practical language.

It would resolve a couple of major headaches in existing C code: security bugs caused by memory overflows (caused by using arrays or pointers in undefined ways); and highly optimizing compilers doing weird things to your code, by exploiting undefined edge cases.

If I know my code will definitely not hit any undefined behavior, that gives me a ton more confidence that it won’t have stupid buffer overflow bugs and I won’t get mysterious errors on certain platforms.


It can’t be a dialect of a language if the compiler of said language doesn’t compile it.


https://en.wikipedia.org/wiki/Programming_language#Dialects,...

Plenty of well-known dialects are not subsets of said language (/compiled by its compiler).


This is just another symptom of why OpenSource often sucks. Instead of somehow coordinating and focusing their efforts, everyone seems to need to start their own spin off "inspired" by other projects.

When will people realize that building a new language is almost always going to fail, and that only very, very few languages ever reach anything close to adoption?

Instead of spending all this time writing your own doomed language, why not try to contribute to a project like LLVM or Rust and add your provable subset there?

The same goes for Linux, which is the paradigm of wasted efforts.


Bjarne Stroustrup never expected C++ to be very widely used. He made it anyway.

Alex Stepanov never thought that anyone would care about his ideas on generic programming. He pursued them anyway. They became the STL.

True, building a new language is almost always going to fail. The problem is, when someone starts working on a new language, they don't know if it's doomed or not. It is good that 1000 people try, because from that we get one language that many people use, and 10 specialized languages that a few people use, and 10 languages that nobody uses but future people steal some of the ideas.

> The same goes for Linux, which is the paradigm of wasted efforts.

Um... what? Wasted because nobody uses it? Very much no. Wasted because it's a duplication of what was there before? To some degree, yes. But not everything in Linux was in Unix before it. And Unix couldn't run all the places that Linux does (smartphones to mainframes). So, no, Linux is not wasted effort.


Just an example of why things don't go the way you seem to feel they must:

I do things that are not for work, because I want to do them. I enjoy designing and implementing languages, I love writing compilers and interpreters. So, I don't care one bit if anyone ever sees them or uses them. Of those languages I've designed, the only language I consider minimally complete is one I designed for personal use on personal projects. I have been arguing with friends recently who want me to at least release it to the public, if only to post about it and its quirks on blogs.

I would be utterly shocked if anyone ever wanted to use anything I've built for fun/research; that's why I've never released any of it. Also because the assumption you make is quite popular: that I somehow owe it to Open Source, or whoever, to help them do things that are interesting to them.

Additionally, telling a developer who is developing what they want for their own reasons to contribute to a project like LLVM or Rust is ridiculous. If those projects aren't what drew my interest, why would I want to bend over backwards to change what I'm doing to try and fit it into some existing model?

TL;DR --> I don't program outside of my job for anyone but myself, and that's all that matters. If the ZZ devs want to make a provable dialect of C, that's what they should do.



