I find the basic idea of this project to be very compelling - I was thinking aloud on HN recently and arrived at roughly the idea this project is implementing. [0]
With that said, I really dislike the way they're describing their project.
When I read safe dialect of C, I first assumed they meant they had developed a safe subset of C, or perhaps a very similar language, like OpenCL C [1]. Instead, they developed a new language which isn't C at all. Nothing wrong with that, but if I don't instantly recognise the syntax as C, I wouldn't call it a C dialect.
They also put formally provable dialect of C. Their language compiles to C code which is guaranteed to be free from undefined behaviour. This is not the same thing as a language where hard guarantees can be made about program behaviour, such as SPARK [2] or Dafny [3].
If the authors are reading this, I urge you to improve your project summary. Your language does not allow me to prove program correctness, instead it protects me from C's undefined behaviour. That's still a great idea! Please make this clear!
My pleasure. I've bookmarked the project - I intend to keep an eye on its developments.
Am I right in thinking that, as it stands today, ZZ can be used as a rock-solid protection against C's undefined behaviour (in my own code at least), and as a protection against accidentally writing non-portable C code? That's a great starting point, if so.
(A now-deleted comment said it's disappointing that ZZ doesn't support whole-program correctness-proofs. I'll hang my reply here rather than delete it.)
I should point out it's possible I completely misread the project summary. Perhaps it does support that after all. [0]
I still like the idea though, and might try it out at some point. There's a lot of value in guaranteeing no undefined-behaviour in my code. Imagine how much more secure our systems would be if they had these guarantees. There's also a lot of value in transpiling to portable, standard-compliant C code.
If you want a language that supports full proof-of-correctness, we do already have SPARK and Dafny, but they're no walk in the park. I'm an optimist about this though - I think they'll continue to slowly get more approachable.
A brief aside: SPARK strikes me as already being more approachable than the Event-B formal specification language, which starts with the formal specification in math (set theory), and ends in imperative code (after 'refining' the model into an implementation). That's despite that Event-B is a more approachable derivative of B-Method, itself a more approachable alternative to Z Notation. Annoyingly I couldn't find a great one-page summary to give the flavour of Event-B (there are a lot of beautifully crafted PDFs, as it's from academia), the best I can do is [1].
I think NVidia's adoption of SPARK for security critical firmware is victory, specially since they also evaluated Frama-C and Rust as part of their selection process.
Rust probably would have made it in the future, but it is still not mature enough to the domains NVidia intends to use SPARK on.
> Your language does not allow me to prove program correctness
Isn't that the whole point of the SMT solver?
What about this example (that doesn't compile)?
fn bla(int a) -> int
model return == 2 * a
{
return a * a;
}
Isn't that verifying program correctness? A sibling of this comment claims that "the only thing proven is memory access validity" but, again, this example takes that down.
I agree that it is. Looking at the page, I can't see how far this goes.
Was my earlier comment completely wrong? Does ZZ allow the programmer to express a formal specification, e.g. to verify a sort function? If so, their examples are selling their language very short.
Prove of algorithms is possible as long as there's a known method of doing so in SMT. That means in practice, if someone has written a paper for formally proving an algorithm in SMT, you can mostly copy paste the proof.
zz is developed in parallel with a large project using zz and new syntax sugar features will surface slowly as they become practically useful.
That being said, it will never replace external formal verification with something like coq. They serve a different purpose.
I'm not involved, but one obvious answer is: broader portability than Rust (this is explicitly called out in the ZZ article). Clearly devguard is targetting a broader set of devices than the limited set Rust currently targets (x86; arm, mips, riscv in tier 2, with various caveats for bare metal targets).
This defect is obvious to anyone reviewing the code, and it will be even more obvious when the function is used by another function with its own contracts.
In practice the issue with formal modelling is that it's very hard to scale up. Modelling the correct behaviour of traffic-lights is practical, and allows us to build completely bug-free traffic-light software, but these methods don't work so well for developing a word processor. Competent practitioners don't tend to mess up their formal models though. I've not heard an example of that causing defective software, but it's a valid question to ask.
Even without going the whole way to whole-program correctness guarantees, this kind of approach can be used to provide solid guarantees against bugs like buffer-overflows, without using runtime checks. This is something the SPARK language has supported for a long time.
Seems like they are using a narrow definition of "mathematically provable" -- they only thing that appears to be proven is that "all memory access is mathematically proven to be defined". Which just seems like a fancy way of achieving the same thing as a type checker? Either way, cool project, but is there anything else that ZZ can prove?
So it has typestates. (I recommend anyone [0] and its HN thread [1] for detail.) I think that's a great language feature for helping to avoid bugs, especially the kind that often happen in C, but I don't imagine it's anywhere close to proving whole-program correctness, is it?
For the curious, here's an official example in the SPARK language, that gives a proven-correct sort function. (A complete proof of correct program behaviour, not just absence of undefined behaviour). As you can see, proving correctness is a challenge that permeates every inch of the code. [2][3]
I imagine that whole-program correctness can be represented, but is not enforced in the case of ZZ. Developers are not obligated to declare typesets for their own code or can rely on other libraries being fully typed
I don't think it's really even possible for a language to force the programmer to give a proper formal specification.
We may want the formal spec to grant some leeway, i.e. not to specify the exact output/behaviour, but instead to express certain important constraints. For instance, we may not care which traffic-light turns green first, so long as the system preserves the important properties about safety and liveness.
If we wish to support such flexibility, we can't stop the programmer from expressing all output states are valid.
If we don't wish to support such flexibility, what we're really doing is comparing against a reference implementation.
> I don't think it's really even possible for a language to force the programmer to give a proper formal specification.
The language, probably not. The standard library can certainly require detailed-enough formal preconditions to make its code defect-free, meaning that any remaining bugs can then be directly traced to the programmer's code.
You mean that, for instance, a standard-library generic sort function, might require that your comparator function behave correctly when the arguments are reversed?
That's an interesting idea. Like Java interfaces, but with formal contracts.
I doubt that putting these contracts into the standard-library would give you good proof-coverage though.
Ah ok, not sure how I missed that. That still seems to be within the realm of a type system, no? Or would you consider type systems a subset of mathematically provable systems?
I agree it's the standard and the only thing that actually works (author's words), but it's still a pleasure for me to write and have to deal with C (for embedded). I'd be desperate if I have to be forced to deal with huge different paradigms because pointer problems or insert-your-C-rant-here. C is not going to be replaced on embedded any moment soon.
I’m a C developer working in the SP routing space.
Personally, I do not enjoy debugging memory corruption issues. I especially would not enjoy doing this under customer pressure...
Of course, in some spaces, C is basically the only option. But when you have options, and you’re working with a large enough codebase, C is a terrible choice.
I have an anecdote to share. I work in embedded space for a living. A web developer (which is quite funny) from another team somehow convinced our director to use Rust for a critical process that involved a lot of concurrent processing. Ok, I said, and began developing that process in Rust.
I estimated that it took me about 10x the time to implement something than it would have taken if I did that in C. The reason for that could definitely be because I'm not a Rust expert at all. That's fine though because I will happily invest more time in programming if it saves me hours of debugging later. Plus, the claim that "if it compiles, it works" was enough for me to fight the compiler for hours if it gives me no trouble during runtime.
Fast forward a week, we have that binary deployed to a non-critical set of our field devices. A few days later, we get reports of that particular binary crashing. I logged into one of those devices and fetched the logs. The binary was reporting a panic on an MPSC channel's tx send. There was nothing useful in that error message that would point me to the root cause. I had no tools to attach to a running process because I'm on a device with an ancient kernel.
To fix the situtation "temporarily", because the customer is getting infuriated, we redeploy our old C binary. It has been there since. I basically now say "fuck off" to anyone who tells me to use Rust because it doesn't work if it compiles.
"If it compiles it works" is just not true, just a very bad description for a true phenomenon. If there weren’t people repeating that seriously, you’d be attacking a strawman. People do say that, but that’s on those people, not Rust.
>Do I really know how much cycles/stack it takes to do std::sort(a.begin(), a.end()); in that specific platform? No, so I cannot trust it.
I also don't know how many cycles it takes for my implementation of quicksort apart from checking the output of a specific compiler and counting instructions. C is not, was not and will never be a portable assembler.
> I also don't know how many cycles it takes for my implementation of quicksort apart from checking the output of a specific compiler and counting instructions.
On any modern out-of-order CPU, that doesn't get you close to determining the dynamic performance. Even with full knowledge of private microarchitectural details, you'd still have a hard time due to branch prediction.
The embedded/micro space is not out-of-order, broadly. Memory access latency may be variable, or not, but I would charitably assume OP knows their subject material and is either using cycles as a metaphor, or actually works with tiny hardware that has predictable memory access latency.
On most (small, I'm not talking about mini linux) embedded systems, instruction counting will tell you how long something takes to run. In fact, the compilers available are often so primitive that operation counting in the source code can sometimes tell you how long something will take.
Depends a lot on the compiler and target arch. You'll miss out on a lot of stack accesses, or add too many. You don't get around looking at the final executable if you want good results. And for more complex targets, in the end you need to know what the pipeline does and how the caches behave if you want a good bound on the cycle count. Of course assuming you're on anything more complex than an atmega, for which op counting might be enough. I work in the domain; lots of people do measurements, which only give a ballpark but are bad since you might miss the worst case (which is important for safety critical systems, where that latency spike in the wrong moment might be fatal). Pure op counting is bad since the results grossly overestimate (eg you always need to assume cache misses if you don't know the cache state, or pipeline stall, or DRAM, or...). Look at the complexity of PowerPC, this should give you a rough idea what we're usually dealing with (and yeah, I'm talking embedded here).
To me that "sometimes" feels like "I can wrestle some bears with my bare hands, e.g.a Teddy bear" ;-)
"Most" embedded ARM? Cortex-A8 and smaller do not have OoO execution. Cortex-A9 is a 32-bit up-to-quad core CPU with clock of 800MHz-2GHz and 128k-8MB of cache. That's pretty big. I guess a lot of this is subject to opinion, but I don't think smartphones with GBs of RAM when I think of embedded systems.
All of Cortex M is in-order and only M7-- still somewhat exotic-- has a real cache (silicon vendors often do some modest magic to conceal flash wait states, though).
Alignment requirements are modest and consequences are predictable. Etc.
About the most complicated performance management thing you get is the analysis of fighting over the memory with your DMA engine. And even that you can ignore if you're using a tightly coupled memory...
Its slowly getting there, but no, right now there is a very limited number of target platforms that are supported, and the ecosystem is still very small and immature. You have to remember that there is a huge number of microcontrollers out there, with exotic architectures you have never heard of, and that LLVM definitely doesn't support.
I'm not sure embedded Rust would even meaningfully exist without the work of Jorge Aparicio.
Yeah. Even on the best supported platforms it is very immature. If you want to use embedded rust you kinda have to let language drive your choice of platform, which isn’t ideal.
I wouldn’t use it at work yet, but it’s coming along.
* Your microprocessor has a C compiler and standard library, as does every processor you might ever switch to. All the hardware documentation that isn't tables in a PDF will be in C.
* Your target's static analysis tools and interactive debuggers will all support C.
* Every RTOS and embedded library/filesystem/whatever will support (and likely be written in) C.
* All experienced embedded developers are fluent in C.
* Nobody ever got fired for choosing C for an embedded project.
Which, in a way, is an advantage. I know (much of) what to look out for. There are tools that can help me with some of those issues. There are techniques that avoid some of them, and there are people who are expert in many of them.
But if I pick some other language, it won't have those problems. It will have other problems. (There is no language that does not have problems.) I won't know what to avoid doing. There may not be tooling to help with them. The techniques for avoiding them may not be widely known. I may not be able to find people who know how to handle them.
To me, "well known problems" may be better than "not well known problems". More predictable, at least. The "not well known" problems have to be significantly better to be worth it. They probably have to be proven significantly better. That means either that someone else has to prove them better, or else I have to have a project that doesn't matter much that I can use as a testbed.
1) There is no need to. C already has all we need to build our systems. The rest is seen as overhead/over-complication.
Regarding Rust, I try to keep up to date about its progress. Unfortunately most of the diseases that Rust cures are not much of a trouble in embedded. My last UB, memory leak of loose pointer happened years ago. It will happen again and when it happens, I'll debug it. That's it. I'm not afraid of UB or dealing with pointers, even if there are 10K Rust users trying to FUD me. I know what I'm doing. Every embedded/kernel/driver developer know what they are doing. When shit happens, that's it, no big deal. You plug your debugger and solve the issue. It's not a nightmare that chases us in the middle of the day.
FUD alone is not enough for switching. So, why should I start thinking that a variable cannot change, because it's a constant (wasn't it a variable?), unless it can mutate, so it's a constant variable that can change because now is mutable?
Or constrain myself into borrow-checker torture for a thing I can do in a couple of instructions?
WHY?
2) This is not about a language problem, is about solving a programmer problem with language. If C = math, then you cannot do math simpler/better because today's mathematicians are sloppier. Or because bosses pressures people to deliver crappy products.
I'm not a genius. I'm far from it, and if any seasoned C programmer challenges me I'll probably run away. But embedded/kernel/driver development is harsh, so if a developer thinks that he/she cannot make it because language, then it's mostly about searching for an excuse. Time to change jobs.
The key is to think that a lot of people did (and does) a lot with so much less, for 40 years now. It's not a language problem. People have to learn to deal with it.
I was there too 20 years ago, when every C++ developer was afraid that they would lose their jobs because C#. It never happened. C++ is still one of the most used languages.
3) In my case, there are official libraries from manufacturers you have to use. Sometimes receiving customer support depends on if and how you use those libraries. All those libraries are in C. All the support is in C. All the examples are in C.
Yes, I know there is that engineer that has a Github repo with a library that works fine with that STM32 for that specific language, that now is getting support for embedded so in 5 years we could maybe put something in production. But, not for now.
"Every embedded/kernel/driver developer know what they are doing"
If that was true, then we wouldn't see more memory bugs found every time academics test a new analyzer or testing tool on open code programmed in C or C++. Microsoft said 70% of the problems they saw were memory safety. Linux has a ton of them. Even OpenBSD has many security fixes for memory safety. Your claim is mythical in the general case even if some individuals working on small codebases can pull it off.
> Every embedded/kernel/driver developer know what they are doing.
Yet we still see security issues in all of those. I'm not saying that everything should be re-written in a different language, but C developers saying "I'm a good developer, all those safety mechanisms would hold me back" doesn't hold water considering all the security vulnerabilities we see that would have been prevented if they had used a language with better safeguards.
This is a common way of thinking about it, but nobody really knows. It's purely a conjecture. There is no data to back it up, and it just appeals to common sense.
Should everybody drop C/C++/whatever and rush into the Rust train because Rust people has conjecture?
I ask the opposite question. What would happen if the only programming language left is C? Wouldn't we become better programmers and raise the bar so high that the bug count drops to 0?
> What would happen if the only programming language left is C? Wouldn't we become better programmers and raise the bar so high that the bug count drops to 0?
Given that C was the dominant programming language for UNIX applications for over a decade, I think we can look to history for an answer to this question. And I believe the record shows that the answer is "no."
>Should everybody drop C/C++/whatever and rush into the Rust train because Rust people has conjecture?
The way these things usually work in practice, the evidence that a new paradigm improves things usually builds up slowly. There will never be a point at which someone proves mathematically that C is obsolete. Instead, the gentle advantages of other options will get stronger and stronger, and the effectiveness of C programmers will slowly erode compared to their competition. At first only the people who are really interested in technology will switch, but eventually only the curmudgeons will be left, clinging to an ineffective technology and using bad justifications to convince themselves they aren't handicapped. Is Rust the thing that everyone except the curmudgeons will eventually switch to? Who knows, but if you don't want to end up behind the industry then it might pay to try it out in production to see for yourself. If you don't make room for research and its attendant risks you will inevitably fall behind.
I'm sorry you see things in terms of competition and curmudgeons. But I see your point, it's just another type of FUD: don't stay behind and adopt Rust because you'll be a curmudgeon.
Not exactly, I'm saying that if you stay behind and don't adopt something, where that something is whatever the industry switches to after C, you will eventually be left behind. Of course it is also possible (and likely for many people) to die or retire before that happens. It's not like C is going away any time soon.
> What would happen if the only programming language left is C? Wouldn't we become better programmers and raise the bar so high that the bug count drops to 0?
Is this a rhetorical question? Clearly the answer is no.
It seems like there is a big difference between a "mathematical certainty that certain bugs will not occur" if you play by the rules and just be a better programmer so that you don't write errors. Not that you won't have bugs in Rust but it seems like we should move towards having our tooling do more of the heavy lifting in ensuring correctness. I don't believe you will ever have the bug count drop to zero. I do believe in mathematical certainties though.
It says something of our profession that we have all of our products come with a legal document that effectively says it's not our fault if the product doesn't do what it's supposed to do.
We should just outlaw those damn statements (and binding arbitration for consumers) and see how the market naturally evolves. Don't force standards on programmers, don't regulate in some novel way, just get rid of the stupid disclaimers and let the old-fashioned legal framework return to primacy.
A language is a tool and if you're saying the problem is with people using the tool wrong, it's only half true. New tools are being invented to handle problems both old and new and discrediting them with what amounts to 'get off my lawn' is counter-productive. There are problems that are impractical to solve without tools designed for them and safe pointers are a great example. Yes, you'll hook up your debugger and yes, you'll fix the problem. I once fixed a dangling pointer problem after two 12h debugging sessions and while it felt great, I'd rather not do that again. I know there are people hunting these for months, so you could say I haven't seen anything yet, and I'll agree - but it's an argument in favor of better tools, not the other way around.
Of course it isn't. It's portable (-ish) assembler. The only thing replacing C will be even more in the C spirit. Adding more checking in system-programming-friendly ways is the only way I can think of for improving the situation.
I know people often refer to C as portable assembly, but really it's not. That sounds like the sort of thing gets repeated by people who neither know C nor assembly. C is incomparably higher level than assembly and lacks the quirkiness modern instruction sets have which are focused on specific odd things modern processors do well.
If you want the same runtime environment of asm without the tediousness of asm development, and a reasonable optimizer to save you more tediousness, you end up at C. I count that as close enough.
I hope that with IoT we get some commoditization and standardization in this space. Though ARM doesn't give me much hope. I don't get why there couldn't be 1-2-3-5 standard architectures and 10-20-50-100 hardware configurations that cover the full spectrum of embedded configurations.
We've had 32 bit x86 CPUs since 1985, surely we could produce a 100Mhz one for cheap enough that nobody would need to use 8 bit ones in 2020? I know that I'm just daydreaming and companies are stingy...
People have more or less converged on ARM for large things and 8051 for tiny things. I suspect the remainder are hiding in culturally isolated niches; I don't really know why people are still targeting PIC16, for example.
There's also the DSP subset where people need to write inner loops in highly platform-specific assembler. e.g. my employer Cirrus has our own "Coyote" architecture for which we have a C compiler: https://statics.cirrus.com/pubs/proDatasheet/CS4970x4_F1.pdf
"Cheap" isn't just about BOM price, the power consumption matters too.
So basically an Intel Pentium from 1994, so from 26 (!) years ago. Size: 90 mm2, maximum power consumption: 10W, price at launch: $699.
Buuuut: production process 600nm.
Today every Intel CPU uses a 14nm production process, but let's go for a cheaper option and "use" the 22nm.
So the gate size is ~30x smaller today. And I know that the relation between the gate size and the die size is not linear (2x smaller gate size usually leads to a die size which is smaller by more than 2x), but let's go with the 30x, to keep things simple.
For the power I have no idea, but I'd assume that the power consumption would go down at least 30x.
Price: hard to say, but considering how hardware evolves, definitely more than 30x cheaper :-)
So a Intel Pentium with modern technology could look something like this:
Size: 3mm2 (probably less), maximum power consumption: 0.3W (probably way less), price at launch: $30 (I'd argue that it would be closer to 0.3$ :-) ).
I could be way off in the weeds here...
My guess is that it's just a matter of existing tooling, expertise, human resistance to change, companies wanting to a) not risk anything and b) to nickel and dime everything.
I'm arguing for an x86 because that way you could throw any kind of modern tooling at it. ARM or MIPS would probably be better candidates.
Then you need external RAM, storage, a video card or companion chip for a display, a chip for holding the BIOS and a power supply (if not many) for all that. x86 as is, is not good enough for my wrist watch.
You can use a small ARM with integrated flash, RAM and LCD controller. Does it need MMU for Linux? There is family of products for that. Or use another family for bare-metal systems.
You can see how the thing starts to complicate regarding architectures/platforms.
Well, I just used the Pentium example because it was easy to source the data :-)
But my main point was to dump super old and limited architectures when these days we can economically use modern architectures we use everywhere else (x86/ARM/MIPS, whatever). If we wanted to, we could literally hoist designs from 20+ years ago and use newer production technologies to make them embeddable.
Using mainstream tech stacks is very empowering. Updated compilers everywhere, many programming languages and stacks with huge communities everywhere, modern debugging tools at low/no cost, fast debug cycles, etc.
But there's little interest in this because which self-respecting hardware maker would commoditize its own products? :-D
Hoisting old designs is unnecessary when you have x86 processors like Intel Atom, which are exactly that, and all the versions of ARM Cortex-R3 and R0.
The problem is not the processor, it's the lack of MMU and/or special DMA or interrupt engines and special GPIO. No kernel is potentially ported to such custom infrastructure, and if you have a very tiny flash, few of the ones which could will fit. (Say, target L4, porting Fiasco or OKL4.)
Oh, I must have missed that. They seem a bit too expensive for what I'm saying, I imagine these things have to cost cents to be worthwhile (50-70 cents or so).
Every single one of your projection is still way off compare to lowest cost embedded system. $0.3 is expensive when people are really, literally counting pennies.
Lots of people don't enjoy using C for various reasons, but generally programmers don't get to choose what language they work in, they get paid to work in whatever language is needed.
Whether programmers like it or not for non personal projects doesn't actually matter too much. If you don't like using C, then don't take jobs programming in C. Just don't whine when that makes it more difficult to get a job.
It does matter if the reasons why they do not like it are good.
Decision makers should listen to such input instead of deciding on programming language by fiat or convention. Including hardware choices. Of course this should be weighted by market availability and cost of both hardware and programmers.
Typically the reason C is used because there's no other toolchain available for said embedded device, except assembler, and they do not wish to invest in building or extending one. Those devices often do not have a kernel with POSIX-like syscalls to adapt an existing full blown libc, nor provide one.
Some embedded chipset libraries are also written in C sprinkled with copious assembly, so the language that's used with these has to be very easily interoperable.
>It does matter if the reasons why they do not like it are good. Decision makers should listen to such input instead of deciding on programming language by fiat or convention.
This is rarely the case. Almost all programming tasks aren't free choices out of thin air except for start-ups or small software companies. Far more often than not you have to stick with the existing language for compatibility, or to minimize maintenance costs, or in short because someone else already decided on the language before you arrived.
>Including hardware choices. Of course this should be weighted by market availability and cost of both hardware and programmers.
In those rare cases when you get an opportunity to make such a choice, sure. However, those aren't the only criteria either.
This is entirely orthogonal to the fine article and basically just a statement of personal preference. Having preferences is fine, and sure, if you like C you can take umbrage at some of the more flowery characterizations in the README, but it's not really the point of the language. There are lots of neat ideas in TFA that would be interesting to discuss; no one is interested in litigating the fate of C on embedded for the Nth time.
> I agree it's the standard and the only thing that actually works (author's words),
C++ also "actually works" and, thanks to its much greater ability to support fluent abstractions, ends up being safer in practice than raw C code. You can use C++ for low-level system components. Did you know Android's libc is actually written in C++?
This is a great achievement in making formal methods accessible in pragmatic terms - something that actually works and can be used by normal humans.
Would be nice to have an actual microcontroller example.
> The standard library is fully stack based and heap allocation is strongly discouraged
:/ - I can see why this is done, as it's hard, so banning it to make the problem tractable works. But it's also quite inconvenient. On the other hand, "MISRA C:2004, 20.4 - Dynamic heap memory allocation shall not be used."
There's the third sort of allocation that's often used in embedded systems - preallocated buffers, which are neither stack nor heap but set up by the .bss section of the executable.
These allow you to have very predictable memory usage.
Pre- and post-condition annotations (`where` and `model`, respectively) are great. It's nice to see those in a new language.
Syntactically, this seems more like a Rust dialect than a C dialect. The primary relation to C seems to be portability (transpiler) and integration (ABI). This is true of most C-transpiled languages, though.
It's certainly cute, and potentially useful if your program is small enough to be solved by a SAT solver. Ideally relatively quickly, or those compile times will be poor. I wonder how it deals with machine registers, which are often something you would be using in embedded C.
Very cool idea! What I love about C is that all sorts of programming language can talk to its ABI and that it fits well with low level programming.
However it is a cumbersome and unsafe language. This seems like a very nice solution. You can write in a much safer language while producing C code which does not look too alien relative to what you wrote
I have been interested in Rust but thinking it looks a tad too complicated. This may be a happy inbetween.
I'd expect that a formally provable language would be more complicated to write in practice than Rust. Take this example for instance:
>you must tell the compiler that accessing the array at position 2 is defined. quick fix for this one:
fn bla(int * a)
where len(a) == 3
{
a[2];
}
In Rust you don't need the where clause, the `a[2]` operation will just panic at runtime if the array is too short. You don't have to prove to the language that the access is correct.
Right, but the question was about which language was more complicated (with the implication being, complicated to write). In that sense I think my GP comment is probably correct.
I've not looked at ZZ yet, but when we talk about formal proofs, we're usually proving much more interesting properties than what a type system typically proves.
> operation will just panic at runtime if the array is too short
Exactly, rust doesn't provide a good solution to this problem at all. Panics in rust are an escape hatch used to ensure the language stays "safe" in situations where the compiler can not prove a given behaviour at compile time, but where it would have made the language too ugly if you had to wire through Result types for all the trivial operations like adding two numbers.
In my experience, panics in rust have been a major source of pain. In contrast to an exception, which you can catch, a panic behaves more like an abort. At least it has been that way in the past. Now, with a lot of libraries using panics to signalize runtime errors, coding in rust has at some times felt like I was using a bunch of badly written C libraries that internally call "abort()" and kill the process when something goes wrong that would have been totally handle-able without killing the whole process. That's the benefit of using a safe language, right?
I think lately the rust "community" has become aware of this issue and IMO the way things are going is that that panics, as they are designed, should basically not be used. But, without proper exceptions, that brings you back to the situation where an operation as trivial as adding two numbers either produces a return code that must be explicitly checked or may silently fail and produce an "undefined" result in some cases.
A panic in Rust nicely captures the category of errors which the programmer asserts should be impossible, which generally cannot be recovered well from. A panic is kind of catchable, but it only works in the sense of killing only part of the process rather than the entire process (like abort does).
If errors are expected, then you should Result instead of a panic. If people are using panics instead of Result for these kinds of errors, then the library is wrong. I'm curious what examples you have where this is happening.
I'm a fan of ATS, really I am, but recommending ATS to someone who said Rust looks too complicated made me literally laugh out loud. Like I said, I like ATS and don't particularly care for Rust, but ATS is in no way less complicated that Rust. But I really do feel like the laugh I had is going to make my day better, so thanks.
I like how they incorporate an SMT solver. They claim: “all code is proven”. What does that mean? What is proven about the code? Absence of memory bugs, or actual correctness of algorithms?
To me it means that there is a formal semantics (for example in Coq), and that we can formally prove properties of the code. There does not seem to be a semantics here.
One question that isn’t obvious from the overview:
This language compiles to C and asserts that your program will never exhibit undefined behavior; have they proved that ZZ is correct, ie that it will definitely never output C code that exhibits undefined behavior?
If you really care about correctness, that seems important. I absolutely love the idea in general though.
So it's more like a C transpiler than a dialect of C IMO (it effectively looks more Rust-like than C-like). I'm not just saying that for nitpicking the obviously vague definition of "dialect" but rather because it's a very important feature IMO, and actually makes this project potentially more useful to me.
I'd be very wary of switching my toolchain to an experimental one in production, especially if I'm targeting some niche DSP. On the other hand generating C and then compiling it as usual seems less of a hurdle to me. They actually point that out in the intro but I thought it might be worth mentioning it here.
Now I just quickly skimmed the readme but do they explain how they deal with interfacing with standard, non-ZZ C code? I assume they need some sort of "FFI" bindings like Rust to make the code safe.
For this purpose, C is a very convenient and very portable assembly language. You could use a verified C compiler like CompCert to convert it to machine code.
In an ideal world, there’d be no C compiler in the loop at all, sure. But in practice it’s not causing any trouble at all in a ZZ -> C -> machine code workflow. Quite the reverse, targeting C has some major benefits. (There are C compilers that are very fast, very highly optimizing, very portable, and/or verifiably correct.)
Edit to add: just realised I might have totally misunderstood your comment, sorry! Apologies for jumping the gun if so.
If you just mean can we save the generated C to disk instead of compiling it, yes, I would hope so too.
Absolutely!
The zz export command just dumps the C and SMT code along with makefiles for common build systems and stops there. Very handy for using it within other toolchains.
> Checking is done by executing your code in SSA form at compile time within a virtual machine in an SMT prover. None of the checks are emitted into runtime code.
No mention of unions, discriminated or otherwise. Sums/unions are a pretty important abstraction, and a lot of C's unsafety can sneak in if you don't enforce discriminated unions.
Okay, it's provable... what value do really get out of it? isn't the whole reason why we still use C because it gets so close to resolving (or at least acknowledging) hardware or platform-level implementation problems? we can continue to re-invent the language in isolation but you can't replace a hardware-level programming language in this manner.
It is surprising that the syntax is so different for a “dialect” of C, but apart from that --
If the semantics are very close to C, so you can do basically all the same stuff, but you also get guarantees that your code will absolutely never hit any undefined behavior, that’s great and tremendously useful. It absolutely could replace C in applications where C is still the most useful and practical language.
It would resolve a couple of major headaches in existing C code: security bugs caused by memory overflows (caused by using arrays or pointers in undefined ways); and highly optimizing compilers doing weird things to your code, by exploiting undefined edge cases.
If I know my code will definitely not hit any undefined behavior, that gives me a ton more confidence that it won’t have stupid buffer overflow bugs and I won’t get mysterious errors on certain platforms.
This is just another symptom of why OpenSource often sucks. Instead of somehow coordinating and focusing their efforts, everyone seems to need to start their own spin off "inspired" by other projects.
When do people realize that building a new language is almost always going to fail and only very very few languages ever reach anything close to adoption.
Instead of spending all this time writing your own doomed language, why not try to contribute to a project like LLVM or Rust and add your provable subset there?
The same goes for Linux, which is the paradigm of wasted efforts.
Bjarne Stroustrup never expected C++ to be very widely used. He made it anyway.
Alex Stepanov never thought that anyone would care about his ideas on generic programming. He pursued them anyway. They became the STL.
True, building a new language is almost always going to fail. The problem is, when someone starts working on a new language, they don't know if it's doomed or not. It is good that 1000 people try, because from that we get one language that many people use, and 10 specialized languages that a few people use, and 10 languages that nobody uses but future people steal some of the ideas.
> The same goes for Linux, which is the paradigm of wasted efforts.
Um... what? Wasted because nobody uses it? Very much no. Wasted because it's a duplication of what was there before? To some degree, yes. But not everything in Linux was in Unix before it. And Unix couldn't run all the places that Linux does (smartphones to mainframes). So, no, Linux is not wasted effort.
Just an example of why things don't go the way seem to feel they must:
I do things that are not for work, because I want to do them. I enjoy designing and implementing languages, I love writing compilers and interpreters. So, I don't care one bit if anyone ever sees them or uses them. Of those languages I've designed, the only language I consider minimally complete is one I designed for personal use on personal projects. I have been arguing with friends recently who want me to at least release it to the public, if only to post about it and its quirks on blogs.
I would be utterly shocked if anyone ever wanted to use anything I've built for fun/research, that's why I've never released any of it. Also because of the assumption you make being quite popular, that I somehow owe Open Source or something to help them do things that are interesting to them.
Additionally, telling a developer who is developing what they want for their own reasons to contribute to a project like LLVM or Rust is ridiculous. If these projects aren't what drew my interest why would I want to bend over backwards to change what I'm doing to try and fit it into some existing model.
TL;DR --> I don't program outside of my job for anyone but myself, and that's all that matters. If the ZZ devs want to make a provable dialect of C, that's what they should do.
With that said, I really dislike the way they're describing their project.
When I read safe dialect of C, I first assumed they meant they had developed a safe subset of C, or perhaps a very similar language, like OpenCL C [1]. Instead, they developed a new language which isn't C at all. Nothing wrong with that, but if I don't instantly recognise the syntax as C, I wouldn't call it a C dialect.
They also put formally provable dialect of C. Their language compiles to C code which is guaranteed to be free from undefined behaviour. This is not the same thing as a language where hard guarantees can be made about program behaviour, such as SPARK [2] or Dafny [3].
If the authors are reading this, I urge you to improve your project summary. Your language does not allow me to prove program correctness, instead it protects me from C's undefined behaviour. That's still a great idea! Please make this clear!
[0] https://news.ycombinator.com/item?id=22102658
[1] https://en.wikipedia.org/wiki/OpenCL#OpenCL_C_language
[2] https://en.wikipedia.org/wiki/SPARK_(programming_language)
[3] https://en.wikipedia.org/wiki/Dafny