Translating All C to Rust (TRACTOR)

steveklabnik · 2024-07-30T15:42:35.000000Z

See also https://sam.gov/opp/1e45d648886b4e9ca91890285af77eb7/view

thesuperbigfrog · 2024-07-30T16:05:14.000000Z

Direct link to Proposer's Day info [PDF]: https://sam.gov/api/prod/opps/v3/opportunities/resources/fil...

"The purpose of this event is to provide information on the TRACTOR technical goals and challenges, address questions from potential proposers, and provide an opportunity for potential proposers to consider how their research may align with the TRACTOR program objectives."

mike_hearn · 2024-07-30T16:10:24.000000Z

That sounds ... hard. Especially as idiomatic Rust as written by skilled programmers looks nothing like C, and most interesting code is written in C++ anyway.

Isn't it equivalent to statically determining the lifetimes of all allocations in the C program, including those that are implemented using custom allocators or which cross into proprietary libraries? There's been a lot of research into this sort of thing over the years without much success. C/C++ programs can do things like tie allocation lifetimes to what buttons a user clicks, without ref counting or other mechanisms to ensure safety. It's not a good idea, but, they can do it.

The other obvious problem with trying to write such a static analysis is that the programs you're analyzing are by definition buggy and the lifetimes might not make sense (if they did, they wouldn't have memory safety holes and wouldn't need to be replaced). The only research I've seen on this problem of statically detecting what lifetimes should be does assume the code being analyzed is actually correct to begin with. I guess you could try and aim for a program that detects where lifetimes can't be worked out and asks the developer for help though.

woodruffw · 2024-07-30T16:20:26.000000Z

It's very hard; DARPA likes to fund hard things[1] :-).

This isn't, however, DARPA's first foray into automatic program translation, or even automatic translation into Rust[2].

[1]: https://www.urbandictionary.com/define.php?term=DARPA%20hard

[2]: https://c2rust.com/

the_snooze · 2024-07-30T16:58:05.000000Z

DARPA is basically a state-sponsored VC that optimizes for completely different things. Instead of looking for 100x financial returns, they want technical advantages for the United States. The "moat" is the hardness of developing and operationalizing those technologies first.

mburns · 2024-07-30T20:07:25.000000Z

To be pedantic, In-q-tel is the literal state-sponsored VC.

DARPA is a step closer to traditional research labs but there is obviously some overlap.

https://en.wikipedia.org/wiki/In-Q-Tel

throwup238 · 2024-07-30T20:31:07.000000Z

> DARPA is a step closer to traditional research labs but there is obviously some overlap.

It's more like the NSF but focused on commercial grantees with project management thrown on top to orchestrate everything.

The really unique part is how much independence each program manager has and the term limits that prevent empire building.

woodruffw · 2024-07-30T17:01:14.000000Z

DARPA's commercialization track record is decidedly mixed, so the VC comparison is unexpectedly apt :-)

(But yes: DARPA's mandate is explicitly to discover and develop the next generation of emerging technologies for military use.)

pfdietz · 2024-07-30T20:34:28.000000Z

Decades ago, as my father explained to me, ARPA (no "D" at that time) was happy if 1% of their projects went all the way through to successful deployment. If they had a higher success rate it would mean they weren't aiming high enough.

VikingCoder · 2024-07-30T20:54:19.000000Z

> DARPA's commercialization track record is decidedly mixed...

If you count my number of attempts, sure.

If you count by impact, it's hard to come up with many things more impactful than the Internet...?

woodruffw · 2024-07-30T21:01:18.000000Z

Yeah, I meant by number. But also: ARPA didn't commercialize the Internet! They explicitly refused to commercialize it; commercialization only happened after an Act of Congress induced interconnections between NSFNET and commercial networks.

fsckboy · 2024-07-30T19:56:21.000000Z

in this case it seems to me the hard task that DARPA has chosen is to get me to forget how much they spent on pushing Ada.

woodruffw · 2024-07-30T20:00:50.000000Z

I can't find any clear references to DARPA (or ARPA) being involved in Ada's development. It was a DoD program but, well, the DoD is notoriously large and multi-headed.

(But even if DARPA was involved in Ada: I think it's clear, at this point, that Ada has been a resounding success in a small number of domains without successfully breaking into general-purpose adoption. I don't have a particular value judgment associated with that, but from a strategic perspective it makes a lot of sense for DARPA to focus program analysis research on popular general-purpose languages -- there's just more labor and talent available.)

indolering · 2024-07-31T04:56:38.000000Z

Too lazy to look it up, but I'm pretty sure DARPA was involved and certain that DoD contracta prioritized ADA for a long time.

rerdavies · 2024-07-31T07:44:48.000000Z

Too bored to pass up a challenge to refute somebody who is too lazy to look it up.

I looked it up. DARPA was not involved.

9659 · 2024-07-30T20:58:30.000000Z

ada does not require 'pushing'.

once the maturity of the users advances to a sufficient point, then ada is the only solution.

"ada. used in creating reliable software since 1983"

when i first saw ada, i didn't understand the why. now i understand the why, but ada is effectively gone.

-- old fortran / C / Assembly programmer

pjmlp · 2024-07-31T06:32:43.000000Z

Ada is still around, at a big enough level to keep 7 commercial vendors selling compilers.

Something unheard of, paying for software tools in 2024, who would imagine that.

9659 · 2024-07-31T12:58:47.000000Z

it was depressing when RH dropped ada support. sure, it was gcc, but it was so nice to have an ada compiler part of the default gcc installation.

gnat needs money. well deserved. but adoption needs a free, easy to install compiler.

5 years ago i had the pleasure of resurrecting a dead system. it was about 30k of ada, lets call it ada 87 (!). unknown compiler, 32 bit, 68K processor, 16 MB memory, unknown OS.

code was compiling in 2 days, running in 2 weeks. i needed to change from using 32 bit floats to 64 bit floats (seems positional data is a little more accurate in 2020). 1 declaration in 1 package spec and a recompile, and all my positions are good.

i love that language!

ajxs · 2024-07-31T21:29:44.000000Z

Very cool project! Were you able to build a working 68K GNAT cross-compiler yourself, or did you purchase one from one of the major Ada vendors?

9659 · 2024-08-01T00:37:40.000000Z

target was x86_64 / linux. just updated the rpm spec file for the gcc build to enable ada. rebuild and install.

so, changed wordsize, processor, operating system... minimal source code impact.

Avamander · 2024-07-31T09:23:32.000000Z

Oh, it's around, but laypeople never see those codebases.

reaperducer · 2024-07-30T20:51:45.000000Z

in this case it seems to me the hard task that DARPA has chosen is to get me to forget how much they spent on pushing Ada.

You hate jumbo jets, high-speed trains, air traffic control, and satellites?

9659 · 2024-07-30T20:59:41.000000Z

Do you know what fear is? Getting in an airplane where the flight controls use NPM.

warkdarrior · 2024-07-30T21:12:45.000000Z

   npm ERR! install Couldn't read dependencies
   npm ERR! package.json ENOENT, open '/boeing/787-9/flaps-up.json'
   npm ERR! package.json This is most likely not a problem with npm itself.
   npm ERR! package.json npm can't find a package.json file in your current directory.

tracker1 · 2024-07-31T14:59:22.000000Z

I have enough fears about features in the entertainment system, and that performance options are accessed through that same touch screen UX.

sam0x17 · 2024-07-30T16:39:24.000000Z

speaking of hard, the DOE actually funds a project that has been around for 20+ years now (ROSE) that involves (among other things) doing static analysis on and automatically translating between C/C++/Cuda and even high level languages like Python as well as HPC variants of C/C++. They have a combined AST that supports all of those languages with the same set of node types essentially. Quite cool. I got to work on it when I was an intern at Livermore, summer of 2014.

and it's open source as well! http://rosecompiler.org/ROSE_HTML_Reference/index.html

seren · 2024-07-31T07:30:42.000000Z

I have already seen legacy projects that were designed using Rational Rose, but for some reason I thought it was only a commercial name, not an actual system. Thanks, I learned something today !

andrewflnr · 2024-07-31T18:37:00.000000Z

Your first instinct was more correct, that's definitely a different thing. :) https://en.wikipedia.org/wiki/IBM_Rational_Rose

andrewflnr · 2024-07-31T03:43:47.000000Z

> a combined AST that supports all of those languages with the same set of node types essentially.

I can't believe that works at all. I'll take a look for sure.

sam0x17 · 2024-07-31T05:01:00.000000Z

Most of what they use it for is static analysis, but the funding comes from its ability to translate old simulation code to HPC-ready code. I think they even support fortran IIRC

rectang · 2024-07-30T19:50:49.000000Z

I have to imagine that in the general case it will be a translation to unsafe Rust, with occasional isolated leaf nodes being translated to safe Rust.

If you think it's hard wrestling with the borrow checker, just imagine how much harder it is to write automatic translation to borrow-checker-approved code that accounts for all the possible program space of C and all it's celebrated undefined behavior. A classic problem of writing compilers is that the space of valid programs is much larger than the space of programs which will compile.

A quick web search reveals some other efforts, such as c2rust [1]. I wonder how TRACTOR differs.

[1] https://github.com/immunant/c2rust

Someone · 2024-07-30T20:43:33.000000Z

> have to imagine that in the general case it will be a translation to unsafe Rust, with occasional isolated leaf nodes being translated to safe Rust.

That’s not what they are aiming for. FTA: “The goal is to achieve the same quality and style that a skilled Rust developer would produce”

> just imagine how much harder it is to write automatic translation to borrow-checker-approved code that accounts for all the possible program space of C and all it's celebrated undefined behavior

Nitpick: undefined behavior gives the compiler leeway in deciding what a program does, so the more undefined behavior a C program invokes, the easier it is to translate its code to rust.

(Doing that translation in such a way that the behavior remains what gcc, clang or “most C compilers” do may be harder, but I’m not sure of that)

rectang · 2024-07-30T21:14:53.000000Z

> undefined behavior gives the compiler leeway in deciding what a program does, so the more undefined behavior a C program invokes, the easier it is to translate its code to rust.

That's the kind of language lawyer approach that caused a rebellion in the last decade amongst C programmers against irresponsible compiler optimizations. "Who cares if your program actually works as intended? My optimization is legal according to the standard, it's your program that's written to exploit loopholes".

I don't see any evidence that that's the attitude being taken by TRACTOR — I sure hope it isn't. But hell, even if the result is unreliable in practice, I suppose that if somebody gets to claim "it works" then the incentives are aligned to produce garbage.

atiedebee · 2024-07-30T21:29:15.000000Z

> Who cares if your program actually works as intended? My optimization is legal according to the standard, it's your program that's relying written to exploit loopholes".

If your program invokes undefined behaviour, it's invalid and non-portable. Out of bounds array accesses are UB, yet a program containing them may just happen to work. It won't be portable even between different compiler versions.

The C standard is a 2 way contract: the programmer doesn't produce code that invokes undefined behaviour, and the compiler returns a standard conforming executable

matheusmoreira · 2024-07-30T23:27:21.000000Z

If undefined behavior is invalid, then reject the program instead of "optimizing" it. This "oh look undefined behavior I'm gonna turn the entire function into a no-op" nonsense is completely unacceptable. It's adversarial and borders on malicious. Null pointer check deletion can turn bugs into exploitable vulnerabilities.

zajio1am · 2024-07-31T02:13:50.000000Z

> If undefined behavior is invalid, then reject the program instead of "optimizing" it.

Undefined behavior is usually a result of runtime situation, it is usually not obvious from just the code whether it could or could not happen, so the compiler cannot reject the program.

The 'UB-based' optimization is just assumption that the code is correct and therefore UB-situation could not happen in runtime.

grumpyprole · 2024-07-31T07:55:00.000000Z

Usually but not always. For example, the removal of an empty effect free infinite loop. This should be an error.

the8472 · 2024-07-31T08:19:11.000000Z

The C++ forward progress guarantee enables more optimizations since it allows the compiler to reason more easily about loops:

> The standards added the forward progress guarantees to change an optimization problem from "solve the halting problem" to "there will be observable side effects in the forms of termination, I/O, volatile, and/or atomic synchronization, any other operation can be reordered". The former is generally impossible to solve, whereas the latter is eminently tractable.

But yeah, that's one of the more foot-gunny UB rules that Rust does not have. But it does mean it doesn't mark functions as `mustprogress` in LLVM IR which means it misses out on whatever optimizations that enables.

Avamander · 2024-07-31T09:29:10.000000Z

> This "oh look undefined behavior I'm gonna turn the entire function into a no-op" nonsense is completely unacceptable. It's adversarial and borders on malicious.

You significantly underestimate how much UB people write and overestimate the end-result if the current approach would not be taken.

rectang · 2024-07-30T21:41:18.000000Z

The C standard with its extensive undefined behavior causes programmers and compiler writers to be at odds. In a sane world, "undefined behavior" wouldn't be assumed to mean "the programmer must have meant for me to optimize this whole section of code away". We aren't on the same team, even if I believe that all parties are acting with the best of intentions.

I don't feel that the Rust language situation incentivizes such awful conflict, and it's one of many reasons I now try really hard to avoid C and use Rust instead.

astrange · 2024-07-31T01:39:28.000000Z

A funny thing about this problem is that it gets worse the more formally correct your implementation is. Undefined behavior is undefined, so it's outside the model, and if your program is a 100% correct implementation of a model then how can it know what to do about something outside it?

But I don't think defining all behavior helps. The defined behavior could be /wrong/, and now you can't find it because the program using it is valid, so it can't be detected with UBSan.

Asooka · 2024-07-30T21:51:02.000000Z

Doing one funny thing on platform A and a different funny thing on platform B when an edge case arises is way better than completely deleting the code on all platforms with no warning.

Someone · 2024-07-31T14:57:14.000000Z

> I don't see any evidence that that's the attitude being taken by TRACTOR — I sure hope it isn't.

I don’t see any way it can do otherwise. As a simple example, what would one translate this C statement to:

  int i;
  …
  i = abs(i);

? I would expect TRACTOR to generate (assuming 64-bit integers):

  let i: i64;
  …
  i = abs(i);

However, that can panic in debug mode and return a negative number in release mode (https://doc.rust-lang.org/stable/std/primitive.i64.html#meth...), and there’s no way for TRACTOR to know whether that makes the program “work as intended”. That code may have worked fine/fine enough) for decades because its standard library returns zero for abs(INT_MIN).

rectang · 2024-07-31T17:01:10.000000Z

It's possible to preserve the semantics of the original program using unsafe Rust. [1]

    unsafe {
        let mut i: std::os::raw::c_int
            = std::mem::MaybeUninit::uninit().assume_init();
        // ...
        i = libc::abs(i);
    }

That's grotesque, but it is idiomatic Rust insofar as it lays bare many of the assumptions in the C code and gives the programmer the opportunity to fix them. It is what I would personally want TRACTOR to generate if it could not prove that `i` can never take on the value `libc::INT_MIN`.

Given that generated code, I could then piecemeal migrate the unsafe bits to cleaner, idiomatic safe rust: possibly your code but more likely `i::wrapping_abs()` or similar.

What will TRACTOR choose? At least for this example, they don't have to choose inappropriate pruning of undefined behavior. They claim the following:

> The goal is to achieve the same quality and style that a skilled Rust developer would produce, thereby eliminating the entire class of memory safety security vulnerabilities present in C programs.

If they're going to uphold the same "quality", the translation you presented doesn't cut it. But you may be right and they will go down the path of claiming that a garbage translation is technically valid under undefined behavior and therefore”quality” — if so, I will shun them.

[1] https://play.rust-lang.org/?version=stable&mode=debug&editio...

Someone · 2024-07-31T19:02:30.000000Z

> It's possible to preserve the semantics of the original program using unsafe Rust

Because of the leeway the C standard gives you, you can preserve the semantics of the C program by just calling abs, and I think that’s the best you can do.

What the compiler does may be different for different compilers, different compiler versions or different compilation flags, so if all you have is the C source code, there’s no way to preserve the semantics of the machine code that the C compiler generates.

You could special-case all of them, but even then, there is the problem that a C compiler, even in a single translation unit, can inline one call and then apply some transformations while compiling another call to a call to a library function, making the semantics of overflow in one location different from that in another.

If you want to replicate that, I’d say you aren’t writing a C to rust translator, but a (C + assembly) to rust translator.

Also, if you go this route, you’d have to do similar gnarly stuff for all arithmetic on integers where you cannot prove there will not be overflow. I would not call the resulting code idiomatic rust.

rectang · 2024-07-31T20:21:38.000000Z

What you describe is antithetical to idiomatic Rust, written by a skilled Rust programmer.

To uphold the spirit of Rust, a C program must go through a process where assumptions are laid bare and footguns are dismantled. Applying an automatic process which arbitrarily changes the behavior from the implementation-dependent compilation of a C program just gets you a messy slop of hidden bugs collected inside an opaque, "safe" garbage can.

You don't get to Rust's reliability by applying a translation which discards it!

> Also, if you go this route, you’d have to do similar gnarly stuff for all arithmetic on integers where you cannot prove there will not be overflow.

Damn straight. That's what C is! It was always this bad, as those of us who have struggled to control it can attest. Faithful translation to unsafe Rust just makes it obvious.

artikae · 2024-08-01T22:06:31.000000Z

The first line is already UB. `assume_init` requires the contents to be initialized, hence the name.

rectang · 2024-08-02T04:07:41.000000Z

Mmm, I went back and read the docs for MaybeUnit more carefully and that's a good point.

It may be better to just leave the assignment off the declaration. If the variable is read before it's initialized to something, we'll get a Rust compilation error, forcing programmer intervention. Detecting actual bugs that would result in memory errors and forcing them to be resolved is very much in the spirit of Rust. TRACTOR may aspire to gift C programs with memory safety for free, but it won't always be possible.

Of course if TRACTOR can determine through static analysis that the unitialized read can't cause problems, it might emit different code.

derdi · 2024-07-30T21:34:56.000000Z

> undefined behavior gives the compiler leeway in deciding what a program does, so the more undefined behavior a C program invokes, the easier it is to translate its code to rust.

You assume that the compiler can determine what behavior is undefined. It can't. C compilers don't just look at some individual line of the program and say "oh, that's undefined, unleash the nasal demons". C compilers look at code, reason that if such-and-such variable has a certain value (say, a null or invalid pointer), then such-and-such operation is undefined (say, dereferencing that variable), and therefore on the next line that variable can be assumed not to have that bad value. Despite all the FUD, this is a very limited power. C compilers don't usually know the actual values in question, all they do is exclude some invalid ones.

bigstrat2003 · 2024-07-30T23:12:15.000000Z

I (not the person you are replying to) do understand that's how compilers interact with UB. However, a wealth of experience has shown us that the assumption "UB doesn't occur" is completely false. It is, in my opinion, quite irresponsible for compiler writers to continue to use a known-false assumption when building the optimizer. I don't really care how much speed it costs, we need to stop building software on a shaky foundation like that.

astrange · 2024-07-31T01:41:22.000000Z

Soon (or actually, already) we'll have MTE and CHERI, and then that C undefined behavior will be giving you security improvements as well as speed improvements.

Can't design a system that 100% crashes on invalid behavior if you've declared that behavior is valid, because then someone is relying on it.

mattgreenrocks · 2024-07-30T16:19:34.000000Z

Projects are termed DARPA-hard for a reason.

jandrese · 2024-07-30T17:12:27.000000Z

I have to think the approach will be something like "AI summarizes the features of the program into some kind of technical language, then the AI synthesizes Rust code that covers the same feature set".

It would be most interesting if the approach was not to feed the program the original program but rather the manual for the program. That said it's rare that a manual captures all of the nuances of the program so a view into the source code is probably necessary, at least for getting the ground truth.

munificent · 2024-07-30T17:22:30.000000Z

More like:

"AI more or less sort of summarizes the features of the program into some approximate kind of technical language, then the AI synthesizes something not too far from Rust code that hopefully covers aspirationally the same feature set".

trealira · 2024-07-31T23:52:49.000000Z

Ghidra, which is decompilation software, already manages to produce almost-valid C from assembly, and it does so without AI. I know nothing about how it works, but just from that, I'm guessing that producing almost-valid Rust from C code would be a simpler problem to solve.

ip26 · 2024-08-03T20:59:30.000000Z

In theory, a codebase is a language precisely describing a program. The same program can be described in other languages. So that’s what you’re asking the LLM to do, in the same way you can describe a flower in either English or Spanish.

the8472 · 2024-07-30T19:13:04.000000Z

Write tests for your C code. Run c2rust (mechanical translation), including the tests. Let a LLM/MCTS/verifier loop go to town. Verifier here means it passes compiler checks, tests, santiziers and miri.

Additional training data can be generated by running mrustc or by inlining unsafe code (from std/core/leaf crates) into safe code and running semantics-preserving mechanical refactorings on the code.

This can be closer to AlphaProof than ChatGPT

astrange · 2024-07-31T01:42:20.000000Z

You could already use ASAN + UBSan, or Frama-C.

the8472 · 2024-07-31T08:13:11.000000Z

I did mention using sanitizers in the verification step of the optimization loop. The optimization goal here would be reducing the lines of `unsafe` while preserving program semantics.

TestingWithEdd · 2024-07-31T04:55:08.000000Z

Essentially neural program synthesis

kragen · 2024-07-30T19:59:19.000000Z

presumably dan wouldn't have gotten darpa funding if it were obviously feasible, and success wouldn't give him anything publishable academically

dgacmu · 2024-07-30T21:42:11.000000Z

Just to be clear to others, Dan is the darpa PM on this - he convinced darpa internally it was worth funding other people to do the work, so he himself / his research group won't be doing this work. He's on leave from Rice for a few years to be a PM at DARPA's I2O.

And while DARPA doesn't directly care about research publications as an outcome, there's certainly a publishable research component to this, as well as a lot of lower papers-per-$ engineering and validation work. A lot of the contracts they hand out end up going to some kind of contractor prime (BBN, Raytheon, that kind of company) with one or more academic subs. The academic subs publish.

kragen · 2024-07-30T22:09:31.000000Z

thank you for the correction; I didn't realize he was the darpa pm

what you describe is exactly my experience as a darpa performer (on a program which dan is apparently now the pm for!)

elromulous · 2024-07-31T03:36:06.000000Z

> and most interesting code is written in C++ anyway.

You're just asking for people to bring out their pitchforks :P

dlenski · 2024-07-31T16:41:05.000000Z

Man, I want to upvote this but…

> most interesting code is written in C++ anyway.

Really?! The Linux kernel is a _pretty enormous_ counterexample, as are many of the userland tools of most desktop Linux distros.

I am also a key developer of an entirely-written-in-C tool which I'd venture that [a large fraction of desktop Linux users in corporate environments use on a regular basis](https://gitlab.com/openconnect/openconnect).

mike_hearn · 2024-08-02T08:49:07.000000Z

The refusal to use C++ in Linux isn't entirely rational. Nobody else makes that decision. Other kernels are a mix of C and C++ (macOS/iOS, Windows, even hobby operating systems like SerenityOS).

Then you get into stuff that's not kernels and the user-spaces are again mostly all C++. The few exceptions that exist are coming out of the 90s UNIX culture, stuff like Apache or nginx. Beyond that it's all C++ or managed languages.

dlenski · 2024-08-02T17:58:13.000000Z

> The refusal to use C++ in Linux isn't entirely rational.

Whether or not the decision not to use C++ is good or bad, rational or irrational, what is the relevance?

The point is that the Linux kernel is technologically interesting and innovative, under very active development, and it's written in C.

> The few exceptions that exist are coming out of the 90s UNIX culture, stuff like Apache or nginx. Beyond that it's all C++ or managed languages.

I literally just told you about a software project to which I contribute, which thousands of people and organizations use, which is written in C.

Also, it wasn't written in the ’90s.

01HNNWZ0MV43FF · 2024-07-30T16:21:14.000000Z

Can't most c++ be machine-lowered to C?

woodruffw · 2024-07-30T16:28:51.000000Z

Lowering is typically easier than lifting (or brightening). When you lower, you can erase higher-level semantics that aren't relevant; when you lift, you generally want to compose lower-level program behaviors into their idiomatic (and typically safer) equivalent.

pjmlp · 2024-07-31T06:34:22.000000Z

Yes, that is after all how C++ started.

How good the resulting performance would be like, that is another matter.

downrightmike · 2024-07-30T17:24:23.000000Z

If the IRS could have more timely funding, all their Cobol would be translated to Java by now

psunavy03 · 2024-07-30T21:40:12.000000Z

COBOL migrations are tar pits of replicating 40+ years of undocumented niche business logic for a given field, edge cases included, that was "commonly understood" by people who are now retired or dead. Don't get your hopes up.

pjmlp · 2024-07-31T06:37:01.000000Z

MicroFocus has COBOL compilers for Java and .NET, as do other COBOL vendors still in business.

Usually the biggest issue, is that most of the porting attempts don't start there, rather they go for the rewritte from scratch, and lets not pay the licenses for those cross-compilers.

childintime · 2024-07-30T16:24:44.000000Z

Hard for humans. But it's DARPA, is it hard for AI? Image classification used to be hard also, today cars drive themselves.

I'd say it's good timing.

Calavar · 2024-07-30T16:36:54.000000Z

> today cars drive themselves

You can attach about a hundred asterisks to that.

If anything, I think self the failure to hit L5 driving after billions of dollars and millions of man hours invested is probably reflective of how automatic C to Rust translation will go. We'll cruise 90% of the way, but the last 10% will prove insurmountable with current technology.

Think about the number of C programs in the wild that rely on compiler-specific or libc-specific or platform-specific behavior, or even undefined behavior plus the dumb luck of a certain brittle combination of {compiler version} ∩ {libc version} ∩ {linker version} ∩ {build flags} emitting workable machine code. There's a huge chunk of C software where there's not enough context within the source itself (or even source plus build scripts) to understand the behavior. It's not even clear that this is a solvable problem in the abstract.

None of that is to say that DARPA shouldn't fund this. Research isn't always about finding an industrial strength end product; the knowledge and expertise gained along the way is important too.

sqeaky · 2024-07-30T20:16:16.000000Z

This is the exact formulation of the argument before computers beat humans at chess, or drew pictures, or represented color correctly, or... Self driving cars will be solved. There is at least one general purpose computer that can solve it already (a human brain), so of a purpose built computer can also be made to solve it.

In 10 (or 2 or 50 or X) years when Chevy, Ford, and others are rolling out cheap self driving this argument stops working. The important thing is that this argument stops working with no change in how hard C to Rust conversion is.

We really should be looking at the specifics of both problems. What makes computer language translation hard? Why is driving hard? One needs to be correct while inferring intent and possibly reformulating code to meet new restrictions. The other needs to be able to make snap judgments and in realtime avoid hitting things even if it just means stopping to prefer safety over motion. One problem can be solved piecewise without significant regard to time and the other solved in realtime as it happens without producing unsafe output.

These problems really aren't analogous.

I think you picked self driving cars just because it is a big and only partially solved problem. One could just as easily pick a big solved problem or a big unstarted problem and formulate equally bad arguments.

I am not saying this problem is easy, just that it seems solvable with sufficient effort.

mywittyname · 2024-07-30T20:58:51.000000Z

> These problems really aren't analogous.

I'd put money on the solutions to said problems looking largely the same though - big ass machine learning models.

My prediction is that a tool like copilot (but specialized to this domain) will do the bulk of source code conversions, with a really smart human coming behind to validate.

fch42 · 2024-08-01T06:52:15.000000Z

With you, except for the conclusion "[ the tool ] will do the bulk of source code conversions, with a really smart human coming behind to validate".

The director orders the use of the tool when the dev team got downsized (and the two most-seniors left for greener pastures just after that). Validation is in the "extensive" tests anyway, we have those, right, so the new intern shall have a look, make it all work (fudge the tests where possible and remove the persistently failing ones as they've probably been always broken). The salesman said it comes from the DOA or DOD or something. If the spooks can do it so can we.

lmm · 2024-07-31T01:27:00.000000Z

> This is the exact formulation of the argument before computers beat humans at chess, or drew pictures, or represented color correctly, or...

Which are things that took 20 or 50 years longer than expected in some cases.

> I think you picked self driving cars just because it is a big and only partially solved problem. One could just as easily pick a big solved problem or a big unstarted problem and formulate equally bad arguments.

But C to Rust translation is a big and only partially solved problem.

psychoslave · 2024-07-30T17:03:52.000000Z

Ok, but if it's like 90% of small projects can use it as direct no pain bridge, that can be a huge win.

Even if it's "can handle well 90%" of the transition for any project, this is still interesting. Unlike cars on the road, most code transition project out there doesn't need to be 100% fine to provide some useful value.

0cf8612b2e1e · 2024-07-30T17:53:59.000000Z

Even if every project can only be 90% done, that’s a huge win. Best would be if it could just wrap the C equivalent code into an unsafe block which would be automatically triaged for human review.

Just getting something vaguely Rust shaped which can compile is the first step in overcoming the inertia to leave the program in its current language.

swiftcoder · 2024-07-31T09:18:16.000000Z

c2rust exists today, and pretty much satisfies this. I've used it to convert a few legacy math libraries to unsafe rust, and then been able to do the unsafe->safe refactor in the relative comfort of the full rust toolset (analyser + IDE + tests)

There is real utility in slowly fleshing out the number of transforms in a tool like c2rust that can recognise high-level constructs in C code and produce idiomatic safe equivalents in rust

fch42 · 2024-08-01T07:10:40.000000Z

"real" (large) C/C++ programs get much of their complexity from the fact that it's hundred of "sources" (both compiled and libraries) that sometimes, or even often, share global state and at best use a form of "opportunistic sharing". Global variables are (deliberately, and justifiedly-so) hard in rust, but (too) trivial in C/C++, cross-references / pointer chains / multi-references likewise. And once you enter threading, it becomes even harder to output "good" rust code - you'd have to prove func() is called from threaded code and should in rust best take Arc<> or some such instead of a pointer.

It'll be great for "pure" functions. For the grimey parts of the world, funcs taking pointer args and returning pointers, for things that access and modify global data without locks, for threaded code with implicit (and undocumented) locking, the tool would add most value. If it can. Even only by saying "this code looks grimey. here's why. A bit of FFI will also be thrown in because it links against 100 libraries. I suggest changes along those lines ... use one of the 2000000 hint flags to pick-your-evil".

D-Coder · 2024-07-30T17:28:56.000000Z

In addition to the other replies, this is a one-time project. After everything (or almost everything) has been translated, you're done, you won't be running into new edge cases.

programd · 2024-07-30T17:10:32.000000Z

> > today cars drive themselves

> You can attach about a hundred asterisks to that.

Not in San Francisco. There are about 300 Waymo cars safely driving in one of the most difficult urban environments around (think steep hills, fog, construction, crazy traffic, crazy drivers, crazier pedestrians). Five years ago this was "someday" science-fiction. Frankly I trust them much more then human drivers and envision a future utopia where human drivers are banned from urban centers.

To get back on topic, I don't think automatic programming language translation is nearly as hard, especially since we have a deterministic model of the machines it runs on. I can see a possible approach where AI systems take the assembler code of a C++ program, then translate that into Rust, or anything else. Can they get 100% accuracy and bit-for-bit compatibility on output? I would not bet against it.

m0llusk · 2024-07-30T20:06:00.000000Z

Opinions about automated driving systems vary. Just from my own experience doing business all around San Francisco I have seen at least a half dozen instances of Waymo vehicles making unsafe maneuvers. Responders have told me and local government officials that Waymo vehicles frequently fail to acknowledge emergency situations or respond to driving instructions. Driving is a social exercise which requires understanding of a number of abstractions.

fragmede · 2024-07-31T04:39:06.000000Z

they're not perfect, sure, but they're out there, just driving around all autonomously and all, contrary to GGP's assertion that they don't exist.

cuu508 · 2024-07-31T05:47:52.000000Z

GGGP talked about L5 self-driving, isn't Waymo L4?

creata · 2024-07-30T20:03:56.000000Z

Isn't 100% accuracy (relatively) easy? c2rust already does that, or at least comes close, as far as I know.

Getting identical outputs on safe executions, catching any unsafe behavior (at translation-time or run-time), and producing efficient, maintainable code all at once is a million times harder.

TestingWithEdd · 2024-07-31T04:58:44.000000Z

Limited to specific areas during specific hours, and have caused crashes (at least when I lived there till last summer).

saagarjha · 2024-07-30T21:10:46.000000Z

San Francisco, for all its challenges, mostly has traffic laws that people follow. This is not true throughout the world.

mike_hearn · 2024-07-30T16:32:30.000000Z

Well, Claude 3.5 can do translation from one language to another in a fairly competent manner if the languages are close enough. I've used it for that task myself with success (Java -> JavaScript).

But, this isn't just about rewriting code from one language to another. It's about reverse engineering complex information out of the code, which may not be immediately visible in it, and then finding a way to make it "safe" according to Rust's type system. Where's the training data for that? It'd be really hard even for skilled humans.

Personally I think the most pragmatic way to make C/C++ memory safe quicker is one of two approaches:

1. Incrementally. Make std::vector[] properly bounds checked (still not done even in chrome!), convert allocations to allocations that know their own size and do bounds checking e.g. https://issues.chromium.org/issues/40285824

2. Or, go the whole hog and use runtime techniques like garbage collection and runtime bounds checks.

A good example of approach (2) is Managed Sulong, which extends the JVM to execute LLVM bitcode directly whilst exposing to the C/C++/FORTRAN a virtualized Linux syscall interface. The whole piece of code can be sandboxed with permissions, and memory safety errors are caught at runtime. The compiler tries to optimize out as many bounds checks as possible. The interesting thing about this approach is it doesn't require big changes to the source code (as long as it's already been ported to Linux), which means the work of making something safe can be done by teams independent of the original authors. In practice "rewrite it in Rust" will usually mean a fork, which introduces lots of complicated technical, cultural and economic issues.

Managed Sulong is also a research project and has a bunch of problems to solve, for instance it needs to lose the JITC dependency and go fully AOT compiled (doable, there's no theoretical issue with it and much of the needed infra already exists). And performance/memory usage can always be improved of course, it regresses vs the original C. But those are "just" systems engineering problems, not rewrite-the-world and solve-static-analysis problems.

Disclosure: I do work part time at Oracle Labs which developed Managed Sulong, but I don't work on it.

Animats · 2024-07-30T20:29:40.000000Z

> But, this isn't just about rewriting code from one language to another. It's about reverse engineering complex information out of the code, which may not be immediately visible in it, and then finding a way to make it "safe" according to Rust's type system. Where's the training data for that? It'd be really hard even for skilled humans.

That might not be too bad.

A combination of a formal system and an LLM might work here. Suppose we see a C function

   void somefn(char* buf, int n);

First question: is "buf" a pointer to an array, or a pointer to a single char? That can be answered by looking at what the function does with "buf", and what callers pass to it.

If it's an array, how big is it? We don't have enough info to know that yet. But a reasonable guess, and one than an LLM might make, is that the length of buf is "n".

Following that assumption, it's reasonable to translate this to Rust as

   fn somefn(buf: &[u8])

and, if n is needed within the function, use

   buf.len()

The next step is to validate that guess. The run-time approach is to write all calls to "somefn" with

   assert!(buf.len() == n);
   somefn(buf, n);

Maybe formal methods can prove the assert true, and we can take it out. Or if a SAT solver or a fuzz tester can generate a counterexample, we know that the guess was wrong and this has to be done the hard way, as

   fn somefn(buf: &[u8], int n)

implying more subscript checks inside "somefn".

The idea is to recognize common C idioms and do clean translations to Rust for them. This should handle a high percentage of cases.

mike_hearn · 2024-07-31T07:56:51.000000Z

Yes, this is similar to what IntelliJ does for Java->Kotlin. Do a first pass that's extremely non-idiomatic and mechanical, then do lots of automated refactoring to bring it closer to idiomatic.

But if you're going to do it that way, the right place to start is probably to a safer form of C++ not Rust. That way code can be ported file-at-a-time or even function-at-a-time, and so you'll have a chance to run the assertions in the context of the original code. Which of course may not have good test coverage, as C codebases often don't, so you'll have to be testing your assertions in production.

Animats · 2024-07-31T17:23:48.000000Z

> But if you're going to do it that way, the right place to start is probably to a safer form of C++ not Rust.

There's something to be said for that. You're going to need at least an internal representation that's a safe C/C++.

pjmlp · 2024-07-31T06:38:57.000000Z

> Make std::vector[] properly bounds checked

Most compilers do have flags to turn this on, which I use all the time.

The issue is the "performance trumps safety" culture that pushes back against using them.

TinkersW · 2024-07-30T16:36:38.000000Z

std::vector [] has had bounds checking since forever if you set the correct compiler flag. Since they aren't using it this is a choice, presumably they prefer the speed gain.

mike_hearn · 2024-07-30T16:43:00.000000Z

You mean _GLIBCXX_DEBUG? It's got some issues. Linux only, it doesn't always work [1] and it's all or nothing. What's really needed is the ability to selectively opt-out on a per-instantiation level so very hot paths can keep the needed performance whilst all the rest gets opted into safety checks.

Microsoft has this:

https://learn.microsoft.com/en-us/cpp/standard-library/safe-...

but it doesn't seem to actually make std::vector[] safe.

It's frustrating that low hanging fruit like this doesn't get harvested.

[1] "although there are precondition checks for some string operations, e.g. operator[], they will not always be run when using the char and wchar_t specializations (std::string and std::wstring)."

TinkersW · 2024-07-31T00:36:35.000000Z

With MSVC you can use _CONTAINER_DEBUG_LEVEL=1 to get a fast bounds check that can be used in release builds. Or just use it in development to catch errors.

mike_hearn · 2024-07-31T07:58:19.000000Z

Interesting thanks. Seems the reason I couldn't find anything on that is because it's internal only and not a feature you're actually meant to use?

https://github.com/microsoft/STL/issues/586

> We talked about this at the weekly maintainer meeting and decided that we're not comfortable enough with the (lack of) design of this feature to begin documenting it for wide usage.

pjmlp · 2024-07-31T12:05:27.000000Z

What you want should be _ITERATOR_DEBUG_LEVEL instead, that is the public macro for bounds checking configuration.

Calavar · 2024-07-30T16:51:16.000000Z

As far as I am aware, the standard doesn't mandate bounds checking for std::vector::operator[] and probably never will for backwards compatibility reasons. Most standard library implementations have opt-out std::vector[] bounds checking in unoptimized builds, but not in optimized builds.

I tried a toy example with GCC [1], Clang [2], and MSVC [3], and none of them emit bounds checks with basic optimization flags.

[1] https://godbolt.org/z/W5e3n5oWM

[2] https://godbolt.org/z/Pe8nPPvEd

[3] https://godbolt.org/z/YTdv3nabn

TinkersW · 2024-07-31T00:38:42.000000Z

As I said you need the correct flag set.. MSVC use _CONTAINER_DEBUG_LEVEL=1 and it can be used in release. They have had this feature since 2010 or so, though the flag name has changed.

pjmlp · 2024-07-31T12:06:12.000000Z

The correct name is _ITERATOR_DEBUG_LEVEL.

pjmlp · 2024-07-31T12:09:17.000000Z

Add a "#define _ITERATOR_DEBUG_LEVEL 1" on top for VC++.

childintime · 2024-08-03T09:34:30.000000Z

In my experience claude.ai has near perfect grasp of what a program (that fits its window) is written to do. It can already make a program in another language that can do the same. What this means is that the cost of a full rewrite is going to come down dramatically over the next few years.

This is an excellent example of government action I like to see, as it isn't about favoritism or the swamp dynamics. Just provide a target, a bounty and no or low barriers to entry.

This challenge does push Rust out in front of everybody. That's a mixed blessing. I hope this challenge gets modified to not specify the target language, but instead the requirement of memory and type safety. Rust is likely an intermediate stop on the way to something better, and it shouldn't matter if that language is called Rust 2.0 or something else.

eesmith · 2024-07-30T17:27:28.000000Z

As a reminder, DARPA funded self-driving car research since at least the 1980s with the Autonomous Land driven Vehicle (ALV) project, plus the DARPA Grand Challenges, and more.

sans-seraph · 2024-07-30T17:59:29.000000Z

I have been aware of this proposed initiative for some time and I find it interesting that it is now becoming public. It is a very ambitious proposal and I agree that this level of ambition is appropriate for DARPA's mission and I wish them well.

As a Rust advocate in this domain I have attempted to temper the expectations of those driving this proposal with due respect to the feasibility of automatic translation from C to Rust. The fundamental obstacle that I foresee remains that C source code contains less information than Rust source code. In order to translate C code to Rust code that missing information must be produced by someone or something. It is easy to prove that it is impossible to infallibly generate this missing information for the same reason that scaling an image to make it larger cannot infallibly produce bits of information that were not captured by the original image. Instead we must extrapolate (invent) the missing information from the existing source code. To extrapolate correctly we must exercise judgement and this is a fallible process especially when exercised in large quantities by unsupervised language models. I have proposed solutions that I believe would go some way towards addressing these problems but I will decline to go into detail.

Ultimately I will say that I believe that it is possible for this project to achieve a measure of success, although it must be undertaken with caution and with measured expectations. At the same time it should be emphasized it is also possible that no public result will come of this project and so I caution those here against reading too much into this at this time. In particular I would remind everyone that the government is not a singular entity and so I would not interpret this project as a blanket denouncement against C or vice versa as a blanket blessing of Rust. Each agency will set its own direction and timelines for the adoption of memory-safe technologies. For example NIST recommends Rust as well as Ada SPARK in addition to various hardened dialects of C/C++.

steveklabnik · 2024-07-30T18:04:36.000000Z

> As a Rust advocate in this domain I have attempted to temper the expectations of those driving this proposal

Thank you!

pfdietz · 2024-07-30T20:42:19.000000Z

How does it relate to the CRAM effort at Grammatech?

https://cpp-rust-assisted-migration.gitlab.io/blog/

the8472 · 2024-07-31T08:42:16.000000Z

> In order to translate C code to Rust code that missing information must be produced by someone or something.

If you don't go for preserving the formal semantics of C code and instead only require the test-suite to still pass after translation that can provide a lot of wiggle room for the translation. This is how oxidation projects often work in practice. Fuzzers can also help with generating additional test data to get good branch coverage.

nanolith · 2024-07-30T16:21:07.000000Z

I'm personally not a fan of "rewrite the world in Rust" mentality, but that being said, if one is planning to port a project to a new language or platform, mechanical translation is a poor means of doing so. Spend the time planning better architecture and designing a better software system, and find a way to replace it piece by piece. Don't build a castle in the sky, because it will never reach the ground. If you've decided to use Rust for this system, that's fine. But, write Rust. Don't try to back-port C into Rust.

I think a far better and more mature process is to update C to modern C and use a model checker such as CBMC to verify memory, resource, and integer math safety. One gets the same safety as a gradual Rust rewrite, but the code base, knowledge base, and developers can be maintained.

IshKebab · 2024-07-30T16:32:26.000000Z

> I think a far better and more mature process is to update C to modern C and use a model checker such as CBMC to verify memory, resource, and integer math safety.

No chance. CBMC is amazing, but have you actually tried formally verifying a "real" program?

I agree replacing with a hand-architected Rust version is clearly the better solution but also more expensive. I think they're going for an RLBox style "improve security significantly with little-to-no effort" type product here. That doesn't mean you shouldn't do a full manual rewrite if you have the resources, but it's better than nothing if you haven't.

nanolith · 2024-07-30T19:57:37.000000Z

> No chance. CBMC is amazing, but have you actually tried formally verifying a "real" program?

Yes. Every day. It's actually quite easy to do. Write shadow methods covering the resources and function contracts of called functions, then verify the function. Repeat all of the way up and down the stack. It adds about 30% overhead over just TDD development.

PhilipRoman · 2024-07-30T20:50:10.000000Z

Last time I tried CBMC, it ended up running out of memory for relatively small programs, do you encounter any resource usage issues with it? I'm learning Frama-C and I find it more predictable, although the non-determinism of solvers shocked me when I first tried to prove non-trivial programs. I guess ideally I would like something even more explicit than Frama-C.

nanolith · 2024-07-30T21:04:46.000000Z

CBMC works best on functions, not programs. You want to isolate an individual function, then provide shadows of the functions it calls. The shadows should have nondeterministic behavior (cover every possible error condition) and otherwise follow the same memory and resource rules as the original function. For instance, if shadowing a function that reads a buffer, the shadow should ensure full buffer access as part of its assertions.

The biggest issue you will run into with bounded model checking is recursion and looping. In these cases, you want to refactor the code to make it easier to formally verify outside of the loop. Capture and assert on loop variants / invariants, and feed these forward in assertions on code.

There's no way I can capture all of this in an HN comment, but to get CBMC to work, you need to break down your code.

PhilipRoman · 2024-07-30T21:13:31.000000Z

Thanks, that was really helpful. Relying on getting shadow functions right does seem icky, but I guess the improved productivity of CBMC should make up for it. Definitely going to give it another chance!

nanolith · 2024-07-30T21:24:52.000000Z

You're welcome. I've been meaning to write a blog article on the subject, because it is a subtle thing to get working.

Think of shadow functions as the specifications that you are building. Unlike proof assistants or Frama-C, you write specifications in C itself, and they work similarly to code. Often, the same contracts you write in these specifications can be shared by both the shadow functions and the real functions they shadow.

I take a bottom-up approach to model checking. I'll start by model checking the lowest level code, then I'll shadow this code to model check code that depends on it. In this way, I can increase the level of abstraction for model checking, focusing just on the side effects and contracts of functions I shadow, and move up the stack toward more and more general code.

KsassPeuk · 2024-08-02T06:47:51.000000Z

What do you mean by "non determinism of solvers"? AFAIK, unless your proof finishes really close to the timeout, it is pretty uncommon that a failed PO suddenly succeeds and vice-versa if the code/the annotation are not modified.

pjmlp · 2024-07-31T06:40:01.000000Z

Modern C still has the same security exploits in arrays and strings as Classical C, nothing changed in 50 years.

nanolith · 2024-07-31T11:20:46.000000Z

Bounded model checking has changed things. C on its own can't solve these problems. Likewise, Rust on its own -- while it can solve memory errors -- can't demonstrate safety from all errors that lead to CVEs.

Practical formal methods using a tool like CBMC can make C safer. The existing code base can be made safer without porting it to a new language or using experimental mechanical translation. This isn't just something for C. Such tools exist for many languages now, including Rust, so that even Rust can be made safer.

pjmlp · 2024-07-31T11:55:20.000000Z

The amount of people using stuff like CBMC is like trying to boil the ocean.

WG14 can solve those problems, they decided it isn't their priority to fix C.

nanolith · 2024-07-31T12:05:22.000000Z

> The amount of people using stuff CBMC is like trying to boil the ocean.

That's like saying, "Getting everyone to use Rust or TDD or X is like trying to boil the ocean."

It's impossible to solve all things for all people at once. But, that doesn't mean that we can't advocate for tooling that can be used today to build safer software. This goes beyond C, as such tools and techniques are being ported to many languages and platforms.

Rust is a solution that works for some people. Modern C with bounded model checking is another solution that works for some other people. I'm certainly not going to change the minds of folks who have decided to port a project to Rust and who are willing to spend the engineering budget for this. But, hopefully, I can convince someone to try bounded model checking instead of maintaining the status quo. Because, the status quo is where we are with projects like the Linux kernel. Linux may pay lip service to Rust folks and allow them to write some components in that language, but the majority of the kernel is still in C and is not being properly vetted for these vulnerabilities, as we can see with the stream of CVEs coming out weekly.

> WG14 can solve those problems, they decided it isn't their priority to fix C.

WG14 must maintain some semblance of backwards compatibility with previous versions of C. It's no good to make a feature that breaks older code. This happens from time to time -- old school K&R C won't work in a C18 or C23 compliant compiler -- but efforts are made to keep that legacy code compiling, for good or ill.

pjmlp · 2024-07-31T14:17:08.000000Z

50 years are more than enough time to improve C's security story.

nanolith · 2024-07-31T15:29:51.000000Z

Yep, but we have to deal with what we have. For better or for worse, C remains where it is. We can either use process and tools to improve existing C, or throw our hands up.

I prefer to work toward fixing what is. We are unlikely to see things like array slices in C, and even if such features were added, this does nothing to fix the billions of lines of legacy code out there.

Y_Y · 2024-07-31T08:23:59.000000Z

The programmers have changed, the machines have changed, the literature has changed, the compilers have changed a lot. You can still write and run the old insecure code, but you'll get warnings and hit stack canaries and your colleagues will gasp at you and your merge requests will be rejected.

pjmlp · 2024-07-31T09:00:36.000000Z

Meaningless changes, as proven by the CVE database, or the kernel corruption by a bad pointer caused by Crowdstrike.

Y_Y · 2024-07-31T11:41:08.000000Z

I respectfully disagree. GP claimed that nothing has changed [regarding string and array security bugs in C] in 50 years. I responded that many relevant factors have changed, such that people tend to write different code now which is less susceptible to those bugs. Of course the same old bugs are possible, and sometimes good coders will still write them. Still I argue that there has been meaningful change since there are more protections against writing bugs in the first place, less incentive to write dangerous code, and more security for when (some) bugs still appear.

pjmlp · 2024-07-31T11:56:19.000000Z

ISO C89 is exactly like ISO C23 in that regard.

CVE database proves that those kind of errors keep coming up in 2024, regardless of those changes.

Not only do they keep coming up, the monetary cost of fixing those issues has raised up to a level that now even governments are looking into this.

Y_Y · 2024-07-31T13:07:04.000000Z

You've made three true statements, but I don't agree if you're implying that they prove that "nothing has changed". Bugs still appear, but they are significantly less common (per project or line not per year) and not as damaging when they occur. This is a non-trivial change for the better in the realm of C application quality.

There are more slaves in the world now than ever before in history, but global society has still made great progress on eliminating it in the last thousand years.

RavSS · 2024-07-31T20:23:53.000000Z

>ISO C89

Not that it matters, but isn't that technically ANSI C(89)? If I remember correctly, the first ISO C standard is instead C90, which is basically identical to C89.

Avamander · 2024-07-31T09:44:35.000000Z

All that has changed but we still got the libcue code execution bug.

I could not find an open-source static analyzer (including -analyzer) that would actually pick up the flaw before someone tries to exploit it.

And that's a simple example.

We can't tame the dragon C is, empirically nobody can.

Apofis · 2024-07-30T19:18:01.000000Z

This is definitely a pie-in-the-sky DARPA challenge that would be great to have around as we migrate away from legacy systems, however, even taking your functions/methods in one language and giving them to ChatGPT and asking it to translate your method to a different language generally doesn't work. Asking ChatGPT the initial problem you're trying to solve, works more frequently, but still generally doesn't work. You still need to do a lot of tinkering and thinking to get even basic things to work that it outputs.

usrusr · 2024-07-30T19:46:31.000000Z

If you have dormant code, as in running everywhere but not getting worked on anywhere, a "translate to shitty rust before ever touching again" has a certain appeal. Not the appeal of an obviously good idea: chances are the "shitty rust" created through translation would be so much worse to work on than C with some level of background noise of bugs (that would also be present in the "shitty rust" thanks to faithful translation). In C, people have an idea about how to deal with the problems. In "shitty rust", it's, well, shitty, because rust people are not used to that stuff.

But there's a non-zero chance that someone could develop a skillset for iteratively cleaning up into something tolerable.

And then there are non-goal things that could grow out of the project, e.g. some form of linter feedback "can't translate into tolerable rust because of x, y and z". C people could look into that, and once the code is translatable into good rust, why translate.

If that was an outcome of the project, some people might find it easier to describe their solution in runnable C and let the "translator/linter" guide them to a non-broken approach.

I'd certainly consider all these positive outcomes quite unlikely, but isn't it pretty much the job description of DARPA to do the occasional dark horse bet?

suprjami · 2024-07-30T21:43:22.000000Z

In my experience (supporting a machine-translated codebase which resulted in shitty Java) your theory doesn't play out.

If you give developers a shitty codebase then those developers will leave to work somewhere else.

After a few years of working on this codebase we had 88% turnover. 1 in 10 developers remembered the original project's design philosophy and intention.

It wasn't a good situation.

dcsommer · 2024-07-30T23:29:05.000000Z

GP was proposing a different situation where the source code is not changing or changing very rarely. If you have a high churn codebase, obviously the maintenance experience will worsen dramatically after machine translation (at least with many current tools), so your experience is not unexpected.

pdimitar · 2024-07-30T16:30:09.000000Z

> I'm personally not a fan of "rewrite the world in Rust" mentality

There is no such mentality anywhere. There is a ton of software that's much better off left alone in a dynamic language, or a statically typed language with a garbage collector (like Golang). Good engineers understand the idea of using the right tool for the job.

The push is to start reducing those memory safety CVEs because they have been proven to be a real problem, many times over.

> mechanical translation is a poor means of doing so

Agreed. If we could automatically and reliably translate C/C++ to Rust it would have been done already.

> Spend the time planning better architecture and designing a better software system, and find a way to replace it piece by piece.

OK, I am just saying that somewhere along that process people might get a bout of confidence and tell themselves "oh, we're doing C much better now, we no longer write memory safety bugs, can't we stop here?" and they absolutely will. Cue another hilarious buffer overflow CVE 6 months later.

> I think a far better and more mature process is to update C to modern C and use a model checker such as CBMC to verify memory, resource, and integer math safety.

A huge investment. If you are going to do that then you might as well just move to Rust.

> One gets the same safety as a gradual Rust rewrite

Maybe, but that sounds fairly uncertain or far from a clear takeaway to me.

uecker · 2024-07-30T20:11:31.000000Z

Rewriting is rarely a good idea in general. Rust proponents like to pretend that it is impossible to avoid safety issues in C while it is automatically given in Rust. But this is not so simply in reality.

pdimitar · 2024-07-30T20:21:46.000000Z

I don't like generalizations... in in general. :D (Addressing your "rewrites are rarely a good idea in general" here.)

My experience tells me that if a tech stack supports certain safety guarantees by default that this leads to measurable reduction of those safety problems when you switch to the stack. People love convenient defaults, that's a fact of life.

The apparently inconvenient truth is that most programmers are quite average and you can't rely on them going above and beyond to reduce memory safety errors.

So I don't buy the good old argument of "just hire better C programmers". We still have a ton of buffer overflow CVEs regardless.

And I never "pretended it's impossible to avoid safety issues in C". I'll appreciate if you don't clump me in some imaginary group of "Rust proponents".

What I'm saying is this: use the right tool for the job. The C devs have been given decades and yet memory safety CVEs are still prevalent.

What conclusion would you arrive at if you were in my place -- i.e. not coding C for a living for like 18 years now but still witnessing it periodically crapping the bed?

I'm curious of your take on this. Again, what other conclusion would you arrive at?

uecker · 2024-07-30T22:18:39.000000Z

I am complaining about the usual phrases which are part of the Rust marketing, like the "just hire better C programmer did not work" or the "why are there still CVEs" pseudo arguments, etc.

For example, let's look at the "hire better C programmers does not work" argument. Like every good propaganda it starts with a truism: In this case that even highly skilled C/C++ programmers will make mistakes that could lead to exploitable memory safety issues. The problem comes from exaggerating this to the idea that "all hope is lost and nothing can be done". In reality one can obviously do a lot of things to improve safety in C/C++. And even one short look at CVEs should make it clear that there is often huge room for improvements even with relatively simple measures. For example, a lot of memory safety bugs in C/C++ come from open-coded string or buffer manipulation. But it is not exactly rocket science to abstract this away behind a safer interface. But once this is understood, the obvious conclusion is that addressing some of these low-hanging fruits would be far more effective in improving safety than wasting a lot of time and effort in rewriting in Rust.

pdimitar · 2024-07-31T00:00:41.000000Z

> In reality one can obviously do a lot of things to improve safety in C/C++.

That's not "in reality", that's "in theory". Because in actual reality, people still write the good old buffer overflow bugs to this day.

I don't think anyone reasonable is disputing that we indeed can improve C/C++ programming. The argument of myself and many others like myself is: "a lot can be done but for one reason or another it is STILL NOT being done". Likely the classic cost cutting but there are likely other factors at play as well.

> But once this is understood, the obvious conclusion is that addressing some of these low-hanging fruits would be far more effective in improving safety than wasting a lot of time and effort in rewriting in Rust.

Explain why this has not been done yet. Explain why Microsoft, Google and various intelligence agencies attribute memory safety bugs to between 60% to 75% of all CVEs and demonstrable exploits that they are aware of.

Please do, I am listening. Why has almost nothing been done yet?

Secondly, "wasting a lot of time and effort in rewriting in Rust" is an empty claim. To demonstrate why, I ask you this: at which point the continued cost of investing in endlessly patching C/C++ and all its glorious foot-guns becomes bigger than the cost a rewrite?

Surely at one point just endlessly throwing money at something that gives you a 1% return of investment (in terms of getting more stable and less dangerously buggy) does indeed get more expensive than starting over?

I have no clear answer because it depends on the organization, the tenure of C/C++ and the devs in the org, and many others. It's strange that you pretend to have the answer.

nanolith · 2024-07-31T00:36:32.000000Z

> That's not "in reality", that's "in theory". Because in actual reality, people still write the good old buffer overflow bugs to this day.

That's because while the technology exists, it is not widely communicated. That's not a fault of C, and that's not something that any language can solve.

> Explain why this has not been done yet.

See above.

The technology to make C and C++ safer is not yet widely used. But, it exists and it is being used. I use it on every firmware and OS project that I currently work on. The code we produce is free of memory errors, integer errors, API misuse errors, resource management errors, cryptography errors, confused deputization errors, and a host of other errors that our specifications are designed to catch. That goes well beyond what Rust or any other language can provide on its own. But, to be fair, Rust developers can do this using similar tooling.

It's laudable that you wish to rid the world of memory errors. I want to normalize going three or four steps further. Rust by itself won't get us there.

pjmlp · 2024-07-31T09:13:59.000000Z

The proven fact that the said technology has failed its purpose, as the C and C++ culture keeps resisting its adoption, is the fact that all CPU vendors are now integrating hardware memory tagging as the ultimate weapon against memory corruption exploits.

Solaris has already been doing it since 2015, ARM more recently, we have Microsoft putting the big buckets into CHERI (including custom FPGA boards for testing), the new CoPilot+ PCs architecture with Pluton, and while AMD/Intel attempts weren't quite right like MPX, they will surely do something for x64 as well.

nanolith · 2024-07-31T11:29:34.000000Z

> The proven fact that the said technology has failed its purpose,

How, because other solutions are being explored? That's not due to a failure of one thing, but because both defense in depth and a desire to fix existing systems with no additional engineering are paths that security researchers and vendors explore. Not everyone will converge on a single solution, even when that solution is practical.

Just because something is not being used universally doesn't mean that it has failed. Moreso, it is not widely known about, and there persists rumors that it requires extraordinary effort, often reinforced by well meaning, but rather outdated advice.

pdimitar · 2024-07-31T21:58:11.000000Z

> desire to fix existing systems with no additional engineering

I, too, enjoy sci-fi.

> Just because something is not being used universally doesn't mean that it has failed.

You are only correct in the dictionary sense of these words. Fact is that a lot of the programmers are vain creatures prone to ego, and they make their chosen technical stack part of their core identity. This prevents them from being flexible, they get rigid as they age and they become part of the problems they so passionately wanted to fix when they were young.

None of that is made easier by the managerial class that absolutely loves and financially stimulates the programmers who don't want to rock the boat.

So I'd say if the said CBMC, and likely other tools in the same area, has more or less failed if it could not convince a critical mass of C/C++ devs to use it and finally start keeping up with Rust (and the other languages @pjmlp mentioned).

> Moreso, it is not widely known about, and there persists rumors that it requires extraordinary effort, often reinforced by well meaning, but rather outdated advice.

The victims of Heartbleed and many other CVEs don't care. The breaches happened anyway.

I am amazed at your desire to downplay the problem and keep claiming that eventually stuff will work out.

I disagree. And I'll repeat a very core part of my argument: C/C++ devs were handed a monopoly in their areas for decades and they still can't arrive at a set of common techniques that reduce or eliminate memory safety bugs.

I am not impressed. And I am not even a particularly good programmer. Just a diligent guy with average programming ability whose only unique trait is that he refuses to accept the status quo and always looks at how can stuff be improved. But this has taken me a long way.

nanolith · 2024-08-01T00:04:07.000000Z

> I, too, enjoy sci-fi

I was characterizing these hardware changes as being fantasy, so I'm glad you agree.

> So I'd say if the said CBMC, and likely other tools in the same area, has more or less failed if it could not convince a critical mass of C/C++ devs to use it

So, in the same vein, Rust has failed because it has only been around for a similar amount of time and people still use C/C++?

> The victims of Heartbleed and many other CVEs don't care. The breaches happened anyway.

I fail to see how a CVE that occurred due to poor engineering practices has anything to do with the adoption of good engineering practices and tooling. Yes, Heartbleed is why we need this tooling.

You are simultaneously arguing that if we could just adopt Rust, our problems would be solved, but since another technology has not yet been adopted, it has failed. Rust isn't adopted due to programmer ego, but the use of tooling that does the same thing as Rust and more has not yet been adopted because it has failed. Do you not see the logical inconsistency in your position?

pdimitar · 2024-08-02T12:55:41.000000Z

> So, in the same vein, Rust has failed because it has only been around for a similar amount of time and people still use C/C++?

Yes, it kind of failed there indeed. And I even hinted at why: Rust is far from perfect and its async implementation is a cobbled together mess. Golang's model reads much better, though I hate their foot-guns quite a lot (like writing to a closed channel leads to a panic; who thought that was a good idea?).

> I fail to see how a CVE that occurred due to poor engineering practices has anything to do with the adoption of good engineering practices and tooling. Yes, Heartbleed is why we need this tooling.

You can't see it? But... the good practices do lead to less of these CVEs as you yourself seem to realize? I don't get this part of your comment.

> You are simultaneously arguing that if we could just adopt Rust, our problems would be solved, but since another technology has not yet been adopted, it has failed.

You have answered it yourself: a lot of people see manual wrangling of `void**` as a badge of honor and their ego takes over (and the fear of being displaced, of course). I claim that Rust is not being more widely adopted due to programmer ego and fear of being obsolete. The fear of the end of nice salaries because they belong to a diminishing cohort of old-school cowboys.

Who would not fear that? Who would want that to end?

> Do you not see the logical inconsistency in your position?

No, and I don't get your argument. The reasons for C/C++ devs not improving the memory safety of their code, and the reasons for them not adopting Rust are very different. Not only is the analogy bad, it is plain inapplicable.

---

But it also does not help that HN reacts like a virgin schoolgirl pinched on the arse when Rust is mentioned. I've coded it for a few years, I loved it, I hated the bad parts and called them out, but even to this day I very quickly and easily get branded as a Rust fanboy even if my comment history shows balanced criticisms towards it. People don't care. People are emotional and are quick to put you in a camp that's easy to hate.

That is the part that I truly hate. No objective debate.

Too expensive to move to Rust? GOOD! That's an amazing argument, we can talk that for weeks and get very interesting insights in both directions.

People unwilling to get re-trained? Also a good argument, with big potential for interesting insights!

But most of everything else is at the level of a heated table debate after the 11th beer. Pretty meh and very uninteresting. No idea why I keep engaging, I think I am just bitter that people who REALLY should know better are reacting on emotion and not on merit. But that's on me. We all have our intolerances to the reality we inhabit. This is one of mine.

pjmlp · 2024-07-31T11:58:30.000000Z

Hardware is the ultimate castle wall when nothing else fixes the problem at the software level.

nanolith · 2024-07-31T12:27:21.000000Z

That's a rather cynical interpretation of these initiatives. CHERI, for instance, has been in development for twenty years. It predates the general availability of open source tools like CBMC or languages like Rust. But, that doesn't make the concept better or obsolete. It makes it complementary.

Hardware security is complementary to software security. Mitigations at the hardware level, the hypervisor level, and the operation system level complement architectural, process, and tooling decisions made at the software level.

Defense in depth is a good thing. There can always be errors in one layer or another, regardless of software solution, operating system, hypervisor, or hardware. I can wax poetic about current CPU vulnerabilities that must be managed in firmware or operating systems.

pjmlp · 2024-07-31T14:14:24.000000Z

Complementary, as the ultimate defence wall.

Many of the issues caused by C, are solved by Modula-2, Object Pascal and Ada, we didn't need to wait for Rust. But those aren't the languages that come for free with UNIX.

Or even better, they would be solved by C itself, if WG 14 cared even a little about providing proper support for slices, proper arrays and proper string types, or even as library vocabulary types.

But what to expect, when even Dennis Ritchie wasn't able to get his approach to slices being worked on by WG 14.

So hardware memory tagging, and sandboxed enclaves it is.

nanolith · 2024-07-31T15:43:13.000000Z

There is nothing wrong with defense in depth. But, this is not where things stop.

I make extensive use of bounded model checking in my C development. I also use privilege separation, serialization between separate processes, process isolation, and sandboxing. That's not because bounded model checking has somehow failed, but because humans are fallible. I can formally verify the code I write, but unless I'm running bare metal firmware, I also have to deal with an operating system and libraries that aren't under my direct control. These also have vulnerabilities.

That's not a trivial thing. The average software stack running on a server -- regardless of whether it is written in C, Rust, Modula-2, Pascal, Ada, or constructively proven Lean extracted to C++ -- still goes through tens of millions of lines of system software that is definitely NOT safe. All of that code is out of a developer's control for now. Admins can continually apply patches, but until those projects employ similar technology, they are themselves a risk.

One day, hopefully, all software and firmware will go through bounded model checking as a matter of course. Until then, we work with what we can, and we fix what we can. We can also rely on hardware mitigations where applicable. That's not failure as you have claimed, but practical reality.

pdimitar · 2024-07-31T22:15:46.000000Z

> I make extensive use of bounded model checking in my C development...

I would absolutely love it if you were the majority, alas you are not.

I emulate exhaustive pattern matching in my main language of choice because it does not have it (it's not Rust or OCaml or Haskell) but because I saw how beneficial and useful it is. And sadly, many of the other devs using that language don't do so, and I have made a good buck going after them and fixing their mistakes.

I don't doubt your abilities as a person. I doubt the abilities of the corpus of C/C++ devs at large.

nanolith · 2024-07-31T23:25:48.000000Z

Well, that's something I hope to change. The tools required to write safer software exist. They just aren't widely distributed yet.

I can say, without ego, that I'm a reasonably good software developer. But, it is the tooling and process that I use that allows me to build safer software and that makes me a reasonably good developer. The same is true of Rust developers.

I can teach these skills to other developers, and in fact, I have plans to do so.

I don't expect things to change overnight, any more than I expect things to be rewritten in Rust overnight. C++ has been around for nearly 40 years, and software is still written in C. But, we can do better, and we must do better.

pdimitar · 2024-07-31T23:31:33.000000Z

Fully agreed. I hope you don't take my criticisms and our back and forth as hostile -- they are not.

nanolith · 2024-08-01T00:33:05.000000Z

I don't. Passion is good, and I'm glad we can have a passionate discussion while remaining civil.

We both want the same thing: safer software.

roca · 2024-07-31T05:58:23.000000Z

Bounded model checking is not a silver bullet. If you want to prove it is, verify a Web browser and blog about it.

nanolith · 2024-07-31T11:39:58.000000Z

There are no silver bullets. But, that doesn't mean that we should dismiss tooling that is not well understood in order to chase unrealistic goals, like rewriting extant code bases in a different language to achieve security goals. Or, worse, as this article suggests, using mechanical translation to somehow capture the features of error-prone software without carrying over the errors.

Better process and better tooling allows us to write better software. Bounded model checking is an incredibly useful bit of tooling that allows us, within context of the software, to demonstrate that certain conditions do not arise. This includes memory errors, resource errors, and other classes of errors. The limitation is the faithfulness of the translation to SMT and the complexity of the code being modeled. The former has gotten quite good with CBMC 6, and the latter can be managed through careful refactoring and shadow function substitution.

Is it magic? There is no such thing. But, it is a practical tool that is available for use today.

One need not wait until an entire web browser is verified using it. It can scale to this, but given the unreasonable size and scope of web browsers with respect to this challenge, which are basically operating systems and suites of software in one these days, that's like saying, "verify all software then blog about it."

pdimitar · 2024-07-31T00:54:41.000000Z

> That's not a fault of C, and that's not something that any language can solve.

If you say so. Rust clearly does, and before you go saying "but `unsafe` exists!" I'll have to remind you that (1) scarcely any Rust devs reaches for that and (2) it still keeps quite a lot of guarantees and only relaxes some. Some, not all. Not even most.

> It's laudable that you wish to rid the world of memory errors. I want to normalize going three or four steps further. Rust by itself won't get us there.

Well now we are on the same page. I never said "ONLY Rust will save us", I am saying that Rust clearly can get us further than we are right now. If there's something even more accessible, less verbose, and with not such a cobbled together Frankenstein async implementation like Rust, I'll start using it tomorrow.

nanolith · 2024-07-31T01:17:20.000000Z

> Rust clearly does

Until it exists at the kernel layer, the firmware layer, the runtime library layer, and the application layer, these issues still exist. CVEs come out weekly for memory errors in Linux, in firmware, in operating system libraries, and in application libraries. We need to think beyond rewriting code in one language or platform, and instead think about technologies that we can apply to all languages and platforms, including C and Rust.

> I am saying that Rust clearly can get us further than we are right now.

As can bounded model checking, without having to teach developers a new language with new idioms.

> If there's something even more accessible, less verbose, and with not such a cobbled together Frankenstein async implementation...

Indeed there is. Reach for the bounded model checker that works with your existing language or platform. Pour over the manual, and look at existing practical examples.

If you like Rust, feel free to use it. But, if you prefer C/C++, Pascal, Ada, Python, C#, Java, or Modula2, that's fine. Either use an existing bounded model checker for that language or port CProver / GOTO to that platform. Rust developers ported CProver to Rust via Kani, because they also recognize that writing safer code can't be done by language alone.

I don't think it's necessary to push people to use different languages or platforms to write safer code. They just need to use or port existing tooling and learn safer coding practices. If I come at firmware developers or old school OS developers with "we need to use Rust", the conversation is immediately shut down and I'm considered a fool. If, instead, I show them tooling that allows them to maintain their existing code base and make it safer, I get much further.

adgjlsfhk1 · 2024-07-31T03:10:39.000000Z

bounded model checking is a new language with new idioms

nanolith · 2024-07-31T03:40:24.000000Z

Respectfully, that's a rather extraordinary claim. There are model checkers that use separate specification languages, but there are also model checkers embedded in the host language.

CBMC translates C -- the same language -- to an SMT solver. A different target but the same language.

It is true that new idioms will often be discovered along the way of converting existing C to pass the bounded model checker in every branch condition and in every case. However, software that is already relatively safe will require very little modification. I've seen it go both ways. Simpler code bases can pass model checks relatively unscathed. More complex code bases require refactoring to pass model checking.

To my point, the code base can remain in C, and can be model checked gradually. It doesn't have to be ported to a different language or platform. But, it will require added assertions and some refactoring to make the execution of code more clear. It's still in C. The specifications are specified in C using regular assertions. The only thing that changes is that one will often use shadow methods -- still written in C but simpler than the functions they are shadowing -- in order to model check other functions.

Other bounded model checkers like JBMC, Kani, or PolySpace work in similar ways.

andrewflnr · 2024-07-30T23:35:33.000000Z

> There is no such mentality anywhere.

There definitely is. Mainstream and official Rust community material is generally sane, but the meme did not come from nowhere. The rewrite-everything people are out there.

pdimitar · 2024-07-30T23:49:21.000000Z

> The rewrite-everything people are out there.

Meh, there are zealots in every community -- we're not even talking programming language communities only. Not even programming either. Everywhere.

No idea why people over-reacted so much to one particular 0.1% fanatics. It's a pretty normal state of affairs. Point me at your hobby group and even if it is only 20 people I can bet my balls at least 1 of them is a fanatic.

andrewflnr · 2024-07-31T02:26:59.000000Z

Overreacting to fanatics is also a normal state of affairs, so don't act surprised. :) By their nature fanatics almost always make a disproportionate amount of noise, and if you're outside the community you often can't tell the difference: don't know which if any of the loudmouths members pay attention to, etc. And even more broadly, a small number of people can cause a lot of damage.

nanolith · 2024-07-30T19:58:58.000000Z

> A huge investment. If you are going to do that then you might as well just move to Rust.

People say that, but the people who say this rarely have any practical experience using CBMC. It's very straight-forward to use. I could teach a developer to use it reliably, on practical software, in a month.

pdimitar · 2024-07-30T20:02:08.000000Z

I am not denying it, nor am I claiming that "just move to Rust" is an universal escape hatch.

What I am saying is that if it were as simple as "just learn CBMC" then maybe Microsoft and Google would have not published their studies demonstrating that 60% - 75% of all CVEs are memory safety errors like buffer under-/over-flows.

nanolith · 2024-07-30T20:07:19.000000Z

These studies aren't wrong. But, that's also because neither Microsoft nor Google make use of practical formal methods in practice. Both have research teams and pie-in-the-sky projects, not dissimilar to this DARPA project. But, when it comes down to the nitty-gritty development cycle, both companies use decades old software development practices.

commodoreboxer · 2024-07-30T16:40:10.000000Z

A lot of people are reading this as a call or demand to translate all C and C++ code to Rust, but (despite the catchy project name), I don't read the abstract in that way. There are two related but separate paragraphs.

1. C and C++ just aren't safe enough at large. Even with careful programming and good tooling, so many vulnerabilities are caused by their unsafe by default designs. Therefore, as much code as possible should be translated to or written in "safe" languages (especially ones that guarantee memory safety).

2. We are funding and calling for software to translate existing C code into Rust.

It's not a consensus to rewrite the world in Rust. It's a consensus to migrate to safe languages, which Rust is an example of, and a program that targets Rust in such migration.

akira2501 · 2024-07-30T20:21:02.000000Z

> or written in "safe" languages

So when those languages have 'unsafe' constructs what are the rules going to be around using those? Without a defining set of rules to use here you're just going to end up right back where you started.

> to migrate to safe languages, which Rust is an example of

Rust has a safe mode. It is _not_ a safe language. To do anything interesting you will require unsafe blocks. This will not get you very much.

Meanwhile you have tons of garbage collected languages that don't even let the programmer touch pointers. Why aren't those considered? The reason is performance. And because Rust programmers "care" so much about performance you're not ever going to solve the fundamental problem with that language.

Do you want performance or safety? You can't have both.

bigstrat2003 · 2024-07-30T22:31:51.000000Z

> Rust has a safe mode. It is _not_ a safe language. To do anything interesting you will require unsafe blocks. This will not get you very much.

1. There are plenty of interesting programs which don't require unsafe.

2. Even if your program does require unsafe, Rust still limits where the unsafety is. This lets you focus your scrutiny on the small section of the program which is critical for safety guarantees to hold. That is still a win.

timeon · 2024-07-30T20:30:13.000000Z

> To do anything interesting you will require unsafe blocks. This will not get you very much.

This is not true.

akira2501 · 2024-07-30T20:42:59.000000Z

> This is not true.

Burying unsafe blocks in unevaluated cargo modules does not make this true. You're just taking the original problem and sweeping it under the rug.

commodoreboxer · 2024-07-31T00:54:54.000000Z

You can do tons of stuff with purely safe Rust. The main things that you can't do are FFI, making self-referential structures, and dereferencing raw pointers.

And unsafe isn't a problem. It's a point of potential danger to be heavily audited, tested, and understood. Having the entire language unsafe by default is an obviously worse situation. This is throwing the baby out with the bathwater, like rallying against seat belts because you can still die while wearing one. An improvement is still an improvement. I don't understand why people criticizing Rust tend so heavily to let perfect be the enemy of good.

techbrovanguard · 2024-07-31T07:00:05.000000Z

> I don't understand why people criticizing Rust tend so heavily to let perfect be the enemy of good.

if you've convinced yourself that you're special and all problems with c are solved by trying harder, clearly everyone else is just lazy. with that line of logic, there's nothing to fix with c. rust is not just redundant, but also aggravating, since its popularity causes the cognitive dissonance to start creeping in.

maybe i can make mistakes? should we improve tooling somewhat? no, it's the children who are wrong.

talldayo · 2024-08-01T16:33:34.000000Z

> all problems with c are solved by trying harder, clearly everyone else is just lazy.

If you're even remotely familiar with professional C development then you should know this is unironically true. Tooling does exist to offer memory-safe features in C, they're just far more complicated than using a safe language from the offset. Nobody wants to use Valgrind when your linter can do the same job without leaving your editor.

Most of today's high-performance C code is compiled using the same IR that LLVM generates when compiling C. Unless you're a GCC pundit it doesn't make sense to reject the direction the industry is headed in.

> maybe i can make mistakes? should we improve tooling somewhat?

After a while, being allowed to make mistakes starts to pile up: https://www.zdnet.com/article/microsoft-70-percent-of-all-se...

j-krieger · 2024-08-03T21:25:33.000000Z

You can rely entirely on crates that disallow unsafe to be included.

j-krieger · 2024-08-03T21:24:57.000000Z

> It is _not_ a safe language. To do anything interesting you will require unsafe blocks.

This is largely untrue. You can use proven abstractions over 99% of cases that would require unsafe.

techbrovanguard · 2024-07-31T03:40:47.000000Z

> To do anything interesting you will require unsafe blocks

this is just flagrantly false, have you no shame?

pjmlp · 2024-08-01T08:17:36.000000Z

People still die while wearing seatbelts, helmets and motorbike protective gear, body armor, bullet proof vests, yet many more survive, than those not wearing any of those in similar situations.

jdblair · 2024-08-03T20:00:32.000000Z

I'm really surprised this can work at all in any automated way. You can't just make a line-by-line transcription of a typical c program into rust. Pointers and aliasing are ubiquitous in c programs, concepts that rust explicitly prevents. You have to rethink many typical constructs at a high level to rewrite a c program in rust, unless you wrap the whole thing in "unsafe."

morgante · 2024-08-03T21:56:20.000000Z

Line by line is infeasible, which is precisely why you need to use AI to make larger semantic inferences.

You also don't have to one-shot translate everything. One of the valuable things about the Rust compiler is it gives lots of specific information that you can feed back into an LLM to iterate.

I've been working on similar problems for my startup (grit.io) and think C -> Rust is definitely tractable in the near term. Definitely not easy but certainly solvable.

stogot · 2024-08-03T21:58:17.000000Z

What about convert to AST then ask the AI to convert to Rust. Would that work?

Someone · 2024-08-03T22:02:47.000000Z

That’s probably the rout they would take, but the C AST won’t have ownership attributes. You‘d have to discover those yourself.

ASTs also don’t have much info on threading (that’s more or less limited to “the program starts a thread with entry point foo at some time”, “Foo waits for another thread to finish”)

morgante · 2024-08-03T23:09:33.000000Z

Foundation models aren't primarily trained on ASTs, so you're typically going to have worse results than just using text unless you do extensive fine-tuning yourself.

ASTs also generally don't actually have magical information in them. They won't solve the lifetime issues for you.

Someone · 2024-08-03T21:58:29.000000Z

> Pointers and aliasing are ubiquitous in c programs

If we ignore multi-threaded programs is long term aliasing actually ubiquitous in C programs? For many programs, I would expect most of it to happen within the scope of a single function (and within it, across function calls, but there, borrowing will solve this, won’t it?)

If so I would trying to tackle that as one sub-problem (you have to start somewhere), and detecting how data gets shared between threads as another. For the latter, I expect that many programs will have some implicit ownership rule such as “thread T1 puts stuff in queue Q where thread T2 will pick it up” that can be translated as “putting it in queue transfers ownership”.

Detecting such rules may not be easy, but doesn’t look completely out of reach for me, either, and that would be good enough for a research project.

ip26 · 2024-08-03T20:51:58.000000Z

For a naive newcomer - could you go line by line, wrap the whole thing in “unsafe”, compile to an identical binary, and then slowly peel away the “unsafe” while continuing to validate equivalence?

That would at least get you to as much rust as possible, and then let engineers tackle rethinking just those concepts.

jcranmer · 2024-08-03T21:58:39.000000Z

Converting C to legal (unsafe) Rust is quite possible; there is indeed already a tool that does this (https://github.com/immunant/c2rust).

The problem you run into is that the conversion is so pedantically correct that the resulting code is useless. The result retains all of the problems that the C code has, and is so far from idiomatic Rust that it's easier to toss the code and start from scratch. Progressive lifting on unsafe Rust to safe Rust is a very difficult order, and the tool I mentioned had a tool to do that... which is now abandoned and unmaintained.

At the end of the day, the chief issue with converting to safe Rust is not just that you have to copy semantics over, but you also have to recover a lot of high-level preconditions. Turning pointers into slices is perhaps the easiest task of the lot; given the very strict mutability rules in Rust, you also have to work out when and where to insert things like Cell or Rc or Mutex or what have you, as well as building out lifetime analysis. And chances are the original code doesn't get all these rules right, which is why there are bugs in the first place.

Solving that problem is the goal of this DARPA proposal, or perhaps more accurately, determining how feasible it is to solve that problem automatically. Personally, I think the better answer is to have a semi-automated approach, where users provide as input the final Rust struct layouts (and possibly parts of the API, to fix lifetime issues), and the tool automates the drudgery of getting the same logic ported to that mapping.

Animats · 2024-08-03T22:11:49.000000Z

Right. Used c2rust once. Been there, done that. The Rust code that comes out is awful. Does the same thing as the C code, bugs and all. You don't get Rust subscript check errors, you get segfaults from unsafe Rust code. What comes out is hopeless for manual "refactoring".

The hardest part may be Rust's affine type rules. Reference use in Rust is totally different than pointers in C/C++. Object parenting relationships are hard to express in Rust.

j-krieger · 2024-08-03T21:20:02.000000Z

There are "warts" with unsafe Rust that would make this feat very difficult. Aliasing rules still apply.

rolph · 2024-08-03T21:17:35.000000Z

you need to create a transpiler philosophy.

transform CtoASM, then ASMtoRust.

what you need to avoid is incompatibilites between different high level languages with a low level intermediary so you arent stuck attempting to convert high level hardware abstraction directly to another high level hardware abstraction.

alkonaut · 2024-08-03T21:29:35.000000Z

A line-by line doesn't require much "AI" either. You could probably make a rough translation in some (mostly unsafe) Rust.

Assume the AI actually needs to figure out lifetimes and so on to be actually useful and make valid programs. Which would be impressive if it does.

alex_suzuki · 2024-08-03T20:25:19.000000Z

I wonder about this as well, especially im code bases that make heavy use of macros.

poikroequ · 2024-08-03T21:28:52.000000Z

> Those involved with the oversight of C and C++ have pushed back, arguing that proper adherence to ISO standards and diligent application of testing tools can achieve comparable results without reinventing everything in Rust.

If you stick to extremely stringent coding practices and incorporate third party static verification tools that require riddling your code with proprietary situations, then sure, you can achieve comparable results with C/C++.

Or you can just use Rust.

dtx1 · 2024-08-03T22:11:10.000000Z

It's quite hilarious to see the push back rust gets by the c/c++ community. Obviously their decades of hard work and experience to work with those languages are overriding their reasoning circuits. Who in their right mind would defend a language that has such major and obvious design flaws if a genuine alternative is there.

constantcrying · 2024-08-03T22:46:49.000000Z

Many of the most widely used languages have obvious major design flaws. (JavaScript is one obvious candidate, python is another. How did a language which has no built-in floating point type become the number one language for numerical analysis?)

The real question is what tradeoffs you are making and what you are gaining. Rust makes certain memory safety guarantees about the program at compile time, but at the same time it disallows perfectly safe constructions, which can exist in C++, as well.

brigadier132 · 2024-08-03T22:52:43.000000Z

I think DARPA is making the right decision about choosing Rust as the language for low level systems programming. For national security related matters you'd definitely want the certainty Rust brings.

The reason I personally chose Rust as my go to language for low level programming is that despite learning systems programming in college I pretty much never used it outside of school. Meaning I didn't have any of that knowledge that c and c++ programmers had built up over years of experience. So I decided that instead of having to deal with the unknown skill deficiencies in writing concurrent software and memory management I'd rather just have a compiler scream at me. I don't regret the decision.

Also, I remember writing an async TCP implementation in college with c++ using boost. Rust tooling is just so far ahead of that.

wslh · 2024-08-03T23:04:02.000000Z

> I think DARPA is making the right decision about choosing Rust as the language for low level systems programming. For national security related matters you'd definitely want the certainty Rust brings.

I see this differently: DARPA bets on different baskets in parallel. This is just one basket, if they are wrong it doesn't matter because there are other bets to reduce the general risk.

dtx1 · 2024-08-03T22:55:27.000000Z

> JavaScript is one obvious candidate

I don't see anyone defending JavaScript. In fact a whole lot of people are using typescript now because JavaScript is just so bad.

As for python, that's a good point. I guess it's just because it's easy to use and all the numerical stuff is done with c-bindings anyway?

But the C++ Situation is genuinely different. There's a reason governments are now calling upon developers to just let it die already[0]. That design flaw is so bad it's causing genuine harm.

[0] https://www.cisa.gov/news-events/news/urgent-need-memory-saf...

constantcrying · 2024-08-03T23:10:15.000000Z

>I guess it's just because it's easy to use and all the numerical stuff is done with c-bindings anyway?

No, it's horrible, because now you have both python types and numpy types, which don't really interact well with one another. If you are using a language made for numerical analysis (e.g. Julia), a lot of headaches disappear instantly.

Python is 100% just a case of a language being used because it is being used. It has, by itself, few merits to many of the tasks it is actually being for.

>design flaw

It is a tradeoff though. Rust is paying that tradeoff by being very restrictive about certain patterns and being in general quite complex to learn.

ksp-atlas · 2024-08-03T23:33:49.000000Z

Honestly, it feels too limiting, I know about unsafe and stuff but there's just something about managing memory manually, Zig is a good middle ground imo