> C-to-Rust transpiler is a pretty doable project. It's been done, but what come...

jcranmer · 2024-03-04T19:49:52 1709581792

> It's been done, but what comes out is terrible Rust. Everything is unsafe types with C semantics.

The idea behind the current c2rust tool is that you'd do a one-shot conversion to Rust and then gradually do refactoring passes over the barely-Rust code to convert it to correct Rust code. The focus is on preserving the semantics of C over writing anything close to idiomatic Rust (cue a + b being translated to a.wrapping_add(b) all the time, for example). Which is an approach, but I'm not sure that in practice it ends up providing any value over "set your system to compile both C and Rust into a final image and then slowly move stuff from the C to the Rust side as appropriate".
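To make the wrapping_add complaint concrete, here is a rough sketch of the gap between "C semantics in Rust syntax" and what you'd want after the refactoring passes. This is a hand-written approximation, not actual c2rust output; the function names and the assumed C original (an unsigned sum over a pointer plus length) are made up for illustration.

```rust
/// Hand-written approximation of "C semantics in Rust syntax".
/// This is NOT actual c2rust output; the function is hypothetical.
/// Assumed C original: unsigned sum(const unsigned *p, size_t n)
///
/// Safety: `p` must point to at least `n` readable `u32` values.
pub unsafe fn sum_c_style(p: *const u32, n: usize) -> u32 {
    let mut s: u32 = 0;
    let mut i: usize = 0;
    while i < n {
        // Unsigned overflow wraps in C, so every `+` becomes wrapping_add.
        s = s.wrapping_add(unsafe { *p.add(i) });
        i = i.wrapping_add(1);
    }
    s
}

/// The same computation after a manual refactoring pass: the pointer and
/// length collapse into a slice, and no `unsafe` is needed.
pub fn sum_idiomatic(xs: &[u32]) -> u32 {
    // wrapping_add is kept only where the C wraparound is actually wanted.
    xs.iter().fold(0, |acc, &x| acc.wrapping_add(x))
}

fn main() {
    let data = [u32::MAX, 2, 3];
    let raw = unsafe { sum_c_style(data.as_ptr(), data.len()) };
    let safe = sum_idiomatic(&data);
    assert_eq!(raw, safe);
    println!("{raw} {safe}");
}
```

Both versions compute the same thing; the difference is that only the second lets the compiler check the buffer accesses.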

> Usually, C code does have array length info; it's just not in a form that the language ties to the array itself.

This is actually why C23 made VLA support semi-mandatory: it enables you to describe a function signature as

  int write_to_device(size_t len, char buf[len])

and C23 compilers are required to support that, even in the absence of full VLA support! The intent of making this support mandatory was to be able to use it as a basis for adding better bounds-checking support to the language and compilers. (Although, as you noticed, there is an order-of-declarations issue compared to the typical idiomatic expression of such APIs in C, and the committee has yet to find a solution to that.)

Animats · 2024-03-04T20:03:42 1709582622

> The idea behind the current c2rust tool is that you'd do a one-shot conversion to Rust and then gradually do refactoring passes over the barely-Rust code to convert it to correct Rust code.

I've seen what comes out of the transpiler. Nobody should touch that code by hand. It's awful Rust, and uglier than the original C. Modifying that by hand is like modifying compiler-generated machine code.

> This is actually why C23 made VLA support semi-mandatory.

C23 doesn't actually use that info. You can't get the size of buf from buf. I proposed something like that 12 years ago.[1] But I wanted to add enough features to check it.

[1] http://animats.com/papers/languages/safearraysforc43.pdf

jcranmer · 2024-03-04T20:21:52 1709583712

> I've seen what comes out of the transpiler. Nobody should touch that code by hand. It's awful Rust, and uglier than the original C.

I can't disagree here. I think the original idea was to rely on automated refactoring tools to try to make the generated Rust somewhat more palatable, but I never was able to get that working.

> C23 doesn't actually use that info.

True; the intent is to require it so that it can be leveraged by future extensions. The C committee tends to move glacially.

Animats · 2024-03-04T21:10:49 1709586649

The real problem is not translating code. It's translating data types. If you can determine that a "char *" in C can be a Vec in Rust, you're most of the way there. It's no longer ambiguous what to do with the accesses.

This is where I think LLMs could help. Ask an LLM: "In this code, could variable 'buf' be safely represented as a Rust 'Vec', and if so, what is its length?" LLMs don't really know the languages, but they have access to many samples, which is probably good enough to get a correct guess most of the time. That's enough to provide annotation hints to a dumb translator. The problem here is translating C idioms to Rust idioms, which is an LLM kind of problem.
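A minimal sketch of why the data type decision matters, assuming a hypothetical C function of the shape void fill(char *buf, size_t len, char c). The names fill_raw and fill_vec are invented for illustration and do not come from any real translator. The first version keeps the pointer and the length as unrelated values; the second is what becomes possible once something (a human, or an LLM-supplied hint) has decided that buf is really a Vec.

```rust
/// Pointer-plus-length style, as a literal translation might keep it.
/// Hypothetical C shape: void fill(char *buf, size_t len, char c)
///
/// Safety: `buf` must point to at least `len` writable bytes.
pub unsafe fn fill_raw(buf: *mut u8, len: usize, c: u8) {
    for i in 0..len {
        // Nothing ties `len` to `buf`; a wrong `len` is undefined behavior.
        unsafe { *buf.add(i) = c };
    }
}

/// Once the buffer is known to be an owned byte array, the length
/// travels with the data and the accesses are no longer ambiguous.
pub fn fill_vec(buf: &mut Vec<u8>, c: u8) {
    for b in buf.iter_mut() {
        *b = c;
    }
}

fn main() {
    let mut v = vec![0u8; 4];

    // The caller has to get the length right by hand.
    unsafe { fill_raw(v.as_mut_ptr(), v.len(), b'a') };
    assert_eq!(v, vec![b'a'; 4]);

    // The Vec knows its own length.
    fill_vec(&mut v, b'b');
    assert_eq!(v, vec![b'b'; 4]);

    // Out-of-range access is a checked failure, not a stray write.
    assert_eq!(v.get(4), None);
    println!("{:?}", v);
}
```

The loop bodies barely change; everything interesting is in the signature, which is the part a dumb translator can't infer on its own.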

galangalalgol · 2024-03-04T22:27:15 1709591235

They have made some improvements here recently. There is a lot less unsafe generated. The rest is more idiomatic too. The cost is that it will be throwing panics everywhere until you fix the faulty assumptions it asserted. I like the new way better.