Hacker News
C2rust: Transpile C to Rust (c2rust.com)
140 points by juancampa on Feb 1, 2022 | 87 comments



We kinda gave up on translating C into D, as many C constructs were unduly awkward to represent in D. What's working much better is to incorporate a C compiler directly into the D compiler. No need to translate, and the C semantics can be adhered to exactly. (The D compiler is extended to understand the C semantics coming from the C code, although that isn't part of the D specification.)


While your approach to this in D is super-cool, and I love seeing more projects embracing it (e.g. `zig cc` for cross-compiling C code), I think it solves a different problem to what they're tackling here.

FTA:

> We are developing several tools that help transform the initial Rust sources into idiomatic Rust.

It sounds like this translator they've built may be meant primarily as the first part of a pipeline of tools to help you eventually translate existing C code into idiomatic Rust, rather than just letting you compile C code using the Rust compiler.

The only use case I can imagine personally having for this tool _on its own_ would be in avoiding the hell of trying to target musl with a project that mixes Rust and C.


The trouble with translating existing C code, for example a .h file, is that the vendor changes the .h file. Then you've got to figure out what changed and fold the differences into the translated file.

It sounds easy enough in theory, but in practice it's a nightmare. For example, you have no clue whether the enum values changed or the APIs acquired an extra argument. Being C, the linker won't give you a clue either.

This problem simply vanishes when your XLang compiler can actually compile C files.
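A partial defence, sketched under assumptions: if you do keep a hand-translated copy, you can pin every value and signature you translated with C11 `_Static_assert`, so a changed vendor header breaks the build instead of silently linking. The names below (`vendor_mode`, `VENDOR_MODE_*`, `vendor_open`) are hypothetical stand-ins for what would normally come from `#include`-ing the vendor's header. It only catches what you remember to pin, which is rather the commenter's point.

```c
#include <stddef.h>

/* Hypothetical stand-in for the vendor's header; in real use this would be
   #include "vendor.h" (name assumed). */
enum vendor_mode { VENDOR_MODE_READ = 1, VENDOR_MODE_WRITE = 2 };
int vendor_open(const char *path, enum vendor_mode mode);

/* Values our translated copy hard-codes: a renumbered enum now fails to
   compile instead of silently changing behaviour. */
_Static_assert(VENDOR_MODE_READ == 1, "VENDOR_MODE_READ changed");
_Static_assert(VENDOR_MODE_WRITE == 2, "VENDOR_MODE_WRITE changed");

/* The linker only sees the symbol name "vendor_open", so an extra argument
   would go unnoticed there. _Generic on the function pointer type turns a
   signature change into a compile-time error. */
_Static_assert(_Generic(&vendor_open,
                        int (*)(const char *, enum vendor_mode): 1,
                        default: 0),
               "vendor_open signature changed");

/* Dummy definition so the sketch links on its own; the real one would live
   in the vendor library. */
int vendor_open(const char *path, enum vendor_mode mode) {
    return (path != NULL) ? (int)mode : -1;
}
```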


I don't think the idea here is to use it on third party dependencies, I think the idea is "we have this C code base that we maintain, and we really wish it was a rust codebase instead".


It's a nice idea, but in my experience very few people want to do that, even with a tool to help. What they do want is to be able to seamlessly interact with their existing code base the way it is.

Examples: C and C++, Java and Kotlin. The latter advertises: "Write concise and expressive code while maintaining full compatibility and interoperability with Java" and has been very successful with that.

https://kotlinlang.org/lp/server-side/


Yes, interop is a much more common use case, but this just isn't the tool for that.

Rust-bindgen is. Which "just" tries to translate header files, since rustc can link to C object files just fine. And for what it's worth, I'm pretty sure most of your complaints do apply to that tool.

https://github.com/rust-lang/rust-bindgen/


I agree that in general most people want to integrate new languages with their existing code bases (rust-bindgen uses libclang to automatically generate bindings, ABI tests, etc. for accessing C and C++ from Rust, and for each language there are tools to generate bindings and ABI tests for Rust code, e.g., the cpp crate generates C++ wrappers around Rust libraries).

The consultancy company developing c2rust specifically helps clients translate their apps to Rust. IIUC these clients want to move from C to a memory- and thread-safe language without losing performance.

c2rust is the first step in that process. It mechanically translates C into "C-looking unsafe Rust".

The engineers then go and start migrating from unsafe Rust to safe Rust incrementally.

This is a long process. c2rust speeds up a small fraction of it, but most of the engineers' time is spent translating unsafe Rust into safe Rust, and then refactoring safe Rust into idiomatic Rust.


> It's a nice idea, but in my experience very few people want to do that

Maybe it's relatively few, but they exist, and this is targeting serving their needs. There are other tools serving C/Rust interop use cases, and at a certain point the “this isn't the most common use case” dismissal is just tedious.

HN shouldn't be a place where things targeting real but narrow technical niches are dismissed because the niche is narrow or there is a much bigger superficially related one around.


It would be nice to have a language-independent header format intended for sharing readable information about .so files. .h files are just not good enough, as they can embed arbitrary C and thus force whatever is going to use them to also be a C compiler.


> We are developing several tools that help transform the initial Rust sources into idiomatic Rust.

(2019) C2Rust hasn't changed much in a few years.

The Rust you get out of C2Rust is a representation of the pointer semantics of C in unsafe Rust using a library of functions that emulate C operations. This is only useful if you desperately need to compile C with a Rust compiler.

Intelligent conversion of C to Rust is hard, and less useful as time goes on and more lower-level crates become available for Rust.


Some examples of awkwardness:

1. C tag name space

2. enum values are placed into the enclosing scope

3. structs defined within a struct definition don't go into that definition's scope

4. `if (a = b)` is not allowed in D

5. unprototyped C function declarations

6. C bit fields

7. C struct declarations without a definition

8. oddities of _Alignas and _Alignof behavior

9. _Generic

10. all the weird C compiler extensions

11. etc.

All these things wind up making translation about 95% feasible, and the rest is a mess.
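To make some of the listed items concrete, here is a small, valid C11 file exercising items 1, 2, 3, 4, 6, 7 and 9. The identifiers are made up for illustration; each construct is one the list calls awkward to represent in D.

```c
#include <stddef.h>

/* 1. Tag namespace: "point" is both a struct tag and a typedef of a
      different type; C keeps the two names apart. */
struct point { int x, y; };
typedef int point;

/* 2. Enumerators land in the enclosing (here: file) scope. */
enum color { RED, GREEN, BLUE };      /* BLUE is used unqualified below */

/* 3. A struct defined inside another struct is visible at file scope. */
struct outer {
    struct inner { int v; } in;
};
struct inner standalone = { 42 };     /* no "outer" qualification needed */

/* 4. Assignment used as a condition ("if (a = b)" in the list); the extra
      parentheses only silence the usual compiler warning. */
int demo_assign_in_condition(int b) {
    int a;
    if ((a = b))
        return a;
    return -1;
}

/* 6. Bit fields: packing and layout are largely implementation-defined. */
struct flags {
    unsigned ready : 1;
    unsigned mode  : 3;
};

/* 7. A struct declared but never defined: only pointers to it are usable. */
struct opaque;
struct opaque *opaque_handle = NULL;

/* 9. _Generic: compile-time dispatch on the static type of an expression. */
#define type_code(x) _Generic((x), int: 1, double: 2, default: 0)

int demo(void) {
    point p = 3;                          /* the typedef (an int) */
    struct point q = { 1, 2 };            /* the struct tag */
    struct flags f = { .ready = 1, .mode = 5 };
    return p + q.x + q.y + BLUE + standalone.v + (int)f.mode
           + type_code(1.0) + (opaque_handle == NULL);
}
```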


Hi Walter!

Was compiling into C a consideration? Seems like this sidesteps some of the compatibility issues (not to mention getting a lot of optimization "for free"), albeit while inheriting any compilation and ABI limitations.

Thanks! P.s. We met at GoingNative2013 - you were very kind with my many questions back then as well!


I've considered compiling into C code, but after seeing what cfront (compiles from C++ to C) had to go through, I figured that would be a ton more work than just generating code directly!

Glad I was able to help you at GN2013!


I feel strongly that this is underrated.

Basically all compilers should be multilingual, especially rustc, given its abilities with the borrow checker. Ideally (within reason, for maintenance's sake), even parsers could be “plugins”: still building the same HIR (or AST if applicable) within rustc, but with different parsers.

I’d love to see a lot of experiments with Rust syntax in the vein of how CoffeeScript inspired the future of JavaScript.

But still getting all the benefits of incremental compilation, cargo, etc.


Please clarify what you mean by incorporating a C compiler. Does this mean you're abandoning the "custom C frontend inside the D compiler" effort in favor of something like clang? To me, the other possibility - having *yet another* C compiler to look out for - is just terrifying.


> Does this mean you're abandoning the "custom C frontend inside the D compiler" effort in favor of something like clang?

No. It means the custom C frontend inside the D compiler, with the semantic routines in the D compiler tweaked where necessary to support C semantics.


Won't this require years of busywork? The various intrinsics, function attributes, C compiler args compatibility, ABI profiles for various platforms/systems (including the Windows GCC/VC ABI idiocy?)

upd: Also, I'm not sure that implementing -funsigned-char in 2022 will be all that great for morale!


It required maintenance, sure. But most of those extensions can be safely ignored.


We used to call Fortran transpiled into Ada "adatran".

Adatran was terrible. Hard to impossible to edit if something needed fixing. Used none of the features that made Ada a decent language. The very limited code they did this to did work, however. This was a huge project and included C and Ada code.

What do we call C code transpiled to Rust. Crust?


You’re misunderstanding the c2rust project though. Just like Walter above is.

The goal of c2rust is not “regularly compile C to Rust and keep compiling that garbage”. The goal of c2rust is “migrate a C codebase to Rust”. That is the tagline of the Git repository, as well as the title of the RustConf 2018 presentation linked in TFA.

The idea is that you convert the entire thing to working (but unsafe and C-equivalent) Rust, then have your entire build pipeline and tooling ready to chip at it and perform the conversion to safe rust, as an alternative to performing the conversion piecemeal and having a C/Rust boundary which you have to keep moving about, and duplicated definitions for the interop.


Aaah that makes more sense

Because the translation, while it works, is awful

Now, translating everything and then going function by function seems better


> What do we call C code transpiled to Rust. Crust?

Fun fact: in ancient Rust, what we now call extern "C" functions were called "crust" functions, pronounced like the word.


Rust devs are commonly called Rustaceans, so users of this transpiler could circle back to the term Crustaceans.


Yep. https://github.com/NishanthSpShetty/crust

CRUST

C/C++ to Rust Transpiler


Many years ago, I used the f2c Fortran-to-C translator. It worked, but produced godawful code. I removed about 75% of the C to get usable code.


Think of the transpiled code as a compiled binary. Have your build system be able to recognize both languages. So developers work in C and Rust, but they don't actually work in transpiled rust. I guess this is similar to using the FFI.


Not mentioned on this page is the associated refactoring tool (https://c2rust.com/manual/c2rust-refactor/index.html) which IME is too complicated to learn in 15 minutes, but would likely be a very useful investment when translating a large code base.


https://github.com/immunant/c2rust:

> We rely on Emscripten's Relooper algorithm to translate arbitrary C control flows.

Article on why Relooper isn't good enough and the superior Stackifier algorithm, which they probably should be using instead:

https://medium.com/leaningtech/solving-the-structured-contro...
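For context on why control flow needs an algorithm at all: Rust has no goto, so arbitrary C jumps must be rebuilt from loops and branches. Below is a hand-written illustration (not c2rust output; function names are hypothetical) of a C loop with two entry points, i.e. irreducible control flow, followed by the same logic restructured as one loop plus an explicit state variable, roughly the "label variable" shape Relooper-style algorithms fall back on when plain loops and ifs are not enough.

```c
/* Sums the decimal digits in s, but the first iteration jumps straight into
   the loop body, bypassing the loop test: the loop has two entry points. */
int sum_digits_goto(const char *s) {
    int total = 0;
    if (*s == '\0')
        return 0;
    goto body;
    while (*s != '\0') {
    body:
        if (*s >= '0' && *s <= '9')
            total += *s - '0';
        s++;
    }
    return total;
}

/* Same function without goto: a single loop whose state variable selects
   which basic block runs next. */
int sum_digits_structured(const char *s) {
    enum { TEST, BODY, DONE } state;
    int total = 0;
    if (*s == '\0')
        return 0;
    state = BODY;                 /* start where the goto jumped to */
    while (state != DONE) {
        switch (state) {
        case TEST:
            state = (*s != '\0') ? BODY : DONE;
            break;
        case BODY:
            if (*s >= '0' && *s <= '9')
                total += *s - '0';
            s++;
            state = TEST;
            break;
        case DONE:
            break;
        }
    }
    return total;
}
```

The state-variable form is always possible but harder to read, which is one reason the choice of restructuring algorithm matters for the quality of translated code.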


It seems to "optimize out" volatile reads if the result is not used.


It looks to me like they're turned into the appropriate Rust intrinsic.

Do you have an example where the read vanishes?

Notice that it's very easy for C programmers to write code they think is performing a volatile read, but isn't, whereas obviously the intrinsics reduce the scope for this error in the Rust (it is also, though that's not relevant here, easy to write C code that depends on imaginary semantics of volatile access and so the code doesn't actually always work, or it works but not for the reason the programmer expected)


If you believe there is no volatile read in this code, I'd like to understand why.

    void func(volatile unsigned *a) { *a; }


Nope, I agree that's a volatile read. And indeed the web page doesn't emit the appropriate intrinsic, nor, contrary to what I'd anticipated, do small modifications that still read through the pointer cause the intrinsic to be used.

Definitely something to be wary of. I assure you the c2rust source code does know about volatile access and Rust's intrinsics (I was looking at that code for other reasons), so if you work with this stuff I'd encourage talking to the people who wrote it to find out what the situation is.


If true, that would be a bug. You don't want to remove register reads from devices, for example, if the device requires them.
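What a faithful translation has to preserve, sketched in C with a hypothetical read-to-clear status register (the register semantics and the names `ack_irq` and `wait_ready` are assumptions for illustration). On the Rust side one would expect each of these loads to become a `core::ptr::read_volatile` call.

```c
#include <stdint.h>

/* Hypothetical read-to-clear register: the read itself acknowledges a
   pending interrupt, so it must happen even though the value is unused. */
void ack_irq(volatile uint32_t *status_reg) {
    (void)*status_reg;          /* volatile read; result deliberately dropped */
}

/* Spin until a (hypothetical) READY bit is set. Without volatile, the
   compiler could load once, hoist it out of the loop, and spin forever. */
uint32_t wait_ready(volatile uint32_t *status_reg, uint32_t ready_mask) {
    uint32_t v;
    do {
        v = *status_reg;        /* re-read on every iteration */
    } while ((v & ready_mask) == 0);
    return v;
}
```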


This looks more useful than some other C to Rust tools, as this will work on code directly instead of the finally linked product, allowing you to use it to port larger library collections or SDKs ahead of time for use in a subsequent Rust project. This can also be used on any old random snippet of valid C (known limitations aside).

This will be fantastic for helping encourage Rust on microcontrollers. The microcontroller world is very C-heavy, so libraries for daughter boards and other chips, as well as example code, are often only published in C. Despite the growing ecosystem of microcontroller chips and boards that have good support for Rust, you end up constantly pushed towards C because the ecosystem basically only uses C. Having a good tool to take that C code and give at least a mechanical, non-idiomatic Rust port is just fantastic, and I'm looking forward to giving this a shot with a particular add-on board SDK that I wanted to use on a Rust-supported microcontroller.


Have there been any large projects that used this as a starting point for a port? I would be curious how it works out in practice.


I've used my homebrew c2rust converter[1] to translate lodepng and pngquant[2] libraries to Rust. My two key takeaways are:

Good test coverage is essential for this. Count how many bugs you've written when you were writing this code for the first time. Even if your bug-rate is 99% better during the rewrite, that may still be a significant number. Fine-grained tests aren't necessary, but end-to-end tests that touch every feature are crucial to catch regressions.

Once the rough conversion is done, it is necessary to refactor the code to take advantage of Rust's idioms to get safety benefits. Just 1:1 conversion is underwhelming, and it feels like replacing gcc with rustc. I did not realize just how recklessly pointer-heavy C tends to be until I saw it through Rusty lens.

The lodepng conversion was "meh". It's good C code, but its structure was very different from what you'd do in Rust (e.g. Rust prefers generics over pointer casts and iterators over indexing or pointer arithmetic, and has interfaces for streaming processing that C lacks). I don't know how far I can refactor the code to Rusty idioms and still call it lodepng :)
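For illustration, a small hypothetical routine in the pointer-heavy style described here (not taken from lodepng): genericity through a `void *` context and casts, and iteration by raw pointer arithmetic. A 1:1 conversion carries both over as raw-pointer code; an idiomatic Rust port would more likely use a generic closure and an iterator over pixel chunks.

```c
#include <stddef.h>

/* Callback type: C's stand-in for generics is a void * context plus casts. */
typedef void (*pixel_fn)(unsigned char *px, void *ctx);

/* Apply fn to every 4-byte RGBA pixel by walking a raw pointer. */
void for_each_pixel(unsigned char *buf, size_t npixels, pixel_fn fn, void *ctx) {
    unsigned char *end = buf + npixels * 4;
    for (unsigned char *p = buf; p != end; p += 4)
        fn(p, ctx);
}

/* Example callback: the context has to be cast back to its real type. */
static void add_alpha(unsigned char *px, void *ctx) {
    unsigned *sum = (unsigned *)ctx;
    *sum += px[3];
}

/* Sum of the alpha channel across npixels RGBA pixels. */
unsigned total_alpha(unsigned char *buf, size_t npixels) {
    unsigned sum = 0;
    for_each_pixel(buf, npixels, add_alpha, &sum);
    return sum;
}
```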

OTOH the pngquant codebase was mine, and I'm happy with the results. When converting I took advantage of Rust idioms, and the Rust version is nicer to maintain and even a bit faster.

[1]: https://lib.rs/crates/citrus [2]: https://pngquant.org/rust.html


Thanks for citrus, I've started experimenting on it after I've read your pngquant blog post[1] the other day, and it's exactly what I was looking for: c2rust does semantically exact conversion, which isn't what I need: I needed a tool to automate the boring syntax conversion, and when doing the idiomatic rustification by hand I can take care about the different semantics of the two languages.

Of course there's a different trade-off when bugs are involved: c2rust shouldn't add any bugs during the conversion, while citrus will.

[1]: https://pngquant.org/rust.html


I just started using c2rust on openjpeg [0] (jpeg 2000 encoder/decoder) today and already have it working as a drop in replacement for the C libopenjp2.so on Linux. Still has a lot of unsafe code, but it does work. Which will be a big help with testing during refactoring to idiomatic safe Rust.

c2rust also has a refactor command that helps with refactoring the generated Rust code.

[0] https://github.com/Neopallium/openjpeg/tree/c2rust


I converted https://github.com/FirefoxGraphics/qcms and then refactored it to mostly safe Rust. I ran into a number of issues https://github.com/immunant/c2rust/issues?q=is%3Aissue+autho... but it generally worked ok.

I found refactoring the resulting Rust code somewhat error prone and didn't have great success with the automated tools. I'd recommend having a good test suite and suggest adjusting the C before the conversion to avoid using C features that don't translate well like the C preprocessor.
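A minimal sketch of the kind of pre-conversion cleanup suggested here, assuming the translator sees macro-expanded code (as a clang-based tool generally does): an object-like and a function-like macro rewritten as an enum constant and a `static inline` function. Those are ordinary declarations, so they can come through translation as named items instead of being pasted into every use site. `MAX_CHANNELS`, `CLAMP`, `max_channels` and `clamp_int` are hypothetical examples, not code from qcms.

```c
/* Before: preprocessor constructs, expanded textually at each use site. */
#define MAX_CHANNELS 4
/* Note: CLAMP(x++, 0, 4) would evaluate x++ more than once. */
#define CLAMP(v, lo, hi) ((v) < (lo) ? (lo) : (v) > (hi) ? (hi) : (v))

/* After: the same logic as real C declarations. */
enum { max_channels = 4 };

static inline int clamp_int(int v, int lo, int hi) {
    return v < lo ? lo : v > hi ? hi : v;   /* each argument evaluated once */
}
```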


What I am most curious about is the final paragraph of https://github.com/immunant/c2rust#acknowledgements-and-lice...

There was a while there where they were trying to test this out on the cvs codebase, IIRC. It's a good candidate: upstream doesn't exactly move quickly, but is very much a real-world codebase, still in use.


Interesting, this paper also mentions the same DARPA contract: https://apps.dtic.mil/sti/pdfs/AD1084807.pdf

It shows up in this document of contracts: https://www.esd.whs.mil/Portals/54/Documents/FOID/Reading%20...

  * Contract Number: FA875015C0124
  * Performer Name: GALOIS, INC.
  * Agent Name: AFRL, Information Directorate MR
  * Program Name: Cyber Fault-tolerant Attack Recovery (CFAR)
  * Office: I2O
  * Fiscal Year: 2015
  * Obligated ($): 2,099,878


Immunant did convert the Quake 3 source to Rust and got it to run. They have a blog post about it kicking around.


we used c2rust in 2019 for the Delta Chat core library; we added some more details about that to the blog post that time [1]

[1] https://delta.chat/en/2019-05-08-xyiv#the-coming-delta-chat-...


But what about borrow checking? Will the transpiled code be runnable?


The transpiled code will use pointers, it will not transform C-isms that could map to Rust-isms (like borrowing instead of pointers, or iterators instead of pointer arithmetic). This is meant to be only the first step in a Rust refactor.

https://galois.com/blog/2018/08/c2rust/

https://github.com/immunant/c2rust/wiki/Known-Limitations-of...


it doesn't convert pointers to references, it just uses pointers, so the borrow checker isn't invoked.
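For readers wondering why pointers can't simply become references: here is a hypothetical C function that is perfectly legal when its two pointer arguments alias, something Rust's rule that a `&mut` reference may not alias any other live reference would reject. A faithful translation therefore keeps raw pointers inside unsafe code.

```c
#include <stddef.h>

/* Adds src[i] into dst[i]. C allows dst and src to point into the same
   array, even overlapping, and the result then depends on iteration order. */
void add_into(int *dst, const int *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```

With `int b[3] = {1, 2, 3};`, the overlapping call `add_into(b + 1, b, 2)` leaves `b` as `{1, 3, 6}`, because the second step reads the value the first step just wrote.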


For the non-Rust programmers here, does that mean you don't get Rust's memory safety with this transpiler?


That's correct. There's no way to translate arbitrary C to memory safe code.


If you could, you wouldn't need Rust in the first place.


That's a good point. I feel stupid for asking now :D


You can compile the C code to WASM and then compile the WASM to safe Rust (I wrote a prototype for this and it works). Though as with all WASM, it protects the environment from the C code, but the C code can still corrupt its own WASM memory. iirc Firefox is even starting to use this approach to sandbox some of its components (though they compile the WASM back to C).
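A toy model of the property described here, written in C for illustration and not taken from any real WASM runtime: the sandboxed code can only reach a fixed linear-memory array, and every load and store is bounds-checked, so an out-of-range access traps instead of touching host memory, while an in-range write can still clobber the sandbox's own data. The names (`linmem`, `linmem_load32`, `linmem_store32`) and the 64-byte size are assumptions; real runtimes may use other mechanisms, such as guard pages, instead of explicit checks.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINMEM_SIZE 64u

/* Toy linear memory: all sandbox addresses are offsets into bytes[]. */
typedef struct {
    uint8_t bytes[LINMEM_SIZE];
    bool trapped;               /* set instead of touching host memory */
} linmem;

static bool in_bounds(uint32_t addr, uint32_t len) {
    return len <= LINMEM_SIZE && addr <= LINMEM_SIZE - len;
}

/* Stores use host byte order here; real WASM memory is little-endian. */
void linmem_store32(linmem *m, uint32_t addr, uint32_t value) {
    if (!in_bounds(addr, 4)) { m->trapped = true; return; }
    memcpy(&m->bytes[addr], &value, 4);
}

uint32_t linmem_load32(linmem *m, uint32_t addr) {
    uint32_t v = 0;
    if (!in_bounds(addr, 4)) { m->trapped = true; return 0; }
    memcpy(&v, &m->bytes[addr], 4);
    return v;
}
```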


There are non-standard extensions to C syntax that provide some amount of safety. It might be interesting to implement support for those within c2rust.


But what if the arbitrary C code was memory-safe to begin with?


Then I assume you'd get safe, "unsafe" Rust code.


Fun fact: in order to be valid, Rust code in an `unsafe` block must uphold all of Rust's invariants. So (correct) Unsafe Rust is the only code written in Rust that's explicitly safe.


Idiomatically the unsafe block should have a comment explaining why this is actually fine, and if it's an unsafe public API it should have a doc-comment explaining how it can be used safely by other unsafe code.

If you're using unsafe functions to flag something other than Rust's safety considerations (e.g. Rust's core concept doesn't care that this flag bit disables the interrupt controller, and thus if you get it wrong now the product doesn't work, but you probably do so let's mark that "unsafe") the same likely applies for that too.

One of the things I like in Jon Gjengset's live coding Youtube videos is that he takes the time to write such comments, which means both the final code and the live session explain why he thinks this is safe, and once in a while there's a realisation while doing this - aha, this is the wrong way to do it, we need to change other things.


But can we get 80% there and then have a human help out?


> But can we get 80% there and then have a human help out?

The problem is that for complex enough projects, the architectural redesign is considerably more demanding than a "remaining 20%". I can imagine it also being quite irritating, due to shortcuts one may take in C that rely on implicit application logic (whether or not they are warranted), which can't be directly translated because of Rust's strictness.


I wonder if the brainiacs at OpenAI could train a neural net on human-written C -> Rust translations.


From another non-rust programmer: all bugs you could write in C will still be there in the transpiled code and the parts that happen to not be bugs will still look as indistinguishable from bugs as they did before. Likely even more indistinguishable than before, because chances are you are better at reading idiomatic C than at reading the anti-idiomatic rust created by the transpiler. But there will likely be some low-hanging fruits of code that can subsequently be changed into idiomatic rust that simply cannot encode certain classes of bugs (without becoming as un-idiomatic as the transpiler output).


The idea here is you get the c code transpiled verbatim into unsafe rust code.

It is your job to then make it safe rust code :).

Edit: spelling


Correct, although it looks like converting some parts to safe code may be a future goal


What they've stated previously is that they plan to go with the approach of "compile C to unsafe Rust" and "compile unsafe Rust to safe Rust" as two separate things that could be chained together. I don't remember if it's meant to be literally another tool or just the general approach, but seems very interesting and sensible to me!


Shame it's not called `crust`. Very cool though!


Back in the day there was also Corrode: https://github.com/jameysharp/corrode

Discussed here: https://news.ycombinator.com/item?id=12056230


Why is this a transpiler and not a compiler?


a transpiler is a compiler.


So why is it a transpiler and not only a compiler?


Because it compiles to a programming language of same or higher level of abstraction.


Are the techniques used different in such a case? Shouldn't it just be the same?

Basically F(C) = R such that eval_C(C) = eval_R(R) where = means something like "discernible effects".

Wouldn't an optimization pass be a transpiler then?


It’s just a slightly more descriptive term that gives you the information that it’s high-to-high.

It’s like saying cheese burger instead of burger - it’s a little more info. It’s not saying that it’s not a burger as well, or that the techniques involved aren’t broadly the same.


A bunch of things a compiler "going downwards" needs to do don't apply, e.g. you don't need to worry about registers and things like that. And there's potentially more complex concepts to map to instead of just breaking source-language concepts down.

Haven't seen that terminology applied to individual passes.


Most compilers don't have to worry about registers nowadays, since they compile to LLVM or C.


It was just an example of a task. Also, "compile to LLVM" is very much "LLVM is part of the compiler, and thus worries about registers". Compile to C is (depending on source language and exactly which definitions you use) potentially also a transpiler.


Eh, I don't really see what this term gives us... So C2Rust is a compiler too, since Rust is compiled to LLVM?


> So C2Rust is a compiler too

All transpilers are compilers.

C2Rust is a transpiler and a compiler.

Not all compilers are transpilers.

Graal for example is a compiler but not a transpiler.

Just like all cheese burgers are burgers but not all burgers are cheese burgers. Nobody questions the term ‘cheese burger’ because we already have the word ‘burger’.


Yeah, because a cheese burger has cheese. What does a transpiler have? What is the common factor between all transpilers? Who decides on this layering of which languages are higher level or lower level? C2Rust doesn't call itself a transpiler, it calls itself a translator.

This term seems completely useless to me.


> What does a transpiler have? What is the common factor between all transpilers?

High to high translation. Limited lowering.

> Who decides on this layering of which languages are higher level or lower level?

You know it when you see it.

Who decides what makes a book a 'horror book'? There's no authority on that either. You know it when you see it.

> C2Rust doesn't call itself a transpiler, it calls itself a translator.

Yes a translator - a translating compiler - a transpiler.

> This term seems completely useless to me.

Not sure why the term seems to wind people up so much - it just adds a little extra info.


They're all transglossers anyway. :P


Hmm. Transpile C to "Rust"

edit: I see it is part of a broader toolchain, which makes a bit more sense


I'm curious how it manages to work with the borrow checker.


The end result is unsafe Rust that uses raw pointers. This kind of tool can be used e.g. as a starting point for rewriting codebase piece by piece, retaining the original functionality as a baseline.


Promising!

Has anybody incorporated an AI into c2rust to do the heavy lifting, learning from the compiler errors to self-fix the transpiled code?


I doubt an AI could be trained to do that. Anything that cannot be transpiled mechanically probably requires a human decision which will involve analysis and tradeoffs between objectives that the AI could not know about.


With enough work, anything a human does can be automated. Unless you believe the human brain does something non-computable (as e.g. Gödel did), which makes you a dualist, which comes with its own philosophical baggage (how do different types of ontologies interact?).


Sure, the machine could "make a call" and pick a way of doing things, but that machine wouldn't be the one supporting the resulting code for years afterward. Past a certain point, some technical decisions become value judgements because of their impact on humans.


One need not delve into philosophy to observe that, in practice, humans can do many things that we have yet to make machines do successfully. I admit that this is like solving the halting problem by saying "yes" (https://xkcd.com/1266/), for better and worse.

(But seriously, I think it's fair to say "as of 2022, no, AI can't do that")



