Adding a Rust compiler front end to GCC [video] (youtube.com)
167 points by blopeur on June 17, 2022 | 41 comments



Because there is often some confusion, note that there are two different projects in this space with different approaches.

The project discussed in the video above is a new front end to GCC (written in C++, as is typical for GCC frontends) and is developed here: https://github.com/Rust-GCC/gccrs

The other project involves plugging GCC's backend into the existing rustc frontend, and is developed here: https://github.com/rust-lang/rustc_codegen_gcc

Both projects intend to eventually allow Rust code to make use of GCC. But they will likely appeal to different sets of users: the former project appeals to people invested in the GCC ecosystem who want to use Rust without installing a whole separate toolchain, and the latter to people invested in the Rust ecosystem who want to benefit from GCC's broader target support.


And then there's mrustc https://github.com/thepowersgang/mrustc , a Rust compiler written in C++, using LLVM.


> a Rust compiler written in C++, using LLVM.

FTR mrustc does not use LLVM, it compiles to C, which is then compiled using a C11 compatible compiler. So it can "make use of LLVM" through clang for that last step, but that's not usually what people mean when they say that some project uses LLVM.


It's important to note that mrustc is not intended for everyday use but rather as a bootstrap compiler for rustc (at least originally).

We would like to develop an alternative that is suitable as a daily Rust compiler and that integrates into the existing Rust ecosystem (and makes use of it!).


That's claimed for two reasons. First, bootstrapping is the first milestone of a working and useful Rust compiler: if it couldn't be used for anything else, at least it could do that.

Second, it's claimed that "it will only be that" to seem less threatening to the main rustc project. If it were "easy" to do, mrustc would of course support all of the Rust language. Maybe it will get there if there is enough work and interest.

There was this vague concern about splitting the ecosystem. The concern is understandable - to an ecosystem that has been "in control" by a central implementation for a decade. I would file it under growing pains. Rust can't, when it grows up, always be a single-implementation language.


You're saying the project's maintainer is deliberately misrepresenting mrustc's goals in order to prevent malicious interference by the Rust community? Both of those seem very unlikely to me and I can't see anything in your message to provide support for those claims. I see no reason not to just take things at face value here.


In fact the project itself doesn't claim it will only be that. But in the community, this idea is still spread around.


Rather than being afraid of interference, the reason that the mrustc author is so careful in their wording is that nobody has ever bothered to define what it means to be a "compatible" implementation of Rust, which is a semantic and legal hurdle that the implementation in the OP will have to clear. In a more concrete sense, mrustc can't currently be a compatible implementation of Rust, because (by dint of lacking a borrow checker) it accepts more programs than rustc does. Arguably it's fine to accept strictly fewer programs than rustc does and still be called a compatible implementation.
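To make "accepts more programs than rustc" concrete, here is a minimal sketch of the kind of code a borrow checker exists to reject. The function and variable names are made up for illustration, and nothing here is taken from the mrustc or rustc test suites. The commented-out line is the one rustc refuses (error E0502); a compiler without a borrow checker would have no basis for rejecting it.

```rust
// Hypothetical helper, only here to give the borrows something to do.
fn longest_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).max().unwrap_or(0)
}

fn main() {
    let mut words = vec![String::from("gcc"), String::from("rustc")];
    let first = &words[0]; // shared borrow of `words` starts here

    // rustc rejects the next line (E0502) because `first` is still used below:
    // `words` cannot be borrowed mutably while a shared borrow is live.
    // words.push(String::from("mrustc"));

    println!("first = {first}, longest = {}", longest_len(&words));

    // Once `first` is no longer used, the same mutation is accepted.
    words.push(String::from("mrustc"));
    assert_eq!(longest_len(&words), 6);
}
```

As written (with the offending line commented out) this compiles under rustc. Uncommenting it gives a program that only a checker-less compiler would let through.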


Notably, mrustc is a Rust compiler in the sense that it is able to compile valid Rust code. It does not provide a guarantee that it will reject invalid input. The user is on the hook to work out themselves through some other mechanism that their code is valid before calling mrustc.

This is fine for a project with the goal of providing an alternative bootstrapping path for rustc. The rustc codebase is known to be valid Rust, more or less by definition, since rustc is self-hosting and Rust is defined by rustc’s implementation rather than by a specification.


The mrustc page says:

> This project is a "simple" rust compiler written in C++ that is able to bootstrap a "recent" rustc, but may eventually become a full separate re-implementation.

So it's a compatible statement to say that it's currently a bootstrap compiler.


> There was this vague concern about splitting the ecosystem. The concern is understandable - to an ecosystem that has been "in control" by a central implementation for a decade. I would file it under growing pains. Rust can't, when it grows up, always be a single-implementation language.

This is why formal specifications matter.

The Rust community says languages like C and C++ didn't have specifications for a while. That's true, but it's a flimsy argument because the development velocity of both C and C++ prior to formalization was a fraction of Rust's current velocity. C and C++ also had fewer features pre-specification than Rust has pre-specification. Rust is also continuing to add features.

Without a specification how do we know which Rust implementation is correct and which has bugs? It's easy to point to the reference implementation in 2022, but that could change in five or ten years. What happens if drama from within the reference project causes a fork? People like to imagine forks as easy to differentiate, but it's never that simple. It's almost always very nuanced, gray, and muddy with very good arguments on both sides. Which implementation is the correct Rust in that case? None of this is clear or obvious.


C++ doesn't have a formal specification either. What C++ has is a document that says in informal prose how the compiler is supposed to behave. Rust in fact has such a document too: the reference manual [1]. It even uses spec language like EBNF to describe grammar productions.

You can quibble about whether Rust's reference manual is more or less detailed and useful than that of C++. I'd readily acknowledge that the C++ specification has more detail than the Rust reference, right now. But this is a difference of degree, not of kind.

Honestly, I've long been in favor of just renaming the Rust "reference manual" to the Rust "specification" and calling it a day, so we can stop arguing about whether Rust has a spec and do the more useful work of improving the document that we have. Other languages like Go call their equivalent document the "spec", even though there's no real difference in detail between the Go "spec" and the Rust "reference manual", and nobody criticizes Go for that naming decision.

[1]: https://doc.rust-lang.org/stable/reference/


Except there are companies whose main business is to certify that Ada, C and C++ compilers behave according to that prose specification.


> Without a specification how do we know which Rust implementation is correct and which has bugs?

The gccrs project answers this question explicitly in their FAQ, which I've linked downthread:

> If gccrs interprets a program differently from rustc, this is considered a bug.

They also go on to say:

> Once Rust-GCC can compile and verify all Rust programs, this can also help figure out any inconsistencies in the specification of features in the language. This should help to get features right in both compilers before they are stabilized.

If you're looking for a specification, additional implementations help that effort, not hurt it.
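A minimal sketch of what "a difference from rustc is a bug" could look like as a check, assuming you have already recorded how each compiler treated the same program. The `Outcome` and `Verdict` types and the `compare` function are hypothetical; neither gccrs nor rustc exposes an API like this, and this is not how the gccrs test suite is actually organized.

```rust
// Hypothetical record of what one compiler did with one program.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Outcome {
    Rejected,               // the compiler refused the program
    Ran { stdout: String }, // the program compiled and printed this
}

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Agree,
    Bug(String),
}

// Treat the reference compiler's outcome as the definition of correct.
fn compare(reference: &Outcome, candidate: &Outcome) -> Verdict {
    match (reference, candidate) {
        (a, b) if a == b => Verdict::Agree,
        (Outcome::Rejected, Outcome::Ran { .. }) => {
            Verdict::Bug("candidate accepts a program the reference rejects".to_string())
        }
        (Outcome::Ran { .. }, Outcome::Rejected) => {
            Verdict::Bug("candidate rejects a program the reference accepts".to_string())
        }
        _ => Verdict::Bug("both accept, but the outputs differ".to_string()),
    }
}

fn main() {
    let reference = Outcome::Ran { stdout: "42\n".to_string() };
    let candidate = Outcome::Ran { stdout: "41\n".to_string() };
    println!("{:?}", compare(&reference, &candidate));
}
```

The asymmetry is the point: with the reference implementation as the oracle, any disagreement is filed against the new compiler first, and only becomes a question about the language if the reference behavior turns out to be unintended.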


> This is why formal specifications matter.

Note that "formal specifications" generally means "specification written in a formal language", very often in Hoare logic.

Here's a paper that presents one possible way to formalize the building blocks for control flow (if and goto basically) in a very, very, very, very simple language: https://www.cse.psu.edu/~gxt29/papers/controllogic.pdf ; the core is Fig. 3.

Let's be honest: maybe 0.00001% of people who program can understand this properly without spending 100 hours on learning the associated formal baggage. Doing something like this at the scale of "real" languages is unrealistic and useless: how many contributors would GCC or LLVM have if the requirement were to read Hoare logic fluently in order to transpose e.g. the C / C++ formal spec into code?


> This is why formal specifications matter.

The most generous response I could give to that is "Maybe".

You mentioned C and C++ several times, but for C++ what actually happened is that they just shipped a half dozen distinct languages with the name C++ in different years. C++ 98 and C++ 20 are similar languages, but only in the same way that the 1998 Ford Fiesta and the 2020 Ford Fiesta are similar cars. They're occupying the same niche, some of the fittings are familiar, others are not. Many of the parts are different.

Rust has no plans to do that. If you have some code that went on the shelf in 2015 for Rust 1.0 and blow the dust off it, it compiles with a brand new Rust compiler today in 2022, and works just fine alongside brand new code written today. Actually almost all of it could still be pasted into new code, although some of it would look a bit unnecessary and clunky to a new Rust programmer. "Grandad," they might ask, for example, "why are you specifying the type here when it would obviously be inferred correctly anyway?" Well, in 2015 that type wouldn't have been inferred.
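A small illustration of the "clunky but still fine" point, with made-up function names. This sketch makes no claim about which of these annotations Rust 1.0 actually required; it only shows that spelling the types out and leaving them to inference sit side by side in the same file today.

```rust
// Every type spelled out, the way older or more cautious code might read.
fn sum_explicit() -> i32 {
    let values: Vec<i32> = vec![1, 2, 3];
    let total: i32 = values.iter().fold(0i32, |acc: i32, v: &i32| acc + *v);
    total
}

// The same computation, leaving the element type to inference
// (it is pinned down by the function's return type).
fn sum_inferred() -> i32 {
    let values = vec![1, 2, 3];
    values.iter().sum()
}

fn main() {
    // Both styles compile together and agree on the result.
    assert_eq!(sum_explicit(), sum_inferred());
    println!("{}", sum_explicit());
}
```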

> Without a specification how do we know which Rust implementation is correct and which has bugs?

Reading exercise for you: Look up "Pointer Provenance" and read about the problem. Then, read whatever version of C++ "formal specification" you think you're relying on. Huh, it doesn't mention provenance anywhere in this document.

You may need to go back and re-read the stuff you read at this point. This is a difficult problem, and the compiler must care about it deeply to produce reasonable machine code for even fairly simple programs. But the specification doesn't mention it. What does that mean?

I'll save you some time: The compilers do not implement the standard, and they haven't for decades. What they implement resembles the specification but not very closely and never where it conflicts with their duty to generate machine code you'd actually be willing to run. Some C++ programmers are very angry about that, but WG21 shows no sign of doing anything about it decades later. C++ 23 still won't fix the provenance problem, it will once again be kicked into the long grass.

Even aside from provenance and similar issues, C++ is riddled with soundness bugs where your program is meaningless and the compiler, being pragmatic, will do something, but what it does is unspecified. This type of problem relies on a get-out in the C++ standard triggered by the phrase "Ill-formed, no diagnostic required". It means that what you wrote isn't a C++ program and none of the rules in the standard apply, but your conforming C++ compiler may not even warn you about this. It might compile your code anyway, even though what (if anything) the result does is entirely arbitrary.


There are some aspects of 'systems' programming languages that don't have a good history of being properly documented. Pointer provenance, as you mention, is one of them. The semantics of concurrent access to memory is another.

So it's not too much of a mark against Rust that it doesn't have proper documentation for those aspects either.

But there is much, much more to a large programming language like Rust than those things, and Rust is missing documentation (of the sort that intends to be complete and correct) in many important areas that don't have that sort of excuse.


> So it's not too much of a mark against Rust that it doesn't have proper documentation for those aspects either.

There is obviously the benefit that this generally only affects unsafe code (as opposed to C/C++ where the scope is less clear), but Rust just passes the buck to LLVM and inherits the partially baked solutions being used in practice for C++. And if Rust ever changes this, it will involve either splitting the language into distinct versions or breaking strict backwards compatibility.

> But there is much, much more to a large programming language like Rust than those things, and Rust is missing documentation (of the sort that intends to be complete and correct) in many important areas that don't have that sort of excuse.

As somebody who has contributed to both rustc and LLVM/Clang (and would probably rather program in Rust than C++ at this point), I think it would be much easier to implement C++ from scratch than Rust for this reason. The C++ spec is incomplete and some portions are internally inconsistent, but at least there's a real spec. When you hit one of these problems you can just look at what other implementations do in practice. Rust doesn't even have anything resembling a spec.


> When you hit one of these problems you can just look at what other implementations do in practice. Rust doesn't even have anything resembling a spec.

But it does have a reference implementation. The first quoted sentence is what alternative implementations are already doing: looking at what `rustc` does and asking for clarification upstream when necessary. I can envision cases where trying to reach parity uncovers `rustc` behavior that wasn't what was intended, and `rustc` is then changed to conform to an explicit intent, however unlikely this might be in practice.



But doesn't rustc already use LLVM as its backend?


It does, but the rustc frontend is capable of supporting several backends, and different backends offer different features (e.g. compilation speed, code quality, platform support).


Is it just me or would other people vastly prefer a blog post instead of a video?

I mean there is 0 chance that I’d watch this in the office. This is basically NSFW. But I like gcc and would probably take that minute to scan the article.


This is a recording of a talk Philip and I gave during the Live Embedded Event. We didn't really plan on it being posted here :)

If you are interested in our work, we regularly publish weekly and monthly reports (which would be a better starting point and contain more fun info) on our GitHub organization and on thephilbert.io.

Additionally, we wrote a blogpost on the compiler which should get released soon!

I wouldn't watch that video in my office either :D


While I personally prefer articles over videos, I wouldn't expect a talk to be transcribed entirely into a blog, that's a few hours of work there.

Every job and every office is different, but thankfully in my experience it's far from standard to forbid any kind of headphone use during work. I don't know how anyone could get any focus if they weren't allowed to cut themselves off from their surroundings occasionally.

I think labeling all audio NSFW goes too far. Just because it's not safe for your work doesn't mean it's not safe for work in general.


Definitely prefer text to video for content like this, but I don't get the nsfw part. Seems perfectly acceptable to me.


> I don't get the nsfw part

I interpreted that to mean that in the poster's workplace, they can't just put headphones on or play sound through their PC's speakers, but they could read through a blog post in the downtime between tasks.

I've worked in environments like that, anyway, and think they're pretty common.


What environment doesn't allow headphones? (I guess if it's an availability indicator?)


Usually one where you're expected to answer the phone.


> I mean there is 0 chance that I’d watch this in the office. This is basically NSFW

What do you mean?


At some companies, some people equate watching video with slacking off / not working.

And even if that's not the case, unless you have headphones, in a multi-person office, playing a video with sound can be perceived as kind of rude, especially if other people are talking on the phone.


Sometimes it is done as part of a talk and hence it is natural to be a video. But that being said, I normally skip over videos when looking for information on a technical topic. I find, for me at least, my learning to time ratio is far better with a text document than a video.

Not sure why you are being downvoted, it seems to be an interesting conversation about video vs written material for highly technical content.


It depends. An article is faster to read and has some other benefits, but sometimes I just want to relax while listening to a talk without focusing 100%.


This talk is about https://rust-gcc.github.io


I'm curious if anyone has any feedback on Polonius? Is it still active? Is it still supposed to be the next-generation borrow checker?


As I understand it, Polonius is still intended to be the next version of the borrow checker.

The repository is here: https://github.com/rust-lang/polonius

There doesn't seem to have been any meaningful development for well over a year.

There is some very recent discussion on Zulip which makes it look like there are at least two or three people interested in actively working on it.

https://rust-lang.zulipchat.com/#narrow/stream/186049-t-comp...


It is still active[1] and the goal is indeed for it to be the next borrow checker.

[1]: two months ago: https://www.reddit.com/r/rust/comments/ufmzcl/is_polonius_de...


Anyone have a link to the slides from the video?


Isn't this breaking the Rust language into two?



Not at all, it's no different than having gcc, clang, and lcc. It's just another compiler; they are still building toward the spec.



