Achievement unlocked: rustc segfault (gist.github.com)
380 points by bcantrill on April 10, 2022 | 73 comments



Perhaps I missed it, but too bad OP didn't submit a fix to LLVM as well, or at least file a bug report there. Sure, rustc was emitting bad IR, but LLVM shouldn't crash! It should catch the issue and exit cleanly with an error message. Probably would have been easier to debug this issue if LLVM hadn't crashed in the first place, too.

Either way, a really fun read. For some reason I enjoy reading debugging stories for bugs that I almost certainly wouldn't be able to solve myself.


> It should catch the issue and exit cleanly with an error message.

IIRC LLVM's IR verification is not enabled in release builds. In other words, if you're using rustc with a debug version of LLVM, the error message should pop up.

EDIT: "...LLVM's IR verification is not enabled in release builds..." -- this is wrong; LLVM doesn't turn off verification based on its build mode. It is up to the user of LLVM, namely rustc in this case, to enable verification. For instance, you can verify the IR after each optimization Pass (which is pretty expensive) by configuring `llvm::StandardInstrumentations` properly. Or you can verify the IR before codegen by flipping one of `llvm::TargetMachine`'s options. Clang always enables the latter by default but disables the former, regardless of the optimization level or its build mode.
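
For anyone curious, here is a minimal sketch of invoking the IR verifier directly via `llvm::verifyModule` (this is just the standalone API, not how rustc or the pass pipeline wires it up):

    #include "llvm/IR/Module.h"
    #include "llvm/IR/Verifier.h"
    #include "llvm/Support/raw_ostream.h"

    // Returns true if the module passed verification.
    // Note: verifyModule() itself returns true when the module is *broken*.
    bool checkIR(const llvm::Module &M) {
      if (llvm::verifyModule(M, &llvm::errs())) {
        llvm::errs() << "invalid IR detected\n";
        return false;
      }
      return true;
    }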


The rust compiler has an option to enable llvm assertions, but it needs to be set at compile time (of the compiler) in config.toml: https://github.com/rust-lang/rust/blob/1f7fb6413d6d6c0c929b2...
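
If I'm reading the linked config correctly, it's the `assertions` knob under the `[llvm]` section; treat this as a sketch, since exact key names may differ between rustc versions:

    # config.toml used when building rustc from source
    [llvm]
    # Build LLVM with assertions enabled (slower, but catches bad IR earlier)
    assertions = true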

I don't know how these checks compare to what clang is doing.


The root cause of the original problem is invalid IR, which can be caught by IR verification -- which is actually different from assertions.


So would something like miri have caught this if it was in their CI?


...and it's probably not enabled because the Rust compiler is already slow enough as it is? But yeah, I guess it's a fair trade-off having 0.0001% of builds crash if it makes the other 99.9999% a bit faster...


In either case dereferencing a null pointer would still be a bug, right? Or is it kind of "all bets are off" if you feed LLVM bad IR and don't enable verification?


If you call free((void *)-1) in C code, it will very likely crash, but it will crash in libc, not your program. Is that a bug in libc?


I think even forming the argument might invoke undefined behaviour here, but I'm not certain.

If I understand correctly, it's undefined behaviour to do this:

    int *ptr = (int*)42;
This is because, in order to avoid undefined behaviour, ptr should only be assigned a valid address of an int, the 'address' one past the end of an array of int, or NULL (logical zero).

It's possible things are different when the type is void*, I'm not certain.
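
To spell those cases out in code (my reading of it; the exact wording of the rules differs a bit between C and C++):

    #include <stddef.h>

    int x = 0;
    int arr[4];

    int *a = &x;        // fine: address of an int
    int *b = arr + 4;   // fine: one past the end of an array (must not be dereferenced)
    int *c = NULL;      // fine: null pointer
    int *d = (int *)42; // questionable: not an object's address, not one-past-the-end, not null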


Make it free(free) if you want it to be a valid pointer.


No, because the defined contract for free() is that you pass it a pointer that was previously returned from malloc(), that you don't try to free() it twice, etc.


That's documented undefined behavior.

Is it documented undefined behavior to feed bad IR into LLVM?


Of course. If there's an input checking mode, and you disable the checks, then you're guaranteeing that you won't supply invalid input.


LLVM makes trade-offs on enabling/disabling certain checks, including assertions and IR verification, primarily to keep compilation time acceptable in release builds. The idea is that we catch as many bugs as possible with the debug version of LLVM so that the release version can run fast.


LLVM makes use of assertions to validate things like this, but many users of LLVM, including rustc, turn them off for performance reasons.


It is.


Terse affirmative is ambiguous.


If you feed LLVM bad IR, all bets are off. LLVM's assertions and IR verifier are impressively comprehensive, but not a guarantee.

For example, LLVM has a pointer type and a void type, but you may not make a pointer-to-void type, see https://llvm.org/docs/LangRef.html#pointer-type . If you do call `Type::getVoidTy(C)->getPointerTo()` then LLVM will hit an assertion only in a build with assertions enabled. Without assertions LLVM may silently execute UB.
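
A compilable sketch of that call, just to make the failure mode concrete (whether anything fires depends on how your LLVM was built):

    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Type.h"

    int main() {
      llvm::LLVMContext Ctx;
      // Invalid: there is no pointer-to-void type in LLVM IR.
      // With assertions enabled this trips an assert during type construction;
      // with assertions disabled it "succeeds" and later passes are on their own.
      llvm::Type *Bad = llvm::Type::getVoidTy(Ctx)->getPointerTo();
      (void)Bad;
    }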


As are all tautologies;)


> It should catch the issue and exit cleanly with an error message.

Probably not. As the author describes, LLVM has the tooling to check for invalid IR, which they used to investigate the issue and generate an explanatory error message.


Note that assertion failures, compiler segfaults and other similar issues can be automagically reduced to the critical input using tools like c-reduce/cvise. Sure, they're designed for C and C-like languages, but they're tolerable for Rust, assembly languages, etc. Probably anything that has text with line separators would work reasonably well.

Also for LLVM IR, there's llvm-reduce and -opt-bisect-limit.

Not sure if they'd have helped here but they're generally useful tools for tracking down similar issues.


Wow those look like great tools!

Thank you for sharing. There were GCC segfaults that started appearing on my machine a few months ago, and they magically disappeared after a recent update. Those tools are going to be a massive help next time that happens.


Yeah I did not understand a word of this.


There was a meme in there that I could enjoy at least!


Oh, lol, I guess I get that achievement too. A custom target spec JSON with incompatible MIPS CPU and ABI options was what did it. That was a few years back, but post 1.0 at least.


It’s amazing how in-line assembly is always such a big source of miscompilations/compiler crashes. Both with C/C++ and Rust. With GCC or Clang. It doesn’t matter.


I think it makes sense as a place to expect bugs. It's a rarely used feature which interacts in a unique way with some pretty core parts of the compiler, in a platform dependent manner.


Amusingly, if I recall correctly, the author of this post was the author of Rust's original (never stable, now removed) inline assembly support, years and years ago. Certainly one of the best people in the world to have encountered this bug!


Although in recent years the representation has become a lot more abstract (and it has become easier for people to correctly generate that abstract representation), inline assembler tends to have a lot of monotonous drivel to get right.


Pretty easy to get LLVM segfaults; I got them with Pony too, in development versions. The difference from Rust is that with Rust this also happens at run-time in compiled binaries, while in Pony it happens very rarely.

https://github.com/rust-lang/rust/issues?q=is%3Aissue+SIGSEG... => 87 + 291

vs https://github.com/ponylang/ponyc/issues?q=is%3Aissue+SIGSEG... => 11 + 50

Now compare that to a safe systems language, e.g. sbcl: https://bugs.launchpad.net/sbcl/+bugs?field.searchtext=SIGSE... => 1 (invalid)

or clisp: https://sourceforge.net/p/clisp/bugs/?q=%7B%22status%22%3A+%... => 15


Not to besmirch Pony (I think it's a very interesting language), but isn't some of this a base sampling error? There are probably a few orders of magnitude more engineers writing Rust (and therefore running `rustc`), so we'd expect crashes in Rust to surface more frequently.


Not in memory safe languages, sorry.


I'm not sure I understand. Rust is memory safe, and the memory unsafety happening here isn't in Rust itself: an IR generation bug in rustc is causing memory unsafety in LLVM, which is written in C++.

(Besides: If GitHub's stats are correct, `ponyc` seems to be >60% C and C++?)


Compilers for memory-safe languages can't have bugs in them?


Compilers for memory safe languages written in the same memory safe languages should be expected to have the same guarantees as any other program written in that language.


I think that’s the fifth Futamura projection — compiling a compiler with itself cancels out bugs.


> The difference to rust is that this happens with rust compiled binaries also at run-time,

The list of issues you linked in an apparent attempt to support this claim is almost all (or maybe even all) compile-time crashes, not runtime crashes...


The post is also about crashes in rustc not crashes in rust programs, right?


The top level link, yes.

The comment I replied to... I quoted the text that made me think it wasn't. Am I misinterpreting it?


Pony committer here, I don't think this comparison is very fair. Rust has way more users, and thus way more eyes looking at it and reporting bugs. We've also had a few runtime segfaults before due to LLVM (mainly when porting to new platforms, like Apple's M1), and we try hard to fix them when we find them.


Getting LLVM to SEGV is depressingly easy.


GCC would never compile this code. Just saying...


LLVM, GCC, MSVC - all are complex enough that you run into internal compiler errors from time to time. Maybe a bit less for MSVC since you don't get to run random development snapshots, but still above 0.


This doesn't seem like a bug in rustc at all, except maybe for emitting bad IR. The crash itself is actually a bug in LLVM.


No, LLVM is configured by rustc NOT to check its input (for performance reasons), so it's normal for it to crash if fed invalid input. The bug is 100% in rustc's court.


I think it's reasonable to consider it a bug in both: it's a MIR lowering bug, one that gets promoted into a different bug by LLVM.


Why? They are using a build with assertions etc. disabled, so it segfaults rather than asserting. When you use a build where they aren't disabled, it gives you an error :)


If rustc uses a dependency that has a bug and this manifests itself as a bug in rustc, then IMO rustc still has a bug.


Offtopic: can we make a piece of software that automatically pays for discovered bugs with some form of cryptocurrency? The idea is inspired by Knuth, who offered to pay twice as much for each newly found bug in TeX.


> Can we make a piece of software that automatically pays for discovered bugs with some form of cryptocurrency?

In the general case, no, not without having AGI. There is a subset of bugs that computers can verify (i.e. for a compiler, it might be asserted that "any input that causes a crash is a bug"), but the majority of bugs will require a human-like intelligence to read them and reason about them to determine that "yes, this is a bug".

Given you have to involve a human in the loop to actually fix it, it seems like automating a bounty for it is fixing a problem that doesn't really exist.

There's also vanishingly few projects that want to pay for literally any bug. The majority only want to pay for critical bugs or security vulnerabilities, both of which are purely human judgements.


Well sure, each and every smart contract has an automatic bug bounty attached.


It's a very bad idea to make any sort of completely automated system for dispensing money, because you will have human adversaries that will try to exploit it.


One of the reasons aircraft like Boeing's run on safer languages like Ada.

There's a pretty cool article about it: http://archive.adaic.com/projects/atwork/boeing.html

edit: To those stating it was the C++ portion of the compiler code at fault: that's not the point. Ada offers greater protection against segfaults than Rust. Not to mention that part of the compiler still being written in C++ and not Rust doesn't bode well.


> Not to mention that part of the compiler still being written in C++ and not Rust doesn't bode well.

The Ada compilers I'm familiar with are also written partly or entirely in C or C++.


I didn't read that closely, but isn't this a segfault in the compiler, not a miscompilation?


Yes.

A segfault in the portion of the compiler written in C++ no less, though triggered by invalid input from the portion of the compiler written in Rust (due to a logic bug).

Unless GP's point is that "Rust code can still have bugs" (so can Ada) or "the Rust compiler still has bugs" (I'm willing to bet so do all Ada compilers), I don't see how this supports his claim.


I can confirm. I myself or our teams find compiler bugs every year or so, ranging from pure compiler crashes to generated code that segfaults, yay. Bugs are everywhere. Ada helps you with some categories of bugs and crashes, but saying you can't segfault an Ada application is laughably untrue. I manage that frequently on legacy codebases. And BTW, for the customer, a segfault or an uncaught top-level exception has the same effect: a crash.


Nothing inspires confidence like the compiler crashing. Given C/C++'s attitude towards UB, a crash just means you got lucky and the program really f'd up before doing something seriously nefarious like silent memory corruption or carrying on with nonsensical results....


As a practical matter, though, I'd much rather the compiler segfault, than miscompile my code so it does something incorrect at runtime.

But agreed that seeing a compiler segfault doesn't inspire confidence!


Me too!


This article compares Ada with C and PL/1. This is pretty easy to win - it's hard to imagine a language less safe than C.


https://yarchive.net/comp/ada.html

(Marc H. Donner and David H. Jameson, "Language and Operating System Features for Real-time Programming", Computing Systems vol 1 number 1, winter 1988, pp 33-62):

> Ill-chosen abstraction is particularly evident in the design of the Ada runtime system. The interface to the Ada runtime system is so opaque that it is impossible to model or predict its performance, making it effectively useless for real-time systems.

More:

https://catless.ncl.ac.uk/Risks/6/36#subj12

> Am I correct in thinking that several (two?) missiles were recently destroyed on launch each of which had their guidance systems coded in Ada? Were the problems which forced the destruction of the missiles the result of bad software design or some inherent ambiguity in Ada syntax?

> I spotted but unfortunately left unlogged a report somewhere which gave an account of a talk by a leading scientist (name?) in the military technology area who expressed grave reservations about the design of Ada. I think the report mentioned that the person expressed little confidence in guidance systems coded in Ada.


In light of decades of demonstrated reliability of Ada use in aerospace applications, a bunch of comments about Ada from the late 1980s are meaningless.


The quoted comment is saying that the Ada support runtime is so high level that it sometimes ends up being unsuitable for real-time programming. This is exactly the kind of concern that would be hard to improve even in recent versions of Ada, whereas it is naturally addressed by Rust.


That's one of those times when looking around at what exists, and not only at old papers, might help. Been doing soft and hard real-time on bare metal (no OS) with Ada tech for a long time. There's a whole set of language constructs to help reduce the language set for hard embedded stuff (no dynamic allocation, no specific constructs, no secondary stack, ...), and there are the Ravenscar and Jorvik profiles.

Very strange to hear/read about the non-implementability of hard real-time with Ada when one of the guys I hired years ago worked on an OS for aircraft embedded computers, and had been doing that since the 90s...


Except it ignores the little detail of stuff like the Ravenscar profile.

https://en.wikipedia.org/wiki/Ravenscar_profile


How do you figure Rust handles this better than Ada? Ada's runtime has always been configurable. You can customise the runtime library fairly easily to strip out things that don't suit your target platform. Most compilers even ship with a 'Zero Footprint (zfp)' runtime which is designed for minimal overhead.

You could make the exact same argument about C if you wanted. C's standard library, as implemented for most operating systems, is poorly suited for real-time systems too. That's why it's modified to be fit for purpose when targeting them.

Whether people know it or not, they've been living in the Ada world for a long time. A good deal of the world's critical real-time systems have been running on Ada for decades: flight and fire control systems in aircraft, many major ATC systems, etc. The list goes on, and on.


I like this sarcastic piece: https://web.archive.org/web/20180207161304/http://home.pipel...

> After a number of top-secret meetings at the highest levels, the "Ada Project" was conceived. ... Its goal was to divert Soviet attention from truly productive computer languages like Lisp, and convince them that only a bloated, grossly inefficient, high order compiled language along the lines of PL/I could be reasonably utilized in the deployment of military embedded systems. The use of a standardized, inefficient language would provide a one-two punch: it would render super-programmers useless, and it would increase the demands on hardware by more than two orders of magnitude.

> The Ada Project was inspired by the unexpected success of the IBM System/360 architecture behind the Iron Curtain. The Ada Project's wizards {the Ada Project was conceived at Kirtland AFB, NM, near Roswell} reasoned that if the Soviets could be lured into copying the 360 architecture, they could also be lured into copying the Ada language, and if this language were fiendishly designed to make real-time systems essentially impossible to program, then the Soviet military machine would grind to a halt.

> Although Ada would also severely impact American software productivity, it was felt that--just as cancer-fighting chemotherapy nearly kills healthy tissue while it kills tumors--the healthier US economy would be better able to bear the severe burden of an unproductive software industry than the Soviet economy could. Thus, while American geeks were inferior to Soviet geeks, our Elbonian hordes could beat their Mongolian hordes.

Presumably modern Ada is better! Even the sarcastic paper hints at it:

> Now that the Wicked Witch of the East is dead, the wizards have finally allowed Ada to evolve into Ada9X, which fixed some of Ada's more egregious dysfunctions. However, even today the brilliance of Ada's original conception still shines brightly through.


> "The interface to the Ada runtime system is so opaque that it is impossible to model or predict its performance, making it effectively useless for real-time systems."

This comment didn't age well. If you've ever been a passenger on a modern jetliner you've most likely trusted your life to real-time systems coded in Ada.

> "Am I correct..."

> "Were the problems..."

> "...by a leading scientist (name?)..."

> "I think the report..."

Terrible. Entirely conjecture. Not one concrete source to substantiate these claims.


I don't deny that, with careful coding review, Ada has the potential to be safer than C++ even with the noted deficiencies.


GNAT is a frontend (written in Ada, yes) to GCC, which might be one of the strangest pieces of C++ code in this world. And don't go there with AdaCore's SPARK examiner/prover implementation either, because it's mostly an Ada frontend to Why3 (I'm simplifying), and you don't want to know what's behind Z3, CVC4 and Alt-Ergo. I'm a huge fan of SMT tech but I'm glad I don't have to open the hood.

These tools are enablers. Would it be nice to have an all-Ada compiler? Well, no! I want most of the gcc engineers to work for me! As long as the gcc backend improves, I get more perf! As long as SMT tools improve I get faster automated proof of contracts, or automated proof of far more complex contracts!


This is going to invite all the people who know nothing about the language and the circumstances regarding Ariane 5. :)


Agreed. To those seeking more information about this, I always refer to this article because of the links it contains:

http://www.rvs.uni-bielefeld.de/publications/Incidents/DOCS/...



