
In the Computer Science degree I did, every course in the Software Engineering or Formal Methods track started with "Why Software Engineering is important", followed by a group of very bad software bugs; this one was always among them. The slides would then explain how the professors believed their course would have prevented them.

Especially funny was the formal verification course that mentioned the Ariane 5. Apparently all the new software in Ariane 5 was formally modelled and verified, but one part of the system was directly ported from the Ariane 4. Because the Ariane 4 missions had been successful, they did not verify that part (it's an expensive process). The bug that crashed the rocket involved the fact that the Ariane 4 was 16 bit and the Ariane 5 was 64 bit; this resulted in an integer overflow somewhere, leading to a crash.

You can spend millions in painstaking formal verification, and pay for the small part that you did not verify.




An erstwhile colleague of mine worked on the component that destroyed the Ariane 5; it performed exactly to spec, detecting that the vehicle was out of control (due to the integer overflow, which, NB, was in another, unrelated component) and self-destructing it to prevent it crashing to earth. However, explaining this subtlety to a rabid press was a different story, and the whole thing became a bit of a PR disaster.


I thought the Ariane was destroyed by aerodynamic forces?


Self-destroying components sound interesting. Do you have any info or links on the type of technology used in this component?


Nothing too fancy, just strategically placed explosive charges to break up the vehicle to stop it from traveling further down range and disperse the propellant to reduce the size of the explosion.


I think you misunderstood: the purpose was to self-destruct the rocket. Although I don't know the details, I'd imagine it was a sensor to detect the vehicle exceeding its flight parameters, coupled to either an explosive packet or something in the fuel system to make the rocket go bang.


Javascript and PHP, mainly.


I opened the comments track to say exactly this. Every lecturer with any connection to software would bring this up. It felt like the bad (as in, people have died from Therac-25 malfunctions) running gag of my CS studies. Other popular bug choices were the Pentium Bug from 1994 or the Mars Climate Orbiter.


I thought the Ariane 5 problem was due to old code in the guidance systems that they thought would never be called but left in because they didn't want to risk unnecessary change. A "last minute" trajectory and launch timing change due to atmospheric conditions meant that this code did get triggered - it saw the new trajectory as a problem and tried to correct it pushing the rocket out of control (or, at least, outside acceptable parameters for the new launch plan) and causing the perfectly reasonable "I'm not sure what is going on, I'd better blow myself up before I hit something important on the ground" fail-safe to fire. Or am I confusing this with another rocket control error?


It seems you're confusing it with another one. Here's an excerpt from the wiki on flight 501:

> The Ariane 5 reused the inertial reference platform from the Ariane 4, but the Ariane 5's flight path differed considerably from the previous models. Specifically, the Ariane 5's greater horizontal acceleration caused the computers in both the back-up and primary platforms to crash and emit diagnostic data misinterpreted by the autopilot as spurious position and velocity data. Pre-flight tests had never been performed on the inertial platform under simulated Ariane 5 flight conditions so the error was not discovered before launch. During the investigation, a simulated Ariane 5 flight was conducted on another inertial platform. It failed in exactly the same way as the actual flight units.

> The greater horizontal acceleration caused a data conversion from a 64-bit floating point number to a 16-bit signed integer value to overflow and cause a hardware exception. Efficiency considerations had omitted range checks for this particular variable, though conversions of other variables in the code were protected. The exception halted the reference platforms, resulting in the destruction of the flight.

Although the article partially disagrees with tinco: it looks like formal verification was only implemented after flight 501:

> The launch failure brought the high risks associated with complex computing systems to the attention of the general public, politicians, and executives, resulting in increased support for research on ensuring the reliability of safety-critical systems. The subsequent automated analysis of the Ariane code was the first example of large-scale static code analysis by abstract interpretation.
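
To make the conversion problem from the quoted excerpt concrete, here is a minimal Python sketch of a 64-bit float being converted to a 16-bit signed integer, with and without a guard. This is not the flight code: both function names are hypothetical, and clamping is just one assumed way a conversion could be "protected" (the excerpt does not say how the other variables were guarded).

```python
INT16_MIN = -(2 ** 15)   # -32768
INT16_MAX = 2 ** 15 - 1  # 32767


def to_int16_unprotected(value: float) -> int:
    """Convert without any recovery path.

    An out-of-range value raises OverflowError, standing in for the
    unhandled hardware exception described in the excerpt.
    """
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a signed 16-bit integer")
    return result


def to_int16_protected(value: float) -> int:
    """Hypothetical guarded variant: clamp to the representable range."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

With these definitions, `to_int16_protected(40000.0)` returns 32767, while `to_int16_unprotected(40000.0)` raises. Whether clamping would have been an acceptable response in a real guidance system is a separate design question.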


The details are a bit unclear on Wikipedia, and it's taking tens of minutes to download the original report from the European Space Agency, so below is my best understanding without re-reading the report.

My understanding was that the routine in question was used for re-calibrating the inertial guidance system in the Ariane 4 in case of an extended hold-down period of up to 40 seconds after ignition. Presumably this routine integrates measured acceleration, and the result can be divided by the hold-down time to find the average error in accelerometer bias over the hold-down period. The average error in accelerometer bias (in other words, the rate at which measured ground-relative velocity deviates from true ground-relative velocity) would then be subtracted from the previous bias estimate in order to get the bias estimate to be used for the flight.
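
As a rough illustration of the recalibration arithmetic described above, here is a Python sketch. It only mirrors my guess, not the real routine: the function name, the fixed sample interval `dt`, and the sign convention (the "bias estimate" is a correction term added to raw readings, and the samples are already-corrected readings taken while the vehicle is held down) are all assumptions.

```python
def updated_bias_estimate(previous_estimate, corrected_samples, dt):
    """Re-estimate a bias correction from readings taken during hold-down.

    Assumes the true ground-relative acceleration is zero while held down,
    so anything left in the corrected readings is residual bias error.
    """
    # Integrate the residual acceleration into a (spurious) velocity.
    velocity = sum(sample * dt for sample in corrected_samples)
    hold_down_time = len(corrected_samples) * dt
    # Average residual error over the hold-down period.
    average_error = velocity / hold_down_time
    # Subtract the average error from the previous estimate.
    return previous_estimate - average_error
```

For example, with a previous correction of -0.015 and 40 seconds of corrected readings that all show a residual of 0.005, the new correction comes out at about -0.020.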

Edit: even though recalibration was never intended to be used in the Ariane 5, the integration routine was left in and continued to integrate acceleration measurements for 40 seconds after ignition.

The Ariane 5 is capable of undergoing more acceleration than the Ariane 4, so it was possible within 40 seconds of ignition for the 64-bit velocity (integral of acceleration) to overflow the 16-bit variable it was cast into at some point. With no range checks implemented for this cast, this routine caused the computer handling the inertial guidance to crash and dump. The autopilot saw the crash dump, but misinterpreted it as a position and attitude update. The autopilot then adjusted the rocket nozzles to correct for the misinterpreted attitude, causing the rocket to start flying somewhat sideways through the air at high speed, leading to breakup due to aerodynamic forces.
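
To show why a larger acceleration matters within a fixed 40-second window, here is a small Python sketch that integrates a constant acceleration and reports when the running value would no longer fit in a signed 16-bit integer. The function name, the constant-acceleration model, and the numbers used with it are hypothetical; I don't know the real units or scaling of the variable.

```python
INT16_MAX = 2 ** 15 - 1  # 32767


def seconds_until_overflow(acceleration, dt, window=40.0):
    """Return the time at which the integrated value exceeds the int16 range.

    Returns None if the value stays in range for the whole window.
    The units of `acceleration` are arbitrary (an assumption of this sketch).
    """
    velocity = 0.0
    steps = int(window / dt)
    for step in range(1, steps + 1):
        velocity += acceleration * dt  # simple rectangular integration
        if abs(velocity) > INT16_MAX:
            return step * dt
    return None
```

In these made-up units, an acceleration of 500 per second stays in range for the full 40 seconds, whereas doubling it to 1000 per second crosses the limit well before the window ends.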

Most (all?) modern launch vehicles contain small explosive charges (or lines of detonating cord) to burst fuel tanks, break up the solid fuel grain, and perhaps break up some of the more dangerous pieces if the vehicle deviates too far from the planned flight path. Shortly after aerodynamic forces began to break up the Ariane 5, some internal automated system detected things were going very very badly and triggered the auto-destruct sequence.



