Boeing Passenger Jet Nearly Crashes Because of Known Software Bug (independent.co.uk)
161 points by xrayarx 4 months ago | 65 comments



https://avherald.com/h?article=5194536c

and https://www.gov.uk/aaib-reports/aaib-special-bulletin-s1-sla... is the AAIB report

Summary:

> The aircraft took off from Runway 09 with a thrust setting significantly below that required to achieve the correct takeoff performance. Rotation for the takeoff occurred only 260 m before the end of the runway and the aircraft passed over the end at a height of approximately 10 ft. The N1 required to achieve the required takeoff performance was 92.8% but, following an A/T disconnect when the crew selected TOGA, 84.5% was manually set instead. Despite an SOP requirement to check the thrust setting on takeoff, the crew did not realise that the thrust was not set correctly until after the takeoff although they had noted how close to the end of the runway they were. The A/T had disconnected when the TOGA switch was pressed due to a fault with the ASM associated with the thrust lever for engine 1. This disconnect was a known issue with the older type ASMs fitted to the aircraft type. The manufacturer has issued a Fleet Team Digest for operators detailing the issue and the SB for replacing the ASMs with a newer model.


Thanks for the links.

I had to go to the PDF [1] linked from the AAIB Special Bulletin to get a somewhat clear picture of the sequence of events (though if I were a 737 pilot or engineer, it all might have been obvious from the summary):

"Having completed their pre-flight preparation, the aircraft left the stand at Bristol to taxi to Runway 09 at 1041 hrs. The A/T arm switch on the Mode Control Panel (MCP) had been set to ARM during the before start procedures in accordance with the operator’s SOPs. The aircraft taxied onto Runway 09 at 1104 hrs and was cleared for takeoff shortly afterwards. The left seat pilot handed control of the aircraft to the right seat pilot who was to be PF for the sector. The PF advanced the thrust levers to 40% N1 and paused for the engines to stabilise before pressing the Takeoff/Go-Around switch (TOGA) which engages both the A/T in N1 mode and the autopilot/flight director system (AFDS) in takeoff mode. At this point, the A/T disengaged with an associated warning and the A/T arm switch on the MCP was reengaged by the PM almost immediately afterwards. At the same moment the PF advanced the thrust levers manually towards the required takeoff setting before releasing the thrust levers for the left seat occupant to control in accordance with the SOPs.

"When the A/T arm switch was re-engaged on the MCP after initial A/T disengagement, it did not control the thrust lever servos as the pilots expected and instead entered an armed mode. As a result, the thrust levers did not advance to the required thrust setting and neither pilot moved them from the position the PF had set them to. Despite the SOP requiring that the thrust is set by 60 kt and checked as correct at 80 kt, the incorrect setting was missed by both pilots. This resulted in the aircraft takeoff being conducted with significantly less thrust than required, 84.5% N1 was used instead of 92.8% N1, with the associated reduction in aircraft performance."

At this point, it is clear to me that the pilots missed two mandated checks of the power prior to rotation, but it is not entirely clear to me at what point, in this rather complicated sequence of events, things first deviated from what should have happened if both the equipment and pilots were performing as intended - was the A/T disengagement after TOGA selection the intended and expected behavior, or was the first deviation when it went into armed mode after re-engagement?

[1] https://assets.publishing.service.gov.uk/media/665092d816cf3...
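For anyone who wants the 80 kt cross-check from the quoted report as arithmetic, here is a minimal Python sketch. The function name and the 1.0 percentage-point tolerance are assumptions made up for illustration (the excerpt gives no tolerance); only the two N1 figures come from the report.

```python
def thrust_check(required_n1, actual_n1, tolerance=1.0):
    """Compare the N1 actually set against the required takeoff N1.

    tolerance is a hypothetical allowance in percentage points of N1;
    it is not taken from the AAIB report or any operator SOP.
    Returns (ok, shortfall), where shortfall is required minus actual.
    """
    shortfall = required_n1 - actual_n1
    return shortfall <= tolerance, shortfall


# Figures quoted in the report: 92.8% N1 required, 84.5% N1 set.
ok, shortfall = thrust_check(92.8, 84.5)
print(ok, round(shortfall, 1))
```

With the report's numbers the check fails by roughly 8.3 percentage points of N1, which is the gap both pilots missed.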


Not an expert, but it seems very clear to me that autothrottle is not intended to disengage when you press the takeoff/go around button. The whole point of that button is that you press it when you need to take off or go around and the plane attempts to set itself to a configuration where as long as you point the nose in the right direction then that should happen.

But it does rather sound like it was an expected behaviour. The system has a history of nuisance disconnects, the manufacturer recommends that you reject takeoff when that happens, and for some reason instead of doing that, the crew reacted to the disconnect warning by pressing the button again and, at that point, trusting that it was engaged and working. My reading of that is that the plane regularly has an A/T fault that can be fixed by simply pressing the button again, so this crew has developed a habit of doing that, but this time it was a different fault with different behaviour that their usual fix didn't work for, and they just failed to notice.

The report is clearly limiting itself to describing the difference between what actually happened and what the procedures say should happen, rather than speculating about what the pilots were thinking, but that's the only way it makes sense to this very much not expert that you shouldn't pay too much attention to.


What's damning about the pilots is they utterly failed to monitor their engine gauges, which for those who don't know are always prominently displayed on one of the big displays near the center of the instrument panel and easily visible to anyone anywhere in the cockpit.

Takeoff is by far the most stress the engines will endure during a nominal flight, it is imperative that the engines are monitored closely during takeoff in case they fail so the pilots can respond immediately. We're talking Basic Flying 101 here and these guys failed it.


Failed to monitor their gauges and failed to do their pre-flight checklists.

They might be piloting a shit vehicle, but they put no effort into doing their jobs.


It is intended to disengage upon any physical movement of the throttles.

Normal usage of the TOGA buttons would be to push them and not move the throttles - assuming the throttle has normal non-worn or adjusted throttle friction.

Abnormal usage may well be that the throttle friction has worn or has not been adjusted by maintenance (afaik it's not crew-adjustable on boeing aircraft as it is on smaller light airliners), and thus pressing TOGA may shift the throttle lever enough to disengage A/T.


The title here and the text of TFA reduce the incident to a single point of failure. There are conditions in which the TO/GA mode when engaged will fail to set the calculated thrust, and the crew is required to manually set the takeoff thrust. Confirmation of takeoff thrust set should be called out. Was that done? Boeing is under justifiably intense scrutiny for many of its manufacturing and engineering practices but the press needs to apply intellectual honesty in reporting incidents like this. It seems that the crew may have known that the thrust was not properly set when TO/GA was engaged and then manually set an incorrect thrust. Like many incidents, this is likely to have layered causes. Why not report that upfront?


There is a lot of basic airmanship which has failed here. For example from your first day of PPL training you are taught to back up the throttles during takeoff and at low altitude to avoid them creeping backwards. You are also taught to cross check that you've achieved half the rotation speed at 1/3 of the runway available.

Obviously these practices are less applicable to an airliner but the basics are still there. The pilots should be monitoring engine performance and acceleration to avoid these types of issues.

In this case it was a software bug but it would be just as easy for it to have been a wrongly input takeoff weight or temperature to affect the calculated TOGA power.
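The acceleration cross-check mentioned above (half the rotation speed by one third of the runway) can be sketched in a few lines of Python. This is only the rule of thumb as stated in this comment, not an airline procedure; the function names and the example runway length and speeds are hypothetical.

```python
def checkpoint_distance_m(runway_available_m, fraction=1 / 3):
    """Distance along the runway at which the speed cross-check is made."""
    return runway_available_m * fraction


def acceleration_on_track(speed_at_checkpoint_kt, rotation_speed_kt,
                          speed_fraction=0.5):
    """True if the speed at the checkpoint is at least the stated
    fraction (one half by default) of the rotation speed."""
    return speed_at_checkpoint_kt >= speed_fraction * rotation_speed_kt


# Hypothetical numbers: 2,100 m runway, 140 kt rotation speed.
print(checkpoint_distance_m(2100))
print(acceleration_on_track(75, 140))  # at or above 70 kt: on track
print(acceleration_on_track(60, 140))  # below 70 kt: reject the takeoff
```

The point is that the check needs no knowledge of why the aircraft is slow, whether a wrong thrust setting, a wrong weight entry, or an engine problem.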


> Boeing is under justifiably intense scrutiny for many of its manufacturing and engineering practices

Well that yes, but not just the engineering and manufacturing practices or apparent lack of safety, also the extremely suspicious conditions of the deaths of whistleblowers who came forwards against them. So far two are dead.


This is a conspiracy theory. At least one of the deaths is not suspicious, and the person's family have said so themselves.


Well, yes, it's the very definition of a conspiracy theory. That doesn't mean it's automatically false though.


> Well, yes, it's the very definition of a conspiracy theory

What is?

> That doesn't mean it's automatically false though.

No, just significantly less likely than the other going theories.

Note that I wouldn’t put it past Boeing (or actually, more likely, a Boeing investor) to kill. But without any evidence I’d refrain from making such accusations.


> No, just significantly less likely than the other going theories.

The likelihood of the explanation needs to be balanced with the unlikelihood of the two already unlikely events happening independently of each other.

When you're seeing statistically unlikely events, it's downright reasonable to examine whether the game is rigged somehow.

One whistleblower dropping dead is curious timing, but ultimately plausible; it's a stressful situation after all. Two of them drop dead? While it's not established beyond reasonable doubt based on the unlikely timing and strong motive alone, some sort of conspiracy to retaliate against the whistleblowers is definitely one of the less far-fetched theories.

Weird and oddly personal criminal behavior is something that happens in the corporate world[1]

[1] https://www.cbsnews.com/news/investigation-ebay-employees-st...


But if one has been confirmed as natural, then we’re back to

> One whistleblower dropping dead is curious timing

right?

I like some cold statistics too but in this case I wouldn’t say they support some Machiavellian scenario.


“To lose one parent, Mr. Worthing, may be regarded as a misfortune; to lose both looks like carelessness.”


"Once is happenstance. Twice is coincidence. Three times is enemy action."


The conspiracy theory is that Boeing hired somebody to kill the whistleblowers. But even if they committed suicide (which is more likely), it is because of the immense pressure put on them by Boeing. Which, in my book, is as bad as hiring a hitman.


The guy who died of MRSA in a hospital didn't die of suicide.

That is a failure of our healthcare system.


It's also a circumstance where even if the problem is known, the correct solution may not be to fix it on already flying models.

Complicated professional transportation infrastructure isn't like a web app. Pilots become familiar with the quirks of the machine, and changes to said quirks sometimes require recertification or reevaluation; otherwise, they could jeopardize passenger safety as new behavior interferes with pilots' learned routines.

It is reminiscent of the problem with the Apollo flight computer where it was known that rerunning the initialization program mid-flight would put the machine in a state where it had no understanding of its current position, but because they had already woven the computer core the solution was to come up with a way to restore flight state and correct the error instead of making the error impossible.


Shouldn't there be a design rule that if an automatic setting can fail, then it should be completely manual?


No. Every system can fail. Manual systems fail too (people aren't perfect). Having an automated system there for most of the time when it's more likely to do the right thing than a human is still good for the overall reliability of the system.


Another rule could be that the system should sometimes set a random value and then when the pilot doesn't catch it there should be an automatic report about it (and the plane should refuse to start).


True, but in this case there was a __known__ bug.


>Why not report that upfront?

Because it doesn't pay to be honest.


I don't see any dishonesty here. Boeing has designed a system that sets up pilots for failure. It's Boeing's fault, not the pilots', that the auto-throttle disengages for no reason and that they then have to manually set the correct thrust during an already high workload situation. It's absurd and unacceptable. This is the kind of unsafe work culture that I thought we were done with in aviation.


Blaming Boeing and only Boeing is intellectual dishonesty, blaming the pilots for failing to fly their plane is also necessary. Remember, TOGA and autothrottle are not necessary for safe flight.

It is also explicitly a pilot's job to manage the engines among many other things. It's why there are two pilots in the cockpit, one is in control of the aircraft while the other is keeping tabs on secondary and tertiary needs like radios and cross checking instruments with the pilot in control.

If the aircraft isn't accelerating properly, it is the pilots' job to respond including rejecting the takeoff before it's too late. Takeoff is by far the most stressful part of a flight for the engines, if the pilots are not keeping close tabs on the engines during takeoff then they are fundamentally unfit to fly.

Blaming Boeing and only Boeing is dishonest and doesn't address the problem, but it does pay the media's bills by stoking the furor of readers such as yourself.


Just because a system isn't necessary for safe flight doesn't mean that a) it can't kill you or b) you shouldn't use it.

If you think that blaming the people will solve the problem, then I'm excited to be the one to introduce you to one of the most important and underserved fields of engineering: human factors, the study of which is a major reason we have safe aviation.


Human factors is an important field of study, yes. But just because we know humans make mistakes doesn't mean we can't hold them accountable for those mistakes when they happen. Our tendency to focus on one thing to the exclusion of the other things is bad for getting a full understanding of what is going on. Two things are true in this incident:

1. Boeing has a known unfixed software bug in their planes that puts pilots in a bad position. It has multiple known failures that look the same and created a learned response that was incorrect. This is bad and Boeing should answer for it.

2. The pilots are ultimately responsible for making sure they are hitting the correct speed for take off and should be aborting if they are having difficulty making that speed. They failed to do so in this case. This is bad and they should have to answer for it.

Both of these are true and neither one excuses the other.


TOGA and A/T are not necessary, but if they're supplied they should work reliably.

If they don't, and pilots have to cover for their failures manually, that's one more avoidable point of failure.

Adding one more avoidable point of failure is NOT acceptable.


That this has to be an either-or crusade against Boeing instead of blaming all the relevant parties speaks volumes about how much fucking damage "journalism" like this does.

Once again: Both Boeing and the pilots along with whoever else is involved in this are all to blame equally.

If you're going to let pilots who don't properly check their engine gauges off the hook because you are far more interested in crapping on Boeing specifically, planes are eventually going to come down regardless of Boeing.


Being operator-conditioning-trained to click through warning signs is a huge problem for everyone.

The car drivers with dashcams on that stretch of motorway they cleared at 30m must have some fantastic vision. I hope it's been uploaded to the Web (like sint maarten but not beach: trucks)


Good argument for the morality of ad blocking


The pilots turned off the autopilot for takeoff due to a known bug.

Flying manually the pilots made an error when setting the required thrust that nearly caused the plane to run off the runway without getting off the ground?

Is that correct? In which case it seems like pilot error.


It's not clear from the article whether

A) The autothrottle disengaged due to the bug, and the throttle defaulted back to some previously set value, without the pilots' knowledge.

B) The pilots disabled the autothrottle to avoid a bug, and also failed to set the throttle correctly.


Human errors happen. That's why they take off with the autopilot, but the autopilot doesn't work well enough and everybody AFAIK has to take off manually. Great job from Boeing.


? lol.

Disabling automation because of a bug and doing manual stuff is how things break.

This is true in WebPKI, it’s true in planes as well.


It's actually the right way of thinking! When high-level automation fails you, messing around with the high-level system in a critical moment would probably be your death. Debugging takes minutes not seconds. This is why pilots are trained to drop from a high level automation to a lower one when necessary. Like most learnings about piloting, this insight was purchased with a lot of blood. If you want to know more, check out Children of the Magenta Line: https://www.youtube.com/watch?v=5ESJH1NLMLs


The problem is not dropping automation when it fails.

The problem is dropping automation BECAUSE it fails.

They aren't able to rely on the automation because it's faulty - and as a consequence the error rate increases.


I'm not comfortable with a future where Boeing becomes niche or "commercially deprecated", with current planes aging and no good maintenance path forward. It will become a safety liability that we passengers will have to bear.

Boeing's demise would create an amazing opportunity for Chinese aviation to make its move. It takes decades to really enter the market but it could, slowly, happen. Embraer otoh is not in a position (huge investment) or just not interested (too risky) to enter those 737/MAX and long haul markets.


> Pilots manually set the thrust level following a software glitch that Beoing [sic] was aware of before take-off.

Oooh... that's gonna leave a mark.


But this part should be more widely remembered:

>The crew manually set the thrust to 84.5% N1 (rather than 92.8% N1 as needed) and continued takeoff. The aircraft rotated about 260 meters prior to the end of the runway and crossed the end of the runway at 10 feet AGL. The crew continued the flight to Las Palmas although whenever they tried to engage autothrust, it disconnected again. The aircraft landed safely in Las Palmas.

10 feet is 3 meters


It is really disheartening to see a cutting-edge engineering company making news for failures of design or engineering, rather than for innovative engineering, new designs, and new trials.

It's as if the soul of the company has been taken away forever.

Contrast Boeing with Nvidia, Tesla, etc


It began long ago, with a series of Jack Welch acolytes starting with Harry Stonecipher and James McNerney. They brought hostility to in-house engineering, hostility to unions. That's when Boeing lost its soul.


Ex-engineering company…


Contrast Boeing with spacex, not Tesla.


Tesla way worse than Boeing. Robotaxis lol


I suspect the most interesting part of this is why, precisely:

If this is a known bug, incorrect setting of the throttle and subsequent disconnect of the auto throttle system, why is the aircraft still deemed airworthy while using the auto throttle during takeoff?

How was using defective equipment, critical to flight safety if engaged, not specifically prohibited in an addendum to the POH?

Because that would require additional training and be a bad look, that’s why, I suspect.

If a FAR23 (light) aircraft had a defective throttle cable that sometimes failed to effect the commanded throttle setting under certain conditions, it would be grounded immediately pending remedial service or, if impossible, at least placarding of the prohibited configuration and modification to the POH to specifically prohibit that configuration, as well as incorporating changes into training materials if the aircraft fell into the high performance category.

That Boeing is operating at a lower standard of flight safety than is typically required of SLA regulations is an indication of a deeply broken relationship with regulators.


There was nothing wrong with the throttle controls or the engines. The engines were at commanded thrust.

What happened was the autopilot (autothrottle) failed to engage.

Pilots set manual thrust and rearmed the system. It didn’t engage. Then the pilots failed to verify engine performance or throttle position twice during takeoff.

This is dangerously negligent because engine performance may not match throttle setting due to mechanical fault. The computers may or may not detect this.

See: https://en.m.wikipedia.org/wiki/Air_Florida_Flight_90


Exactly. The procedure of taking off with the auto throttles rather than manually controlling the engines should be prohibited in the POH if the software governing the flow of that procedure is unreliable. Sure, double checking it is a really great and necessary step, but you don’t throw down marbles on the floor in front of the aircrew, even if they should be able to step around them.

Mistakes get made. Procedures botched, steps overlooked or conditioned by habit to be checked ok when they are not.

Aviation safety is the result of defense in depth, layered mechanisms and procedures that make loss of life highly improbable. An aircraft that is faulty in a way that defeats these layers of safety is by definition not airworthy.

Often, it can be made airworthy by a change of procedure, a change in operational regimes, or even by a placard on the dash.

In this case, prohibiting the use of autothrottle takeoff until the software is reliable is probably indicated. Something that fails once in a while but usually works is often much more dangerous than something that just doesn’t work at all or is tagged out.


> The CVR fitted to G-FDZS was not removed from the aircraft as it continually overwrites itself, retaining only the last two hours of audio. As such, the recording of the takeoff would have been overwritten during the flight to Las Palmas

In 2017, the date that aircraft began production, we had terabyte thumb drives and 24kbps Ogg Vorbis, why a two hour limit?
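A back-of-the-envelope version of that question, as a minimal Python sketch. It assumes a single audio channel at a constant 24 kbps and a decimal terabyte (10^12 bytes), and it ignores container overhead, multiple cockpit channels, and whatever constraints crash-protected memory imposes; the function names are made up for illustration.

```python
def audio_bytes(bitrate_bps, seconds):
    """Bytes needed to store a constant-bitrate stream for a given time."""
    return bitrate_bps * seconds / 8


def seconds_of_audio(capacity_bytes, bitrate_bps):
    """Recording time that fits in a given capacity at a constant bitrate."""
    return capacity_bytes * 8 / bitrate_bps


two_hours_mb = audio_bytes(24_000, 2 * 3600) / 1e6
days_per_terabyte = seconds_of_audio(10**12, 24_000) / 86_400

print(f"2 h at 24 kbps: {two_hours_mb:.1f} MB")
print(f"1 TB at 24 kbps: about {days_per_terabyte:.0f} days")
```

Under those assumptions two hours of audio is about 21.6 MB, and a terabyte would hold roughly 3,858 days (over ten years) of one channel.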


> The 737-800 plane cleared runway nine with just 260 metres (853ft) of tarmac to spare at a height of 10ft.

> It then flew over the nearby A38 road at a height of just 30 metres (100ft) travelling at the speed of around 150kts (about 173mph).

This got my head spinning, what a jumble of units
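To untangle the units, here is a small Python sketch that converts the quoted figures both ways. The conversion factors are the exact definitions (1 ft = 0.3048 m, 1 kt = 1.852 km/h, 1 mile = 1.609344 km); the helper names are made up for illustration.

```python
FT_IN_M = 0.3048          # metres per foot (exact by definition)
KT_IN_KMH = 1.852         # km/h per knot (exact by definition)
MILE_IN_KM = 1.609344     # kilometres per statute mile (exact)


def m_to_ft(metres):
    return metres / FT_IN_M


def ft_to_m(feet):
    return feet * FT_IN_M


def kt_to_mph(knots):
    return knots * KT_IN_KMH / MILE_IN_KM


print(f"260 m  = {m_to_ft(260):.0f} ft")
print(f"10 ft  = {ft_to_m(10):.2f} m")
print(f"30 m   = {m_to_ft(30):.0f} ft")
print(f"150 kt = {kt_to_mph(150):.0f} mph")
```

The article's conversions hold up: 260 m is about 853 ft and 150 kt is about 173 mph, while 30 m is closer to 98 ft than the rounded 100 ft.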


By this point, with all the news of Boeing disasters, I almost assume that if you wanted to commit suicide by skydiving without a parachute, a flight to the right height in a Boeing would be more likely to kill you than the jump.


We should call this “Suicide by maximizing shareholder value”


How many people have ridden in a Boeing plane this year and how many have died?


Is that including the two currently in space who are wondering that?


I didn't realize the probability of death in a drop of thousands of feet is under .00001%


What's with Boeing lately? In the last few years I've seen a lot of news about crashes and problems with their planes.

Is it me? I did not take a deep look into the topic.

How does it compare to Airbus safety-wise, with the real data from fatal crashes at hand?


A well-balanced comparison is well overdue, considering the amounts of one-sided coverage recently. Is it indicative of an issue with Boeing? Or with media reporting? Probably the former, but it's something we should really be more sure of.


Is this the one that was spewing flames yesterday or a different incident?


Different one. That one returned to base immediately, it's not safe to continue with a surging engine.


Wow, Boeing again.


It's amazing that Boeing planes are still allowed in the air at all at this point.


Ultimately an unintended consequence of trying to reduce costs. Otherwise they'd just use full throttle.


The engine wear and consequently the risk of engine failure during normal operations goes up with full throttle.

Noise abatement is reducing the throttle when the plane has reached a certain altitude and not relevant for takeoff roll.


Cost and noise, the latter possibly ending up more costly if the allowed noise level is exceeded.



