A classic tale of engineering gone wrong. Apropos of this, I once designed high-efficiency 20 kHz inverters for the Space Shuttle crew quarters, flight hardware that ended up working flawlessly. But during the design phase, the primary Shuttle contractor revealed that my inverters would be exposed to much higher voltages than they had originally supposed.
When I saw this, I realized we would have to start over and choose different components able to tolerate the higher voltages. My managers disagreed, arguing that if we caused problems, we would be denied a follow-on contract for similar inverters to be deployed in the payload bay. The managers composed a reply to NASA saying there was no problem, things were fine, let's forge ahead.
I was a mere engineer with no management authority, and I hadn't been consulted about the reply. When I heard about it, I sat down, wrote a letter of resignation, and pushed it into more hands than was absolutely necessary. I made some comparisons that at the time might have seemed over the top (like the Apollo fire that killed three astronauts, a disaster resulting from lax oversight).
In my case, because I had distributed the letter farther than was absolutely necessary, my managers were forced to reverse themselves, I was able to redesign my inverters in a safe way, and we got the follow-on contract in spite of not being seen as "team players".
Many years later, at the time of the Challenger disaster, it finally dawned on me that, had I disregarded the overvoltage issue as my managers had wanted me to, and if something had gone wrong, I would have been held personally responsible, because I was the only person with the level of technical knowledge required to make the call, and my managers could disavow any responsibility. At the time, I made the right decision, but for reasons that I hadn't fully thought out -- if my equipment had failed in-flight, I would have been held responsible, and that would have been a perfectly just outcome.
Nothing I've worked on has been anywhere near this scale, but having management understand warnings and ignore them anyway is commonplace.
Just today, I pointed out that a backend web service for an important (HR) function was so insecure that simply putting a single quote in an end-user input field would crash it, and that SQL injection would be trivial. After a little arguing from them ("You don't know what a SQL injection is"), they changed their tune to "Just put a limit on the frontend"; when I told them that could be trivially bypassed, they said "No one will ever think to try that." When I pressed further, they retreated to "It's an internal app, no one would attack it" and then to "And do you want me to put locks on all the doors and cabinets around the office too?"
So that's a case where there's some reasonable level of understanding and just a refusal to act.
I also recall a time (at a different company) when I discovered an anonymous FTP server with public read/write access, externally facing, with no firewall rules. When I pointed out the problem, they said, "That's what our customers are used to, we're not going to change it," and when I pushed, "No one will ever find it anyways." Literally less than a month later that server was overrun with illicit content.
It happens. Unfortunately, as we've seen with the Volkswagen debacle as well, it happens even when regulation or (in the Challenger case) human life is on the line.
They possibly felt that you were suggesting it was a lot of work to fix this, when in reality it would take a couple of hours to add sanitization checks to the inputs.
You're also overestimating the business impact of a security breach. This is an area where people's moral compass overrides their shrewdness. Notice that Anthem is still in business despite the worst possible breach, for example.
It's worth trying to improve security, if possible. But it takes one of two things: (a) leverage, or (b) patience. One of the main reasons that companies get a security audit is because a different company forces them to. They get an audit in order to obtain a CFD (client facing document) saying that they are secure. Without this, the other company will not do business with them, which is the sole reason why the security audit happens.
The other way -- patiently pointing out ways of improving the situation, and explaining the business merits of allocating time to this pursuit -- does not usually result in meaningful security improvements. This is an unfortunate fact of the industry, partly because security breaches do not usually put a company out of business. There are exceptions to this, but that is the common case.
To be fair, it would be a lot of work to fix the backend. The SQL is dynamically generated for every request, and the code that does so is thousands of lines long, with nested subroutines that construct sub-portions of the query, so there is no easy way to see where things need to be parameterized. There is also injection from unusual places (like a search options object, separate from the search criteria object, which includes the ASC/DESC and which, you guessed it, is directly injected).
Realistically, to be safe, you'd have to gut the whole thing and replace it. You can be a "little safe" by doing something silly like replacing ' with '', but that doesn't protect the non-string cases, etc., and I wouldn't propose a solution like that because it could give a false sense of security.
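For what it's worth, the end state I'd push for looks something like this (a minimal sketch assuming a Python DB-API driver like sqlite3; the table and column names are made up, not the real schema):

    import sqlite3

    def find_employees(conn: sqlite3.Connection, last_name: str, min_salary: float):
        # The driver sends the values separately from the SQL text, so a quote,
        # a semicolon, or a numeric payload in last_name can't change the query.
        sql = ("SELECT id, last_name, salary FROM employees "
               "WHERE last_name = ? AND salary >= ?")
        return conn.execute(sql, (last_name, min_salary)).fetchall()

The safety lives in the driver binding, not in whatever escaping we bolt onto the inputs.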
I understand the push-back. The backend is maintained by others and it would add to their workload to secure it. That's a real and actual cost.
As for the business impact, you are right. I just wanted it to be "on the table" that I informed people of the situation, so that if there is an exploit in the future, I don't get the "why didn't you tell anyone this was vulnerable?" or something like that.
I came across an application that had SQL injection vulnerabilities everywhere (100s of ASP files). Query parameters were concatenated into strings over 1000s of lines. It would have been a year-long project to properly parameterize every query across every system. Management wasn't too concerned until I showed them that I could drop the entire database (and the other databases on that DBMS) by putting a URL into IE6. Our eventual fix was to pass every query through a regex that would stop the worst queries (DROP, TRUNCATE), while making it a priority that everyone start writing safe queries and fix bad query generation code when it is seen.
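To give a flavor of it, the filter was conceptually along these lines (a simplified sketch, not the actual production code, and emphatically a stopgap rather than real protection, since parameterization is the only real fix):

    import re

    # Block only the most destructive statements; everything else still gets
    # through, so this does not make the app safe, it just limits the damage.
    BLOCKED_KEYWORDS = re.compile(r"\b(DROP|TRUNCATE|ALTER|GRANT)\b", re.IGNORECASE)

    def check_query(sql: str) -> str:
        if BLOCKED_KEYWORDS.search(sql):
            raise ValueError("query rejected by safety filter")
        return sql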
An attacker can't necessarily leverage ASC/DESC injection unless multiple statements can be issued in a single query (e.g. injecting "ASC; SELECT * FROM..."), which isn't enabled in most database deployments. But there are probably other insecurities here.
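And for that particular injection point, the cheap fix is to never splice the user's value in at all, just map it against a whitelist; something like this (hypothetical names, just to illustrate the idea):

    def build_order_clause(allowed_columns, column, direction):
        # Whitelist both pieces; anything unexpected falls back to a safe
        # default instead of being concatenated into the SQL text.
        col = column if column in allowed_columns else allowed_columns[0]
        dir_sql = "DESC" if str(direction).upper() == "DESC" else "ASC"
        return f"ORDER BY {col} {dir_sql}"

    # build_order_clause(["last_name", "created_at"], "created_at", "desc")
    # -> "ORDER BY created_at DESC"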
The solution is to run every input through a function that replaces each single quote with two single quotes. That is all that's required to prevent SQL injection, since it's not possible to construct a valid query regardless of the injection point. (EDIT: This refers to string inputs. Numeric inputs are handled in the obvious way, as is ASC/DESC. Injection into an ORDER BY clause is not usually exploitable. These are all of the cases.)
If you propose this solution to management, they may be somewhat more likely to take action, since it can be applied to the existing system.
I respectfully disagree with your claim that replacing ' with '' will make it secure. Reason: there are non-string cases, so there exist concatenations like:
"WHERE SomeNumber = " + inputValue + " ... "
Here inputValue is a string (from untyped XML), but it is put into the SQL without quotes because SomeNumber is an int type in the database. Since the XML is constructed without validation, an attacker could put any value there, including an injection payload, without using quotes at all.
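A toy version of that failure mode, just to make it concrete (hypothetical names; the quote-doubling "fix" does nothing here, while a bound parameter would treat the whole value as data):

    def escape_quotes(value: str) -> str:
        return value.replace("'", "''")  # the proposed fix: double single quotes

    some_number = "0 OR 1=1"  # attacker-controlled value, no quotes needed
    unsafe_sql = "SELECT * FROM payroll WHERE SomeNumber = " + escape_quotes(some_number)
    # unsafe_sql is now: SELECT * FROM payroll WHERE SomeNumber = 0 OR 1=1
    # i.e. the WHERE clause matches every row, and not a single quote was used.

    # With a parameterized query the same input is just a non-matching value:
    #   cursor.execute("SELECT * FROM payroll WHERE SomeNumber = ?", (some_number,))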
People born after the 1970s may not be aware of the Teton Dam collapse, but it was another case of total disregard for many, many expert warnings of disaster in the rush to build something big and impressive. Here's kind of a cheesy documentary clip describing the incident [1], but the book Cadillac Desert [2] goes into some detail if you're curious.
FWIW, there is a four-hour PBS-style documentary based on the book. This book is the most prescient thing about the biggest issue (multiply prescience by the size of the issue and it sorts to the top) I will probably ever read.
Honestly, if it's concerning personal data held by HR, I would expect locks on the cabinets. At most places you cannot sneeze in HR without running into security.
They'll pay attention when someone emails them their own salary information. I'm kidding, that would be career suicide. Printing it out and leaving it on the printer would be much better.
This is why you should always lock your computer if you're going to be away from it. (Someone may sneak over and print something from it while you're not looking.)
The more I think about the Volkswagen debacle, the more I realize that it's an artifact of "nothing you do or say is guaranteed to be private forever."
Having been taken to court, I can tell you this is completely true in all facets of life, with anybody you interact with. It's dangerous to even joke about misdeeds. Always act with the utmost virtuousness, lest it come back at you someday.
I am told that China very nearly had an Industrial Revolution, but the Mandarins, whose credo was "that which is not required is forbidden, and that which is not forbidden is required," took over and it stopped.
No. I'm saying that although I've never worked on anything where lives are on the line, in my experience, having management understand warnings yet ignore them is commonplace.
I then cited two anecdotes where warnings had been given. In one case a breach followed; in the other, so far, it has not. I would assume most warnings that are ignored turn out to be non-issues.
I worked in a company that made CNC machine tools at the time of the Challenger disaster. The actions of Roger Boisjoly gave me the courage to reject a demand from management that I add a "feature" to a control system I was working on to disable all safety lockouts from a single virtual switch.
I refused the assignment. When they didn't fire me on the spot, I followed the managers who made the request, and when they asked another engineer to do it, I would warn the engineer that they could be held personally responsible for any injuries or fatalities that might occur.
I fully expected to be fired but surprisingly, management just gave up on the idea. Perhaps my insistence that it was a bad idea convinced them.
I'm not saying NASA didn't fuck up by any means. I'd just like to point out that hindsight bias is a very real effect that can have huge consequences when planning for future events: http://lesswrong.com/lw/il/hindsight_bias/
>Viewing history through the lens of hindsight, we vastly underestimate the cost of effective safety precautions. In 1986, the Challenger exploded for reasons traced to an O-ring losing flexibility at low temperature. There were warning signs of a problem with the O-rings. But preventing the Challenger disaster would have required, not attending to the problem with the O-rings, but attending to every warning sign which seemed as severe as the O-ring problem, without benefit of hindsight. It could have been done, but it would have required a general policy much more expensive than just fixing the O-Rings.
>Shortly after September 11th 2001, I thought to myself, and now someone will turn up minor intelligence warnings of something-or-other, and then the hindsight will begin. Yes, I'm sure they had some minor warnings of an al Qaeda plot, but they probably also had minor warnings of mafia activity, nuclear material for sale, and an invasion from Mars.
>Because we don't see the cost of a general policy, we learn overly specific lessons. After September 11th, the FAA prohibited box-cutters on airplanes—as if the problem had been the failure to take this particular "obvious" precaution. We don't learn the general lesson: the cost of effective caution is very high because you must attend to problems that are not as obvious now as past problems seem in hindsight.
The letter demonstrates foresight, not hindsight; a very long list of equally severe and plausible warnings is needed to make the point that one is unlikely to single out this one letter without the benefit of hindsight.
This is the crux. This letter is clearly very ballsy in a corporate context, and I doubt there is a stack of similar memos in the Morton Thiokol archives which, if acted upon as a group, would have bogged down the launch. But at a minimum, making this argument requires at least one similar example.
Without that alternate case, languorous management is the most likely culprit.
Yes, there is a serious gap in the logic of the linked article. In fact, it appears that if the O-rings had been better checked and improved, as the letter indicated they must be, the Challenger would not have exploded as it did.
What were the other warning signs which seemed as severe as the O-ring problem without the benefit of hindsight?
Your quote claims that it would have been much more expensive to address all of these things, but what's the support for that statement?
My (extremely limited, to be sure) understanding is that there wasn't anything else that severe. The O-ring problem was dire, and the only reason it didn't get attention was because schedule pressure and go-fever caused management to minimize the problem.
Note that the warning signs mentioned in the quote were not just theoretical test results, but also several cases where fire was getting where it was not supposed to be during actual flights. Further, there was strong evidence that these incidents were linked to cold weather. "Don't fly when it's really cold" wouldn't have been a particularly expensive mitigation, either, even if you argue that suspending flights until the problem was investigated and fixed was not reasonable. I mean, you're launching from Florida; a restriction not to launch in temperatures under 50F wouldn't have been a terrible burden.
How many other warning signs were as severe as the O-rings and as easy to mitigate? My money is on "zero", in which case this argument is just so much noise.
> My (extremely limited, to be sure) understanding is that there wasn't anything else that severe.
That's my understanding as well. Boisjoly wrote a detailed essay describing the thought process that they went through to arrive at their conclusion that the O-rings were a critical problem (the other three authors are a professor and two students at RIT):
The essay is a response to a criticism by Edward Tufte that the engineers did not present the issue properly during the telecon the night before the launch, and that this was the reason NASA did not agree to scrub the launch. It makes clear that, at least in the minds of the engineers, there was no other issue even close to being this critical. Granted, they were concerned with only one subsystem, the SRBs--but the fact that this issue also dominated the telecon the night before the launch indicates that there was no other issue even close to being that critical in any other subsystem either.
[Edit: It's interesting, btw, that in the essay linked to above, Boisjoly accuses Tufte of precisely the thing Yudkowsky is talking about in his article: hindsight bias. He argues that Tufte did not recognize or allow for the fact that the engineers at the time had limited information, but talks as though they knew everything that was known in hindsight.]
This is a good read.
I do think the Tufte essay is excellent except for one detail. I don't think it's fair that he places blame on the engineers for failing to convince NASA management to stop the launch. It is possible that better-designed slides would have helped NASA management grasp the problem. But I think the fault lies more with them.
The premise of "failing to convince management" relies on the notion that "management can do no wrong". For example, if you tell your boss that we're about to drive the bus off a cliff, and he doesn't listen, it's your fault.
This is the way things work in many organizations, it's an old-school approach to management, but to me it is purely dysfunctional. When someone tells me I'm about to drive off a cliff, I say, "Wow, thanks! What did I fail to see that got us into this mess in the first place?"
Aside from Tufte's notion of blame, I think his essay is very instructive.
> It is possible that better designed slides would have helped NASA management grasp the problem. But I think the fault is more so on them.
I completely agree, and AFAIK the Tufte essay does not even mention a critical point in this regard, which the Boisjoly paper does. The engineers had already tried to get all flights stopped until the O ring issue could be investigated and properly understood, in the August before the Challenger flight. NASA had refused. So the NASA managers were already aware that there was a critical flight risk, and they had already chosen to ignore it. That means it definitely wasn't a case of bringing new information to management's attention in order to drive a decision. It was a case of trying to get them to change, at least in part, a decision they had already made.
The argument about cold temperatures has to be viewed in that light--it was an attempt to find something, in the absence of good, hard data and a solid understanding of what was going on, that would at least get NASA to delay some flights, since they had already refused to delay all flights. In fact, considered solely on engineering grounds, the argument about cold temperatures was fairly weak (as Boisjoly points out in his essay). But it was weak not because cold temperature flights were almost as safe as warm temperature flights; it was weak because warm temperature flights were almost as unsafe as cold temperature flights! (A previous flight with significant blow-by had been made at a temperature of 75 F, and test stand data showed that the O-rings were not sealing completely at any temperature below 100 F.) But the engineers couldn't say that the night before the Challenger flew, because they had already said it back in August and had been ignored.
> I do think the Tufte essay is excellent except for one detail.
I don't think that one detail is the only serious flaw in the essay. As far as I can tell, Boisjoly's criticisms--that Tufte misunderstood the actual issue (it was blow-by, not erosion), and that he misunderstood what information the engineers did and did not have (for example, they didn't have reliable temperature data for many flights)--are valid.
> "Don't fly when it's really cold" wouldn't have been a particularly expensive mitigation, either
The SRBs were not designed to work below 40F, so it would have been very reasonable to postpone launch (yet again). However, Reagan was going to give his State of the Union speech on that same evening, and putting a teacher in space was supposed to be a highlight of the speech...
I think your comment is great, but I have to disagree with one part in particular. I don't agree that it was "some minor warning" when President Clinton told President Bush explicitly: "In his campaign, Bush had said he thought the biggest security issue was Iraq and a national missile defense. I told him that in my opinion, the biggest security problem was Osama bin Laden."
"Osama Bin Laden is dangerous" is pretty far from actionable intelligence, though. I think it's an example of hindsight bias to think that if President Bush had been more prudent in dealing with Bin Laden then the 9/11 attacks would have been prevented.
When the current Commander in Chief of the United States of America, the most powerful military in the world, says "the biggest security problem [is] Osama bin Laden," that's something entirely different.
And frankly, I trust the opinion of Richard Clarke more than I trust yours OR MINE.
To be fair, we spent most of the effort fighting in the wrong country (Iraq) chasing the wrong person (Saddam Hussein) who had nothing to do with orchestrating the attacks, nor had any viable "weapons of mass destruction" that were the purported reason for going after him in the first place.
That was 2 years later and had nothing to do with the search for bin Laden. Who they put out a huge search effort for, and continued doing so for the next 10 years.
Personally, I'd be concerned about capturing someone who was responsible for ordering the terrorist attacks that killed thousands of people, but I guess everyone's different.
It's not black and white. If President Bush had been more prudent in dealing with Bin Laden, then the 9/11 attacks would have been less likely to succeed.
To believe otherwise is to believe, essentially, in predetermination. Hindsight bias does not imply that things would have turned out the same today, no matter what people did in the past.
Right, what I should have written was "I think it's an example of hindsight bias to think that if President Bush had been more prudent in dealing with Bin Laden then the 9/11 attacks would certainly have been prevented."
It's very readable and goes into a bunch of other issues at NASA at the time beyond the O-ring failure, which Feynman treats as a symptom of more general engineering and cultural problems.
Feynman is well-known for his TV appearance where he dramatically demonstrated what happened with the O-rings, live...but in his memoir, "What Do You Care What Other People Think?", he credits being inspired by General Kutyna, who had asked him to join the commission. Feynman describes that a random comment from Kutyna led him down the path of revelation:
> Then he says, “I was working on my carburetor this morning, and I was thinking: the shuttle took off when the temperature was 28 or 29 degrees. The coldest temperature previous to that was 53 degrees. You’re a professor; what, sir, is the effect of cold on the O-rings?” “Oh!” I said. “It makes them stiff. Yes, of course!” That’s all he had to tell me. It was a clue for which I got a lot of credit later, but it was his observation. A professor of theoretical physics always has to be told what to look for. He just uses his knowledge to explain the observations of the experimenters!
However, Feynman admits at the end of his multi-chapter account of serving on the commission that Kutyna had coyly played Feynman:
> Another thing I understand better now has to do with where the idea came from that cold affects the O-rings. It was General Kutyna who called me up and said, “I was working on my carburetor, and I was thinking: what is the effect of cold on the O-rings?” Well, it turns out that one of NASA’s own astronauts told him there was information, somewhere in the works of NASA, that the O-rings had no resilience whatever at low temperatures—and NASA wasn’t saying anything about it. But General Kutyna had the career of that astronaut to worry about, so the real question the General was thinking about while he was working on his carburetor was, “How can I get this information out without jeopardizing my astronaut friend?” His solution was to get the professor excited about it, and his plan worked perfectly.
If we take Feynman at his word, to me this is a great description of how science and bureaucracy and invention (and, sometimes, disaster) works: it's not always just random eurekas, or study-real-hard-and-you'll-figure-it-out...sometimes the information is already well-known, and obvious, but for various logistical/political/bureaucratic reasons, it isn't immediately disseminated or explored.
via: Feynman, Richard P. (2011-02-14). "What Do You Care What Other People Think?": Further Adventures of a Curious Character (Kindle Locations 2923-2930). W. W. Norton & Company. Kindle Edition.
"Kutyna: On STS-51C, which flew a year before, it was 53 degrees [at launch, then the coldest temperature recorded during a shuttle launch] and they completely burned through the first O-ring and charred the second one. One day [early in the investigation] Sally Ride and I were walking together. She was on my right side and was looking straight ahead. She opened up her notebook and with her left hand, still looking straight ahead, gave me a piece of paper. Didn't say a single word. I look at the piece of paper. It's a NASA document. It's got two columns on it. The first column is temperature, the second column is resiliency of O-rings as a function of temperature. It shows that they get stiff when it gets cold. Sally and I were really good buddies. She figured she could trust me to give me that piece of paper and not implicate her or the people at NASA who gave it to her, because they could all get fired."
Engineers at Rocketdyne, the manufacturer, estimate the total probability [of catastrophic failure] as 1/10,000. Engineers at Marshall estimate it as 1/300, while NASA management, to whom these engineers report, claims it is 1/100,000. An independent engineer consulting for NASA thought 1 or 2 per 100 a reasonable estimate.
— Personal observations on the reliability of the Shuttle (Rogers Commission Appendix F) by R.P. Feynman
Actual statistics from the life of the shuttle, last flown as STS-135 8 - 21 July, 2011:
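(For comparison, my own back-of-the-envelope figure, assuming the usually cited totals of 135 flights and two vehicle losses, Challenger and Columbia:

    2 / 135 ≈ 1 / 68 ≈ 1.5% per flight

which lands right around the independent consultant's "1 or 2 per 100" and nowhere near management's 1/100,000.)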
Presumably, those estimates are just for the Space Shuttle Main Engines (SSME/RS-25), given that's what Rocketdyne was manufacturing and propulsion is one of Marshall Space Flight Center's fortes.
If you still haven't read Richard Feynman's appendix to the Rogers Commission report on the Challenger disaster, here's the closing sentence: For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.
It seems almost unbelievable how such a stern, concrete warning can be ignored. Of course, we look at this memo with perfect 20/20 hindsight, knowing that the O-ring issue caused a disaster.
I am genuinely curious why the warning was ignored. I am hesitant to believe it was through malice or sheer incompetence. Does upper management get two overblown warnings per week from engineering, so that this critical warning was "drowned in the noise"? Or were there good (at the time!) reasons to focus on other issues first?
I could see managers at NASA seeing their role like that of a union or labor leader, treating keeping their budget and their people employed as at least as much of a priority as safety. So they're willing to take more risks, because they're in a position of having to balance an equation.
And it may be that the flaw is this uni-dimensional hierarchy that we seem to like, which seems efficient most of the time, but in reality not only misses edge cases but can cause them. The managers simply did not always 100% trust the engineers. Ultimately, the people with the power over go/no-go were more managerial than engineering, and were more concerned with the politics, cost, and perception of delay than they were with risk assessment.
All spending of money in any company or government is basically the same: someone at the top delegates the vast majority to someone else. And they choose that someone else based on trust. And the longer that chain is, the greater the likelihood someone is put in a position where the chain of trust is just flawed. And you hope it gets exposed where people don't die, obviously. But that didn't happen here.
And that's why a bunch of people got angry at Feynman as well as Boisjoly. They exposed that the wrong people were making key decisions. Had they been the right people, then loss of life wouldn't have happened. Anyone that wrong is going to be angry, it's a personality flip of the coin whether they get angry with themselves for their mistake or with others for exposing it.
There was pressure to ignore warnings because President Reagan was scheduled to be present for the launch.
In higher levels of management, things get more political. Looking good and getting exposure count for a lot. No one wants to be the guy who made NASA look bad by slipping the launch date.
In tech, most managers tend to be rated by how well their projects meet schedules and how much operating expense they can minimize, since that's the only thing their managers have any sense of control over. Your boss doesn't want to take the heat for delaying project "glitter unicorn" by a week so you can fix 2 major security holes in the service, one of which allows customers to log in without a password. And this is why your company won't spend $500 for a data backup system, or why it takes 4 months to get your broken keyboard replaced.
It's turtles all the way up.
I can totally see this - I mean, in my job management generates major stress for everyone to get a trivial product update shipped on time. Imagine the pressure to get a space launch to 'launch' on time. Makes our system seem like General Mao's. But better of course.
I was wrong--he wasn't going to attend, but there (allegedly) were plans to have a televised conversation between him and Christa McAuliffe during his State of the Union address that night.
However, this may not be true. While trying to find an e-source, I read that there were some politically motivated rumors that the White House ordered the shuttle to launch for this reason. Feynman investigated that rumor and didn't find any evidence that the White House ordered the launch.
It's possible the detail about an on-air TV conversation between the president and a schoolteacher astronaut could have been fabricated as well to make Reagan look bad, or it could have been based on elements of truth. For example, she was expected to broadcast 2 lessons to students from space, so the capability definitely existed. It doesn't seem too far out that they would have liked for this conversation to happen during the State of the Union, though that doesn't mean the White House ordered the launch.
I read this in Edward Tufte's essay Visual and Statistical Thinking (http://www.edwardtufte.com/tufte/books_textb), which discusses how better information displays might have convinced NASA management to postpone the launch.
It also appears as a chapter in his book Visual Explanations.
I didn't have any luck finding sources on the internet (lots of keyword overlap). Tufte is much better informed than pretty much all of us, though it's possible he printed what amounts to an unsubstantiated rumor. FWIW he also served on the Columbia accident investigation board.
I think there were other pressures to launch as well. The launch date had been postponed 3 times already for various reasons. Cynical take: I wouldn't be too surprised if some manager's performance review was based on how many launches took place on the scheduled date.
That's a good point. It is easy to just say that the others were incompetent without knowing the rate of false positives, i.e. how frequently such memos were written warning about catastrophes. The issue of the rate of false positives vs. true positives (i.e., real problems) is something that most engineers and managers don't have a good grasp of.
This is true for this early warning from the manufacturing company, but the launch-day reasoning was chilling. I'll be back with links to the worse parts.
> we look at this memo with perfect 20/20 hindsight
On a tangential note, I've always wondered about this sort of idiom. 20/20 vision is the equivalent of a 100 IQ -- it is equivalent to that of a "typical human". Perfect vision would be 20/ε (you can see at 20 feet what an ordinary human could only see from an infinitesimal distance), and mundane glasses will often correct vision past 20/20.
It seems like such a low bar for hindsight to meet.
Boisjoly later revealed this memo to the presidential commission investigating the disaster and was then forced to leave Morton Thiokol after being shunned by disgruntled colleagues.
I find this phenomenon so difficult to understand.
Seems fairly standard. He provided feedback that could have prevented a disaster and the managers ignored it. So at the very least, the managers would dislike him for the fact that he illuminated their incompetence, but even more so because he would be a reminder of the massive tragedy that ensued and for which they were partially to blame. Given management's resentment, his peers had a choice: maintain a friendly relationship with him and risk their careers and management's wrath, or distance themselves from him.
It’s sad, but the natural tendency of organizations is to close ranks in situations like this. It is incredibly tough to perform an open, honest, self-assessment. The procedures need to be in place to force the self-examination and strong leaders need to push for it. Even then, politics sometimes place the blame on the wrong people. I’ve seen this happen time and time again in the military from working with SF in Iraq [1] to joining a squadron just after a disaster [2].
By ending up being the only one who was correct about a catastrophe warning, and being vocal about it in the aftermath, you somehow implicitly throw everyone else under the bus.
Since "everyone else" > "you", majority rules, you go.
It's a bureaucracy problem. I've seen it, and it's also gotten me at least once, when I decided to stick up for principles. My ego left intact, my job did not.
This is also why getting fired should not automatically carry a negative stigma.
I have a counterpoint- I bet that, given any risky endeavour, there are ALWAYS some naysayers/doubters. So the probability that someone at an organization ends up being correct when a disaster occurs, is probably fairly high. Thus, just because you won the disaster prediction lottery, may not entitle you to as much acclaim as you might think.
> I have a counterpoint- I bet that, given any risky endeavour, there are ALWAYS some naysayers/doubters. So the probability that someone at an organization ends up being correct when a disaster occurs, is probably fairly high. Thus, just because you won the disaster prediction lottery, may not entitle you to as much acclaim as you might think.
Conversely, achieving success despite taking a great number of risks in the process, may not entitle you to the acclaim that you do get.
You just found out your company screwed up big, and it turned into people dying and a bunch of geopolitical egg on your country's face, but if everybody keeps quiet, then maybe you'll get lucky and nobody will figure out it was your fault.
But one guy tells the world exactly how you screwed up. Now you're in fear of your reputation and job -- but if he'd just kept his mouth shut, then you might be okay.
You need to read a book called Systemantics by John Gall. It's a hilarious and accurate look at how large systems fail. Although parts are somewhat dated, the principles are the same.
In particular his material on bureaucracies is relevant: when threatened, bureaucracies immediately fight for self-preservation. Boisjoly was threatening the bureaucracy, and so had to be eliminated for it to survive. Its actions make perfect sense in this light. Ugly, yes, but understandable.
I remember the Challenger explosion like it was yesterday. I'll link here to a scan of the pages from Edward Tufte's book Visual Explanations about what went wrong with the data analysis during the launch planning, which understated the risk of a launch in cold weather.
Can you summarize one or two of Boisjoly's main points?
Tufte's pictures of the (superfluous) rocket graphics showing the temperature vs. his redrawing, placing temperature on the x-axis are very convincing.
1) Tufte had temperature datapoints for the previous launches that Boisjoly's team didn't know about [1]. I assume "know" to mean the complete historical data wasn't all consolidated conveniently at their fingertips before the disaster. Presumably, Boisjoly's team could have gotten it if it had occurred to them to gather it. Essentially, Tufte had the benefit of "hindsight" to make his compelling diagram showing cause and effect.
2) GIGO: garbage in, garbage out. The Tufte temperature datapoints assume that the outside ambient temperature is equal to the O-ring temperature [2], so substituting one for the other is wrong. (E.g., it's wrong to put them both on the same X-axis.)
[1]"He thus supposes that they knew the temperatures at launch of all the shuttles and, assuming they acted voluntarily, infers they were incompetent."
[2]", in addition, mixes O-ring temperatures and ambient air temperature as though the two were the same."
I thought the rebuttal was very well written, if a bit dense.
For (1), you're correct that the engineers did not have the historical data. More than that, though, it did occur to them to request the data, but they were stifled by Morton Thiokol Management (and NASA).
To start with, there were a variety of previous problems with the O-rings caused by variables that appear unrelated to temperature. These problems were resolved, but prevented anyone from seeing a pattern. It was only on the basis of the single data point in SRM 15 that Boisjoly requested temperature data in advance of the launch.
Obtaining such data was far from simple, because, as you mention, the temperature of the O-ring isn't the same as the ambient air temperature. Thus, obtaining the data was relatively involved and required knowing many variables: time on the pad, the gradient of ambient temperature, the temperature at which testing was conducted, and so forth. For this reason, the engineers didn't compile the data themselves (unclear what process they'd need to get the data).
The engineers thus requested the data in advance, but had not received it. They had precise data on only two data points (at 53 degrees and 75 degrees), so the rest of the data in the chart was compiled after the fact.
"The data necessary for a calculation of O-ring temperatures was thus not collected all along during the shuttle history. And when Boisjoly asked for that data in September, along with much other data, any one of which might have been the crucial missing piece to explain the anomalous cause, it was not supplied. In fact, the engineers received none of the data they requested."
* Both axes of the chart are inaccurate. The temperature on the X axis intermingles ambient and O-ring temperature, but they're not the same (imagine ocean temperature vs air temperature). The vertical axis, O-ring damage, is semi-relevant, but the important question is whether the O-rings held a seal (degree of blow-by). O-ring damage probably correlates, but seems like a proxy that may be misleading.
* The data available in the chart was not available to the engineers at the time. There was only one previous failure (SRM 15) that led anyone to believe temperature itself was an issue. They had two valid temperature data points (SRM 15 and SRM 22), and correctly pointed out that another launch at 29 degrees was completely outside their tested range and would be inherently risky. It's not clear to me that a scatterplot with two datapoints would be that effective.
Furthermore Boisjoly criticizes that Tufte presumed to judge the engineers, for the following reasons:
* The engineers previously recommended no launch should occur, months in advance (due to previous O-ring issues), but NASA overruled them.
* The engineers recommended, and their managers accepted, that the low launch temperature was outside their test database. NASA came back and requested proof that the temperature was dangerous, but the engineers could not comply, because the parameters were extreme enough that they had not tested them. They couldn't prove a negative. The managers at Morton Thiokol and NASA then jointly overruled the engineers.
* The idea of putting together a chart was not something the engineers considered, because there were a variety of previous unrelated problems in O-rings (each resolved afterwards) that muddied that data, and muddies the data that Tufte presents.
* Tufte himself failed to research or note the pivotal information, and thus misrepresented the situation the engineers were in, and the data available, while imputing that the engineers should be held morally responsible for not presenting data they didn't have (and then going on to present such data himself, compiled after the fact and at his own leisure, incorrectly).
That reads like a knee-jerk reaction to the (implied) accusation that the engineers were at fault for not presenting their case correctly. It doesn't dismiss Tufte's arguments for clear visual presentation of data; you could say that both of them are right.
I saw the Challenger explosion live. I was a sophomore in high school in Florida, standing outside at lunch. Shuttle launches were a fairly regular thing and I'd seen many. But this time I could see the trails as parts fell away to earth. It wasn't like the fuel tank that usually fell away. I was talking with a guy named Wes. He said "that doesn't look right". It didn't. After lunch there was an announcement that the shuttle had exploded.
It doesn't seem that long ago really. The weird thing is.. in that same amount of time in the future I'm going to be a 75 year old man:)
I walked in to the TV showing the launch. I hadn't known it had exploded. I didn't realize I was seeing a replay.
I spent my time as a kid living on Vandenberg Air Force base, and my father would drag me out to Minute Men missile launches early in the mornings. I saw several launches. Even two launched at once.
I had seen many launches before, so I had no expectation that they could go wrong.
Watching the shuttle explode like that was shocking.
> I had seen many launches before, so I had no expectation that they could go wrong.
This mindset is interesting. Issues like plane and train crashes grab headlines and are typically catastrophic, but end up being far more rare than, say, a fatal car crash. Similarly, the Space Shuttle had two major incidents in 30 years, and the Concorde had only one, but they were very public and very catastrophic.
Is it the rarity or severity that make them such big news?
Rarity is certainly a big factor even from an objective viewpoint. Concorde instantaneously went from the safest airliner flying to the most dangerous with that crash, just because it didn't fly much. Flying on a Space Shuttle was one of the most dangerous activities out there, judging by the historical failure rate.
It helps just in terms of grabbing your attention, too. "Man drives into lake" doesn't make the news. "Man drives Bugatti Veyron into lake" does, just because the unusual car makes it more interesting.
In the case of Concorde and the Shuttle, the machines in question were not only rare, but beloved by many, and seen as a national symbol. It's almost like assassinating the country's leader.
To be honest, I think you are - to an extent - falling into the same trap that the managers fell into at Thiokol: namely, assuming that your view of the situation is complete and that input from other work groups can be disregarded.
In reality, it's a very big problem for companies to disappoint their large customers (as NASA certainly was for Morton-Thiokol at the time), and I suspect that the engineers would have been quite annoyed at management if the ultimate result of a launch no-go was fewer contracts, lower pay and/or job losses.
This isn't to say that management did the right thing, of course, just that "lol management just throws some buzzwords around and never attempts to understand the problem" is basically the same attitude that caused this disaster - just seen from an engineering viewpoint.
I am "management" where I work, and I'd drive my company into the ground and make everyone unemployed, before I allowed my defective product to kill someone.
Duh. You think a single person at NASA would have reported otherwise had you asked them before the Challenger disaster?
Social pressure can influence people into engaging in wishful thinking. They made a horrible judgment call, and we should remember that that tragedy was authored by them. But if you think you NEVER would have fallen for it... you're not being self aware. There's a pretty decent chance you would have.
As someone born in 1995, I don't have any real emotional connection to the Challenger explosion. It's pretty bizarre to think that children born after the 9/11 attacks will probably feel the same indifference.
If you are an engineer, it would be good to try to learn from this and similar accidents. I.e., noting your lack of emotional connection is beside the point, the question is, what can you take away from what happened?
Of course, it's extremely important. I'm currently listening to the Freakonomics podcast on the Challenger explosion[0]. I was only noting the difference being born a few years apart can make when it comes to significant shared cultural memories.
You will notice this sort of thing more and more as time goes on.
For me the "ah hah" moment was when it hit me that "Where were you when you heard about Challenger?" was my generation's version of, "Where were you when JFK was assassinated?"
Another easily observed one is music. Very few people are aware of much that happened in popular music after they hit 25.
You can listen through 65 years of pop music made so far. Then, if you live 80 years, you only lose about 45.8% of all pop music you could have heard during your lifetime. That's not half bad.
If you include some blues, jazz and classical, that percentage goes down significantly.
Regards, 29 year old with 4 year long amnesia about music.
As someone born well before the 9/11 attacks -- were there any positive consequences to the emotional attachment they provoked in the public? Indifference would obviously have been the correct response. What actually happened was not dissimilar to stabbing yourself in the face because you think a mosquito might have landed on it.
I think every generation has their big events and I think they overlap. For baby boomers, it's the JFK assassination (maybe other things). For Gen X'ers like me, it's the Challenger (and 9/11 I'd argue). For Millennials, it's 9/11.
This letter is an excellent illustration of why good writing matters, and why bad writing can be disastrous.
The most important sentence in the letter ("The result would be a catastrophe of the highest order - loss of human life.") is at the end of the third paragraph, and is effectively hidden by two paragraphs of dense, technical jargon that I, as a layman, cannot understand at all.
I honestly think that if he had just taken that sentence and moved it to the end of the first paragraph, 7 astronauts' lives could have been saved.
I strongly doubt it and I don't think you've worked in an engineering environment if you think this is true.
This document is written as a credible engineering analysis of a distinct problem and uses extremely strong terms. It isn't addressed to the general public, or an article on Buzzfeed, it is an interdepartmental memo from an engineer signed by his manager to the Vice President of Engineering.
Your example of "clear writing" would have just made this engineer look hyperbolic and likely undermined the issue even more. In a document like this, stating the physical problem up front is clear writing; once the engineering problem is stated, he immediately states the possible outcome, then describes what he sees as the management failure to allocate the appropriate resources, and how to fix it.
How it comes off to you as a layman doesn't dictate whether or not this is good writing. He wasn't writing it to you.
I respectfully disagree. Safety is of the highest priority, so if there's a serious risk to human life, there's nothing hyperbolic about drawing attention to it. If you can't do that, there's something very wrong with your engineering culture.
To help the busy reader, the first paragraph of a memo should summarize the whole document (like the abstract of a technical paper). This document is about an engineering problem and its serious consequences, so they should both be mentioned in the first paragraph.
I think it's extremely misguided to assume that the problem here was a matter of writing style. I think it's naive to assume that the people who received this memo weren't aware that it was bringing attention to an engineering issue that could lead to loss of human life. I think it's misguided and not supported by the evidence to assume that was the gap in understanding that led to the problem. They're working on rockets. People working on rockets know what the stakes are. These are the risks engineering projects like this are structured around dealing with.
And I think it's pretty disrespectful to the engineer who is still haunted by this to say that what he really should have done was switch some sentences around and that would have totally solved the problem.
This is a strongly worded document.
Making safety a priority doesn't mean starting every engineering document with the words "loss of life" which is a really common outcome of engineering failures on programs like this. Making safety a priority means putting the risk of an engineering failure up front and knowing that an engineering failure in a life-critical system is critical. Making safety a priority means people don't have to tell you what the stakes are every single sentence, because everyone already knows and so what you really communicate is how much risk there is, not the fact that risk exists. Making safety a priority means even if you're working on an engineering problem that wouldn't lead to loss of life, you fix the thing because you might be wrong and it might be part of a correlated failure one day that does lead to loss of life. Making safety a priority doesn't involve writing engineering documents in a way that makes them more amenable to skimming.
It's a rocket. Engineering doesn't have to go that far wrong on rockets for people to die. When someone sends a letter to a VP of engineering which begins with "This letter is written to insure that management is fully aware of the seriousness" then everyone who receives that memo is paying attention and if they aren't then it's not the writing skills of the people involved that are at fault.
I don't think a single person who was aware of the O-ring issue was unaware of the stakes. That didn't show up in any reports on the panel. What did show up was they estimated the risk of the problem wrong. The first sentence of this engineer's letter went towards establishing the seriousness of the engineering failure. Because that was the part that needed to be communicated most clearly.
Engineering degrees universally require a technical writing course. Engineers universally revile it as busy work. This memo is Exhibit A for why written communication is critical for engineering, and I imagine an assignment to rewrite this memo to be more effective would be a good way to underscore the importance of clear and concise writing.
I upvoted this comment because of the point about the importance of writing. However, I don't think conciseness and clarity were the problem here. In 1985, if an engineer personally composed a long letter, you should automatically know it's important. If you read this letter at all, the danger is made clear.
First fix the grammar mistake ("insure"). Then put the Bottom Line Up Front. Bold it if possible.
Cut out the fluff like 'this letter is written' and 'a jump ball as to the success or failure.'
Rewrite the second paragraph with some verbs and a proper subject.
Not that any of this really matters in the end. The final decision was fully in the hands of management and NASA during a teleconference, during which the argument was made.
What stands out to me is the impact of loyalty on corruption. Despite obviously doing the right thing: Boisjoly later revealed this memo to the presidential commission investigating the disaster and was then forced to leave Morton Thiokol after being shunned by disgruntled colleagues.
Elsewhere in the discussion[1], we learn that key information about the O-rings had to be carefully, anonymously disclosed to protect the jobs of several people, including a prominent astronaut.
How do we avoid corruption when loyalty uber alles is the rule of almost all organizations?
For a small enough organization, "loyalty uber alles" prevents corruption. What is small enough? For a military company it's around the Dunbar number. A squad that actually works together on a single issue is often optimal at around 8 individuals. Company, squad, and brigade are the most important formations, as they are expected to handle a disproportionately high number of things independently.
For this NASA O-ring bullshit, the simplest solution seems to be having the technical managers and the astronauts in the same in-group. Now your loyalty is about not getting your mates killed.
If you need a big organization, the trick is to get that internal network of squads and companies to work in a somehow non-corrupt and nice manner. This is where it gets hairy. Basically you can go with the assumption of corruption and apply "transparency" or monetary incentives (subcontractors). Or you can assume that corruption does not happen. Sometimes the mere assumption that corruption does not happen stifles it. In practice, for big companies, failing less than your competitors is adequate for success.
It's unlikely that one would be corrupt against one's squad members. So as a squad member, you only have to worry about corruption from outside. People who affect you and are not members of the same in-group are the problem, from any individual's point of view.
>How would that approach stifle it?
Most people are good by nature. The only time I've stolen from work was in a situation where it was expected that I might steal from work. So I kind of showed them that I'm more cunning than they are careful. A challenge. Another point comes from self-image: if everybody sees you as a corrupt asshole, you see yourself as a corrupt asshole. So you might as well act on it.
In general, people have a strong tendency to act as is expected of them. Stronger than the tendency to act as they are told.
This was a terrible thing. So many saw this coming and nobody listened to them.
In my mind, what made this tragedy even worse was the way the program itself was conducted. You learn new modes of transportation and hardware by using them, many times to the point of exhaustion. We should have built a dozen orbiters and flown the shit out of them through hell or high water, learning as we went along. Instead we built 5, and every time something happened we backed farther away from the entire manned space program.
A lot of time was wasted, and the lessons we didn't learn? Somebody else is still going to have to learn them someday.
I think it's emblematic of the whole Shuttle program to look at the flight test program. There basically wasn't one. In particular, look at the various abort scenarios, and then look at the abort testing. Don't look too hard for the abort testing, because they didn't do any. There was pretty much zero resiliency in the system, and they knew it.
Here's an interesting thing about the Challenger (and Columbia) disaster: We find it to be particularly devastating, the same as a major natural disaster or terrorist attack. But on paper, they don't even compare. The explosion of the Challenger killed 7 people and cost NASA around $40 billion. The destruction of the world trade center killed over 2600 and the cost to the private insurers alone was over $40 billion. Hurricane Katrina killed over 1800 and cost at least $100 billion. Not even the same ballpark. So why the big emotional impact? Because they stood for something important.
Today, there's no way I could get away with writing an entire page of context before getting to the impact. I wonder how much context the recipient had and what reading expectations were like back then...
Leaving aside the fact that the subject includes 'Potential Failure Criticality', the key impact line "The result would be a catastrophe of the highest order - loss of human life" occurs less than 150 words into this. It's the 5th sentence (though they're admittedly long sentences).
Are you having serious engineering-related discussions with your management where they can't read 150 words before getting to the hook? (serious question, not snark)
Is everyone communicating via twitter or something? (sorry, that was a bit of snark)
If I was writing that to most bosses I've had, the first line would include the words 'death', 'critical', 'failure', 'negligent', etc. I would assume it would be skimmed otherwise. Even still, I would guess that a lot of the stuff in the middle would be skimmed and I'd have to reiterate my point in the last sentence.
I disagree. This is a fairly poorly written persuasive letter, and we have ample evidence of its lack of persuasive power. The letter opens with useless content ("This letter is written ..."), which is self-evident. Then it proceeds to the grammatical error of "insure", which almost made me stop reading. If I were writing a letter like this and wanted it to have some effect, this would be the first sentence:
The total loss of a future shuttle mission and the death of its crew is a near certainty with our current booster o-ring design.
Then I would omit everything else in the original letter except the request for staffing.
So, I certainly wasn't trying to claim that the letter was a well-written persuasive one. However, I was more shocked by the implication that engineering management can't be expected to read past the first line of a memo if it doesn't hook them.
In most organizations I've worked in (generally large ones), the rank-and-file engineers/scientists are below-average communicators (a stereotype, I know, but it's been true). They choose the wrong level of technical detail for their audience (or send it to such a large cross-section of people that there is no appropriate level for everyone). They certainly make grammatical mistakes. It's up to engineering management (who are generally better communicators) to pull the salient bits out of the communication and then pass them up the chain in a clearer manner. I certainly can't imagine a manager going "well, Darren's letter that warned me of a likely loss of human life used 'insure' instead of 'ensure', so I'm just going to assume that the rest of it is drivel and go on about my day".
Your way would certainly have been better. On a related note, one of the best writing classes I ever took, in terms of its effect on my current work, was actually a journalism class. The notion of not burying the lede has been key as attention spans wane (apparently more than I'd realized).
While I agree with you about the difference between "insure" and "ensure," that's not a universally accepted grammar rule. I think those who observe it may even be in the minority; The New York Times, for example, uses "insure" for all instances.
It does? I have a 5th Ed. New York Times Manual of Style and Usage right here and it clearly distinguishes between the two. Under the entries for both ensure (p 120) and insure (p 170) it gives examples of the other.
Well, I guess that's what I get for believing my first Google hit for "insure vs. ensure." Nonetheless, I see a lot of highly literate people using "insure" for all cases.
> Then it proceeds to the grammatical error of "insure" which almost made me stop reading
Let's hope the management reading the letter wasn't so fickle. Oh wait, hmmmmmm.
> The total loss of a future shuttle mission and the death of its crew is a near certainty with our current booster o-ring design.
From the point of view of the engineer, that would be a misrepresentation of the facts. An exaggeration.
You're basically outlining the difference between engineers and marketers. I don't mean that as a slight - perhaps the situation called for a bit of "marketing".
- If you believe it is not some kind of technical report, then the jargon doesn't belong there.
- But what would you call it? A memo? If so, isn't it a memo about a technical problem? Or, to put it another way: how do you describe a technical problem without using technical jargon?
Many people don't have sufficient time to do a good job. You need to optimize the layout of data so that the critical message is clearly delivered first.
An ultra-executive summary is what your memo/email subject and first line should be. Then expand out with more and more detail as necessary.
There's a great book that attacks dense writing and how to fix it [1]. It is a quick read.
For example the author takes this sentence:
Pelicans may also be vulnerable to direct oiling, but the lack of mortality data despite numerous spills in areas frequented by the species suggests that it practices avoidance.
... and turns it into this:
Pelicans seem to avoid oil spills by avoiding the oil.
This sentence is shorter, but it also contains less information. I would say the 75% reduction in sentence length comes with a 66% reduction in contained information.
Ha, spoken like a true engineer. However, if the point of writing the sentence is not data storage but communication, then a long, convoluted sentence can be less successful than a shorter, more readable one. Maybe I picked a bad example. Here is another one:
Before:
Perception is the process of extracting information from stimulation emanating from objects, places, and events in the world around us.
After:
Perception extracts information from the outside world.
Yes, the first is more specific and more detailed, but the "outside world" already covers things like objects and places, so spelling them out doesn't add much to the gist. The second one is certainly more readable.
I often see writing (especially from engineers) that is meant to be informative but is overwhelmed with irrelevant information, which makes it difficult to parse. Sometimes you want to pick out just a few relevant facts and leave out anything that isn't strictly needed, so your audience understands the point.
What sort of information is needed, and what is not, depends on your audience and your reason for writing. It's not an easy skill.
Interesting. I'm curious, do you find the technical content to be the 'thickening agent' or the style?
I work in a related field, and am vaguely familiar with both the general makeup of this type of rocket and this accident in general. I also tend to write longer, more complex sentences.
I did not find it to be particularly 'thick', and I'm curious whether it's because I'm more familiar with the technical details and so avoided the glossy-eyed stare from that aspect or from the fact that a few dozen words in a sentence doesn't faze me.
The former would mean this shouldn't bother anyone practiced in the arts of spaceflight (such as the VP of engineering), but the latter would have been a problem.
Both. But the problem is not the thickness per se, it's that the lede is buried beneath the technical specifications. The "thickness" compounds the problem.
"we stand in jeopardy of losing a flight along with all the launch pad facilities" should be in the first sentence of the whole letter, not the final one.
Interoffice memos were often used to document important issues that had been previously discussed. I'd be really surprised if this issue hadn't already been brought to management's attention verbally. In the early 90's (pre email), I recall having to put things in interoffice memos so that there was a paper trail of meeting discussions.
Exactly. The memo was a way to put a stake in the ground that can be impossible at a meeting that someone else is conducting.
At a meeting, if the organizer chooses to move on to another topic after you present an issue, you have no choice but to go along. Here's a Shuttle Flight Readiness Review at Kennedy Space Center:
Currently, the (large) meeting room at NASA JSC where flight decisions are made has many big red handsets with which anyone who has an issue can break in.
I don't think this was the initial memo to call attention to the issue, but the final "I am putting my objections in writing" memo after he realized it wasn't getting fixed.
From reading quite a bit of historical correspondence, yes, people wrote much more fully developed memos and letters back then. It's something that we have lost in the modern age.
I'd bet that this letter was written longhand or dictated and handed to a typist. That may be part of the reason why.
I would think that when someone writes a document like this, the context is really important for the record, and they're not so concerned with making an "impact".
I remember hearing about how bad communication played a role in the warning not being heeded. But reading the memo now for the first time, I don't agree with that at all.
Don't forget this warning was to a VP of Engineering and was referencing previous issues that were being worked on. There seemed like plenty of context there to me.
It is perhaps sad, but I'm heartened by this letter. I had heard that warnings were given in advance, but this is clear (in fact, I find the "catastrophe of the highest order" language to be less impressive than the clear and defined "loss of human life").
Why am I heartened to see that someone foresaw a lethal accident and was ignored? Because it was foreseen, and the consequences understood, with clarity. Getting people to take clear warnings seriously seems a more easily solved problem than getting us to be able to recognize the danger in the first place. A single person in management, or a few people, being dense is fixable. Groupthink where everyone assumes someone else is checking for problems and no one does is harder to fix.
Now, just because it's an easier problem doesn't make it an EASY problem, but still, easier.
I bet there were other such warnings, many of them valid, that just didn't turn out to be catastrophic.
If there are any high level managers here, please tell me, does it seem like your people are constantly alerting you of dire risks? Does it feel like you are inundated with worriers?
I think the way to look at it is from the point of view that people have before the event happens, weighing all the warnings that they are receiving.
No matter what you build, if it's complex enough you're always going to have individuals predicting doom. The challenging part is filtering the signal from the noise and owning the decisions you make.
Isn't something missing? I was under the impression that the accident happened due to the cold. Nowhere in the memo does it say anything about how to avoid the problem, nor does it call for a full stop of launches until it's fixed. It seems to refer to a known issue without explaining specifically what that is or what to do about it. Or did I miss that?
They knew the o-rings were being eroded during flight, but the secondary o-ring was 'squishy' enough to fill in the gap and prevent the erosion from actually destroying the vehicle. While the erosion was unexpected, they figured the 'backup' was doing its job, and they actually ended up increasing the predicted safety margin (i.e., the erosion only went 1/5th of the way through, so we have a 500% safety margin, yay! despite the fact that any erosion at all was unexpected in the first place).
The problem was, the cold made the secondary o-ring stiff enough that it didn't 'squish' as much as it had in previous launches, so the o-ring failed completely.
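To make the flawed arithmetic concrete, here's a minimal sketch of that "safety margin" reasoning. The numbers are illustrative assumptions of mine, not figures from the memo or the accident report:

    # A sketch of the flawed "safety margin" reasoning described above.
    # All numbers are illustrative only, not taken from the memo or the Rogers report.

    O_RING_THICKNESS_IN = 0.280      # hypothetical seal cross-section, inches
    MAX_OBSERVED_EROSION_IN = 0.056  # hypothetical worst-case erosion seen in flight

    # The optimistic argument: erosion only consumed 1/5 of the seal,
    # therefore we "have" a 5x (500%) margin before burn-through.
    claimed_margin = O_RING_THICKNESS_IN / MAX_OBSERVED_EROSION_IN
    print(f"claimed safety factor: {claimed_margin:.1f}x")

    # The engineering objection: the joint was qualified assuming *zero* erosion,
    # so any erosion at all means it is operating outside its qualified envelope.
    # The ratio above measures the wrong thing.
    DESIGN_ALLOWED_EROSION_IN = 0.0
    out_of_spec = MAX_OBSERVED_EROSION_IN > DESIGN_ALLOWED_EROSION_IN
    print(f"operating outside qualified envelope: {out_of_spec}")

The point of the sketch is that the claimed margin is computed against burn-through, not against the original design requirement of no erosion at all.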
Imagine being on the team that discovered and warned about this issue, and then a year later watching the replays of the shuttle exploding on television, knowing that's what had happened and that your worst fears had come true.
My parents-in-law worked with Greg Jarvis, who was on that mission, at Hughes. According to my mom-in-law, they held a touching memorial at Hughes for his wife. Very sad that it was preventable.
I think many of us here have been Boisjoly at some point, though with much lower stakes. This is how people are: they want to do the thing, no matter how much you warn them.
This memo was not sent to the administration, it was sent to the management of Morton Thiokol, who could reasonably be expected to understand basic engineering phrases like "loss of human life."
It's a real PITA to engineer safe systems (see e.g. IEC 61508 and ISO 13849 in industrial automation), but it saves lives. If you're in a position where Sales, Production, etc. are trying to get you to rush the engineering work so they can make a deadline, you've got to find the backbone to say "no," even if it hurts you professionally.
You'd think so, but they probably saw "loss of human life" and went "hysterical. ignore.".
It's actually really hard to warn people. Warn them loudly and strongly and they think you're scaremongering and being unnecessarily negative. Warn them quietly, and nobody listens. Warn them just right, and they'll take it under consideration, but go ahead anyway.
When, of course, the inevitable happens, you'll be blamed as the guy who predicted doom, because "it wouldn't have happened if you hadn't made a self-fulfilling prophecy".
It was not sent to anyone in the political administration. It was sent to the VP of Engineering at the company that manufactured the solid rocket boosters.