Count to ten when a plane goes down

noisy_boy · on July 22, 2014

Firstly, firing the intern doesn't make sense - it was a mistake waiting to happen and he just happened to do it at the wrong time.

Secondly, the punishment meted out should be:

1. Proportional to the degree of carelessness (in this case not that much since he accidentally hit a wrong key adjacent to the right one, didn't mow down anybody while driving drunk)

2. Inversely proportional to the likelihood of the error (in this case the likelihood was very high since the reset key was a. uncovered/single-press b. right next to single reset key).

3. Proportional to intention (this was a completely unintentional error)

If you say, that the punishment should also be dependent on the degree of damage, I would say that the responsibility of managing the risk of such damage wasn't his but of the person responsible for implementing such a high risk design. If such a person is not around, find the person who approved such a design. Government departments are usually very good with paper trail.

jaredandrews · on July 22, 2014

The author has actually responded to this on his blog: http://johncbeck.tumblr.com/post/92502108047/so-what-did-you....

    So what did you do after you got fired from the embassy?

    What I didn’t say is that it was the last day of my summer internship. The next summer they invited me back again. Everyone understood it was a mistake, but by officially firing me, someone had been punished … :)

So I guess it wasn't a big deal after all.

nathanb · on July 22, 2014

I think you're wrong to ignore the consequences of the actions as an input to the punishment. A small amount of unintentional carelessness that causes huge damage could still be punished. One could argue that a certain degree of mindfulness in critical situations is a job requirement, and casually making a careless -- though unintended -- mistake demonstrates a lack of mindfulness indicating that the person is not properly qualified for the job.

I understand that we don't want a culture that fires people for making the sort of mistake anyone might make. But to be so careless on a day that is clearly an exception where something more important than standard business procedure is going on, can't you at least see why firing the intern for such a lack of mindfulness might at least make sense, even if you disagree with it?

I interned for a government organization that maintains hydroelectric dams and the software that controls them throughout the Southeastern US. A careless mistake could -- in the worst case -- cause blackouts, cost the company millions of dollars, or even cost lives (if the data-control feedback loop caused a turbine to spin up at the wrong time or to fail to shut off in an emergency). And, as is quite common in organizations with non-software-engineers running the show, the development processes were entirely haphazard. The environment was such that it would be really easy for me to push unreviewed code, or to make a stupid deployment mistake, or to be careless in a number of ways that the system didn't protect me against.

But it was OK, because they hired smart, competent people who understand the need to triple-check, if necessary, before committing. People who understood the gravity of the situation, and who didn't phone it in if they weren't feeling it that day. If I demonstrated that I wasn't one of those people, I would fully expect to be fired.

woodchuck64 · on July 22, 2014

> I think you're wrong to ignore the consequences of the actions as an input to the punishment.

This equivocates on "consequences" of actions, though. It's obvious that the consequences of hitting F7 before the incident were understood by all responsible to be low enough that any intern could be expected to make the right decision. After the incident, the consequences of hitting F7 were sharply increased such that no future intern would ever be allowed to make that decision. But then you can't make an argument that assumes "consequences" were the same at both points in time.

We make this fallacy all the time probably because we're designed by evolution to reassess the morality of an action based on consequences. It works as a social heuristic for shaming or rewarding people but it makes no rational sense that the morality of an action should retroactively change based on future consequences. You can see similar behavior in our rewarding athletes for profound genetic advantages, or punishing criminals for profound genetic deficits. The consequences somehow redeem or condemn, and they should do neither.

bonaldi · on July 22, 2014

No, the consequences were the same before and after the incident: a total system reboot. The varying factor here was temporal: it was usually a low-risk action when the office was empty, a high-risk one when the office was full.

The negligence on the intern's part was to make decisions and act without regard for risk as if he was in the low-risk window despite the evidence he was actually in the high-risk one (all the already-active PCs).

It makes perfect sense that the punishment should reflect inappropriate regard being given for known consequences. That's what negligence is.

woodchuck64 · on July 22, 2014

> No, the consequences were the same before and after

I'm talking about the perceived consequences, not the actual consequences. The fallacy here is to perceive low consequences at one point in time, perceive high consequences at a later time and then try to change history such that low consequences were never really perceived.

> The negligence on the intern's part was to make decisions and act without regard for risk as if he was in the low-risk window despite the evidence he was actually in the high-risk one (all the already-active PCs).

He was in a perceived low-risk window. The perceived consequence of accidental reboot was already figured in and was already perceived to be low. Else why would the F7 key be next to F6? It is certainly unfair to expect someone to perceive high-risk when everyone else perceives low-risk.

> It makes perfect sense that the punishment should reflect inappropriate regard being given for known consequences. That's what negligence is.

The perceived consequences were low-risk, therefore the known consequences were low-risk.

bonaldi · on July 22, 2014

> He was in a perceived low-risk window.

... because he was negligent. "Oh, all the computers are already on? That only happens when Washington's waiting on something. Oh well, I'll carry on like this was any other low-risk morning"

> Else why would the F7 key be next to F6?

Same reason why "rm -rf " is the one keystroke away from disaster. Perceived risk has nothing to do with it.

woodchuck64 · on July 23, 2014

> "Oh, all the computers are already on? That only happens when Washington's waiting on something. Oh well, I'll carry on like this was any other low-risk morning"

Because those situations were also low-risk mornings. He only saw that pattern when people left late. He had no reason to expect that people would be working early in the morning because that situation had never occurred. Further, a secretary playing a computer game in the morning suggests business as usual, no one working.

> He was in a perceived low-risk window. ... because he was negligent

No, someone else set up the computers and software with F6 and F7 command functions side by side and then evaluated the entire network as low-risk for interns under all situations. It is perfectly reasonable for an intern to take the same low-risk perspective as his superiors.

> Same reason why "rm -rf " is the one keystroke away from disaster. Perceived risk has nothing to do with it.

Perceived risk has everything to do with it. It is inconceivable today that an intern would have unrestricted access to a company's file system and be literally a few keystrokes from disaster. The key reason for that is because perceived risk now is much closer to actual risk. In 1983, no one had a clue about the kinds of things that could go wrong. Understanding real risk is a painstaking process requiring time, trial and error.

hackuser · on July 22, 2014

> it was OK, because they hired smart, competent people who understand the need to triple-check, if necessary, before committing. People who understood the gravity of the situation, and who didn't phone it in if they weren't feeling it that day.

In my experience that is not nearly sufficient for implementing any process that can't tolerate errors. It is necessary to have conscientious people of course, but they still are humans. Given the opportunity for 2,000 hours a year, year after year, they will screw up.

Humans are very bad at following procedures. For recent examples, consider the people operating our nuclear missiles and those protecting our bomb-grade nuclear materials. If even they don't have enough motivation to follow procedure ...

samspot · on July 22, 2014

I got a strong impression from the article that he had no idea there were extra people there, or that there was even a critical situation. He wrote as though he was doing a mundane daily task, and said he was really surprised his boss was even there. Given those details, he had no reason to have a heightened sense of awareness. He also mentions that a reset of all workstations should have had no impact at the time.

In this situation, the secretary playing the game is just as culpable as the intern, which is to say, not really responsible.

nathanb · on July 22, 2014

Well, he mentioned that he noticed an exceptional circumstance right when he arrived -- notably, that computers which were usually his responsibility to turn on were already on. You could argue that a less negligent (and more aware) individual would have extrapolated that into a heightened sense of awareness.

67726e · on July 22, 2014

There is a wholly different level of responsibility and ability to ensure quality in your scenario that is simply not present in the article.

In one case accidentally pressing the wrong key deleted incredibly important data, while in your case you have plenty of time to review and ensure quality at your leisure.

acdha · on July 22, 2014

He's used a longer version of that story to view this as a management lesson – it's definitely not just “blame the intern”:

http://globis.jp/774-2

kyrra · on July 22, 2014

Sorta weird how the 2 articles differ from one another about being "fired".

Count to ten article:

> I, naturally, felt terrible and was, appropriately, fired.

Honesty Wins article:

> But, naturally, that day was my last day of work at the American Embassy. But, not because I was fired; although, I might have been fired if that day didn’t just happen to be the last scheduled day of my summer internship.

acdha · on July 22, 2014

Bureaucratic jit-jitsu:

http://johncbeck.tumblr.com/post/92502108047/so-what-did-you...

“Oh, don't worry boss, we sacked the fool who made that mistake!”

genericuser · on July 22, 2014

I find it surprising that the exact key he fat fingered wasn't burned into his long term memory. Not that it would of been helpful to him down the line, actually the level of obsessing which would of taken place after the fact to remember it this many years later would of been quite counter productive.

It is simply that given the simplicity and consequences of the error, it is the type of thing I generally see people beating them selves up over until they can not forget it.

(In case you are saying to yourself but he did remember the key, look at the two versions in one he says F6 machine reboot F7 all reboot, and in the other he says F7 machine reboot F8 all reboot, indicating that while I hope he knew the keys functions then, he has since forgotten the exact key, or is substituting F keys for story telling purposes)

HCIdivision17 · on July 22, 2014

I bet this entire article comment thread would be completely different with this additional context. Thanks, it certainly paints a better light on the situation!

zeidrich · on July 22, 2014

Punishment is not a good way to correct behavior.

Your number 2 is particularly wrong. For punishment to work in affecting behavior, you can't punish for a very unlikely event, especially accidental.

Punishment changes behavior by making people anxious and afraid of the punishment. If you punish something that's very unlikely, it does no good. It's like if pushing ctrl-F restarted the stations, and the guy has never pushed ctrl-F before, and never been punished for it. It's very unlikely that he would ever push ctrl-F. But he happens to trip getting up and accidentally hit ctrl-F while he's catching his balance. Does that warrant a heavier punishment? It's more unlikely, certainly, but what would the punishment change about his behavior?

Punishment works because you are afraid of it. It works because you want to avoid it. But accidents don't happen because of defiance or a rational decision making process.

If you were punished moderately and frequently for a common mistake like hitting F7, it could correct behavior because you would be more vigilant when hitting F6. Having it be proportional to the degree of carelessness in terms of correcting behavior is not important. If someone is more careless, they will get more frequent punishment. If the punishment is too strong, it will just make people fearful instead of correcting the behavior.

Firing someone who consistently makes mistakes is a corrective action, not punitive.

Punishment is generally more of a cultural thing and less of a means of correcting an issue. Punishment is expected, so it's delivered. In western culture we have a particular need to find someone responsible and punish them. Rarely though do you feel "I don't want to get punished, so I am going to do this right." but it's not uncommon to think "I don't want to get punished so I'll avoid this altogether."

Corrective behavior is better when it's not punitive. Look at the design of the software, correct that problem. Look at the systems that allowed this to happen, correct them. Work with the staff and find out why this could happen, help them correct it. If people are punished for writing the software poorly, they're just going to cover up the flaws that they find instead of bringing them to light to correct them. If staff are punished for making mistakes, they're going to hide them instead of seeing if they can fix them.

Punishment is often just a game to abdicate responsibility. "Oh, it wasn't my fault. It was his fault. The proof that it is his fault is that he got punished for it. I've done my part to solve this problem."

Especially in complex environments like corporations and government, I think that the last thing you should do is look for a person to blame. Instead of looking for the person responsible for implementing the design, or the person who approved it. Look at why it was implemented, how it was approved. Instead of pinning it on an individual, pin it on a system.

I think you should only look at an individual if they are committing malfeasance for the purpose of benefiting themselves outside of the system. If the person approved the design because they weren't aware of the potential risk, then find out why. If they approved it because there was supposed to be another safeguard to stop it from accidentally happening, find out why that wasn't there. If they approved it because they gave the contract to their friend who wasn't the best decision, and overlooked issues for a cut, then go ahead and blame them.

If there's a problem with the person, say the designer was just irreconcilably bad, then remove him. If it's a problem with training, then train him. If it was something he did as a greenhorn in the past, and now he's much better, then for God's sake don't punish him for a mistake he made years ago when he was put into a project that was more important than the skills he was hired with, unless he grossly lied about his skills.

mikegreco · on July 21, 2014

The author states they felt it was appropriate when they were fired. In what world would it be appropriate to get fired for a single, simple, incredibly easy to make mistake? Doubly insane when there were exactly zero safeguards in place to prevent the mistake from being made.

dimitar · on July 22, 2014

According to his next post:

What I didn’t say is that it was the last day of my summer internship. The next summer they invited me back again. Everyone understood it was a mistake, but by officially firing me, someone had been punished … :)

basicallydan · on July 22, 2014

Well, that clears it up - and it's quite a clever way to tick the arbitrary "someone took the fall" box.

apples2apples · on July 22, 2014

Itoh should have been fired.

nathanb · on July 22, 2014

Why? How did Itoh demonstrate that the company would be better without him than with him?

apples2apples · on July 22, 2014

He was the one who was more responsible for the incident than the intern.

HCIdivision17 · on July 22, 2014

As acdha above notes, he was. But the author incredulously blurted out that he himself had hit the button when he was told his supervisor had been fired over the event (maybe ten minutes later).

The reality was that his boss took the fall for him, which is awesome and terrible. Much of the discussion in this thread has been a tempest in a teapot due to missing context.

His supervisor took the fall to protect him, he was fired on paper, but it was his last day anyway, and he did actually get to work at the embassy again, as it really was a simple innocent mistake.

Though certainly one with serious, long-lasting consequences.

Edit: http://globis.jp/774-2

nathanb · on July 22, 2014

OK, probably I'm just dumb and have poor reading comprehension (and will get downvoted again for asking a simple question), but can you explain why Itoh was responsible?

It seems that the translators could have saved their work more regularly -- perhaps they hold some of the blame. Obviously the poster could have thought a bit before hitting the button -- he holds all the blame for the resetting of all the terminals. How is Itoh "more responsible"?

apples2apples · on July 23, 2014

My reasoning is that Mr. Itoh put an intern in charge of a system that could cause major damage. It's like giving the intern keys to your AWS console and shitting your pants when he terminates all you EBS root disks that you didn't back up.

Mr. Beck wasn't culpable because he didn't understand the full effects of his actions or the tension of the current situation. Ioth should not have let Beck in the door that morning and he should not have given that much power to the intern.

Scuttles · on July 22, 2014

Makes sense (and is convenient). The people at the embassy had to have someone to blame, and report back to their superiors that they had dealt with the situation appropriately.

bonaldi · on July 21, 2014

The world where you know you that a) you have an incredibly powerful key with no safeguards at your fingertips and b) you might be in a breaking news situation and nonetheless you go for the key right next to the dangerous one carelessly enough that you miss?

Think about it: Unix is equally as "insane". If you're the guy on the console who meant to clean out some crap dir and accidentally typoed "rm -rf /" and then caused an international crisis you're going to get fired too.

Then years later HN will call for Dennis Ritchie to get fired instead.

megablast · on July 22, 2014

Also, the situation where somebody has to be fired.

I imagine that someone wanted someone's head, so whose head should it have been? They guy who wrote the system couldn't be fired, he was in a different company. And maybe a macro has been assigned to that key, so it wasn't his fault anyway.

colanderman · on July 22, 2014

The person in charge of minimizing risk to their internal systems.

Unfortunately, most small companies have no-one who fills that role, or if they do, it's the same person who both has the power to fire others, and is unwilling to entertain the notion that they themselves are at fault.

nmjohn · on July 21, 2014

Except he didn't know there was a breaking news situation. He said pushing the wrong button wouldn't normally be a big deal.

bonaldi · on July 21, 2014

He said he came in to find the system running, and the only time that happened was when Washington was waiting for info.

And even if you don't buy that, if pushing the button's not a big deal it doesn't need all the safeguards everyone's yelling for. (Had such safeguards been in place he might equally well have seen them, thought "oh, nobody's in, this will do what I want anyway" and approved it).

vacri · on July 22, 2014

That button needs a safeguard even if it's not an international incident. Even if only one person would lose a few hours' work from an all-nighter, work for something that's not so important, it's still someone's work.

And given that it's right next to a 'single terminal reset' key, it should be immediately obvious to anyone who's ever used a keyboard - mistakes can and do happen, even when you're fluent.

coldpie · on July 22, 2014

And yet, to this day, Firefox has both Ctrl-Q (close all Firefox windows without prompting) and its neighbor Ctrl-W (close current tab) and refuses to change that or provide remappable keyboard shortcuts. One of the biggest UI failures I'm aware of in 2014.

https://bugzilla.mozilla.org/show_bug.cgi?id=52821

hrjet · on July 22, 2014

This in combination with "Restore all tabs on startup" not being the default is a disaster.

The first thing I do on a new Firefox installation is enable "Restore all tabs on startup".

hack_37 · on July 22, 2014

Even without that flag enabled, you can restore all tabs when closing/reopening Firefox. History > Restore Closed tabs.

hrjet · on July 22, 2014

Thanks, good to know.

schnable · on July 22, 2014

This is in all apps in OS X, that's the standard keymapping.

noisy_boy · on July 22, 2014

Ctrl-Q exits on Thunderbird too. Since in MS Outlook that combination marks a mail as read, for keyboard-heavy users switching between the two, it is no fun.

iancarroll · on July 21, 2014

One might wonder if that workstation showed the online and offline status of all the computers.

kstenerud · on July 22, 2014

The worst unix disaster I ever saw happened to one of my co-workers. He was working on a client machine, logged in as root because he needed to compile and install some complicated software. As he was working, he did an ls -l /bin and copy-pasted it to a text editor so he could make sure everything was installed correctly. Unfortunately, after returning to his console, he accidentally hit paste. Most of /bin was actually symlinked somewhere else. As you know, ls shows symlinks like this:

lrwxrwxrwx 1 root root 20 Apr 27 17:02 cc -> /etc/alternatives/cc

Guess what happens when you paste a whole list of those into a console as root?

lcedp · on July 22, 2014

That's fascinating.

To prevent this from happening with me, I've added the following line to my `.Xdefaults`

    URxvt.perl-ext: confirm-paste

(I'm using `rxvt-unicode`)

http://i.imgur.com/joHRXaH.png

talmand · on July 22, 2014

The worst unix disaster ever? Could you elaborate for non-unix people such as myself?

smwht · on July 22, 2014

The important character here is the '>'. This redirects output to a file and overwrites the file. The lrwxrwxrwx will only print an error, but the redirect to the target executable will erase the target.

For example:

  $ echo "asdf" > foo
  $ cat foo
  asdf
  $ lrwxwrwxrwx 1 root root -> foo
  lrwxrwxrwx: command not found
  $ cat foo
  $

So basically, this zero'd out every executable on the system.

talmand · on July 22, 2014

Yep, that's bad.

Out of curiosity, what was the solution to fix all that?

mianosm · on July 22, 2014

Possibly a cp or an scp from a remote system with a working set of binaries.

kstenerud · on July 22, 2014

scp was in /usr/bin, so we could at least copy enough basics from another system and recover the rest from a backup. Needless to say we lost the client contract.

Tyr42 · on July 22, 2014

Oh no, that redirect!

tshaddox · on July 21, 2014

But typing "rm -rf /" is significantly harder to do accidentally than typing F7 instead of F6.

graylights · on July 22, 2014

Not really, a lot of novice unix users are of the habit of removing files with -rf switch. I cringe everytime I see it.

The command "rm -rf ~/blue/" is just a single space key from being equivalent to "rm -rf /" with "rm -rf ~/blue /"

jonreem · on July 22, 2014

On any modern system it's actually "sudo rm -rf / --no-preserve-root" and then entering your password while staring at the command.

"rm -rf ~/blue /" will not come close to deleting / unless you are in the habit of running every command as sudo, even ignoring the presence of --no-preserve-root

jfb · on July 22, 2014

Much, much, much the worse is "rm -rf ~ /blue". I don't give a crap about 99% of the stuff outside of $HOME, but of course, the stuff in $HOME is the stuff that's trivial to destroy.

lloeki · on July 22, 2014

You're missing the point:

    $ cd dir where there are source files and temp files
    $ rm *.tmp # or so you think
    '.tmp not found'
    # too bad

daurnimator · on July 22, 2014

Except when: (these are terrible lessons to learn)

1. You type it into the wrong system (D'oh)

2. You have run `mount --bind / /somewhere/else` then `rm -rf /somewhere` a week later

:(

colanderman · on July 22, 2014

It boggles my mind that --one-file-system is not the default :/

daurnimator · on July 22, 2014

> Not really, a lot of novice unix users are of the habit of removing files with -rf switch. I cringe everytime I see it.

Every few days I remind myself of this.... then I have to delete another directory with a git repository in it, and end up add the -f in again

couchand · on July 22, 2014

I run into this all the time, too. Now my -rf usage is almost always wrapped by this:

    rmgit()
    {
        git status
        read -p "Are you sure? " -n 1 -r
        echo
        if [[ $REPLY =~ ^[Yy]$ ]]
        then
            rm .git -rf
        fi
    }

anon4 · on July 22, 2014

I think this would be a bit better for interactive cases. Note: written just now, I haven't actually felt the need for this safeguard... yet.

    rmrf()
    {
        (echo "The following files are going to be deleted!!!"
         for FILE in "$@"; do
             echo "<<<" "$FILE" ">>>"
         done) | less
        read -p "Are you sure? " -n 1 -r
        echo
        if [[ $REPLY =~ ^[Yy]$ ]]
        then
            rm -rf "$@"
        fi
    }

colanderman · on July 22, 2014

Doesn't help against network filesystems mounted in subdirectories. --one-file-system (which really ought to be the default) prevents this.

mandalar12 · on July 22, 2014

What would be the "good" alternative ? I often tries "rmdir" or "rm -r" if the directory is not empty and very often there are some "protected files" so I add -f. Thus it happens that I directly lauch "rm -rf".

couchand · on July 22, 2014

Watch it fail first, verify that the failure makes sense, check to see if there's a way to delete one file with -f before deleting the rest with -r. Use -rf only as a last resort, and only by appending the f to an already-failed command whose syntax you've validated.

artursapek · on July 22, 2014

More easily, "rm -rf ~" with a premature "Enter"

corobo · on July 22, 2014

do the -rf after the directory to avoid this in future

Premature enter just results in:

[web@server /]$ rm ~

rm: cannot remove `/home/web': Is a directory

krylon · on July 22, 2014

Hehe, I did something like that once, except that I typed "rm -rf ~ /blue/". There was no /blue/, but I managed to wipe out my home directory, and I did not have a backup. :-| It was on my personal machine, so at least I did not delete anybody else's files, but I still got burned hard enough to learn a valuable lesson.

svachalek · on July 22, 2014

The / key was right next to Enter on a lot of old keyboards. It was quite easy to type 'rm -rf /tmp/garbage*' and have a simple fumble turn it into 'rm -rf /'. I mean, there's this guy I know, he did that once.

australis · on July 22, 2014

There's also the chance of it happening when writing a script

Such as this classic:

https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commi...

jzwinck · on July 22, 2014

Ten years prior to that, Apple had a similar bug in one of its installer scripts on OS X. I have a hard time finding much about it online now, because it happened at a time when OS X and the Internet were a lot less popular than today, but what I recall is that an unexpectedly customized installation directory (say with spaces or one level closer to "/" than the default) would cause the installer to delete a whole lot of things.

iopq · on July 23, 2014

I once installed GGClient in C:/Program Files/ instead of any particular folder

so of course, I said "I'll just uninstall it from there and install it in the correct folder" and it proceeded to delete C:/Program Files/

bentcorner · on July 22, 2014

There's also this one from Pool of Radiance's 2001 release:

http://www.rpgfan.com/news/2001/1416.html

Uninstall wipes out your Windows directory.

Paradigma11 · on July 22, 2014

I still have nightmares from this buggy mess. To add insult to injury the shop didn't want to take the game back afterwards.

thinkpad20 · on July 22, 2014

The one that I did only a few months ago was something like

    $ cp -r path/to/some/directory path/to/very/important/directory
    $ (run some commands to verify copy did what I wanted)
    $ rm -r path/to/some/directory path/to/very/important/directory

Of course, all I had meant to do was delete `path/to/some/directory`, but I just pressed 'up' in my history and switched `cp` to `rm`. Of course I hit Ctrl-C in an instant, but my FS was already hosed...

thefreeman · on July 22, 2014

Not really, on my keyboard at least / is directly next to . and you could feasibly be clearing out a directory or something with rm -rf .

jessaustin · on July 22, 2014

It's my habit to never use the -f flag until I get those annoying confirmation messages. I <CTL>-c to cancel that command, then scroll up and add the flag to run again. I think this is a good habit? Anyway, the worst thing I've done along these lines was resetting a dev DB that had seen considerable un-backed-up configuration work. I couldn't blame rm for that.

colanderman · on July 22, 2014

Eh, "rm -rf $TEMPDIR/$TEMPFILE" in a shell script is just a couple typos away from deleting everything on the network. Yet I've seen people put crap like that in build scripts even after they've previously inadvertently deleted half the network drives.

Fortunately, despite rm's poor choice of options and bash's poor default handling of variable name typos and the obvious PEBKAC, backups saved the day here.

People are human. Policy ought to reflect this.

sentenza · on July 22, 2014

Be honest, who doesn't have

rm -rf *

in their shell history? Now it's just one accidental twitch away.

TimWolla · on July 22, 2014

I suspect everyone tried to remove all dotfiles and dotfolders with rm -r .* as well...

pjc50 · on July 22, 2014

Many years ago I wrote a kernel module for my own use in response to a similar incident. It checked to see if the calling process was deleting a file called ".landmine" and killed the calling process if it was.

Far from perfect - it depended on the order of deletion - but a more general solution than preserve root. Of course it still requires the user to mark things they consider "important".

canjobear · on July 21, 2014

Are you trying to say it's OK because Unix behaves analogously?

It is definitely a problem with Unix also.

bonaldi · on July 21, 2014

It is. Which is why everyone in Unix who types "rm -rf " then types their next character _very carefully_ and reads the line before committing.

I'm trying to say that when you've got something dangerous without safeguards, you take care around it. Not taking care of known-dangerous things and causing severe damages as a result is an arguably good case for dismissal.

graylights · on July 22, 2014

The proper answer is to type "rm <dir> -rf" otherwise you're risking a stray strike on enter.

gatehouse · on July 22, 2014

The other proper answer is to have a good backup and recovery system.

goblin89 · on July 22, 2014

> The other proper answer is to have a good backup and recovery system.

And, of course, that should've been the solution to OP's incident with the Korean Airlines flight 007. Backups, surprisingly, are scarcely mentioned at all in this whole thread.

mreiland · on July 22, 2014

rm -rf / is not quite the same as the f7 key restarting all machines sitting right next to the f6 key to restart a single machine.

you cannot fat finger rm -rf /

claudius · on July 22, 2014

  # rm -rf /tmp/bla

  # rm -rf / tmp/bla

mreiland · on July 23, 2014

astrodust · on July 22, 2014

You can fat-finger enter before you're done typing.

mreiland · on July 23, 2014

that's a pretty fat fucking finger...

thrownaway2424 · on July 22, 2014

You forgot the part where the only reason he was fucking with the F6 key in the first place was to play a game. That's irresponsible and grounds for firing.

ghayes · on July 22, 2014

Actually, from the article he was a system administrator and another employee had been playing a game which froze her own terminal. The author did nothing wrong except press the wrong button (and to your point: not report his coworker for playing games on her terminal in her free-time).

bdunbar · on July 22, 2014

> not report his coworker for playing games on her terminal in her free-time).

I was enlisted in the Marines, MOS as a programmer (4063), 1989 - 1993. I never really programmed, but spend my time as a small computer support guy.

Computer games were officially forbidden, but unofficially tolerated, provided one was discrete. I suspect the same 'don't ask don't tell' policy applied to EUCE at the embassy in question.

Sea story. My team was once directed by our boss, the Major, to 'sweep' the command for 'games' and remove them from computers. This took the better part of two weeks, and was massively unpopular with our peers. 'A Marine On Duty Has No Friends', we repeated to ourselves. We even got into the spirit of things and deleted games from _our_ computers.

Near the end of this evolution I hand-carried some paper into my Major's office. He was, yes, playing a computer game.

He did at least have the grace to look embarrassed.

nerfhammer · on July 22, 2014

He publicly embarrassed USG, POTUS and a major US ally. SK is going to call up the state dept and demand an explanation. Someone has to be fired. This isn't some startup in California where everyone just plays it cool. The termination of his boss and his boss's boss and his boss's boss's boss all the way up were probably considered as well.

jheriko · on July 22, 2014

I think you will find they embarssed themselves.

Responsibility flows upwards, not downwards. Its just unfortunate that the people at the bottom are often carrying the people above far more than they should...

pfisch · on July 21, 2014

He was probably fired because they realized they couldn't put a summer intern in control of such a critical system. I would make the same call.

mikegreco · on July 21, 2014

If you go with this line of reasoning, whoever put him in that position should also be fired, and their boss should be fired for putting someone in charge who made such a poor decision in the first place.

jasonwocky · on July 22, 2014

The person that should be fired is always the person who has responsibility for the amount of budget represented by the loss.

e.g. No intern that needs permission to get a box of pencils from the supply closet should ever be fired for putting a mistake into production that costs a company $100,000. If a company loses $100,000 on a mistake, you look to the person in the hierarchy who manages budgets of that size. It's their job to make sure the safeguards are in place to prevent losses like that.

In government it's difficult but not impossible to put a dollar value on losses like this. In this case, whoever was in charge of that network, and could request budget to build safeguards (whether software or training) against such mishaps, was ultimately responsible. Firing the intern is just shit rolling downhill.

lotsofmangos · on July 22, 2014

Firing the intern means that the story you just spun to your boss about how this all occurred won't be contradicted by the intern and you might not get fired.

davidw · on July 22, 2014

Interesting and very sensible comment - it's one of the few here that adds some real value to the discussion.

hueving · on July 21, 2014

Only part of that makes sense. The person putting an intern in such a position of control should be reprimanded, but it wouldn't make much sense for the next level higher because there isn't a blatant mistake. Hiring someone that turns out to make a mistake isn't as blatant as giving an intern the power to shut down a mission critical system.

toyg · on July 21, 2014

They might have been, we just don't know.

pessimizer · on July 21, 2014

That may have happened.

dm2 · on July 22, 2014

I do agree with you.

An equally sufficient solution would have been to install a safety switch on any button with that much importance. Something like this but probably smaller, or just a plastic cover that fit over the F7 key: http://www.thinkgeek.com/product/15a5/

A fireable offense would be lying about the action or trying to cover it up.

Should the lady who asked for the reset be fired for playing a game and asking him to reset the computer?

Should the technician that didn't install some sort of safety be fired for not foreseeing this issue?

He would be much less likely to make the same mistake in the future than the person who would replace him.

If there are terminals that could erase a presidential report and there is no backup available, you send a non-critical staff member to guard every one of those terminals, or at least put a sticky-note in the middle of the monitor.

I'd say several other people deserved to be fired for this, but the intern was not one of them.

mseebach · on July 22, 2014

Appropriate or not, it's probably what was to be expected in what I imagine even in 1981 was not the most enlightened HR management regime (the US foreign service). Also, 32 years is a lot of time to wash away the bitterness of having been unfairly fired from a summer job, especially if you, as the author, ended up doing pretty well for yourself.

Also, whether or not it was appropriate is completely irrelevant to the story being told.

emn13 · on July 22, 2014

I expect the HR management back then was more enlightened than it is now - the quick-to-pounce media and politically instigated witch hunts (terrorism, save-the-kids, etc.) have ensured that cover-your-ass is more and more a necessity.

billmalarky · on July 22, 2014

That sounds like rose colored glasses. The cold war had its fair share of witch hunts. Granted this was the 80's not the 50's but lets not forget McCarthyism was based around democracy vs communism.

"That Korean announcement and the slow response by the US President—both caused by delayed real information—caused decades of conspiracy theories."

Spooky23 · on July 22, 2014

HR was called "Personnel" in those days. It was different.

The reality here is that an intern doesn't have civil service protection, so it is quick and easy to dispose of them. Going after the supervisor may take longer, so if rapid action is needed, they'll fire the first person who serves "at the pleasure of" the executive.

emn13 · on July 22, 2014

Oh, I don't think people are any worse - I just think there's more adherence to the letter of whatever regulations there are now. The principles of zero-tolerance (three strikes...) have been widely applied, regardless of the nuance of some situation.

Note also that it turns out the OP was "fired" but immediately rehired too...

ufmace · on July 22, 2014

On the other hand, they're hiring people to do these things, not computer programs. Shouldn't we expect them to notice that when doing routine-thing-x, it's awfully easy to accidentally do catastrophically-dangerous-thing-y, and thus it would be a very good idea to be extremely slow and deliberate when doing routine-thing-x?

I have to routinely create and drop databases on my local system. Our production databases, which I also have to connect to, contain hundreds, maybe thousands, of person-months of work. I realized that it would be a good idea, before issuing DROP DATABASE commands, to deliberately stop and double-check what server I'm connected to. Luckily, I haven't screwed that one up yet.

logfromblammo · on July 22, 2014

I sure am glad that I never accidentally pushed an unlabeled and unprotected "get fired immediately" button. If it is important to not have all the workstations on site shut down at once, go to the control terminal and disconnect the keyboard before the system startup employee comes in without any clue as to what is going on and starts his ordinary daily routine. Maybe write a note and wedge it into his keys?

Based solely on the shortened account, it was not appropriate at all to fire him. Convincing him that it was is just doubly inappropriate. There may be more to the story, but as it is, it looks like angry scapegoating against a hapless, lowest-level employee.

rrss1122 · on July 22, 2014

When so many people higher up are given incomplete information or even downright embarrassed on the world stage because of one simple mistake, I feel it is appropriate. It is still, however, doubly insane that one person's stray keystroke can do all of that.

sergiotapia · on July 21, 2014

Insane, but somebody had to pay for that screw up and you can bet your ass it wasn't going to be the guy managing the newbie 23 year old. It'll be the newbie 23 year old himself.

ProAm · on July 22, 2014

attention to detail is a more valuable skill than people realize

chengiz · on July 22, 2014

IMO cleary he is not telling us the entire story.

njharman · on July 22, 2014

In the world of bureaucratic need for scapegoats.

endtime · on July 21, 2014

> In what world

Japan, I guess? I've never been there but the story was consistent with my impression of their work culture.

_0nac · on July 21, 2014

This was the American Embassy, which follows American work culture.

Also, in Japanese companies, it's basically impossible to fire people. They can, however, be assigned to a desk in a windowless room and be given nothing to do for several years, until they take the hint and "voluntarily" quit.

patio11 · on July 21, 2014

c.f. http://www.nytimes.com/2013/08/17/business/global/layoffs-il...

Suffice it to say that I am aware of situations created by a societal expectation of lifetime employment which make the above article look positively sane. (And I recently learned that, in some cases, what I had assumed was just an ironclad social contract actually is legally enforceable, which blows my mind.)

eru · on July 22, 2014

Not true. See correction to article.

nandemo · on July 22, 2014

The correction is unclear... Since there are always exceptions, of course it's not the case that "dismissing a permanent employee (正社員) is always illegal". But there's in fact a (somewhat vague but broad) provision in labour law and also　precedents that make it very difficult to legally fire a permanent employee in normal circunstances.

Basically you can legally fire a permanent employee in the same sense that a civil servant in most countries can be fired: if the employee does something egregious, like stealing from the company, not showing up for a long period of time with no reason, etc. Certainly not for incompetence, or even　if the company has been in the red for several years in a row.

E.g. Japan Airlines went basically bankrupt (technically a restructuring) and even so they had trouble laying off part of the staff.

wildpeaks · on July 22, 2014

The Japanese way is good in the sense that, as long as you have Internet, you could make your startup without worrying about putting a roof over your head or finding an office space to work from, and you get a still-full salary to bootstrap it without having to put the time.

You even get access to a pool of other soon-to-available engineers to work with if you're stuck with other poor sods in the room.

Definitely another scenario than the being suddendly kicked out of the door by security right before the week-end with a box of your belongings and, if you're lucky, a tiny check to not starve until next week.

billmalarky · on July 22, 2014

I highly doubt you are allowed to retain ownership of anything you create on the job though.

HillRat · on July 21, 2014

It was the AMEMB in Tokyo, though the basic principles apply to ay bureaucracy answering to political masters. Interns don't get AFSA (or, at the time, AFGE) union representation, and somebody's head was going to roll for that mistake, even though the company that programmed a non-confirmed global reset into a single keypress was truly at fault. Fair? Nope. Inevitable? Yep.

mathattack · on July 22, 2014

Lots of fields are like that. You generally get paid better because of the risk. (I hope he was!)

ck2 · on July 22, 2014

How about when Russia returned the data recorders after years of refusing to South Korea - made a press spectacle of it - and then South Korea discovered the recorders were empty and missing the data tapes when the press was gone.

Or the US navy crew who received medals after shooting down the Iranian airline.

Once there is loss of life, it is 100% politics afterwards with little to no practicality, just look at all the mass shootings where there were zero changes afterwards. We simply do not value life, it is politics first.

nness · on July 22, 2014

> Or the US navy crew who received medals after shooting down the Iranian airline.

You make it seem like they received the medal for having shot down the plane. In reality, those who were awarded medals, were awarded Tour of Duty medals for their time spend in a combat zone. I believe the distinction is important, particularly since that class of medals are routinely awarded to individuals during their time in the military.

ck2 · on July 22, 2014

If a police officer shot and killed innocent bystanders, should they get achievement awards for doing their job otherwise?

My answer would be no, you failed at your job regardless.

Same thing with military.

mason240 · on July 22, 2014

They didn't fail. They were ordered to shoot down a plane, and they shot it down.

maaku · on July 22, 2014

That's a very narrow minded position.

Gosh I hope you never screw up even once, cause you'll never live it down.

tobltobs · on July 22, 2014

I assume you are american. Would you be also so generous if the majority off killed people would be americans?

maaku · on July 22, 2014

Wtf does being American have to do with anything? And no, it doesn't make a lick of difference. The given situation was a fighter pilot who shot down a commercial airliner under orders later at some point in his career being given a medal for some completely, 100% unrelated reason. Maybe he deserved to be tried for war crimes. Maybe he also deserved that medal. Can you see that this is not cognitive dissonance?

nness · on July 23, 2014

It wasn't actually. Iran Air Flight 655 was shot down via a SM-2MR surface-to-air missile. Its the first paragraph on the Wikipedia article.

This is an insanely important distinction, since a fighter pilot would have had the opportunity to visually inspect the craft beforehand. Where in this case, the accident was made due to incorrect system information which misidentified the aircraft as being anything other than a commercial airline.

ZoFreX · on July 22, 2014

He didn't screw up once, he was criticised for being over aggressive and initiating fights on multiple occasions.

maaku · on July 22, 2014

Who are are talking about?

72deluxe · on July 22, 2014

Would you say that making a mistake in software is equal to taking a human life?

maaku · on July 22, 2014

Depends on the what the software does.

72deluxe · on July 24, 2014

Not sure why the downvotes, but I think for the application software that probably 98% of us here write, it isn't as important as we'd like to believe in comparison to a person's life.

Either that or life is really really cheap.

evilduck · on July 22, 2014

If you were the developer for a Therac-25, yes, yes I would.

jackschultz · on July 21, 2014

The issue form most new stories is that even when the truth comes out, the great majority of people will never hear the actual facts. One issue is because news stations move on from caring about the story quickly. Or, the bigger issue in my opinion, is that people won't believe the new, correct facts since the old ones will have been engrained in their head. Solving both these issues would be really helpful for society, but are obviously damn hard to solve since we haven't really gotten anywhere in this space.

Aardwolf · on July 22, 2014

When there is such chaotic news story, I usually switch from news to Wikipedia. That has all the facts and continues the story even after all media lost interest.

justizin · on July 22, 2014

It's a sign of poor management that someone has to be fired when something goes wrong, outages are learning situations for all involved, and it is widely held that the person who took the action that caused an outage is not responsible, but that all involved are responsible.

See John Allspaw's Swiss Cheese Theory : http://www.kitchensoap.com/2012/02/10/each-necessary-but-onl... .

[ Edit: I guess it's not Allspaw's model, but he applies it to systems engineering rather well - http://en.wikipedia.org/wiki/Swiss_cheese_model ]

"Accidents emerge from a confluence of conditions and occurrences that are usually associated with the pursuit of success, but in this combination—each necessary but only jointly sufficient—able to trigger failure instead."

The person who pushed the button is not at fault, the manager is not at fault, the guy who designed the button is not at fault - all are jointly responsible.

Blaming the intern does, however, reflect extremely poorly on Itoh and everyone else in the chain of command. A superior who demands retribution for a simple mistake that happened to cause him or her pain is basically worthless.

But, I forget, we're talking about Ronald Reagan.

tokenadult · on July 21, 2014

I remember this time sequence very well because I was living in Taiwan when the incident happened. Yes, people who lived in east Asian time zones saw news reports that appeared to be based on knowledgeable sources that the plane might have landed safely with all passengers alive. This explanation of why the Western-aligned diplomats and military officials based in east Asia didn't have complete information when they were interviewed by the press is quite interesting, and explains puzzling memories I have from that day.

jere · on July 22, 2014

>And let’s hope that there is no stupid 23-year-old with his finger on an important keyboard in this information chain.

No. This is something you would read in Design of Everyday Things where Don Norman would totally shame the the engineers who made that system. Software shouldn't be designed with the assumption that no one makes errors.

ak39 · on July 22, 2014

What I find incredible to believe is that this problem could have happened without the F7 erroneous keystroke by a human. A simple power outage could have resulted in this exact same catastrophe.

Why didn't the backups work? System wasn't "robust" enough. (Did I just use the word "robust"?)

tootie · on July 22, 2014

Alan Cooper's About Face is a great book on interaction design for computers and one of his axioms is "Hide the escape lever". Basically make sure the ejector seat control isn't right next to the throttle.

kosei · on July 21, 2014

Really brave of the author to share this story. I know most people would be afraid to admit this kind of a public "mistake".

dictum · on July 22, 2014

Further reading for those who want to be disabused of the concept of human errors:

https://en.wikipedia.org/wiki/The_Design_of_Everyday_Things

abcd_f · on July 22, 2014

> That Korean announcement and the slow response by the US President — both caused by delayed real information — caused decades of conspiracy theories.

I appreciate that the OP was a part of the situation, but conspiracy theories were not caused by this.

It was time of very high tension between the US and Soviet Union. So when a plane veers off the course into not just Soviet airspace, but into an explicitly cordoned off top secret area, ignores all communication attempts, ignores the presence of fighter jets and just keeps on flying, then the situation itself is a fertile soil for conspiracy theories.

brudgers · on July 22, 2014

'It was a time of very high tension' doesn't quite capture how different it was.

Through the glass of a yellow newspaper box, the Miami News headline that the Soviets had shot down a plane carrying a Congressman. My first thought was "This is the war." Not 'a' but 'the'. The primary stance of the US military was squared off against the USSR and had been for more than 30 years.

steven2012 · on July 22, 2014

Wow. I got goosebumps when I read that article. I'm old enough to actually remember when KAL 007 was shot down, and while I wasn't old enough to hear about the conspiracy theories, I do remember the thing about people being safe and landing in Russia. To think that this was just a small mistake on the part of someone, which caused international ripple effect, and who later blogged about it is really something incredible.

userbinator · on July 21, 2014

"With great power comes great responsibility."

Incidentally, "features" like this are why I don't trust systems that have some centralised control - IMHO giving any one individual (or organisation, in many cases these days) such power over others is not a good thing.

disputin · on July 22, 2014

Scapegoat. The ritual expulsion of the evil spirits wrapped neatly in a little parcel to appease the elders and thereby prevent them blaming each other - harmony continues in the hall of power. Meanwhile the problem was in the process, not the employee, so nothing has been fixed, and the guy who had learned the lesson is no longer there, and so the problem will recur with the next lamb to the slaughter.

ryanobjc · on July 21, 2014

I disagree that it was appropriate that you were fired, but interesting story all about.

kysol · on July 21, 2014

For security reasons I think that it might have been justified, considering the events that had just taken place. Still a crappy way to go out though.

brown9-2 · on July 22, 2014

What security reasons? Firing the author didn't change what had already happened.

kysol · on July 22, 2014

People were bat shit crazy in the middle of that Cold War. If someone randomly decided to turn off machines without notice, even if they said "whoops accident, my bad", their actions would have instantly thought of as sabotage.

I'm not agreeing with the outcome.

sitinaud · on July 21, 2014

That fact that he wasn't too concerned about having accidentally reset all the computers in the building suggests that he may not have had an appropriate temperament/attitude for a sysadmin managing critical systems.

Karellen · on July 21, 2014

Or, you know, he had a perfectly good reason to think that accidentally resetting all the computers in the building at that time would not be a problem:

"Not long after I arrived in my office, I received a call from a secretary in the Agriculture Department who liked to play a computer game before her workday started. Her favorite game had a bug that regularly froze her workstation. [...] I realized that I had mistakenly hit F7 and reset all the workstations in the embassy. This realization didn’t bother me much, because no one except the Agriculture section secretary was usually on the computer system this early in the morning."

I'm sure I'd have thought something like: "Phew! Glad I made that mistake now, rather than at 11am when everyone was half-way through their morning's work. Likely no harm done at all, and I'm going to be really careful with that command in the future. Yup, definitely dodged a bullet there..."

sitinaud · on July 22, 2014

Yes, he had a pretty good reason to think that probably no major damage was done, and this was sufficient to comfort him. This kind of carelessness about the possibility of causing harm or having caused harm suggests to me that he wasn't taking his responsibility as seriously as he should.

ryanobjc · on July 22, 2014

you're really reading too much into a simple story, from over 20 years ago, etc.

In any case, we already know where your line of thought leads, risk adverse cultures ultimately stagnate and wither away.

lotsofmangos · on July 21, 2014

I often suspect that most of the work involved in keeping a power hierarchy going, is involved with trying to pretend that this kind of shit doesn't happen all the time.

lamontcg · on July 22, 2014

And then conspiracy theorists latch onto this kind of shit, but believe that it must be malicious silliness...

Why didn't Reagan respond immediately? Well, he was waiting to hear from Chancellor Gorkon that the KAL flight had been successfully beamed aboard and was en route to Pluto, of course... Clearly they'd have their shit together better so it couldn't have been a 23-year old rebooting all the computers accidentally and wiping out hours of critical work -- that would just be ridiculous...

Loughla · on July 22, 2014

Conspiracy theorists are just 20th and 21st century prophets, really. They search for meaning in an all too often meaningless world.

It's comforting to think that people can control the direction of every choice in the world, and that someone is at the helm.

It's uncomfortable to think about the daily series of random, unconnected decisions that drive the direction of our species.

lotsofmangos · on July 22, 2014

I'm not sure that it is more comforting to think that there is someone at the helm, as much as anyone who aspires to be considered to be at the helm has to keep pushing that story, so it gets repeated more often and with better special effects than the story about there not being anyone at the helm.

Actually being in control of stuff is very difficult, but convincing people that you are in control of stuff is pretty easy as we are all suckers for narrative. The main ways to disrupt a power narrative is to spread other narratives or for a situation to occur that upsets the existing narrative, so getting people to make up new ones. This explains why totalitarian governments can collapse so quickly, which wouldn't be possible if the people running them were actually in control of anything.

privatedan · on July 22, 2014

My first thought after reading the article was that it was ridiculous to fire/scapegoat the author for hitting the wrong key, too. This has happened to me before, where a single keystroke ( in my case, a line break in a config file ) caused me to take down a production system. My punishment? Designing a more robust system that would protect itself from a badly formatted config file. To this day, ten years later, a similar error has not been repeated, despite several attempts of people to push bad config files to our production systems. If I had been instead fired, no doubt a similar, but perhaps not exact, error would have been repeated every year or so.

If I had made the same mistake twice without any attempts to fix the situation long term, then, yes, I think that would have been a fire-able offense.

If you're working with people who care primarily about their own positions and egos without regard to the team as a whole, well, be prepared to be thrown under the bus when it comes time for those people to protect themselves.

Schwolop · on July 21, 2014

Thanks for posting this. I found https://news.ycombinator.com/item?id=8062683 yesterday but yours appears to be the direct link to the author's blog, which I had missed.

joewaltman · on July 21, 2014

Great story....thanks for your willingness to share.

0xbadcafebee · on July 22, 2014

> On this day, I highlighted her workstation and hit the F6 key to reset. But my screen went temporarily black and then seemed to be starting again. I realized that I had mistakenly hit F7 and reset all the workstations in the embassy.

Ugh.

Those with automation capabilities: keep this lesson in mind, because it will happen to you in production one day. 'dsh -a reboot' is incredibly easy to type and can have disastrous effects. Creating abstraction layers around common admin tasks can help catch simple mistakes and give prompts before dangerous behavior.

jheriko · on July 22, 2014

I hope you fought that firing...

... incompetence like that comes from having F6 next to F7 and no checks or authorisation needed for a potentially dangerous action etc. Processes should be designed for people to make the common mistakes... its what they do.

jheriko · on July 22, 2014

nm. just seen the follow up. :)

hyperliner · on July 22, 2014

"My boss, a >> Japanese << computer engineer named Itoh, poked his head in the door. "

hmmm, I am pretty sure Mr. Itoh was not Japanese working in the American embassy. I am pretty sure he was American.

billmalarky · on July 22, 2014

If he was a first gen American his culture would have been greatly shaped by Japanese culture.

joshuaheard · on July 22, 2014

I thought the headline meant to count to ten when a plane goes down...while you are in it!

instakill · on July 22, 2014

Me too. I still don't know how counting to 10 will help you press the right key on your keyboard though.

feld · on July 21, 2014

Reset all computers in the embassy with F7? No warning prompt?

Fire the idiot who wrote that function.

smacktoward · on July 21, 2014

In fairness, it was a different world back then. There were so few people administering computer networks that you could generally assume someone who was doing so had been thoroughly trained; and the thing about highly trained people is that they tend to view things like failsafes and safeties as pointless time-wasters.

"I know what I'm doing when I hit F7, but the damn system makes me sit there for 30 seconds before it does what I told it to do! Piece of junk."

The result was that software in that era tended to come with a lot more sharp edges. The age of the Recycle Bin that would save you from yourself didn't arrive until administering systems became something the general public was expected to do.

tzs · on July 22, 2014

Ahhh....the days of sharp tools, no failsafes, and young programmers or admins.

I recall that time I wrote a batch manager for the VAX 11/780 at Caltech High Energy Physics. It consisted of a program to monitor the batch queue and start jobs as scheduled ("BATch MANager", or "BATMAN"), and a program for users to submit jobs ("Run Overnight Batch INput", or "ROBIN").

The configuration file for BATMAN was stored in /etc/batman.

During development, I occasionally had to "rm /etc/batman". Of course, out of habit, as soon as I typed "/etc/" my fingers would automatically type "passwd", and once I did not catch this in time. Oops. It happened to be a Sunday morning at around 7AM, and I had to call the other admin, who handled backups, to come in and restore that file. He was annoyed.

The second time I did this, he was pretty pissed.

The third time, I fortunately had been working at the terminal we had in the machine room, and managed to shut down power to the machine before the write buffers were flushed, and the file was OK after fsck. I didn't have to deal with an angry co-admininstrator that time. Just angry physicists.

The other admin (Norman Wilson, in case anyone knows him or he reads HN) then made a link named /etc/safe_from_tzs to /etc/passwd to stop my nonsense once and for all.

That worked until the first time I wanted to overwrite /etc/batman instead of rm it.

That led to a cron job that maintained a copy of /etc/passwd in a separate file, and periodically checked to see if it were missing or misformatted, and restored it if so.

dkural · on July 22, 2014

One would think after the first two times you'd find a better way to do this, realizing your infrequent but habitual mistake. Why didn't you change any of your practices after the first two screw ups?

thaumaturgy · on July 22, 2014

Not speaking for tzs, but back in the good ol' days, everybody was pretty busy. A lot of software got written by operators a little bit here and there in between running jobs and moving paper around the building and that sort of thing; paid programmers were frequently dealing with change requests from business departments, all of whom wanted their thing done yesterday; and depending on the size of the organization, there might be a PFY or two, but they generally weren't allowed anywhere near production hardware.

For example, one of my early jobs in IT involved running batch programs that produced reports on a mainframe designed for the punch card era. It had moved on from punch cards, but all of the batch jobs still expected them as input, so they were stored instead as "digital cards" in the job files themselves. The operator -- me -- would be responsible for bringing a job up on the terminal, changing each occurrence of some two-letter code in each card file to some other two-letter code, some date code to some other date code, and so on. Each batch job might be just one step of half a dozen or so required to produce paper printouts from the database. The terminal emulator did not have a find & replace function. Naturally, I screwed up jobs on a regular basis.

This mainframe ran mainly on COBOL74. Over the course of a lot of unpaid overtime, a few hours here and there for several weeks, I gradually wrote a variable interpolator in COBOL that could be called as the first step of a batch file and would replace all occurrences of a variable tag with an input parameter passed to the job. Instead of pulling up a job file and replacing a bunch of two-letter codes, you'd just run the job with the two-letter code as a parameter, and this program would rewrite all of the data cards in the batch file. COBOL has no string operators or a string data type, but I found a way to abuse some system calls to make it work.

So it took weeks to fix the most common operator error in that shop.

IT staff spend more time on Facebook, Reddit, HN, and online gaming now than we ever had available for fixing processes back in the day.

tzs · on July 22, 2014

I'm not absolutely sure on the number of times. It is possible that I only rm'ed it twice, and then Norman made the link to end that. This would have been around 1981, so there has been some memory fade.

autokad · on July 22, 2014

are you that angry co-admininstrator ? just curious

dkural · on July 22, 2014

hahaha :) I didn't mean the comment to sound angry - nope not the co-admin. I do have a running interest in stuff like continuous improvement, organizational excellence etc. so consider this field research!

viraptor · on July 22, 2014

> you could generally assume someone who was doing so had been thoroughly trained

No amount of training can prevent something like this. It's like today's browsers where the tab can be closed with ctrl+w and the whole window with ctrl+q. It doesn't matter how many times you've done it and how used are you to the position of the 'w'. One day you will close the whole window by accident.

thefreeman · on July 22, 2014

Personally I agree. Mistakes happen, everyone has accidentally hit the wrong key at one point or another in their life. I was pretty surprised how seemingly fine he was with being fired. At the same time, I guess the net result of the mistake was big enough that it did kind of require a response, and it has been about 30 years since it happened.

GoodIntentions · on July 22, 2014

IMHO, firing someone who owns up to a keystroke mistake like that is wrong. Good managers fix the problem, weak managers fix blame.

Root cause analysis + countermeasure might have boiled down to "operator error due to shitty interface" + "we will tape a guard over F7 key, since it will never ever get fixed in software"

viraptor · on July 22, 2014

> the net result of the mistake was big enough that it did kind of require a response

That's a dangerous way of thinking. 9/11 was big enough that it required a response. Not sure if we'll ever reverse the airport security stupidity that was such a response.

smsm42 · on July 22, 2014

This is a result of a way of thinking called "Politician Fallacy": "We need to do something. This is something. Therefore, we need to do this". Of course, 9/11 required a response - however it didn't require just any response, it required appropriate response. TSA is not one, and it starts to be more and more clear to more and more people. OTOH, firing somebody who caused the network to go down at the critical moment may be entirely appropriate - one of your responsibilities is not to make such mistakes, you failed at it, you're fired.

vehementi · on July 22, 2014

I never understood the rationale of "you made a mistake, so you're fired". By making a mistake, the employee has increased her value in that she will never make that mistake again. If you're going to replace the employee you have to pay to hire someone even better (to recoup costs of talent hunt, training) and someone who somehow won't make a typo. It just seems like a situation that is strictly worse than keeping the current employee.

smsm42 · on July 22, 2014

>>> By making a mistake, the employee has increased her value in that she will never make that mistake again

This is a far-reaching conclusion. That assumes that a) no mistakes can be prevented before they happen for the first time and b) every mistake can be prevented after making it. The truth of either far from obvious. Moreover, it is routine in our culture that sever mistakes are punished - e.g. if you make a mistake of driving drunk and cause harm, you'd probably be punished, not lauded as model citizen since you'd never make the mistake again.

Moreover, if no punishment follows the mistake, why the mistake would not be repeated? What would be the motivation to avoid the repetition of the mistake - do you assume the sympathy for the co-workers would be enough? It is not always a sufficient motivator.

>>> It just seems like a situation that is strictly worse than keeping the current employee.

That assumes employees are a fungible commodity, and if you pay the same money you always get the same one. This is not true - you can find employee which would be more attentive, or one with more experience.

vehementi · on July 22, 2014

If you believe that you can find an employee that is more attentive or with more experience, and you are not laying off your employees right now to find those better employees, what the hell is going on? Are you just hanging out, basically sitting there knowing you have suboptimal employees, and eagerly waiting for them to fuck up so you have an excuse to axe them? You know your employees are (dun dun dun) capable of mistakes but it's expensive to lay them off so you're watching like a hawk for when you get to upgrade them?

The key difference here to me is mistakes vs negligence. Employee makes a typo -> It's a mistake. Not severe, negligent incompetence. It's a learning experience. The company is worse off by firing that person who has experience

If someone is slacking off? Yeah, fire them, that's not a mistake, that's negligence. You email a colleague in another time zone asking for help and they ignore you because you didn't CC their manager? Yeah, fucking fire that person.

I mean, in fact we have an industry based around the fact that people make mistakes: it's called software testing. Should we be firing developers when they make a mistake (i.e. their code has more than zero bugs)? That would be ridiculous. You're not even punishing them in that case - they're going to use their current salary at your shop to leverage a higher salary at the new place they (effortlessly) land at, whereas you're going to spend tens of thousands of dollars to hire that mythical developer that you should have fired this guy for a year ago?

meepmorp · on July 22, 2014

> By making a mistake, the employee has increased her value in that she will never make that mistake again.

Personal experience tells me this is not always true.

vehementi · on July 22, 2014

You're right, I shouldn't've used "never". But that mistake is an experience and people learn from mistakes. Now that person is less likely to make the mistake again.

I mean, the goal of the business is to create value / profit and find people who add value to your organization. Not to judge and suss out people who you discover are capable of making a mistake and saying "AHA! I FOUND YOU! You were an imposter all along not worthy of paying! Time to start from scratch again!"

sliverstorm · on July 22, 2014

IMO the fact that we have multiple examples of these kinds of accident-prone key pairs does partially exonerate whoever did this particular F6/F7 bit.

dceddia · on July 22, 2014

I can't count how many times I accidentally hit the Save Game button instead of the Load Game button in Half-Life. The keys were literally, like, right next to each other.

jessaustin · on July 22, 2014

...the whole window with ctrl+q.

OMG I've never done that but now that I know about it I'm very afraid. If I do it tomorrow I'm blaming you.

sushid · on July 22, 2014

Thankfully, Chrome has a built-in feature to prevent this from happening (on OSX at least). Just go to Chrome > Warn before Quitting and make sure there's a checkmark next to the option.

Now, if you accidentally press Cmd + Q, it should prompt a "Hold Cmd + Q to Quit" instead of actually quitting.

tremendo · on July 22, 2014

Or, Settings > On Startup... "Continue where you left off". This will restore your tabs after launching Chrome.

Houshalter · on July 22, 2014

I disabled that warning because it's annoying every time I want to close the browser, even without the dangerous keyboard shortcut. If it happens, you can go to the menu and find "recent tabs" or just ctrl+shift+t.

sp332 · on July 22, 2014

If you do this in Firefox, you can go to History -> Restore previous session. This will bring up all the tabs you had open last time it quit.

viraptor · on July 22, 2014

This helps: https://addons.mozilla.org/en-US/firefox/addon/customizable-...

Also, I made a mistake, it doesn't close the window. It closes all windows at once. Be afraid!

pooper · on July 22, 2014

ctrl + q very appropriately quits the application. If this came as a ticket to me, I'd close it as working as expected.