I've seen this called "pointing and calling" [1], Japan's train drivers use the technique to force themselves to perform actions and take notice of the current environment.
I personally took it to heart, it's a good system for forcing a cache miss in the brain - make sure you're on "database production" or "database localhost" etc.
War story time. Long ago, I worked for an interesting company that insisted on running its entire business on Linux desktops, all the way back in 1999-2002. Imagine running StarOffice/OpenOffice, Thunderbird, Netscape Navigator, etc., for your entire business back in 2000, including your executive team, marketing teams, everyone, most of whom had never even heard of Linux before.
Anyway, this being Linux, everyone's home directory was mounted on NFS. All our builds were standardized with a tool called SystemImager, which we could use to push out updates to everyone's desktop whenever we wanted. If there was a new version of KDE, we could pretty easily push that change out.
Sometimes it was convenient for me to work on updates to these images by chrooting into a directory containing the "image," which was really just an rsync tree. And sometimes, when updating these images, it was convenient to mount our NFS home directories in this chroot environment, so I could access things like an archive I had just downloaded on my own desktop.
And eventually we had lots of different images, and the old ones were using up a lot of disk space, so I decided to clean up some space by removing the old images. And these were fairly large images, with lots of small files, and this was before SSDs were a thing, so it made sense that deleting them was taking a while, and I stepped out to grab something to eat.
As I was eating lunch, I started getting the tech support escalations. But this wasn't that unusual, our users routinely had problems with the environment we had provided. They hated it, because it was in many ways terrible, and they made sure we knew it. So I wasn't terribly alarmed. I didn't think any major changes had been made, so I didn't hurry back.
By the time I leisurely returned from lunch, half the NFS home directories for our users were gone, along with all their documents, emails, bookmarks, or whatever else. Suddenly it hit me what had happened: at some point, perhaps months earlier, I had left our NFS home directories mounted within one of these image chroots. And now I had sudo rm -rf'd it.
We had backups, but they were on tape, and it took several days to restore, with about a day of data loss.
My favorite version is when that UPDATE or DELETE SQL query that you expected to finish instantly takes a few seconds before giving you your cursor back.
If someone just gave me a tool to show me the expected wall time of a query before actually running it, I would be quite happy. I would not even need that much accuracy; anything within one order of magnitude would be useful, and even within two orders of magnitude I would still use it occasionally.
You probably knew this already, and there are probably better solutions if you're not in the manual sysadmin world, but after I did that on a personal machine (a few decades ago, I think), I got in the habit of using `--one-file-system` when doing major recursive rm operations that weren't meant to cross filesystems. Or `find -xdev … -delete` for anything more selective.
It seems better to alias rm to "rm --one-file-system", assuming major cross-filesystem deletes aren't something you do so often that they need to be as ergonomic as possible.
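For the curious, a rough sketch of what those look like in practice (the paths here are made up):

    # recursive delete that refuses to descend into other filesystems
    # (NFS mounts, bind mounts, etc.) - GNU rm only
    rm -rf --one-file-system /srv/images/old-build

    # more selective cleanup that also never crosses a mount point
    find /srv/images/old-build -xdev -name '*.tmp' -delete

    # the alias suggested above
    alias rm='rm --one-file-system'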
Similar story, except we were using an NFS appliance that took hourly snapshots. As soon as we figured out what was happening, we had the storage team save off the latest snapshot. It was 1TB of data (a lot for the time) and took a week for us to restore.
A lot of companies still work in a similar fashion to what you described, maybe with root squashed, but still, it's very possible for something like that to happen nowadays!
I remember someone hit a bug with docker exec --rm years ago where it started deleting some NFS files that it shouldn't...
This reminds me of a time when a colleague and I were investigating some persistent D-State processes that were occurring when container processes were being exec-ed.
Once on the box, we wanted to create a container with utilities in the fs but didn't want to download an image tarball or look through the rootfs layer directories for one to use, so we just bind mounted host root onto another directory, beside the config file we were using.
This worked like a charm. Until we rm -rf'd the config directory and deleted host root in the process.
In our case, fortunately the consequences were minimal as all workloads were stateless. The container scheduler moved all the workloads to other hosts and the host scheduler noticed this VM wasn't responding any more and rolled a new one. The whole thing resolved itself in about 5 minutes with no interaction from us - so that was pretty neat.
I once cloned a directory for standing up an environment via Terraform. I modified all of the environment variables and config and ran it. It worked perfectly. Except I'd forgotten to wipe out the Terraform state, which meant that in the process of creating a new environment, it completely deleted the environment I had cloned. That was my initiation into being very experienced :)
Some time ago, it was common in Unix sites to have an NFS filesystem mounted on all machines that contained locally-built binaries to augment those provided by the operating system. At this site, we used a bunch of different platforms: OSF/1, Solaris, Linux, HP/UX, etc. So we had a large filesystem containing the source code, and built binaries for all the different platforms, and this included heaps of things, from Bash upwards.
A colleague of mine accidentally ran rm -rf on this filesystem.
It was taking a loooong time, so he realised and killed it, but not before it had removed a heap of stuff. Because this was something that could be rebuilt, it wasn't backed up, so we had to go through the process of downloading the tarballs, and recompiling everything for all the different platforms. It took a few days to recover most of it, and weeks to completely restore things.
The day after the incident, when he arrived at work, he found his keyboard was missing a few keycaps. It took him a while to realise that there were four gone: 'R', 'M', '-', and 'F' ...
Reminds me of when I accidentally deleted a virtual hard disk I had a few years ago, because I'd copied it earlier and I thought I still had the other copy left. Only afterward did I remember I'd done the exact same thing to the other copy earlier... thankfully the information on it wasn't critical, but it was kind of terrifying to realize it very well could have been.
I deleted our production CRM database meaning to delete the test database. While my boss was running queries on the database for setting my quarterly bonus.
Good news is that I was deleting the test database to ensure that the recovery from backups was properly automated, so it wasn't down too long.
I have been that boss. Is that you, Wendel? In any case: the deletion even had a "type your app name to confirm" prompt, but I knew I wanted to act on production; the issue was deleting the wrong one of multiple production databases. The takeaway was to grab a second pair of eyes to review any dangerous operations.
Yup. Senior dev here, my own devops config screw up wiped out all production sales order data earlier this year. Had to restore from multiple backups, took a while. Stressful experience.
Consider network partitioning so dev/test/accept just has 0 contact with prod.
Most of the worst production issues I've been involved with have come from trying to fix a minor issue and then somebody making a mistake. The way our brains are wired to handle stress isn't really useful for debugging complicated problems.
Ever since hearing about point-and-call, I've started using it in the kitchen when turning on the stove. I used to destroy one or two pans a year by turning on the wrong burner, but it's now been about a year and a half and I haven't screwed it up yet.
The knobs are labeled with a terrible little glyph meant to indicate which is which, and I've supplemented this with plain-english Brady labels "front left", "front right", etc. Now I speak the words above the knob, and point to the burner. It felt goofy at first, but now it feels normal, and like I'm tempting fate if I skip it.
I'm curious how exactly you managed to destroy pans. I've never destroyed a pan in my life, and take no particular precautions - is this a common thing? Is this more common with non-stick stuff or something?
The non-stick ones especially, but even plain metal pans will warp if they get hot enough. And then they don't sit flat on the burner, which might not matter on a gas stove, but contact with an electric burner is pretty important.
Not sure how it is in other countries, but don't the knobs when going left-to-right always correspond clockwise to the burners, starting at the lower left? And the oven knob is to the right?
My four knobs go front to back. I don't know what order they're in - the glyphs are fairly readable to me. I've seen this arrangement plenty, it's not unique.
Worth mentioning that, assuming the single study on the matter can be believed, the pointing and calling method is extremely effective in reducing the incidence of silly mistakes (that is, mistakes made in simple routine tasks, by competent individuals).
Unfortunately, it strikes many as looking rather silly, so it hasn't been widely adopted.
I learned a technique from a gray beard[0] when I worked as a student sys admin for the CS dept over two decades ago. Whenever typing a destructive command, he'd take his hands off the keyboard and drop them to his side, re-read the command, then put his hands back to press enter.
I do this whenever I'm on a production server (which is rare anyway). I use different colored prompts for local and remote shells.
[0] Technically he had no beard and if he had, it wouldn't have been gray.
Could be related to me doing the electrician's equivalent of deleting production DBs. I've drilled through the comms cable to payment terminals during opening hours. I've run over a copper gas line with a scissor lift. And yes, I've cut live 230V cables with hand tools.
That sinking feeling in your stomach you get immediately after doing something bad - it's universal across professions.
Thankfully, I've never fucked anything major up, and I've had my hands in hospitals, power plants, ISP fiber backbones, police stations and whatnot.
> I've drilled through the comms cable to payment terminals during opening hours.
A friend of mine who does fire alarm systems was tasked to install one at a bank branch. He found out the hard way that one of the cables for the safe's safety system wasn't in the place where it should have been according to the plans. Safe's safety system hosed, bank branch closed for repair.
Solid tip. For GUI-enabled servers, use distinctively coloured wallpapers. I recommend bright red for production machines. The image itself can be just about anything, provided the colour is clear.
Doesn't hurt to use an image that's related to the server's purpose, and to put the name of the server right there in the wallpaper somewhere.
Using iterm2, you can set a "badge" (large text overlay) on a terminal tab. I have a short shell function (`ib foo`) that sets the badge to arbitrary text. It's NOT as good as setting the terminal theme, but it's still very helpful to use it like this:
ib production && ssh production-machine
ib demo && ssh demo-machine
It's definitely helped me when testing a fix on a demo or staging instance, and has helped me avoid doing it on production accidentally.
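In case it's useful, a minimal sketch of such an `ib` function, assuming iTerm2's proprietary SetBadgeFormat escape sequence (the badge text has to be base64-encoded):

    ib() {
      # overlay the given text as a large iTerm2 badge on the current tab
      printf '\e]1337;SetBadgeFormat=%s\a' "$(printf '%s' "$*" | base64)"
    }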
A similar tip I picked up long ago: If you're typing a dangerous command, first type a `#` (or `--` if it's SQL, etc.), then the command. Then read it. Then go back to the start of the line and remove the comment and run it.
I always do destructive SQL commands in two steps: first run a select using the WHERE clause you intend to use and verify which records will be affected, then hit the up arrow and edit the beginning of the query leaving the WHERE intact.
I also like adding redundant conditions to the WHERE so a typo in any single one of them won't sink me.
For the rare but critical manual SQL mod, our common safety measure is to wrap every DELETE or UPDATE in BEGIN TRAN...ROLLBACK TRAN first. Run it on test systems or snapshots multiple times, checking the result inside the transaction.
Finally, change ROLLBACK to COMMIT only when you are positive all is well.
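For what it's worth, a rough Postgres-flavored sketch of that wrap, combined with the select-first check mentioned elsewhere in the thread (the orders table and its columns are made up):

    BEGIN;

    -- look before you leap: which rows will this touch?
    SELECT id, status FROM orders
     WHERE status = 'stale' AND created_at < now() - interval '90 days';

    DELETE FROM orders
     WHERE status = 'stale' AND created_at < now() - interval '90 days';

    -- verify the result while it is visible only to this transaction
    SELECT count(*) FROM orders WHERE status = 'stale';

    ROLLBACK;  -- change to COMMIT only once everything above looks right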
IIRC (without checking the manuals) data-definition commands might not be covered by such transactions: altering or dropping tables, and possibly truncates.
PostgreSQL is quite good about DDL being transactional. So I was surprised (tbf, I shouldn't have been) when Redshift autocommitted after a TRUNCATE. But DROP TABLE is transactional, go figure.
I use an alternate version for SQL: when running any modification on any kind of sensitive database (which is a bad practice in itself, obviously, but sometimes you don't have a choice), always type in the WHERE clause before the table name (added bonus: do a SELECT first with that clause to see what you are modifying).
That way, if you accidentally send it, the command fails and nothing happens.
I've done this for several years (also after seeing a video about Japanese railway operations). It doesn't seem to catch on.
It's also not perfect; it does not catch mistakes concerning "non-local" state, e.g. configuration files in /etc merging with one in . merging with some command line options. (Personally I try to avoid writing tools with defaults of this sort, but Java developers especially seem to have different opinions.)
Unfortunately if you do P&C and still make the mistake due to the aforementioned tooling, you look even stupider.
Around industrial machines, I've long held and promoted the view that the machine is _trying_ to kill you, _trying_ to damage itself, _trying_ to ruin the workpiece. Only by outsmarting it at every turn, and having safeguards against every mishap, can you go home at the end of the day.
When something happens despite all that, just step back and realize how much worse it could've been, and how successful your safeguards have been up 'til that point.
Then look carefully at the procedure. Is there something about the naming or structure that could be more clear? Can you think of near-misses that resemble the failure you just experienced? Are you using boobytraps in production? Symlinks and overlay filesystems seem clever in the moment but they're bound to subvert our intuition someday. Perhaps you should get in the habit of always using full absolute paths, for instance.
There's always another gotcha, but if your workflow doesn't look as over-the-top safety-silly as aerospace, you're not doing as much as you could be. (Hint: It's not silly.)
I searched Youtube for examples of this. This is a little bit staged, but it seems to be a real checklist they're going through: https://www.youtube.com/watch?v=JG7SkOQDDt0
Though they're not perfect. They said that one pilot is supposed to read the item, the other pilot say the answer, and the first pilot visually confirm it; but at 1:42, I noticed the first pilot say "emergency exit lights", hear the confirmation, and move to the next item without her eyes moving away from the list.
I'm not sure which of several possible conclusions to draw from that. ("Humans suck", "it is indeed staged", "the procedure has enough redundancy that the chance they're both careless on a given step is small", "the pilots feel that the emergency exit lights aren't particularly important", ...)
Routine is the killer. Have a look at the fatal maglev train accident in Germany. The service car was on the track. The presence of the service car in the service bay (and not on the track) could have been seen visually by the operator (driver) in the control centre just by turning their head. (If I remember correctly.)
Rock climbing is remarkably similar. When a climber begins up a route the standard exchange with their belayer (the person managing the rope and keeping them alive in a fall) goes something like
A: "Belay on?"
B: "Belay on"
A: "Climbing"
B: "Climb on"
Then the climber begins.
It's interesting to me that highly regulated and totally unregulated activities have evolved extremely similar processes. I suppose having your life on the line is a good motivator to follow best practices.
Prior Navy Nuke here. We called it PRO (Point, Read, Operate)- we’d point at the thing we were going to manipulate, state what we were manipulating, and announce the completed action.
For certain procedures we had a second party (“reader”) observing and acknowledging each part of each step.
Operator (Gesturing anti-clockwise while pointing at valve XYZ)
Operator: Opening valve XYZ.
Reader: Opening valve XYZ, aye.
Operator: Valve XYZ is open.
Reader: Valve XYZ is open, aye.
Operator: Indications of flow
Reader: Indications of flow, aye.
People can still get complacent, and things can still get missed but the deliberate mentality goes a long way. Now when GitHub makes me type out the repository name before I can delete it, I sometimes copy/paste... YOLO.
I've noticed from pair programming that the person navigating with a mouse is far less able to read and interpret their surroundings or pick up typos while typing, than an observer that simply has to watch what the other person is doing.
Like when you've just entered a directory and are looking for a file to click on, the observer can literally locate and point to the file 5-10x faster than the mouse operator can.
The observer seems to interpret the information that results from the directory listing faster than the person who just did the double-click to enter the directory because they don't have the muscle coordination context switch and can immediately move to interpreting the results.
It's probably because mouse manipulation uses brain infrastructure that is more recently evolved, but observe-react is a lot earlier in the brain processing pipeline evolutionarily, and a lot more refined/involved.
Since I have a vision impairment, I'm sure the effect is amplified very much for me, but using the mouse is such a massive break in flow:
- First you have to lift one hand up off the keyboard and put it down on the mouse. This may or may not mean taking your eyes off the screen.
- Then you need to find the mouse pointer on the screen
- Then you need to aim for what is usually a relatively small target and move the pointer there.
- If you're right-clicking, the right-click menu usually presents more small targets you need to aim for.
- If you need to use the keyboard, again you have to move your hand over to the keyboard from the mouse.
For finding the pointer, I developed this unconscious habit of slamming the mouse pointer to the very top-left of the screen. It's difficult though when on someone else's machine, where your brain isn't used to the pointer velocity or where multi-monitor means that slamming the mouse to the top-left actually puts the pointer on another monitor.
People look at me in awe when I'm using a two-pane file manager but honestly not having to take your hands off the keyboard and not having to move your eyes off the screen gives so much better flow. It's also why I like the UI of Blender - one hand on the keyboard and one hand on the mouse at most times.
I think this is because writing software is so much more than operating switches and controls. I really hate pair programming for this reason, but I love industrial-style controls and protocols involving multiple people.
Back in the '80s I worked on a financial system (SWIFT interface) for an Italian bank. It went operational and we observed 2 operators effectively doing "pair operating". We just thought it was weird Italian style socialising - one had the keyboard and the other was chattering away with a commentary.
But they were surprisingly effective!
I accidentally learned, when teaching a course at a site with too many people for the available machines, that pair exercises were very effective - I got lots more questions, and overall learning went way up. If the pair discussed it and couldn't find an answer, they would have the confidence to ask. On their own, neither would probably bother; they'd just wait for me to go through things.
And it should be kept in mind that almost none of those procedures were intuitively obvious things to do. As the saying goes, safety standards are written in blood.
Back when I shelled into servers more, I really liked having my deployment put the environment in the prompt and set a red background on production for similar reasons. It only takes a small change to jar you out of habit.
>> I personally took it to heart, it's a good system for forcing a cache miss in the brain - make sure you're on "database production" or "database localhost" etc.
Yeah, ouch. More ouch if it's the other way around - you delete the test database and it's not the test database.
> you delete the test database and it's not the test database.
> (long story)
I think you can skip the long story, as most of us can tell a story similar in theme if not specifics (and sometimes, probably some similar specifics too). ;)
With great power comes great responsibility (to not completely screw stuff up because you were on autopilot for a second...)
I worked at a company where someone deleted the production database by accident and the snapshot mechanism hadn't been working AND the alerting for the snapshot mechanism was also broken. Fortunately someone had taken a snapshot manually some weeks prior and they were able to restore from that and lose relatively little data (it was a startup, so one database was a big deal, but weeks worth of data was not such a big deal).
Firing the person who happened to be at the wheel when a mistake like this occurs never seems like the right choice to me, especially if their performance to-date had otherwise been good.
Everybody has off days, or just instances where circumstances misalign in just the wrong way. To pretend otherwise is silly; instead, it's the leader's/team's responsibility to ensure that those sort of off days don't lead to massive losses via redundancy & the sort of measures we're talking about here & in the OP. Firing somebody in these circumstances just acts to severely reduce morale, since we all secretly know in our hearts that it very easily could have been us.
Firing in this case just seems retributive. It's not going to bring the lost data back, and you've just eliminated the very person who could have told you most about the chain of events leading to the incident in question to help you guard against it in the future. These incidents usually sound simple at the surface level ("I clicked the button in the wrong window") but often hint at deeper, perhaps even organizational, issues. A lack of team focus on reliability/quality, a lack of communication or trust about decisions made (or not made) by higher ups, or so on.
And they are probably the single least likely person to cause a similar incident again -- that person will now likely be double and triple checking their commands for eternity.
Agree. There is never a single cause to this kind of error. It takes a village. Someone didn't name things properly, someone else didn't store backups properly, someone else gave everyone root access to production, etc. It was inevitable the database would be deleted - doesn't matter who actually did it.
If your CTO scattered those landmines all over then "not stepping right" is not an error. It just sucks.
Sometimes. And sometimes they make the same mistake over and over.
We had an admin in charge of our storage. He had worked with our old vendor's SAN for years, then we got a new SAN. Trained him/certified him etc. He "accidentally" shut down the entire SAN. That brought down the entire company for over 9 hours.
Fast forward two years later, he screwed up again and caused a storage outage affecting about 1100 VMs. Luckily not much data loss, but a painful outage.
Then a month ago, he offlines part of the SAN.
Some people never learn, and recognizing this early is usually better than letting someone continue to risk things.
3 mistakes in... >2 years? I feel like it's really hard to tell if the problem is really the person at that point. Have you had others perform the same job for a similar duration to see if they avoid the same mistakes?
If you made a list of every mistake each person makes in 2-3 years, and omitted all other detail, pretty much everybody would look like a terrible person. Context, frequency, etc. all matter.
If particular systems or people are seeing a high frequency of mistakes, maybe the system design is at fault, not just the person. Obviously it's hard to do in practice, but the ideal is to design systems that are mistake proof.
He was trained and certified on the new SAN, and surely some of his prior experience on the legacy SAN would translate. Just as moving from AIX to RHEL/CentOS wouldn't invalidate all your skills and experience.
It was a real accident when he shut down the SAN the first time. I don't know why I put it in scare quotes.
> These incidents usually sound simple at the surface level ("I clicked the button in the wrong window") but often hint at deeper, perhaps even organizational, issues.
These words reminded me of a story about similar/different "flaps" and "landing gear" controls on a plane - where crashed airplanes were also blamed on pilots first, before a trivial engineering/UI solution was implemented:
https://www.endsight.net/blog/what-the-wwii-b17-bomber-can-t...
Nickolas Means has an absolutely wonderful set of talks on themes like this. Particularly relevant here I think, is his talk: "Who Destroyed Three Mile Island?" - which goes through the events that occurred at the nuclear power plant, the systemic problems, and how to find the "second stories" of why failures occurred.
There's a really good book describing this phenomenon called Behind Human Error. It speaks of "first stories" and "second stories" and how in analysis of incidents, it is all too common to stop at the first story and chalk it up to human error, when the system itself allowed it to take place.
"Both cloudformation stacks were identical (instance names, etc)."
This is why it's a good practice to include the environment name in the resource names when it makes sense. Even better, don't append the env name, but use it as a prefix, like ProdCustomerDb instead of CustomerDbProd. I also like to change the theme to dark mode in the production environments as most management UIs support this. One other neat trick is to color code PS1 in your Linux instances, like red for prod, green for dev.
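A rough sketch of the PS1 trick (the ENVIRONMENT variable here is hypothetical; set it however your provisioning does):

    # e.g. in /etc/profile.d/prompt.sh or ~/.bashrc
    case "${ENVIRONMENT:-dev}" in
      prod*) PS1='\[\e[1;97;41m\] PROD \[\e[0m\] \u@\h:\w\$ ' ;;  # bold white on red
      *)     PS1='\[\e[1;30;42m\] dev \[\e[0m\] \u@\h:\w\$ ' ;;   # black on green
    esac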
> One other neat trick is to color code PS1 in your Linux instances, like red for prod, green for dev.
This is definitely a nice one to add. Though I did work with someone once who believed that all servers should be 100% vanilla and reverted my environment colors.
In container-only shops with no ssh, this is less of an issue, and instead you rely on having different permissions and automations for different environments.
That's very similar to what happened to me - except I didn't delete any backups, thank the Great Old Ones. And I didn't get fired.
Basically, I had a habit of starting a new SQL Server Management Studio instance in its own window for each database I was working on. At some point this struck me as wasteful, for some reason, so I closed all my windows and opened all the databases in one window. Then sometime after that I went to delete the test database as a routine maintenance task, but of course I was used to clicking the database at the top of the left pane in SSMS, which was the test database when it was the only database in a window... but now happened to be the production database. Then five minutes later I got a call from the client company that used our system, to ask me if there was any maintenance going on, because everyone's client had just crashed.
The horror when I realised.
It was educational, though. I don't think I'll make that particular mistake ever again. And my bosses were ace to be fair, probably because I worked my ass off to correct the mess that ensued.
When I worked in production environments, I used to set up little Firefox userscripts that would add a banner or anything visual to the production site. It's entirely client side and easy to customize.
> I've seen this called "pointing and calling" [1], Japan's train drivers use the technique to force themselves to perform actions and take notice of the current environment.
The concept makes sense, though I don't quite fully get how to translate it to other contexts besides train driving where unexpected and unpredictable events come up all the time. Let's say you're driving a car and the traffic light turns red. Do you point at the traffic light, say "red", point at your brake pedal, say "brakes", and then hit the brakes?
In high school, I drove a 1993 Toyota Tercel. It was a functional, reliable car, but it had no keyfob to lock the doors remotely.
Getting out of your car, pressing the lock button on the inside of the driver's side door, and shutting the door are all routine, boring actions that make it easy to forget your keys inside the car. The keys can go in all kinds of places as you climb out of the car - jacket pocket, pants pocket, center console. It is very easy to lock your keys in your car.
I quickly learned to hold my keys in one hand, say out loud, "Keys in hand," and then lock the door with the other hand.
This technique is perfect for any repetitive action that could go wrong with non-trivial consequences, and there's lots of that in everyday life.
I wake up in the mornings with "Shit Shower Shave" and leave the house with "Wallet Watch Testicles Spectacles". Simple mnemonics work, doubly so if you actually say them out loud and check them each off.
I do that exact same thing, and I haven't smoked in 3 years. The downside is that if I'm supposed to remember to bring something in addition to those 3 things, I'm extremely likely to forget it. If it's super duper important, I tie it to the door handle.
To remember to bring a physical object, I leave my keys on it. Downside, sometimes people will bring my keys to me when they find them in strange places, like the fridge.
Definitely a good idea. In the subject of the analogy (software incidents) I think both should be done -- a regular and habitual focus on important/high risk commands via procedure, and preparations for the time when the inevitable still happens because people are people and it's impossible to fully predict all potential sources of unintended consequences. A lack of habitual focus when important consequences are at stake could lead to an over-reliance on the safety nets, and you really don't want your safety nets becoming routine. Otherwise you'll need safety nets for the safety nets.
Repetitive tasks are exactly what pointing and calling helps with. The intent is to prevent the brain from going on autopilot for a task that happens exactly the same way 99.9% of the time, in order to prevent disasters that last 0.1% of the time.
Traffic lights are a lot more random (and therefore mentally engaging) than the types of things train conductors are pointing and calling.
Whenever I have something in my hand that I'm about to put down for a second in the exact absent minded kind of way that would leave me searching all over the house for it 5 minutes later, I say it out loud. "Headphones on the table by front door."
Embarrassingly I once lost a hamburger while still holding it... I had my arm propped up on the back of the chair and it was just out of my peripheral vision. Not my smartest moment.
I lost my sunglasses when I was wearing them! We were going to a state park for a hike. It was a 2 hr ride, during which I was wearing my sunglasses but forgot about them. As we got out of the car to start the hike, I spent 5 minutes searching for my sunglasses in my backpack until my friend asked what I was searching for.... Maybe I should be saying "sunglasses on" from now on.
Funny, there is a Polish rhyme [1] for children based on the same concept: a person searching the whole house looking for glasses which they were wearing all the time :)
I believe the trick is to anticipate failure, and call out the normal thing instead. So you’d always slow down at every light, and only speed back up after calling out green. This is what all drivers are actually supposed to do, although I fully realise nobody practically does that, which is why we get so many automobile accidents all the time.
Only speed back up after calling out green and intersection clear.
I don't necessarily always do that, and don't make audible calls, but when driving at night or in inclement weather, I try to make extra effort to check for unexpected cross traffic.
The pointing and calling performed by Japanese train drivers is very much about expected events. "Green signal" would be one of the most common call-outs. For example:
Your example is a reactive event. Something happened in your environment.
This idea is more useful for situations that you are initiating, and where feedback is not immediately obvious.
An example could be turning your car’s lights on at night. Before starting the car, you force yourself to point to the switch, say “lights on”, and do it.
I use this with keys. When leaving my office, house, or car, I hold up the key in my hand and establish sight (I don’t say anything out loud). Then I lock the door.
I'm a photographer, and I used to get annoyed that I'd have little distractions on the edges and corners of the frame, because I was focussed on the subject and overall composition. I trained myself to sort of bounce my eyes around the sides of the viewfinder when pressing the shutter (think like the DVD player menu). Now I almost never forget to check.
I don't think it really applies to stuff like driving, which almost has to be muscle memory to work at all. even with something routine and non-urgent like switching gears in a manual, the steps have to happen faster than you can say what you're doing.
a good example from normal life is (physical) key management. I used to always forget my keys when walking out the front door, which was a big problem since it locks automatically. to solve the problem, I made my back right pocket be the designated "key pocket". I now slap my right butt cheek whenever I leave a building. it might look weird to observers, but I have not once forgotten my keys since I implemented this system.
After losing my wallet several times and not having a clue when the last time I had it on me was, I implemented a similar system. I now habitually triple tap my three designated pockets for phone, wallet, keys, every time I walk through a doorway.
That way, if any of them are missing, I know they must be in the room I just left.
Invert it and I think it works. Always prepare to stop at an intersection. Then point out it is green and call out you do not need to engage in stopping.
It may seem silly, but if we asked people who drive 30+ minutes every day if they have ever accidentally run a stop sign or red light, I suspect the numbers would be quite high (though these likely happen at times/places where the chance of an accident is smallest, such as empty roads late at night).
I teach my children to point in the direction cars can come from before crossing the road. He used to just swing his head around before; now he has to search each direction and point there to direct his attention, and it works excellently.
As others have pointed out, this is for repetitive tasks that your brain wants to automate away, but you really want to keep in attention.
It can be used for exactly the same purpose: checking the environment before doing the action.
E.g. force yourself to read the “production” part of your prompt before running the command. Point at the user name before deleting its record. Read aloud the version name before sending it to deploy.
It really makes a difference between just glancing at the info and having to parse it as part of an action.
Let's say you get a request to delete users #s 1, 17, 152, and 43.
Now you can have the request and database administration tool open and point and call at the numbers and any queries and make sure you are deleting the right users.
OpenShift does this by forcing you to write the name of the project you are about to delete. It was something that used to annoy me, but after reading this I understand it is a good call from their side.
1. Avoid silly terms our industry should have ditched years ago, like 'drop'
2. Make sure that nobody will ever change HARD_CODE_TEST_DATABASE_FOR_SAFETY because they thought it should 'always be the active database' or whatever.
I have had many disasters in my software career because I just wantonly hit "Y" without thinking about it.
I have noticed, since learning to cook at a professional level in the kitchen, that I point and call out a lot more in my other activities too. From "hot behind" and "knife" and "oven is over temp", to "saw blade is live" and "circuit is live" in the workshop, to "production server" and "erasing records" in database maintenance. Some days I feel like Sigourney "I have one job, damnit" Weaver in Galaxy Quest. It's a useful stop-think-go sanity check.
The video doesn't really explain why conductors point at the signs - it just says "to prove they're paying attention". Paying attention to what? The answer is that they are verifying that the train is correctly positioned in the station so that all of the doors will open on the platform.
This comes up every few weeks on HN, but nobody has ever offered any statistics that would suggest this is as good as, let alone better than, just having the trains handle alignment automatically. It's a task humans are bad at and machines are good at, so just giving it to machines makes more sense, modulo unions.
London Underground hasn't had guards for decades at this point, and the Docklands Light Railway hasn't even had drivers (there is a member of staff who is trained to be able to drive it on every train, but they are usually doing other things) since its creation. If they're misaligning often enough for it to be possible for New York to be statistically better, I haven't seen anything about it, despite repeatedly asking.
Actually what exactly is the member of staff doing on the DLR that is necessary, other than answering tourists' questions and putting a triangular key into a receptacle at every stop and then turning it? I have not been able to figure this out.
In the Netherlands, the NS has two types of trains that go between towns. Intercity and Sprinter. Sprinters have someone who will walk onto the platform at every stop, or failing that, lean out of the carriage, verify that no one is getting in, and then step into the train again to put the key into the receptacle and then turn it. Following that, the doors close. In contrast, there is no such person on Intercity trains; they do fine without. There may be a conductor who checks tickets. In comparison to the DLR, both Sprinter and Intercity trains have drivers.
Is there some requirement or function that I am missing that requires a dedicated member of staff to perform this key-turning ritual at every stop on the DLR and Sprinter, or is this simply to appease the unions?
It could be that Sprinters are meant to be more lenient towards people running to get on than Intercities, which might have a stricter schedule.
It's a GoA 3 system, so it isn't designed to be safe without a human staff member on every train. There are GoA 4 systems which do not need a human but the DLR isn't one, so while it would seem to operate normally if you just let passengers operate the doors - when anything goes wrong those passengers are in trouble because the system design assumes a trained member of staff is there to fix it and now there isn't.
That triangular key opens a panel by the front left seats of the train, which reveals a complete set of controls for manually driving the train which that member of staff is trained to use. If the GoA 3 system has given up when the train is just out somewhere random then "just get out" while technically possible since there's a walking route along the side at all times - is clearly not ideal even for able-bodied passengers, so in fact the member of staff will drive the train manually to a station unless obviously that's impossible somehow (e.g. terrorists blew up sections of track either side like a Hollywood movie).
Because humans are bad at driving trains, they aren't allowed to move at full speed: they can either let the GoA 3 automation oversee everything (e.g. it won't let them go anywhere it wouldn't be willing to go) at a reduced speed, or, when that's not useful, switch off all automation and move at a crawl with no oversight.
Every morning the first train of the day on each route is driven in the first of those two modes, because overnight human maintenance teams sometimes manage to leave tools and equipment on the line and the automation doesn't know not to drive the train into a welding kit left on the track by some idiot who just discovered his wife is leaving him or whatever. So the human staff member's job is to drive the train (with the AI preventing them smashing it into other trains) while looking out the front window for problems.
I try to do that during incidents. I'm not 100% there, since it's not a company rule, but it helps me at the time and later when writing up details: "I see <behaviour X>", "<Y> should fix it because <Z>", "I'm starting to do <Z> now and seeing ...", etc.
It also helps when Z results in a total meltdown and you need to pull in more people to help out, so they have context of what happened.
Killed just under 1k access points when they all upgraded in one go. They had no problem erasing the firmware, but when they all tried to download the new one at once it killed the service, and we ended up with a lot of blank APs. The confirmation message for 1 or 1000 APs is unhelpfully "This will overwrite all existing system images. Are you sure Y/N".
I think a router analogy might be more precise - more like fast path / slow path - where most incoming packets hit the fast path in hardware, and exception packets take the slow path through the CPU.
I do this with my kids, gesturing (not pointing) as it helps my mind remain focused on truly listening to them amid everything else going on. I probably look ridiculous, but I'm a better father for it so ¯\_(ツ)_/¯
I wish it were possible for similar prompts to appear before all sorts of policy-makers and bureaucrats. "It appears you are about to institute a policy which will require 400 million patients to sign an additional waiver every time they visit a clinic, this will waste a total of 354,921 human hours within the next year alone. Please type 354,921 to proceed."
The motivations are different: the cost to the rule maker of the effort by all those people is nil. While the cost of not adding the paper is the risk of something happening in the future which could cost them their job. This is why the shoe removal theatre was added to flying: the risk of something happening is essentially nil, but if it did, heads would roll.
This is not a criticism of bureaucracy or regulation BTW (I'm a fan of both, in general). It's simply a recognition that there's a misalignment of objectives.
Not sure how to analyze the calculus in the case of rachelbythebay's observation. Certainly there is one misalignment, which is that if the tool has sharp unprotected edges (e.g. can take the company's whole site down), the person who ran the program will be blamed, not the person who wrote it. Unless they are the same person, it's hard to get a proper feedback loop in place. The only tools we have are coding standards and code reviews: bureaucracy!
Yeah, it's quite surreal. "Hey, privacy is important, so let's make it so that to handle people's private data, you need permission from them." All right, now whenever you try to e.g. send a (paper) mail, you have to sign a waiver that yes, you do allow the post office to see and handle your name and your mail address. Not only that, all such waivers seem to be written as "I hereby allow <insert the legal entity> to handle my private data in whatever way they want to", so we're back to square one, just with more perfunctory paperwork required.
It requires the Office of Management and Budget to calculate the impact of record-keeping requirements on time and privacy, among other things.
I do not believe it has resulted in a reduced record-keeping burden. For the most part I simply see an estimate of how long it will take to complete my tax forms and permits, on the form itself. Perhaps others have different views.
Hard to say; knowing the cost of a new process could have informed a new design or requirements. We don't know what the other path held. But I believe in general that having more information allows us to make better decisions, so this is a good act.
I have a habit of making CLI tools that potentially do dangerous things default to dry-run mode. For example, instead of the typical `--dry-run` or `-n` option, my scripts have a cheesy `--do-it` flag to run for real. It is annoying as hell to my colleagues, but it has saved the day many times.
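A bare-bones sketch of that pattern (the `--do-it` flag name is from the comment above; the cleanup target is made up):

    #!/usr/bin/env bash
    # Dry-run by default; nothing destructive happens unless --do-it is passed.
    DO_IT=0
    [[ "${1:-}" == "--do-it" ]] && DO_IT=1

    run() {
      if (( DO_IT )); then
        "$@"
      else
        echo "[dry-run] would run: $*"
      fi
    }

    run rm -rf /srv/images/obsolete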
A coworker of mine would write all his bash scripts to echo out the commands it would run, and then to actually run it he would pipe it to bash. This way he could inspect the commands to make sure they were correct before running them.
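Presumably something along these lines (the paths are placeholders):

    # cleanup.sh only *prints* the commands it would run
    echo "rm -rf /srv/images/2019-q3"
    echo "rm -rf /srv/images/2019-q4"

    # usage: read the output first, then pipe it to bash to actually execute
    #   ./cleanup.sh
    #   ./cleanup.sh | bash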
I would love a shell that allows you to "run" a script in manual mode - where at the end of every command, every statement, it prints what the next command will be, with all variables expanded or otherwise called out, and then requires you to hit "enter" to make it proceed. I write a decent amount of something between a README and a shell script. I've already got an awk one-liner that parses the shell out of Markdown. I typically copy+paste, line by line, from my README and add a bunch of echo statements to verify what I'm doing.
The nice thing is that in PowerShell, unlike bash, this flows through to the vast majority of other commands. If the script has the snippet above, then you don't have to litter it with "if ( $userSaidYes ) { ... }" blocks all over the place.
Similarly, PowerShell automatically wires up logic to produce all of the useful modes you might want:
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend
This is very fiddly to implement manually, and "Suspend" is likely impossible for most shells.
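Going back to the "manual mode" wish a few comments up: bash can approximate it with a DEBUG trap, though $BASH_COMMAND shows each command as written rather than with variables expanded, so it's only a partial answer:

    #!/usr/bin/env bash
    # Pause before every command, show what is about to run, and wait for Enter.
    trap 'read -rp ">>> about to run: $BASH_COMMAND  [Enter to continue, Ctrl-C to abort] " </dev/tty' DEBUG

    IMAGE_DIR=/srv/images/old   # hypothetical target
    echo "Cleaning $IMAGE_DIR"
    rm -rf "$IMAGE_DIR"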
I do something similar with my scripts, but with a `--go` option, even on a script that requires no other options, just so that if it's run without any options, the person running it gets a message saying what the script WOULD do if `--go` were passed in.
I do the same thing. All of my scripts have a -defang parameter which walks through the entire process, including placeholder log messages, but not actually performing the operation. My run books always say to run your exact command with this switch first, to proofread it. For some dangerous scripts, defang is enabled and has to be manually turned off. Defang is also nice because it will tell you e.g. here’s the size of the backup you’ll be restoring, or the filepath you’ve composed based on your parameters, or confirming that you’ll be replacing an existing thing instead of creating a new one. It has saved me many, many times.
I generally throw up a status-report type of thing - "you are applying $this_operation to $this_many_machines on $this_farm. Continue (yes/no)?" - and enforce full yes/no typing. Anything other than yes is a no.
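i.e. roughly (variable names borrowed from the comment above):

    read -rp "You are applying $this_operation to $this_many_machines on $this_farm. Continue (yes/no)? " answer
    if [[ "$answer" != "yes" ]]; then
      echo "Aborting."
      exit 1
    fi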
Even having a dry run mode is exciting. It doesn't even have to give complete results; just "I was planning to delete 3 files and create 7 files" gives a hint as to whether the command will blow up the system or not.
For interactive queries / surgery, you do have an option with a transaction (begin/commit/abort).
If it is Postgres (I don't know about other dbs), you can go a long way using "savepoints" and "rollbacks" to truly have trial-and-error-safe surgery on the db. Still dangerous, but quite helpful. I hate working on any other db without those features. Postgres also allows schema changes to be within a txn envelope.
Transactions and rollback is the dry run. The problem is that if you keep the transaction open for too long, you will block other updates to the same data.
Yep, I always write any update queries as a rollback transaction with some selects inside it to verify what the data looks like after it's done now, before I switch it to commit. I primarily use Microsoft SQL Server right now, so I also use WITH (NOLOCK) to prevent issues running my query will have with other updates.
Enough folks have replied that transactions are the way to go, but I just wanted to add that whatever interface tool you use for your database may have an option to force you to commit your transactions manually. For example PostgreSQL's default 'psql' shell has the "autocommit" option which, when disabled, requires you to manually 'commit;' before any changes take effect.
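For psql, a small sketch of what that looks like, plus the savepoint trick from the sibling comment (the users table is made up):

    -- with autocommit off (\set AUTOCOMMIT off, e.g. in ~/.psqlrc),
    -- nothing below is permanent until you say so:
    DELETE FROM users WHERE last_login < now() - interval '2 years';
    SELECT count(*) FROM users;          -- sanity check, still uncommitted
    SAVEPOINT after_first_cleanup;       -- checkpoint mid-surgery
    ROLLBACK TO after_first_cleanup;     -- undo only what came after the savepoint
    COMMIT;                              -- or ROLLBACK; to walk away clean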
I think an improvement to SQL would be for insert/update/delete statements to require a where clause, allowing something like 1=1 if you really intend to hit all rows. A safer but even more invasive option would be requiring an explicit end to the where clause as well (to prevent selecting a few but not all constraints).
I like this format in general, since it communicates that the command is severe/irreversible. Heroku implements a similar confirmation when performing destructive actions. Commands require you to pass a `--confirm ${APP NAME}` flag, so the original command itself does nothing. Of course, this doesn't prevent you including those flags in makefiles, etc. I once dropped a table in a side project by accident because I took the wrong tab autocomplete suggestion in a makefile.
I suspect someone who'd do that isn't going to take that or other precautions seriously regardless of it being aliased. It's still a problem that they're circumventing it, but I think you have a larger problem if someone with that mindset has access to production.
Reminds me of the proposal to keep the nuclear launch codes inside the body of an innocent volunteer, so the President would have to kill the person to get the codes.
If you believe we should never use nuclear weapons, then don't have them at all.
If you believe there is a case where it may be moral and rational to use nuclear weapons, why would you want to put a potential barrier in the way of their use? You could have a situation where everyone was agreed to use them but the president was physically unable to harm the aide to use them.
You can know that something is the right thing to do but not have the courage to physically harm someone to do it.
An interlock that you may not be able to unlock for reasons unrelated to the task at hand is a bad interlock.
>You can know that something is the right thing to do but not have the courage to physically harm someone to do it.
In this specific case the "thing to do" is literally to harm hundreds of thousands of people.
The reasoning behind this proposed interlock is that any logic which concludes that it is moral and rational to harm hundreds of thousands of people must also conclude that it is moral and rational to harm the "interlock" individual. Otherwise, it is likely that dropping the bomb would be a mistake.
> The reasoning behind this proposed interlock is that any logic which concludes that it is moral and rational to harm hundreds of thousands of people must also conclude that it is moral and rational to harm the "interlock" individual.
Yes, but you can know it's the right thing to do, but not be able to physically do it.
The president's ability to physically cut someone open is not relevant to whether it's a good idea to use nuclear weapons or not. Him being unable to do it tells you nothing about whether they should be launching the weapons.
If the president fails the test that tells you nothing about whether the launch is the right thing to do. Doesn't that fundamentally make the test bad?
> It is about forcing the president to look somebody in the eye before they kill them.
Right, but can you understand that 'the President being able to look somebody in the eye before they kill them' is not a requisite for 'the employment of nuclear weapons being justified'?
We require the president to be able to do B before they can do A. But what if A is the right thing to do but the President is not able to do B? Being not able to do B does not mean A is wrong.
Doing A cannot be the right thing to do if you think doing B is still impossible.
If you cannot kill your friend in order to kill a few hundred thousand more, how can it possibly be justified? I just struggle to come up with a scenario where that is the case.
Of course I’m of the school that thinks firing nuclear weapons is never a good idea.
But that is the exact point. Having a human interlock explicitly shifts the dependency. Knowing that you should launch nukes is no longer enough and being able to bring yourself to physically kill someone is the additional requirement that we are _deliberately_ adding to this process despite there not being an obvious logical link between the two actions before.
I believe it is a requirement. I believe that the natural bias would be towards using nuclear weapons when we shouldn't. I believe there is no possible world where the use of nuclear weapons is justified and the president couldn't also kill one additional person. I do believe there are cases where a president may use nuclear weapons when it isn't truly justified, and that having additional checks will help prevent that.
> The president's ability to physically cut someone open is not relevant to whether it's a good idea to use nuclear weapons or not. Him being unable to do it tells you nothing about whether they should be launching the weapons.
Our emotional systems are the product of millions of years of evolution and often (not always, but often) show better judgement than our "higher" faculties. Bringing that part of our capabilities into the decision-making loop is a very good idea.
I think it would work equally well if the president had two aides and had to order one to butcher the other, in front of her eyes, in order to launch a nuclear strike.
Regardless of the exact details, I think the point of this thought experiment is that for a head of state, the decision to launch a massive attack that will cause hundreds of thousands of casualties can feel a little abstract. "Bombing a city" can seem abstract, even if the president understands this means killing children. Understanding is quite different from feeling. However, if the act of ordering a bombing raid on a city involved physically murdering a child, it would definitely feel more immediate and less abstract.
Your point stands, of course. But the part about removing the abstractness of the act seems relevant when ordering people killed.
Everybody agrees that this is a nuke-them-all situation, but the president, having been given part of the task of ripping apart a human body himself, thinks more about the subject and decides another diplomatic round is a better option.
I think that's the point. I'm personally not an advocate of this because it seems to be a little too "beat you over the head" with its moral metaphor, but the whole point is that the President should have to personally kill someone to understand the gravity of what they are about to do.
From the perspective of an advocate I'd say: If they can't come to terms with killing one, who are they to execute hundreds of thousands?
> "If you believe there is a case where it may be moral and rational to use nuclear weapons, why would you want to put a potential barrier in the way of their use?"
Because you think the point where they become moral and rational to use is way way way further than commonly discussed, and you want to put many barriers of many kinds (physical, emotional, logistical) to delay their point of use without completely blocking them.
You could also say that if a person is incapable of doing the hard parts of the job, don't vote them into the position. (The downside of that is that you'll end up voting in someone who doesn't mind killing someone in cold blood, while expecting that to be a filter that brings more empathy to the position.)
It's an attempt to make an abstraction concrete. Think of it as the trolley problem in real life.
Stalin is famously supposed to have said, "one death is a tragedy, 100,000 is a statistic". Cynical or not, it is how humans think.
> If you believe we should never use nuclear weapons, then don't have them at all.
Strategic game theory and Mutual Assured Destruction depend on the possibility that the other guy will use them if you do, and may be the only way to prevent their use. Interestingly this is one reason why you want the other guy to know your procedures, capabilities, deployments etc. Secret weapons have no deterrent value.
> Think of it as the trolley problem in real life.
Well exactly... doesn't that show you that it's a bad idea? People don't know if they could bring themselves to throw the switch even if everyone thinks it makes rational sense.
You're taking a rational, well-considered, strategic decision... and making the interlock a messy personal emotional one unrelated to the actual issue at hand. That sounds like the wrong way around to be doing things?
> Well exactly... doesn't that show you that it's a bad idea?
I don't think so, no. Sometimes we think too abstractly and make what turn out to be poor decisions. Emotions are really valuable heuristics and should be harnessed at a time like this.
Absolutely not, mutually assured destruction only works if both sides know that the other is committed to carrying out a retaliatory strike in the minutes before their death. It’s essential that the person in the position to order a retaliatory strike be someone ready to kill hundreds of millions of people for no reason other than the fact that they said they would. Putting emotional barriers between that person and the codes they need to carry out that enormous responsibility just makes it less likely that they will be able to follow through. If there’s sufficient uncertainty about whether there will be a follow-through then the nuclear arsenal loses its deterrence factor and we’re back to having to live with the fear that our rational enemies may carry out a first strike on us.
> Absolutely not, mutually assured destruction only works if both sides know that the other is committed to carrying out a retaliatory strike in the minutes before their death.
Not really. You would need to be absolutely certain that the other party won’t carry out a retaliatory strike before they’re destroyed.
The only thing that matters is that the other party is capable of indiscriminate destruction, not the certainty they'll actually do it.
It’s like punching someone holding a gun in the face.
Trolley Problems are themselves a bad idea... the Kobayashi Maru is a similar exercise. I, like Kirk, don't believe that there are situations that can't be worked around if there is time to think, and resources to act.
Isn't the Trolley problem a situation that is, by definition, time sensitive? If you had more time to think and resources to act, it wouldn't be a Trolley Problem.
If the answer to launch-nukes-by-cutting-a-human-aide is "well, I need more time to think" then maybe that's a good outcome?
It's the 1980s, and the United States implements this policy. What happens on the Soviet side? After the United States' announcement the Soviet press and Soviet sympathizers worldwide gasp loudly in horror. "How cruel are Americans, really? Is the barbaric act of murdering and butchering an innocent young man the only thing still able to keep their president from destroying our Earth?"
The Soviet General Secretary soon receives a report about what the new policy means tactically. Americans will take several extra minutes, possibly more, to authorize retaliation. (The exact delay is subject to disagreement. Secret experiments are conducted to get the timing down. They are inconclusive.) Amid the decade's mounting tensions, a preemptive nuclear strike looks more tempting than before.
Too bad sociopaths and narcissists are more common in positions of power. All it would do is uselessly kill a volunteer.
Time is also of the essence for MAD; known delay only makes MAD less effective if e.g. sub-launched cruise missiles are faster than dissection. And do all the fallback commanders need their own willing victim to mount a response?
Similar idea as GitHub's "type the exact name of this repository if you want to delete it" confirmation dialog. Maybe that's really what you want to do, but in case that's not actually what you meant to do, having a few extra hoops to jump through seems like a good idea.
> having a few extra hoops to jump through seems like a good idea.
I think there is more to it than that. You need to consciously type the name of the repo that you want to remove. Windows used to add a lot of hoops to get something done, and the result was mindlessly clicking the "Yes" button and realizing one second later that you had deleted important information.
Yes, and it needs to be infrequent; the main issue with Windows (Vista mainly) was that the prompt appeared far too often. Even with 7, when you're setting it up for the first time, for example, I think it shows up too often.
Same with Terms & Conditions. If you want your customers to truly have read and understood them, you have to show them a short quiz at the end of it. You're required to do a quiz in Europe nowadays if you want to engage in stock trading.
One of the largest AWS outages to date was caused by a scenario like this. [1] A mistyped command removed too many servers from an S3 subsystem, overloading the remaining servers and crashing the subsystem. The failure snowballed until the entire S3 region was down, which then caused issues with dependent services like EBS, ALB, and Lambda. They couldn't even update the status page because that also depended on S3.
I remember that. The AWS dashboard was all green checkmarks... because the red status icons the dashboard was supposed to display were stored on the crashed servers.
Raskin talks about the futility of this in his book The Humane Interface.
Basically, what happens is the brain switches operating context from "I want to do something" to "resolve this interruption (confirmation box)" and you don't relate the one to the other - you're so focused on getting rid of the interruption that the original task is forgotten until after the interruption is gone.
Then you switch back to the original task that had been interrupted by the confirmation box and then you realize you made a mistake.
It's much better to engineer "undo" ability into systems - like delaying commands (GMail's "Undo Send" does this), or caching previous state, etc.
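To make the "delaying commands" flavor concrete, here is a minimal sketch in Python (the names and the grace period are made up; this is the general pattern, not Gmail's actual implementation):

    import threading

    class UndoableAction:
        """Defer a destructive action for a grace period; undo() cancels it."""

        def __init__(self, action, grace_seconds=10.0):
            self._fired = threading.Event()
            self._timer = threading.Timer(grace_seconds, self._run, args=(action,))
            self._timer.start()

        def _run(self, action):
            self._fired.set()
            action()

        def undo(self):
            # True if we cancelled before the action ran, False if it already fired.
            self._timer.cancel()
            return not self._fired.is_set()

    # pending = UndoableAction(lambda: send_message(draft))   # hypothetical caller
    # ... user clicks "Undo" within the grace period ...
    # pending.undo()

The point is that "send" (or "delete", or "push the change") only schedules the action; the irreversible part happens after the window in which a human can still change their mind.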
That's exactly why it's not a "confirmation box", but requires you to slow down and think for half a second. She even talked about mitigating copy-paste, which is the next obvious way people could habituate.
Also, while undo is great, it's not always technically feasible. The tools in question are basically for modifying the layer that implements undo for your end users, and are often themselves fundamentally irreversible. Undo for raw hard disks involves forensic analysis at best.
The problem (I probably didn't paraphrase Raskin well) is that when you slow down and think for half a second, you context-switch from "I need to do this operation" to "I need to make this dialog box go away".
No matter what tasks are required to make the dialog box go away - doing math, retyping a message, clicking a randomly ordered box - that becomes the top task in your head and you "forget" about the original task until you finish this task.
Once you resolve the interruption, you switch context back to the original task and then you still have that "oh crap" moment.
Yes, sometimes undo is very difficult, and can require a system designed to support that ability as a first-class feature from the start. In many systems you can perform rollbacks, but there are definitely destructive actions - in which case you should have test stacks to validate your actions in advance, and peer review (e.g. dual keys to launch the missiles).
It amazes me that something like this can be done by a single person.
In aviation any time input is given to the machine, it's entered by one human (typically pilot flying) and then verified by the other human (typically pilot monitoring) before being committed to or executed. For example... when a new altitude is assigned by ATC, say FL300, the pilot flying will spin it in the selector window and keep his hand or finger there until the second pilot agrees with and confirms the selection by reading FL300 out of the selector window.
I know there are meat bags in those giant tubes, so that changes attitudes towards safety, etc. However, it seems to me that when organizations start putting the power to halt nearly the entire business in the hands of one person, there should be some slightly different attitudes. A breaking change on a million servers could easily cost hundreds of thousands or maybe even millions in lost revenue or employee productivity.
I'm just an outsider though. Perhaps this level of attention is practiced at some shops. It's just interesting to me how in some fields we settle on pretty uniform standard practices whereas others are seen as non-human-life threatening so it's just shoot first, ask questions later.
Best practice for using the "weaponized" version of the tool when you had powers to actually hit all of them at once was to paste the command into IRC and get some of your fellow peeps to eyeball it and make sure it was sane.
<me> team: hey, sanity check this please: hsh -A "dumb_thing && other_thing --foo --bar"
<teammate> shipit
[ I type the command ]
<me> ok, running as job 1234
The last part was a courtesy done so that they could watch the progress of it too without having to dig to find my request. It also meant they could kill it easily if something went wrong and they couldn't raise me for some reason.
Tools like this are best used outside the solo realm.
I think an automated tool would be preferable, since there is no foolproof guarantee that what you type in IRC is the same as what you type in the terminal.
> It amazes me that something like this can be done by a single person.
In many dysfunctional orgs, having someone to blame is desirable. They will use all kinds of words for it like "accountability".
But at the end of the day, heros who take stupid risks that succeed get rewarded, cautious people that ask questions and try to understand before acting are smugly dismissed, and would-be heroes that burn the house down because of recklessness get blamed and make everyone else look good. It's all too common.
In shops where stakes are high, it’s not uncommon to do just like you said—have mechanisms that force someone else to verify what you’re about to do, before you do it. If someone else can’t verify, the tool will block you. It’s similar in spirit to requiring code reviews on all shipped code.
This is a great idea, and I'd like to point out that having such a system in place would have prevented one of the largest Internet outages in recent memory - the Amazon S3 outage in 2017: https://aws.amazon.com/message/41926/
> At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.
It's kind of funny, since various operations performed in the AWS web console use this model (e.g. type the name of the resource you're trying to delete). As an organization, they're aware of this approach and think it's useful, but (presumably) didn't use it in their own internal tooling.
Terraform prints out the number of resources changed and at least requires a "yes" to proceed. Not quite as onerous as described, but it at least prevents some types of fat-fingering. Basically all changes with Terraform are risky, as they usually involve bringing infrastructure up and down.
Terraform will perform the following actions:

  # google_compute_instance.vm_instance will be created
  + resource "google_compute_instance" "vm_instance" {
      + ... <more>

Plan: 2 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes
This is exactly the problem the author is referring to. With Terraform, you always type "yes" to proceed, so it turns into muscle memory. You stop reading the output, and you're already typing "yes" before you even see the prompt. Terraform's output is also verbose, and many changes show up as "1 to add, 0 to change, 1 to destroy" because it doesn't separately list a "replace" category. It's pretty bad: you've got cognitive overload, a confusing output summary, and a predetermined continue answer. And this is often an action you're performing under duress. I've been bitten by it plenty of times.
A similar system is molly-guard [1], which replaces the reboot/halt/poweroff/... commands with scripts that make you type in the name of the machine before proceeding. Avoids shutting down the wrong machine because you forgot where you SSH'd.
Many years ago, I made that mistake two or three times, rebooting the wrong machine. Since then, I use molly-guard on all my remote machines. Never happened again.
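For anyone curious what that looks like, here is a rough sketch of the same hostname-echo idea in Python (an illustration of the pattern, not molly-guard's actual implementation):

    #!/usr/bin/env python3
    # Wrap a dangerous command: refuse to run it unless the operator types
    # this machine's hostname back correctly.
    import os
    import socket
    import sys

    def main():
        hostname = socket.gethostname()
        typed = input("About to reboot. Type this machine's hostname to continue: ")
        if typed.strip() != hostname:
            print("Refusing: you typed %r, but this is %r." % (typed, hostname),
                  file=sys.stderr)
            sys.exit(1)
        # Hand off to the real command, passing through any extra arguments.
        os.execvp("reboot", ["reboot"] + sys.argv[1:])

    if __name__ == "__main__":
        main()

The value is exactly the "cache miss": you cannot answer the prompt correctly without noticing which machine your terminal is actually attached to.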
The first use of a new security product my manager insisted we roll out (as a duplicate to an existing tool from another group) was to quarantine a change in a system file that seemed to be spreading through all of the PCs.
Except the change it quarantined was to explorer.exe, which was being modified by a patch that had just been pushed out. The net result was about 6 hours of the desktop group wondering "why the hell are all of the PCs not logging in right after this patch", followed by about a month of trickling tickets from seldom-used computers that had been powered off at the time.
His excuse was that it only showed a file hash on the main screen and you had to view details to see the file name, plus he had a 3-day change window open to roll out the system. I never understood how he got away with that one, but such things did catch up to him about 2 years later.
1. Git's force-with-lease. Git push's "force" is too powerful; you will likely regret having this much power, but it's tempting. So force-with-lease is the same power, but conditional on you telling git exactly what the state was that you're overriding.
This has two benefits. One is like Rachel's: it is an opportunity for a human to stop for a moment and consider - wait, why are we overriding this state? To find out what it is, we might as well read it... oh, the state says it's an "emergency fix. Call Jerry". Maybe, just maybe, I ought to call Jerry before I force-overwrite it?
But the other benefit is about race conditions, which Rachel doesn't specifically address. Even if you are very careful to check that the state you want to overwrite with force is indeed a state that should be overridden, nothing prevents it from changing in the meantime, and then you've overwritten state you didn't even know existed. force-with-lease fixes that, because your lease won't match.
I believe force-with-lease is a pattern that ought to be far more widespread. I've used several configuration management tools that let somebody say "Temporarily don't mess with config on these machines", and some of them let you write a reason like "James is rebuilding the RAID arrays", but none of them have the force-with-lease pattern that would let me say "I know James is rebuilding the RAID arrays, this change must happen anyway, but if anything else is blocking the change then reject it and let me know". (A sketch of what that pattern could look like follows point 2 below.)
2. Prefer undo to confirmation. If the computer can undo the action, even if that's a bunch of work and you'd rather not bother, put in that work and enable undo. Humans always "know" they really wanted to do the thing you're asking them to confirm, so it's somewhat futile to ask; but they often realise afterwards that they didn't want to, and will undo it if you make that possible.
Not everything can be undone; undo for a factory reset isn't a thing. But for lots of the things you can't undo, the only reason is laziness, so try to do better in your own software. Your users (which might include you) will be grateful.
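Here is the sketch promised above: a hypothetical compare-and-set style API for a config store (every name here is invented for illustration), which has the same shape as git's force-with-lease:

    class StaleLease(Exception):
        pass

    class ConfigStore:
        def __init__(self):
            self._value = None
            self._version = 0          # bumped on every successful write

        def read(self):
            # Return the current value plus the "lease" (the version you saw).
            return self._value, self._version

        def force_with_lease(self, new_value, expected_version):
            # Overwrite only if the state is still exactly what the caller saw;
            # otherwise reject so a human can look at what changed in the meantime.
            if self._version != expected_version:
                raise StaleLease("expected v%d, store is at v%d"
                                 % (expected_version, self._version))
            self._value = new_value
            self._version += 1
            return self._version

The caller reads (and presumably actually looks at) the current state, then passes back the version it saw; anyone else's intervening change makes the write fail loudly instead of being silently clobbered.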
Related but semi-random: it slightly annoys me that force-with-lease goes through the entire effort of force pushing if it thinks the remote is identical to the local. It’s not going to change anything either way, and it could save me the second or two of waiting on it to do nothing. If local is already identical to the last known state of the remote, and I’m trying to force push, the actual error is that I didn’t edit the local branch in the way I thought I had when I decided it was time to force push.
(I realize there is a possible error message case if the remote has changed... but I don’t feel like this command is the best one to use to discover whether the remote has changed, if you have no changes you actually intend to force push.)
That may have helped when Emory University's IT dept. accidentally sent a wipe and reformat command using Microsoft's SCCM to all of the Windows computers and servers on campus back in 2014.
https://it.slashdot.org/story/14/05/17/051214/emory-universi...
This is a topic near and dear to my heart, as I'm often that person arguing to make something slightly less automated, because the small trade-off in time is insurance against some of the worst mistakes you can have. Automation to the point of removing humans leads to stupid problems that a human wouldn't make if they looked at what was going on. So we automate to the point where we minimize human contact, presenting a summary of actions to which we can apply our wonderful human brains and prevent those problems. Except some percentage of the time we don't actually pay attention, and depending on how the human interaction was introduced in place of complete automation, some percentage (or multiple!) of errors still sneak through.
Automation to the point of minimal human contact, where you assume the human will read the presented information and make an informed decision, doesn't work. The point is that we want a human to understand what is being asked, so taking some step to ensure they do understand is warranted. It will never be perfect, but adding steps like she proposes is definitely a step in the right direction, IMO.
This resonates with me. Years ago I took down a service in a cell accidentally (Googlers might empathize: never 'borg' when you meant to 'borgcfg'). If I had been asked to enter the exact number of tasks I was about to nuke, I might have thought twice ;)
I've certainly deliberately downed an enormous number of tasks, though, as part of a cluster turn-down. I love the technique of requiring the operator to echo a key fact, but in the case you're describing I think the key fact is not how many tasks, but that they're serving live traffic. So:
* You could ask the operator to echo the qps figure...but really any number other than zero is likely to be an error, so it can just error out in that case without needing the confirmation.
* Even if it is serving zero qps now, if it's not explicitly drained at the load balancer, downing it is likely to be a mistake. So even better to check that.
Only once in my career have I taken down jobs serving live traffic. (They were serving 100% errors.) It was deliberate, but even so I wouldn't have minded having to supply a --yes-i-know-im-downing-live-jobs.
edit: and if for some reason my assumption is wrong and downing undrained things becomes routine... well, you'd want to fix that, but as a short-term measure, going back to confirming a number rather than a force option would be appropriate. It's certainly not good to have an override that's routinely used.
The way we approached this on my SRE team was semi-manual with improved ergonomics. We embedded the live traffic graph in the turndown tool, so it would be right in your face before you took the destructive action. Of course it was always possible to go one level down on the tooling and do everything manually, but it wasn't the usual way.
Seems reasonable, but as you might have seen, rossjudson did accidentally-ish go to a lower layer: he wrote "never 'borg' when you meant to 'borgcfg'". And you're still relying on someone actually looking at the graph in their face, which isn't as sure a thing as it would be if they had to echo something back, as Rachel is advocating.
(For the benefit of non-Googlers/Xooglers: borg is a lower-level tool mostly used when everything else has gone wrong and borgcfg is a higher-level, more routine tool. These days people often layer things on top of that as well, because we love piling up abstraction layers. This approach is completely successful because abstraction layers never leak and solve every problem without making anything hard to debug at all. /s)
In my ideal world, even the lowest layer a human ever uses would do safety checks by default. Eg, imagine if the job specification included "query this safety check service on change" and the borg tool (as part of querying the existing job on a cancel/rm command) discovered that and honored it. Most people/jobs would use a safety check that fails taking down a job unless the load balancer reports all relevant services have that job drained. The safety check service could also specify a confirmation prompt (similar to what Rachel is advocating) that could be customizable (like qps or percent of global capacity rather than just number of tasks). The safety check would be effective no matter what layer you use, and there'd be no good reason to use one that would cause prompt fatigue. The outage rossjudson described (and I know he's not the only one who has done exactly this!) would have been avoided.
I really agree with your philosophy here but I've never been able to perfect it in practice. The imperfection comes from the way there is inevitably some mapping of things to other things by name. I can ask a load balancer whether clients of a service are being sent to a named capacity or not (i.e. is the thing I want to remove "drained") but that doesn't rule out the possibility that another service maps a different name to the same backend and I forgot to integrate that name with my automation. Also impossible to rule out that a client exists which bypasses or ignores the advice of the load balancer. Having visibility into caller identity helps a lot with this kind of problem but outside of Google there is a scary word called "cardinality" which prevents people from monitoring the whole caller×server space.
I agree you can never reach perfection. I expect there'd still be postmortems with "Our safety check was missing/bad" in the "what went wrong" section for various project-specific technical reasons. But I'd expect there to be (a) fewer such postmortems, and (b) an action item to fix the job's safety check service specification and audit the team's other ones, rather than the rather inexcusable IMHO "this tool doesn't support those, /shruggie, maybe schedule more training about which tool to use".
I do like this idea; this is, I assume, why GitHub makes you type the repo name out in full. I wish AWS followed suit: when deleting any RDS (database) instance on AWS, all you have to type is "delete me"... very easy to copy and paste, as well as to just know what you need to type and be on autopilot. I have even poked support about it and their response was underwhelming.
At least Facebook (where OP worked), Amazon, Google, and Microsoft. Probably Netflix, maybe Apple. There might be a couple more, but no more than that because we've already accounted for a pretty high percentage of worldwide shipments for servers, disks, etc. Fun fact: when you're that big, your demand creates its own inflation and you have to consider that in projections.
If by "machine" we also mean things outside of a 19" rack, I would wager that large telecoms probably have way more devices running Linux than FAANG. Imagine the network of cable modems that Comcast alone must operate. What percentage of their 28+ million broadband customers rent Comcast owned/managed modems? Almost all of them except the tech-savvy crowd? And that's just one device type.
Thanks - so a handful at most, and the "usual" ones. I always thought those companies kept their machines connected in (redundant) "sets", and that a command affecting all of them was more a case of "never" rather than "once in a while".
Google, at least, has a thing that is supposed to prevent widespread disruption at the machine level, called the "Safe Removal Service"[1]. This is a good idea that in practice isn't perfect. If you write a tool that does not consult SRS, or your service doesn't declare a SRS policy, there can be surprises.
A particular outage that I will never forget took out Gmail delivery worldwide in an instant, because the change was not expected to be disruptive and therefore did not integrate with SRS. As it turned out the change disabled the machines where it was applied, and the process of selecting a subset of machines to canary the change was not independent of the way in which Gmail assigns services to machines, so in the space of a few seconds they created a global outage.
How do you define one location? If it's like, a contiguous plat of land with a bunch of buildings, each containing suites, and each of those containing clusters... then these days, yeah, that's probably not too much of a stretch.
And yeah, physical machines, not VMs. Sometimes they're blades, sometimes they're sleds, but I mean real hardware made out of metal that you can pick up and use to defend the datacenter if you have to.
(Although, honestly, I was talking about global counts in the million+ range when I wrote it since it was referencing the past, but by now, a region with a million+ is not far-fetched.)
Disabling the "run" button for a few seconds was actually done to mitigate another risk -- sites cueing the user to click in a particular location, then triggering the confirmation dialog with the "run" button right where the user was about to click.
Oh god this would have saved me so much stress once. It was early in my career, and part of my duties was to run a merge/purge process on dupe records.
I'd select the dupes for merge using a checkbox, but the vendor's interface for this just had a "confirm" button. So, I confirmed. However I'd selected the "select all" box and.... confirmed. Merging every. single. record. into one (1) record.
I was fortunate, the vendor was able to roll back the changes, and nothing was lost. I also had a very good mentor-like boss who avoided reaming me out before we knew if there was a solution or not, and when there was he simply told me "I'm sure you've learned your lesson, but don't do that again."
> "This might be as simple as printing the number with your locale's version of numerical separators, like "123,456" or "123.456" or "123 456" or whatever else you might use where you are. The trick is then to NOT accept that as input, but instead demand that they remove the separator and jam it in as just digits. "
It's easier to just strip non-digit characters than to parse the input for them and respond accordingly. This is a confirmation step with basically a checksum, so you're not going to get many false positives.
Stripping the non-digit characters would allow "123,456" to validate instead of only accepting "123456" -- which defeats the whole purpose of printing the number with numerical separators (to prevent copy/paste).
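In code, the scheme being discussed can be as small as this (a sketch; the comma grouping stands in for whatever the locale would produce, and the prompt wording is invented):

    def confirm_count(n: int) -> bool:
        shown = "{:,}".format(n)          # e.g. 123456 -> "123,456"
        prompt = ("This will hit %s machines. "
                  "Type the number back, digits only, to continue: " % shown)
        typed = input(prompt)
        # Pasting the displayed "123,456" fails; only the bare "123456" passes.
        return typed.strip() == str(n)

    # if not confirm_count(len(targets)):
    #     raise SystemExit("aborted")

Rejecting the separator is the whole trick: it forces the number through the operator's head instead of through the clipboard.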
Notably, Discord does something like this when you @everyone in a large channel: "You're about to push a notification to 12,000 people, are you sure you want to do that...?"
In this case, usually the very fact that a popup unexpectedly appeared is enough. I use Konsole as my main terminal, and like several other terminals now it has a "You're about to paste 100KB, yes/no?" prompt, and I don't mindlessly click "yes" because it is already a "cache miss" to see that dialog at all.
I've typically used pdsh https://github.com/chaos/pdsh for these types of commands, and I don't think they have any such safety options. The only protection is to be wracked with fear whenever you type pdsh. Obviously this fear wanes with use, and eventually you don't think about a command for long enough before you do it and hit enter on a regrettable one.
Even better than you confirming your own action, is someone else confirming it. If the stakes are high, require two people to turn the keys, instead of just one.
This reminded me that a few years back I worked at a place where (notoriously) Puppet would occasionally go over some random box and remove access to people, just because.
Or to all the machines, on one occasion.
(It was actually some sort of race condition when we massively updated per-project access permissions and asked for SSH keys to be redeployed, but it was annoying as heck, and sure to happen whenever you really needed to access that particular machine.)
But maybe this is enough?
I do this too, but this gives me time to actually read the repo name twice. It's way better than a confirm button for me.
I'm sure it would also wake me up from autopilot. But I don't do this often so I can't really know. It seems like this is good enough for many people, who don't perform this action too often.
I disagree with the many/most. Many/most are probably using uBlock Origin, which doesn’t try to prevent things like blocking pasting (to my knowledge). I’m sure some are using NoScript-like features... but that’s not the same as specifically preventing websites from preventing paste. It’s just a sledgehammer. I can’t name an extension to do that one task (and/or similar tasks) off the top of my head, and I’m reasonably familiar with discussions in these parts. uBlock Origin is known to be very popular, unlike an obscure “allow paste” extension. But, that’s just like, my opinion... as they say.
The point I was making is that copying and pasting seems like more effort than just typing the repo name. Do you commonly encounter long, inscrutable repo names? Do you delete repos frequently enough to have built up the habit of copying and pasting the repo name into the delete box?
If it is common enough, disabling paste would actually benefit the user based on the premise of the article.
This is similar to a UI solution a colleague and I came up with. The action the user could kick off was unstoppable and irreversible (a large batch job), and it seemed like even a confirmation prompt was too easy to simply click through. So we had the UI present a modal dialog asking the user to type in a specific word in all caps to confirm the action. Worked like a charm.
I did a similar thing with a Star Trek program many years ago. One of the commands (22? 23?) was to detonate the warp engines in the hope of taking the enemy with you.
After hitting the wrong number once, I added a confirmation that presented a random six-digit number that you had to enter before it accepted the command.
Reminds me of a study where a test was given with questions that weren't difficult, but that invited silly errors. Around 85% of participants got at least one question wrong, but when the same test was presented in a difficult-to-read font, that number dropped to ~25% or so. That's another way to make your brain work: use a terrible font.
I am so adding this to a query API I have, where it's all too easy to leave off constraints and end up asking for massive data sets by mistake.
Thinking I can probably enhance it by forcing the user to type in the number as text rather than numeric, so they can't cut-n-paste. Kind of force them to type in "I am sure I want all data ever" or something.
I don't think this is useful for an API. This is only useful when humans are the direct users of the component. Automated users, like those of an API, will dutifully provide the required safety value.
AWS sometimes does something similar to this like “enter the name of the thing you’re trying to delete to confirm”. I think it makes sense because you can have such a huge difference between how much you care about certain s3 buckets or CloudFormation deploys etc. In true AWS fashion it’s inconsistent between services though.
To their credit, even if it's unintentional, every time one of those screens pops up I have to stop and think about what I'm doing, because every screen wants something different from me!
Back in the Spiderman 2 days, I worked for a content management company that was supporting a really, really big website. I believe they were playing hosts-file games for Stage/Prod. I was in the room when they demoed something, did a restart of the system - and every pager in the room went off. Yah...
I for one can't fathom any organization managing a million devices / servers / VMs / whatnot. I'm having enough trouble with one, and my biggest employers had maybe a few dozen at best, and they already had a dedicated ops team that worked mainly with infrastructure-as-code.
Once I had to deal with some software-RAID in Linux (mdadm it is), around 2007. There was some -force option that would just print information explaining what it would do and, to perform the real action, you needed to type another flag (that should never be revealed).
I've done this before by displaying the Unix epoch and asking the user to copy/paste that value within a 3-second window as an env var. I.e., if you up-arrow and run the same TIMESTAMP=1603827448 ./foo, it won't work because 1603827448 is now way too old.
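Something like this, presumably (a sketch: the TIMESTAMP variable and the 3-second window come from the description above, everything else is invented):

    import os
    import sys
    import time

    def require_fresh_timestamp(window_seconds=3):
        now = int(time.time())
        try:
            stamp = int(os.environ["TIMESTAMP"])
        except (KeyError, ValueError):
            sys.exit("Set TIMESTAMP to the current epoch (%d) and re-run within %d seconds."
                     % (now, window_seconds))
        if abs(now - stamp) > window_seconds:
            sys.exit("TIMESTAMP=%d is stale (now %d); re-run with a fresh value."
                     % (stamp, now))

    require_fresh_timestamp()
    # ... the destructive work goes here ...

Because the value goes stale almost immediately, an up-arrowed command line from a minute ago fails by construction.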
I've seen this implemented as "Please type: My username is $USERNAME and I will not cry over spilt milk" but that was more to guard against support tickets.
I'm thinking this could also be useful for cases where colleges mistakenly email all applicants saying they'd been accepted, when they in fact had not been.
Promise Pegasus (thunderbolt storage) comes with a GUI that does the same thing - to shut it down you have to type “CONFIRM” before clicking the button
Debian already does this, it asks you to type something like "yes do as I asked" if you want to remove a package that is considered to be part of the core.
It would be neat to print out an esoteric error that gets a single result in Google, where the "forum" in the result has a rando answer about using a certain esoteric flag.
Then you search the logs to see who is trying the command with the esoteric flag and "fix the glitch with payroll" for those employees.
Makes it harder to nest that command inside a script - you have to parse out the number and paste it back? Or do I misunderstand - should it still prompt the user in the middle of the process when that step arrives? That would be problematical if it were included in a web page or whatever.
Cattle, people. Not pets. Just make sure you don't hit all machines simultaneously and are rolling, instead.
Since the post is talking about automation anyway, assume that any machine that can go down will go down. Ensure that any such disruption will be minimal. Oops, you just killed the production database? Whatever, who cares, it has just failed over anyway (or, for a distributed one, a new node was elected, data started replicating, etc).
If one considers having to SSH to a machine to be an anti-pattern, it's amazing how much crap goes away.
In the more generalized case, where it's not about machines, then it makes more sense. Maybe you are running a query that's going to perform updates across multiple clusters. It still should not be done by hand with direct production access - unless you are in the middle of a declared (and urgent!) incident and everything is on fire. In which case there's a bunch of people watching over your shoulder (or more likely, screen sharing in a conference call).
The same job you have (hopefully) run in QA you should be able to re-target to production. Make the question just be a way to "unlock" your automation - for instance, by not copying credentials or environment information until the proper confirmation has been received. One should still have an escape hatch for when (not IF) things go wrong.
[1] https://en.wikipedia.org/wiki/Pointing_and_calling