Yup, I've heard stories of junior engineers causing millions of dollars worth out outages. In those case the process was drilled into, the control that caused it fixed and the engineer was not given a reprimand.
If you have an engineer that goes though that and shows real remorse your going to have someone who's never going to make that mistake(or similar ones) again.
Agreed. Several years ago as a junior dev I was tasked with adding a new feature- only allowing a user to have 1 active session.
So, we added a "roadblock" post auth with 2 actions- log out other sessions and log out this session.
Well, the db query for the first action (log out other sessions) was missing a where clause...a user_id!
Tickets started pouring in saying users were logged out and didn't know why. Luckily the on-call dev knew there was a recent release and was able to identify the missing where clause and added it within the hour.
The feature made it through code review, so the team acknowledged that everyone was at fault. Instead of being reprimanded, we decided to revamp our code review process.
I never made that kind of mistake again. To this day, I'm a little paranoid about update/delete queries.
We all make this mistake eventually, often in far more spectacular fashion. My lessons learned are
1) Always have a USE statement (or equivalent);
2) Always start UPDATE or DELETE queries by writing them as SELECT;
3) Get in the habit of writing the WHERE clause first;
4) If your SQL authoring environment supports the dangerous and seductive feature where you can select some text in the window and then run only that selected text — beware! and
5) While developing a query to manipulate real data, consider topping the whole thing with BEGIN TRANSACTION (or equivalent), with both COMMIT and ROLLBACK at the end, both commented out (this is the one case where I use the run-selected-area feature: after evaluating results, select either the COMMIT or the ROLLBACK, and run-selected).
Not all of these apply to writing queries that will live in an application, and I don't do all these things all the time — but I try to take this stance when approaching writing meaningful SQL.
#5 is the big one. I left WHERE off an UPDATE on "prod" data once. Fortunately it wasn't mission critical data that I ended up wiping and I was able to recover it from a backup DB. I never did anything in SQL without wrapping it in a transaction again.
I will note that depending on your DB settings and systems, if you leave the transaction "hanging" without rolling back or committing, it can lock up that table in the DB for certain accessors. This is only for settings with high levels of isolation such as SERIALIZABLE, however. I believe if you're set to READ_UNCOMMITTED (this is SQL Server), you can happily leave as many hanging transactions as you want.
> 2) Always start UPDATE or DELETE queries by writing them as SELECT;
On that point, I'd love a database or SQL spec extension that provided a 'dry-run' mode for UPDATE or DELETE and which would report how many rows would be impacted.
> I'd love a database or SQL spec extension that provided a 'dry-run' mode for UPDATE or DELETE and which would report how many rows would be impacted.
I mean, if your DB supports transactions, you are in luck.
Start a transaction (that may be different among vendors - BEGIN TRANSACTION or SET AUTOCOMMIT=0 etc) and run your stuff.
If everything looks good, commit.
If you get OOOps number of records, just rollback.
> Luckily the on-call dev knew there was a recent release and was able to identify the missing where clause and added it within the hour.
Raises questions about deployment. Didn't the on-call have a previous build they could rollback to? Customers shouldn't have been left with an issue while someone tried to hunt down the bug (which "luckily" they located), instead the first step should have been a rollback to a known good build and then the bug tracked down before a re-release (e.g. review all changesets in the latest).
Yup, agreed entirely. I'm a bit fuzzy on the details of that decision...he very well may have rolled back and then deployed a change later in the evening. It was fixed by the time I checked my email after dinner.
UPDATE cashouts SET amount=0.00 <Accidental ENTER>
Oops. That was missing a 'WHERE user_id=X'. Did not lose the client at the time (this was 15+ years ago), but that was a rough month. Haven't made that mistake again. It happens to all of us at some point though.
At a former employer, we had a Megabuck Club scoreboard; you got your name, photo and a quick outline of what your (very expensive!) mistake had been posted on it. Terrific idea, as:
a) The culture was very forgiving of honest mistakes; they were seen as a learning opportunity.
b) Posting a synopsis of your cockup made it easier for others to avoid the same mistake while we were busy making sure it would be impossible to repeat it in the future; also, it got us thinking of other, related failure modes.
c) My oh my was it entertaining! Pretty much the very definition of edutainment, methinks.
My only gripe with it was that I never made the honor roll...
We had something similar at one of my jobs, its hard to relay it in text but it was really a fun thing. Mind you this is at a fortune 100 company and the CIO would come down for the ceremony when we did it, to pass the award. We called it the build breakers award and we had a huge trophy made up at a local shop, with a toilet bowl on it. If you broke the build and took down other developers then the passing of the award ceremony was initiated, I would ping the CIO (as it was my dev shop) he would come down, and do a whole sarcastic speech about how the wasted money was all worth it because the developers got some screw off time, while the breaker fixed the build. It was all in good spirit though and people could not wait to get that trophy off their desk, it helped that the thing was probably as big as the Stanley cup.
We built actual, physical thingamajigs; 'breaking a build' more often than not meant fragments of steel whizzing around the workshop while we were all getting doused in hydraulic fluid for our trouble.
Note to self: be very careful using Ziegler-Nichols to tune multi-kHp-systems. Very careful. Cough.
If you have an engineer that goes though that and shows real remorse your going to have someone who's never going to make that mistake(or similar ones) again.