I Accidentally Deleted 7TB of Videos Before Going to Production (thevinter.com)
516 points by thevinter on May 5, 2022 | 345 comments



> but at the time the code seemed completely correct to me

It always does.

> Well, it teaches me to do more diverse tests when doing destructive operations.

Or add some logging and do a dry run and check the results, literally simple print statements:

    print("-----")
    print(f"Downloading video ids from url: {url}")
    print(video_ids)
    ...
    ...
    ...
    # delete()  dangerous action commented out until I'm sure it's right
    print(f"I'm about to delete video {id}")

    print(f"Deleted {count} videos")  # maybe even assert
    ...
Then dump out to a file and spot check it five times before running for real.


I was involved with archiving of data that was legally required to be retained for PSD2 compliance. So it was pretty important that the data was correctly archived, but it was just as important that it was properly removed from other places due to data protection.

This is basically the approach that was taken: log before and after every action exactly what data or files are being acted on and how. Don't actually do it. Then have multiple people inspect the logs. Once ok'd, run again, with manual prompts after each log item asking to continue, for the first few files/bits of data. Only after that was ok'd too did it run the remainder.

In other things I've worked on, I've taken the terraform-style plan first, then apply the plan approach, with manual inspection of the plan in between.
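
A minimal sketch of that split in Python (plan.json and the delete_video callback are invented for illustration):

    import json, sys

    def build_plan(candidate_ids):
        # Plan step: decide, but don't act. Write the decisions somewhere inspectable.
        plan = [{"action": "delete", "video_id": vid} for vid in candidate_ids]
        with open("plan.json", "w") as f:
            json.dump(plan, f, indent=2)
        print(f"Wrote {len(plan)} planned actions to plan.json; review before applying.")

    def apply_plan(delete_video):
        # Apply step: read the reviewed plan back and execute it, nothing else.
        with open("plan.json") as f:
            plan = json.load(f)
        for item in plan:
            assert item["action"] == "delete"
            delete_video(item["video_id"])

    if __name__ == "__main__":
        if sys.argv[1:] == ["apply"]:
            apply_plan(lambda vid: print(f"would delete {vid}"))  # swap in the real delete
        else:
            build_plan(["vid_1", "vid_2"])  # placeholder ids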


Once we get used to doing the same thing multiple times a day, it doesn't matter if the log shows that we're about to take a destructive action, we'll still do it. The only thing that is foolproof is to not take the destructive action, because people make mistakes; it's human nature. I don't know how this can be implemented, maybe encrypt the files, take a backup in some other location (which may not be allowed).

Multiple reviewers here didn't catch the mistake

https://www.bloombergquint.com/markets/citi-s-900-million-mi...


While this is a huge issue, a solution (well, a partial mitigation) I've seen and used is the "Pointing and Calling" technique. The basic idea is that you incorporate more actions beyond reading and typing or pressing a button—generally by having people point at something and say aloud what it is they're doing and what they expect to happen.

It's used rather extensively in safety-critical public transportation in Japan [1] and to a lesser extent in New York (along with many other countries) [2]. This can easily extend to software without overcomplicating things, by just setting the expectation that engineers, QA, etc. do this even when alone.

[1] https://www.atlasobscura.com/articles/pointing-and-calling-j...

[2] https://en.wikipedia.org/wiki/Pointing_and_calling


Hell, GitHub does that to an extent, with the "type the name of this repository to delete it" prompts. Typing the name of the repository isn't exactly perfect, but it's an interesting direction.


There was a thread recently about a repo that accidentally went private and lost all of its stars because of confusion with GH teams vs GH profile readme repo naming. I think this type of prompt is very useful for explicitly preventing the rare worst case scenarios, but the problem is that any type of prompt becomes "routine", so that our brains fail to process it.


The suggestion in that post about how to fix it is good, and mirrors one I read in the Rachel by the Bay blog - type the number of machines to continue:

https://rachelbythebay.com/w/2020/10/26/num/

The takeaway from both is that there is actually something you can do to wake people up when the stakes are high and they might not be doing what they expect.
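
A tiny sketch of that kind of prompt in Python (wording and variable names are made up):

    def confirm_count(items, noun="machines"):
        # Force the operator to type the exact count before anything destructive runs.
        expected = str(len(items))
        typed = input(f"This will affect {expected} {noun}. Type that number to continue: ")
        if typed.strip() != expected:
            raise SystemExit("Count mismatch, aborting.")

    # confirm_count(machines_to_reboot)  # proceeds only if the real number is typed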


And most importantly, don't let yourself get into the habit of copy pasting the value


I wonder if you could print some non-visible characters in there to taint the copied value in some detectable way.


Prompt in words, but expect the value in numbers, e.g. "Twenty-five" and the box requires you to type "25"? At least in this specific case, it would require you to type it.


yeah, that would possibly stop the copy and paste problem. to make it robust they would need to use a string of a few non-visible characters but that would fail if the browser's clipboard system doesn't copy them over for some kind of privacy initiative. might be another way it fails that I can't think of right now.



I always copy-paste into that box as well, they should probably make at least an attempt at disabling pasting into it


Azure has the same thing when deleting a database: just verify this is the correct one by typing the db name.


I heard of this technique, but unfortunately I don't see how it can be easily applied in software engineering/devops.

Also, I now realize that aviation checklists seem to be done similarly with gestures - at least from what I saw on YouTube, not sure if that's representative or only used during education (?)


Spelling out loudly the command you are about to execute and explaining the reasoning behind it can help a lot too.


Ok, but am I to do it on every single command I do on my terminal? Or on which ones specifically? If the problem we're trying to solve is that I can sometimes overlook the "dangerous commands" among "safe ones", by definition of overlooking it won't work if I tell myself to "spell out the command only in case of the dangerous ones", no?

I'm honestly trying to think of a way I could approach this for myself, but I don't see a clear solution yet that wouldn't require me to spell out everything I type in my terminal window.


“I’m removing that semicolon!” (Pointing)


Parent meant this sort of pointing.

https://t.co/TjfX5K54H7


Because everyone assumes that everyone else is looking at it more closely than they are. "I'll just do a cursory look since I'm sure everyone else is doing an in-depth look." Narrator: nobody did an in-depth look.


I'm a fan of doing things temporally so data is very rarely actually deleted from the database. Most of the time, you just update the "valid_to" field to the current time. Sometimes real deletes are required, such as with privacy requests, but I think that sort of thing is pretty rare.

If your application has space concerns, you can modify this approach to be like a recycle bin where you delete records which are no longer valid and have been invalid for over a month (or whatever time frame is appropriate for your application). However, I think this is unnecessary in most cases except for blob/file storage.
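
A rough sketch of that soft-delete plus recycle-bin idea with SQLite (table and column names are invented):

    import sqlite3, time

    conn = sqlite3.connect("app.db")
    conn.execute("CREATE TABLE IF NOT EXISTS videos (id TEXT PRIMARY KEY, valid_to REAL)")

    def soft_delete(video_id):
        # Instead of DELETE, close the validity window; readers filter on valid_to.
        conn.execute("UPDATE videos SET valid_to = ? WHERE id = ?", (time.time(), video_id))
        conn.commit()

    def purge_invalid_older_than(days=30):
        # Optional recycle-bin step: hard-delete rows that have been invalid long enough.
        cutoff = time.time() - days * 86400
        conn.execute("DELETE FROM videos WHERE valid_to IS NOT NULL AND valid_to < ?", (cutoff,))
        conn.commit()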


That form had a couple weird checkboxes with odd wording. It is a famous mistake, but also rather understandable just because the form was cryptic.


> Multiple reviewers here didn't catch the mistake

Sure, but we can only do so much. I find it's good bang for the buck, and alternatives that might prevent that are not always available, so we do the best we can. You gotta make a call on whether it's enough or not.


mv then rm is another idiom. So long as you have the space.

For database entries, flag for deletion, then delete.

In the files case, the move or rename also accomplishes the result of breaking any functionality which still relies on those files ... whilst you can still recover.

Way back in the day I was doing filesystem surgery on a Linux system, shuffling partitions around. I meant to issue the 'rm -rf .' in a specific directory; I happened to be in root.

However ...

- I'd booted a live-Linux version. (This was back when those still ran from floppy).

- I'd mounted all partitions other than the one I was performing surgery on '-ro' (read-only).

So all it cost me was a reboot, and an opportunity to see what a Linux system with an active shell, but no executables, looks like.

Plan ahead. Make big changes in stages. Measure twice (or 3, or 10, or 20 times), cut once. Sit on your hands for a minute before running as root. Paste into an editor session (C-x C-e Readline command, as noted elsewhere in this thread).

Have backups.


You mean cp then rm?

And yes: copy, verify, delete. And make sure, by the code structure, that you either do all three on the same files, or they fail.

Also, do it slowly, with just a bit of data on each iteration. That will make the verification step more reliable.

Anyway, for a huge majority of cases, only having backups is enough already. Just make sure to test them.


No, mv.

Example:

  cd datadir
  mkdir delete
  mv <list of files to be deleted> ./delete
  # test to see if anything looks broken.  
  # This might take a few seconds, or months, though it's usually reasonably brief.
  rm -rf ./delete
The reasons for mv:

- It's atomic (on a single filesystem). There's no risk of ending up with a partial operation or an incomplete operation.

- It doesn't copy the data, it renames the file. (mv and rename are largely synonyms.)

- There's no duplication of space usage. Where you're dealing with large files, this is helpful.

The process is similar to the staged deletion most desktop OS users are familiar with, of "drag to trash, then empty trash". Used in the manner I'm deploying it, it's a bit more like a staged warehouse purge or ordering a dumpster bin --- more structured / controlled staged deletion than a household or small office might use.


I think mv then rm is probably meant as 'windows trash bin' style.


  > ... Then have multiple people inspect the logs. Once ok'd, run again, with manual prompts after each log item asking to continue...
This sort-of reminds me of some "critical" work I had to do a couple of decades ago. I was in a shop that used this horrifically tedious tool for designing masks for special kinds of photonic devices-- basically it was tracing out optical waveguides that would be placed on a crystal that was processed much like a silicon IC.

The process was for TWO of us to sit in front of a computer and review the curves in this crazy old EDA layout tool called "L-edit" before it got sent to have the actual masks made (which were very expensive). It took HOURS to check everything.

The first hour was tolerable but then boredom started to creep in and we got sloppy. The whole reason TWO people got tasked with this was because it was thought that we would keep each other focused -- 2 pairs of eyes are better than one, right? Instead, it just underscored the tedium of it all. One day someone walked in and found us BOTH in DEEP SLEEP in front of the monitor. Having two people didn't decrease the waste caused by mistakes, it just bored the hell out of more people.


How many mistakes did you catch?


ONE real one and some occasional nitpicks to show that we were busy (after being caught asleep).

Was it worth it? No, I don't think so from an opportunity cost perspective-- even though we were the most junior folks there. A mind is a terrible thing to waste!


From his story I can tell he found one big mistake. The tedious work itself.


Another good approach is to do deletions slowly. Put sleeps between each operation, and log everything. That way if you realize something is broken, you have a chance of catching it before it's too late.


> Then have multiple people inspect the logs.

I think that this is the most important part of any check. Your parent refers to checking the log five times, but, at least in my experience, I won't catch any more errors on the fifth time than the first—if I once saw what I expected rather than what was there, I'll keep doing so. Of course everyone has their blind spots, but, as in the famous Swiss-cheese approach, we just hope that they don't line up!


Yes, I love the idea of the Plan Apply.


It never hurts to ask for another set of eyes to review. At the least if something goes awry, the blame isn't solely on you.


Make a plan, check the plan, [fix the plan, check the plan (loop)], do the plan

See PDCA for a more time-critical decision loop. https://en.wikipedia.org/wiki/PDCA


Another technique that I've used with good success is to write a script that dumps out bash commands to delete files individually. I can visually inspect the file, analyze it with other tools, etc and then when I'm happy it's correct just "bash file_full_of_rms.sh" and be confident that it did the right thing.
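
A minimal Python sketch of such a generator (the list of paths is a stand-in; the point is emitting a reviewable script):

    import shlex

    def write_deletion_script(paths, out="file_full_of_rms.sh"):
        # One rm per file so the result can be read, grepped, and diffed before running.
        with open(out, "w") as f:
            f.write("#!/bin/sh\nset -eu\n")
            for p in paths:
                f.write(f"rm -- {shlex.quote(p)}\n")
        print(f"Wrote {len(paths)} rm commands to {out}; inspect it, then run `bash {out}`.")

    # write_deletion_script(["old/video 1.mp4", "old/video 2.mp4"])  # example paths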


This was taught to me in my first linux admin job.

I was running commands manually to interact with files and databases, but was quickly shown that even just writing all the commands out, one by one, gives room to personally review and get a peer review, and also helps with typos. I could ask a colleague "I'm about to run all these commands on the DB, do you see any problem with this?". It also reduces the blame if things go wrong, since it managed to pass approval by two engineers.

While I'm thinking back, another little tip I was told was to always put a "#" in front of any command I paste into a terminal. This stops accidentally copying a carriage return and executing the command.


> This stops accidentally copying a carriage return and executing the command.

For a one-liner sure, but a multi line command can still be catastrophic.

Showing the contents of the clipboard in the terminal itself (eg via xclip) or opening an editor and saving the contents to a file are usually better approaches. The latter lets you craft the entire command in the editor and then run it as a script.


From [0]:

[For Bash] Ctrl + x + Ctrl + e : launch editor defined by $EDITOR to input your command. Useful for multi-line commands.

I have tested this on Windows with a MINGW64 bash; it works similarly to how `git commit` works: by creating a new temporary file and detecting* when you close the editor.

[0] https://github.com/onceupon/Bash-Oneliner

* Actually I have no idea how this works; does bash wait for the child process to stop? does it do some posix filesystem magic to detect when the file is "free"? I can't really see other ways


It does create and give a temporary file path to the editor, but then simply waits for the process to exit with a healthy status.

Once that happens, it reads from the temporary file that it created.


The 'enable-bracketed-paste' setting is an easier and more reliable way to deal with that: https://unix.stackexchange.com/a/600641/81005

It will prevent any number of newlines from running the commands if they're pasted instead of typed.

You can enable it either in .inputrc or .bashrc (with `bind 'set enable-bracketed-paste on'`)


That was our SOP for running DELETE SQL commands on production too: a script that generates a .sql that's run manually. It saved our asses a fair amount of times.


Yeah, wish I'd learned that the easy way. Fresh into one of my first jobs I was working with a vendor's custom interface to merge/purge duplicate records. It didn't have a good method of record matching on inserts from the customer web interface so a large % of records had duplicates.

Anyway, I selected what I thought was a "merge all duplicates" option without previewing results. What I had actually done was "merge all selected". So, the system proceeded to merge a very large % of the database... Into One. Single. Record.

Luckily the vendor kept very good backups, and so I kept my job. Because I also luckily had a very good boss and I had already demonstrated my value in other ways, he just asked me "Well, are you going to make that mistake again?". I wisely said no, and he just smiled and said "Then I think we're done here."

I have been particularly fortunate throughout my career to have very good managers. As much as managers get a lot of flack here on HN, done well they are empowering, not a hindrance, and I attribute a lot of success in my career to them.


> Yeah, wish I'd learned that the easy way.

I think that, if you've only learned something like that the easy way, then you haven't learned it yet. As long as everything's only ever gone right, it's easy to think, I'm in a rush this one time, and I've never really needed those safety procedures before, ….


At a previous job the DB admin mandated that everyone had to write queries that would create a temporary table containing a copy of all the rows that needed to be deleted. This data would be inspected to make sure that it was truly the correct data. Then the data would be deleted from the actual table by doing a delete that joined against the copied table. If for some reason it needed to be restored, the data could be restored from the copy.
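
A loose sketch of that flow in Python with SQLite (table and column names are invented):

    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, status TEXT)")

    # 1. Copy the rows you intend to delete into a side table.
    conn.execute("DROP TABLE IF EXISTS to_delete")
    conn.execute("CREATE TABLE to_delete AS SELECT * FROM orders WHERE status = 'expired'")

    # 2. Inspect the copy (row counts, spot checks) before touching the real table.
    count = conn.execute("SELECT COUNT(*) FROM to_delete").fetchone()[0]
    print(f"{count} rows staged in to_delete; review before continuing.")

    # 3. Delete from the real table only what is in the copy; the copy doubles as a restore source.
    # conn.execute("DELETE FROM orders WHERE id IN (SELECT id FROM to_delete)")
    # conn.commit()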


I tend to write one script that emits a list of files, and another that takes a list of files as arguments.

It's simple to manually test corner cases, and then when everything is smooth I can just

    script1 | xargs script2
It's also handy if the process gets interrupted in the middle, because running script1 again generates a shorter list the second time, without having to generate the file again.

When I'm trying to get script1 right I can pipe it to a file, and cat the file to work out what the next sed or awk script needs to be.


Ah, I’m glad I’m not the only one who did this. It also means that you can fix things when they break halfway. Say you get an error when the script is processing entry 101 (perhaps it’s running files through ffmpeg). Just fix the error and delete the first 100 lines.


The only issue with that is if subsequent lines implicitly assume that earlier ones executed as expected, e.g. without error.

Over-simplified example:

1. Copy stuff from A to B

2. Delete stuff from A

(Obviously you wouldn't do it like that, but just for illustration purposes.) It's all fine, but (2) assumes that (1) succeeded. If it didn't, maybe no space left, maybe missing permissions on B, whatnot, then (2) should not be executed. In this simple example you could tie them with `&&` or so (or just use an atomic move), but let's say these are many many commands and things are more complex.


At the point you're doing this, you should be using a proper programming language with better defined string handling semantics though. In every place it comes up you'll have access to Python and can call the unlink command directly and much more safely - plus a debugging environment which you can actually step through if you're unsure.


Eh, I think that misses the point a bit. Use whatever you want to generate the output, but make the intermediary structure trivial to inspect and execute. If you're actually taking the destructive actions within your complicated* logic then there's less room to stop, think, and test.

You could always generate an intermediary set, inspect/test/etc, and then apply it with Python. I've done that too, works just as well. The important thing is to separate the planning step from the apply step.

* where "complicated" means more complicated than, for ex, `rm some_path.txt` or `DELETE FROM table WHERE id = 123`.


Yes. Also, maybe not have a delete action in the middle of a script. It's usually better to build a list of items to be deleted. In that case, two lists: items to be deleted, items to be kept. Then compare the lists:

- make sure the sum of their lengths == number of total current items

- make sure items_to_be_kept.length != 0

- make sure no two items appear in both lists

- check some items chosen at random to see if they were sorted in the correct list

At this point the only possible mistake left is to confuse the lists and send the "to_be_kept" one to the delete script; a dry run of the delete list can be in order.
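
A sketch of those checks as plain assertions (variable names are illustrative):

    def check_partition(all_items, to_delete, to_keep):
        delete_set, keep_set = set(to_delete), set(to_keep)
        # The two lists must exactly partition the current items.
        assert len(to_delete) + len(to_keep) == len(all_items)
        assert len(keep_set) != 0, "refusing to delete everything"
        assert not (delete_set & keep_set), "an item appears in both lists"
        assert delete_set | keep_set == set(all_items), "an item is missing from both lists"
        # Then spot-check a few random items by hand before trusting the split.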


This. The original approach can fail horribly if there's a problem on the server when you run the script for real. Your code can be perfect but that's no guarantee the server will always return what it ought to.


I've had good success with this approach, have two distinct scripts generate the two lists, then in addition to your items here also checking that every item appears in one of the lists.


What do you recommend, to not get into trouble if there are spaces or newlines in the file names?


Try not to delete stuff with Bash.

This is the most reliable way. Bash has a few niceties for error handling, but if you are using them, you would probably fare better in another language.

If you do insist on Bash, quote everything, and use the "${var}" syntax instead of "$var". Also, make sure you handle every single possible error.


`set -e` will abort when any command fails; add `set -o pipefail` so that failures earlier in a pipeline count too. It's a must for any critical script.


Don't use a shell script.


Do you mean, always pass the list directly to the next script via function calls, without writing it to an intermediate file / pipeline?


I'm being flippant, because shell scripts are so inherently error prone they're to be avoided for critical stuff like this.

If you _absolutely_ must use a shell script:

0. Use shellcheck, which will warn you about many of the below issues: https://www.shellcheck.net/

1. understand how quoting and word splitting work: https://mywiki.wooledge.org/Quotes

2. if piping files to other programs, use `-print0` or equivalent (or even better, if using something like find, its built-in execution options): https://mywiki.wooledge.org/UsingFind

3. Beware the pitfalls (especially something like parsing `ls`): https://mywiki.wooledge.org/BashPitfalls

(warning: the community around that wiki can be pretty toxic, just keep that in mind when foraying into it.)


Yes, use the list argument to Python’s subprocess.run for example. It’s much easier to not mess up if your arguments don’t get parsed by a shell before getting passed.
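
For example, a minimal sketch (the paths are placeholders and assumed to exist):

    import subprocess

    files_to_remove = ["videos/old clip.mp4", "videos/$weird;name.mp4"]

    # No shell is involved, so spaces, $ and ; in the names reach rm verbatim
    # instead of being re-parsed, expanded, or split.
    subprocess.run(["rm", "--"] + files_to_remove, check=True)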


Yes, I find command line tools that have a "--dry-run" flag to be very helpful. If the tool (or script or whatever) is performing some destructive or expensive change, then having the ability to ask "what do you think I want to do?" is great.

It's like the difference between "do what I say" and "do what I mean"...


That's what I like about powershell. Every script can include a "SupportsShouldProcess" [1] attribute. What this means is that you can pass two new arguments to your script, which have standardized names across the whole platform:

- -WhatIf to see what would happen if you run the script;

- -Confirm, which asks for confirmation before any potentially destructive action.

Moreover these arguments get passed down to any command you write in your script that supports them. So you can write something like:

    [CmdletBinding(SupportsShouldProcess)]
    param ([Parameter()] [string] $FolderToBeDeleted)
    
    # I'm using bash-like aliases but these are really powershell cmdlets!
    echo "Deleting files in $FolderToBeDeleted"
    $files = @(ls $FolderToBeDeleted -rec -file)
    echo "Found $($files.Length) files"
    rm $files
If I call this script with -WhatIf, it will only display the list of files to be deleted without doing anything. If I call it with -Confirm, it will ask for confirmation before each file, with an option to abort, debug the script, or process the rest without confirming again.

I can also declare that my script is "High" impact with the "ConfirmImpact = High" switch. This will make it so that the user gets asked for confirmation without explicitly passing -Confirm. A user can set their $ConfirmPreference to High, Medium, Low, or None, to make sure they get asked for confirmation for any script that declares an impact at least as high as their preference.

[1]: https://docs.microsoft.com/en-us/powershell/scripting/learn/...


I'm a bit confused (because I didn't read the docs)... does calling it with "-WhatIf" exercise the same code path as calling without, only the "do destructive stuff" automagically doesn't do anything? Or is it a separate routine that you have to write?

Cause if it is an entirely separate code path, doesn't that introduce a case where what you say you'll do isn't exactly what actually happens?


Well, just read the...

> because I didnt read the docs

Ouch.

> Or is it a separate routine that you have to write?

If you are writing a function or a module that would do something (e.g. an API wrapper) then of course you need to write it yourself.

But if you are writing just a script for your mundane one-time/everyday tasks and call cmdlets that support ShouldProcess, then it works automagically. Issuing '-WhatIf' for the script passes '-WhatIf' to any cmdlet that has 'ShouldProcess' in its definition. Of course, if someone made a cmdlet with a declared ShouldProcess but didn't write the logic to process it, you are out of luck.

But if you have a spare couple of minutes, check the docs in the link; it was originally a blog post by kevmarq, not a boring autodoc.


It's the first option. And yes, sometimes you have to be careful if you want to implement SupportsShouldProcess correctly, it's not something you can add willy-nilly. For example, if you create a folder, you can't `cd` there in -WhatIf mode.


The rule we have is that anything that is not idempotent and not run as a matter of daily routine must dry-run by default, and not take action unless you pass --really. This has saved my bacon many times!
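
A bare-bones sketch of that rule with argparse (the --really name comes from the comment above, the rest is made up):

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--really", action="store_true",
                        help="actually perform the deletions (default is a dry run)")
    args = parser.parse_args()

    for video_id in ["vid_1", "vid_2"]:           # placeholder ids
        if args.really:
            print(f"deleting {video_id}")         # call the real delete here
        else:
            print(f"[dry run] would delete {video_id}")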


Deleting actually is idempotent. Doing it twice won't be different from doing it once.


Deleting * may not be though. Your selection needs to be idempotent.


idempotency means that f(X) = f(f(X)). Modifying the X inbetween is not allowed. Is there really an initial environment where rm * ; rm * ; does something different than rm * once?


In the case of any live system, i would say yes. Additional, and different, files could have appeared on the file system in between the times of each rm *.


* is just short hand for a list of files. Calling rm with the same list of files will have the same results if you call it multiple times. That’s idempotent.

Your example is changing the list of files, or arguments to rm between runs. Same as pc85’s example where the timestamp argument changes.


In addition to what einsty said (which is 100% accurate), if you're deleting aged records, on any system of sufficient size objects will become aged beyond your threshold between executions.


Right. You can kind of consider the state of a filesystem on which you occasionally run rm * purges to be a system whose state is made up of ‘stuff in the filesystem’ and ‘timestamp the last purge was run’.

If you run rm * multiple times, the state of the system changes each time because that ‘timestamp’ ends up being different each time.

But if instead you run an rm on files older than a fixed timestamp, multiple times, the resulting filesystem is idempotent with respect to that operation, because the timestamp ends up set to the same value, and the filesystem in every case contains all the files added later than that timestamp.
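
In code the difference is just whether the cutoff is fixed or recomputed on every run; a small sketch with invented values:

    import time
    from pathlib import Path

    FIXED_CUTOFF = 1651000000                  # chosen once and written down; re-runs select the same set
    rolling_cutoff = time.time() - 30 * 86400  # recomputed each run, so re-runs can select new files

    def files_older_than(directory, cutoff):
        # Selection only; deleting the result is a separate step.
        return [p for p in Path(directory).iterdir() if p.stat().st_mtime < cutoff]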


> Is there really an initial environment where rm * ; rm * ; does something different than rm * once?

if * expands to the rm binary itself, maybe.


How is the system different after the first and after the second call?


If there is an rm executable in the current directory, and also one later in your PATH, the second run might use a different rm that could do whatever it wants to


This is actually a likely scenario, as it is common to alias rm to rm -i. Though your alias will still be active in the current shell after .bashrc is nuked, some might wrap rm with a script instead of aliasing (e.g., to send items to Trash).


# rm rm

# rm rm

rm: command not found


Early in my career I used --yes-i-really-mean-it and then a coworker removed it with the commit message "remove whimsy".

T'was a sad day.


Going further, make it dry run by default and have an --execute flag to actually run the commands: this encourages the user to check the dryrun output first.


All my tools that have a possible destructive outcome use either an interactive stdin prompt or a --live option. I like the idea of dry running by default.


This is why I like to always write any sort of user-script batch-job tools (backfills, purges, scrapers) with a "porcelain and plumbing" approach: The first step generates a fully declarative manifest of files/uris/commands (usually just json) and the second step actually executes them. I've used a --dry-run flag to just output the manifest, but I just read some folks use a --live-run flag to enable, with dry-run being the default, and I like that much better so I'll be using that going forward.

This pattern has the added benefit that it makes it really easy to write unit tests, which is something often sorely lacking in these sorts of batch scripts. It also makes full automation down the line a breeze, since you have nice shearing layers between your components.

http://www.laputan.org/mud/mud.html#ShearingLayers


I tend towards a --dry-run flag for creative actions and --confirm for destructive actions. Probably slightly annoying that the commands end up seemingly different, but it sure beats accidentally nuking something important.


This sounds like a "do nothing script."

https://news.ycombinator.com/item?id=29083367

It defaults to not doing anything so you can gradually and selectively have it do something.

Learned about it when I posted my command line checklist tool on HN: https://github.com/givemefoxes/sneklist

(https://news.ycombinator.com/item?id=25811276)

You could use it to summon up a checklist of to-dos like "make sure the collection in the dictionary has the expected number of values" before a "do you want to proceed? Y/n"


I do this too, but I also take a count of the expected number of items to be deleted. If the collection I'm iterating over doesn't have exactly the number of objects I expect, I don't proceed.


Human-in-the-loop is such an important concept in ops, and yet everyone (that's including me) seems to learn it the hard way.


I just want to say as someone currently working on a script to delete approximately 3.2TB of a ~4TB production database, this subthread is pure gold.


To ensure that the files are actually downloaded (step 1) before deleting the originals (step 2), I would make step 1 an input to step 2. That is, step 2 cannot work without step 1. Something like:

    (step1) Download video from URL.  Include the Id in the filename.
    (step2) Grab the list of files that have been downloaded and parse to get the Id.  Using the Id, delete the original file.
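
A loose sketch of that coupling (the downloads directory, the <id>.mp4 naming convention, and the final delete call are all hypothetical):

    from pathlib import Path

    # Step 1 saved each video as "<id>.mp4", so step 2 can only ever see ids
    # that were actually downloaded.
    downloaded_ids = [p.stem for p in Path("downloads").glob("*.mp4")]

    for video_id in downloaded_ids:
        print(f"would delete original {video_id}")  # replace with the real delete call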


Yep, even writing a simple wildcard at command-line I will 'echo' before I 'rm'.


On computers I own, I always install "trash-cli" and I even created an alias from rm to trash. It's like rm, but it goes to the good old trash. It will not save your prod but it's pretty useful on your own computer at least.


That's a good tip, thanks!


Agreed, I've also been burned doing stupid things like this and always print out the commands and check them before actually doing the commit.

As they say, measure twice, cut once.

Don't feel bad, I think every professional in IT goes through something similar at one time or another.


This was my first thought too. Another thing I like to do is to limit the loop to, say, one page or 10 entries and check after each run that it was correctly executed. It makes it a half-automated task, but saves time in the long run.


Condensed to aphorism form:

    Decide, then act.  
There's a whole menagerie of failure modes that come from trying to make decisions and actions at the same time. This is but one of them.

Another of my favorites is egregious use of caching, because traversing a DAG can result in the same decision being made four or five times, and the 'obvious' solution is to just add caches and/or promises to fix the problem.

As near as I can tell, this dates back to a time when accumulating two copies of data into memory was considered a faux pas, and so we try to stream the data and work with it at the same time. We don't live there anymore, and because we don't live there anymore we are expected to handle bigger problems, like DAGs instead of lists or trees. These incremental solutions only work with streams and sometimes trees. They don't work with graphs.

Critically, if the reason you're creating duplicate work is because you're subconsciously trying to conserve memory by acting while traversing, then adding caches completely sabotages that goal (and a number of others). If you build the plan first, then executing it is effectively dynamic programming. Or as you've pointed out, you can just not execute it at all.

Plus the testing burden is so drastically reduced that I get super-frustrated having to have this conversation with people over and over again.


It's amazing the number of times I look at some simple code and think "nah, this is so simple it doesn't need a test!", add tests anyway (because I know I should)... and immediately find the test fails because of an issue that would have been difficult to diagnose in production.

Automated tests are awesome :)


A few assertions would have also stopped this.

    During buildup of the our_ids list: assert vimeoId not in our_ids
    After creating the list: assert len(our_ids) > 10000; assert len(set(our_ids)) == len(our_ids)
    Before each final deletion: assert id not in hardcoded_list_of_golden_samples
    Depending on the speed required you could hit the api again here as an extra check.
But as always everything is obvious in hindsight. Even with the checks above, Plan+Apply is the safest approach.


>literally simple print statements

Yes, that can be a simple but powerful live on screen log. I developed a library to use an API from a SaaS vendor, in much the same way as the author. It was my first such project & I learned the hard way (wasted time, luckily no data loss or corruption) that print() was an excellent way to keep tabs on progress. On more than one occasion it saved me when the results started scrolling by and I did an oh sh*t! as I rushed to kill the job.


Rather than commenting it out, I suggest adding a --live-run flag to scripts and checking the output of --live-run=false (or omitted) before you run it "live."


But then you have double the chances of introducing a bug for the specific scenario we are talking about:

Before: there is chance there is a bug in my "delete" use case

Now: what we have before plus the chance that there is a bug in my "--live-run" flag


You can make automated tests for your flag. You can’t make automated tests for your code comments.


Beside doing this, I like to first just move files to another dir (keeping the relative path) instead of deleting them. It's basically like a DIY recycle bin.

If both paths are on the same disk moving files is a fast operation - and if you discover a screw up, you can easily undo it. On the other hand if everything still looks fine after a few days, you just `rm -rf` that folder and purge the files.


Yeah, that is what I recommend too.

Instead of performing the dangerous action outright, just log a message to screen (or elsewhere) and watch what is happening.

Alternatively, or subsequently, chroot and try that stuff on some dummy data to see if it actually works.


Indeed. I would say that framework or even language-level support for putting things in "dry-run" mode is something old C libraries used to offer that is sorely missing from many modern frameworks and languages.


This is how I do it in compiled code. In shell, I print the destructive command for dry runs - no conditions around whether to print or not, I go back to remove echo and printf to actually run the commands.


I'd make sure those include WARN or ERROR (I'd use logging to do that), that way you can grep for those. Spot checking might be difficult if the logs get long.


The No. 2 philosophy!

Make sure you got everything out and off before you pull up your pants, or else you better be prepared to deal with all the shit that might follow!


   SELECT COUNT(1) FROM table 
   -- UPDATE table SET col='val'
   WHERE 1=1


    BEGIN TRANSACTION 
    UPDATE table SET col='val' WHERE 1=1
    ROLLBACK


Definitely better, when you can afford the overhead!


Exactly!


100% on the logging and dry run.


That is called experience.

Good decisions come from experience. Experience comes from making bad decisions.


Dry run really is key here. Most automated tests wouldn't find this bug.


Experience is the best teacher™


Aaaahhh, the feeling you get when you notice that you fucked up. Everything gets quiet, body motion stops, cheeks get hot, heart starts to beat and sinks really low, "fuck, fuck, fuck, fuck, fuck, fuck, fuck, fuck, fuck, fucking shit". Pause. Wait. Think. "Backups, what do I have, how hard will it be to recover? What is lost?". Later you get up and walk in circles, fingers rolling the beard, building the plan in the head. Coffee gets made.


Pffft, it's not a real panic until you weigh the pros and cons of leaving the country with nothing but the clothes on your back and becoming an illegal immigrant shepherd in a nation with too many consonants in its name.

(Your description is so, so, spot on.)


The worst panic I've felt actually took me over the precipice into peaceful oblivion. I started simply saying to myself "oh well... It's just a job".


I don't think there's any public technical mistake that'll prevent you from ever getting a job in tech. Demand is just too high. Peaceful oblivion still isn't my default even though it should be.


Ah, the goat farmer fantasy that always seems to come _at the cusp_ of the solution.


I had this experience when, years ago on my first day as group lead at $JOB, I was being shown a RAID 5 production server that held years of valuable, irreplaceable data (because there were no backups. Let me repeat that there were no backups). For some bizarre reason, I thought "oh cool, hot-swappable drives" and pulled one out of the rack. This naturally resulted in loud, persistent beeping from the machine, which everyone ignored on the assumption that the fellow who was just hired as the group lead knew what the f he was doing.

While I didn't know what I was doing, I did manage to get the beeping to stop, and had to come in at 5 a.m. the next day to restripe the drive I'd yanked out.

Did I mention there were no backups? When I was a little bit more seasoned on the job, I raised a polite but persistent issue with management of the need for durable backups. Although I kept at it for months, they thought about it, talked about it, and ultimately did nothing. A few months after I left, the entire array failed. Since the group's work relied on the irreplaceable data, all work ground to a halt for the several months it took for an off-site company to recover the data.


My previous boss stores company data this same way. I begged him to approve the $5 per month cost for Backblaze on the computers I used. He approved it for some, but not all (about half of the ten computers). He completely rejected the idea for the company's data. After all, it was already protected by RAID.


Isn’t RAID 5 supposed to survive a single disk being taken out?


Theoretically, but there are often other things at play. I know the story is older, but since about 2015 raid5 has been dead to me, mostly because at current drive sizes a raid5 rebuild takes so long that the chance of a cascading failure, losing a second drive and turning it into a "send to a recovery lab" situation, is real. Anywhere you would use raid5, just do raid6.


To add to the comments of cascading failure: if a drive goes bad, another drive from the same manufacturing batch is disproportionately likely to go bad. RAID arrays are often built with drives from the same batch, since they were bought at the same time from the same vendor. This means array failures include multiple drives more often than you'd expect.


Yes, the array itself was fine; was just a dumb action on my part given how brittle the system was.


If a second drive fails after the first while rebuilding (which happens more often with larger and slower drives), the data is lost.


lol, it's amazing how fast the blood leaves your face when your mind transitions from "cool that worked well" to "Oh no, what have I done?"

That backups comment sounds very familiar.

I accidentally deleted a client's products table from the production database in my early years as a solo dev. There was only a production database. Luckily I had written a feature to export the products to an excel sheet a while before and happened to have an excel copy from the prior day. I managed to build an importer to ingest the excel and repopulate the table in record speed while waiting for my phone to ring and the client to be furious. Luckily they never found out.


God the feeling of having your body temp rise based purely on realizing you fucked up is so relatable.


damn, your description is spot on and reading this triggered PTSD in me... Last time I had this feeling was two years ago when I destroyed one of our development servers because of a failed application update. I know exactly how it feels to wish Ctrl + Z existed in real life... We had backups of the machine, but it was still kind of a humiliating feeling to tell everybody and ask for a restore from backup (everybody was cool though in the end)


I lost 1 hour and 30 minutes of data from a Slack-like app (chat messages). Luckily at the time we were pretty small so not much data was lost, but holy shit did that make me almost throw up.

Thank God my automatic backups were so close to the mistake I made and I didn't lose 24 hours.

Haven't made a mistake like that since and I don't destroy DB records like that anymore.


Don't forget that out-of-body experience where you just kinda float outside yourself.


If it is for real, body motion does not exactly stop, it manifests itself in other ways.


Poetic! Love it


I like these stories. I think they resonate well for 'the rest of us'. I've made plenty of mistakes like this - you learn and grow, right?

One of the best things about HN is that so many incredible, talented people post. It's incredibly inspiring to raise your own game, to see what the best are doing. But sometimes it's equally important to realise we all fuck up, and for every unicorn dev there's another thousand of us grinding away.

OP - well done for sorting the problem and telling us all about it!


Amen


The root of this particular issue was Vimeo's failure to do this migration for their customers.

Vimeo OTT's codebase is written in Rails, whereas the main Vimeo application is written in PHP. At the time Vimeo acquired Vimeo OTT, that codebase was small — around 10,000 lines of Ruby. Rewriting it inside the Vimeo PHP application would have been a tough technical challenge for the all-Ruby team, and they'd likely have lost some people along the way and missed out on some content deals, so they decided instead to maintain two separate codebases and two separate login systems.

The video-playback and video-storage infra has since been unified, but all the business logic is still siloed.


He wasn’t asking them to refactor their internal code bases. But they should be able to whip up the 20 lines of code needed to do this between APIs (or just directly on their servers). Essentially what author was trying to do when he screwed up. For the author this was disposable code, for Vimeo this would have been a reusable utility.

I know how these things happen. Support ticket queues and all. And while I don’t fully know the difference in cost, I would assume a customer upgrading to an Enterprise plan would get a better support experience.

Whoever within the author's company negotiated the upgrade to Enterprise (or didn't) and failed to embed some agreement around OTT-to-Enterprise transition assistance was the one who made the first mistake.


Per the post, Vimeo DID do it -- without telling the customer! And then wouldn't help uncluster the situation.


>The root of this particular issue was Vimeo's failure to do this migration for their customers.

Yes and No. At the end of the day, you as a business have to insulate yourself from your infrastructure provider.


Vimeo is the only infrastructure provider providing that service. It is impossible to insulate a business from it.


You're saying it's impossible to not accidentally delete 7TB of videos, and when you do, to blame it on Vimeo?


First, I want to say that this is a great post. You always grow stronger when you make mistakes. Writing it up solidifies understanding in the learning process.

This story resonates with many people here because many experienced engineers had done something similar before. For me, destructive batch operations like this would be two distinct steps:

1. Identify files that need to be deleted; 2. Loop through the list and delete them one by one.

These steps are decoupled so that the list can be validated. Each step can be tested independently. And the scripts are idempotent and can be reused.

Production operations are always risky. A good practice is to always prepare an execution plan with detailed steps, a validation plan, and a rollback plan. And, review the plan with peers before the operation.


> 1. Identify files that need to be deleted; 2. Loop through the list and delete them one by one.

> These steps are decoupled so that the list can be validated. Each step can be tested independently. And the scripts are idempotent and can be reused.

This is the most underrated comment.

I'm saying it as someone who had the ultimate oversight of deleting hundreds of TBs per day spread over billions of files on different clouds and local storage.


I've never regretted treating tasks like this as a pipeline of discrete steps with explicit outputs and inputs. Sending output to a file, viewing it, then having something process the file is such a great safety net.


I'm impressed you went with an automated solution (Playwright) for 500 videos after all that, considering they could be cross-loaded from Google Drive almost instantaneously. I'm glad it worked, but coding around a screw-up under the gun seems like a high-risk operation compared to spending 4 hours doing the task manually (albeit being super bored the whole time) with the benefit of knowing it's being done correctly, rather than hurriedly writing a script to potentially do something else wrong very efficiently and dig your hole deeper.


+1 to this. After the few major screw-ups I've caused at work, my self-confidence in my coding ability was rocked, and I tended to react by erring towards manual cleanup rather than coding some scalable solution for fixing the issues.


Actually I was surprised reading that the person wrote a script to delete 900 videos.

If you need to do it once, it’s probably 2-3 hours of work? That is identifying a duplicate video and then clicking the button(s) to delete it once every 20 seconds.

Reminds me of https://xkcd.com/1205/


A big part of the reason for the problem in this post is because Vimeo made it impossible to move videos from one Vimeo product to another Vimeo product: "There were roughly 500 videos on VimeoOTT that had to be transferred to Enterprise and Vimeo doesn't provide an easy way of doing it."

I have found working with Vimeo to be very frustrating, especially recently. They have a great video solution, especially for streaming, but they seem to put up these unnecessary and frustrating roadblocks that make me constantly question my decision to use Vimeo. From the inability to move videos from one place to another, requiring complete uploads (resulting in problems like the one in this post), to nonsensical limits and pricing, especially on their new webinar offering, which has a limit of 100 registered attendees. For anyone who has run webinars before, this makes no sense since 100 registered attendees usually means 20-30% of those people actually attend, so you're capped at 20-30 live attendees. They should price it like most event sites and charge per live attendee rather than per registration.

Regardless, I've been very frustrated with Vimeo since it could be so much better if they didn't have these roadblocks in place. If they could have easily enabled moving videos from one product to another, the post (and 7TB of lost videos) would never have happened. It wasn't always this way with Vimeo, but they went IPO in May 2021 and it's no surprise they're turning the screws on their product offering and pricing now.


Honestly, this is positively representative of any junior developer with comparable experience. Depending on their background and how much production work they had, there's an overwhelming sense of eagerness and enthusiasm. Quick to script and perhaps a bit too quick to execute.

A friendly team will harness that enthusiasm and tame the quickness / encourage respect for production. We've all made a massive doo doo, and it's how you proceed that'll define your career.


We can all poke at this person for doing things incorrectly, but one has to wonder what mindset could lead to any programmer ever thinking that:

  1) parsing a web page shouldn't be considered incredibly fraught with problems
  2) that reloading web pages should be part of (1)
  3) that this should ever possibly be run without validating the list of files that would be deleted
So forget the specifics. Where are people learning these things, and what do we do to teach them better things?


Some mistakes can only be learned by making them. Sometimes you can tell someone a hundred times something, they won't learn until they experience it.

The point is not to prevent these mistakes, but to keep the consequences low.

Have backups, have version control, etc.


True, and worth remembering why. Most of us are constantly getting warned about the dire potential consequences of huge numbers of things, most of which are either massively unlikely to ever happen or not actually that bad, or both. It's very difficult to tell which of the things we get warned about are actually high risk until something bites us.


College? Parents? In my experience it runs pretty deep so not sure it can be easily trained out. This mindset is probably quite useful in evolutionary terms: rush at the attacking bear without thinking, for example.


> rush at the attacking bear without thinking, for example

Would that work? I don’t see a bear backing down and I don’t see the human winning either.


> Where are people learning these things, and what do we do to teach them better things?

Learn to learn and learn to work carefully. It starts in school and should be part of a proper college/university education or vocational training.

There's several ways of learning the specifics: by experience on-the-job, which can be hard if mistakes can get you fired; or by putting in the work in your free time.

If your job is to work with certain web frameworks and you're not very experienced, either ask senior devs to assist/review before going live with critical changes. Alternatively, practice at home. Unpopular, but you need to get experience from somewhere. OSS projects are a great way to do that - be that by creating your own or by contributing to an existing one.


"rm -rf" blowing you foot off is a Unix Right of Passage(tm).

You will do it at least once in your career. If you're old enough you will do it twice. If you're really old, you get the joy of doing it a third time.

The subtlety increases each time because you do learn.


Seriously.. also, looking at these code snippets...

If someone delivers code that looks like that, especially if intended for a production system, I'm firing immediately.

It's a miracle nothing has happened sooner.


From the article:

>I'm a Junior Developer with less than one year of actual experience.

>The bad news is that this was on Friday, and we needed to have the videos back up at most for Tuesday morning.

You say:

>If someone delivers code that looks like that, especially if intended for a production system, I'm firing immediately

Fire immediately? What a miserable sounding place to work.


In this case - seeing how they let them have direct access to production - I agree on the miserable sounding place to work and repeat myself -

It’s a miracle nothing happened sooner


I was referring to your workplace.


At least we don’t let junior developers with close to zero experience anywhere near production..

I didn’t quite read the part about his experience in the article, I agree firing over that wouldn’t be fair, but that just raises other questions.


The more I read about vimeo the more I wonder what's up with these guys.

Only recently they made some god-awful policy changes for content creators(1), but it looks like they treat their enterprise customers just the same.

Surely, there must be better alternatives for hosting videos than being at the mercy of a company who couldn't care less about big paying customers.

(1) https://www.theverge.com/2022/3/18/22985820/vimeo-bandwidth-...


mux.com seems like a great alternative and is super developer focused.


This is one of those times that even if you don’t use a fully functional language, trying to make as much of your program logic pure functions would be helpful.

It also makes it more testable. Instead of putting the delete call right in the loop, split it into four functions.

    function getAllVimeoVideos()

    function getAllDbVideos()

    function getVideosToDelete(vimeo_videos, db_videos)

    function deleteVideos(videos_to_delete)

Your core logic lives in getVideosToDelete which is simply a set difference.

Given that there are only a few hundred videos, it is easy to run the getter functions above and quickly verify they are returning what you expect.
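
In Python terms, that core function might be nothing more than the following (field names are placeholders):

    def get_videos_to_delete(vimeo_videos, db_videos):
        # Pure function: the only decision logic, trivially testable on its own.
        db_ids = {v["id"] for v in db_videos}
        return [v for v in vimeo_videos if v["id"] not in db_ids]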


This was going to be my exact recommendation. By "separating the concerns", you make it easier on pretty much every dimension: testing in unit tests, doing a dry run in production, the ability to read the code (for you and for code reviews), and in some cases your code will be written in a more functional way, reducing variable scoping issues.


Yes, that's fun. A

    List<Foo> getFoosToUpdate(List<Foo> foos, List<Bar> bars) 
function is the first time I thought about time complexity in my job.

Say Foo and Bar have fields in common, such that you can say a Foo object "equals" or "matches to" a Bar object, like if they have name and dateOfBirth fields or something else that are the same (nothing like a common ID between the two). Now say there are some other fields too, like amountSpentThisYearOnDogFood that you know is always accurate for Bars, but might be out of date for Foos. How do you get the list of all the Foos to update?

Initially I did the nested for loop solution that's like

   List<Foo> getFoosToUpdate(List<Foo> foos, List<Bar> bars)
   {
    List<Foo> returnList = new List<Foo>();
    foreach (var foo in foos)
    {
     foreach (var bar in bars)
     {
      // check if "equal" or "matching" based on some criteria
      // if equal, update foo dog food expenditure with bar dog food expenditure, add to returnList, and break
     }
    }
    return returnList;
   }
but that's O(n^2) right.

The solution with a Dictionary is obviously better. All you need to ensure is that you have a method for both the Foo and Bar classes that will produce the equivalent hash for both, if they would be considered equal or matching by whatever criteria you are using.

So you could have something like

    int GetHashOfFoo(Foo foo)
    {
     string firstName = foo.FirstName;
     string lastName = foo.LastName;
     DateTime dob = foo.Dob;

     return (firstName, lastName, dob).GetHashCode(); // convenient c# method
    }

    int GetHashOfBar(Bar bar)
    {
     string firstName = bar.FirstName;
     string lastName = bar.LastName;
     DateTime dob = bar.Dob;

     return (firstName, lastName, dob).GetHashCode();
    }
These two functions will return the same value if those fields are the same. So then you can do something like

   List<Foo> getFoosToUpdate(List<Foo> foos, List<Bar> bars)
   {
    List<Foo> returnList = new List<Foo>();
    Dictionary<int, Bar> barsByHash = new Dictionary<int, Bar>(bars.Count);

    foreach (var bar in bars)
    {
     int barHash = GetHashOfBar(bar);
     barsByHash[barHash] = bar;
    }

    foreach (var foo in foos)
    {
     int fooHash = GetHashOfFoo(foo);
     if (barsByHash.ContainsKey(fooHash))
     {
      returnList.Add(foo.CopyWith(dogFoodExpenditure: barsByHash[fooHash].DogFoodExpenditure));
     }
    }
    
    return returnList;
   }
Which is faster cause you only have to go through the bars list once.

I actually messed up something like OP with this, but with doing undesired additions instead of undesired deletions.

You can think of it as having two endpoints, both expecting a .csv with rows being the things you were updating/changing/deleting.

The problem was, there was a column to indicate (with a character) whether the row was for an edit, or addition, or deletion, but this was only with one of these endpoints. For the other, there was only addition functionality, but I thought changes and deletions were also options for the other kind of .csv due to some unwise assumptions on my part (thinking that the other .csv would have the same options as the other). That's how we accidentally put in over 100 additions that should have been changes that had to be manually deleted. Luckily I had a list of all the mistaken additions.


"I'm under an NDA"

Don't write a blog post.


Oh dude, we've all been there.

9 years ago I was working for a major broadcasting company in the arse end of London as a junior dev, building one of their Android apps.

We'd roll features out months before & enable them with feature flags via a json file we'd manually push to a prod server at a later date.

We'd just built a huge new feature letting you request content to be downloaded to your set top box remotely & it had a 250k marketing campaign to go along with the launch.

Senior dev trusted me with prod deployment rights.

I pushed the wrong json config to prod, launching the feature weeks before the marketing campaign.

Thank god I was a junior perm, that was definitely a firing offence.


> Senior dev trusted me with prod deployment rights.

That part's crazy! If you think it was a firing offence, wouldn't they have been fired too? (I don't think it is, but it obviously requires system changes/an explanation.)


> It involves bad practices and errors from multiple parties in a world that might seem

> foreign to the "Silicon Valley" world but paints an accurate picture of what

> development is for small IT companies around the world

Everybody makes mistakes, even in the "Silicon Valley" world, but such problems could be easily caught by testing (which he did, but it was restricted to the first page) and by performing a simple dry run.


Exactly, everyone makes mistakes. Sometimes huge ones. In hindsight or on the sidelines it's always easy to point out a few technical things that WOULD HAVE avoided catastrophe, but does that help? I think not (aside from a cautionary parable for interns).

Things are complicated, people are human and forget things, there are pressures to "get it done" and override the guardrails. Everybody has horror stories. Some worse than others. Welcome to the OP's day of horror. I would think "Silicon Valley" dev-ops horror stories make this one seem like a triviality.


Apart from all the advice on how to do such destructive operations more safely, I think there's also a lesson to be learned about communicating more actively:

1. Vimeo responds to the original request with "will look into it", then... nothing happens? This may depend on culture, but at least from my experience in the UK, this is a very non-committal response, and if you really want them to do something, you'll need to chase them. Wait a few days and inquire if they have any estimate for when it might get done, or if they need more information. I find that the "looking into it" response is sometimes used to gauge how important the request is to you.

2. Once you go with your own solution, just drop a quick message to Vimeo: "Hey, just wanted to let you know we've found our own solution for this, and won't require your help any more. Sorry if you've already committed any resources for this task. Have a nice day, yada yada." This not only avoids what happened here, but is also a courtesy to them.


Hey, everyone, ease up. I have: 1) dropped a production database because I thought it was the test database. 2) screwed up a print job costing $100,000 in today’s money and had to do it again 3) crashed all of Facebook with a C++ bug. 4) crashed Facebook photo uploads, with a JavaScript bug, in my first month. 5) literally killed a startup’s cash flow and caused them to lose their merchant account because I over focused on the wrong bugs.


At my first development job (paid internship at a moderately-sized, though fast-growing business - maybe 300 people at the time?) I introduced a bug that didn't appear until a certain microservice stopped working (my code defaulted in the wrong direction when the ms failed) and as far as I can tell they may have lost or almost lost a pretty big account from it. In an after-hours meeting regarding the issue, one of the higher ups ended up storming out and never showing up again.

In my defence, we had to get 2 PR approvals before anything was merged! But I definitely learned a thing or two from that experience


You worked at Facebook, we get it


Code without constant logging of “utc [who] does what exactly” has been a no-go for me for a long time. Also, if you have to be destructive, replace the <rm/sell/halt> with log() at least once (aka --verbose --dry-run) and check your expectations. One-shot scripts like this are a screaming disaster.

(The problematic line lacks the closing ", probably a typo? I thought it closed in an unexpected location.)


This is more common than you think. Not just losing data, but not having a good handle on where the important parts of the system are, and how close you are to catastrophe. I find diagrams really help. I can recall a visual map of the system when I work on some component, and think, "OH, I remember seeing this component connected to a really critical thing, I need to check something first."

Start by creating one empty page for every component of your system. You won't remember them all, but over time you can add missing ones. Each page is the authoritative source of info on that component. If you need more pages for one component, put them in a directory of the same name as the page and add ".d" to the directory name, and link to them from the first page.

Finally, create a diagram (however you want) that includes every component you have a page for. Add the count of components to the top of the diagram. If the count on the diagram doesn't match the number of documents, it's time to update the diagram. If you ever add, remove or rename a page, it's time to update the diagram.

If you do this the same way for every different system you have, you can link them all together and get both small and large scale diagrams. (p.s. don't waste time automating this unless you find the system changing constantly or you have a very big system)


I believe if we're honest, we've all done stupid things we should have avoided. I remember a group of about 3000 emails that went out to insurance agents saying that policy #123456789 for Someone Funky was going to be cancelled by underwriting. I also remember very quickly figuring out how to automate Outlook's email recall feature.

We've all made big dumb mistakes. Recover and learn.


It's like the first time you run

  rm -rf /path/to/delete/ * 
And realize it is taking too long...


Can you explain? I feel like it removes / but not sure why.


   rm -rf /path/to/delete/ *
Note the space between the last / and the *

This will recursively remove the directory /path/to/delete and remove every file/directory that matches * in the current directory where 'rm' is being run.

When what was most likely meant was:

   rm -rf /path/to/delete/*
Note the lack of a space between the last / and the *. This will remove all files matching * that reside in the /path/to/delete/ directory.


Besides recursively deleting /path/to/delete/ the command also deletes all (non hidden) content of the current directory (note the * at the end of the line). I assume the correct command would be /path/to/delete/*.


The error is the space before the asterisk. The original intention was to delete the contents of the folder /path/to/delete/. Instead, the asterisk enumerates files in the current directory and they get deleted


It removes everything in the current directory


> Vimeo doesn't provide an easy way of doing it. I wrote to the support team around October asking them if it was possible to do a migration, and they told us that they "will look into it" without letting us know anything ever since. [...] At one point, without letting us know anything, Vimeo decided it was a great idea to comply with our request and dumped all the videos present on OTT onto the new platform. No questions were asked [...] they were duplicating videos that were already uploaded.

Oh yes Vimeo, the crappy company that won't let you play videos unless you enable autoplay in your browser[1].

Selecting them as a provider was the actual mistake.

[1] https://askubuntu.com/questions/777489/vimeo-video-not-playi...


This is why you have backups. Good on you to have them!

When I just started as a junior dev at a small company I made the classic mistake of emptying the prod db instead of my local dev db. This was a small and in hindsight insignificant project. But Google was our customer, so it didn't feel insignificant at the time.

In this case my inexperience was partly my savior. All the data was inputted by people via a web form. Normally you're supposed to use POST to submit a form. But I was quite clueless at the time, so I had used GET. This meant all requests were still in the Apache logs. I could simply replay all requests.

I still feel my heart pounding when I think about the moment I realized what had happened. I was really relieved when everything was back!

What I learned from this incident:

- make automated backups

- no access to prod db from anywhere but prod


Yea, I’ve wiped out an entire government’s form library once. Backups are a career saver.


For larger 'live' production changes I've now started to rely on generative programming. I've got one script in some 'normal' programming language like JavaScript or Python, which in turn generates a script containing a list of curl or other CLI commands that do the actual deletion, modification, addition, etc.

This allows me to run a small sub-set of commands and test those under a live-environment before running all commands at once. In addition, this also functions as a complete log of what has been changed manually in production.
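
For illustration, a minimal sketch of the idea in Python (the endpoint, token variable and id list are hypothetical placeholders, not anything from the post):

    # generate_plan.py: writes a reviewable shell script instead of deleting anything directly
    ids_to_delete = [101, 102, 103]            # hypothetical ids, e.g. loaded from a vetted file
    api = "https://api.example.com/media"      # placeholder endpoint

    with open("delete_plan.sh", "w") as f:
        f.write("#!/bin/sh\nset -e\n")
        for media_id in ids_to_delete:
            f.write(f'curl -X DELETE -H "Authorization: Bearer $TOKEN" "{api}/{media_id}"\n')

You can then review delete_plan.sh, run only its first few lines by hand against the live environment, and only afterwards execute the whole file.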


Kudos to you for "learning in public" by showcasing part of your learnings online!!! I think this is extremely important to do!

Not everyone is an innate rockstar developer who provisions k8s clusters for breakfast and delivers features for lunch!

Being a developer is a really hard job and there are endless complexities and difficulties along the way and when we are more seasoned already.

Don't let any negative feedback deter you from keeping doing what you're doing: learning from your mistakes and improving along the way!


Shouldn't that be `page={page}` rather than `page{page}`? Or better yet, use the requests `params` argument.
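
For example (the endpoint is the one quoted from the post; page is whatever counter the loop maintains):

    import requests

    page = 1  # the loop's page counter
    # requests builds the query string itself, so there's no f-string to get wrong
    response = requests.get(
        "https://api.ourservice.com/media",
        params={"page": page, "step": 100},
    )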


Any process that makes a junior directly access prod codebase/database is flawed. No matter how small of a company you are, you can set up a proper CI/CD pipeline.


90% of IT companies in Italy don't even know what a CI/CD pipeline is. That said I don't think it's something we could've integrated in our pipeline as it's an error that originated from an external service!


The only thing I can remember helping against such actions is requiring confirmation of intent that scales with the size of the operation.

That means if you delete one small file you need one confirmation, but if you delete thousands, you need to state an intent: "I expect a thousand files to be deleted." The same goes for size. So not an OK button, but a form that lets you enter the dimensions of the intended outcome: 100 files max, 1 GB max deleted.

If the request goes over the stated intent, the system aborts.
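
A rough sketch of what that could look like (all names and numbers are made up):

    def confirm_intent(planned_count, planned_bytes):
        # ask the operator to state an upper bound before anything is deleted
        max_count = int(input("Max number of files you expect to delete: "))
        max_bytes = int(input("Max total bytes you expect to delete: "))
        if planned_count > max_count or planned_bytes > max_bytes:
            raise SystemExit(
                f"Aborting: plan ({planned_count} files, {planned_bytes} bytes) exceeds the stated intent"
            )

    # e.g. a 7 TB deletion aborts unless the operator explicitly expected that scale
    confirm_intent(planned_count=1200, planned_bytes=7 * 10**12)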


It should really be something like: "a flaw in our system allowed me to delete 7 TB of videos". Not entirely your fault.


System and/or development processes


This is a great technical write-up, and I'd love to hear the human side of this story as well! When did you tell the higher-ups that you deleted production? Was no one more senior on call to try to fix it? Did they want you to learn how to fix it? Or were you the most senior person responsible for this whole area? Or did they not know?


The first part of my write up slightly explains it but the point is that HN is the top 1%. In my current company we have 10 developers, most of them without a technical degree. They know how to do what they've been doing for the past 10 years but (as with most small companies here in Italy) people don't know what best practices are used in the industry, what a pipeline is or what a dry-run is (I learned about it today myself!).

What happened is that no one knew how to react and I was probably the best suited for it, we don't really have seniority in office.

That said when I deleted the videos I immediately told my boss. He was kind of scared but his reaction was mostly "Well, now we have to re-upload them immediately, find a way. The people that uploaded them once won't be doing it twice". I was basically left on my own to find a solution (which I luckily did).

Please note that I'm in no way blaming my company or accusing it of something, this is the standard knowledge base and way of dealing with things in many places, contrary to what working in big tech or reading HN might make you believe!


Thanks for the explanation, that makes a lot of sense!

> "HN is the top 1%" + "this is the standard knowledge base and way of dealing with things in many places, contrary to what working in big tech or reading HN might make you believe!"

I'm in fact from Spain and now live in Japan, and I believe the practices in Spain would be as bad as Italy, and in Japan they are def worse (great at hardware, horrible at software), so I do understand a lot of what you are saying. FWIW, in Spain I've seen whole dev teams composed only of interns!

> "we landed a big contract for one of the biggest gym companies in Italy, the UK and South Africa" + "we don't really have seniority in office"

Maybe now that it seems like you have the budget, it's a good time to go to management and suggest hiring some senior devs who can mentor the rest into learning best practices? You can sell it to management as a reinvestment in the company if they'd otherwise want to take it as pure profit. If Italy is like Spain, many devs won't really even want to learn these things, but some will, and those will become seniors at some point.


> "What does this teach us? Well, it teaches me to do more diverse tests when doing destructive operations."

I think it also teaches us that adversity sometimes leads to better solutions. I love that the OP made a hacky script that did in 4 hours what a guy was paid to do manually over several months!


>... the "Silicon Valley" world ...

To rebillionizing!

https://www.youtube.com/watch?v=wGy5SGTuAGI&t=369s

...yeah, the Tres Commas bottle was on the DELETE key. The corner of it was just, it juuuust got on there...


> but at the time the code seemed completely correct to me

I venture this kind of (misplaced) over-confidence is not atypical of many junior developers. As someone with a few years under my belt, I don't care how sure I was of the code I wrote that deletes important data, I would have gone through the code over and over again, and at least ran a simulation (by maybe logging the generated delete urls for manual verification).

It's a rite of passage and we all went through something like this. It's how you learn and grow.

>It also should probably teach something to Vimeo

No. Even if Vimeo could have made things better, it's still your fault. You have to take responsibility for your business. At the end of the day, if this causes the closure of your company, Vimeo is still fine.


After having read about plenty of such cases over the years, I have a persistent dread of pulling something like that myself, to the point of being nervous with ‘*’ in the terminal, and generally checking everything twice. (And also have some kind of mild horror-high from corporate snafu stories, weirdly reminiscent of Ballard's ‘Crash’).

So: I never feed the data straight from the gathering script into the modifying script, at least not in the first runs. Instead, I dump the whole list of items into a file, count them in there, gawk at them to see that they're right, and compare with the source data by hand until I begin to annoy myself. Then I feed that file to the second script.
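
A rough sketch of that split, with the gathering function left as a hypothetical placeholder:

    # gathering script: only writes the candidate ids to a file, nothing destructive
    candidate_ids = gather_candidate_ids()        # hypothetical function that queries the API
    with open("ids_to_delete.txt", "w") as out:
        out.writelines(f"{media_id}\n" for media_id in candidate_ids)

    # modifying script (separate run, after counting and eyeballing the file by hand):
    with open("ids_to_delete.txt") as inp:
        ids = [line.strip() for line in inp if line.strip()]
    print(f"About to act on {len(ids)} items")    # last sanity check before doing anything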


Great post and great attitude.

I think I would reflect on why this is a script to begin with. It's run once and with only 500 items could be done manually, though 500 is certainly a bit much.

But it's not a massive time saver; the point of the script should be almost entirely to increase accuracy. I think I would write one script to generate the list of videos to delete; that's the part that's actually difficult, and a human can then verify the list. I would probably just delete them by hand after that, but if I really wanted a script for that part too, it would be a separate script that uses a list that has been vetted by a human even if initially created by the first script.


Does anyone else get that deep, dark, disturbing feeling in their gut when they know they have done something bad like this?

This is why I use so many print statements and comment out destructive actions! Lots of experience with these feelings!


You are fine, dude: you didn't delete any videos, only a high-availability cache of videos on a streaming site. If that had been your master copy, you probably would have taken greater care; since it wasn't, well :-). Anyway, when working with caches that can be recreated in reasonable time, it's normal to take less care than with originals.

The only concern is Google Drive as the only backup, please make sure you have a local copy on a local RAID drive and another one regularly archived and stored in a bank locker.


As everyone else has already pointed out, better testing would have been very useful here. For instance, print(len(our_ids)) would have been a dead giveaway that something was up.

I am also a junior dev and completely empathize with being given a lot of responsibility and potentially messing up. I think for someone with < 1 year of experience, to solve the problems you created as fast as you did is really impressive. Thankfully your story ends well :)


On the product I work on, I can watch events after the fact (videos of people using it), and it's so embarrassing watching it fail. The wasted time. Ahh... I've gotten better at checking deps and running a full automated E2E test every time new code is deployed (diffing environments before/after).

Still things happen. Hopefully you have a large enough client base where some bad experience doesn't define the whole thing.


For many years I have had a private blog. I like to write but realised 99% of us are not interesting to read. This is a young guy processing his thoughts. Not "teaching" the rest of us as he frames it. This should have stayed in-house and personal. The company can then decide which clients, authorities to contact if necessary. There is a book in all of us as they say. For most of us it should stay there.


Experience is directly proportional to the amount of equipment ruined or data lost.

Even though you were fortunate not to lose any data, you gained a lot of experience!


A great success story as far as I'm concerned, even if it doesn't reflect well on Vimeo support. But a good reminder to have someone doublecheck your logic if you aim to delete massive amounts of data from production. And to check if the backups are working (producing restorable data) on a regular basis. Sometimes they just seem to be working, as I have learned the hard way...


I'm currently working with FOIA software, and a regular user can only delete one document at a time from the information that they verify/redact before sending out. They can't even multi select! Only an admin can delete multiple documents at one time.

I'm guessing users accidentally deleted multiple documents one too many times, and now it's baked in.


Not completely off topic (one of my scripts recently deleted files whose dates were off by one):

> Fri May 06 2022

> I'm currently working [...] in Italy


Mistakes happen. Kudos to the author on taking it as a learning opportunity. I am friends with a lot of smart devs, and many of them have dropped a production db at least once, and if not then, then accidentally emailed 10k people …etc. It happens. Work to avoid it, but plan for what to do when it inevitably happens. ¯\_(ツ)_/¯


Related: is there any HTTP API model that supports transactions with commit and rollback? Also isolation levels? Usually one wants set_stock(get_stock() + 10), but there may be competing writes from various clients between the two calls, resulting in races. Typical web APIs seem vulnerable to this.


Wouldn't the model be to expose an increment_stock(10) type HTTP endpoint instead, and the backend can ensure it's atomic?
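
A minimal sketch of that kind of endpoint, assuming Flask and SQLite purely for illustration (the routes, table and database are placeholders):

    from flask import Flask, request, jsonify
    import sqlite3

    app = Flask(__name__)

    @app.route("/stock/<item_id>/increment", methods=["POST"])
    def increment_stock(item_id):
        delta = int(request.json["delta"])
        conn = sqlite3.connect("shop.db")   # placeholder database
        # a single UPDATE is atomic, so concurrent clients can't race a read-modify-write
        conn.execute(
            "UPDATE stock SET quantity = quantity + ? WHERE item_id = ?",
            (delta, item_id),
        )
        conn.commit()
        qty = conn.execute(
            "SELECT quantity FROM stock WHERE item_id = ?", (item_id,)
        ).fetchone()[0]
        conn.close()
        return jsonify({"quantity": qty})

Clients then POST {"delta": 10} instead of doing get_stock() followed by set_stock(), and the race disappears on the server side.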



His solution reminds me of how I used Cypress to generate test accounts on our local admin dashboard for Cypress tests, since our api was inadequate (it didn't do the billing signoff required to create accounts that last longer than a month... don't ask...).


I accidentally deleted a printer from the print server using a Python script. The docs weren't exactly clear, so I thought it would only remove the local printer connection. After reading this post I feel better now. My fuckup wasn't that bad in comparison. :)


In my opinion, any process that isn't preceded by an identical, automated process that varies only in the data involved is very risky to run in production. Did your management get a big reality check? Or maybe not, because of the backups?


> Some of the things that might seem obvious to some might not be so for me, thanks!

> my mind thought that url would refresh itself as soon as the page variable changed

This is what I thought too when I read the code. I don't think it's obvious at all!


That's actually surprising to me. In most languages that I've worked with, strings are immutable so the fact that url doesn't update is more obvious to me and I'd be surprised if it did update.


> .. physically backed up in a Google Drive folder ...

That's not what a physical backup means.


I am also a junior with 1 year’s experience, just in Python but none with the requests module or web development. If the ‘page’ variable is being changed, was the error something specific to this module, not refreshing the page?


This wouldn't be an issue if providers like Vimeo would soft delete and only hard delete the items after a period of time, allowing recovery in between.

Everywhere I have to implement a delete operation, I never hard delete data on first call.



f for format ("formatted string").

It does the same thing as `https://api.ourservice.com/media?page${page}&step=100` [sic] in JavaScript, or "https://api.ourservice.com/media?page$page&step=100" in Bash, PHP, Perl or Groovy (and other languages). It opts you into variable substitution / interpolation in the string literal.

In Python these string literals are called f-strings if you want to look it up. They are defined in PEP 498 - Literal String Interpolation [1] and available since Python 3.6.

[1] https://peps.python.org/pep-0498/

[sic] there probably would be a missing '=' in this url after "?page"
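
For example, the f-string from the post (with the missing '=' added):

    page = 2
    url = f"https://api.ourservice.com/media?page={page}&step=100"
    print(url)  # https://api.ourservice.com/media?page=2&step=100

Note that the f-string is evaluated once, on that line; re-assigning page afterwards does not update url, which is exactly the bug in the post.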


It's a Python f-string [0]. A way of formatting a string by directly including a Python expression between curly braces.

[0] https://docs.python.org/3/tutorial/inputoutput.html#tut-f-st...


"f-strings", a (new) way to format strings.


if it's python, it's the formatting/interpolation string marker.


Nice work :D I tend to always add a `--dryrun` flag to any scripts like this these days so that when we move it to production we can run an extra test there just to be sure.
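
Something along these lines (the id list and delete call are hypothetical placeholders):

    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--dry-run", action="store_true",
                        help="print what would be deleted without deleting anything")
    args = parser.parse_args()

    for video_id in ids_to_delete:                # hypothetical list built earlier in the script
        if args.dry_run:
            print(f"[dry-run] would delete {video_id}")
        else:
            delete_video(video_id)                # hypothetical destructive call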


The company was lucky to have someone like you that could actually sort out real problems efficiently. I would bring up this story when negotiating for a raise.


Always do a dry run when deleting many things with code.

- Captain Obvious


fwiw I would probably have turned to rclone.org for this. It doesn't have support for vimeo out of the box but the Vimeo API seems sane enough that it would be trivial to implement uploads quickly.

Previously used rclone for doing massive transfers between cloud providers using "cheap" on-demand servers which provide unlimited data transfer (the public clouds make this very expensive).


Everyone makes mistakes, juniors and seniors alike, but I think you have the right mindset and problem-solving skills to thrive :)


So much wisdom in these comments, people have different styles of being careful, and each makes sense in a nuclear "go" situation


A computer lets you make more mistakes faster than any invention in human history, with the possible exceptions of handguns and tequila.


Imagine coding while drinking tequila...


But are you a junior dev with less than one year of experience working by yourself alone at a company? No tech lead/help?


when doing migrations/conversions I always write a script in dry-run mode first. I exhaustively check the results to make sure they are expected. Then try to do a real conversion/transfer of only the 1st file and make sure that worked. Then do a couple more. Etc. Only then do I feel confident to do the whole thing.


So, apparently, vimeo has better support than youtube (not informative, but at least they DO something). Duly noted.


You can automate using puppeteer or selenium


The author used Playwright in the end to automate uploads. Using e2e tools for automating tasks is clever, I'm not sure I would've thought of it.
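
The general shape of it with Playwright's Python API looks something like this (the URL, selectors and file name are made-up placeholders, not the author's actual script):

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://example.com/upload")                  # placeholder upload page
        page.set_input_files("input[type=file]", "video.mp4")    # placeholder selector and file
        page.click("text=Publish")                                # placeholder button
        browser.close()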


It's clever, but also brittle. And might have disastrous error conditions (like hitting "Delete" instead of "Continue" if the wrong UI part has focus).


> I Accidentally Deleted 7TB of Videos ...

Spoiler:

But there was a backup that could be reuploaded in time and everything was fine in the end.


The conclusion should include that backup at separate locations is key. Also, that the backups are tested and work. I worked with clients that had everything from lightning strikes destroying servers to ransomware to people making mistakes. No problem with solid backups. There is a difference between a good process and skill.


Would you have had the courage to post this here if you hadn’t been able to fix it?


Junior Dev: "I'm under an NDA"

Also Junior Dev: "Here's my source code"


Unless you're Oracle that code is hardly critical to the business.

Even as a Sr Dev I'd share stuff like that, it's code that'd appear on a stack overflow post anyway.


Is 7TB a lot? (Peers at personal arrays orders of magnitude larger.)


Related: The change is fine, it's only one line.


Now you learned what a backup is.


How can any enterprise rely only on such online services and not keep copies of their work on their own storage?

At least store the data on large multi-TB hard disks connected via a SATA adapter when needed, and put them in a case in a safe place (better: two copies, stored in two places). What is the price of the drives plus the copy time relative to the production work?


Scary. Might as well just pay Vimeo to restore the data.


[deleted]


I'm baffled by this too. Unnecessary bridge burning I'd call it.

It's not even necessary to the story.


It's explained in the first line: "I'm a Junior Developer with less than one year of actual experience. Some of the things that might seem obvious to some might not be so for me". I guess it applies to this, too, not just the technical aspects.


I might've missed it, but I don't think that line existed when this was first posted.


You're right and I edited the company's name (might be too late but better this way). That said I'm not very happy with the experience of working for TheCompanyTM anyways so I'm in the process of switching jobs.

Thanks for the comment :)


Talking bad about your employer is great for finding a new job. Companies are eager to hire people who bad-talk them.


He doesn't talk bad about his employer. He talks bad about his employers client.


Tech is like any other human endeavor. People talk. People change jobs and still like the people in the place they left.


Yes exactly. Which is why I wouldn't touch anyone who has no criticism for the systems OR culture of a place they've been before.

Nowhere is perfect. If people can't be honest about the flaws then they're useless.


Of course you can hire whomever you want. I would hire someone who has criticism about what he had done in the past and what they have learned. Nobody is perfect. But people with no self reflection blaming others and their employer? No thank you.


As sibling comments indicate, I would advise emailing HN mods to take this post down and remove it from your blog and post it on an anonymous one. Here are the problems you will face:

1) Your current blog has your current employer + client linked to it. 2) Your github has your real name. 3) All of these have been crawled/archived.

None of this bodes well for your career in the future. While I think your blog post is a great war story, it's really not a good idea to post it on your main account which can be traced back to your real name and CV because it will come up the next time you apply for a job.

Unfortunately, even if it illustrates a great deal of ingenuity and creativity on your part in fixing a mess you made, many folks will take one look at it and be judgmental. You have to manage your reputation online and be careful.


I would take down the post entirely.

Your current job is linked in your CV.


And try emailing the hackernews mods asking them to take this post down.


You're welcome and good luck!


What negativity and arrogance in the comments here. Jeez, it's like no one on HN ever made a mistake; a bunch of 10x ninja programmers here. Please read this:

>I also want to preface this whole post by saying that I'm a Junior Developer with less than one year of actual experience. Some of the things that might seem obvious to some might not be so for me, thanks!

It's just some kid sharing a mistake they made and owning up. Ease up on the "LOL what an idiot" attitude


I was actually really impressed with this individual! For someone who has less than a year of experience, they're showing quite a bit of initiative, drive, and curiosity - which really are what make or break engineers as they develop. Taking the time to do a blog post (effectively a post-mortem) and share it is even better!

And yes - I've literally done this exact same error (with TB of video data!). Spending the following week remediating all of that data loss was a great lesson in patience and attention to detail. :-)

OP: If you're ever looking for a job be sure to send me a message. Contact info in profile.


Wrt "less than one year of experience", looking at Nikita's CV and GitHub, despite the title, they aren't really a junior developer :)


True, he's been teaching programming since at least 2018. I was in a similar boat: I'd been programming for 5-7 years for fun and profit before my first official full-time job.


My mistake was on a floppy disc with source code, other text files and images. I was hand-editing the floppy (in a hex disc editor) to get the data back, sector by sector. Fun times. Not going back there though :-)


Mine was a DELETE FROM Users; WHERE... Fun was had.


Usually the recommendation is to not start writing the DELETE query first. Write the SELECT query first and see the results. If you miss the WHERE clause, you will see that immediately. Then change SELECT * to DELETE. But I assume you have learned that lesson already :)
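
In script form, the same habit might look like this (the table, column and cutoff are made up):

    import sqlite3

    conn = sqlite3.connect("app.db")                      # placeholder database
    where = "WHERE last_login < '2020-01-01'"             # the clause you intend to delete by

    # 1. run the SELECT first and eyeball the result
    rows = conn.execute(f"SELECT * FROM Users {where}").fetchall()
    print(f"{len(rows)} rows match")

    # 2. reuse the exact same WHERE string for the DELETE, only once the SELECT looks right
    # conn.execute(f"DELETE FROM Users {where}")
    # conn.commit()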


Yes, but it can't be stressed enough; there's always a first time for someone.


I think it was a great post. Reveals a knack for clarity in explanations. The mistake is simple enough and natural for a junior. If it were just one video or something, it would probably not even be noteworthy. I think the developer learned from the incident too. So all good.

I do think Vimeo was irresponsible in the whole affair though.


More importantly, this person is helping us learn from their mistake. This is something that should be encouraged, not mocked.


there's an argument that the best people around are the people who have already (or almost) made some big mistakes.

I have made a couple of huge ones - luckily I kept my job


When interviewing candidates I always enquire about their professional mistakes. Their reply often is the decider between hiring/rejecting.

I want to have colleagues who admit fault, be truthful about actions which lead to the issue, and learn from it. The learning includes organisations perhaps putting additional measures in place to prevent future issues.

One candidate told a story about being on-call early in his career; he was told incidents happened so rarely that he should just continue living life as normal.

Unfortunately for him, his pager went off at 02:00am while he was high as a kite on drugs - but felt he had to take action (mostly due to arrogance!).

He promptly deleted production data and things only got worse when he tried to rectify the situation.

Of course he was fired for his actions but ever since he's been stone cold sober when on-call.... just in case.

He learned a valuable lesson about professional responsibilities.


>When interviewing candidates I always enquire about their professional mistakes.

"You see, my biggest mistake was programming in the first place! Since then, it's just been an apology tour"


Don't fire for the mistake. Fire for someone's inability to own it, for covering it up, or for pointing fingers at others.


His honesty of admitting to being off his nut while on-call led to his firing, not the action of deleting things.


>His honesty of admitting to being off his nut

This now my favorite euphemism for being high


It’s funny how so many managers on this board are like, yeah I focus disproportionately much on this one factor. Why? Because my intuition and experience says so.


I currently have about 12 years of experience, and a few years back I accidentally cleaned up GitLab's database a bit too well. I wouldn't be surprised if the people being dismissive simply never worked on a moderately complex and large system, and thus don't understand how easy it is to make these kinds of mistakes.


I’m impressed by their commitment to automation. If that was me, once I realized that manually uploading from Gdrive to Vimeo would fix the problem, I probably would have just committed myself to manually doing that all weekend. It would feel safer and serve as a sort of penance for screwing up the automation the first time.

But nope, they went right back to scripting and got it done.


LOL!

I have many more years of experience than this man, and I could still *all* *too* *easily* make a 7TB mistake (or likely a bigger one :P)


This sort of mistake happens all the time when you write in multiple languages. A key solution is code review, a standard practice which doesn't seem to have happened here (and certainly isn't the fault of a junior).


Just to be fair to some commenters: from what I remember, the post was edited after it was submitted... so maybe the older comments are not very relevant.


To clarify, I only removed the company name and added the top disclaimer


I have made a lot of such blunders myself. I once accidentally deleted code I hadn't checked in and had to re-write everything from memory.

I envy those who claim to make no mistakes at all.


Don't envy them - they are deluding themselves.


I’ve been there! At least when you write it the second time it goes more quickly.


"What does this teach us? Well, it teaches me to do more diverse tests when doing destructive operations. It also should probably teach something to Vimeo and to my contractor but I doubt it will (and yes, the upload for some reason is still manual to this day. Go figure!)"

So you wrote bad code, didn't test it properly, ran it on production on the Friday before a release and are blaming Vimeo and [name redacted]?

And your resolution was yet another cobbled together script that you probably didn't test?

This isn't a great article to have attached your name to


I'd hire this guy if only for being this frank about his mistake. He owned it, and that is what I would look for.

After the deletion, what should he have done? Postpone the go-live? That's often not a cost-effective option. As for a risk analysis: the worst that could happen was deletion of the remaining videos, and I don't think that makes a big difference in this situation. And to do the right thing while in a hurry, you have to have the infrastructure already in place. I doubt that's the case for a 10-person shop.


Agree 100%. Acknowledged mistake, moved forward to find a solution. Reflected on lessons learned. Shared valuable lesson.

To me this indicates intelligence, competence, integrity, grit and generosity. Technical proficiency is much easier to come by than integrity, grit and generosity. I would trust the author to deliver on commitments.


Aye, this is how you learn and make sure it doesn't happen again.

I did a similar thing ~20 years ago when I first started my career, accidentally deleting a production database because I thought I was working on the test database.

I owned it, learned lessons from it, and it's never happened again.


Owning the mistake would be fine if he did that - he didn't. He blamed the company he was contracting for. That's a big no from me.


It's as if we read different articles. He literally writes that he made "A series of mistakes that could've probably been easily prevented."


I'm sorry if it came off like that. The mistake in this case was completely mine (bad code and bad testing). The detour on the other two companies was mostly because this way of deleting/recovering stuff should've probably been avoided in the first place, other than that I'm absolutely not blaming anyone else!


Don't worry about all that - there isn't a developer worth their salt that hasn't made a mistake. But I'd consider having this blog post and HN post retracted purely for future internet checks. It isn't a reflection on you, and your honesty is fantastic. But there is a lot to be said about using a pseudonym when it comes this close to your employers


I'd probably make your github profile private for a while as well. Or at least removing your real name from it.


Agreed. But I’d also fire him from this job.


Doesn't make sense. Their employer literally paid them to learn from their mistake.

Now, you think they should be fired? So that another employer reaps the benefits of that learning experience?


"Recently, I was asked if I was going to fire an employee who made a mistake that cost the company $600,000. No, I replied, I just spent $600,000 training him. Why would I want somebody to hire his experience?"

-- Thomas J. Watson


For having got into a sticky situation and out of it?


Will every developer who has never checked in bad code on Friday, or accidentally deleted the wrong data, please raise their hand?

‘Judgment comes from experience, and experience comes from poor judgment.’

:-)


Not to mention that he _deleted_ the videos but didn't _lose_ them. Nothing to see here.


Vimeo completed a major migration of videos between accounts with no confirmation or communication before committing it, then refused to reverse the change. Hardly the best service.

The article hardly comes across as 'blaming' them for the core issue but they were definitely not helpful.


> This isn't a great article to have attached your name to

A million times better than your comment.


All I did was give advice. If you don't like it it's fine.


Earlier in the article, the author does call out that it's bad code, so he's not entirely blaming these companies. Anyway: You should not be afraid of thinking about what each party could have done better. Not just yourself, but other people too. When I look back on times where I only blamed myself for prod issues, it was less of a learning experience, and more focused on beating myself up for no good reason. That approach shows that I'm afraid of the consequences, and it's an effective way to feel isolated from the team instead of improving.


Better to do it before the release than afterwards. I'm assuming this way nobody noticed the issue.

Also, would you rather everyone only ever posted about all the times they were successful?


(Since the OP redacted the company name from the post, I've done the same in your comment here. I hope that's ok.)

(We do this sort of thing to protect users, usually as the result of an emailed request, and you can tell when we've done it because of the word 'redacted' in square brackets.)


Oof, we wouldn't work well together. Very rarely is someone good enough to be this obnoxious.


I very much doubt you would ever work with or for me.


So, “I am under an NDA”, but I reveal my client’s name and a lot of sensitive details about what we are doing. LOL.


Where do you see the client's name? I only see Vimeo being mentioned.



Got it. To be honest I'd be hesitant to publish a blog post like that with your name + current company name attached to it.

It's a bit different to share a fun story a few years later about that time you almost wiped production.


It still breaks the NDA:

* Firstly, you don't have to name the company to break the NDA anyway (you are still disclosing information you aren't supposed to disclose, regardless of whether it can be linked back to the company).

* Secondly, the client is still named on the front page of the website.

* Thirdly, OP posted this with his real name that trivially links back to the dev shop he is working for. The site also has his CV which lists the client again, with a description of the project to link it to the post.

* Finally, The client can trivially be identified by googling the description in the second paragraph (i.e. just search the named countries in operation plus the word Gym).


Not all NDAs have the same terms. I could write up and serve an NDA right now that still counts as an NDA yet permits everything in your list.


All contracts vary in terms, but I've never seen an NDA that says "you can talk about the content under NDA as long as you don't mention the business's name, and just identify who they are in a roundabout way instead".

"Well i'm under an NDA, so I can tell you all the specifics of the project, but I can't tell you the companies name. I can say they own the largest search engine though, and have a market cap of 1.5 trillion, and rhyme with "Roogle", but I really can't say who they are. Anyway, here is some code I wrote for them and a description of how we nearly ruined their project along with me calling them incompetent..."


Well at least deleting the secret is a step back toward the NDA he left behind.


Under NDA but I'll give rough details of what's occurring while also naming my client and disparaging them to the public.

Well that's a brave move...


They said they are a junior developer with not much experience. I'm afraid they may not know what is and isn't covered under NDA.


My tip would be: read what you sign.


Just to clarify, my company is under an NDA and not personally me. It also encompasses only the actual project details so a post like this is legally compliant. (Not a lawyer, might be wrong)


In every contract I've ever signed, part of the NDA clause with my employer is that I'm also bound by NDA's my employer is bound by, so if the employer signs an NDA with a customer, I would also be bound by that. It might be worth checking your contract, otherwise having a company sign an NDA doesn't hold much weight if their staff are free to go around sharing the information themselves.


So you're not under an NDA as you wrote.

I don't know your position, but I would assume an NDA is part of your freelancer or employee contract.


OP might at least want to consult with a contract lawyer in Italy to make sure.


You likely have a confidentiality clause in your contract.

If your company is under an NDA, your company will have an obligation to ensure that you also do not disclose information.

Companies are mostly just collections of people, and an NDA is mostly meant to stop people working on the project from talking about the project.


There's a thing called unit tests.


Just a note: being able to click yourself a server at Google, AWS, etc. might be cheap enough, even paying for 15 TB of traffic.


ZFS -> Snapshot....always!! Before touching writable-data (my personal mantra) ;)


I love ZFS too but that's not really relevant to this discussion because the deleted items were on a video hosting platform and the company did already have local copies.


Yes and? Make a snapshot on live. Again, never touch data before snapshot.


At risk of sounding snarky, you do understand how video hosting platforms work? Customers, even enterprise ones, don’t have shell access let alone control over what file system is used.

There are a hundred ways this problem could have been prevented but ZFS isn’t one of them.


>you do understand how video hosting platforms work?

No, no i don't.


This reminds of some IRC threads. You post a question and someone's answer assumes you are going to rip out and replace your existing prod setup just so you can use their pet tool.


Pff, there are thousands of systems and filesystems capable of making snapshots; even a shadow disk on VM/370 (1982) could be seen as one.


Controversial opinion: and this is why whitespace-delimited block syntax is not for production.


This is hardly a whitespace issue


Ah, yes, I just noticed the difference in indentation. In actuality, the error was about the mental model of variable state.



