Not only is it a problem that deleting a bucket costs money, but if you have a big bucket with many deeply nested files, it can also take a really long time to clean it up using the AWS command line.
These days the "easy" way to delete a bucket of a billion tiny files is to configure a very short-term expiration rule on the bucket, and let AWS itself delete all your files as they expire.
When we did this (a few years ago) it still took several days for it to remove all the files.
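For anyone wanting to try the expiration approach, here is a rough sketch of what such a rule could look like as a lifecycle configuration. The rule ID, the bucket name in the comment, and the one-day window are placeholders I made up, and the apply step assumes boto3 is installed with credentials configured. On a versioned bucket you would also need a rule for noncurrent versions, which this sketch leaves out.

```python
import json


def expire_everything_rule(days=1):
    """Build a lifecycle configuration that expires every object in a bucket."""
    return {
        "Rules": [
            {
                "ID": "expire-all-objects",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # an empty prefix matches every key
                "Expiration": {"Days": days},
                # Also clean up multipart uploads that were never completed.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(expire_everything_rule(), indent=2))
    # To apply it (assumption: boto3 installed, "my-bucket" is a placeholder):
    # import boto3
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="my-bucket",
    #     LifecycleConfiguration=expire_everything_rule(),
    # )
```

Expiration is processed asynchronously on AWS's side, which fits the experience above of it taking days to finish.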
I believe the nesting shouldn’t affect it. When you’re iterating over objects to delete them, you can just iterate over the keys and ignore nesting; I believe that’s how the s3 tools do “recursive” deleting. The underlying S3 API provides a hierarchical listing interface (the “delimiter” parameter), but keys are really just strings, and directories are illusory.
But yes, it can take a while to iterate through the objects.
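To make the "just iterate over the keys" idea concrete, here is a minimal sketch that groups a flat stream of keys into delete requests. The 1000-key cap is the documented per-request limit for DeleteObjects; the function name and the sample keys are my own, and the boto3 part is left as a comment since it needs real credentials and a real bucket.

```python
from itertools import islice

MAX_KEYS_PER_DELETE = 1000  # DeleteObjects accepts at most 1000 keys per request


def delete_batches(keys, batch_size=MAX_KEYS_PER_DELETE):
    """Yield DeleteObjects payloads, each holding up to batch_size keys."""
    it = iter(keys)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield {"Objects": [{"Key": k} for k in chunk], "Quiet": True}


if __name__ == "__main__":
    # Nesting depth is irrelevant: these are just strings.
    keys = (f"logs/a/b/{i}.gz" for i in range(2500))
    print([len(b["Objects"]) for b in delete_batches(keys)])
    # prints [1000, 1000, 500]

    # With boto3 (assumption: installed, "my-bucket" is a placeholder):
    # import boto3
    # s3 = boto3.client("s3")
    # pages = s3.get_paginator("list_objects_v2").paginate(Bucket="my-bucket")
    # all_keys = (o["Key"] for p in pages for o in p.get("Contents", []))
    # for batch in delete_batches(all_keys):
    #     s3.delete_objects(Bucket="my-bucket", Delete=batch)
```

Even batched like this, a billion objects is still on the order of a million list and delete calls, so it takes a while unless the work is split across prefixes and run in parallel.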
It won't affect it, because there is no concept of nesting in an object storage engine like S3. Everything is a flat key that references an object; we just conceptualize a directory structure on top because it makes it easier for us to manage our data.
But in reality you just have a really, really long list of keys and a big, flat file system of objects.
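A toy illustration of that point: the "directories" fall out of plain string operations on a flat list of keys. This is only a local simulation of how a prefix/delimiter listing groups keys, not the actual S3 implementation, and the function name and sample keys are made up.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Mimic a prefix/delimiter listing over a flat collection of keys.

    Returns (contents, common_prefixes): keys directly "in" the prefix, and
    the pseudo-directories one level below it.
    """
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            contents.append(key)  # no further delimiter: a "file" at this level
        else:
            folder = prefix + rest[:cut + len(delimiter)]
            if folder not in common_prefixes:
                common_prefixes.append(folder)  # roll up into a "directory"
    return contents, common_prefixes


if __name__ == "__main__":
    keys = [
        "logs/2019/a.gz",
        "logs/2019/b.gz",
        "logs/2020/c.gz",
        "logs/readme.txt",
        "top.txt",
    ]
    print(list_with_delimiter(keys))
    # prints (['top.txt'], ['logs/'])
    print(list_with_delimiter(keys, prefix="logs/"))
    # prints (['logs/readme.txt'], ['logs/2019/', 'logs/2020/'])
```

Drop the delimiter handling and you are back to the flat list, which is all a bulk delete needs.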
Yup, similar experience.
Our devs kept using S3 as a caching backend for some small pictures. Only from the bill did we learn that we had over 17TB in tiny files, with no feasible way to groom it. We kept hitting all sorts of API limits.
I ran into this with a bucket full of EMR log files a few years ago and had to figure out some pretty crazy command line hackiness, plus running it on an EC2 machine with lots of cores, to get it done. This is a write-up I did, in case anyone else ever runs into this issue:
https://gist.github.com/michael-erasmus/6a5acddcb56548874ffe...