Hacker News
Deleting an S3 Bucket Costs Money (cloudcasts.io)
290 points by fideloper on Oct 19, 2021 | 145 comments



You can use lifecycle policies to delete it for free, but it's best to confirm that with support. Not saying this is a great way, and maybe it's intentionally hidden, but at least there is a way.

https://stackoverflow.com/questions/59170391/s3-lifecycle-ex...


Yes, this was mentioned as the preferred method in the article. As the article states, it's free for objects in the standard storage tier but will incur a Transition cost for other tiers. It's not hidden, but it's not exactly advertised as a way to empty a bucket.


I'm an AWS Solutions Architect and I was helping a customer with the same issue as in the article a couple of months ago.

What I found out when I researched it is that there is a subtle difference between using lifecycles to move objects to other storage classes and for deleting objects: deletions are not transitions, they are expirations – and expirations are free. I submitted a clarification to the S3 documentation and now it says "You are not charged for expiration or the storage time associated with an object that has expired." (https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecy...)

If you have objects in IA or Glacier there is a minimum duration you're charged for, but there will be no extra charges for expiring these objects.


Thanks for the clarification. Since you looked into this recently, could you elaborate on any possible scenarios when S3 Intelligent Tiering would cost more than S3 Standard? All other storage classes have gotchas built in that can cost more if you're not careful, but I'm thinking Intelligent Tiering might be friendly enough to set it up on all buckets/objects. Are there situations where this is not advisable?


Intelligent-Tiering no longer has the 30-day minimum duration and the 128KB limit[0], but it still has a "monitoring and automation" charge of $0.0025 per 1,000 objects per month[1] for objects larger than 128KB. This can be significant if you have a very large number of objects. You could potentially pay more than with Standard if all your objects end up staying in the frequent access tier.

[0] https://aws.amazon.com/about-aws/whats-new/2021/09/amazon-s3...

[1] https://aws.amazon.com/s3/pricing/?nc=sn&loc=4
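To see where that crossover sits, here is a rough Python sketch of the monthly comparison. The per-GB prices are assumptions (approximately the us-east-1 list prices around the time of this thread) and should be checked against the pricing page in [1]; objects under 128KB are left out because they are not monitored or charged the fee.

```python
# Assumed prices in USD; verify against the S3 pricing page before relying on them.
STANDARD_PER_GB_MONTH = 0.023            # S3 Standard, first 50 TB
IT_FREQUENT_PER_GB_MONTH = 0.023         # Intelligent-Tiering frequent access tier
IT_INFREQUENT_PER_GB_MONTH = 0.0125      # Intelligent-Tiering infrequent access tier
IT_MONITORING_PER_1000_OBJECTS = 0.0025  # monthly monitoring and automation fee


def standard_monthly_cost(total_gb: float) -> float:
    return total_gb * STANDARD_PER_GB_MONTH


def intelligent_tiering_monthly_cost(
    total_gb: float, monitored_objects: int, infrequent_fraction: float
) -> float:
    """Storage plus monitoring fee.

    infrequent_fraction is the share of bytes that has moved to the
    infrequent access tier (0.0 = everything stays "hot").
    """
    storage = total_gb * (
        (1 - infrequent_fraction) * IT_FREQUENT_PER_GB_MONTH
        + infrequent_fraction * IT_INFREQUENT_PER_GB_MONTH
    )
    monitoring = monitored_objects / 1000 * IT_MONITORING_PER_1000_OBJECTS
    return storage + monitoring


def break_even_object_size_gb() -> float:
    """Object size at which sitting in the infrequent tier saves exactly its monitoring fee."""
    fee_per_object = IT_MONITORING_PER_1000_OBJECTS / 1000
    return fee_per_object / (IT_FREQUENT_PER_GB_MONTH - IT_INFREQUENT_PER_GB_MONTH)


if __name__ == "__main__":
    # 10 TB spread over 50 million objects that never cool down.
    print(standard_monthly_cost(10_000))
    print(intelligent_tiering_monthly_cost(10_000, 50_000_000, 0.0))
    print(f"break-even object size ~ {break_even_object_size_gb() * 1e6:.0f} KB")
```

With those assumed prices, 10 TB in 50 million objects that never leave the frequent tier comes to about $230 a month in Standard versus about $355 in Intelligent-Tiering, and an object has to be larger than roughly 240 KB before moving it to the infrequent tier saves more than its monitoring fee.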


This is not a "Transition cost", because it is not a transition lifecycle policy, rather an expiration lifecycle policy.

"Minimum storage duration charge" along with "Minimum capacity charge per object" is a property of an S3 Storage Class, not a lifecycle policy. Therefore this cost needs to be considered before selecting a storage class.

The charges for deleting before the minimum storage duration are documented for each storage class at https://docs.aws.amazon.com/AmazonS3/latest/userguide/storag...

Refer to the "Performance across the S3 Storage Classes" table at https://aws.amazon.com/s3/storage-classes/


There are some caveats to that...

"If you create an S3 Lifecycle expiration rule that causes objects that have been in S3 Standard-IA or S3 One Zone-IA storage for less than 30 days to expire, you are charged for 30 days"

It goes on to say 90 days for Glacier and 180 days for Glacier Deep Archive.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecy...
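The rule in that quote boils down to a max(): you pay for whichever is longer, the time the object actually spent in the class or the class's minimum. A minimal Python sketch; the storage-class strings are the ones the S3 API uses, and the table only covers the classes named in the quote.

```python
# Minimum billable storage durations (days), per the lifecycle documentation quoted above.
MINIMUM_DAYS = {
    "STANDARD": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER": 90,
    "DEEP_ARCHIVE": 180,
}


def billed_days(storage_class: str, days_stored: int) -> int:
    """Storage days you are charged for when an object is expired or deleted."""
    return max(days_stored, MINIMUM_DAYS[storage_class])


def early_deletion_days(storage_class: str, days_stored: int) -> int:
    """Extra days charged because the object left before the minimum duration."""
    return billed_days(storage_class, days_stored) - days_stored


if __name__ == "__main__":
    # An object expired from Glacier after 10 days is still billed for 90 days.
    print(billed_days("GLACIER", 10), early_deletion_days("GLACIER", 10))
```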


That isn't a property of using the lifecycle policy, though. It would be the same if you manually deleted the files.


Lifecycle rules are the preferred (cheapest) way to do it for standard-storage objects in S3 for sure!


Not only is it a problem that deleting a bucket costs money, but if you have a big bucket with many deeply nested files, it can take a really long time to clean it up using the AWS command line.

I ran into this with a bucket full of EMR log files a few years ago and had to figure out some pretty crazy command-line hackery, plus run it on an EC2 machine with lots of cores. This is a write-up I did in case anyone else ever runs into this issue.

https://gist.github.com/michael-erasmus/6a5acddcb56548874ffe...


These days the "easy" way to delete a bucket of a billion tiny files is to configure a very short-term expiration rule on the bucket, and let AWS itself delete all your files as they expire.

When we did this (a few years ago) it still took several days for it to remove all the files.
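A minimal Python sketch of such a rule, built as the dictionary that boto3's `put_bucket_lifecycle_configuration` expects. The rule IDs and the one-day window are arbitrary choices, and the noncurrent-version, incomplete-multipart-upload, and delete-marker settings are assumptions aimed at versioned buckets (harmless otherwise). As noted above, expect the actual deletion to trail the rule by days when there are billions of objects.

```python
import json


def expire_everything_config(days: int = 1) -> dict:
    """Lifecycle configuration that expires every object in the bucket after `days` days."""
    return {
        "Rules": [
            {
                "ID": "expire-all-objects",  # hypothetical rule name
                "Filter": {"Prefix": ""},    # empty prefix matches every key
                "Status": "Enabled",
                "Expiration": {"Days": days},
                # For versioned buckets: also expire old versions and stale uploads.
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            },
            {
                # ExpiredObjectDeleteMarker cannot share an Expiration element
                # with Days, so leftover delete markers get their own rule.
                "ID": "remove-expired-delete-markers",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            },
        ]
    }


def apply_to_bucket(bucket: str, days: int = 1) -> None:
    import boto3  # imported here so the config builder works without boto3 installed

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=expire_everything_config(days)
    )


if __name__ == "__main__":
    print(json.dumps(expire_everything_config(), indent=2))
```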


I believe the nesting shouldn’t affect it. When you’re iterating over objects to delete them, you can just iterate over the keys and ignore nesting; I believe that’s how the S3 tools do “recursive” deleting. The underlying S3 API can present a hierarchical view (the “delimiter” parameter), but keys are really just strings, and directories are illusory.

But yes, it can take a while to iterate through the objects.
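A toy Python model of how a listing treats those flat keys may make the point concrete. This is not the real API (pagination at 1,000 keys per response is omitted), just the prefix/delimiter semantics applied to a list of strings.

```python
def list_objects(keys, prefix="", delimiter=None):
    """Toy model of S3 ListObjectsV2 over a flat key space (pagination omitted).

    Returns (contents, common_prefixes). Without a delimiter, every key under
    the prefix is returned no matter how "deep" it looks. With a delimiter,
    keys containing it after the prefix are rolled up into common prefixes,
    which is all a "directory" is.
    """
    contents, common_prefixes = [], []
    # Python's code-point order matches S3's UTF-8 binary key order.
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            rolled_up = prefix + rest[: rest.index(delimiter) + len(delimiter)]
            if rolled_up not in common_prefixes:
                common_prefixes.append(rolled_up)
        else:
            contents.append(key)
    return contents, common_prefixes


if __name__ == "__main__":
    keys = ["logs/2021/10/a.gz", "logs/2021/10/b.gz", "logs/2021/11/c.gz", "readme.txt"]
    print(list_objects(keys))                                     # every key, flat
    print(list_objects(keys, delimiter="/"))                      # top-level "folders"
    print(list_objects(keys, prefix="logs/2021/", delimiter="/")) # one level down
```

A recursive delete only needs the first form: list everything, delete what comes back.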


> I believe the nesting shouldn’t affect it

It won't affect it, because there is no concept of nesting in an object storage engine like S3. Everything is a flat key that references an object, but we abstract and conceptualize a directory structure because it makes it easier for us to manage our data.

But in reality you just have a really, really long list of keys and a big, flat file system of objects.


Yup, similar experience. Our devs kept using S3 as a caching backend for some small pictures. We only learned from the bill that we had over 17TB in tiny files, and there was no feasible way to clean it up. We kept hitting all sorts of API limits.


Per-object costs can be tricky with S3 -- it's easy to mentally round costs less than 1/10th of a penny to zero, and then look up a few years later and realize you have hundreds of millions of things and can't afford to do anything with them.

When this bit us on a project, I made a tool to solve our particular problem, which tars files, writes CSV indexes, and can fetch individual files from the tars if need be.[1] Running on millions of files was janky enough that I also ended up scripting an orchestrator to repeatedly attempt each step of the pipeline.[2] It's not tested on data other than ours, but it could be a useful starting point.

[1] https://github.com/harvard-lil/s3mothball [2] https://github.com/harvard-lil/mothball_pipeline
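The general idea (pack many small objects into one tar, keep a CSV index of byte offsets, fetch a single file with a ranged read) fits in a few lines of Python. This is not s3mothball's code: `pack` and `fetch` are hypothetical names, everything runs in memory, and against S3 the slice in `fetch` would become a `get_object` call with `Range="bytes=start-end"`.

```python
import csv
import io
import tarfile


def pack(files: dict) -> tuple:
    """Pack {name: bytes} into one tar and return (tar_bytes, csv_index).

    The index records where each file's data starts inside the tar, so a
    single file can later be read without unpacking the whole archive.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    tar_bytes = buf.getvalue()

    # Re-read the headers: offset_data is filled in when members are parsed.
    index = io.StringIO()
    writer = csv.writer(index)
    writer.writerow(["name", "offset", "size"])
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar.getmembers():
            writer.writerow([member.name, member.offset_data, member.size])
    return tar_bytes, index.getvalue()


def fetch(tar_bytes: bytes, index_csv: str, name: str) -> bytes:
    """Read one file back using only the index (stand-in for an HTTP Range GET)."""
    for row in csv.DictReader(io.StringIO(index_csv)):
        if row["name"] == name:
            start, size = int(row["offset"]), int(row["size"])
            return tar_bytes[start : start + size]
    raise KeyError(name)
```

The design trade-off is that one tar replaces thousands of per-object charges, while the CSV index keeps individual files reachable.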


And deleting your AWS account will keep billing you [1] if you don’t delete all resources first.

AWS is designed to extract dollars from big enterprise contracts.

Also interesting from the article, this poor soul on StackOverflow was trying to figure out how to delete a bucket that would cost him $20,000 [2]. Can’t delete, can’t close.

[1] https://www.reddit.com/r/aws/comments/j5nh4w/ive_deleted_my_...

[2] https://stackoverflow.com/questions/54255990/cheapest-way-to...


A long time ago, we deleted/disabled our AWS account, and I assumed that all my files on S3 would also get deleted.

When we reactivated the account a few years later, we were retroactively billed for all the files in the S3 bucket. We got the money back, though.


That sounds like a great long term “shit hits the fan” backup idea:

1) Upload an encrypted blob to S3 in a dedicated AWS account paid for with a burner credit card.

2) Cancel the account.

3) If you ever need to restore the data, restore the account and pay the difference.

Since uploads are free, you just do this with a new account every so often, and you’ll only need to pay for the time since the most recent backup!


Past performance does not guarantee future behaviour. Just because S3 did this at one point doesn't mean they'll act that way in the future. That's not a bet you should be making with important data.


This defeats the purpose of a backup because they could delete your data at any time.


Yeah, but the idea of a backup service where you only pay when you need to restore isn't bad :)


The Glacier Deep Archive storage class is roughly 20x cheaper than the Standard tier and is guaranteed to work in a SHTF scenario.


Yikes. Do you mind giving a ballpark on how much you owed?


Amazon has threatened via email to send me to collections for $1... over an S3 bucket I had had no access to for 3 years.


They closed my retail amazon.com account because I closed my AWS account (or at least thought I had) and never received another bill. One day I just lost access to shopping on Amazon. It was an ordeal to get it back; I would have just written it off, but I have TV and movie purchases I didn't want to throw away.

Scummy company


Same! I get an email a month saying they are going to close my account…


They did this to me for $0.02


PayPal did this to me over like $1.50

But I had moved to Singapore. My PayPal account was in Australia. When I tried to pay using my Singapore card, it got flagged as fraud. I called, and there was no way to pay. In the end I created a second account in Singapore, added $2 credit to it, transferred that to the other account, then closed both. I've not used PayPal in about… 6 years now? Scam company with unhelpful support.


Given that they don’t even collect amounts below 1 USD, I’m having a hard time believing this.


I've had to ask my credit card company to block AWS charging an account that I can't remember the login to.

Amazon can keep trying to get blood from a stone...


PM at AWS, writing on my own personal behalf.

Sorry to hear that you had a bad experience. That's definitely not the culture I experience every day or the one we hold ourselves to. Our goal is to deliver a ton of value to customers. If we are not delivering value, wanting to charge you for it would not be in line with how we operate.

Out of curiosity, did you try contacting AWS support? Almost all customers I speak to love our support and the fact they can speak to an actual human. Of course I can't speak to the actual case as I'm just providing my personal opinion here, but I would not be surprised if support could fix this for you quickly and zero out any small balances that you accrued because you weren't able to access/use your account.


With all due respect, it has nothing to do with culture and more to do with business processes having rough edges. At AWS' scale, any rough edge is more than a rare edge case, since rarity flies out the window.

Spending more to provide better customer service (explicitly, or implicitly through bugfixes) is not generally how this type of need gets addressed. Folks shouldn't need to post their grievances to a very public site to be heard, and that it needs to happen to get traction on an issue should be taken as a clear gap in customer service provision.


That's kind of his point though, OP did not need to post to HN, he can just contact AWS Support and get the bucket deleted at (probably) no charge.

Support will routinely forgive an account balance given a good enough reason.


From the comment:

> I've had to ask my credit card company to block AWS charging an account that I can't remember the login to.

We don't know whether OP has reached out to AWS support, but one can assume so since AWS is charging an account the user cannot log in to.

Further, it sounds like the user has found their own resolution, so the user isn't asking for _help_ on HN, more that they are registering poor experience in agreement with others. So, it is a rough UX edge.


You wrote:

> Folks shouldn't need to post their grievances to a very public site to be heard, and that it needs to happen to get traction on an issue should be taken as a clear gap in customer service provision.

Yet by your very own comment you say that we don't know if he actually tried to contact support so there is no "clear gap in customer service provision".

The fact is, anyone who actually interacted with an AWS rep would tell you that they would delete the bucket and not charge you for it.


From my own experience, no, you can't. All links to contact support require you to have an active AWS account. That defeats the purpose. YMMV, and maybe it's different today, but that was my experience. And I was just trying to get them to stop sending me spam after I had closed my account.


It's weird to see hackers here posting this as AWS is literally the only web scale company I can actually talk to a person at to resolve rough edges.

This is both on retail side and AWS side.

My only request: build more automated discovery and remediation tooling for VPC-classic accounts! It's annoying to figure out what needs to be handled there; give me a dashboard for this.


I'd really be curious how much money AWS makes on zero-usage accounts where someone spun up something and then forgot how to log in or delete it.

I've been paying $10/month for over a year on an EC2 (I think?) instance I spun up, and I still haven't had the time to go in and find it and delete it. I've tried several times (and even disputed the credit card charge one month).

I'll probably just end up cancelling this credit card. That would be a lot easier than figuring out how to stop AWS from charging me.


> I'll probably just end up cancelling this credit card. That would be a lot easier than figuring out how to stop AWS from charging me.

Nooooo no no no, that is not true. If you cancel, this debt will follow you on your credit history and via collectors. Canceling a credit card does not release you from responsibility.

Just contact AWS support. Tell them your story. They’ll get your account closed and they’ll probably forgive whatever outstanding bill you have.

Please don’t just run away from the bill, you’re just setting yourself up for pain later.


I've actually contacted them via email multiple times, and they never respond.

At this point, they're legally committing fraud. I don't see how closing a credit card that has regular fraudulent transactions is wrong, or likely to cause problems later on.


They are absolutely not committing fraud - that is a total lie.

If some random person contacts Amazon asking them to delete our business AWS account, they need to reject that as well.

I would log in and open an Account and Billing support case.

Then choose Account / Close and cancel my account.

They will give you a chat or call back option in most cases. The call back option has worked well for me.


> I've actually contacted them via email multiple times, and they never respond.

> They are absolutely not committing fraud - that is a total lie.

Lie isn't the word you're looking for when you disagree with someone.

If they never respond, they aren't engaging with the OP to figure out what this person wants. If the OP also disputed the charges, AWS has been warned that someone isn't happy with the contract, both by personal email and by the credit card company.


> They are absolutely not committing fraud - that is a total lie.

If I write an email to them asking them to cancel an account, doesn't that constitute legal termination of my authorization to continue service or charge me?

Unless the service agreement I signed specifies a specific way to cancel an account - in that case my agreement to it constitutes authorization unless I jump through their hoops. Argh.


No, it's in your agreement.

Remember, canceling your account means everything (EVERYTHING) is deleted. They are far far more concerned that someone will randomly email them pretending to be the CTO of some startup or you, and they will blow away 3 years of someone's work.

Canceling your account even cancels glacier, WORM records, object and compliance lock data etc.

Everything I've seen says that AWS rightfully biases towards retention unless the delete request is very clear - and email will never rise to that level (nor should it!).

So you have to login and close your account yourself.

How you get from a very reasonable business practice here (and they are frankly one of the easiest of the major players to actually talk to) to fraud... is a stretch.

Edit: For those not familiar with AWS account closure here are the steps:

https://aws.amazon.com/premiumsupport/knowledge-center/close...


I'd log in and look at your most recent bill; this link should bring up the bill from September 2021 for you [1].

That should give you a breakdown of where the charges are coming from, specifically which region an EC2 instance might be running in. Once you can figure out which region it's in, you can go to the EC2 dashboard for that region [2] and start terminating instances.

Having said this, AWS should seriously have a way to say "close this account and delete/stop everything associated with it." Having to spelunk through bills and the AWS console to figure out what you're getting charged for is a joke.

[1]: https://console.aws.amazon.com/billing/home?#/bills?year=202...

[2]: https://us-west-2.console.aws.amazon.com/ec2/v2/home?region=...: (replace both occurrences of us-west-2 with whatever region you're looking for)


Thanks! This is really helpful. Looks like it was a WorkSpace I spun up... I have no idea why I did that.


How hard is it to stop an EC2 instance? It's literally 3 or 4 clicks after you log in. As for forgetting how to log in, if I give some company my credit card info, I damn well make sure I know how to log in when I need to.


They can still theoretically send your bills to collections and ruin your credit


Why can't we bill them for wasted time and stolen money, send those bills to collection and ruin their credit? Something seems unfair about the power dynamic.


Well, you can document your attempts to close the account and report it to the police. It's just a lot of work and you will only gain a document you can use to press your credit card issuer into rejecting the payments.

Also, if they do send your debt to collectors, you can go to court and ask for the debt to be dropped and for them to pay damages. That is a lot more work, but if they are clearly in the wrong, it will get you something.


Or take you to court.

Theoretically. I don't know in what cases Amazon might do either of these things. If you're not in the US, that's certainly another layer of barrier.

But yes, in the US anyway, whether you legally owe someone money or not and they can legally collect on it is not controlled by whether you've cancelled a credit card.


Another reason that I'm very hesitant to run anything on Amazon.


I'm completely amazed that such a large group of developers have decided to put their personal credit card on Amazon. Foolish choices lead to bad outcomes.

When does billing by the hour for storage/everything start to seem appealing?


I wanted to use ECS/Fargate earlier this year and the deploy kept failing. It was an issue on my end, but the infuriating thing was that CloudWatch didn't have any logs 80% of the time. Like, at random: I could click the redeploy button 5 times, and 4 of those deploys would have no logs while one would. Of course, they charged me per deploy. I ended up running up a bill of $100 before I said screw it and used DigitalOcean instead (where I had logs consistently and was able to debug my container deploy within about 10 minutes, for a bill of only a few cents).


I had a similar issue when moving our stack to ECS Fargate. Logs wouldn’t tell us anything other than ‘application was terminated’ within seconds of it booting. The most helpful thing was looking at the reason given for the container shutting down (which was still a difficult riddle for a Fargate noob).

In our case, it was a message saying something like ‘container timed out’, which originally made us think it was a network issue. We ended up tracing it back to our app sometimes taking 21-30 seconds to fully boot, while Fargate's health check allowed only 20 seconds. So even though the container would eventually have finished booting, Fargate had already given up on it and killed it.
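If the check that gave up was the container-level health check, an ECS task definition lets you give a slow-booting container a grace period via `startPeriod` in its `healthCheck` block (for load balancer health checks, the analogous service setting is `healthCheckGracePeriodSeconds`). A hedged Python sketch that builds the container-level block; the health URL, the 1.5x margin, the interval/timeout/retries values, and the use of `curl` (which must exist in the image) are all assumptions.

```python
import math

ECS_MAX_START_PERIOD = 300  # seconds; ECS accepts 0-300 for startPeriod


def container_health_check(url: str, worst_case_boot_seconds: float, margin: float = 1.5) -> dict:
    """healthCheck block for an ECS container definition.

    startPeriod is a grace period: failed checks during it do not count
    toward the retry limit, so a slow boot is not treated as unhealthy.
    """
    start_period = min(ECS_MAX_START_PERIOD, math.ceil(worst_case_boot_seconds * margin))
    return {
        "command": ["CMD-SHELL", f"curl -f {url} || exit 1"],
        "interval": 10,   # seconds between checks
        "timeout": 5,     # seconds before a single check counts as failed
        "retries": 3,     # consecutive failures before the container is unhealthy
        "startPeriod": start_period,
    }


if __name__ == "__main__":
    # An app that sometimes needs 30 seconds to boot gets a 45-second grace period.
    print(container_health_check("http://localhost:8080/health", 30))
```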


Yeah, my issue was something similar. When I moved it to DigitalOcean, I was getting logs consistently (from my application) and I traced it to something that was misconfigured on my end, which was causing health checks to fail. It took a total of about 10 minutes to track and fix on DO, while I had already spent 2 days (and the $100) trying to figure it out on AWS.

I mean, the issue was my fault, but AWS made it incredibly painful to figure out and fix and was not helpful at all. I decided that day that I will never use AWS again unless I have someone with a lot of AWS experience on the team (and even then, only if I have a good reason to use AWS; in my case there is a reason AWS might be desirable in the future, but not enough for me to go through the pain again myself).


There's nothing particular about Amazon here, any company or person you owe money to in the US (can't speak for other countries) can take you to court or turn your debt over to a collection agency.

I haven't heard of Amazon specifically ever doing either of these things.

Or am I missing what you're saying is particular to Amazon and relevant here?


And you can use the Fair Debt Collection Practices Act to force them to validate the debt or cease collection activity. For small amounts they will not go through the legal hoops.


Unless you provided a social security number to AWS at signup, going to be a pain for them to ruin your credit.


It's not hard to get a SSN from a name + address


Legally?


Often yes. And perhaps more importantly: you don't need a SSN to send a debt to collections or report it to a credit bureau (though of course it helps).


Legally shmegally. Just download one of the many troves of hacked databases. I'm sure you'll find the data you need somewhere. Or hire a 3rd party that does stuff on your behalf. Now you have plausible deniability in that you're the one not doing it.


How does you forgetting your password absolve you from paying for resources within your account?


From the answers on the SO question, it looks like there are several ways to delete 2 billion objects for free or cheaply. I think? Not sure. Which is part of why I'm commenting here, in case anyone else knows.


www.privacy.com


Pricing of AWS services makes me uneasy in general. Take S3 as an example: you go to the pricing page and there are several tabs with dozens of entries, which makes it difficult to calculate exactly how much you will pay. I might be simple-minded, but I prefer clearly defined plans with predetermined limits: you know exactly what it costs you each month and what you get, and if you need more, you just switch to a higher plan. No risk of nasty (and often expensive) surprises like the one in the article.


It's ALMOST like one of the most lucrative businesses in the world is optimized for extracting money from you rather than for customer experience.


You could try their Lightsail object storage (and other lightsail services) for this type of pricing: https://aws.amazon.com/lightsail/pricing/


I didn’t know they had extended Lightsail beyond VMs. In S3 every request costs money, but in Lightsail object storage requests are free and you pay for bandwidth?


The problem is that customers' pricing demands are infinitely granular: "I only want to pay for the X that I use."

The alternative is the tiers which have their own problems.

Small / Medium / Big / Call us

Usually no one knows at the beginning whether they are going to succeed or need to scale up. I guess at AWS's size it makes more sense to have infinitely granular pricing and let customers figure out how much they are willing to pay, rather than trying to negotiate with huge swaths of people, which would cost a lot in sales staff salaries.


This is why I've never used it. It always felt like I was getting 10 records for a penny.


Yup. And uploading or downloading large objects from S3 incurs tons of requests, because the S3 client does parallel chunking plus a small number of other control requests. That client works on the same premise as an SFTP client.

It’s amazing how often it retries.

Example from go sdk: https://github.com/aws/aws-sdk-go/blob/main/service/s3/s3man....
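A rough Python sketch of how the request count scales for one multipart upload, under stated assumptions: 5 MiB parts (the S3 minimum part size, and as far as I can tell also the Go SDK upload manager's default), no retries, and an assumed price of $0.005 per 1,000 PUT-class requests. Real SDKs may pick part sizes differently, and small objects usually go up as a single PUT instead.

```python
import math

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB  # S3 minimum part size (the last part may be smaller)
MAX_PARTS = 10_000       # S3 maximum number of parts per upload


def multipart_requests(object_size: int, part_size: int = MIN_PART_SIZE) -> tuple:
    """Return (part_size_used, request_count) for one multipart upload.

    Counts CreateMultipartUpload + one UploadPart per part + CompleteMultipartUpload,
    with no retries. The part size is raised if needed to stay within MAX_PARTS.
    """
    part_size = max(part_size, MIN_PART_SIZE, math.ceil(object_size / MAX_PARTS))
    parts = max(1, math.ceil(object_size / part_size))
    return part_size, parts + 2


def request_cost(requests: int, price_per_1000: float = 0.005) -> float:
    # Assumed S3 Standard PUT/COPY/POST/LIST price; check the pricing page for your region.
    return requests / 1000 * price_per_1000


if __name__ == "__main__":
    _, n = multipart_requests(10 * 1024**3)  # a 10 GiB object
    print(n, request_cost(n))
```

For a 10 GiB object that is 2,048 parts plus two control calls, about a penny in requests; every retry adds another billable request on top.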


The first thing everyone who tries using cloud services should learn: everything costs money. Even the service that tells you how much it costs: https://aws.amazon.com/aws-cost-management/pricing/


I'm still learning about AWS, but I don't think other clouds do the same.

If there are still people who don't use the cloud, it's because the big three have taken advantage of their position a lot.

The pricing of Hetzner, CloudFlare, Linode, OVH, ... seems to be cheaper and more transparent.


Those small players aren't comparable to AWS. They might be sufficient for your needs, but the only players at AWS scale are Azure, Google, and Alibaba. Their prices and billing practices are basically identical.


Exactly, their commercial offering and pricing are better suited to large enterprises, where that scale is really needed. For a large enterprise, the cost of deleting a bucket doesn't even register; for an SMB or a developer it's a problem.

If you don't use special tools that aren't available elsewhere, such as AWS's SageMaker or Google's TPUs, then it's probably not economically attractive to use the Amazon, Microsoft or Google clouds.


This post finally got my ass in gear to cancel an account that I thought I had closed but was still charging me a few dollars a month.

I spun up an AWS instance to practice, and once I was done I thought I had shut everything down.

Turns out I had just stopped my micro instances, and I hadn't terminated them. I also hadn't released my IP address. There was also a snapshot of the tiny DB I had created still floating around. The documentation was a little confusing, so after I went through it I spent half an hour chatting with a support rep to make sure everything was completely cleaned up. After next month, my last bill should go through and I should be free and clear. Unfortunately I have to wait for next month's bill to go through, as I can't just pay it all now.

This was mostly my fault for letting it go on for so long, but I hate how if you don't do some very specific steps you can still be charged. And I think if an account is closed, it should absolutely terminate all services that are still running on that account, and then send you the final bill.


If it happens again, you can try the aws-nuke tool to help you destroy resources. (Because of course an open source tool is needed for this.)


I've made the same mistake! Also check for EBS images which can be sneaky.
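For a quick audit of the leftovers mentioned in this subthread, a small Python helper using boto3's EC2 client might look like this. It is a sketch: it checks one region at a time, skips pagination, and only covers stopped instances, unattached EBS volumes, unassociated Elastic IPs, and EBS snapshots you own. RDS snapshots like the one mentioned above would need the RDS client's `describe_db_snapshots`, and it is no substitute for a tool like aws-nuke.

```python
def find_leftovers(ec2) -> dict:
    """Report common billable leftovers in ONE region, given a boto3 EC2 client.

    Does not paginate, so it is only meant for small personal accounts.
    """
    # Stopped (not terminated) instances: their EBS volumes keep billing.
    reservations = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["stopped"]}]
    )["Reservations"]
    stopped = [i["InstanceId"] for r in reservations for i in r["Instances"]]

    # Volumes in the "available" state are not attached to any instance.
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "status", "Values": ["available"]}]
    )["Volumes"]
    unattached = [v["VolumeId"] for v in volumes]

    # Elastic IPs without an AssociationId are allocated but unused.
    addresses = ec2.describe_addresses()["Addresses"]
    unassociated = [a["PublicIp"] for a in addresses if "AssociationId" not in a]

    # Snapshots owned by this account.
    snapshots = ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]
    snapshot_ids = [s["SnapshotId"] for s in snapshots]

    return {
        "stopped_instances": stopped,
        "unattached_volumes": unattached,
        "unassociated_eips": unassociated,
        "snapshots": snapshot_ids,
    }


if __name__ == "__main__":
    import boto3

    session = boto3.session.Session()
    for region in session.get_available_regions("ec2"):
        try:
            print(region, find_leftovers(session.client("ec2", region_name=region)))
        except Exception as exc:  # e.g. opt-in regions not enabled on this account
            print(region, "skipped:", exc)
```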


How much of this is a problem in practice?

I think in practice, S3 data is often indexed using other DBs, e.g. DynamoDB, Postgres, MySQL, etc. Can't this index be used to enumerate all the S3 URLs? I am of course simplifying this a lot.


> How much of this is a problem in practice?

This specific issue probably isn't a very big problem.

Amazon repeatedly coming up on HN as a service that will bill you when you're not expecting it, for things that are moderately hard to understand, and might refund you later, probably costs them tens or even hundreds of millions in lost revenue every year from developers being cautious about deploying things to their services.


> things that are moderately hard to understand

My experience with AWS is that the pricing for each service is reasonably well documented and the calculator does a decent job. The problem starts when multiple services are used in combination and it becomes harder to reason about.

With the advent of the cloud, cost modelling becomes an essential skill (which can be learned). One needs to be clear about the total work that gets done and "how" that work gets processed. This in turn should translate into the relevant cost metric (e.g. PUT requests/s for S3, read/write capacity units for DynamoDB, amount of data scanned for Athena, etc.).

This needs to be evaluated at zero load, normal load, 5x load, 20x load, etc. Zero load gives the dead-weight cost of the system, i.e. the cost incurred when no work is being done (e.g. EC2 instances, EBS volumes, etc.).
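That evaluation can be sketched as fixed costs plus per-unit costs scaled by a load multiplier. A minimal Python version; the line items and prices are hypothetical, chosen only to show how the zero-load (dead-weight) figure falls out next to the scaled ones.

```python
from dataclasses import dataclass


@dataclass
class CostItem:
    name: str
    fixed_monthly: float         # paid even at zero load (e.g. EC2 instances, EBS volumes)
    per_unit: float              # cost per unit of work
    units_at_normal_load: float  # units of work per month at 1x load


def monthly_cost(items: list, load_multiplier: float) -> float:
    return sum(
        item.fixed_monthly + item.per_unit * item.units_at_normal_load * load_multiplier
        for item in items
    )


if __name__ == "__main__":
    # Hypothetical system, for illustration only.
    system = [
        CostItem("ec2 + ebs", fixed_monthly=70.0, per_unit=0.0, units_at_normal_load=0),
        # unit = 1,000 PUT requests; 20,000 units = 20 million PUTs per month
        CostItem("s3 PUT requests", fixed_monthly=0.0, per_unit=0.005, units_at_normal_load=20_000),
        # unit = 1 TB scanned
        CostItem("athena scans", fixed_monthly=0.0, per_unit=5.0, units_at_normal_load=2),
    ]
    for load in (0, 1, 5, 20):
        print(f"{load:>2}x load: ${monthly_cost(system, load):,.2f}")
```

With these made-up numbers the system costs $70 a month doing nothing, and the request-driven items dominate long before 20x load.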


> probably costs them tens or even hundreds of millions in lost revenue every year from developers being cautious about deploying things to their services

I see this as a good thing. They literally encourage you to be cautious with your pricing and resource usage, to the point where they put limits on what resources you can use without explicitly asking for more.

Developers should be cautious and aware.


I'm not sure it's a good thing. But open-ended charges come up a lot. AWS, like the other big public cloud providers, doubtless has self-interested reasons for letting people run up the meter. But they're probably also hesitant to add hard cut-offs that could become footguns in production, bringing down services in complex infrastructures.

People just playing around should be careful. There are ways to keep risk lower. But if absolute price caps are your priority you should probably be using a VPS of some sort.


In my experience, developers are pretty sloppy about maintaining references to objects. It generally only becomes a problem when you need to clean up, and that's usually around IPO time, when there are petabytes of data in S3.


Stories like this make me extremely hesitant to try AWS. I was about to try S3 for a static site I was working on this weekend, but after reading this I think I'm going to stick with Netlify or DigitalOcean instead.


Having worked with S3+Cloudfront as well as Netlify, I can say with confidence that Netlify is better anyways. And it's hard to beat free...


In theory, I'm sure you could do that with the free tier.

In practice, activating that free tier requires a valid card, and I'd highly advise never giving them your own. Whatever alternative you can think of is 100% better for your sanity.


They accept prepaid gift cards as valid.


There are also other S3 providers with much more sensible pricing, like Backblaze B2, Wasabi, or Cloudflare R2.


> .5¢ per 1000 items LISTed seems insanely expensive considering how cheaply you can transfer terabytes of data with S3.

Correction: I misread - .5¢ per 1,000,000 items LISTed

  .5¢ per 1000 LIST operations
  LIST operations max out at 1000 items
Still a little pricey, but way less so than I'd imagined.

Do they make a lot of money off of charging for basic operations? It seems like you could make the whole pricing structure a lot more friendly by only charging for bandwidth use. I guess when you're as dominant as S3, you don't need to care about friendly pricing structures.

Charging for basic operations like that is weird, it's akin to a service charging people per number of clicks on a website.


There's money in confusion... I'm terrified of using the existing cloud services for personal projects. For business projects you can mostly just get an idea of how much your month-to-month bill will increase with certain actions, but it's sure easy to blow a budget by accident.


You should probably set up an LLC to handle billing for your personal projects if you plan on going with a big cloud provider.

It sucks, but the fact is AWS/GCP/Azure aren't really designed for hobbyists; they are designed for massive corporations. Their free tiers exist merely as a service to help train professionals to use their platforms.

Luckily, there are still good, low-cost providers out there.


Listing is an expensive operation. I don't know the exact economics of it, but it's very plausible to me that serving 1000 LIST requests has a comparable resource cost to transferring a couple GB of data cross-region. (It should be noted that this definitely isn't a market dominance thing - every S3 competitor I'm familiar with also charges per-operation, and charges 10 times more for LIST than GET.)


> .5¢ per 1000 items LISTed seems insanely expensive

Note it’s $0.005 per 1k requests, not $0.05 per 1k items -- that’s an extra zero from what you said, and also important to point out that one request can list 1k items. So if you list in 1k batches, it’s $5 per million items listed.


> So if you list in 1k batches, it’s $5 per million items listed.

It's $0.005 per million items listed. A thousand requests of a thousand items each is your million items, and a thousand requests is $0.005.


Right! Indeed even I was missing three more zeros. So the GP comment was missing 7 zeros, it’s $5 per billion items listed. Yes $5c per 1k items would be very expensive, good thing it’s not that pricey.


> In 2021, anyone who comes across this question may benefit to know that AWS console now provides an empty button.

source : https://stackoverflow.com/a/67834172


According to the article, the Empty button still calls LIST per 1000 objects. So if the guy in the SO thread has 2B objects, this one click would still cost him ~$10k??


LIST requests are priced per 1,000 requests, and the price is the same for all storage classes. My estimate is that it would cost him $10 to click that button.

  2M LIST requests = (2B objects / 1000 per LIST)
  $10 = (2M / 1000) × $0.005
At least that's my reading of the Pricing page for US East (Ohio).
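To double-check the arithmetic, here is a small Python sketch. It assumes the figures quoted in this thread ($0.005 per 1,000 LIST requests, US East) and full pages of 1,000 keys per LIST call; `list_cost` is a hypothetical helper, not an AWS API.

```python
from decimal import Decimal

# Assumed pricing from this thread: $0.005 per 1,000 LIST requests (US East).
PRICE_PER_1000_REQUESTS = Decimal("0.005")
KEYS_PER_LIST_PAGE = 1000  # ListObjectsV2 returns at most 1,000 keys per call


def list_cost(num_objects: int) -> Decimal:
    """Estimate the USD cost of LISTing every key in a bucket (hypothetical helper)."""
    # Ceiling division: a partial final page still costs one request.
    requests = -(-num_objects // KEYS_PER_LIST_PAGE)
    return Decimal(requests) / 1000 * PRICE_PER_1000_REQUESTS


for n in (1_000_000, 1_000_000_000, 2_000_000_000, 4_000_000_000):
    print(f"{n:>13,} objects -> ${list_cost(n)}")
```

Decimal is used so the tiny per-request prices don't pick up float rounding noise. Under these assumptions it's roughly $5 per billion objects, so about $10 for 2B.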


Ah ok. That would at least make more sense.


> Within the last year, AWS added a handy Empty button to the S3 console when viewing a bucket. You can click that button and watch the S3 console make API calls on your behalf.

> Here's what it does: It calls a LIST on the bucket, paginating through the objects in the bucket 1000 at a time. It calls a DeleteObjects API method, deleting 1000 at a time.

> The cost is 1 API LIST call per 1000 objects in the bucket. Delete operations are free, so there's no extra cost there.

source: I read the article.
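The loop the article describes (LIST a page, DeleteObjects that page) can be sketched in Python. This is not the console's actual code: it assumes an unversioned bucket and a boto3-style client, i.e. any object with `get_paginator("list_objects_v2")` and `delete_objects`; `empty_bucket` is a hypothetical name.

```python
def empty_bucket(s3, bucket: str) -> tuple[int, int]:
    """Delete every object in an unversioned bucket, one LIST page at a time.

    `s3` is assumed to behave like boto3.client("s3"). Returns
    (list_calls, objects_deleted) so the LIST cost can be estimated.
    """
    list_calls = deleted = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        list_calls += 1  # each page is one billable LIST request
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not keys:
            continue
        # A page holds at most 1,000 keys, which is also the DeleteObjects limit.
        resp = s3.delete_objects(Bucket=bucket, Delete={"Objects": keys, "Quiet": True})
        errors = resp.get("Errors", [])
        if errors:
            raise RuntimeError(f"{len(errors)} keys failed to delete, e.g. {errors[0]}")
        deleted += len(keys)
    return list_calls, deleted
```

Usage would be something like `empty_bucket(boto3.client("s3"), "my-bucket")` (bucket name is a placeholder). A versioned bucket needs the `list_object_versions` variant instead, since plain deletes only add delete markers there.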


> you can also get an export of all objects in a bucket using S3 Inventory and run the output through AWS Batch in order to delete those objects

"S3 Batch Operations" sends S3 requests based on a csv file, which can but does not have to be from S3 Inventory. But S3 Batch Operations supports only a subset of APIs and this does not include DeleteObject(s). [0]

An AWS Batch job could run a container which sends DeleteObjects requests but only when triggered by a job queue which seems redundant here.

If I can't use an expiration lifecycle policy because I need a selection of objects not matching a prefix or object tags, I would run something with `s5cmd rm` [1]. Alternatively, roll your own Go program that parses the CSV and sends many DeleteObjects requests in parallel goroutines.

0. https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-...

1. https://github.com/peak/s5cmd#delete-multiple-s3-objects
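The roll-your-own route might look like this, sketched in Python rather than Go, with a thread pool standing in for goroutines: read an S3 Inventory CSV, group keys into DeleteObjects batches of at most 1,000, and send the batches concurrently. Assumptions: the CSV has no header row and its first two columns are bucket and key; keys are URL-encoded and decoded here with `unquote_plus` (check this against your own report); `s3` is a boto3-style client. All function names are hypothetical.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator
from urllib.parse import unquote_plus

BATCH_SIZE = 1000  # DeleteObjects limit per request


def keys_from_inventory(lines: Iterable[str]) -> Iterator[str]:
    """Yield object keys from inventory CSV rows (assumed layout: bucket, key, ...)."""
    for row in csv.reader(lines):
        if len(row) >= 2:
            # Assumption: CSV inventory keys are URL-encoded; verify the exact encoding.
            yield unquote_plus(row[1])


def batches(keys: Iterable[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Group keys into lists of at most `size`."""
    it = iter(keys)
    while chunk := list(islice(it, size)):
        yield chunk


def delete_batch(s3, bucket: str, keys: list[str]) -> int:
    """One DeleteObjects call; returns how many keys were deleted without error."""
    resp = s3.delete_objects(
        Bucket=bucket, Delete={"Objects": [{"Key": k} for k in keys], "Quiet": True}
    )
    return len(keys) - len(resp.get("Errors", []))


def delete_from_inventory(s3, bucket: str, lines: Iterable[str], workers: int = 16) -> int:
    """Send DeleteObjects batches in parallel; returns the number of keys deleted."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(delete_batch, s3, bucket, b)
            for b in batches(keys_from_inventory(lines))
        ]
        return sum(f.result() for f in futures)
```

Note that this submits every batch up front, so for a very large inventory you would want to bound the number of in-flight batches.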


They have an example of some person almost paying $20k on transition fees. In my early days of AWS, I racked up $90k on S3 transition fees. Thankfully, AWS forgave it.


Stories of forgiven fees are an example of survivorship bias. Developers who rack up thousands in AWS charges by mistake and aren't forgiven probably don't tell too many people about the time they screwed up and cost their company a lot of money.


I don't think this scenario happens much unless people don't try to resolve it. I've worked with AWS for a long time at many companies, and support always takes care of these things. This is just another anecdote, but I've never heard of AWS refusing to refund in these scenarios, even when it's entirely the customer's fault.


Would the S3 inventory help here? That would allow you to get the list of all files (albeit on a delay similar to the lifecycle rule approach), which you could process offline to generate the DELETEs.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/storag...


S3 inventory would cost $0.0025 per million objects listed [1], while LIST requests are $0.005 per thousand requests and each LIST request can return up to 1000 objects, making them $0.005 per million objects listed. For the "Infrequent Access" storage tier, LISTs cost double that.

So S3 inventory would be half price compared to LIST (or quarter price in IA storage class), but that's still small comfort if you're staring down the barrel of a bucket containing a large number of objects.

[1] Management & analytics tab on https://aws.amazon.com/s3/pricing/


"Requests & data retrievals" tab footnote, above the grid:

> LIST requests for any storage class are charged at the same rate as S3 Standard PUT, COPY, and POST requests.

I read this as LISTs do not cost double for infrequent access, even though other Tier 1 requests do.


What a confusing mess. If LIST pricing is separate then it's strange to include it in the column of a table of storage-class pricing like that. But I agree that's what that statement seems to be saying.

AWS's own pricing calculator doesn't split out LIST requests: https://calculator.aws/#/createCalculator/S3

Either way, the takeaway is that using LIST or a bucket inventory will still be O(N) in cost, and there's only a factor-of-2-ish difference between the two.

Then again, a billion objects is $5 territory to delete, and if you have a trillion objects to delete and no pre-existing listing to go off of, then odds are you can stomach the $5000 hit more easily than you could stomach the staff time spent trying to reduce that cost!
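A quick sketch of that factor-of-2 comparison, using the prices quoted in this thread ($0.0025 per million objects for an S3 Inventory report; $0.005 per 1,000 LIST requests at 1,000 keys per request). Both figures are assumptions to re-check against the current pricing page, and the function names are hypothetical.

```python
from decimal import Decimal

INVENTORY_PER_MILLION_OBJECTS = Decimal("0.0025")  # assumed, from this thread
LIST_PER_1000_REQUESTS = Decimal("0.005")          # assumed, from this thread


def inventory_cost(num_objects: int) -> Decimal:
    """USD to have S3 Inventory enumerate `num_objects` keys."""
    return Decimal(num_objects) / 1_000_000 * INVENTORY_PER_MILLION_OBJECTS


def list_cost(num_objects: int) -> Decimal:
    """USD to enumerate the same keys yourself with LIST (1,000 keys per request)."""
    requests = -(-num_objects // 1000)
    return Decimal(requests) / 1000 * LIST_PER_1000_REQUESTS


for n in (10**9, 10**12):
    print(f"{n:,} objects: LIST ${list_cost(n)}, inventory ${inventory_cost(n)}")
```

Both grow linearly with the object count; under these prices a billion objects is about $5 via LIST versus $2.50 via inventory.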


“ For information about Amazon S3 inventory pricing, see Amazon S3 pricing”

This is also mentioned in the article, though they don’t calculate the price.


DELETE is free; LIST is not.

I guess one could spam DELETE calls while bruteforcing filenames to make it free.


An S3 inventory report [1] is $0.0025 per million objects listed, versus $0.005 per 1,000 LIST requests (each returning up to 1,000 objects). Make an inventory report request, retrieve the inventory report from S3, then make your free delete calls yourself through the API, or pay for a batch job [2] if you'd rather not write code ($0.25 per batch job and $1.00 per million object operations performed).

It's listed in the post as "extra credit", but it's trivial; I've done it myself for a client. A bit disingenuous to state, "if you like to do things the hard way." It literally takes less than an hour to do, and you can preserve the inventory report for housekeeping if needed.

[1] https://docs.aws.amazon.com/AmazonS3/latest/userguide/storag...

[2] https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-...

(OP: feel free to steal this comment's info if you want to update your post)


S3 Batch Operations list of Supported Operations[0] does not include DeleteObject or DeleteObjects.

0. https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-...


I appreciate you pointing out my mistake. That wasn't clear from the documentation I referred to [1] (when I performed the operation, I created a queue from the inventory report and performed delete requests popping the items off the queue), and I've let our TAM know the documentation I referred to needs to be improved to reflect that DeleteObject or DeleteObjects operations with a batch job aren't currently supported.

[1] "Specify the operation that you want S3 Batch Operations to run against the objects in the manifest. Each operation type accepts parameters that are specific to that operation. This enables you to perform the same tasks as if you performed the operation one-by-one on each object."

https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-...
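The queue-based approach described above might look like this as an in-process sketch: a producer reads keys (e.g. from an inventory report) into 1,000-key batches on a bounded queue, and worker threads pop batches and call DeleteObjects. The commenter doesn't say which queue they used, so Python's `queue.Queue` is only a stand-in; `drain_and_delete` is a hypothetical name and `s3` is assumed to be a boto3-style client.

```python
import queue
import threading
from typing import Iterable

BATCH_SIZE = 1000  # DeleteObjects accepts at most 1,000 keys per request
STOP = None        # sentinel telling a worker to exit


def _worker(s3, bucket: str, q: queue.Queue, results: list, lock: threading.Lock) -> None:
    while (batch := q.get()) is not STOP:
        try:
            resp = s3.delete_objects(
                Bucket=bucket, Delete={"Objects": [{"Key": k} for k in batch], "Quiet": True}
            )
            outcome = len(batch) - len(resp.get("Errors", []))
        except Exception as exc:  # record and keep draining so the producer never blocks
            outcome = exc
        with lock:
            results.append(outcome)


def drain_and_delete(s3, bucket: str, keys: Iterable[str], workers: int = 8) -> tuple[int, list]:
    """Producer/consumer delete; returns (keys deleted, exceptions from failed batches)."""
    q: queue.Queue = queue.Queue(maxsize=workers * 2)  # bounded, so memory stays flat
    results: list = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=_worker, args=(s3, bucket, q, results, lock), daemon=True)
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    batch: list[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == BATCH_SIZE:
            q.put(batch)  # blocks while the queue is full
            batch = []
    if batch:
        q.put(batch)
    for _ in threads:
        q.put(STOP)  # one stop signal per worker
    for t in threads:
        t.join()
    deleted = sum(r for r in results if isinstance(r, int))
    errors = [r for r in results if isinstance(r, Exception)]
    return deleted, errors
```

There is no retry logic here; failed batches are just reported back so they can be re-queued.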


> It literally takes less than hour to do

Yikes. An hour just to delete some files in a folder? It seems to me that S3 should be avoided unless you really, really, really need it.


When the scale of data goes up, so does the complexity of managing it. Deleting 100 rows from a live psql table with 1000 entries is very different from deleting 100M rows from a live table with 1B entries.


OK, pretty obvious, but if you don't know what you are storing inside your bucket, how are you accessing your objects in the first place?

If your use-case is storing random things you don't know the path of, maybe it's the wrong product to use.


A common pattern is to dump log files into it, or to use it as a dead-letter queue for failed event processing. These things are typically named arbitrarily, e.g. by a prefix and a Unix timestamp.


One of the great features of S3 is that it has an arbitrary prefix index on keys! And the list API paginates on batches of 1000 which is useful.

You can retrieve all objects with a given prefix, which is great for storing content-addressed files, and being able to iterate on them. You can also partition on arbitrary prefixes too.
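For content-addressed files, one common layout (an illustration, not something S3 prescribes) is to name each object by its hash and put a short slice of the hash up front, so each partition can be walked with its own prefix LIST, possibly in parallel. `content_key`, `partition_prefixes`, and the `blobs/` root are hypothetical.

```python
import hashlib


def content_key(data: bytes, root: str = "blobs", fanout: int = 2) -> str:
    """Content-addressed key such as 'blobs/ab/ab12...' (hypothetical layout)."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{root}/{digest[:fanout]}/{digest}"


def partition_prefixes(root: str = "blobs", fanout: int = 2) -> list[str]:
    """Every partition prefix, 'blobs/00/' through 'blobs/ff/' for fanout=2.

    Each one can be passed as the Prefix parameter of a ListObjectsV2 call.
    """
    return [f"{root}/{i:0{fanout}x}/" for i in range(16 ** fanout)]
```

Identical content maps to the same key, so re-uploads are idempotent, and the hash prefix spreads keys evenly across partitions.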


Here's an example. You're doing a big machine learning workflow and you've got a gazillion large files representing training data. You run a large training job in Amazon's cloud somewhere that is composed of one worker that lists the files and delegates work to various learning jobs, and then those jobs each stream in one training file, process it, and then grab the next item and repeat. That's a pretty common type of work.


> AWS is "eventually consistent" within most services, and S3 is no exception

Nowadays, it (¿almost?) is. https://aws.amazon.com/s3/consistency/:

“After a successful write of a new object, or an overwrite or delete of an existing object, any subsequent read request immediately receives the latest version of the object”

I think that says that deletes are immediately visible, too, but they phrase it weirdly, as, after a delete, there is no latest version of the object.

Also, I don’t think buckets are objects in this sense, so the caveat in the article stands.


I believe objects in the bucket now are strongly consistent, but the buckets themselves are still eventually consistent.


In the official terms: data plane operations are strongly consistent, whereas control plane operations are eventually consistent.


Yes, when you delete with bucket versioning on, S3 leaves a delete marker (a tombstone) for the object. The previous object versions are obviously still there.
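So emptying a versioned bucket means deleting every version and every delete marker by `VersionId`. A sketch, assuming a boto3-style client with the `list_object_versions` paginator; `purge_versions` is a hypothetical name, and the extra chunking is a precaution in case a page returns more than 1,000 versions and markers combined.

```python
def purge_versions(s3, bucket: str) -> int:
    """Permanently delete all object versions and delete markers (hypothetical helper)."""
    removed = 0
    for page in s3.get_paginator("list_object_versions").paginate(Bucket=bucket):
        targets = [
            {"Key": v["Key"], "VersionId": v["VersionId"]}
            for v in page.get("Versions", []) + page.get("DeleteMarkers", [])
        ]
        # Stay within the 1,000-key DeleteObjects limit.
        for i in range(0, len(targets), 1000):
            chunk = targets[i:i + 1000]
            resp = s3.delete_objects(Bucket=bucket, Delete={"Objects": chunk, "Quiet": True})
            removed += len(chunk) - len(resp.get("Errors", []))
    return removed
```

Deleting a specific `VersionId` removes that version for good rather than adding another marker, which is why this actually empties the bucket.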


>The wait is often hours until AWS released a bucket name (since bucket names are globally unique, not just within your account).

I think last time I did this, the wait time was pretty much exactly 60 minutes.


This kind of thing is the best argument for your own bare metal hardware.


Or at least get some service with transparent pricing, instead of per-object per-action bullshit.


Anyone have suggestions for S3 alternatives for storing many files sized 50-500 MB each? They are mostly long audio files, and there is an external index as well.


It's silly that they won't just let you delete the whole bucket, but this is actually pretty cheap though.

Based on some quick maths, deleting a million files would only cost you like $5.

P.S. Again, it's silly they do this, and I'm probably greatly underestimating how these costs can add up for mid to large orgs.


Yes, but some of us have many billions of objects. From memory, 4 billion objects is $20k to delete.


No it’s not. Deleting 4B objects costs $20, via DELETE and LIST requests.

As the article says DELETEs are free and you can do bulk deletes of 1000 objects at a time. However you need to have the object names. You get those using LIST, which gives you 1000 items for each request. LISTs are currently priced at $0.005 per 1000 List requests. So $0.005 to delete 1M objects. Using the “empty bucket” feature does this internally and charges you that exact same amount.

The only way you get near your price is if you try to delete by applying a new lifecycle policy to 4B objects that are not in Standard storage.


This.

If you can select your objects with a lifecycle policy (by object tag or prefix) you don't need the LISTs either. The prefix can be "" to select all objects. Just be careful with that.
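For reference, an expire-everything rule could be built like this in Python. It's a sketch: the rule ID is hypothetical, the one-day defaults are arbitrary, and the dict is meant for boto3's `put_bucket_lifecycle_configuration`. The noncurrent-version and multipart-upload entries only matter on versioned buckets or buckets with abandoned uploads.

```python
def expire_everything_rule(days: int = 1) -> dict:
    """Lifecycle configuration that expires every object (empty prefix matches all keys)."""
    return {
        "Rules": [
            {
                "ID": "expire-everything",  # hypothetical rule name
                "Filter": {"Prefix": ""},   # "" selects every object; be careful
                "Status": "Enabled",
                "Expiration": {"Days": days},
                # On versioned buckets, also expire old versions; clean up stale uploads.
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            }
        ]
    }
```

Applied with something like `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=expire_everything_rule())`, where the bucket name is a placeholder. Expiration runs asynchronously, so objects disappear over the following day or so rather than immediately.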


I think the takeaway is to maintain your own database of references to objects if you can.
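One way to act on that: record every key you write in a small local index, then feed it to DeleteObjects in 1,000-key batches with no LIST at all. A sketch using SQLite from the Python standard library; the schema and the `KeyIndex` class are hypothetical, and keeping the index in sync with failed or out-of-band writes is left out.

```python
import sqlite3
from typing import Iterator


class KeyIndex:
    """Hypothetical local index of S3 keys, so bulk deletes don't need LIST."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS objects (key TEXT PRIMARY KEY)")

    def record_put(self, key: str) -> None:
        # Call this after every successful PutObject.
        self.db.execute("INSERT OR IGNORE INTO objects (key) VALUES (?)", (key,))
        self.db.commit()

    def delete_batches(self, prefix: str = "", size: int = 1000) -> Iterator[list[str]]:
        """Yield up to `size` keys under `prefix`, each batch ready for one DeleteObjects call."""
        last = ""
        while True:
            # Keyset pagination: a fresh query per batch, so callers may forget() in between.
            rows = self.db.execute(
                "SELECT key FROM objects WHERE key > ? AND substr(key, 1, ?) = ? "
                "ORDER BY key LIMIT ?",
                (last, len(prefix), prefix, size),
            ).fetchall()
            if not rows:
                return
            keys = [r[0] for r in rows]
            yield keys
            last = keys[-1]

    def forget(self, keys: list[str]) -> None:
        # Call this once DeleteObjects has confirmed the batch.
        self.db.executemany("DELETE FROM objects WHERE key = ?", [(k,) for k in keys])
        self.db.commit()
```

`substr` is used instead of `LIKE` so that `%` or `_` in a key prefix isn't treated as a wildcard.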


S3 isn't really built for lots of small objects, which is also quite tellingly reflected in its pricing (and performance in dealing with those).

Right now, anything less than 400KB should really be plonked into its cousin, DynamoDB.

Other than that, I fully expect AWS to announce a new S3 bucket type (and pricing) for high-volume, small-size blobs. There is also the small matter of addressing Cloudflare R2, which should result in a Lightsail-EC2-esque fork of S3.


UPDATE: I fucked up the maths; I thought it was $0.005 per LIST request. It's $0.005 per 1k LIST requests, so it's more like $0.005 per million objects.

That's really, really cheap.


> Deleting a bucket won't let you re-create that bucket immediately.

This is partially incorrect. I can recreate it immediately in the same account, but in a different account I need to wait for ~1 hour.


Another reason to switch to R2.


Does R2 have fewer expensive footguns than AWS?


R2 isn’t even available yet, so only time will tell


Yes.


If S3 were a service the general public used, it would surely be banned.


Most things on AWS cost money, and AWS makes pricing incredibly complex and opaque, so the monthly bill is usually the first way people find out about these things. While it is likely no consolation, S3 is by far one of the most complex AWS products pricing-wise, with different storage classes each having their own rates, request costs with different rates for GET/PUT/POST/etc. (which this post mentions), and transit/egress fees.

I work on https://www.vantage.sh/ which helps teams get visibility on their cloud costs which may be helpful to folks here as well on this topic.


I've literally never had a problem explaining S3 costs to a customer. There are AWS products with complex pricing, but this just feels like a plug.



