It's not exactly what you're asking for, but we have a large bucket with billions of files (don't ever do this, it was a terrible idea) and we manage deletions via lifecycle rules. If your file naming convention and data retention policy permit it, it's far easier than calling delete with 1,000 keys at a time.
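To make that concrete, here's roughly the kind of expiration rule I mean, as a boto3 sketch (the bucket name, prefix, and retention period are placeholders, not our actual setup):

    import boto3

    s3 = boto3.client("s3")

    # Expire everything under a prefix after N days instead of issuing
    # DeleteObjects calls 1,000 keys at a time. Names and values are examples.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-huge-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-raw-data",
                    "Filter": {"Prefix": "raw/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": 30},
                }
            ]
        },
    )

S3 then does the deletes for you in the background, and as far as I know expiration (unlike the Glacier transition below) doesn't carry a per-object request charge.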
Also just a word of warning, if you do have a lot of files, and you're thinking "let's transition them to glacier", don't do it. The transfer cost from S3->Glacier is absolutely insane ($0.05 per 1,000 objects). I managed to generate $11k worth of charges doing a "small" test of 218M files and a lifecycle policy. Only use glacier for large individual files.
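To spell out the math: transitions are billed per lifecycle request, i.e. per object, so the object count is all that matters. Rough numbers, not an exact bill:

    # Glacier transition requests were ~$0.05 per 1,000 objects at the time.
    objects = 218_000_000
    cost = objects / 1_000 * 0.05
    print(f"${cost:,.0f}")  # ~$10,900 -- the ~$11k mentioned above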
I have to ask: what’s performance like for operations on the bucket objects?
Edit: I ask because AWS suggests a key naming convention when you have a large number of objects, to make sure you're distributing objects across storage nodes and to prevent bottlenecks.
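For context, the convention I mean boils down to prepending a short random/hashed prefix to your keys, roughly like this (just a sketch; the hash length and key layout are illustrative):

    import hashlib

    def partitioned_key(key: str, prefix_len: int = 4) -> str:
        """Prepend a short hash so sequential key names spread across partitions."""
        digest = hashlib.md5(key.encode()).hexdigest()[:prefix_len]
        return f"{digest}/{key}"

    print(partitioned_key("logs/2019/01/01/events.json"))
    # -> something like "a1b2/logs/2019/01/01/events.json"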
“This S3 request rate performance increase removes any previous guidance to randomize object prefixes to achieve faster performance. That means you can now use logical or sequential naming patterns in S3 object naming without any performance implications.”
No difference for PUT, GET, and DELETE. Don't know about LIST, but if it degrades, it's not significant. I worked with buckets with exabytes of data and billions of objects.
Never noticed any speed difference due to bucket size. S3 is generally slow anyway (250ms for a write isn't uncommon) but it scales very well and we use it for raw data storage that's not in our critical path, so the latency isn't a problem.
Edit response: I've always used the partitioning conventions they suggest, so I'm not sure what sort of impact you'd see without them.
For us, it was due to the relatively high PUT cost when you're storing a large number of small files. We ended up changing our approach: we now store blocks (~10MB archives) in S3 instead of individual files. The S3 portion of our AWS bill was previously 50% PUT charges / 50% long-term storage charges. After the change, the PUT portion dropped to nearly $0 and our overall AWS bill fell by almost 30%, while we still store the same amount of data per month.
E.g., if you write 1 million 10KB files per day to S3, you're looking at $150/mo in PUT costs (at $0.005 per 1,000 PUT requests). If you instead pack those into 1,000 10MB blocks per day, you're looking at $0.15/mo in PUT costs.
Because S3 supports HTTP range requests, we can still fetch individual files without an intermediate layer (though our write layer did get slightly more complex), and our GET and storage costs are identical.
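In case it's useful, here's the rough shape of the approach; this is a sketch only, and the names, manifest format, and block handling are assumptions rather than our exact implementation:

    import io
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-bucket"  # placeholder

    def pack_and_upload(block_key, files):
        """Concatenate small files into one ~10MB block object (a single PUT).
        `files` is {filename: bytes}; returns {filename: (offset, length)},
        which you need to persist somewhere (DB, index object, etc.)."""
        buf = io.BytesIO()
        manifest = {}
        for name, data in files.items():
            manifest[name] = (buf.tell(), len(data))
            buf.write(data)
        s3.put_object(Bucket=BUCKET, Key=block_key, Body=buf.getvalue())
        return manifest

    def fetch_file(block_key, offset, length):
        """Read one packed file back with an HTTP range request (a single GET)."""
        resp = s3.get_object(
            Bucket=BUCKET,
            Key=block_key,
            Range=f"bytes={offset}-{offset + length - 1}",
        )
        return resp["Body"].read()

The GET pricing works out the same because a ranged GET is still one GET request, and storage is billed on total bytes either way.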