RPS to S3 is limited, but throughput to S3 isn't, except per bucket. Higher throughput can be achieved by sharding your data across multiple buckets. Also, it's important to properly namespace your keys within a bucket so that they're efficiently distributed across the underlying data partitions.
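A minimal sketch of that kind of key namespacing in Python; the hash length and path layout are arbitrary illustrative choices, not anything S3 requires:

```python
import hashlib

def partitioned_key(key, prefix_len=4):
    """Prepend a short hash so keys spread across S3's internal partitions
    instead of piling up on one lexicographic range."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return "%s/%s" % (digest[:prefix_len], key)

# e.g. turns "logs/2016/03/14/host-1.gz" into "<4 hex chars>/logs/2016/03/14/host-1.gz"
print(partitioned_key("logs/2016/03/14/host-1.gz"))
```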
My experience is based solely on recent production workloads pulling TBs of data out of S3 very quickly to restore data to a less-than-reliable indexed datastore. YMMV.
> However, if you expect a rapid increase in the request rate for a bucket to more than 300 PUT/LIST/DELETE requests per second or more than 800 GET requests per second, we recommend that you open a support case to prepare for the workload and avoid any temporary limits on your request rate.
You have to know how to read their docs. :) This is basically code for, "there is a default limit here that you have to get raised if you want to go above it".
> Amazon S3 scales to support very high request rates. If your request rate grows steadily, Amazon S3 automatically partitions your buckets as needed to support higher request rates. However, if you expect a rapid increase in the request rate for a bucket to more than 300 PUT/LIST/DELETE requests per second or more than 800 GET requests per second, we recommend that you open a support case to prepare for the workload and avoid any temporary limits on your request rate. To open a support case, go to Contact Us.
So this looks like an auto-scaling issue. The docs state that S3 "automatically partitions your buckets as needed to support higher request rates." However, if we know that a bucket is going to need to scale dramatically, we can request, in advance, that the S3 team pre-scale it.
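For what it's worth, those "temporary limits" surface client-side as 503 SlowDown errors, so retrying with backoff is the usual mitigation while the bucket catches up. A rough sketch with boto3 (retry count and sleep values are just placeholders):

```python
import random
import time

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def get_with_backoff(bucket, key, max_retries=8):
    """GET an object, backing off when S3 throttles the request rate
    (HTTP 503 / SlowDown) while the bucket's partitions catch up."""
    for attempt in range(max_retries):
        try:
            return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        except ClientError as err:
            if err.response["Error"]["Code"] not in ("SlowDown", "ServiceUnavailable"):
                raise
            # Exponential backoff with jitter; the base and cap are arbitrary.
            time.sleep(min(2 ** attempt, 30) * random.random())
    raise RuntimeError("still throttled after %d retries: %s/%s" % (max_retries, bucket, key))
```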
I'm sure there is an account limit, but running 1,000 CPUs already requires requesting an increase in the account's EC2 instance limit. Are you saying that a team trying to access 150 GB of files, or to make 1,000 RPS, as the article documents, will hit that limit? From your experience, how big is this hard limit? Is it Netflix scale, or is it GB or TB?
We routinely pull a dataset of hundreds of GBs to 100+ instances (1,600+ cores) in parallel. We have never noticed throughput going down as the number of nodes grows. S3 delivers the maximum throughput of 2-4 Gbps per instance very consistently.
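A rough sketch of that kind of parallel pull on a single instance, using boto3 and a thread pool; the bucket name, prefix, and worker count are hypothetical, and the real workload fans this out across 100+ instances:

```python
import concurrent.futures
import os

import boto3

# Hypothetical names and sizes, purely for illustration.
BUCKET = "my-dataset-bucket"
PREFIX = "shard-0001/"
WORKERS = 64

s3 = boto3.client("s3")  # low-level clients are safe to share across threads

def list_keys(bucket, prefix):
    """Page through the bucket listing and yield every key under the prefix."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]

def fetch(key):
    """Download one object to local disk and return its size in bytes."""
    local_path = key.replace("/", "_")
    s3.download_file(BUCKET, key, local_path)
    return os.path.getsize(local_path)

# Many concurrent GETs per instance; aggregate throughput scales with parallelism.
with concurrent.futures.ThreadPoolExecutor(max_workers=WORKERS) as pool:
    total_bytes = sum(pool.map(fetch, list_keys(BUCKET, PREFIX)))

print("downloaded %.1f GB" % (total_bytes / 1e9))
```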