
Sorry to do this here, I know it’s not Amazon support. Is there a way to copy S3 objects from bucket to bucket without sending them through the compute that’s doing the copying?

We have a use case for copying terabytes of content to buckets with different owners and it just seems wasteful to run everything through a client.




Yes: https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObje...

I wrote many tests for it... so many tests...
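
For reference, a minimal boto3 sketch of a server-side copy (bucket and key names are placeholders; the credentials need read access on the source and write access on the destination):

    import boto3

    s3 = boto3.client("s3")

    # Server-side copy: S3 reads the source object and writes it to the
    # destination bucket itself; the object bytes never pass through this client.
    s3.copy_object(
        CopySource={"Bucket": "source-bucket", "Key": "report.csv"},
        Bucket="dest-bucket",
        Key="report.csv",
    )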


I’ll have to try this again; it really seemed like the performance was quite low for it to be a ‘backend’ copy operation. Thanks!

edit: just tried it, definitely seems to be working. not sure what I was seeing earlier, thanks!


Distributed systems are complex, and most likely you were bound to an overloaded webserver. My knowledge of S3 is six years old now, but the copy operation is single-threaded, with a tremendous amount of hashing to ensure durability.

At scale, the performance cost is worth it given the number of checks done internally to ensure the copy is perfect. If you download and re-upload, you could make it faster; however, getting all the details right so that corruption doesn't happen is tricky.
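
One related detail: a single CopyObject call is capped at 5 GB per object, so larger objects have to go through multipart UploadPartCopy, which stays server-side and whose parts can be issued in parallel. A rough boto3 sketch, with the names and the 100 MB part size purely illustrative:

    import boto3

    s3 = boto3.client("s3")
    part_size = 100 * 1024 * 1024  # 100 MB per part (parts must be >= 5 MB)
    src = {"Bucket": "source-bucket", "Key": "big-object.bin"}

    size = s3.head_object(**src)["ContentLength"]
    upload = s3.create_multipart_upload(Bucket="dest-bucket", Key="big-object.bin")

    parts = []
    for i, start in enumerate(range(0, size, part_size), start=1):
        end = min(start + part_size, size) - 1
        # Each part is copied inside S3 from a byte range of the source object.
        resp = s3.upload_part_copy(
            Bucket="dest-bucket",
            Key="big-object.bin",
            UploadId=upload["UploadId"],
            PartNumber=i,
            CopySource=src,
            CopySourceRange=f"bytes={start}-{end}",
        )
        parts.append({"ETag": resp["CopyPartResult"]["ETag"], "PartNumber": i})

    s3.complete_multipart_upload(
        Bucket="dest-bucket",
        Key="big-object.bin",
        UploadId=upload["UploadId"],
        MultipartUpload={"Parts": parts},
    )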


You can use S3 Batch Operations, which can handle billions of objects in just a few minutes:

https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-...
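
A batch copy job boils down to a create_job call with an S3PutObjectCopy operation; a rough boto3 sketch, assuming a CSV manifest of source objects and an IAM role for the job already exist (the account ID and all ARNs below are placeholders):

    import boto3

    s3control = boto3.client("s3control")

    s3control.create_job(
        AccountId="111122223333",
        ConfirmationRequired=False,
        Priority=10,
        RoleArn="arn:aws:iam::111122223333:role/batch-copy-role",
        # Copy every object listed in the manifest into the destination bucket.
        Operation={
            "S3PutObjectCopy": {"TargetResource": "arn:aws:s3:::dest-bucket"}
        },
        # CSV manifest of "bucket,key" rows describing the source objects.
        Manifest={
            "Spec": {
                "Format": "S3BatchOperations_CSV_20180820",
                "Fields": ["Bucket", "Key"],
            },
            "Location": {
                "ObjectArn": "arn:aws:s3:::manifest-bucket/copy-manifest.csv",
                "ETag": "manifest-etag-goes-here",
            },
        },
        # Completion report written back to S3.
        Report={
            "Bucket": "arn:aws:s3:::report-bucket",
            "Prefix": "batch-copy-reports",
            "Format": "Report_CSV_20180820",
            "Enabled": True,
            "ReportScope": "AllTasks",
        },
    )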



The AWS CLI (aws s3 cp) can copy between buckets.


That’s true, but the data flows through the machine executing the AWS CLI. The parent was asking how to copy data without it flowing through a compute instance.


I’d double-check the CLI code path for that. The S3 API does have a copy operation, which performs the copy within S3 without client compute acting as an intermediary. If the CLI isn’t using it for bucket-to-bucket copies, that sounds like a bug that needs to be fixed in the CLI tooling.


It is directly checking for s3 to s3[1] and indicates that it wants to copy...

I've read over it and I'm reasonably sure that it's going to issue CopyObject, but it would take me actually getting out paper and pen to really track it down.

The AWS CLI and Boto are a case study in overdoing class hierarchies. Not because there's any obvious AbstractSingletonProxyFactoryBean; rather, no single piece stands out as "this is where they went wrong," and yet the end result is a confusing mess of inheritance and objects.

[1]: https://github.com/aws/aws-cli/blob/45b0063b2d0b245b17a57fd9...


Not to mention the insane over-engineering of a Python 2.7-compatible async task-stealing IO loop, which is slow as hell and pitifully delivers a maximum of ~150 MB/s at 30% CPU core activity. That's why anyone who regularly needs to download/upload files from S3 ends up needing an additional tool or library (s5cmd, s3pd, etc.).


Thanks! I don’t know why I had the understanding that it worked the other way. This is useful to know!


No, when copying objects between buckets (aws s3 cp s3://... s3://... and the corresponding sync command), the AWS CLI uses CopyObject (https://docs.aws.amazon.com/AmazonS3/latest/API/API_CopyObje..., previously known as S3 PUT Copy), in which the client doesn't handle any object contents. The call stack eventually reaches https://github.com/boto/s3transfer/blob/develop/s3transfer/c... (or its multipart equivalent), where it calls the botocore binding for this API.
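
The managed transfer layer on top of this picks between the two automatically; something like the following stays entirely server-side and switches to the multipart copy path once the object exceeds the configured threshold (the bucket/key names and threshold value are just examples):

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    # Managed copy: issues CopyObject for small objects and switches to
    # UploadPartCopy parts above multipart_threshold; no object bytes are
    # downloaded to the client either way.
    s3.copy(
        CopySource={"Bucket": "source-bucket", "Key": "big-object.bin"},
        Bucket="dest-bucket",
        Key="big-object.bin",
        Config=TransferConfig(multipart_threshold=1024 * 1024 * 1024),  # 1 GiB
    )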




