The AWS Go SDK now has a connection pool based S3 download/upload manager API that allows saturating your (e.g. 40Gbit/s EC2-S3) network connection using far less memory and CPU than is possible with Python.
With AWS it's really hard to tell which of their SDKs are at cutting-edge feature parity and which are not; that some are and some aren't is a real shame.
Just last week I wrote basically the same thing as an ad-hoc solution using boto3, because I had tens of TB of data to pull out of Glacier and distribute across S3 buckets. It wasn't a big deal, since I'm experienced at writing parallel network code in Python and moving large data streams, and boto3 has good documentation, but things like this really shouldn't be left as an exercise to the SDK consumer.
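The pattern I used is roughly the following: list the keys, then fan the per-object copies out over a thread pool (boto3 clients are thread-safe, so one shared client works). A minimal sketch, with the actual S3 call stubbed out so it runs standalone; the bucket and key names are hypothetical, and in the real script the worker body was a boto3 managed copy:

```python
from concurrent.futures import ThreadPoolExecutor

def copy_object(key):
    # In the real script this was a boto3 managed cross-bucket copy, e.g.:
    #   s3.copy({"Bucket": SRC_BUCKET, "Key": key}, DST_BUCKET, key)
    # Stubbed out here so the sketch is self-contained and runnable.
    return key

def parallel_copy(keys, workers=32):
    # Fan the per-object copies out over a thread pool. S3 throughput
    # scales with concurrent requests, so this is where the speedup comes from.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_object, keys))

copied = parallel_copy([f"data/part-{i}" for i in range(100)])
print(len(copied))  # → 100
```

For very large individual objects you would additionally tune boto3's `TransferConfig` (multipart threshold, chunk size, per-object concurrency) rather than relying on the thread pool alone.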
Be cautious though: rclone "sync" decides what to transfer based on file metadata (e.g. last modified); it does not recompute local etags to determine which files actually need to be synced.
For instance, if you "cp -a" a directory (which preserves modification times) and then run sync, it can do nothing and report success, because the copied files' last-modified times predate those of the objects already in S3.
For our use case at work, we wanted to be _sure_ that sync always works as intended, so we ended up recomputing etags locally and comparing them to the ones in S3 to decide what to sync (we had been bitten by the last-modified issue before).
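For reference, the local etag computation is just the well-known S3 multipart convention: the MD5 of the concatenated per-part MD5 digests, suffixed with the part count (a plain MD5 for single-part objects). A minimal sketch, assuming the object was uploaded without SSE-KMS and that you know the part size used at upload time (8 MiB is a common client default, e.g. boto3's):

```python
import hashlib

def s3_etag(path, chunk_size=8 * 1024 * 1024):
    # Compute the ETag S3 would report for this file, assuming a plain
    # (non-KMS-encrypted) upload with parts of `chunk_size` bytes.
    md5s = []
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5s.append(hashlib.md5(chunk))
    if not md5s:
        # Empty file: S3 reports the MD5 of zero bytes.
        return hashlib.md5(b"").hexdigest()
    if len(md5s) == 1:
        # Single-part upload: ETag is just the object's MD5.
        return md5s[0].hexdigest()
    # Multipart upload: MD5 of the concatenated part digests, plus "-<nparts>".
    digest = hashlib.md5(b"".join(m.digest() for m in md5s))
    return f"{digest.hexdigest()}-{len(md5s)}"
```

The caveat is that the part size must match what the uploader used, otherwise the computed multipart etag won't agree with S3's even for identical content.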
A colleague of mine developed this tool to make this functionality available in a CLI: https://github.com/chanzuckerberg/s3parcp