
All the benchmarks were run from a single instance.

(Note that I have done some testing from AWS Lambda, where we had 1k Lambda jobs all pulling down files from S3 at once. That's a bit harder to benchmark...)
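To give a sense of the setup, here is a minimal sketch of how ~1k concurrent invocations could be fanned out. The function name and payload are hypothetical, and it assumes the AWS SDK for JavaScript v2 (TypeScript):

    // Sketch only, not the exact test harness: fire off `count` async
    // Lambda invocations and wait for the invoke calls to be accepted.
    import { Lambda } from "aws-sdk";

    const lambda = new Lambda();

    async function fanOut(count: number): Promise<void> {
      const invocations = Array.from({ length: count }, (_, i) =>
        lambda
          .invoke({
            FunctionName: "s3-download-bench", // hypothetical function name
            InvocationType: "Event",           // fire-and-forget async invoke
            Payload: JSON.stringify({ jobId: i }),
          })
          .promise()
      );
      await Promise.all(invocations);
    }

    fanOut(1000).catch(console.error);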




Hi OP, nice writeup! I hope my comment wasn't construed as dismissing the work; it was just a criticism of one small part.

It sounds like that wouldn't have been a factor, except for the cap on Amazon's side that you seem to have discovered and called out.

My only suggestion, then, is that you may want to state explicitly that you ran the benchmarks from a single instance.


Thanks! Not at all, it's a great point, and something I hadn't realized would play into the equation.


Any comments on how it worked out with Lambda?


Reluctant to say much because the benchmarks weren't formal. However...

The throughput correlated directly with how much RAM we allocated to the Lambda function (which presumably means we were sharing the VM with fewer other jobs).

     512 MB RAM -> 19.5 MB/s
     768 MB RAM -> 29.8 MB/s
    1024 MB RAM -> 38.4 MB/s
    1536 MB RAM -> 43.7 MB/s

Note that this also used the Node.js AWS SDK, which downloads files more slowly than some other SDKs.
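For context, here is a minimal sketch of the kind of handler involved: time a single S3 download and report MB/s. The bucket and key names are hypothetical, and it assumes the AWS SDK for JavaScript v2 (TypeScript):

    // Sketch only, not the exact benchmark code: download one object,
    // time it, and return the observed throughput in MB/s.
    import { S3 } from "aws-sdk";

    const s3 = new S3();

    export const handler = async (): Promise<{ mbPerSec: number }> => {
      const start = Date.now();
      const obj = await s3
        .getObject({ Bucket: "bench-bucket", Key: "large-test-file.bin" }) // hypothetical names
        .promise();
      const seconds = (Date.now() - start) / 1000;
      const mb = (obj.Body as Buffer).length / (1024 * 1024);
      return { mbPerSec: mb / seconds };
    };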


Thanks. I'd guess that larger RAM allocations land on larger host instance types, and hence get more network bandwidth. If this were my goal, I'd try gof3r to stream the data from S3.



