
I like AWS, but AWS EFS has the same problem. They've improved it a bit through some recent changes, but it's not much better.

The way it worked: they gave you absolutely pitiful base IOPS credits for EFS, and everything else scaled with disk space used. So the more disk space used (and paid for), the more IOPS. Then, once you'd used up all the credits, they'd completely destroy your IOPS. By destroy I mean IOPS at the level of an HDD from 1995.

I set up a Jenkins instance on EFS and initially it went well. It barely had any activity, yet after about 2 weeks it had used up all the credits. After that even the login page would take 20 seconds to load.




I think it's throughput credits that EFS gives you (e.g. r/w MiB/s), not IOPS. AFAIK they don't document the available IOPS at all.

In my experience the latency of an individual I/O operation on EFS is always at the "HDD from 1995" level regardless of available burst credits. Something that does lots of small random I/O, like checking out git repos on Jenkins workers, is basically the worst case for EFS.

https://docs.aws.amazon.com/efs/latest/ug/performance.html
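
For reference, the bursting model ties baseline throughput to the amount of data stored. As I recall from those docs (the numbers may have changed since, so treat them as assumptions), baseline is roughly 50 KiB/s per GiB stored, bursting up to 100 MiB/s while credits last, which is why a small file system gets almost nothing once credits run out. A rough back-of-the-envelope sketch:

    # Rough sketch of EFS "Bursting" mode baseline throughput.
    # Assumed figures (from the performance docs as I remember them,
    # verify against the current page): ~50 KiB/s of baseline per GiB
    # stored, bursting to ~100 MiB/s while credits last.

    def efs_baseline_mib_per_s(stored_gib: float) -> float:
        """Approximate baseline throughput once burst credits are exhausted."""
        BASELINE_KIB_PER_S_PER_GIB = 50.0  # assumption, see docs
        return stored_gib * BASELINE_KIB_PER_S_PER_GIB / 1024.0

    # A Jenkins home dir with ~5 GiB of data:
    print(efs_baseline_mib_per_s(5))      # ~0.24 MiB/s -- "HDD from 1995" territory
    # You'd need roughly 1 TiB stored just to sustain ~50 MiB/s:
    print(efs_baseline_mib_per_s(1024))   # ~50 MiB/s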


It's NFS, so the bad latency isn't surprising. The problem is that they don't have anything faster -- it tops out at 2 GB/s or something, even with hundreds of TB, even with multiple clients. You have to shard your data over multiple EFS volumes, or build your own virtual Gluster, which are extremely shit options. It also makes any kind of big data HPC impractical.

Bezos, if you're listening, fire someone. You should have had next-generation pNFS or Lustre-like protocols by 2016.


They actually do have https://aws.amazon.com/fsx/lustre/

Latency of EFS is much worse than running your own NFS in my experience.


Doh, how did I not find that?

But also, how does the pricing work? It seems to be half the price of EFS? It almost seems like they are assuming S3 reads or Direct Connect to populate the FSx volume.


Throughput credits, you're right, my bad.

The agents were in ECS with no persistent storage, so that wasn't the problem. I was just running the Jenkins master off of EFS, for the persistent configuration storage.

And I don't think it's the latency that's killing EFS usage, it's the throughput. While the credits were there, everything went smoothly; once the credits ran out, the base throughput was fit for 90s-era I/O.
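
If anyone wants to see this coming before the login page starts timing out: the credit balance is exposed as a CloudWatch metric. A minimal sketch with boto3 (the file system ID and region are placeholders for your own values):

    # Minimal sketch: watch the EFS BurstCreditBalance metric so you can
    # tell when a file system is about to fall back to baseline throughput.
    # "fs-12345678" and the region are placeholders.
    from datetime import datetime, timedelta, timezone

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/EFS",
        MetricName="BurstCreditBalance",
        Dimensions=[{"Name": "FileSystemId", "Value": "fs-12345678"}],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=300,            # 5-minute datapoints
        Statistics=["Minimum"],
    )

    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        # Balance is reported in bytes; near zero means throttling is imminent.
        print(point["Timestamp"], point["Minimum"])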


We had the same issue with a pgsql server. It started out fine, but to get decent performance you pay through the nose for higher disk throughput. It looks competitive when you're pricing things out and don't know you need to pay for that. When you find out, it's a classic sunk cost fallacy and most companies just eat the cost.
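
That's the trap: the naive quote covers storage only, and the provisioned-throughput line item only appears once you've discovered the baseline isn't enough. A toy comparison (all prices here are made-up placeholders, not actual AWS pricing -- plug in the numbers from the pricing page):

    # Toy cost comparison: storage-only quote vs. storage plus provisioned
    # throughput. Prices are placeholder assumptions, not AWS's real rates.

    def monthly_cost(stored_gib: float,
                     price_per_gib: float,
                     provisioned_mib_s: float = 0.0,
                     price_per_mib_s: float = 0.0) -> float:
        return stored_gib * price_per_gib + provisioned_mib_s * price_per_mib_s

    # The quote that looks competitive: 100 GiB of storage only.
    print(monthly_cost(100, price_per_gib=0.30))
    # The bill after you find out baseline throughput isn't enough and
    # provision, say, 50 MiB/s on top of it.
    print(monthly_cost(100, price_per_gib=0.30,
                       provisioned_mib_s=50, price_per_mib_s=6.00))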





