I like AWS, but AWS EFS has the same problem. They've improved it a bit through some recent changes, but it's not much better.
The way it would work: they gave you an absolutely pitiful baseline of IOPS credits for EFS, and everything else scaled with the disk space used. So the more disk space used (and paid for), the more IOPS. Once you'd burned through the credits they'd completely destroy your IOPS. By destroy I mean IOPS at the level of an HDD from 1995.
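For a rough sense of how small the baseline is (the ~50 KiB/s-per-GiB figure is from my memory of the bursting-mode docs, so treat it as illustrative, not authoritative):

```python
# Rough sketch of EFS bursting-mode baseline throughput.
# The 50 KiB/s-per-GiB constant is an assumption from memory of the docs;
# the point is just that small filesystems fall back to a tiny baseline.

def efs_baseline_mib_per_s(stored_gib: float, kib_per_gib: float = 50.0) -> float:
    """Baseline throughput (MiB/s) you fall back to once burst credits are gone."""
    return stored_gib * kib_per_gib / 1024.0

if __name__ == "__main__":
    for size_gib in (1, 10, 100, 1024):
        print(f"{size_gib:>5} GiB stored -> {efs_baseline_mib_per_s(size_gib):6.2f} MiB/s baseline")
```

A Jenkins master storing a few GiB of config ends up with a baseline well under 1 MiB/s, which lines up with the 20-second login pages below.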
I set up a Jenkins instance on EFS and initially it went well. It barely had any activity, and after about two weeks it had used up all the credits. After that even the login page would take 20 seconds to load.
I think it's throughput credits that EFS gives you (i.e. read/write MiB/s), not IOPS. AFAIK they don't document the available IOPS at all.
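The credits do at least show up as a CloudWatch metric, so you can watch the balance drain. A minimal boto3 sketch (the filesystem ID is a placeholder):

```python
# Sketch: watch an EFS filesystem's burst credit balance via CloudWatch.
# "fs-12345678" is a placeholder; substitute your own filesystem ID.
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EFS",
    MetricName="BurstCreditBalance",
    Dimensions=[{"Name": "FileSystemId", "Value": "fs-12345678"}],
    StartTime=datetime.datetime.utcnow() - datetime.timedelta(hours=24),
    EndTime=datetime.datetime.utcnow(),
    Period=3600,                 # one datapoint per hour
    Statistics=["Average"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    # The balance is reported in bytes; a steady downward slope means you're
    # spending credits faster than the baseline replenishes them.
    print(point["Timestamp"], f'{point["Average"] / 2**40:.2f} TiB of credits')
```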
In my experience the latency for an individual I/O operation on EFS is always at the "HDD from 1995" level regardless of available burst credits. Something that does lots of small random I/O, like checking out git repos on Jenkins workers, is basically the worst case for EFS. A crude way to see that per-operation latency is sketched below.
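Something like this, run against the mount, shows the per-operation cost rather than the throughput (the path is a placeholder, and client-side caching will flatter repeated runs, so use a file bigger than RAM or drop caches first):

```python
# Crude per-operation latency probe for a mounted filesystem.
# /mnt/efs/testfile is a placeholder; create a large test file there first.
import os
import random
import time

PATH = "/mnt/efs/testfile"
READ_SIZE = 4096          # small random reads: worst case for NFS-backed EFS
N_READS = 200

fd = os.open(PATH, os.O_RDONLY)
size = os.fstat(fd).st_size
latencies = []

for _ in range(N_READS):
    offset = random.randrange(0, max(size - READ_SIZE, 1))
    start = time.perf_counter()
    os.pread(fd, READ_SIZE, offset)   # one small read at a random offset
    latencies.append(time.perf_counter() - start)

os.close(fd)
latencies.sort()
print(f"median {latencies[len(latencies)//2]*1000:.2f} ms, "
      f"p99 {latencies[int(len(latencies)*0.99)]*1000:.2f} ms per 4 KiB read")
```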
It's NFS, so the bad latency isn't surprising. The problem is that they don't have anything faster -- it tops out at around 2 GB/s, even with hundreds of TB, even with multiple clients. You have to shard your data over multiple EFS volumes, or build your own virtual Gluster, which are extremely shit options. It also makes any kind of big data HPC impractical.
Bezos, if you're listening, fire someone. You should have had next-generation pNFS- or Lustre-like protocols by 2016.
But also, how does the pricing work? It seems to be half the price of EFS? It almost seems like they're assuming S3 reads or Direct Connect to populate the FSx volume.
The agents were in ECS with no persistent storage, so that wasn't the problem. I was just running the Jenkins master off of EFS, for the persistent configuration storage.
And I don't think it's the latency that's killing EFS usage, it's the throughput. While the credits were there, everything went smoothly; once the credits ran out, the base throughput was fit for the I/O workloads of the 90s.
We had the same issue with a pgsql server. It started out fine, but to get decent performance you pay through the nose for higher disk throughput.
It looks competitive when you're pricing things out and don't know you need to pay for that. By the time you find out, it's a classic sunk cost fallacy and most companies just eat the cost.