PostgreSQL on ARM-Based AWS EC2 Instances (percona.com)
208 points by timf on Jan 22, 2021 | 62 comments



When using pre-built Postgres packages on modern ARM platforms, like the author of this post is, double check that the Postgres binaries are actually using the ARMv8.2 assembly instructions.

This can make a significant performance difference, and at least for the RPM-based official Postgres packages this is a problem: https://www.postgresql.org/message-id/CACN56%2BP1astF5zvocrT...
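
A rough first step for that check on a Linux host is to see which optional instruction-set extensions the CPU advertises. The sketch below parses the `Features` line that aarch64 Linux kernels expose in `/proc/cpuinfo`; `atomics` is the kernel's flag name for the ARMv8.1 LSE atomic instructions and `crc32` for the CRC32 extension. The function names and the choice of which flags to look for are assumptions made for illustration, and this only shows what the hardware supports, not what a given Postgres binary was compiled to use.

```python
def cpu_features(cpuinfo_text):
    """Return the set of flags from the first 'Features' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()


def report(cpuinfo_text, wanted=("atomics", "crc32")):
    """Map each wanted flag name (an assumed selection) to whether the CPU lists it."""
    features = cpu_features(cpuinfo_text)
    return {name: name in features for name in wanted}
```

On a real machine this would be called as `report(open("/proc/cpuinfo").read())`. Confirming that the installed binaries actually contain such instructions still needs a disassembler or the package's build flags.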


Thanks, that might explain something I've been wondering about for a while now.

I was running Postgres on a 2-core/2GB ARMv8 Scaleway instance and had to migrate to an AWS T2 micro (1-core/1GB) x86 instance when Scaleway pulled the plug on ARM servers.

I expected a performance decrease due to the obvious reduction in HW specs, but on the contrary the application is much snappier. I don't have pgbench figures, but I can clearly notice performance improvements (discounting the network variables).


One of the co-authors here. Yes, I expected that to be a huge problem, but, at least for Ubuntu (or, I guess, deb packages), it seemingly wasn't. Which is a pretty big deal, as I doubt a lot of people compile their own PostgreSQL. Trying to push PG on the ARM platform to its limit would be an interesting exercise, though, and could necessitate a custom build.


> While ARM-based instance is 25 percent cheaper, it is able to show a 15-20% performance gain in most of the tests over the corresponding x86 based instances.

To put this in terms of workload per dollar: Graviton 2 on AWS comes out roughly 50% to 60% ahead. Not every workload will benefit: many CPU-intensive workloads, especially those that rely on SIMD, won't do as well on the current Graviton 2. But there are enough workloads, especially those in the web stack, that have shown huge cost savings.

Hopefully this adds perspective for people who think Intel's server platform is still relatively safe. I wrote something about it here [1].

[1] https://news.ycombinator.com/item?id=25808856
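
The workload-per-dollar arithmetic can be reproduced from the two numbers in the quote (25% cheaper, 15-20% faster). This is a small sketch with illustrative function names, not a pricing model: it shows that those inputs give about 53% to 60% more work per dollar, which corresponds to about 35% to 37.5% lower cost per unit of work.

```python
def work_per_dollar_gain(price_discount, perf_gain):
    """Relative increase in work done per dollar versus the baseline instance.

    price_discount: 0.25 means the instance is 25% cheaper.
    perf_gain: 0.15 means it does 15% more work in the same time.
    """
    return (1.0 + perf_gain) / (1.0 - price_discount) - 1.0


def cost_per_unit_saving(price_discount, perf_gain):
    """Relative reduction in dollars spent per unit of work."""
    return 1.0 - (1.0 - price_discount) / (1.0 + perf_gain)


for gain in (0.15, 0.20):
    print(
        f"perf +{gain:.0%}, price -25%: "
        f"work/$ +{work_per_dollar_gain(0.25, gain):.1%}, "
        f"cost per unit of work -{cost_per_unit_saving(0.25, gain):.1%}"
    )
```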


We are having a persistent issue on ARM RDS instances of postgresql. They don't come up after a reboot. Each time, we had to file support tickets and get the customer support to fix it.

They said it was happening because of load, which is a falsehood, since I'm talking about a machine coming up after a reboot.

There is something broken on ARM on AWS there. And even after multiple tickets, they aren't able to fix it


Has anyone here made the jump? I'm considering testing some components of our infrastructure on Graviton processors. Going all the way in with the database sounds risky, but I don't really have a technical data point to justify my bias.


I made the jump on one of my databases soon after Graviton 2 was officially supported on RDS. I'm running a small (~500 GB stored data) DB on db.m6g.large with PG 12.4. It's a tiny bit cheaper and performs modestly better than the Intel based instance it was running on before. There haven't been any problems or externally visible changes since the switch.


We did it and it is amazing! All the claims are true and worth it. If your stack is Linux-based you can be up and running within hours. Our stack is based on NodeJS, PostgreSQL, Nginx, Redis, MongoDB, MySQL.


FWIW, our experience was different: I benchmarked our Node apps on C6g, C5n, and M5zn and found Graviton to be 10-25% slower in our CPU-bound NodeJS code than the same code on x86. Graviton gives you better relative performance per dollar, but for absolute single-thread performance M5zn is still the best choice for us.


Arguably, if you're using AWS, you've already compromised to a point where scaling horizontally is your path forward.

You've likely given up much more than 10-25% by using AWS in the first place (note to the false dichotomist: this doesn't mean owning your own servers). And paid a premium for the privilege.

If you're CPU-bound, get an E-2288G (or whatever commodity chip maximizes performance/$ right now) and set mitigations=off.


Which cloud providers are better? What do you use?


I think Google Cloud requires less domain-specific-knowledge, which is a big plus. And, as a consequence, you're less likely to end up with an expensive and complicated mess.

But if price or performance or user privacy is a concern, none.

I'm a big believer in dedicated hosting. There are tens of thousands of options. Not affiliated with any, but in the past/present what I have used or am using and would recommend: webnx.com, reliablesite.net, hivelocity.com, hetzner.com, ovh.com (and their other brands), leaseweb.com. We're currently considering phoenixnap.com for a warm-standby DR site (but no experience with them yet). Also, Softlayer pre-IBM acquisition, but that's just sour grapes.


Depending on your specific needs, Hetzner seems to be good value:

https://www.hetzner.com/dedicated-rootserver/matrix-ax

They have a good reputation for stability too, unlike (say) Scaleway. ;)


Mitigations=off?


I’m going to assume it’s the Intel HyperThreading exploit mitigations?


Yes, starting with Linux 5.2, a bunch of mitigations can be disabled with that 1 boot option. But that's just a small part of where the performance gain comes from. The real win is just recent, dedicated and non-virtualized hardware.
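
For anyone wanting to see what a given Linux box is actually running with, here is a minimal sketch that inspects the kernel command line and the per-vulnerability status files under sysfs. The function names are illustrative assumptions; the sysfs directory is the standard Linux location, but the set of files in it varies by kernel version and CPU.

```python
from pathlib import Path


def mitigations_setting(cmdline):
    """Return the value of the last 'mitigations=' boot argument, or None if absent."""
    value = None
    for arg in cmdline.split():
        if arg.startswith("mitigations="):
            value = arg.split("=", 1)[1]
    return value


def vulnerability_status(base="/sys/devices/system/cpu/vulnerabilities"):
    """Map each vulnerability file name to the kernel's reported status string."""
    root = Path(base)
    if not root.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(root.iterdir())
        if entry.is_file()
    }
```

Typical use would be `mitigations_setting(Path("/proc/cmdline").read_text())` followed by printing `vulnerability_status()`. Whether turning mitigations off is acceptable depends on who else can run code on the machine.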


I moved ElastiCache & RDS instances over. A bit faster, a bit cheaper exactly as expected and no time required other than a “terraform apply”.


I’ve played with Graviton instances and compiled a few bits of C I use for benchmarking. Not a scientific test, but I found them to be good yet nowhere near the per-core performance of the Intel offerings. As a comparison, they are nowhere near the Apple M1 either. But that doesn’t mean they’re a bad option.

I think they end up close to same price/performance as the Intel offerings as well once you invoke the compiler optimiser. The main benefit is better core scalability and lower energy usage which may work out as a net gain for database workloads.

As always it’s best to measure these things yourself with your expected workload.


Graviton's value prop is you get 2x the physical cores for a slightly cheaper price. Intel wins at single-thread performance, but for parallel workloads Intel's hyperthreaded vCPUs can't keep up.

But you're right in that in the end it all depends on your workload.


I use it with RDS. If something fucks up, you can always blame AWS. Otherwise have had no issues with it.


For my side projects, I'm waiting for them to offer the burstable / cheaper versions in RDS. It's a bit funny that the cheaper chip is not available in the cheaper instance options. That seems like it would be a good market (whoever is too cheap to do M instances etc).

Edit: T4g's are available on EC2.



> Has anyone here made the jump?

Waiting for the t4 instances.


Uh, they've been available for several months now.

https://aws.amazon.com/blogs/aws/new-t4g-instances-burstable...


Not for RDS (or beanstalk).


The author makes it pretty clear this isn't a comparison of ARM vs. x86 but with the specs being nearly identical, I'm finding it very hard not to draw that conclusion. The difference is even larger when you factor in the cost difference.

Are there other factors that could explain the large performance gap?


I think what they mean is that there are enough non-CPU differences that you can't make this an accurate comparison. That being said at the end of the day the ARM instances are cheaper and better performing—whatever the reason may be.


But the non CPU differences (compiler / linker / OS optimizations), number of drives etc - all seem to favor x86?


The SSD is clearly different. Are there other differences, or are people just assuming?


> the specs being nearly identical

Graviton2 vCPUs are real cores, while AMD/Intel vCPUs are SMT threads. So that's a 2x difference in real cores.
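
The vCPU-versus-core distinction can be checked from inside a Linux guest. The sketch below counts distinct cores by grouping logical CPUs on their `thread_siblings_list` value from sysfs: CPUs that share a core report the same sibling list. The function names are illustrative assumptions, and what a hypervisor chooses to expose to the guest may not match the physical host.

```python
from pathlib import Path


def read_siblings(base="/sys/devices/system/cpu"):
    """Map each logical CPU directory name to its thread_siblings_list contents."""
    result = {}
    for path in Path(base).glob("cpu[0-9]*/topology/thread_siblings_list"):
        result[path.parent.parent.name] = path.read_text()
    return result


def count_physical_cores(siblings_by_cpu):
    """Count distinct cores: logical CPUs on the same core share one sibling list."""
    return len({siblings.strip() for siblings in siblings_by_cpu.values()})
```

Comparing `len(read_siblings())` with `count_physical_cores(read_siblings())` gives logical CPUs versus cores. On an SMT instance the first is typically twice the second, while on an instance without SMT the two are equal.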


Sure, but it's also 20% cheaper to get the ARM setup, so I don't think it's fair to discriminate this way. In the end what matters is what I can buy, and ARM both offers more cores per socket and per dollar.


The cost here is an artificial variable, as it is completely under the control of AWS. They can decide to sell them very cheap to lock people into AWS.


EC2 is just virtual machines. The lock-in parts of AWS are the ones where you're not exposed to stuff like CPU architectures.


This is generally true, but in the case of ARM instances, EC2 currently has no real competitors, as far as I know.


Scaleway used to be one but they screwed it up and removed the offering (to be fair ThunderX1 is an early platform, not fully standards compliant etc. but argh it was working)

Packet, or whatever their name is now (Equinix Metal or something), is a competitor, but only against the bare metal instances; no cheap tiny VM product, unfortunately.

Huawei Cloud... has weird region restrictions for their arm offering but it exists??


Most software isn't CPU-architecture specific though. Switching from ARM on AWS to x86 on another provider is likely to be quite straightforward.


That's true. I'd even made the same point elsewhere :-P

https://news.ycombinator.com/item?id=25881305


Buying Annapurna Labs also means they don't have to contend with the chipmaker's margin on the Graviton chips -- Intel's margin on server chips seems to be quite high.


Nobody pays retail.


Yes but porting to other ARM nodes isn't difficult, and if you're already using AWS you're comfortable with a fair bit of lock-in already.


The point here is that specs are not nearly identical. Which was a premise of the comment I was replying to.


Possibly once AWS has gotten enough people switching to Graviton CPUs, they would start raising the cost.


“Would” is a bold word choice given that AWS hasn’t ever raised prices. Could, can, might - sure.


I don't think that would work out for them. Most code that runs on AArch64 will run just as happily on AMD64. Few of their customers are going to be writing AArch64 assembly.


Cost is a key factor. It should be trx/$. That really makes clear the ARM advantage here.

Additionally the ARM version only had one disk vs two for x86.

I also feel like there is more headroom for optimization now that ARM is more visible as a server class target in the non-postgresql toolchains.


It's the wrong measurement, because not every workload is "shardable". If your workload is bound to a single instance, it does not really matter if it's a bit cheaper.

Not every workload is behind a load balancer, so single-core / MT / single-instance performance can be important.


Does anyone know if/when ARM is coming to GCP? Seems like a no-brainer for many use cases and our company would love to move, but we don't have the resources to manage multiple clouds for now.


Can't speak to GCP. I work at Oracle on OCI, their cloud platform. ARM is on its way, based on Ampere Altra chips, which are looking really competitive.

https://www.phoronix.com/scan.php?page=article&item=ampere-a... https://www.phoronix.com/scan.php?page=article&item=ampere-a...

I'm genuinely excited to get my hands on these systems and see how well they can do. There have been so many promises of a future based on ARM-powered servers, and claims about their advantages, and it finally seems like it's within reach from a performance perspective. Whether they'll meet the other expectations, I couldn't guess, but this surely has to be concerning for Intel.


Since Google doesn't appear to be working on anything like Graviton, they would have to use a smaller player like Ampere or Nuvia. Marvell says they are out of the general purpose market, and will only work in partnerships for custom chips.

Maybe the M1 success will cause some players to re-think things, but for now, there doesn't seem to be an easier path for competitive high-end ARM servers apart from what Amazon is doing.


Nuvia, at least according to Ben Thompson of Stratechery, is not going to be making server chips. I guess that’s not to say they can’t, given a big enough order, but it is his belief that Qualcomm bought them to focus on consumer devices and things.


As of around 2020H2 Google had nothing in the works for their own main CPU hardware. The ASIC engineers there are mostly focused on proprietary ML hardware like dragonfish, positron etc that have more comparative advantage to do in-house.


You could try N2D (Which is AMD): https://cloud.google.com/blog/products/compute/announcing-th... .

The benefit of N2D is that it's still x86, so your existing binaries should work.

Disclosure: I work at Google.


I would have liked to have seen a comparative bonnie++ benchmark - those SSDs are not the same.


Has anybody tried out PostGIS on Graviton yet? I'd love to switch over, but I'm a little worried about what I'll run into.


With this sort of architecture switch for a database, would there be any reason to switch dev environments to use an ARM build of PSQL (where possible) as well to match production? PostgreSQL is often a core piece of apps that rely on it, but it sounds like this sort of change may be pretty much seamless from the application layer perspective.


Assuming you don’t mean developing for Postgres, not really.

The api developers will be using will be staying the same. It’ll be transparent.

For a staging server, I would. Just to be as close to prod as possible and actually have the same environment and packages to debug against if needed.
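
To confirm that staging and prod really do match, one option is to compare the architecture embedded in the string returned by `SELECT version()`, which on Unix-like builds includes the build target (for example `x86_64-pc-linux-gnu` or `aarch64-unknown-linux-gnu`). The sketch below only parses such strings; the function names and the sample strings in the usage note are illustrative assumptions, and fetching the string from each server is left to whatever client is already in use.

```python
import re


def arch_from_version(version_string):
    """Extract the CPU architecture from a PostgreSQL version() string, or None."""
    match = re.search(r"\son\s([A-Za-z0-9_]+)-", version_string)
    return match.group(1) if match else None


def same_arch(version_a, version_b):
    """True only when both strings name an architecture and the two are equal."""
    arch_a = arch_from_version(version_a)
    arch_b = arch_from_version(version_b)
    return arch_a is not None and arch_a == arch_b
```

For example, a string beginning `PostgreSQL 12.4 on aarch64-unknown-linux-gnu, compiled by gcc` yields `aarch64`, so comparing it with an `x86_64-pc-linux-gnu` build makes `same_arch` return False.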


Is it just me, or is Intel going down the PowerPC route now? I never imagined a niche microcomputer maker like Acorn Computers would do something so revolutionary.


Since DBs are very often "single" instance, you should use what scales vertically best, which is x86-based.


Given the storage formats are identical, you should choose the most cost-effective option for your workload now and scale that to larger machines (ARM or x86_64) when it’s justified.

I promise you, you’ll know with plenty of time to spare before you hit the current 64-core limit of Graviton and need to decide whether you can fit into a 128 vCPU (which I believe are hyperthreads, not cores) x86_64 instance.


Thaxll is not wrong and shouldn't be downvoted. Vertical scaling means more cores, not higher single-thread performance. Because Graviton is limited to 1-socket systems, EC2 has x86 instances that are larger than the largest Graviton. Whether you'll ever need those is workload-dependent.


Well, no, he is wrong. Postgres can run on either x86 or ARM. You can do replication between them, no problem. There's no reason to run workloads on x86 just to get access to larger instance sizes at some point down the road. You can migrate Postgres on ARM to Postgres on x86, it's not even hard. Use what works best for you now and in the part of the future you have some visibility on. AWS might well announce bigger Graviton instances before you ever get there, if you ever get there at all.


single instance != single core/thread



