Arm Announces Neoverse V1, N2 Platforms and CPUs, CMN-700 Mesh (anandtech.com)
216 points by timthorn on April 27, 2021 | 92 comments



> AWS Graviton2-based EC2 Instances make up 14% of the installed base within AWS

> 49% of AWS EC2 instance additions in 2020 are based on Graviton2

Surprised at this level of Graviton2 adoption in AWS at this stage. Any clues as to who is using these instances?

Edit: Presumably Intel's shrinking Q1 2021 Data Center revenues are partly as a result of this.


>Presumably Intel's shrinking Q1 2021 Data Center revenues are partly as a result of this.

It was both AMD and ARM.

There are many workloads for which G2 offers an immediate cost/performance advantage. AWS charges per vCPU, which is one thread on Intel/AMD and one core on ARM. So you get ~30% better performance along with ~30% lower cost by using the ARM Graviton series. Most adopters have reported a total cost reduction of around 50%. For those running hundreds if not thousands of EC2 instances on workloads that fit this profile, that is too much saving to pass up.

There are many SaaS companies running on EC2 that have mentioned their success on Twitter and in various other places.

Worth pointing out: this is with Amazon installing as many as they can get from TSMC.

A few months ago on HN I wrote [1] about how half of the Intel DC market will be gone in a few years' time.

Edit: Another point worth mentioning: this is just as much of a threat to medium and smaller clouds like Linode and DO, which don't have access to ARM (yet). And even when they do get it, Amazon has the cost advantage of building its own chips instead of buying from a company (Ampere).

[1] https://news.ycombinator.com/item?id=25808856


Linode and DO could always offer a physical x86 core instead of a virtual SMT core. It would cut into margins somewhat, but maybe Intel and AMD would be more willing to discount when they have to play defense. I think one problem for the x86 guys is that because the demand for chips far exceeds supply, they’re still doing “fine” or even “well” right now. So the threat from ARM may still be perceived on mostly an intellectual level instead of provoking the necessary visceral survival response.


I think it's unlikely that anyone at Intel thinks -20% Q1 datacenter revenue YOY is doing well.

The question is what options do they have to deal with this?


I believe Intel couldn't have imagined how easily their biggest customers could turn into their biggest competitors overnight.

Even a decade ago that would've been unthinkable, but today, making a cookiecutter SoC is relatively easy because nearly everything can be taken off the shelf.

Production costs, though... sub-10nm mask set costs completely rule out anything resembling a startup competing in this area.

I think 65nm was the last golden opportunity to jump on the departing train. It was still possible to ship a cookie-cutter chip for under $1m; now... no way.

Now the semi industry is basically Airbus vs. Boeing.


Startups can absolutely compete here. There is sufficient capital to fund chip design (integration) and it is relatively low risk. We are going to see a huge number of Arm and RISC-V solutions on the market 14 months from now.


What companies are working on RISC-V solutions? I always feel like it's "5 years away", but I'm not sure if that's just my own perception.


A few RISC-V SBCs are already on the market. I suspect RISC-V will come to dominate the IoT/Edge space in the next few years before graduating to other market segments.

IoT/Edge deployments are less standardised than other computing workloads. Developers and integrators in this area already expect to deal with a lot of bother when working with a new chip. Also, the margins on these devices are usually razor thin, so the potential savings from not paying ARM licensing fees would be more appreciated.

Finally, RISC-V's modular approach allows for a greater level of flexibility and innovation, which will allow manufacturers to further differentiate and gain a competitive advantage. This is especially relevant for IoT/Edge solutions where thermal and power budgets are heavily constrained.


SiFive is probably the leader, though I admit I don't know much about their competition.


Agreed .. except weren't Ampere (2017) and Nuvia (2019) startups?


Ampere basically started as a re-labelled X-Gene from Applied Micro, which started back in the 40nm days. And they had quite a lot of cash to start with: their backer is the Carlyle Group, the biggest LBO shop in the world.

Nuvia basically never intended to really compete with Intel or AMD head-on. Their $30m stash would've been just enough for a single "leap of faith" tapeout on a generation-old node, and a year of life support after.

They were aiming for a quick sell from the start too.


Depends on your definition of startup I guess. Certainly seems to be enough capital available.

I definitely don't agree with the premise that it's now Boeing vs. Airbus (certainly less so than it was a few years ago when x86 was the only game in town).


Ampere kinda was an acquisition of Applied Micro, but its internal X-Gene uarch was dropped into the trash very quickly in favor of Arm Neoverse…


Do you actually know how much a sub-10nm mask set costs? There’s a lot of speculation from people who don’t have access to those numbers. Those who do are bound by NDAs.


I do hear figures in the single-digit millions of dollars for relatively small tapeouts.

Back in the 65nm and 40nm days, big tapeouts were already costing high six figures in masks alone.

And... masks are not the most expensive item in the sign-off costs these days.

Specialist verification, outsourced synthesis, layout, analog, physical, test, and other specialist services will easily cost more than the maskset for <40nm.

I would not be surprised if tier 1 fabless already spend $10m+ per design just on them.


You are absolutely correct that design costs swamp mask costs by far. For 7 nm, it costs more than $271 million for design alone (EDA, verification, synthesis, layout, sign-off, etc) [1], and that’s a cheaper one. Industry reports say $650-810 million for a big 5 nm chip.

[1] https://semiengineering.com/racing-to-107nm/


Cookie cutter?


There have been a bunch of higher-profile "we moved to Graviton2 and cut costs" stories. Twitter, for example, migrated: https://www.hpcwire.com/off-the-wire/twitter-selects-aws-and...


Also Netflix, although I don't think they've said what portion of their instances they've migrated: https://aws.amazon.com/ec2/graviton/customers/


They're the cheapest EC2 instance type, so they're very attractive to small scale deployments like side projects, personal sites etc. (basically anything that can run on one or two small nodes) where budget is a major concern. The t4g.micro is in the free tier as well, so that'll help.

I host a few very low traffic sites & I'm in the process of switching from a basic DO Droplet to a pair of low-end Gravitons. Will save me money and give better peak performance for my workloads.


> switching from a basic DO Droplet to a pair of low-end Gravitons. Will save me money and give better peak performance for my workloads.

I'm having trouble figuring this out - a t4g.micro is $6/month, before any storage or data transfer costs. The roughly equivalent DO offering is $5/month, inclusive of 25GB SSD and 1TB transfer. Even with a reserved instance discount and significantly less than 1TB outbound transfer, DO seems likely to be cheaper.


CPU power on $5 offerings from others is likely not as great. Also AWS did a free tier for everyone, and the spot market is fun…


Maybe, but it would take a _lot_ of people moving small deployments (where by definition the savings would be small, especially relative to the fixed costs of getting things to work on Arm) in a relatively short space of time to have this impact - so I'm sceptical (and if that is the case, then it must be very easy to move to Arm - which I'm also sceptical of).

More likely some very big customers (peer comment mentions Twitter) moving to Graviton2 for cost savings.


Graviton might be the top and/or default choice in their management console when you create an EC2 instance. That would swing things pretty quickly for all the free tier folks.

Edit: Nope, not yet, but close... you still have to change the radio button: https://imgur.com/a/W0Sweyy


I'm confused - the x86 box is ticked by default there.


Yes...edited my edit. I'm pretty sure there was no radio button for some time, you would have had to scroll into other choices to get a Graviton instance.


The company I work for has migrated hundreds of heavily-utilised Elasticsearch and Storm nodes to Graviton. No performance issues, pure cost saving. We’re working on the rest of our systems now. We’re going to save hundreds of thousands of dollars over the next few years.


AWS's own offerings such as RDS and internal control plane stuff are very likely using ARM behind the scenes.

I have evaluated going ARM, but I ended up deciding the savings were not worth it.

Not only do you need to maintain two architectures simultaneously for some time, but porting some stuff to ARM (e.g. Python) can be a pain in the ass.

Finally, my devs work on AMD64, and that would be another source of "why does this work in dev but not in prod".


Just wait for the M1 MacBook to become commonplace. Then it'll be "my devs use ARM".


> Finally, my devs work in AMD64 and that would be another source for "why does this work in dev but not prod".

I can see a use case for building a CI/CD pipeline on Raspberry Pi's.


RDS supports Graviton2 instance types; maybe people with supported versions just migrated.


"instance additions" also doesn't take instance size/performance into account. If ARM-based instances are overall smaller, that'd allow more of them, distorting the numbers...

Percentage of compute power would be cool to know here.


We haven't migrated yet but we expect to do some benchmarking this quarter for Aurora.

For EC2 we run on spot, and spot c5.metal is cheaper per vCPU than c6g.metal, so we haven't prioritized benchmarking our compute loads.


I wouldn't be surprised if AWS is using Graviton2 pretty heavily for internal processes as well, stuff like control planes for the major services like S3, SQS, SNS, etc...


AWS itself is probably the main user, I would guess. So you use them indirectly through AWS's vendor lock-in APIs.


I know many users. Basically any workload that isn't x86-specific and is cost-sensitive can benefit from moving to ARM instances. Database instances are good candidates, and big data workloads as well.


Funnily coincidental timing: the current post #2 (GCC 11.1 released) adds support for the CPUs mentioned here (currently post #4):

  AArch64 & arm

    A number of new CPUs are supported through arguments to the -mcpu and -mtune options in both the arm and aarch64 backends (GCC identifiers in parentheses):
        Arm Cortex-A78 (cortex-a78).
        Arm Cortex-A78AE (cortex-a78ae).
        Arm Cortex-A78C (cortex-a78c).
        Arm Cortex-X1 (cortex-x1).
        Arm Neoverse V1 (neoverse-v1).
        Arm Neoverse N2 (neoverse-n2).
Good to see work going into this at the proper time. (Not that this has been much of a problem for CPU cores in recent times. Still not a matter of course, though.)


These tunings will only be used if you compile stuff yourself with -march=native (or specify one particular model). Most software out there is compiled with generic, non-tuned optimizations. The tuning is rarely a huge deal, though.
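
For what it's worth, here's a rough way to check what a given build was targeting - a sketch only, and the compile commands in the comment are just illustrations using the new GCC 11 -mcpu names quoted above (note that -mcpu implies both the tuning and the feature baseline, while -mtune alone only affects scheduling):

    /*
     * check.c - does this build target a specific core or the generic baseline?
     * Illustrative compile commands (assumed, not prescriptive):
     *   gcc -O2 -mcpu=neoverse-n2 check.c   (core-specific features + tuning)
     *   gcc -O2 check.c                     (generic; roughly what most distro
     *                                        packages get)
     */
    #include <stdio.h>

    int main(void) {
    #ifdef __ARM_FEATURE_SVE
        puts("SVE was enabled at compile time");
    #else
        puts("generic build: no SVE at compile time");
    #endif
        return 0;
    }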


True, but it's still relevant for 3 things:

- when you have a particularly CPU-intensive application, you'd hopefully compile it to target your system

- the cloud providers can just do a custom Debian/Ubuntu/... build for their zillions of identical systems

- the library loading mechanism on Linux is slowly getting support for having multiple compile variants of a library packaged into different subdirectories of /lib (e.g. "/usr/lib64/tls/haswell/x86_64")

Also I was mostly trying to point out as a positive how well the interaction is working there between ARM and the GCC project. I wish it were like this for other types of silicon.

(CPU vendors all seem to be getting this right, and GPUs are slowly getting there, but much other silicon is horrible… e.g. wifi chips)


That is not entirely true. Binaries in the packaging systems might not be compiled for the most recent atomic instructions, which can really affect performance.

https://blog.dbi-services.com/aws-postgresql-on-graviton2-aa...

https://github.com/microsoft/STL/issues/488

We are about 9-14 months away from the right pieces making their way through the software ecosystem, at which point this will be almost a non-issue.

Exciting times for everyone!
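
To make the atomics point concrete, here's a minimal sketch - the flags are illustrative assumptions, and GCC's -moutline-atomics can also dispatch at runtime instead:

    /* counter.c - the same C11 atomic compiles very differently:
     *   gcc -O2 -S -march=armv8-a        counter.c   -> load-exclusive/store-exclusive retry loop
     *   gcc -O2 -S -march=armv8.2-a+lse  counter.c   -> single LSE instruction (LDADDAL)
     */
    #include <stdatomic.h>

    atomic_long counter;

    long bump(void) {
        /* One atomic add instruction with LSE; a retry loop without it. */
        return atomic_fetch_add_explicit(&counter, 1, memory_order_seq_cst) + 1;
    }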


Well, that - yeah. But it doesn't strictly have anything to do with the actual CPU-model-specific tuning that the news was about, only in that setting a specific CPU in -march (-mtune would not do it!) would imply the features. Typically, though, you'd just do -march=armv8-a+the+desired+features for that, like the first post you linked does.

Really the important piece for making distribution binaries not suck is ifuncs/multiversioning. But library and app authors currently have to use them deliberately. That is fine for manual optimizations that use intrinsics or assembly (and e.g. standard library atomics), but I'm not sure any compiler currently would automatically do that for autovectorization.
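
For anyone wondering what "deliberately use them" looks like in practice, here's a rough sketch of a GNU ifunc on Linux/glibc - the function names and the SVE check are made up for illustration:

    /* sum.c - manual multiversioning via a GNU ifunc resolver.
     * The resolver runs once at load time and picks an implementation. */
    #include <stddef.h>
    #include <sys/auxv.h>          /* getauxval, AT_HWCAP (glibc on Linux) */

    static long sum_generic(const long *v, size_t n) {
        long s = 0;
        for (size_t i = 0; i < n; i++) s += v[i];
        return s;
    }

    /* Pretend this one uses hand-written SVE intrinsics; kept scalar for brevity. */
    static long sum_sve(const long *v, size_t n) {
        long s = 0;
        for (size_t i = 0; i < n; i++) s += v[i];
        return s;
    }

    /* Returns the function pointer the dynamic linker will bind to "sum". */
    static long (*resolve_sum(void))(const long *, size_t) {
    #ifdef HWCAP_SVE
        if (getauxval(AT_HWCAP) & HWCAP_SVE)
            return sum_sve;
    #endif
        return sum_generic;
    }

    long sum(const long *v, size_t n) __attribute__((ifunc("resolve_sum")));

The per-CPU library subdirectories mentioned upthread solve a similar problem one level up, at whole-library granularity.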


Hah. I like it when I can enjoy my hammock instead of fine-tuning my code to weird limits for performance.

DDR5, PCIe 5.0, the SVE speedup and a 40% IPC improvement put a big smile on my face.


Apple's M1 has made ARM mainstream for laptops. Let's see which company does the same for the server space.

Hopefully ARM in the cloud will result in cheaper prices.


I nominate Amazon for this award. As mentioned in another comment here, ~50% of newly allocated EC2 instances are ARM.


Apple's M1 will make ARM mainstream on the server side.


Apple hasn't seemed interested historically. And the Nuvia folks left Apple to found their company explicitly because they thought an M1 style CPU core would do well in servers but Apple wasn't interested in doing that.


It's not that Apple will sell server chips. It's that developers can work locally on ARM, which makes it easier to deploy to servers. Linus Torvalds had a quote about this...


Linus' quote/post:

""" And the only way that changes is if you end up saying "look, you can deploy more cheaply on an ARM box, and here's the development box you can do your work on". """

(emphasis in original)

https://www.realworldtech.com/forum/?threadid=183440&curpost...

Thanks, I did not know about this!


It’s wild to consider that my next computer (an arm M1 Mac) will be compiling code for mobile (arm) and the cloud (arm). I wonder if we’ll ever see AMD releasing a competitive arm chip and joining the bandwagon.


Now, with NVIDIA acquiring ARM, I'm not sure AMD will do that.


Ah, ok, that makes sense.


Personally, I don't see many server admins choosing to pay the Apple tax to get the M1 into their data center. I don't see how the performance-per-watt ratio could pay off that kind of tax.


I did not mean to imply that the actual M1 will be used in data centers. Apple is quite popular among developers, and it's also a trendsetter, which will probably lead other computer manufacturers to adopt ARM for personal computers. Having more people use ARM on their personal computers will lead to more ARM adoption in the data center.


> Apple is quite popular among developers...

The great majority of developers use Windows or Linux according to every Stack Overflow survey from the past ten years. Only ~25% use a Mac.


I believe interpreting statistics from those surveys in this way isn't fair. There are many developers around the world, but the pattern of value/money generation among them is not uniform; in other words, a small percentage of developers work for the companies that pay the largest share of server bills, and the penetration rate of macOS devices among developers at top companies is probably higher than average. (I'm not implying that developers who work on non-macOS devices create less value - your device has nearly nothing to do with your impact. I'm just talking about a trend and a possible misinterpretation of the data.)


25% is still "quite popular".

If 25% of servers switch to ARM that is massive.


Ah, ok, I understand. Thanks for clarifying!


The OP wasn't suggesting Apple M1 chips in the data centre, but rather that Apple M1 chips in developer workstations will disrupt the inertia of x64 dev –> x64 prod. It will be easier for developers to choose ARM in production when their local box is ARM.


Apple have been hiring for Kubernetes and related roles. This may well be for their own devops for Apple services.

However I'd be amazed if they don't release some kind of managed service for running Swift code in the cloud. Caveat emptor, though.


I have a similar view of what Apple are up to. Too many high-profile hires to be purely toiling in the mines.


this is inevitable.


My operating systems teacher in 2001 was a total RISC fan and always said it would eventually overtake CISC.

I guess he didn't expect this to take until well after his retirement.


ARM today is probably more CISCy than what he considered CISC in 2001.


The best analysis of RISC vs CISC is John Mashey's classic Usenet comp.arch post, https://www.yarchive.net/comp/risc_definition.html

There he analyses existing RISC and CISC architectures, and counts various features of their instruction sets. They clearly fall into distinct camps.

But!

Back then (mid 1990s) x86 was the least CISCy CISC, and ARM was the least RISCy RISC.

However, Mashey's article was looking at arm32 which is relatively weird; arm64 is more like a conventional RISC.

So if anything, arm is more RISC now than it was in 2001.


amd64 is more RISC now than ia32 was in 2001 as well.


AArch64 is load-store + fixed-instruction-length, which is basically what "RISC" has come to mean in the modern day. X86 in 2001 was already… not that :)


I always understood it that way too.


Eh, it has a lot of instructions, but that was only the surface of RISC. It's a deeper design philosophy than that.


Also, isn't the x86 ISA just a translation layer today? I thought that on the metal there is a RISC-like architecture these days anyway.


Not really, because the variable-length instructions have consequences - mostly good ones, because they fit in memory better.

Also, the complex memory operands can be executed directly because you can add more ALUs inside the load/store unit. ARM also has more types of memory operands than a traditional RISC (which was just whatever MIPS did.)


I had the impression that the M1 outperforms others because it doesn't have variable-length instructions.

Why do you think they have good consequences?


As I've understood it, the tradeoff is:

The upside of variable-length instructions is that they are on average shorter, so you can fit more into your limited cache and make better use of your RAM bandwidth.

The downside is that your decoder gets way more complex. By having a simpler decoder Apple instead has more of them (8 wide decode) and a big reorder buffer to keep them filled.

Supposedly Apple solved the downside by simply throwing lots of cache at the problem and putting the RAM on-package.

I'm not a CPU guy and this is what I've gathered from various discussions so I'm happy to be corrected.


In most cases, yes, but it doesn't get rid of the complexity for compiler backends that can't directly target the real instruction sets Intel uses and have to target the compatibility shim layer instead.


Does anyone know if these processors make mobile development easier? I mean, it's the same architecture now, right?

The only thing I could find is https://www.genymotion.com/blog/just-launched-arm-native-and...


Not really. Maybe the emulators are faster but the main languages are managed anyway.


Plus, for iOS, even the iDevice simulator just builds for whatever platform you're testing on.


I hope that ARM servers with reasonable specs won't be exclusive to AWS and the other hyperscalers. For example, it would be nice if OVH would offer ARM-based dedicated servers.


Tl;dr

V1 = slightly tweaked Arm Cortex-X1 with SVE (the X1 is used in the Snapdragon 888), on 7nm, aiming at ~4W per core.

N2 = new Cortex with ARMv9, ~40% IPC improvement over N1 (or ~10% below V1), SVE2, on 5nm, aiming at ~2W per core, with a similar die size to N1. (I fully expect Amazon to go to 128 cores with their N2 Graviton.)

So in case anyone is wondering: no, it is not at Apple M1 level. Not anywhere close.

CMN-700 = more cores and support for memory partitioning, important for VMs.


The cores don't serve the same purpose as the M1 cores. M1 is optimized for single-thread performance at the cost of die size (and a bit of power). I don't have exact numbers, but say the Apple M1 core takes 1.5x the die area of N2; then you'd get better throughput by putting in 1.5x the number of N2 cores instead.


Yes. The M1 / A14 is also a 5W+ core, so a different set of trade-offs. I mentioned the M1 because it is the question that always comes up and people keep banging on about it. I wish most tech sites would simply point this out, since they have the reach. But it is obvious neither Apple nor ARM has any interest in their parts being named and compared in this way, and I guess tech sites won't do it and risk harming the relationship.


Nice TLDR!

It's an apples-and-oranges comparison to Apple's M1 chips (server vs. consumer), but it does hint at what's possible with the next-generation ARM Cortex "X2" cores that could appear in next year's flagship smartphones and laptops. A 30-40% IPC jump, partly enabled by moving to a 5nm fabrication process, is huge.

Given the right implementation, namely squeezing in more big cores than the current 1-3-4 configuration, it could close the gap with Apple considerably.


Process node changes generally don't do anything for IPC - IPC gains generally come from microarchitecture improvements - so I doubt the move to 5nm has anything to do with the IPC gain?


The node shrink lets you afford more transistors that provide more IPC.


I agree with that - but if you take an unchanged core and manufacture it at a different node, then you won't see a change in IPC, which in my book makes it questionable to attribute IPC gains to the process node.


This is good. If ARM takes off for consumers and business, I'm hoping RISC-V will also get some traction. There are alternatives to Intel's x86.


You can tell it's a modern CPU because its name matches the [A-Z][0-9] pattern.


That was a joke by the way. Sorry that I'm the only one who thought it was funny.


I thought it was funny!


High five, there are two of us!


As opposed to those old [a-z][0-9] CPUs that Intel puts out ;)


I wonder why they didn't use AWS-wide numbers rather than just EC2. I would have thought EC2 would lag in the transition while AWS services would make the switch quickly.


Because EC2 represents more realistic market adoption: it's more important to know whether you can run the software of your choice on ARM than whether Amazon can develop a service on an ARM stack.



