Google Axion Processors – Arm-based CPUs designed for the data center (cloud.google.com)
269 points by ksec 6 months ago | 126 comments



Looks like GCP has been using ARM processors from Ampere since mid 2022:

https://cloud.google.com/blog/products/compute/tau-t2a-is-fi...

So I guess this may be the end for Ampere on GCP?


How do you do, fellow "gluing neoverse cores to synopsys IP" hyperscalers?


We walk among you


LMAO

Azure / AWS / Ampere silicon are all Neoverse wrappers, amirite?


I thought Google had started their own core microarchitecture group. Must be slow going.


Synopsys should make its own to stop all this nonsense at this point.


Another interesting challenge for Intel (and AMD to a lesser extent). Between the share of compute moving to AI accelerators (and NVIDIA), and most cloud providers having in-house custom chips, I wonder how Intel will position itself in the next 5 to 10 years.

Even if they could fab all of those chips, the margin gap between the fab business and CPU design is pretty drastic.


> Another interesting challenge for Intel (and AMD to a lesser extent).

Are there any benchmarks comparing Threadrippers and Xeons with this processor?


Doesn't TSMC just fab?

They seem to be doing just fine.


TSMC fabs the leading node, and has done so consistently for several cycles now. So its margins probably benefit from a premium.

If Intel can make their foundry business work and keep parity with TSMC, the net effect is that margins for leading node compress from the increased competition.

And there's a lot of US fab capacity coming online in 2024/25: https://www.tomshardware.com/news/new-us-fabs-everything-we-...


> the net effect is that margins for leading node compress from the increased competition.

That is true in perfectly competitive markets, but I'm skeptical about that idea holding true for high-end chip nodes.

I'm not sure there is enough competition with Intel joining the market alongside TSMC, Samsung, and all the other (currently) minor players in the high-end space. You might see a cartel form instead of a competitive marketplace, which is a setup where the higher margins are protected.

My best guess is the price will remain high, but the compression will happen around process yields. You could successfully argue that is the same as compressing margins, but then what happens after peak yield? Before prices compress, all the other compressible things must first squeeze.


> You might see a cartel form instead of a competitive marketplace, which is a setup where the higher margins are protected.

Wouldn't it more likely be that players just carve out niches for themselves in the high-end space where they DON'T compete?

If you're Intel, it seems like a fool's errand to spend $50B to maybe take some of TSMC's customers.

You'd probably rather spend $10B to create a new market - which although smaller - you can dominate, and might become a lot larger if you execute well.


I figured you'd see margin compression from the major, volume-limited buyers: e.g. Apple, Nvidia, etc.

Paying a premium to get a quota with TSMC looks different, if there's competitive capacity, at least for large customers who can afford to retask their design teams to target a different process.

Even if only as a credible stalking horse in pricing negotiations with TSMC.


All I wanna know is how it compares to AWS Graviton2/3/4 instances. Axion needs to be cheaper, faster, or lower emissions to be worth even considering. Everything else is just talk and vendor lock-in.


Graviton2 is based on Neoverse N1. Graviton3 is based on Neoverse V1.

The Graviton4, announced last year at AWS re:Invent, is based on Neoverse V2.

So you should expect similar performance to EC2 R8g. I say similar because obviously there may be some difference with clock speed and cache size.

In terms of x86, we are expecting AMD Zen 5c with 160 cores / 320 vCPUs later this year.


Why vendor lock-in? It even uses the same CPU core as Graviton4, so it's clearly quite a fungible offering to me.


By lock-in, I’m referring to my EC2 committed use savings plan that would prevent me from considering migrating to GCP until it expires next year, even if Google’s instances are quantifiably better.


Why are you grousing about vendor lock in when you chose to sign an extended contract to get a better deal? You locked it in for them.


> lower emissions

Do you know of any cloud providers that publish actual data for this? Preferably verifiable, but I'll take anything at this point.


Google published PUE (https://en.wikipedia.org/wiki/Power_usage_effectiveness) numbers a while ago; I haven't seen anything that specific from Amazon.
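
For reference, PUE is just the ratio of total facility power to the power delivered to the IT equipment; a minimal sketch with made-up numbers (illustrative only, not Google's published figures):

    # PUE (power usage effectiveness) = total facility power / IT equipment power.
    # Numbers below are illustrative assumptions, not any provider's real data.
    it_power_kw = 1000.0        # servers, storage, networking
    facility_power_kw = 1100.0  # IT load plus cooling, lighting, distribution losses

    pue = facility_power_kw / it_power_kw
    print(f"PUE = {pue:.2f}")   # 1.10 here; an ideal facility approaches 1.0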

It’s difficult to quantify emissions because the power generation is mixed and changes based on demand. Water consumption should also be a factor, but there’s even less data available for that.



[flagged]


It can also matter where the data center is located, although certain trading schemes can offset that.


Summary: no timelines, no specifics, NOT a custom core (just neoverse), benchmarks with no references.


My hunch is that this is theater. The executives wanted this program as a weapon. Google didn't like being at the mercy of processor suppliers. Now they can threaten to just make their own. It could ultimately turn out to be a bluff, if their suppliers drop their price and improve their lineup. Source: made it up.


Nonetheless, Intel and AMD must be sweating as another hyperscaler breaks away from using x86 exclusively.


This is probably one reason why Intel is moving towards providing foundry services. The barrier to entry for doing chip manufacturing is higher than for designing chips now. It’s still an open question if Intel can compete with TSMC and Samsung though.


Intel 4 is superior to anything Samsung offers, and not nearly as far behind TSMC's 3nm density as people are led to believe. The "open question" is mostly about fab scaling and the business side of foundry management. Their silicon works just fine in the market as it stands.


I read a lot about how important it is for a foundry to be able to work with customers, and this used to be missing from Intel's company DNA: cell libraries, process, that sort of thing. We shall see.


Intel 20A and 18A are the really exciting processes yet to come out, from everything reported. It seems like a game of catch-up until then.


Right, Google has invested in non-x86 since 2016 afaik (I was on the team supporting ARM and PowerPC inside Google). At Google's size, it can pretty much break from any vendor without damaging its core businesses.


I doubt it, Google has been using Ampere for some time.


They probably already knew it was coming. Announcing it 2 weeks before Alphabet/Google quarterly results might be timed.


It's obviously timed to the Google Cloud Next conference, which starts today.


The Cloud Next conference is obviously timed to the quarterly results, which are obviously timed to the ARM processors... wait, no, the ARM processors are obviously timed to the Cloud Next conference, which is obviously timed to the quarterly results.


And it's all timed to come out before Halloween. What kind of sorcery is this?


Maybe when there are some real-life comparisons.


I'm divesting as much as possible from cloud offerings these days. We have our data "lakes" in colocation now, @ 1/40th the cost of the cheapest cloud offerings, but with greatly expanded CPU, mem, and disk, so it's not even an apples-to-apples comparison.

What I am jealous of, though, is these x86 competition announcements by AWS/Gcloud, as there simply is nothing available outside Ampere Altra, and it has not seen a refresh in a while. The Nvidia Grace CPU is eternally delayed and I'm guessing will not be at a competitive price point if it ever makes it to market. I've come to appreciate how important memory bandwidth is after using an M1 CPU for a while.


Buzzword soup and lots of X% better than Y or Z times something. Any ELI5 version of this with concrete numbers and comparisons?


Nope, and I don't think Google will publish concrete numbers anytime soon, if ever.


Phoronix will benchmark it at some point.


I wonder if TOS precludes publishing benchmark results like some SQL database products.

I also wonder if home users will ever be able to buy one of these? Will the predecessor show up in the used market?


The major cloud providers are pretty good about allowing benchmarking.

The closest thing you can buy is Ampere or Grace. Graviton 1 chips are also floating around under a different name.


I am interested in the market impact of offloads and accelerators. In my work, I can only realistically exploit capabilities that are common between AWS and GCP, since my services must run in both clouds. So I am not going to do any work to adapt my systems to GCP-specific performance features. Am I alone in that?


They're accelerating networking and block storage. Do you use those?


Of course, but even if Google somehow makes networking 50% more efficient, I can't architect my projects around that because I have to run the same systems in AWS (and Azure, for that matter).


It appears to me that Google is just matching what AWS Nitro did years ago.


That's the flavor of it, but they didn't give us enough data to really think about. But the question stands generally for accelerators and offloads. For example would I go to a lot of trouble to exploit Intel DSA? Absolutely not, because my software mostly runs on AMD and ARM.


I'm curious to see when Apple will migrate off AWS and run their own datacenters on their own ARM SoCs.


It’s not at all a given that this would be profitable. Apple is not paying the same prices for AWS/GCP that mere mortals do.


Well they’re paying minimum 30 mil per month until this year


I think it’s a heck of a lot more than that. 30 million seems puny compared to the revenue their services businesses generates. Though to be fair much of it probably doesn’t run on public clouds.


From what I understood, everything runs on public clouds. They tried Microsoft, Google, and Amazon. Sooo they should have enough experience by now.

The contract was 1.5B over 5 years


The sourcing here is a 2019 article about just one AWS contract. Apple also uses Google Cloud and Azure extensively, not just as tryouts; they were one of Google Cloud's biggest customers. They are also building their own data centers. (TL;DR: it's much more complicated than these comments would indicate at face value.)


Working hard to avoid lock-ins, how the other half lives.


Xcode cloud isn’t running on Apple silicon, arguably a place where it would make tons of sense.


That does not make any sense


idk, running a Hackintosh is kind of strange, no?


So how do you think they are running those unit tests for iDevices APIs, VMs?


Unit tests?


One of the uses of Xcode Cloud, CI/CD pipelines?


Which Apple doesn’t use?


I find the posturing as a thought leader and industry leader (on this topic especially) a bit ironic. A cloud provider licensing ARM Neoverse and throwing an ARM chip into their cloud compute boxes is not exactly a novel business practice.

I'm happy to see this, and it should be all goodness, but... the posturing... I don't want to be negative for the sake of being negative, but I don't understand how anyone can write that first paragraph with a straight face and publish it when you're announcing ARM chips for cloud in 2024(?, maybe 2025?).


I’m all for investment in less power hungry chips. Even if it’s from Google (for a short period of time. Who knows how long these chips will be supported)


Server CPUs are not power hungry. Only the CPUs used in desktops and workstations are power hungry.

The server CPUs (including x86 like AMD Genoa/Bergamo) normally consume between 2 W and 4 W per core, the same as the big cores of the laptop or smartphone CPUs.

A server CPU consumes hundreds of watts only because it has a very large number of cores, so it is equivalent to a very large number of smartphones or laptops collocated in the same box.
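
A rough sketch of that arithmetic, with hypothetical but representative numbers:

    # Back-of-the-envelope per-core power; core count and W/core are assumptions
    # for illustration, not measured figures for any particular CPU.
    cores = 128            # a plausible high-core-count server part
    watts_per_core = 3.0   # within the 2-4 W per core range mentioned above

    package_watts = cores * watts_per_core
    print(f"{cores} cores x {watts_per_core} W/core = {package_watts:.0f} W")
    # ~384 W for the package: hundreds of watts in total, yet per-core power
    # stays in the same band as a laptop or smartphone big core.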

In complete servers, there is a significant fraction of the power that is not consumed by the CPUs but by the many high-speed interfaces. That is unavoidable and it does not depend on the architecture of the CPUs.

Any newly announced server CPU should be more efficient than those already in production, otherwise there would be little advantage in introducing it, but which core architecture is used does not have enough influence to ensure that it will not be leapfrogged by whatever server CPU is announced later, regardless of the core ISA.

When Apple switched to Arm, it was able to improve efficiency by using much lower clock frequencies than Intel. This was only an indirect effect of using the Arm ISA, because the ISA makes concurrent decoding of a large number of instructions much easier. For server CPUs, the Arm ISA cannot enable similar efficiency gains, because all server CPUs already use the most efficient clock frequencies, in the range of 2.5 to 3.5 GHz, so it is not possible to repeat an efficiency gain like Apple M1's by reducing the clock frequency.

All the cloud vendors have chosen Arm because it is the only ISA that can be currently used to design their own high-performance SoCs, not because it would provide efficiency gains over the Intel/AMD cores. The overall efficiency of a custom SoC will indeed be better, because it is streamlined for their specific system architecture, in comparison with a general-purpose CPU from Intel, AMD or Ampere.


What happened with Ampere? Are they falling behind?

If this chip has no custom core, just Neoverse V2, I don't see any compelling reason for GCP to do this.


I wonder if this is part of their new SoC chips they've been building to replace tensor on the pixel line.


I would be very surprised if there is much overlap between this server SKU and any mobile SKU, mainly because Google doesn't design the compute core; they integrate some number of cores together with many different peripherals into an SoC. Mobile hardware will need vastly different integrations than server hardware, and the power/area optimizations will be very different (mobile is heavily limited by a maximum heat dissipation limit through the case). The reusable bits like Arm cores and PCIe/network interfaces might be shared between designs, but many of those come from other vendors like Arm or Synopsys.


also at techcrunch:

https://techcrunch.com/2024/04/09/google-announces-axion-its...

> Google did not provide any documentation to back these claims up and, like us, you’d probably like to know more about these chips. We asked a lot of questions, but Google politely declined to provide any additional information. No availability dates, no pricing, no additional technical data. Those “benchmark” results? The company wouldn’t even say which X86 instance it was comparing Axion to.


Google taking a page from Apple's playbook.


At least Apple really delivered with the Ax and Mx chips. Let's see if that pans out here as well.


It's Google, so probably not. Their claims should always be taken with a grain of salt. They went the build-their-own route for the SoC on their Pixel phone line, and those SoCs are always much worse than the competition and are the main reason why those phones have such terrible battery life compared to Qualcomm-powered phones.


I remember when Apple marketed Swift as being xx times faster!

In fine print, compared to Python, lul.


This didn't happen


https://www.apple.com/in/swift/

Up to 8.4x faster than Python


As a Python developer, I am happy that Apple mentions Python and acknowledges its existence.


Claim: "In fine print, compared to python,"

Reality: In bold call out:

Up to 2.6X faster than Objective-C


Not sure what your point is but Apple also claims up to 8.4x faster than Python:

https://i.imgur.com/CZ6vTZV.png


My point is we shifted from "compared it to Python in fine print" to "compared it to Objective-C and Python in bold focused callout". People are jaded, we can argue that's actually the same thing, that's fine, but it is a very different situation.


At least with Apple's Bezos charts there's usually fine print that'll tell you which 3-year-old PC laptop they're comparing it to.


They must be really trying to pump the Next conference attendance numbers.


Other way around probably. Team had the next conference as a deadline and this is all they were able to come up with.


Recently there was an article on the frontpage titled "headline driven development".

I guess it happens more often than not.


https://news.ycombinator.com/item?id=39891948 - That was posted on April 1st, so the authors were half joking.


Probably not, given it starts today and is already sold out.


Amazing

Sounds like some middle manager will retire this even before launch so they can replace it with Google Axion CPU or some meaningless name change


What's up with physics names ? Axion, Graviton, Ampere, Electron. I guess it sounds cool and is not trademarked...

What's next ? -- boson, hadron, parsec ?


There is a lot of room in the periodic table, too, as MS showed starting with https://www.servethehome.com/microsoft-azure-cobalt-100-128-...


Fewer than you'd think. It turns out the poisonous ones are bad for PR.


If I had to guess, it's because it's a hardware change... like you're doing better at implementing software on physics :P


Generic names are often used to avoid copyright and trademark strikes. Other easy naming schemes are rivers, minerals, cities,...


Is "Tesla" a generic name too?


Yes


Eight testimonials and it's clear the companies haven't been able to test anything yet. "[company] has been a trusted partner for customers adopting Arm-based virtual machines and an early adopter of Arm for our own operations. We’re excited about Google Cloud's announcement of the Axion processor and plan to evaluate it [...]'


[flagged]


Since this is cloud computing, they're here to stay.


[flagged]


This is not an end-user service; this is custom-built hardware that they know is more efficient than what they are already using. Did they kill their VCU and TPU, which were created for the same reason?


2-5 years depending on how moderately successful it is.


You will start using it after they discontinue it?


I came here expecting this comment. It’s interesting that a tech vanguard community like HN doesn’t trust Google products.


https://killedbygoogle.com/

Their reputation is deserved. Google Domains was killed only last year!


Google does kill a lot of products but that site rubs me the wrong way, they really stretch the definition of "killed" to run up the numbers as much as possible. Products that were simply rebranded or merged into another product are listed as "killed" even though they functionally still exist.

The old Google Drive desktop client is on there, for example, when they still have a Drive desktop client. You may as well list Chrome versions 1 through 122 as products they've killed by that standard.


Yes, thank you, I feel the same way. It's especially frustrating because I would really like to have an accurate and fair list that I can reference when I need that information, or refer people to when their memories are short. It's not very helpful in its current form.


It’s not that I don’t trust the products. I don’t trust the business or its C-level suite.

This is not the same Google of 2000s


We used to, until they canceled that trust


doesn't trust anymore


Edit: I don’t get the downvotes because I completely agree with the commentators that Google is unreliable. I almost pasted the killedbygoogle link myself!


With more and more big players starting production of customized, private, proprietary hardware, compatibility becomes increasingly difficult. Which works out well for the big players, who can leverage it as lock-in.

Regular people won't be able to buy the chips from Microsoft or Google, and you only get M* chips if you buy Apple hardware.

Good luck with the frankenMacs now.

At the same time, devices that you buy get locked into the company as well, and if that company goes out of business you are screwed.

Like I was when Ambi Climate closed up shop and left me with e-waste. All the hardware I need is in there, but I can't do anything with it.

Or when Google decided to close down access for Lenovo Displays, because they didn't want to support 3rd-party displays anymore. Two more e-waste devices for me. (There might be a way to save the displays, I just haven't got it working yet.)

Open, compatible, standardized, omni-purpose hardware seems to be dying. Much more profit in lock-ins.


> you only get M* chips if you buy Apple hardware.

Former M* team members are now at Qualcomm and HP/Dell/Lenovo will ship their Arm laptops later this year, including Windows 11 and upstream Linux support.


Dell and Microsoft have been shipping ARM laptops for years.

Creating an ARM CPU is ok, but will they copy the entire M* architecture?


We'll find out after the MS Surface + Qualcomm (ex-Nuvia) launch in late May.


If they can match the battery life, then I might switch back to a Linux laptop. I’m so sick of Apple’s anti-competitive, anti-consumer practices.


I’m a windows dev and dogfooding the latest OS/hardware. The upcoming ARM devices + OS standby changes massively improve battery life.

I also own an M1 and have been passively comparing, much closer now.


These are ARM processors using standard ARM instruction sets.

I don’t see any lock in here.


PC hardware (and hardware in general) has never been particularly open. We simply seem to move from one dominant player to the next. I don't think AWS/GCP using custom chip for their cloud offering changes much of the situation (well at least before they start having weird custom instructions).


Aren’t all of these ARM chips? Why is compatibility such a big issue?


It isn't a big issue. But ARM doesn't have a universal boot loader/device discovery/etc standard like EFI/ACPI, so there is some more work to support them.


Arm servers do have exactly that set of standards ((U)EFI/ACPI). See Arm SystemReady. You'll notice in the blog linked above that it mentions Arm SystemReady-VE, which uses those standards.


Downside is that GCP still has the same faults regardless of which CPU is being used. Things like poor customer interaction, things not working as designed etc. Switching to ARM won't solve that.


As much as I don't trust Google and their customer service is trash, their infrastructure is mostly good and some of their services are very aggressively priced. I wouldn't 100% buy into GCP, but a multi-cloud solution that leverages some of the good bits is definitely a top-tier approach.


how do you deal with egress costs?


I don't think they're a big deal if you're hitting Vertex for inference and Cloud Run for long-tail services. If you're 50/50 with AWS though, that might be a different story.
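
Back-of-the-envelope, with a placeholder rate, since the real per-GiB price depends on destination and tier (check the current pricing page):

    # Hypothetical cross-cloud egress estimate; rate_per_gib is a placeholder,
    # not an actual GCP price.
    cross_cloud_gib_per_month = 50_000   # data shipped from GCP over to AWS monthly
    rate_per_gib = 0.10                  # assumed $/GiB, for illustration only

    monthly_cost = cross_cloud_gib_per_month * rate_per_gib
    print(f"~${monthly_cost:,.0f} per month just in egress")
    # Splitting a chatty workload 50/50 across clouds makes this line item
    # scale with every service you mirror.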


In what ways? In my experience I've found GCP to be the easiest to use.


It's not about which is easiest to use but the way in which problems are handled. I am aware of a case where something failed after being assured by GCP that it would work ... when they got a GCP tech lead on the video call, he started by saying it was all 'best effort' and that it might take several days to resolve. Ultimately it was fixed in 8 hours, but that sort of laissez-faire attitude has led to the company in question making plans to go elsewhere.


Feels like they've changed their pricing structure for BigQuery multiple times in the past couple of years. They've never turned the status page yellow or red, but there have been a few incidents where, from our perspective, the service was clearly degraded.


Yeah, on the whole (data eng by day, data eng contractor by night), using both AWS and GCP, I much prefer GCP to AWS. I find it far simpler to use, with sane defaults and a UI that isn't harmful to users doing clickops, etc.

AWS gives you low level primitives to build the moon or shoot yourself in the foot.


Looks like it's cloud-scam only, rather than a product one can actually buy and own?


That'd be like complaining that AWS doesn't sell Graviton chips to the public. Why would they? They're a cloud provider building their own chip to get a competitive edge. Selling hardware is a whole other business.



