AMD Rome – is it for real? Architecture and initial HPC performance (dell.com)
208 points by rbanffy on Oct 23, 2019 | 144 comments



Dell has historically been a completely Intel shop, so the fact that they are even tepidly considering AMD here is likely because of a large number of customer requests for pricing and performance quotes.


> Dell has historically been a completely Intel shop

Because Intel paid Dell and others to not use AMD[0][1]. Dell officially ended their Intel exclusivity 13 years ago when they agreed to sell servers with Opterons[2].

[0] https://en.wikipedia.org/wiki/Advanced_Micro_Devices%2C_Inc.....

[1] https://business-ethics.com/2010/07/23/0901-dell-inc-agrees-...

[2] https://www.cnet.com/news/dell-opts-for-amds-opteron/


But make no mistake about it, Intel has a fat wallet and will use it against AMD (as confirmed in a leaked presentation).

Big discounts, kickbacks, "deals" and "programs" are being deployed and more on the way.


To put some perspective on this, Intel's marketing, general, and administrative costs in 2018 were $6.7 billion[0], whereas AMD's net revenue was $6.4 billion[1].

[0] https://s21.q4cdn.com/600692695/files/doc_financials/2018/An...

[1] http://ir.amd.com/static-files/438f4934-2883-4c85-9193-d5218...


And as a potential customer it's important to remember that you're the one paying that $6.7 billion.


I'm reminded of that any time I see cheap products with slick marketing (mostly fast food) - you have to assume they're cutting corners in their actual product to pay for all that graphic design and video editing.


Or their revenue and income are so massive that taking a bit from it for advertising is a drop in the bucket. Intel has revenue of nearly 71 billion/year vs AMD at 6.5 billion.


That may be true, but despite that they still cut corners. See the Spectre and Meltdown vulnerabilities and the impact on the performance of Intel CPUs once mitigated. In fact, Intel recommended disabling Hyper-Threading if you want to be completely protected from the vulnerabilities.


And I hope some anonymous whistleblowers within Dell will do the right thing and expose the practice.


Expose what practice?

'Big discounts, kickbacks, "deals" and "programs" are being deployed and more on the way.' are generally perfectly legal.


> Expose what practice?

The practices that have resulted in Intel being sued/settling/fined by companies/states/FTC/EU/Japan.


Sort of. Maybe. But as a moderately large Intel customer, I assure you they are very afraid of any accusations of unfair practices or anti-trust behavior, so they are somewhat crippled in their ability to fight with money.


It was clear that while Dell ended its exclusivity with Intel, it was still an Intel shop. They continue to push Intel machines on all fronts, offering AMD merely as a checkbox to avoid any anti-competitive or legal problems.

HPE on the other hand tends to be a little more supportive.

And I can only guess there is a lot of pushing from Intel asking their customers not to offer any AMD machines.


I remember how bizarre that was at the time: Dell had stuck exclusively with Intel through the whole NetBurst era, despite AMD being superior. And then once Core 2 came out and Intel was back on top, Dell started selling AMD.


"tepidly"

No kidding ;)

"Initial performance studies on Rome based servers show expected performance"

Which is true, but misses the real point.


Even the sensational title “Rome: is it real”. Obviously it’s real. It’s fair to ask “how does it perform in real world scenarios”


Dell has been selling current generation AMD since at least mid last year. We have a few R6415 and R7415s at work.


Curious what type of workload you are using them for?

We're a VMware shop so I'd be hesitant to deploy non-Intel just because I don't want to get caught holding the bag in case AMD can't keep their momentum up for the next 5 years, when we'd most likely be looking to lifecycle the environment.


What kind of workload do you run that relies on hardware? I mean, you mention you use VMware anyway.


Manufacturing is weird like that. We like VMware because we can refresh our compute and storage without downtime, since a lot (almost all) of the applications are not built for HA.


I think no OEM can wait till 2022 for a decent answer from Intel, though.


Can anyone attribute the recent surge in AMD performance to a recent hiring blitz or some other shift? I know they were struggling and trying out new things with Bulldozer modules and trying to compartmentalize as much as possible. It reminds me of when Intel hired all those Israeli engineers who went back to the PIII design and miniaturized it, gaining performance by throwing away years of work on the P4.

Was there any of this at play recently at AMD?


AMD's apparent velocity is likely due to multiple factors. First and foremost, Intel has essentially been stuck: trying to shrink die sizes (something AMD has essentially outsourced), suffering from shortcuts leading to Spectre/Meltdown and the performance regressions due to patching those, and unable to get their new architecture out the door.

All this is coupled with really good execution from AMD: a new architecture that has turned out to work really well, and not really having to deal with Spectre/Meltdown regressions.

So yes, AMD is doing very well, and Intel's current problems make them look even better. How much of this is hiring and how much is just trucking along? Who knows.


> AMD's apparent velocity is likely due to multiple factors. First and foremost, Intel has essentially been stuck: trying to shrink die sizes

AMD's success has nothing to do with Intel's failure.

AMD engineered a beast of a processor architecture that runs circles around the competition at a price that renders competing options obsolete. Ryzen's performance track record would not be any different even if Intel had not sold processors with security problems or had succeeded in shrinking die sizes.

It boggles the mind how some people try to spin AMD's outstanding progress as something that Intel did instead of a colossal technical achievement by AMD.


That's not what I said at all.

I said AMD is doing very well, and they look like they are doing even better because Intel is having problems.

I recall an anecdote from an AMD engineer (from some article) "we expected to be competing against Intel's next-gen architecture" (paraphrased).


Plus Intel has been such a dominant player for so long that they've had no reason to innovate, and largely haven't in some time. Their complacency caught up to them.


The anti-Intel circle jerk has to slow down a bit. Intel management is terrible but their engineers have been delivering amazing features. Did you know that nowadays the main performance bottleneck is memory latency? For 8 years Intel has had the best LLC around, with a very fancy adaptive replacement algorithm. Also, the recent release of the 10th gen came with 50% more L1 and 100% more L2, plus AVX-512 with finally a decent subset, even on mobile i3 chips. And there are a lot of other important details most people miss.


The fact that Intel has an order of magnitude bigger R&D budget than AMD but is still losing on many key metrics means that Intel is either doing a really bad job or AMD is full of geniuses. Nobody is saying that Intel does nothing, but they could have done so much more if they had managed it better.


You have to take into account that Intel is having problems getting their next-gen architecture out the door... who knows what it will bring, good or bad... heck, maybe even some of the delay is "oh crap, Zen even beats our next-gen". Perhaps Intel's next-gen is just vaporware and they really have been sitting around raking in the $$$ because they could.

I'm really not trying to defend Intel (I understand that it may come across that way), just questioning how we look at the situation. Indeed you could be spot on; it's just that the situation at Intel is likely multi-faceted.


ECC memory for desktops? PCIe 4? More PCIe lanes? There are lots of things Intel doesn't consider for their desktop CPUs but AMD does. Security mitigations take a huge amount of CPU power.

Who cares about management vs. engineers; we are customers and want fine, reasonably priced products. AMD delivers on this front much more than Intel. The rest are details.


Yes, since Sandy Bridge (2011) each new generation of Intel CPUs has brought only a single-digit percent increase in single-thread performance.
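
For rough intuition about how that compounds, here is a small Python sketch; the 3%/5%/7% per-generation figures are assumptions for illustration, not measured numbers:

    # Compounding "single digit percent per generation" (illustrative only)
    gens = 8  # roughly Sandy Bridge (2011) through the Skylake derivatives
    for per_gen in (0.03, 0.05, 0.07):
        total = (1 + per_gen) ** gens - 1
        print(f"{per_gen:.0%}/gen over {gens} gens -> ~{total:.0%} cumulative")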


Reminds me of the book by the ex-CEO, Only the Paranoid Survive.


AMD chips suffer from Spectre too (which is the hard to fix issue). They didn't really take any fewer "shortcuts" than Intel. And they weren't "shortcuts".


That’s highly misleading, and the very careful way that you worded your reply tells me that you know this. Why the obfuscation?


I am a PhD researcher studying linguistics and deception. You mentioned the OP was being misleading and communicated an understanding of the OP's subversive intentions. Could you help me understand how the OP was being deceptive and how you knew based on their choice of language?

If you don't want to comment publicly and are still interested in helping me you can reach me at: gfair @ uncc.edu


I'm not OP, but it comes from the fact that Spectre is a very specific bug that is colloquially used to refer to all speculative execution bugs found afterwards as well. So it's true that AMD is susceptible to Spectre; however, the mitigations have a much lower effect on performance. Additionally, AMD is not susceptible to the multitude of additional speculative execution bugs found in the time since Spectre was first announced.


Spectre was used to refer to all speculative execution timing leaks from the very beginning. That's why they called it "Spectre" - because it's going to be hanging over us for a long time.


I wasn't obfuscating anything. Spectre is a fundamental flaw that affects Intel and AMD processors.

Speculative execution is not really a "shortcut". Intel did take one shortcut, but that resulted in Meltdown, which is easy to fix (conceptually anyway - I'm sure it was a lot of work!).

Any idea why I've been downvoted so much for saying something that surely everyone here remembers (wasn't that long ago) and is easily verified on Wikipedia:

https://en.m.wikipedia.org/wiki/Spectre_(security_vulnerabil...

> As of 2018, almost every computer system is affected by Spectre, including desktops, laptops, and mobile devices. Specifically, Spectre has been shown to work on Intel, AMD, ARM-based, and IBM processors.


Given that others also asked, I'll give a rundown of the discussion:

The context of the discussion was cpuguy83's suggestion that compared to AMD, Intel has been "suffering from shortcuts leading to Spectre/Meltdown and the performance regressions due to patching those".

cpuguy83's comment was very concise, so let me elaborate that the distinction between Spectre and Meltdown is essential here. Both are security flaws in CPUs that were published at the same time. Spectre is an industry-wide problem that also hit AMD, but "Meltdown" is the result of an Intel-specific implementation choice that I think one might fairly describe as a "shortcut". (Even IshKebab later agrees: https://news.ycombinator.com/item?id=21342048) Meltdown was also more immediately dangerous, and the software workarounds that were necessary to mitigate it in existing CPUs cost a lot of performance.

Given this context, I'll quote the relevant item from cpuguy83 again and then IshKebab's reply:

> cpuguy83: Intel [... has been ...] suffering from shortcuts leading to Spectre/Meltdown and the performance regressions due to patching those.

> IshKebab: AMD chips suffer from Spectre too (which is the hard to fix issue). They didn't really take any fewer "shortcuts" than Intel. And they weren't "shortcuts".

Note how IshKebab carefully ignores the Meltdown part of cpuguy83's comment to be able to claim that Intel hasn't been doing any worse than AMD, and that there were no shortcuts. For Spectre in the stricter sense, that's technically true. It's not true in the context of the entire Spectre/Meltdown event, which was cpuguy83's argument.


Isn’t the big performance hit because Intel wasn’t checking permissions where they should have? How is that not a shortcut? They skip a step for speed.


That’s Meltdown, not Spectre.

Meltdown arguably does fit your description, but AFAIK the cost of checking permissions in the right place is almost zero, so it’s arguably better described as “we never realized it would be dangerous to not check permissions here” than “we skipped the check for performance’s sake”. (AMD processors were not vulnerable to Meltdown.)

Spectre, on the other hand, is sort of an inherent flaw of speculative execution (not related to permissions checks). Speculative execution itself is definitely a shortcut, but it’s a shortcut that’s crucial to the performance of all modern high-performance processors, with the result that nobody really knows how to deal with Spectre. Intel was apparently hit harder than AMD by side channel mitigations collectively, apparently because Intel was doing more aggressive speculation – but those mitigations are only partial. Both vendors’ processors are still vulnerable to Spectre attacks even with mitigations applied [1], and that will remain the case even on future processors, for the foreseeable future.

[1] https://arxiv.org/abs/1902.05178


Contrary to popular sentiment, Meltdown is also not an Intel-unique bug; it also affected POWER and ARM processors.

AMD essentially got lucky on this one - their neural-network based branch predictor is difficult for an attacker to train to follow specific code paths, which is a necessary component of Meltdown/Spectre style attacks. Pretty much every other processor that does speculation is affected.

The potential for cache timing to serve as a side-channel leak was not widely appreciated in the industry, although it was theoretically described as far back as the early 90s.


Absurd to see this response flagged so heavily when it is correct on every point...


Given that I have the highest voted reply and several people asked, I've written a rundown here: https://news.ycombinator.com/item?id=21345820

Yes, the response is technically correct. But...


Jim Keller [1] is widely credited for a large amount of Zen's success. Interestingly he is now at Intel.

[1] https://en.wikipedia.org/wiki/Jim_Keller_(engineer)


Mike Clark is the person who designed Ryzen, not Jim Keller. He is also the guy who came up with the name Zen.

https://www.statesman.com/business/20160904/amid-challenges-...


There's a good talk by him about how Moore's law isn't dead: https://www.youtube.com/watch?v=oIG9ztQw2Gc.

Obviously, people have different ideas about what they mean when they say "Moore's law is/is not dead", but I appreciate Jim's point that even if the current innovation curves we've been riding are slowing down, there are other innovation curves that we can take advantage of, and that the combination of these curves means there's still plenty of innovation and improvement to be made in processor design.

Of course, whether the x86 architecture will be the foundation for that future improvement remains to be seen. ARM and RISC-V are both eating x86 from below, and the fact that both of those architectures allow for greater competition among processor implementations suggests to me that one (or both) may catch up with x86 at some point in the future.


Especially the part at https://youtu.be/oIG9ztQw2Gc?t=1917

That Sunny Cove architecture slide shows a massive increase in the number of execution units.

It may be that Intel's next architecture, combined with memory speed improvements, is competitive with AMD.

But AMD also isn't sitting still. AMD's contributions (from what I can tell) include:

* Core counts have blown past 6 cores / 12 threads

* CPU prices have just been cut by half

* PCIe lanes have blown way past 8

* ECC ram is being offered in consumer PCs


I've also heard rumors of things like 4-way SMT. Of course, that's still a ways off (if it ever materializes).


I wouldn't put much stock in that particular rumor unless AMD is going all in on servers to the detriment of consumer chips. Or maybe they're developing two cores but they've been working hard to re-use engineering effort until now. They don't have the number of silicon engineers that Intel has and not even Intel is developing separate server and consumer cores.


4 and 8 way SMT exists on IBM POWER8/9.


Please don't. Or rather, please just give me enough cores at a reasonable price so I can disable inevitably leaky SMT.


Jim Keller moving between companies is the real tick-tock cycle.


IIRC he went to every important CPU company back to back. Wasn't he part of the Apple SoC team too?


Yes. Designed the Alpha for DEC, designed Athlon/Athlon XP/Athlon 64 for AMD, designed network switches at SiByte/Broadcom, designed Apple's SoCs at PA Semi, designed Zen for AMD, designed AI accelerators at Tesla, and is now back at Intel.

Apart from SiByte/Broadcom, pretty much a greatest-hits list of disruptive architectures over the last 20 years.


They may be less known, but a huge percentage of the world's internet backbone runs on those Broadcom chips.


Aren't Broadcom network switching SOCs the industry standard?


Yes, but those are fixed function designs like Trident or Tomahawk (yes, they name their switch chips after missiles).

When Broadcom uses the term SOC, they mean "Switch on a Chip" to differentiate their single chip designs from older multi chip architectures.

SiByte was MIPS-based, more like an NPU than an (Broadcom term) SOC.


Man what a career.


Yes, he is a very well-off man now.

He moved to Intel during the last transfer season.


I think AMD has always had the talent, but made a bad bet with Bulldozer's architecture of two cores sharing a front-end and FPU[1]. I also think desktop software (specifically games) moved faster than some anticipated when it came to taking advantage of multiple cores, causing them to seem resource starved.

[1] At least that's how I think they work. One decode and FPU per pair of integer cores, right?


Bulldozer's failure was really one of market positioning. It would have been much better received if the CMT modules had been presented as a single, dual-threaded core, positioned against an Intel HTed core, rather than as two separate cores.


While I generally agree that CMT threads are not a real "core" (and has never been presented as such by any other company that has explored CMT), the real problem with Bulldozer was a frequency-optimized, deeply pipelined design. AMD literally repeated the exact mistake that Intel did with Netburst. It would never have been good no matter how you positioned it.


And Apple, with their low-clocked super-brainiac cell phone chips, has been achieving surprisingly good performance for their power envelope.


Even if Bulldozer had been marketed as quad-core it was still losing to Intel quad-cores and it had a larger die.


Never mind the power consumption. I had an FX-9590, which was a 4.7GHz 8-core at 220W.

220 watts!!!


And my R7 1700 pulls up to 230W at 4.1GHz all-core. Some things never change, except vastly more performance per watt.


I'm typing this on a bulldozer system, it works fine for compiling software which is what I mostly do.


AMD has been executing well on multiple fronts. One big advantage is that AMD divorced themselves from their fab (now GlobalFoundries), so when a particular fab fails at a shrink they can just switch. In the 7nm case that meant moving from GlobalFoundries to TSMC.

Additionally, they were first with HyperTransport (Intel followed with QPI) and knocked it out of the park with chiplets. Having two dies per package was pretty common, even way back with the 66 MHz Pentium Pro. But HT gives AMD quite a bit of flexibility: they can switch lanes between Infinity Fabric (the updated HyperTransport) and PCIe.

This helps them on multiple fronts. It decouples the CPU from the memory interface and PCIe standards. It raises their yield by using smaller chips. It also allows them to use different processes for different chips, so the I/O die can use an older process.

One big impact is that not only does the yield increase, but the shared chiplet also lets AMD amortize their R&D over more dies and customize for different products without having to spend extra R&D on numerous different dies. The low-end desktops/laptops get 2 chips (1 CPU + 1 I/O). Higher-end chips get 3 (2 CPU + 1 I/O). The high-end servers get 9 (8 CPU + 1 I/O). So they can go from under 65 watts and under $200 to over 250 watts and over $5,000, all based on the same chiplets.
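
To make the yield point concrete, here is a toy Poisson defect-yield calculation in Python; the defect density and die areas are illustrative assumptions, not AMD/TSMC figures:

    import math

    # Toy Poisson yield model: yield falls off exponentially with die area,
    # so eight small chiplets fare much better than one big monolithic die.
    def poisson_yield(area_mm2, defects_per_mm2):
        return math.exp(-area_mm2 * defects_per_mm2)

    d0 = 0.002                   # assumed defects per mm^2
    chiplet = 75                 # roughly a Zen 2 CCD-sized die
    monolithic = 8 * chiplet     # hypothetical single 64-core die

    print(f"chiplet yield:    {poisson_yield(chiplet, d0):.0%}")     # ~86%
    print(f"monolithic yield: {poisson_yield(monolithic, d0):.0%}")  # ~30%
    # Partially defective chiplets can also be binned into lower-core SKUs.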

AMD's first-generation Epyc did expose some weaknesses in OSes/applications that didn't like the high variations in latency to main memory and I/O. Each of the 4 chips inside a single socket had its own pair of memory channels and had to use HyperTransport/IF to get to the other 6 memory channels. The design was reasonable, but many apps were not NUMA-aware and ran poorly.

In the second generation AMD moved all memory channels to the I/O die, so now the socket is a single NUMA domain and all chiplets see identical latency. The NUMA tweaks, 10-15% improvement in IPC, big improvement on the floating-point side, and 1.5x more cores (depending on the model) mean that for many real-world codes the second-generation Epyc chips (Rome) are twice as fast as the previous generation, which Intel is still trying to match. Meanwhile Intel is still trying to get the Xeon shrink they promised in 2017 working.

So generally the ability to switch fabs, and how well HT/IF works with chiplets, caught Intel at a really bad time, and for once AMD seems to be actually executing well and producing non-trivial volumes across multiple market segments.


https://en.wikipedia.org/wiki/Jim_Keller_(engineer)

Some say Jim Keller's work at AMD lines up nicely with AMD's resurgence.


His work also lines up nicely with the DEC Alpha, early Athlon/Opteron, and Apple A4/A5.

With no malice, I'm not aware of anything of significance that came from his time at Tesla.

It will be interesting to see if AMD can keep things going past Zen 3 next year (which is likely to be the end of Keller's direct influence), and also what Intel can do with his guidance, which I would expect to release sometime in 2021-2023.


Tesla not long ago presented their impressive in-house neural network processing chip. Quite likely Keller was an important contributor to this, so there would be a significant outcome from his time at Tesla.


He built their in-house chip that's in their v3 self-driving computer, no?



It was Mike Clark who is the lead for the Ryzen and Epyc designs.


Big tech companies generally have tons of talent that they waste by having engineers basically stop each other from doing technical things. It's a waste to have smart and expensive people work at cross purposes, but it frequently happens for internal political reasons. Sometimes a company in this situation gets the fear of God put into it and realizes that it actually needs to compete. At this point, all the red tape mysteriously disappears and all the process overhead that very serious people claimed was 'essential' disappears without the sky falling. Consequently, the company can now actually compete, and long-timers within the company who blocked progress for years get embarrassed, sidelined, or fired.

This dynamic is the reason competition is essential. Without external force compelling action, every organization trends toward stasis.


The internal Intel competitive analysis is not bad here: https://www.techpowerup.com/256842/intel-internal-memo-revea...

I don't know about µarch details but a bunch of bigger-picture things have contributed to AMD's run:

Spinning off their foundry ops led to AMD getting to use TSMC, who turned out to have a great process node at 7nm. Indirectly, it probably helps that other huge customers mean TSMC can amortize their process development costs across all of them.

The chiplet approach with a separate I/O die has various advantages:

- The Zen 2 chiplet is identical from the cheapest client CPU to the most expensive server CPU. Surely reduces complexity, and silicon that won't work in a server part might work in a client chip or whatever.

- Relatedly, binning 8-core chiplets is way more forgiving than binning huge monolithic CPUs like Intel's doing; with AMD lots of cores and high perf doesn't require a huge, uniformly near-perfect die, just enough chiplets that meet the spec that you can glue together.

- The I/O die is on a very mature, probably cheaper GloFo process, which may have made it less costly for AMD to offer interesting features on the I/O side (PCIe 4 and 128 lanes even for the cheapest server part, AES-128 in the memory controller, etc.).

Won't happen this gen, but I kind of expect Intel to eventually use some version of chiplets for their larger parts. They're talking about their advanced stacking/packaging tech, so not totally outlandish. Short-term they do have a two-die Cascade Lake mega-CPU planned, but I mean more broadly.

One cost of chiplets is in higher DRAM access latencies. AMD's done things to try and mitigate that, including just using lots of L3 cache. From benchmarks it seems to work.

The Intel analysis also mentions their shift to higher-margin parts as important.

I don't know how critical it is to the story but AMD also did some deals that may've given them resources to invest in their designs, like the GloFo spinoff, the deal to sell Epyc 1 clones in China, and the deals to produce Xbox and Playstation chips.

Finally, Intel would ordinarily have their own progress that would make AMD look comparatively worse. But 10nm stalled an incredible amount of time and Intel's post-Skylake core design, Sunny Cove, depended on it.


I suppose that this is the result of several less fortunate iterations (Bulldozer, anyone?) which finally led to a machine that checks all the interesting boxes.

You can't build a new thing right from scratch, but you can market intermediate imperfect results to cover the losses somehow; I suppose that's what AMD did a few years ago.

Now they have finally hit the bullseye, and are reaping the benefits.


That's my sense of it too. A ~decade of tinkering with this modular architectural approach is effectively an evolutionary process, plus perhaps valuable input from a once-a-generation expert like Keller, results in leapfrogging the current state of the art.

The key, as you say, was surviving an unpredictable amount of time till the tinkering paid off.


Zen is really a from-scratch design, far from a refinement of Bulldozer, which was a dead end. Instead of experimenting with exotic architectural solutions they built an unexciting but extremely solid design (i.e. the improvements and wins were in the small details, not the overall architecture) and won. This parallels the years that Intel wasted on the dead ends of NetBurst and Itanium.


Maybe I went too far back in time with Bulldozer, but Zen 2 is a direct evolution of Zen. And to the extent that Bulldozer was an experiment with a new multi-core architecture that more or less failed, I'm pretty sure AMD learned some valuable lessons that informed the Zen designs.


From what I read, AMD explicitly chose some details to be close to Intel's, e.g. cache sizes, the size (and existence) of the micro-op cache, and so on. Software/compilers and Intel CPUs are optimized for each other, so it's best to be similar and maybe a little better where it makes sense. While that strategy doesn't yield something vastly better than what Intel has, AMD really needed something good so it wouldn't go bankrupt. Due to the process situation and a few nice energy-related innovations like clock stretching (which allows running at voltages very close to instability), AMD actually did get more power-efficient high-core-count CPUs.


> results in leapfrogging the current state of the art.

Intel also helped by botching their move to 10 nm.



Not involved at all, but I also wonder something similar. It's like when you have nothing left to lose, there's more freedom to innovate.


AMD has always been strongly marketing-driven, going all the way back to its founding with Jerry Sanders on the team.


What does that even mean? In CPUs the fastest wins, in general. There isn't a way to market yourself out of a slow, hot, or expensive chip. This isn't cola or sneakers.


I don't know why this factual statement should be voted down; even Jerry has said as much.


I've read in multiple places that AMD's resurgence is largely attributable to controversial IP deals and partnerships made with China, providing a struggling AMD with the resources necessary for Zen development.

e.g. https://www.wsj.com/articles/amd-to-license-chip-technology-...

Edit:

https://en.wikipedia.org/wiki/AMD%E2%80%93Chinese_joint_vent...


The first Zen chips taped out well before that WSJ article. AMD's deals with China have helped their financial situation some, but they're not what allowed Zen products to exist in the first place.


How big are those deals compared to the game consoles?


Thanks to AMD's recent developments, I basically cancelled buying a new Mac Pro. Max RAM speed 2666 MHz, max per-core perf 3.5 GHz? What is this, 2015?


Yeah, the Vega II GPU is pretty interesting, but the rest of the system is pretty lackluster. A 3rd-gen Threadripper, 3600MHz RAM, PCIe 4, 64-128 PCIe lanes, and the option of either Nvidia or AMD GPUs seems like a much better deal.


Given that Apple is tightly coupled with Intel for now, I would assume that they did not expect AMD to deliver great new CPUs and did not design the Mac Pro with competing against them in mind, IMO.


Indeed, but why would anyone buy a Mac Pro given the massive gap? The OS aside, which for me personally is a massive incentive, I don't see a good reason to do so.


64c EPYC with 3.4TFlops - wow! That's GPU territory!


Linus Tech Tips was actually able to play Crysis using CPU rendering for the graphics: https://www.youtube.com/watch?v=HuLsrr79-Pw


Actually I'm pretty sure the Rome numbers are for double precision, whereas most numbers quoted for GPUs are for single precision or less, making Rome's 3.4 TFLOPS even more impressive.
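
A back-of-the-envelope check in Python of how a 64-core Rome part lands in that range; the ~3.3 GHz all-core clock here is an assumption, not a spec-sheet number:

    # Peak FP64 throughput estimate for a 64-core Zen 2 (Rome) CPU
    cores = 64
    fma_units_per_core = 2      # Zen 2 has two 256-bit FMA pipes per core
    doubles_per_vector = 4      # 256-bit AVX2 = 4 x float64
    flops_per_fma = 2           # multiply + add
    clock_ghz = 3.3             # assumed sustained all-core clock

    flops_per_cycle = fma_units_per_core * doubles_per_vector * flops_per_fma  # 16
    peak_tflops = cores * flops_per_cycle * clock_ghz / 1000
    print(f"~{peak_tflops:.1f} TFLOPS FP64 peak")  # ~3.4 TFLOPS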


Half of Titan V/V100 FP64 sounds unbelievable! Can't wait to get my hands on 64c Threadripper with TRX80!


Yea, I can't wait for the announcements to happen in the coming weeks. So far the best rumours I've seen have them topping out at 48c for the new TR line, but none of them have looked particularly authoritative; there just haven't been leaks of anything pointing at 64c yet. I suspect that what might happen is that they'll announce with up to 48c, but then a few months down the line they'll announce the 64c CPU. That would line up well with what looks like the cause of the delays: not being able to get the quantity of chiplets they need. They'd be able to frontload all the lower-core-count demand and then, when they don't need nearly as many of those, start making the larger ones.


I really hope 32c will work with my Zenith Extreme X399, and 64c with 8-channel TRX80/WRX80. Then I could upgrade the old TR to a 32c Zen 2 one and buy another 64c with 4TB ECC LRDIMM for some machine learning tasks. I am also fine if AMD decides to do a 64c TR with Zen 3 only (4x SMT?). But based on the Blur ad, I guess they are going to release a 64c TR based on Zen 2 as well, just to completely obliterate Intel in HEDT, even if it costs $5000.


Yea, the X399 compatibility will decide when my upgrade happens. The Zen+ TRs weren't enough for me to justify it, but the Zen 2 ones seem like they've finally hit. If I need to do a motherboard and other upgrades, that'll delay things for a while (need to see how PCIe passthrough and other stuff settles out with the new chipsets), but in either case I'm going to end up upgrading to this next gen one way or another.


How much does threadripper usually cost relative to epyc for same # of cores?


20-30% usually. You get faster cores but no LRDIMM (i.e. you are effectively constrained to 128GB of ECC UDIMM, at best 256GB if you are lucky enough to get 32GB ECC UDIMM modules). EPYC has a 4TB ECC LRDIMM ceiling; the new TR on TRX80 might have the same ceiling as well. I am glad that AMD provides TR, as they make way less money on it than on EPYC, but it's a great marketing tool for them. I am running some TRs in deep learning rigs (PCIe slots are most important) on Linux and they are great; Titan RTXs and Teslas run without any issue. But Zen 2 should give me much better performance on classical ML with Intel MKL/BLAS in PySpark/scikit-learn, so I can't wait to get some.


Naive question: Are you able to use MKL on an AMD chip without jumping through too many hoops?


Yes, just pip install ..., but it's 2x slower than on Intel for Zen/Zen+. Only Zen 2 is close to Intel.


Intel makes rather pessimistic assumptions about AMD, using the CPU's vendor/model identification to pick which code path to use and ignoring the CPU feature flags for floating point, etc.

So if you want to compare performance fairly I'd use gcc (or at least a non-Intel compiler) and one of the MKL-like libraries (ACML, GotoBLAS, OpenBLAS, etc.). AMD has been directly contributing to various projects to optimize for AMD CPUs. They used to have their own compiler (which went from SGI to Cray to PathScale or similar), but since then I believe they have been contributing to GCC, LLVM, and various libraries.
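
A quick way to sanity-check this yourself is to see which BLAS your numpy build links against and time a large matmul under each backend; a rough sketch (swapping between MKL and OpenBLAS builds of numpy happens via packaging/environment, nothing here is vendor-specific):

    import time
    import numpy as np

    np.__config__.show()        # prints the BLAS/LAPACK numpy was built with

    n = 4096
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    t0 = time.perf_counter()
    c = a @ b                   # dispatches to the linked BLAS dgemm
    dt = time.perf_counter() - t0

    gflops = 2 * n ** 3 / dt / 1e9   # a dense matmul is ~2*n^3 flops
    print(f"{n}x{n} matmul: {dt:.2f} s, ~{gflops:.0f} GFLOP/s")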


Yeah, still, Zen 2 is much faster in OpenBLAS and is faster in MKL than Zen/+ as well.


It's lumpy and depends on exactly when you ask.

If shopping, I'd compare the highest-end Ryzen + motherboard against the lowest-end single-socket Epyc chip + motherboard and try to guesstimate the price/performance for your workload.

Generally the Threadrippers seem like a much lower volume product and the motherboards are often quite expensive (for the current generation). Both Ryzen and Epyc enjoy significantly higher volumes.

Keep in mind that Threadripper has twice the memory bandwidth of Ryzen, but half the memory bandwidth of Epyc.
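
The 2x/half ratios follow directly from channel counts; a small sketch assuming DDR4-3200 on all three platforms (actual supported speeds vary):

    # Theoretical peak DRAM bandwidth per platform (DDR4-3200 assumed)
    def peak_gbs(channels, mt_per_s=3200):
        return channels * mt_per_s * 8 / 1000   # 8 bytes per 64-bit transfer

    for name, ch in [("Ryzen (2ch)", 2), ("Threadripper (4ch)", 4), ("Epyc (8ch)", 8)]:
        print(f"{name}: ~{peak_gbs(ch):.1f} GB/s")
    # Ryzen ~51.2, Threadripper ~102.4, Epyc ~204.8 GB/s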


Why not just get the 7702p?


I guess TR will be a bit cheaper and higher clocked? And I don't really care that much about ECC errors for ML.


For reference (because I didn't know the comparison):

The MI60 has 7.4 TFLOPS double precision at a TDP of 300W.


Side note on these CPUs.

We are one step/generation away from fitting the BGP IPv4 routing table in PC CPU L3 cache ("256MB L3 cache"). I believe one needs 512MB of L3 cache to fit the current routing tables, enabling very fast route lookups on generic PC hardware.


The current global BGP IPv4 table fits in 150 MB of RAM with all BGP attributes (in BIRD).

For forwarding, you do not need most attributes, but you may need better data structures for best-match lookups.
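
To illustrate why best-match lookups want a purpose-built data structure, here is a deliberately naive longest-prefix-match sketch in Python (one hash table per prefix length); real forwarding planes use compressed tries or DIR-24-8-style tables instead:

    import ipaddress

    class NaiveFib:
        def __init__(self):
            # one table per prefix length: {masked network (int): next hop}
            self.tables = {plen: {} for plen in range(33)}

        def add_route(self, prefix, next_hop):
            net = ipaddress.ip_network(prefix)
            self.tables[net.prefixlen][int(net.network_address)] = next_hop

        def lookup(self, addr):
            ip = int(ipaddress.ip_address(addr))
            for plen in range(32, -1, -1):       # most specific first
                mask = ~((1 << (32 - plen)) - 1) & 0xFFFFFFFF
                hit = self.tables[plen].get(ip & mask)
                if hit is not None:
                    return hit
            return None                          # no default route installed

    fib = NaiveFib()
    fib.add_route("10.0.0.0/8", "peer-a")
    fib.add_route("10.1.0.0/16", "peer-b")
    print(fib.lookup("10.1.2.3"))   # peer-b (longest match wins)
    print(fib.lookup("10.9.9.9"))   # peer-a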


Interesting that they are using Red Hat 7.6 with a 3.10 kernel. I'm guessing Red Hat must backport a bunch of stuff in order to make an older Linux release work well on such new hardware.


> RedHat must backport a bunch of stuff

Yes. Though you also have the option of running a newer kernel (through EPEL, but maybe not necessarily).


It's actually ELRepo, which is community run, not EPEL, which is semi officially run by Red Hat. But yes, very impressive you can get latest stable and long term kernels on CentOS!


Red Hat only added a Linux 4.x kernel in RHEL 8. As you stated, they backport a ton of stuff for customers.


No guessing required. That's exactly RH's approach.


Lol, let me guess what's in the link: some information about Rome, some information comparing it to other AMD processors in some use cases, and no sign of Dell comparing it to Intel.


I'm surprised that 4NPS gives even a mild benefit to bandwidth (maybe 13% or so), given that the central IO die handles all communication anyway. You'd think that with the architecture here, there'd be no real benefit to splitting the RAM channels.


I was surprised by this as well, but my results for a different memory-bandwidth-bound workload match what Dell is reporting: when testing a Rome CPU at Netflix, I found I could serve 200+Gb/s of TLS-encrypted video with NPS4, 192Gb/s with NPS2, and 184Gb/s with NPS1.

My best guess was that I didn't fully understand the architecture of the IO die. Eg, there was some benefit to being local that I didn't understand fully.


Perhaps, as the article sort of suggests, inside the IO chiplet there are four quadrants that each have two memory controllers associated with them, and the cross-quadrant bandwidth isn't sufficient for 75% non-local memory bandwidth.


I made a video yesterday with some details on this topic: https://youtu.be/ghFx_jyP1U8?t=390


The central I/O die does not have any cache on it. L3 caches are shared across just 4 cores in a CCX (not even a CCD).

IIRC the next generation of Ryzen will introduce CCD-level or even I/O-die caches. I think one of Intel's Broadwell chips had an L4, so it's the same idea.


But there is a concept of local and remote memory for NPS4 vs NPS1, right? NPS4 only accesses memory in 1 quadrant while NPS1 distributes it to all corners. The difference in latency plus all the other queuing delays in switches can add up to the difference.


If your stack is NUMA-aware and can manage those 4/8 partitions, enabling it would seem to reduce memory contention between chiplets, even if there isn't a latency penalty for going a bit farther to memory (there likely is one, but it's probably small).
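
On Linux, the CPU side of "NUMA aware" can be as simple as reading a node's CPU list from sysfs and pinning to it; a minimal sketch (memory binding additionally needs numactl/libnuma, which this does not handle):

    import os

    def node_cpus(node):
        # parse e.g. "0-15,64-79" from the kernel's per-node cpulist
        with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
            cpus = set()
            for part in f.read().strip().split(","):
                if "-" in part:
                    lo, hi = part.split("-")
                    cpus.update(range(int(lo), int(hi) + 1))
                else:
                    cpus.add(int(part))
            return cpus

    os.sched_setaffinity(0, node_cpus(0))    # pin this process to NUMA node 0
    print(sorted(os.sched_getaffinity(0)))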


The STREAM benchmark doesn't have any memory contention at all. It's basically a full-tilt "write to memory / read from memory" kind of benchmark.
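
For a rough, single-threaded flavor of that kind of kernel, here is a numpy sketch of the STREAM "scale" loop (a[i] = s * c[i]); it is not the real STREAM benchmark and, being one thread, it probes what one core can pull rather than the full-socket number:

    import time
    import numpy as np

    N = 200_000_000              # ~1.6 GB per float64 array; shrink if needed
    c = np.full(N, 1.0)
    a = np.empty(N)
    s = 3.0

    best = float("inf")
    for _ in range(5):           # keep the best of a few runs
        t0 = time.perf_counter()
        np.multiply(c, s, out=a) # a = s * c, no temporaries
        best = min(best, time.perf_counter() - t0)

    bytes_moved = 2 * N * 8      # read c once, write a once
    print(f"~{bytes_moved / best / 1e9:.1f} GB/s effective bandwidth")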


I would expect memory bus contention in a benchmark like that.


I knew AMD was on the right track back when I built a 4-CPU Opteron (64 cores total) server for physics sim and one for number crunching (Comsol and genetic sequencing) and started doing the math on the core/thread-count relationship to perf. It took them a few years, but man did they deliver. I just wish I had invested in them.

Can't wait to build a beast as my Valve Index VR rig.


Those Opterons were terrible which is why AMD was practically giving them away at low prices. They only got on the right track five years later when they ditched that architecture.


Very true, they had their fair share of issues, but it showed the different direction they wanted to head in, which I agreed with. One of the biggest issues was that I could only get the quad-CPU boards from two manufacturers and they were very hit-or-miss motherboards (and it was the motherboards that exacerbated the issues of the Opterons).


I don't understand articles that do benchmarks of InfiniBand or Ethernet but set up a system where the NIC is limiting the test. It would be useful to do the same thing with enough cards to saturate the memory bandwidth.


Basically no one runs that way. And in this article they're not benchmarking two systems against each other but different tuning parameters for one system. And if the tuning helps then the NIC wasn't the limit.


No one runs what way? Running with multiple NICs is very common. You can see in the graph that it peaks at line rate very early. Running a multiple NIC test would have resolved that.


Why do these multimillion dollar companies have such shitty diagrams?

*edit: shitty as in pixelated; not the content


You want the shitty diagrams because those come from actual engineers. The beautiful ones come from marketing.


This is a really poor excuse. Just because you're technical doesn't mean you're not responsible for presenting your content in an appealing and clean way.


Being an engineer doesn't mean you have zero design mindset. Engineers make beautiful diagrams too.


Beauty is in the eye of the beholder. A diagram that conveys the information, took the minimum amount of time to generate, and has an aesthetic that appeals to the engineer is beautiful. Maybe you have different taste?


My comment was concerning the quality of the image, not the content. The pictures in the article are pixelated.


Point was, you're not likely to get a purely functional resource out of marketing, not whether engineers are capable of producing marketing-quality materials.


They didn't spend all their money making diagrams, duh.



