AMD Zen Microarchitecture: Dual Schedulers, Micro-Op Cache and Memory Hierarchy

geertj · on Aug 18, 2016

SEV (Secure Encrypted Virtualization, [1]) is a hugely interesting feature that will be available with Zen. Once it's mature and perfected, it would allow you to securely run a VM in the cloud that is protected against someone who controls the hypervisor. And you'd also be able to attest that indeed you're running in such a protected VM.

How do you protect against someone controlling the hypervisor? Read the paper. But the high level is to encrypt memory using keys that cannot leave the processor and are only available to a specific VM ASID (Address Space Identifier), assisted by a secure firmware similar to the Secure Enclave. Attestation uses an on-chip certificate signed by an AMD master key during fabrication.

There were some discussions on this on the linux-kernel mailing list [2]. As I understand it, the current generation of SEV is still somewhat leaky, but there's no fundamental reason why those leaks cannot be closed.

[1] http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/... [2] http://www.mail-archive.com/linux-doc@vger.kernel.org/msg025...

AnthonyMouse · on Aug 18, 2016

I still don't understand how this is ever supposed to work. Generally when someone finds a vulnerability, you take countermeasures or take the system offline until it can be patched (or apply the patch immediately).

With this, the party in control of the system is also in control of that, so every time a new vulnerability is found they can exploit it before patching it to retroactively get access to your data. Or never patch it at all and use the vulnerability itself to forge attestations that the vulnerability is patched.

sliverstorm · on Aug 18, 2016

It might not make your guest truly impervious, but it certainly raises the bar for your bad actor host.

Depending on how determined you imagine your bad actor host, you can probably never get around things like "zero day is discovered, host disconnects guest from internet preventing you from patching zero day, exploits guest".

Or are you talking about vulnerabilities in SEV itself?

AnthonyMouse · on Aug 19, 2016

Vulnerabilities in SEV itself.

In theory you can't actually do it at all. The key is inside the chip, the attacker has physical control over the chip, an attacker with enough resources is going to be able to extract the key. You have no hope against a state-level attacker or even many university research departments. The assumption seems to be that the attacker won't be that sophisticated.

The problem is there are also likely to be attacks which won't require significant resources once published. Researchers are always coming up with new ways to extract keys from "tamper proof hardware" using timing or power consumption or whatever else. Some future version of the hardware will protect against that specific attack but that's too late for all the secrets you trusted to the current version.

urza · on Aug 18, 2016

It reminds me of Permutation City, the scifi book by Greg Egan.. encrypted realities inside other realities.. :)

nitrogen · on Aug 19, 2016

See also Rainbows End by Vernor Vinge, where a secure encrypted sub-CPU plays a role.

spaceheeder · on Aug 18, 2016

In addition to cloud VMs, I wonder what applications this might have on local systems. Systems that boot from encrypted partitions and can't have the keys recovered by cold boot attacks? Secure graphics acceleration of different guests in a Qubes system? etc.

geertj · on Aug 18, 2016

I believe that the enabling technology for SEV (SME - Secure Memory Encryption) would indeed protect against cold boot attacks. The SME keys are not stored in memory themselves and therefore once the CPU reboots and the SME keys are erased, the memory contents are lost forever.

fulafel · on Aug 18, 2016

DRM and keeping the user from rooting their PC. This is like MS/Intel Trusted Computing on steroids.

snuxoll · on Aug 18, 2016

AMD has had a on-die management CPU (dubbed the PSP - Platform Security Processor) for years now, similar to the Intel ME, this is something completely different. Memory encryption doesn't prevent you from doing anything with your device, as far as userland is aware they still have access to the full address space with no knowledge encryption is happening - the only difference is memory IS encrypted so even if you manage to freeze memory to preserve the charge it will be useless as soon as it is removed from the host machine or it is rebooted.

comex · on Aug 19, 2016

> Memory encryption doesn't prevent you from doing anything with your device, as far as userland is aware they still have access to the full address space with no knowledge encryption is happening

It depends what you mean by "userland". The purpose of SEV is to allow a guest VM (using hardware virtualization) to run without trusting the host, including remote attestation. Traditionally hardware virtualization is used to run a full operating system which was installed at the impetus of the user, but there is no rule that it can only be used for that. If this feature is enabled on desktop parts, it's equally possible for black box DRM software running, say, on a non-virtualized Windows system, to include a small unikernel and automatically set it up to run in SEV mode. The whitepaper proposes that people running VMs in the cloud use remote attestation to upload disk encryption keys such that the VM can only decrypt the disk if it hasn't been tampered with, but the 'cloudiness' could just as well go the other way: cloud DRM servers sending decryption keys, for both video and perhaps the code itself, to enclaves on desktop PCs.

Using SEV alone for DRM would have a significant limitation compared to using the PSP: since all interaction with the outside world is still through the host, it would be hard to prevent the host from grabbing the raw decrypted video data as it leaves. But this still prevents recovering the original bitstream; allows 'perfect' obfuscation of many facets of how exactly the code works; and could probably be used in combination with the PSP in some manner. And in some DRM applications, the ability to grab the output may not matter. Imagine a video game where the bulk of the game was inside an enclave, preventing piracy but also all reverse engineering and modding.

Of course, a video service or video game that only runs on AMD CPUs won't get very popular... but conveniently, Intel is coming out with their own feature, SGX, that provides similar capabilities, though with a different design (it's designed more directly for the DRM use case). One might imagine that eventually most systems will have CPUs that support one or the other.

fulafel · on Aug 19, 2016

> since all interaction with the outside world is still through the host, it would be hard to prevent the host from grabbing the raw decrypted video data as it leaves

Wasn't this part already paved earlier by Microsoft, when Hollywood wanted to guarantee no unencrypted HD video leaves the PC? It might have weaknesses but the principle is already estabilished.

A secure crypto path from black box VMs to smart TVs also leaves the door open for all kinds of nasty scenarios involving TV pwnage. You also will have no way of decrypting the data that the VM exfiltrates from your PC.

kuschku · on Aug 18, 2016

The whole point is that you, as user, can not use your (legally guaranteed right in the EU) to debug software, reverse it, modify it, etc.

That is a big fucking disadvantage.

If I buy software (buying a license also counts, if you’re in the EU, or renting a license), I want to be able to use it like I’d use a table I buy: I can saw one leg off, repaint it, turn it into a chair. I want to be able to mod the game, skin it, theme it, do a total conversion.

This is preventing me from using my rights.

snuxoll · on Aug 18, 2016

I don't see how this prevents any of that. While you are inside the running computer memory appears to you as it always has been, if you need to do post-moterm debugging you are going to need a proper crash dump anyway. This is nothing but a security benefit, it will prevent keys for full-disk crypto from remaining in memory where they can be retrieved.

If you're talking about the PSP or ME then I agree, they are dangerous and the inability to gain any insight into what they do means they should be considered hostile entities (especially if they may have access to the internal CPU memory where the encryption keys are stored).

kuschku · on Aug 19, 2016

If the game is fully encrypted, and the DRM uses PSP or ME to keep the RAM of the game itself at all time encrypted so I can not read or debug it, it directly does prevent that.

snuxoll · on Aug 19, 2016

Yes, but that's a matter of using the PSP or ME, not SME which is what I was discussing. SVE brings some "interesting" things to the table since they could technically spin up a VM to run the game and keep the memory protected from the host, but they'd have to pass the GPU directly into the VM which would cause all sorts of other issues in a PC environment (why the *&!$ can't I tab out of this game!).

kuschku · on Aug 19, 2016

> which would cause all sorts of other issues in a PC environment (why the *&!$ can't I tab out of this game!)

As if that would work today – look at No Mans Sky, [Alt]-[Tab] already doesn’t work.

Wait 2 years, and we’ll see exactly this. Already today DRM is often implemented as kernel modules, and the OS – especially on windows – prevents debugging for normal users.

sangnoir · on Aug 18, 2016

> In addition to cloud VMs, I wonder what applications this might have on local systems

Here's one application for the red team: AV-resistant malware, rootkits and next generation APTs

lancemjoseph · on Aug 19, 2016

I thought homomorphic encryption was supposed to fill the niche of allowing one to securely run VMs in a cloud environment. I've not heard of serious progress on this front the last time I went looking. Will we always require hardware with a "secure enclave"-like device to safely store keys in a public cloud? Is it possible to implement this scheme purely in software or is some "trusted" hardware always necessary?

MertsA · on Aug 19, 2016

Forewarning, I am by no means an expert on anything that follows.

Homomorphic encryption would allow for "true security" where the party doing the computation doesn't ever have the encryption keys necessary to see what data they're operating on. This is something more akin to a TPM. The key that can read all of the data is in the possession of the party doing the computation, but it's stored in the CPU and the CPU will not give that key to anyone. Theoretically the key could be read off of the CPU but in practice this would require either a flaw, sidechannel, or a lot of time with an electron microscope.

For practical purposes, I believe that all implementations of secure cloud computing are going to be like this where the key is just secured physically. It's possible with homomorphic encryption to have someone securely do computations on data that they can't see all in software, but I just don't see any major breakthroughs happening that would make this fast enough to be practical.

Dylan16807 · on Aug 19, 2016

Homomorphic encryption is currently hilariously slow as I understand it, and even if you solve that it can't branch on data. All paths have to be evaluated and summed.

api · on Aug 18, 2016

> "it would allow you to securely run a VM in the cloud that is protected against someone who controls the hypervisor"

> Attestation uses an on-chip certificate signed by an AMD master key during fabrication.

This is absolutely fantastic for security in the cloud, but it is important to note that this will not protect against nation state level actors.

Rest assured that the USG will obtain the AMD master signing key with or without AMD's permission. Other nation states may do likewise. The rest will have to wait for a leak, and if that key is leaked this feature will become almost nonexistent.

tcoppi · on Aug 18, 2016

This is interesting. The most compelling use case IMO is protection against cold boot attacks rather than virtualization, at least until SEV has been proven empirically to do what they claim. Virtualization security is hard to get right in general and adding another layer of complexity probably won't help in the short term.

emn13 · on Aug 18, 2016

Even if it turns out to be leaky, it could still be a big deal: I think it's fair to say that the greatest cloud risk isn't actively and persistently hostile providers - mostly because that sounds like an almost hopeless task. A more realistic risk is that via a VM-breakout or other hack hostile code manages to run on the hypervisor or to at least indirectly influence the hypervisor and other VMs. And that kind of code may well be harder to exploit with even slightly leaky encryption. A hacked hypervisor may not be entirely under the control of the hacker; or breaking the encryption may cause side-effects (such as instability) that causes watchers to take note; or it may simply be quite complex and require case-by-case exploits that are generally impractical.

Even a less that perfect protection from the hyper-visor may still have some value.

I'd be more worried about the performance overhead, personally - I can't imagine using this if the impact is significant, and it seems like it almost has to be.

usefulcat · on Aug 18, 2016

> I'd be more worried about the performance overhead, personally - I can't imagine using this if the impact is significant, and it seems like it almost has to be.

Not necessarily. Bandwidth to main memory is already typically several times less than to L1 or L2 caches. If processor caches are not encrypted, then it seems conceivable that you could have some dedicated encryption/decryption silicon and it probably wouldn't even have to be as fast as it would need to be for general purpose use (like the Intel AES instructions).

Even if it does cost bandwidth or latency to main memory, if it's by a small enough amount it could still be a worthwhile tradeoff for some applications.

sliverstorm · on Aug 18, 2016

I think it's fair to say that the greatest cloud risk isn't actively and persistently hostile providers - mostly because that sounds like an almost hopeless task.

Definitely agree. I know we like our security systems impervious to anyone and everyone, be they script kiddie or the entire NSA. However if you hand your machine over to someone else to run, (in either VM guest or box form), I think you need to acknowledge that you are incontrovertibly accepting slight vulnerability to the colo/VM host.

derefr · on Aug 18, 2016

Sounds like something that would make the claims made by https://en.wikipedia.org/wiki/Denuvo copy-protection, actually plausible.

mtgx · on Aug 18, 2016

Is SME like Intel's SGX?

anonymousDan · on Aug 18, 2016

Yup. I'm not too up to date on the specifics, but one advantage I remember reading from the paper over SGX is that you can run programs with raised privileges, whereas in SGX enclaves can only have user privileges.

anonymousDan · on Aug 19, 2016

Do you know is there hardware available with SEV capabilities yet? Or is there even a roadmap/timeline for its release somewhere? Would be really interested to get my hands on it.

arcanus · on Aug 18, 2016

I've been recently reading reports from some of my banking friends (and actually chatted with some folks I know at AMD) because I'm curious about AMD's turnaround. Even just last year AMD looked to be in very dire straights, and are still operating at a loss.

However, they seem to have a strong technical pipeline and they have historically punched above their weight-class. Does it look like they are going to make it?

snovv_crash · on Aug 18, 2016

The issue is the long lag time between new ideas being implemented at a design level, and the many iterations of fabbing and tweaking that need to take place before it can actually be sold. The majority of people in tech are too used to something being written in the morning and deployed in the afternoon to understand what it is like having a 3-month lag in your testing cycle, and minimum twice that till release.

Just like Intel had the P4 hole that it had to drag its way out of, so now AMD has had Bulldozer. Notice how Intel has been quite conservative with each individual tick/tock, trying to keep their pipeline full. Doing crazy changes risks causing a pipeline stall which could last years. Each new architecture is risky, and AMD screwed up with Bulldozer. From early signs it looks like Zen is a winner, hopefully AMD can stick with it for a while.

tormeh · on Aug 18, 2016

I think bulldozer was just ahead of its time. I am fortunate to have one, and it has aged a lot better than the same-price alternatives from Intel due to the more multithreaded nature of e.g. games nowadays. That's not much comfort for AMD, since being ahead of your time is just another way to fail, but at least AMD had vision.

Mankind Divided's recommended specs are FX-8350 or i7 3770. The price difference between the two in their heyday was $100 in AMD's favor.

merb · on Aug 18, 2016

price was always in AMD's favor. problem is more and more the energy consumption. also the i-series of intel was really really solid and good, at least until broadwell (didn't seen too much skylake yet). that especially lost amd a lot of ground in the server space. and a lot of "high end" gamer favored the Intel i7 series aswell, even when they weren't as cheap as amd.

tormeh · on Aug 18, 2016

The i7 was a lot better for gaming than the 8350 was at the time. Hell, an i3 was better at the time. The 8350 was priced accordingly. There was no price advantage. What I'm saying is that AMD's design has aged a lot better.

merb · on Aug 19, 2016

AMD was cheaper (always) not better. You can be cheaper and worse and people will still buy the worse since it's cheaper even if price / quality on intel would've been better.

tormeh · on Aug 19, 2016

You obviously haven't actually checked the prices.

CarVac · on Aug 19, 2016

How exactly?

I see the evidence of that happening with AMD GPUs versus Nvidia, but versus Intel processors?

Can you explain?

AnthonyMouse · on Aug 19, 2016

When Bulldozer was released, most software only used one or two threads. Which meant that the 8-thread FX was slower than the 2-thread i3 because the extra threads did nothing and it had lower single-thread performance.

Now that newer software is using more threads, the old FX gets a big performance boost while the old i3 is only the same speed it ever was.

milcron · on Aug 18, 2016

How did Intel lose ground with Broadwell?

clevernickname · on Aug 19, 2016

IIRC it took much longer to deliver than expected, didn't improve much upon Haswell, sold in a limited set of SKUs, then was succeeded by Skylake a few months later.

phire · on Aug 19, 2016

AMD started work on Zen back in 2012 after it became clear the Bulldozer was a complete failure. It's taken them 4-5 years to get it ready for release. They still had Steamroller and Piledriver in the pipeline, so they released those anyway, despite the fact that line of processors is now dead.

It took about the same amount of time for Intel to release the Core and Core 2 architectures after realising they had made a huge mistake with the Pentium 4.

I've heard some people theorising that Intel might have worked their current architecture into a corner and they might have problems innovating out of it. I guess we will see when information about Zen's performance shows up.

yvdriess · on Aug 19, 2016

What is shown on the slides is not that innovating. If anything, they seem to have thrown the bulldozer innovations out of the window and moved the architecture closer to the sandybridge-era processors.

phire · on Aug 19, 2016

> If anything, they seem to have thrown the bulldozer innovations out of the window

That's basically the point of Zen, Bulldozer was an architectural dead-end that wasn't going anywhere.

Besides, it's not like Intel have massively innovated since Sandybridge. Ivy, Haswell, Broadwell and Skylake are little more than successive perfections of the Sandybridge architecture.

It's hard to tell from the slides, but it looks like Zen is a much wider architecture than Intel, with 10 execution ports (4 ALU, 2 AGU, 2 FP ADD, 2 FP MUL). Sandybridge had 6, Haswell and later have 8 execution ports. Bulldozer had 4 integer execution ports plus 2 float ports, which are shared between each pair of cores.

The most interesting thing about those slides is the layout of the blocks marked "Scheduler". Intel chips all have a single scheduler, Bulldozer had one float scheduler (shared) and one integer scheduler. But I'm counting 7 schedulers on the Zen slides, one float scheduler managing the 4 floating point execution ports and 6 integer schedulers, one for each execution port.

yvdriess · on Aug 19, 2016

> It's hard to tell from the slides, but it looks like Zen is a much wider architecture than Intel, with 10 execution ports

The text mentions it has to fuse the four FP ports to do a single 256-bit AVX per cycle. This is significantly less wide than Intel architectures (half/quarter). We can interpret the width thus as 4+2+1 ports, which is in the Haswell ballpark.

What is maybe more telling here is the 16-byte load/stores, Haswell is doing 32-byte at the same rate. It points to Zen abandoning FP bandwidth in both client and server. Perhaps they want to rely on GPGPU with the on-chip GPU to do compute workloads?

> The most interesting thing about those slides is the layout of the blocks marked "Scheduler". Intel chips all have a single scheduler, Bulldozer had one float scheduler (shared) and one integer scheduler. But I'm counting 7 schedulers on the Zen slides, one float scheduler managing the 4 floating point execution ports and 6 integer schedulers, one for each execution port.

Depends what they mean with Scheduler. If it means reservation stations for micro-ops, then that's already the case in other micro-architectures. If Scheduler means assigning micro-ops per port, than there can logically only be a single one.

phire · on Aug 19, 2016

> The text mentions it has to fuse the four FP ports to do a single 256-bit AVX per cycle. This is significantly less wide than Intel architectures (half/quarter). We can interpret the width thus as 4+2+1 ports, which is in the Haswell ballpark.

4+2+2, no need to combine all 4 ports, just the two multiplies or the two adds.

The text is speculation of the journalist. There It's possible that each port is actually 256 bits wide and fusing them is only needed for the 512bit AVX instructions that Intel don't even support yet.

Even if AMD are splitting the 256 bit fpus in half, that is still a huge win over average code, because 128bit SSE instructions are much more common than AVX instructions, and AMD can execute upto four of them per cycle.

Even Intel disable the upper half of their FPU most of the time to save power, AVX instructions get split into two 128bit micro-ops unless until a threshold is encountered and the upper half powers up.

> If Scheduler means assigning micro-ops per port, than there can logically only be a single one.

I assume that means one Re-order buffer per port. Bulldozer already had two Re-order buffer, one for float instructions and one for interger instructions, which proves multiple ROBs for different ports are possible. You just need to track dependencies across ROBs.

I'm guessing that tracking deprbdiencies across 7 schedulers is not much harder than tracking deprbdiencies across 2.

makomk · on Aug 19, 2016

With the current state of the tech press, that's probably a good idea as any difference from how Intel does things is spun as a negative. Take this paragraph for the Anandtech article, for example: "Unlike Bulldozer, where having a shared FP unit between two threads was an issue for floating point performance, Zen’s design is more akin to Intel’s in that each thread will appear as an independent core and there is not that resource limitation that BD had. With sufficient resources, SMT will allow the core instructions per clock to improve, however it will be interesting to see what workloads will benefit and which ones will not." Intel-style SMT actually has more contention for shared processor resources than Bulldozer did, not less, because far more is shared. Despite that, AMD's switch to it is being spun as a positive simply because it's closer to what Intel do.

greggyb · on Aug 19, 2016

I'd say that it's not just spin and trying to be closer to what Intel's architecture is.

The FP contention between the cores in a Bulldozer module makes all recent AMD chips perform objectively worse in most benchmarks than their peers from Intel.

Intel's architecture isn't a priori a goal to achieve. Intel's performance in real-world workloads is a good goal.

There are some heavily-threaded, integer-heavy workloads that Bulldozer and related parts are still incredibly competitive at, even compared to current-gen Intel parts. For the right workload, a Bulldozer-family processor can be a real screamer and they are priced incredibly aggressively. We should recognize, though, that the architecture is high performance only for these specific workloads.

Perhaps AMD should have pursued more innovative architectures. I am not saying that Intel's is perfect. But it is important to note that for current general purpose computing workloads, Intel's architecture is superior to Bulldozer.

koolba · on Aug 18, 2016

AMD seems to operate like a middle America mom and pop sandwich shop. Products are decent, lots of people rooting for them to succeed, they never seem to pull ahead, yet they also manage to live on (if only to continue their struggle).

PedroBatista · on Aug 18, 2016

Story of my life.

tcoppi · on Aug 18, 2016

I don't think they are in any danger of bankruptcy. Whether they return to profitability is still an open question but it seems likely that they should be by some point next year. How profitable, and what happens after that, is anyone's guess. There is a lot of room to grow in the graphics and CPU space, but they have tough competitors. I do think their stock is very fairly valued at ~$6 though, I think any potential gains from Zen over the next year or so are priced in to that. It definitely isn't a value buy anymore.

Edit: As the day goes on it seems like they're trading back over $7, so obviously the market disagrees with me :) They were trading around $10 before losing profitability, and in the $20-$40 range during their mid-2000s heyday, so maybe the market is expecting performance closer to that. I think that is optimistic still, but again who knows.

blueprint · on Aug 18, 2016

IMHO it's not actually possible for something to ever be 100% priced in, which has the surprising meaning that people need to also discuss how much of an event is priced in when they speculate whether something is or isn't priced in.

Obi_Juan_Kenobi · on Aug 18, 2016

People tend to ignore the inherent risk of time when considering something 'priced in'.

If an event is set to happen in an hour, you can be fairly certain it will happen, so it should be strongly 'priced in'. If it's happening months or years out, a lot could happen in the meantime, thus it's not 'fully' priced in. It's not a discrete event, but rather some sort of curve (perhaps sigmoid?) that depends on perceived risk.

vanattab · on Aug 18, 2016

Can you elaborate on your meaning?

blueprint · on Aug 19, 2016

Please see my reply to eximius.

eximius · on Aug 18, 2016

Why is it impossible?

blueprint · on Aug 19, 2016

It can be virtually guaranteed that not everyone who might buy or sell will know about and properly understand a given event.

eximius · on Aug 20, 2016

People can be correct for the wrong reasons, though. Wisdom of the crowds (and the Central Limit Theorem) would make me expect that it would be very close is most cases, with random noise being the most common distance from the truth.

vegabook · on Aug 18, 2016

What you have to keep in mind, is that AMD is valued at a quarter of Nvidia, and at 1/25th of Intel. Therefore the market has a huge scepticism discount built in. AMD doesn't have to hit the ball out of the park with Zen. It just needs to get to 85% area of Intel at same tdp, and the market will call it a winner. If it does 95%, call AMD stock 50% undervalued, minimum.

Obliquely, also remember that while Nvidia is winning on the "pure" GPU front with the awesome 1080 etc, the major problem for Nvidia is that its tech needs a host processor, and its ARM attempts are going nowhere (Nintendo NX notwithstanding). AMD does not face this problem. It's becoming clearer that pure-parallel is not always optimal. Hybrid GPU/CPU architectures have a lot of upside as we are seeing with the Xeon Phi use cases, which smackdown on Nvidia bigtime as soon as you mix even the slightest bit of dependency in your algorithms.

I am very bullish on AMD. I believe its stock has double potential, because the price is so catastrophically pessimistic already. And without even talking market valuations, I think we have had enough of monopoly-style price gouging on Xeon and Tesla.

barkingcat · on Aug 18, 2016

There is an extremely long lag time between research, development, product, and having that reflect in the financial numbers. I'd say it's on the scale of 6 or 8 years. So whatever dire straights or whatever recovery that you see now, was in the works since the early 2010's.

Whatever is happening in the company now is what you will be seeing in 2021 or 2023. Whether they will make it depends on how well the managing team handles that long lead time - for their leaders to give the engineers and product people as much time as possible to keep the company alive until each product comes into being.

phonon · on Aug 18, 2016

Well, Jim Keller returned to AMD August 2012, so I would say around then :-)

vanattab · on Aug 18, 2016

And then left AMD in Sep 2015 to join Tesla in Jan 2016 as Vice President of Autopilot Hardware Engineering.

daveguy · on Aug 18, 2016

After completing the Zen design. From leaked benchmarks it looks like he did an excellent job on the design (yet again) and they have been testing and ramping up for fabrication since he left. After they have milked this design with iterative improvements for 5-10 years maybe he will come back for the next generation.

Tuna-Fish · on Aug 18, 2016

He did not just complete Zen, he completed the high-level design of Zen and it's two successors.

CPUs are designed in a pipelined manner -- once the high-level team finishes up their work and passes it on to the low-level guys, they immediately start working on the next version, with the first version still years away from release. The total cycle from drawing board to store shelves is >5 years, and the skill sets required of high-level designers are very different than those required to turn it into silicon, so this just allows them all to have something to work on.

The long lag time is why chip companies can have disasters like P4 or Bulldozer, and they last so long. When some of the basic assumptions are wrong, the designers won't actually find out until after the design hits silicon, at which point the next two versions are already pretty much locked.

VT_Drew · on Aug 18, 2016

>and are still operating at a loss.

No their not, there Q2 financials had them in black.

mtgx · on Aug 18, 2016

> Does it look like they are going to make it

Probably, but I still think they would do far better with an owner like Qualcomm (granted Qualcomm would still have enough money in the bank to actually do something interesting with AMD after the acquisition).

tcoppi · on Aug 18, 2016

Any acquisition of AMD will be seriously complicated by potentially needing to re-negotiate the patent cross-licensing agreements with Intel. There are lots of companies that could have been great buyers for AMD in the past that have passed or not even looked into it because of this, I don't think it will change for the foreseeable future unfortunately.

arcanus · on Aug 18, 2016

> re-negotiate the patent cross-licensing agreements with Intel

I've always wondered why intel does not buy AMD... is it just because of an expected anti-trust lawsuit? With Nvidia, ARM, IBM, it does not seem that a monopoly in x86 would dominate ALL architectures, but perhaps it would not be viewed that way.

tcoppi · on Aug 18, 2016

Probably anti-trust, otherwise I think Intel would have bought them a few years ago just for the huge discount on their graphics division, which Intel has never seemed to be able to do well.

VanillaCafe · on Aug 18, 2016

This has been the same story of AMD for approximately 20 years.

snuxoll · on Aug 18, 2016

Not having a unified L3 cache is an interesting choice, I can see how it would significantly reduce the cost of the chip and considering many multi-threaded workloads are operating on separate chunks of data chances are it shouldn't incur a noticeable performance penalty (especially in virtualization workloads, I'm interested to see what their 32-core server chip ends up looking like).

gpderetta · on Aug 18, 2016

On the other hand an unified (inclusive) L3 cache helps with maintaining cache coherency, which need to be explicitly handled in a non-unified design.

I guess a big benefit of the separate caches is that if only half cores are in use, you can power half of it down, saving power and TDP.

sliverstorm · on Aug 18, 2016

A unified L3 is expensive in a number of ways. It is large, which means it is geographically remote, as well as slow (for caches, big == slow). This costs lots of access latency.

It also has a bandwidth problem. If 64 threads are vying for access, you either build it with few access ports and it gets choked, or you build it with many access ports which is costly in area, power, & speed.

Two separate peer caches automatically have twice the bandwidth of one similar double-size cache, for the price of NUMA & cache coherency challenges.

There is no one right answer here. Bandwidth is far more important and coherency much easier in a small L1; as you go down the hierarchy, bandwidth needs shrink and coherency is more expensive.

BlackMonday · on Aug 19, 2016

I remember a rumour about a HPC-APU from AMD which would combine 16 Zen cores with a Vega GPU and HBM (High Bandwith Memory) as a L4 cache. I know a L4 cache would be much slower than a L3 cache, but I'm curious, could HBM as a L4 cache be one of the reasons why they didn't use a unified L3 cache?

Disclaimer: I don't know sh*t about hardware design as you can probably guess from my posting. ;o)

snuxoll · on Aug 19, 2016

L4 cache is more used as embedded memory for the on-die GPU, last I checked Intel only included their eDRAM L4 cache on Iris Pro equipped model as any on-die GPU worth its salt is going to be bandwidth constrained even with a relatively low amount of GPU cores.

Same situation with Zen, if they're going to include even a Polaris it would be highly memory constrained if it had to hit system RAM all the time, so another fat chunk of memory on-die will be necessary to not starve it and keep latency down (as it stands the RX 480 can pump 256GB/s).

BlackMonday · on Aug 19, 2016

Yeah, the bandwith problem is already noticeable with AMD's current APU's even though they use small GPU cores compared to discrete grapics cards. Faster DDR3/4 memory brings noticeable FPS improvements. If they already had HBM they would run circles around Intel (which they probably already do unless the competitor is a Iris Pro with eDRAM).

Could the CPU also profit from the HBM memory? The bandwith is much better than with DDR4 main memory (even if it is 2 or 4 channels), and I would guess the latency as well because it would be on the same die?

snuxoll · on Aug 19, 2016

HBM won't be on-die, but it will be on-package - HBM relies on chip stacking to get the desired throughput in a small surface area, regardless the latency and throughput would stomp system DRAM something awful, and if it's a proper L4 cache then the CPU would benefit as well.

IBM does something similar (though not for graphics) in recent POWER CPU's with the Centaur memory controller(s), they are off-chip memory controllers with a bunch of eDRAM to act as a L4 cache (though the difference here is each system has multiple centaur controllers to handle different DIMM slots). They're able to burst to ~96GB/sec to system memory using this, having a good amount of on-package HBM would probably yield similar gains :)

tcoppi · on Aug 18, 2016

Cache coherency might not necessarily be an issue if they treat each Cores->L3 pair as a NUMA node. I don't think they are doing that since we probably would have already heard about it if they are, but AMD has done crazier things before, and they are pretty good at NUMA architectures.

gpderetta · on Aug 18, 2016

what do you mean? NUMA nodes are still coherent.

tcoppi · on Aug 18, 2016

You're right, I don't know what I was thinking.

tcoppi · on Aug 18, 2016

It could even help with low threadcount workloads since the L3 will presumably be able to be fewer cycles away from each core than a unified last level cache would be.

tcoppi · on Aug 18, 2016

The timeframe is slightly disappointing since I think a lot of people were expecting Q3/Q4 2016.

The architecture itself sounds pretty much like what everyone was expecting, a traditional fat and wide core. Their power management and foundry process will probably make the difference as to whether final performance is impressive or not, may also be the cause of the delay.

BlackMonday · on Aug 18, 2016

AMD stated for a while (march or so) that they maybe have small shipments in december, but the bulk of shipments will realy only start in Q1 2017.

Anyway, the first benchmark is promising, and I hope Zen can also keep up with Broadwell performance in other benchmarks/workloads, as well as in power efficency.

maerek · on Aug 18, 2016

As someone who is fascinated by articles like this one, but doesn't have a background in CE/EE, any recommendations for literature/classes I could take so that I can better understand the topics being discussed?

dr_zoidberg · on Aug 18, 2016

Read this link: http://www.lighterra.com/papers/modernmicroprocessors/

It's a good mix between high-level and highly-detailed.

paulmd · on Aug 18, 2016

I very much recommend Agner Fog's Microarchitecture. It's a rather ponderous tome, but it is quite simply the definitive resource on the actual design and performance of real-world x86 CPUs.

It does have a brief introduction on some of the basic execution fundamentals but then it jumps right in, so you will probably need some external introduction if you are not generally familiar with the topic.

http://www.agner.org/optimize/microarchitecture.pdf

tcoppi · on Aug 18, 2016

A computer architecture class. For books, [1] is what you will probably use in any decent computer architecture class, and [2] is a good read from a more general audience perspective, if a bit dated.

1. https://www.amazon.com/Computer-Architecture-Fifth-Quantitat...

2. https://www.amazon.com/Inside-Machine-Introduction-Microproc...

deaddodo · on Aug 18, 2016

I would argue that this is the best: https://www.amazon.com/dp/B00HCLUL5O/ref=dp-kindle-redirect?...

Albeit, slightly older and very technical.

GrumpyYoungMan · on Aug 19, 2016

For a good high-level introduction aimed at technical readers who aren't CompEs, I suggest Jon Stokes "Inside the Machine: An Illustrated Introduction to Microprocessors and Computer Architecture". It's a readable and reasonably in-depth explanation of how a modern processor works without going into the level of detail that a full-blown computer architecture textbook would.

Charles Petzold's "Code: The Hidden Language of Computer Hardware and Software" is also well regarded, but is aimed more at non-technical people.

mark-r · on Aug 18, 2016

I've been a big booster of AMD for a long time, but recently the performance/power is so much in Intel's favor that I've been forced to use Intel for my last couple of PCs. I hope Zen makes them competitive again.

Zardoz84 · on Aug 18, 2016

The most important thing for me. Zen cores have the AMD equivalent of Intel AMT ? (I don't remember the name).

If it haves it, I would avoid it like a pest, and get an FX-8370 or 8350 to replace my now aging FX-4100. The last thing that I like to have on my computer is a hidden uncontrollable CPU doing things that could affect to my privacy.

milcron · on Aug 19, 2016

Unfortunately, it seems impossible to acquire a modern x86/x64 chip without such hidden firmware. The last Intel CPUs without it are from 2008, and the last AMD CPUs without it are from 2013.

If you can tolerate using a different CPU architecture, Raptor Engineering's Talos Secure Workstation looks very intriguing. https://www.raptorengineering.com/TALOS/prerelease.php

rasz_pl · on Aug 19, 2016

>last Intel CPUs without it are from 2008

and those have cpu wide ring0 escalation bug https://www.blackhat.com/docs/us-15/materials/us-15-Domas-Th...

imtringued · on Aug 19, 2016

Why is the power8 architecture not more widely used? The performance is competitive with the intel's Xeon series and for memory or IO bound workloads a power8 CPU is often superior.

SXX · on Aug 18, 2016

What you talking about is Platform Security Processor (PSP) and according to libreboot website it's built in all their CPUs released since 2013.

snuxoll · on Aug 19, 2016

Libreboot is inaccurate on this one, as far as I can tell this is only on the Puma chips right now, but it's likely it will make it's way to other chips as they come out. Newer Bulldozer-derived (Family 15h) cores made today such as the current FX lineup do not have the PSP.

Zardoz84 · on Aug 19, 2016

So, I'm going to grab an FX8350

dmm · on Aug 19, 2016

Get the 8320e at Microcenter for $90 and over clock it. It'll be faster and cheaper.

snuxoll · on Aug 18, 2016

AMD has the Platform Security Processor in the Puma chips (Mullins and Beema) which serves the same purpose as the Intel ME (Management Engine). I would not be surprised if this made its way into Zen.

bitL · on Aug 19, 2016

I just wish AMD made drivers for Win 7 as well - then I could switch from 4-core 4790k/32GB to 8-core ZEN/64GB ECC and keep using all the Adobe video editing stuff.

clevernickname · on Aug 19, 2016

Citation needed. There would have been a huge stink if AMD had stopped supporting Windows 7 already.

bitL · on Aug 19, 2016

http://techreport.com/news/29611/win10-will-be-the-only-wind...

clevernickname · on Aug 20, 2016

What is the practical implication of this? In China Windows XP on modern hardware is still by far the dominant OS, so I would be very surprised if any of the major manufacturers would stop releasing drivers for that, let alone 7.

gruez · on Aug 19, 2016

The article states that it only applies to new CPUs/APUs. I also vaguely remember that Intel's doing the same as well

bitL · on Aug 20, 2016

> only applies to new CPUs/APUs

i.e. Zen

akerro · on Aug 18, 2016

Should I wait for Zen or buy i7 now?

toast0 · on Aug 18, 2016

i7 doesn't tell you anything useful for comparison with Zen. Ask more about if you should wait for Zen or buy Skylake. Or if you should wait for Zen or Kaby Lake or Cannonlake.

i7 just says you're going to get the top of the performance (and price) list for a desktop/mobile processor.

RussianCow · on Aug 18, 2016

Nobody knows if Zen will actually live up to expectations yet, so I would wait until reviews and benchmarks come out, and base your decision on that. I made the mistake of ordering a Bulldozer CPU as soon as they came out.

mtgx · on Aug 18, 2016

Wait. It should be a good chip performance wise, and great value for the money, too. Also, the more we put pressure on Intel, the better it is for everyone in the long term.

daveguy · on Aug 18, 2016

This. Waiting will mean you have a choice of architectures and it is looking like this architecture may pull the Intel prices down too. Used to be Intel's top of the line desktop CPUs were all under $1k because of the amd competition. You can make a better price/power/performance decision with the new competition. At least I hope it is new competition. Unless you need more compute immediately I would wait. Intel's improvements price and performance wise have been lackluster the last couple of years since they are well ahead of AMD. Competition can only improve things.

theandrewbailey · on Aug 18, 2016

Depends on what you have and how urgently you need the greater speed.

My main system is still running a i7-2600 from over 5 years ago. That GTX 680 I have in there is still plenty fast. The upgrade question is: how pretty do I want Star Citizen to be?

noir_lord · on Aug 18, 2016

I'm still rocking an i5-2500 from the same generation.

It is still completely fine for everything I ask of it even against the much newer machine at work, with the upgrade to an SSD a while back it basically felt like a new machine.

gsnedders · on Aug 18, 2016

I upgraded from a i5-2400 to a second-hand i7-3770K around a year ago given they're socket compatible: the difference between them is larger than the difference between the i7-3770K and the i7-6700K, from what I saw looking at benchmarks when I did this. There's certainly workloads where there's a noticeable different in performance.

noir_lord · on Aug 18, 2016

I have an i5-3570K in the work desktop, I really don't notice that much difference though the i7-3770K was/is a beast in comparison, for my workloads I just don't see much benefit in the i7's, I'll likely get another i5, all the ones I've had have been excellent on the $/perf scale going back 5 years or so.

akerro · on Aug 18, 2016

I would like to build a PC that would compile stuff quickly... Android, Java, Spring/Hibernate, some Rust and JS recently. Currently takes a few minutes to build any of my projects on a laptop I have. I think more physical cores will boost it more?

mutagen · on Aug 18, 2016

First question, does that laptop have a SSD? Most developer machines do these days, that's the single biggest improvement to build times you can make. Then look at CPU, IO, memory utilization during builds to see where improvements can be had.

akerro · on Aug 18, 2016

Yes, it comes with top fastest SSD and fast RAM.

theandrewbailey · on Aug 18, 2016

Is that an NVME SSD or SATA SSD? NVME is about 4 times faster than SATA in terms of pure transfer bandwidth, though I'm not sure about random access speeds.

akerro · on Aug 18, 2016

Well. it was the fastest when I bought it 3 years ago, it's SATA. I'm not buying another one, it would be wast of money, better get proper CPU and mobo.

bitL · on Aug 19, 2016

NVMe makes sense only if you are either pushing bandwidth limits (e.g. processing large RAW 14-bit 4K/8K video on a scratch drive) or have hundreds of threads with concurrent I/O operations. In real world, you are barely going to notice any difference between SATA2 and SATA3 SSDs, not to mention M.2 PCIe ones.

imtringued · on Aug 19, 2016

With SATA2 you might as well just use a HDD. Of course I'm assuming you actually optimized how your data is layout on the storage medium to take advantage of the sequential read speed of your HDD, SSD or even RAM. Even my HDD usually reaches 160MB/s so a SATA2 connected SSD is only twice as fast at four to six times the cost. Yes SSDs are better at IOPS but an application that heavily depends on IOPS is often a result of poor design.

http://media.bestofmicro.com/Q/0/378072/original/AS-SSD_Sequ...

bitL · on Aug 19, 2016

For real-world reaching speeds of 250MB/s while having significantly reduced latency comparing to HDD as well as read/write IOPS >60k is what gives you the snappy feel of SSDs. If you are unlucky and have e.g. Sandisk G25 SSDs with write IOPS in 10k range, you'd barely notice any difference to a fast HDD. But if you even get an SSD in a USB stick like Sandisk Extreme 64GB with reasonable IOPS, you can install OS X/Linux/Windows there and it would give you that snappy feel. Bandwidth beyond certain threshold is not what is giving you the snappiness. If you take NVMe that reaches 3000MB/s comparing to SATA3 with 550MB/s, booting time reduces by like 1s - would you really want to pay 2x price when that is the only benefit you'd notice? Or starting your app would take 0.63s instead of 0.67s?

Seriously, invest into NVMe if you are video producer (I can't imagine processing my 4K movies on SATA SSD or HDD, even 24fps playback on SATA SSD can't happen in RAW format as it needs >1MB/s) or you do some heavy I/O server stuff. If you don't do any of the above, invest your $ into capacity instead, i.e. given 512GB NVMe vs 1TB M.2 SATA I'd go with 1TB one.

majewsky · on Aug 19, 2016

> [4K movie] 24fps playback [...] in RAW format [...] needs > 1 MB/s

Actually, much more than that. 3840 * 2160 * 3 bytes * 24 Hz = 569.53125 mebibytes / second

bitL · on Aug 19, 2016

4K CINE 14-bit 24fps RAW - 4096 * 2160 * 5.25 bytes * 24Hz = 1063.125MiB ;-) I should have written >1GB/s in the previous comment :-D

rkalla · on Aug 18, 2016

Generally speaking... "productivity" and media styles of workloads do better with more cores/high memory machines. Gaming typically does better with higher clock speeds (which means less cores in every case I've seen). If you want that in a laptop form factor though, I am not aware of Intel sticking massive-cored chips in the mobile form factor, so your choices will be limited to whatever the fastest i7 is you can pickup... unless you were thinking of building a desktop?

UPDATE - Oh mutagen's point about SSD is absolutely spot on... the faster the storage the better _first_... then worry about the rest of that stuff I mentioned.

arca_vorago · on Aug 18, 2016

I decided to wait for Zen months ago. I sure hope its worth it. My plan is zen beast for vr gamedev/play (on linux, not windows) Cant VR without the computer to support it, and 90fps keeps the VR discomfort away.

daveguy · on Aug 18, 2016

Wait for Zen, but evaluate benchmarks. You can probably still get a cheaper Intel then even if amd doesn't live up to the hype (because at least amd will be closer).

SixSigma · on Aug 18, 2016

Always wait.

Always buy now.

akerro · on Aug 18, 2016

t-thanks...

bitL · on Aug 18, 2016

Am I the only person whom the Zen logo makes cringe? They shouldn't copycat Intel Inside logo there. Really bad taste...

evanriley · on Aug 18, 2016

It's not copying the Intel Inside log, although I'll agree they look _similar_ its most likely based off of the Ensō[0], which is part of _Zen_ Buddhism.

[0] https://en.wikipedia.org/wiki/Ens%C5%8D