AMD has done extremely well with multi-chip(let) modules. Zen cores & zen clusters (on Core Chiplet Die, CCD) are wonderfully small, and a huge amount of the regular stuff cores do is relegated to the IO Die (CCX), which is not as cutting edge.
But wow there's a bunch of power burned on interconnect between CCDs and CCX. And now AMD's new southbridge, Promontory 21, made by ASMedia, is another pretty significant power hog, and the flagship X670 tier is powered by two of these.
There's absolutely a challenge to bring power down. I'm incredibly super impressed by AMD's showing, & they've done very well. But they've been making trade-offs that have pretty large net impacts, especially if we measure at idle power.
Whoof, oops, thanks. I'd been using CCD and IOD as terms until this post, but had an "omgosh, I've been doing it wrong" panic & changed to what we have here. My mistake. Thank you for correcting us back!!
Manufacturing costs force them to go down the chiplet path. It's actually impressive they can remain competitive at all given TSMC's margin of 67%. [0]
If Intel Foundry manages to keep up with TSMC, that's a lot of pricing-power advantage over AMD. Or they could make lower-power CPUs that would be uneconomical for AMD.
Intel can't figure out how to make the next generation of chips. They used to be able to innovate on a steady regular basis, but they're still trying to get 7nm right and they've been working on that for many years while TSMC is on to 3nm. Makes you wonder what happened to that company that they got so far behind.
Why is EBITDA margin better in this context than net margin? Wouldn’t manufacturing facilities have a ton of depreciation and amortization that should be incorporated into the costs?
Does TSMC even have non-bulk customers at 7nm/5nm/4nm? I would have thought the mask costs are so high that it isn't economical except for the biggest companies.
Cerebras has probably bought less than 100 7nm wafers, total. Though they may have gotten many more sales since I last checked. Tesla Dojo is probably about the same. I've seen loads of random chips like that on 7nm. I guess it depends what you consider "bulk".
I'm not up-to-speed on modern chipsets, but WTF is the Southbridge doing that it needs that much power? Is Thunderbolt going through it or something? I think of the Southbridge as a mostly ignorable part of the chipset (up until something goes horribly wrong).
Lots of fast PCIe lanes eat a lot of energy. A server motherboard contains many more PCIe devices than a consumer desktop system, and they're not add-in cards but the small embedded controllers that enable the advanced server features, sitting on the motherboard itself.
Thunderbolt is just a PCIe encapsulator of some sort, which can also do a plethora of other things.
The one contradiction I have to point out here is that server motherboards don't need big southbridges: the CPUs themselves have gobs of PCIe. 1P and 2P AMD Epyc servers get 128 lanes of PCIe straight from the CPUs.
I wish modern chips did a better job of breaking down where power goes. It'd be so interesting to know how much power is going to USB controllers and how much is going to PCIe. I'd also hope that they could do things like shut down parts of the chip if there are no USB or PCIe devices plugged in. But these chips seem to have a pretty high floor of power consumption. Although maybe it's in part because the first examples were flagship motherboards with a whole bunch of extra things peppered across the board - fancy NIC chips, supplementary Thunderbolt controllers, sound cards, wifi - so maybe there was just an unusual amount of extra stuff going on. But it has been shocking seeing idle power rise so much on the modern platforms. It feels like there's a lot of room for improvement in power-down.
There's very little difference in the number of PCIe and M.2 slots between Intel and AMD on their consumer platforms. The only real difference between AM5 and LGA 1700 motherboards is that a lot more AMD boards have one M.2 PCIe 5.0 slot, while only the very top Intel boards have this feature.
AMD has 24 PCIe 5.0 lanes directly from the CPU available for the user, while Intel has 16 5.0 + 4 4.0. Cheaper motherboards might not expose all of those, or downgrade some to 4.0 to save on on-board components. In addition, both have more lanes on the chipset, which is connected via (additional, dedicated) PCIe 4.0 lanes. The best AMD chipsets have 12 4.0 and 8 3.0 lanes, while the best Intel ones have 20 4.0 and 8 3.0 lanes. An important point is that the connection between the chipset and CPU is twice as wide on Intel (8x vs 4x).
So overall, AMD has more and faster IO available directly from the CPU, but fewer lanes from the chipset, and a weaker connection to the chipset. If PCIe 5.0 drives become available and transfer speed to storage is important, I'd say AMD is better; otherwise I'd say Intel has more IO.
Where "top" means "most expensive". The Z790 board I recently purchased for around $300 was pretty barebones and lackluster (no TB, meager IO from ports and headers, wattage constrained VRM relative to 13th gen TDP, etc), but it was the least costly way to work with an Intel proprietary technology.
It'll last another four or five years, but it was my first Intel build since the slocket days, and likely my last.
Bulldozer [0] was a major flop. It sounded too good to be true: two cores sharing the same FPU, on the theory that integrated GPUs should be executing those instructions much faster anyway. Maybe it was just ahead of its time and they simply couldn't deliver.
But it failed hard, which coincided with Intel releasing a major winner with their last planar architecture, Sandy Bridge.
As a result, AMD spent years circling the drain and their stock dipped below 2 dollars. Some people made good money buying around that time.
Energy efficiency and improved packaging are things I can readily agree with. The last thing though - “hybrid algorithms leveraging AI efficiency” sounds an awful lot like a buzzword sales pitch.
This article reminds me of this other one [1] posted about a month ago.
It’s an interview with some guys that just got done building an exascale supercomputer, in which it was originally estimated to need 1000 megawatts but ultimately only needs 60. The reporter asks about zettascale and the power requirements; they wave it off and say that the big question about whether it will even be possible in the next 10 years is getting the chip lithography small enough so that you can physically build a working zettascale supercomputer.
Re: the “hybrid algorithms” bit:
I was at this talk. The example she gave was a physics sim like CFD, iterating between a fast/approximate ML-based algorithm and a slow/accurate classical physics algorithm, with the output of each feeding in as the starting point of the next round. But this was just an example; clearly there are lots of areas where you could apply a similar approach.
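To make that concrete, here's a minimal sketch of that kind of hybrid loop in Python. Everything in it is a made-up stand-in (fast_ml_guess for the learned surrogate, accurate_physics_step for the classical solver); it only shows the alternation she described, not any real CFD or AMD code.

    import numpy as np

    def fast_ml_guess(state):
        # Stand-in for a cheap learned surrogate model (hypothetical).
        return state + 0.1 * np.tanh(state)

    def accurate_physics_step(state, dt=1e-3, steps=100):
        # Stand-in for an expensive classical solver (hypothetical):
        # a few explicit time steps of a toy diffusion-like update.
        for _ in range(steps):
            state = state + dt * (np.roll(state, 1) - 2 * state + np.roll(state, -1))
        return state

    def hybrid_solve(state, rounds=5):
        # Alternate: the ML model gets close quickly, the classical solver
        # refines, and its output seeds the next ML pass.
        for _ in range(rounds):
            state = fast_ml_guess(state)
            state = accurate_physics_step(state)
        return state

    x0 = np.random.rand(64)
    print(hybrid_solve(x0)[:5])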
This has been something I've been incredibly pleased with on the Apple Silicon SoCs. Being able to load large datasets or Blender scenes on a portable, efficient laptop, albeit slowly, and still being able to use the GPU is a nice touch.
Of course performance-wise it doesn't touch the $1k+ graphics cards with crazy amounts of RAM, but for students, and when I need to do something quick on the go, it's a really useful tool.
I expect Xilinx's AI engines to never be integrated into anything AMD. Because Xilinx AI engines are VLIW - SIMD machines running their own instruction set.
----------
AMD is doing the right thing with Xilinx tech: they're integrating it into ROCm, so that Xilinx AI engines / FPGAs can interact with CPUs and GPUs. But there's no reason why these "internal core bits" should be shared between CPU, GPU, and FPGA.
Having worked in HPC, one area where this can be employed is error/bias correction of CFD models. E.g. weather models have various biases that need to be corrected for - so far this is just done with some relatively simple statistics, afaik.
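For anyone curious, the "relatively simple statistics" version can be as basic as fitting a linear map from forecast to observation and applying it as the correction. A toy sketch, with all numbers made up for illustration:

    import numpy as np

    # Toy linear bias correction for a forecast variable (e.g. 2m temperature).
    rng = np.random.default_rng(0)
    truth = rng.normal(15.0, 5.0, size=500)                 # "observed" values
    forecast = 1.1 * truth + 2.0 + rng.normal(0, 1, 500)    # biased model output

    # Fit forecast -> observation with ordinary least squares.
    a, b = np.polyfit(forecast, truth, deg=1)
    corrected = a * forecast + b

    print("raw bias:      ", np.mean(forecast - truth))
    print("corrected bias:", np.mean(corrected - truth))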
AMD already uses a Perceptron for their branch prediction. I would say they're talking about support for ML speed ups in hardware but maybe the plans also include a more complex complete neural net in hardware for data prediction.
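For reference, the textbook perceptron predictor (Jiménez & Lin style) really is tiny: a table of weight vectors indexed by branch address, dotted with the global history. A toy Python sketch of that scheme, not AMD's actual implementation:

    # Toy perceptron branch predictor; illustrative only.
    HISTORY_LEN = 16
    THRESHOLD = 32          # training threshold
    NUM_PERCEPTRONS = 1024  # table indexed by branch address

    weights = [[0] * (HISTORY_LEN + 1) for _ in range(NUM_PERCEPTRONS)]
    history = [1] * HISTORY_LEN  # +1 = taken, -1 = not taken

    def predict(pc):
        w = weights[pc % NUM_PERCEPTRONS]
        y = w[0] + sum(wi * hi for wi, hi in zip(w[1:], history))
        return y, y >= 0  # (confidence, predicted taken?)

    def train(pc, taken):
        y, pred = predict(pc)
        outcome = 1 if taken else -1
        w = weights[pc % NUM_PERCEPTRONS]
        # Only update when wrong or not yet confident enough.
        if (pred != taken) or abs(y) <= THRESHOLD:
            w[0] += outcome
            for i, hi in enumerate(history):
                w[i + 1] += outcome * hi
        history.pop(0)
        history.append(outcome)

    # Example: a branch (hypothetical address) that is usually taken.
    for _ in range(100):
        train(0x400AF0, taken=True)
    print(predict(0x400AF0))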
- Half "hard maths" algorithms. i.e. cominbatorics, geometry, etc.
- Half "fuzzy maths" algorithms. i.e. heuristics, approximation, machine learning.
The idea being to solve the parts that can be easily solved by hard maths with those hard maths so that you can reduce the problem space for when you apply the fuzzy maths to solve the rest of the problem.
In other words, it's taking the problem, breaking out discrete pieces to solve with well established hard maths, using heuristics & numerical solutions to tackle the remaining known problems without "easy" analytical solutions, then using ML to fill in all the gaps and glue the whole thing together.
The article gives a concrete example in the same paragraph: "For example, AI algorithms could get close to a solution quickly and efficiently, and then the gap between the AI answer and the true solution can be filled by high-precision computing."
Interestingly the example is backwards (statistical reasoning first, hard reasoning second) compared to traditional usage of "hybrid" in AI and control contexts.
From the article it looks like half AI guessing the solution, half some static algorithm fixing the result to be better. Not sure how it's supposed to really work.
One variant of this is SciML approaches, where you use an ODE solver wrapped around a NN. The ODE solver guarantees you get the right conservation laws, which NNs don't do well, and the NN is more accurate than the hand-written model since it doesn't ignore the higher-order effects.
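Roughly, the structure is a classical integrator whose right-hand side is "hand-written physics plus a learned residual". The sketch below uses plain RK4 and a random, untrained stand-in for the NN just to show the shape; a real SciML setup would train the residual against data and might use a structure-preserving solver to actually enforce the conservation laws.

    import numpy as np

    def known_physics(x):
        # Hand-written model: simple linear decay (illustrative).
        return -0.5 * x

    # Tiny untrained "neural net" standing in for the learned residual term.
    rng = np.random.default_rng(1)
    W1, b1 = rng.normal(size=(8, 1)), np.zeros((8, 1))
    W2, b2 = rng.normal(size=(1, 8)) * 0.01, np.zeros((1, 1))

    def nn_residual(x):
        h = np.tanh(W1 @ np.atleast_2d(x) + b1)
        return (W2 @ h + b2).item()

    def rk4_step(x, dt):
        # The solver, not the net, dictates how state evolves between steps.
        f = lambda s: known_physics(s) + nn_residual(s)
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

    x, dt = 1.0, 0.1
    for _ in range(50):
        x = rk4_step(x, dt)
    print(x)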
> The last thing though - “hybrid algorithms leveraging AI efficiency” sounds an awful lot like a buzzword sales pitch.
Oddly, to me this just sounds like efficiency gains by potentially introducing massive security holes i.e. the vector of the Meltdown/Spectres of the future. It also seems like they're trying to sell AI as some sort of secular qubit that they'll be error-correcting.
Not a CPU designer, but I remember AMD has been using AI for branch prediction for a long time. Hopefully they still mean AI in that sense and not only as branding.
From the perspective of a consumer, my MacBook Pro is basically the perfect laptop, at least in theory.
I love the battery life and performance of the hardware, not to mention the unrivaled build quality of the MacBook (screen, trackpad, keyboard).
In practice, however, MacOS limits the capabilities of the hardware such that I cannot daily drive my MacBook Pro as a work or personal computer (poor containerization support, an annoying development toolchain, and no _real_ support for video games).
When Asahi Linux is mainlined, stable and features full hardware acceleration - the MacBook running Linux will likely be the best laptop money could buy. Until then, please AMD, Intel, release some mobile hardware that's at least as good. It sucks so bad seeing what is possible with today's technology but that being exclusive to a company unsuccessfully determined to ring fence you into their API ecosystem.
My biggest complaint is that I have a "16-core Neural Engine" in my MacBook, but nothing that can be run on it. Sure, internal macOS tools such as the fingerprint reader or even the webcam might use it, but not a single ML project on GitHub makes use of it. I'm lucky if ML projects don't depend on CUDA to run.
Honestly, I think I'll sell my MacBook Pro and just buy a new one with less cores.
Bought it expecting to be able to abuse it towards ML, but it's barely even supported.
From my experience developing C++ on it, as long as you don't use anything exotic, everything is portable (i.e. just recompile).
On the other hand, for the missing apps and other stuff, I'm running a Linux VM via VMWare Fusion Pro. It works efficiently, and interoperability is good. Just add another internal network card to the VM, and keep an always on SSH/SFTP connection. Then everything works seamlessly.
Never had any problems for 7 or so years when developing that application and writing my Ph.D. in the process.
From my experience developing C++ on it, 99+% of everything is portable, but the ~1% that isn't, causes a disproportionate amount of annoyances and extra work. We still have a program that we have to run in an x86 docker container (with all the problems that brings), just to get it to work on our M1 Macs.
That being said, the M1 Pro is a more than good enough piece of hardware that I'm willing to put up with this.
From my experience, the newest VMWare versions and Linux kernels are very good at conserving power when the VM is mostly idle.
I'm doing that on my 2014 MBP, and it didn't cut the endurance in half, but I need to re-test it for exact numbers. However, it doesn't appear in the "power hungry applications" list unless you continuously compile something or run some service at 100% CPU load. Also, you can limit the resources it can use if you want to further limit it.
I tried a few apple watches, and almost got used to the routine of daily charging, but other annoyances made me try others. Eventually I settled for a Garmin smartwatch and the difference that having to charge once every 10-15 days is huge (you give up a few features, but surprisingly little, at least for my use case). I hope this new emphasis on energy efficiency enables this on laptops (and cellphones)
Works great until you're the author of an application written in Rust and want to distribute MacOS binaries which are automatically generated in CI/CD.
The only (legal) way to compile a Rust binary that targets MacOS is on a Mac. So your CI needs a special case for MacOS running on a MacOS agent. Annoyingly, cross-compiling between CPU architectures doesn't work, so you need both an Intel and an arm64 Mac CI agent - the latter being unavailable via Github actions.
To make things even more bizarre, Apple doesn't offer a server variant of MacOS or Mac hardware, which seems to indicate they expect you to manually compile binaries on your local machine for applications you intend to distribute.
Asahi Linux is introducing support for some brand new Apple Silicon features faster than macOS. The M1 has a virtual GIC interrupt controller for enhanced virtualization performance: Linux supports it, macOS does not. M2 introduced nested virtualization support; the patches for supporting that on Linux are in review, while macOS still doesn't support it.
Apple appears to have a one-of-a-kind special license for ARM (due to being a founder of the company), so they can pick and choose otherwise "required" extensions to support, and add their own extensions as well. You can't directly compare an Apple design to a specific ARM version because of this.
Kernel integration and a virtualized filesystem that isn't bottlenecked by APFS. Docker is excruciating on Darwin systems.
> What are you talking about exactly?
Apple makes hundreds of weird concessions that are non-standard on UNIX-like machines. Booting up a machine with zsh and pico as your defaults is not a normal experience for most sysadmins, nevermind the laundry-list of MacOS quirks that make it a pain to maintain. For personal use, I don't think I'd ever go back to fixing Mac-exclusive issues in my free time.
> no _real_ support for video games
Besides Resident Evil and No Man's Sky (this generation's Tomb Raider and Monument Valley), nobody writes video games for Metal unless Apple pays them to.
For a while, MacOS had a working DirectX translation stack for Windows games, too. Not since Catalina though.
> Kernel integration and a virtualized filesystem that isn't bottlenecked by APFS. Docker is excruciating on Darwin systems.
Docker is great as long as you don't use bind mounts. I use it daily for development in dev containers.
> Besides Resident Evil and No Man's Sky (this generation's Tomb Raider and Monument Valley), nobody writes video games for Metal unless Apple pays them to.
There are plenty of great games on macOS. Factorio, Civilization, League of Legends, Minecraft. But you're right that there aren't too many AAA games.
> the unrivaled build quality of the MacBook (screen, trackpad, keyboard)
Typing this on a ThinkPad X13, after years of MBPs, I beg to differ. The screen on the ThinkPad is better, even at a slightly lower resolution of 1920 x 1200. It's IPS like the Mac of course, but also Anti-Reflective (matte), which Apple hasn't offered since 2008?
The ThinkPad has Left, Right, and Scroll mouse buttons, as well as a TrackPoint stick. Not so on the Mac.
Finally, the keyboard. You're going to laud Apple for their quality keyboards, really? The ThinkPad has a nicer keyboard feel (subjective, I know), has actual Home, End, PageUp, PageDn, and Delete keys, two Ctrl keys, and praise Jesus, gaps between the function keys, so you can use them confidently w/o looking.
Apple has released Rosetta for Linux. I believe the use case is running x86 binaries inside of an Arm virtual machine on Apple Silicon, instead of emulating an entire x86 CPU and running the entire OS as x86. Apparently it works pretty well, and some people have even used it on non-Apple Arm chips. Anyway, I wonder if it could be used in combination with something like Proton to emulate x86 on Linux.
> poor containerization support, an annoying development toolchain
Just run an always-on (headless!) Linux VM in the background, and don't use the host macOS for anything besides desktop apps (Slack, VSCode, Browser, mpv, terminal emulator but always ssh into the VM, etc). The same way you deal with a Windows machine.
This works well enough unless you work on hypervisors or other bare-metal-only tech. But hey, that's currently non-existent on M1 Macbooks (or undocumented and locked to the Apple ecosystem) anyway.
> and no _real_ support for video games
That's the real deal breaker if you are into gaming.
The Windows machine (X86) has WSL2 though, which seamlessly integrates with VSCode. Add Windows Terminal and for me it made Windows the better dev platform. I still prefer Linux though.
Sometimes working locally is just the easiest/fastest/most convenient.
> Just run an always-on (headless!) Linux VM in the background, and don't use the host macOS for anything besides desktop apps (Slack, VSCode, Browser, mpv, terminal emulator but always ssh into the VM, etc).
This is what I do. Specifically, I use Canonical's Multipass, and treat tmux as my "window manager," -- I even mapped my iTerm profile's Command+[key] to the hex code for tmux-prefix+[key], so that Command essentially feels like the Super key in, say, i3. For example, rather than having to type Control+p h, Command+h selects the pane to the left.
The Multipass VM is flawless. Closing the laptop doesn't shut it down, and with the tmux resurrect plugin, sessions persist between Mac restarts (which are rare). If I didn't know better, I'd think it was just a native terminal session.
If I need proper x86_64, I just ssh into my super beefy Linux NAS at home via Tailscale. Both Linux machines are identical in terms of dotfiles/etc, so it feels exactly the same.
I've truly never been happier with Linux. I no longer obsess over my window manager (which used to be a serious time sink for me), and I still get what is IMO the best desktop experience via my Mac. The only tinkering I do is checking out the occasional new neovim plugin, but I really enjoy doing that, as it has a tangible benefit to my dev workflow, and kind of feels like gardening, in a way -- I like the slow but persistent act of improving and culling my environment.
I still have a PC, but I actually recently uninstalled WSL2. It never felt truly finished or "right", and Windows Terminal can be incredibly sluggish -- keystrokes have far more latency than iTerm. I've actually started to embrace just letting "Windows be Windows," even learning Powershell (and enjoying it more than I'd expected).
I've also pretty much moved from gaming on a PC to PS5. So, for me personally, I don't really see a place for Windows anymore. Every single time I boot into my PC, something is wrong -- most recently, it literally won't shut down unless I execute `shutdown /s`, and no amount of troubleshooting has been able to fix it. I know Windows like the back of my hand, and still it's a constant feeling of death by a thousand cuts.
MacOS has seemingly gotten worse over the years not better. Features removed. Interface has been changed for the worse. The drive to unify the desktop experience with iOS. I could go on.
i've used a mac for development for various projects which have been globally impactful for over a decade. it's literally unix, man.
docker on mac can be improved, but if i'm developing for other architectures it's much easier to just test natively. toolchain for all the languages i use is exactly the same as any other *nix.
gaming i'll give you a point, but that's why i have windows dual boot at home. /shrug
Been using Mac OS X since the PowerBook days up to a first-gen MacBook - and I finally gave up: I installed Linux on it instead.
Pretty much all servers were running Linux even back then. Using a non-Linux Unix on your work computer/laptop causes way too many encounters with various quirks and glitches. It continuously drags down your productivity, many times every day.
After much stress, I installed Linux on my Macbook instead - but then I encountered various hardware-related glitches instead.
It’s Unix but it’s not Linux. When Linux is your target it makes sense to use it on your development stack entirely. The irony being that Microsoft seem to make the best Linux OS for development for my use cases of docker / server side / cloud operations.
Yup, been doing that for decades. It's called "vendor lock-in". And its unethicality has been discussed comprehensively for decades as well.
And when countries tried to move to an open document format - they'd bully the country, using the strong arm of Uncle Sam, until they're back to MS Office again.
So when people are amazed by Bill Gates' charities - I don't. His money comes from the sufferings of countries.
I understand that corporations are bastards and built on foundations of the crushed skulls of children and involuntary human sacrifice but Office on the Mac is not a bad product and is improving. But the windows version has just been around longer and had more work done on it. And don’t get me started on LibreOffice - it’s buggy as hell. Even more than Office on a bad day
I’ve met Bill. He wouldn’t be out of place among HN’s defective half: the dubious pro SaaS VC funded US university alumni…
I wasn't talking about the product - I was talking about MS using the product to lock the whole world into its own =proprietary= format (so no one could reliably open & process it - and then others got blamed for it, not Microsoft), AND then aggressively attack those who try to escape from its lock, even countries.
While WSL2 is great for what it is, it just doesn't compare to working in a Linux distro. There are a lot of pain points where external tools won't always work well with WSL.
I also like the productivity customizations possible in Linux, while the same are difficult or impossible on Windows.
Of course, if your work is tied into the Windows ecosystem, having WSL is good.
Controversially, I have a better development experience on Windows using msys2 + zsh (basically "git-bash" on steroids). I would put that development experience almost on par with MacOS.
WSL2's virtualized workflow just causes too many issues for me. WSL1 was better IMO but it wasn't significantly better than msys2 and also had issues (like you still need remote development tools to mount codebases inside editors) - unless you want to run/develop Linux binaries while on Windows.
For anything that isn't making basic non containerized applications (simple web applications, web servers), Windows is pretty good.
For anything more involved, requiring multiple containers/compose/etc, I prefer Linux, as it has the tools I need available natively and no gotchas or performance penalties.
That said, credit to Microsoft on WSL2. The auto-scaling hardware provisioning inside the VM has made containerized workflows on Windows much better. To me, it's just not better than running Linux inside VMWare/Hyper-V/VBox and "DIY"ing WSL2 yourself, something I had been doing for years before WSL2 anyway. WSL2 is more fool-proof than hand-rolling a Linux VM, so there is that.
That's interesting to hear. I've not used Windows in over a decade (I primarily work on mac) but I've heard that WSL was quite compromising and not a great experience for people who primarily use Linux tools.
bash (and other shells), coreutils, pipes, git, text/cli utilities etc. work just fine on WSL, I'd call them Linux tools. My in-shell workflow consists of using mainly those + VSCode (with the WSL plugin) + ssh'ing somewhere now and then, and it's entirely sufficient for this purpose. I haven't tried running typical webserver/db services on it though.
Mac being a certified Unix just means that Unix certification is meaningless, bought and paid for drivel. OS X was a certified Unix before it had atomic renames!
Su is just talking up AMD’s strengths, many of which come care of TSMC and for which the original R&D was largely funded by Apple. She is not wrong, and AMD has certainly made large gains in HPC recently, but AMD does not monopolize all possible paths to success here.
I agree that a lot of AMD's wins have come from TSMC. That being said, I feel like their biggest win over the past 5 years (well, 6 now) was moving to chiplets, which all started at GlobalFoundries. Having the same chiplet, from the lowest-end consumer chip all the way to the top-end server chip, means they can spend less time and cost developing processors for every segment, and they just bin the chips by quality.
Intel is starting to make changes towards this structure but they haven't fully committed to it yet.
A move to chiplets alone would just have prolonged their terminal suffering if they hadn't also roughly replicated Haswell in the first Ryzen chips, making them somewhat performance-competitive and, thanks to chiplets, also economically viable, with the ability to wow consumers with the first 8-core desktop chips and later 12/16/24/32-core ones.
In particular, the RDNA3 graphics card line launch has been a dud.
The Nvidia 4090 turned out to be far ahead of the AMD 7900XTX.
Then there turned out to be an overheating issue on the AMD cards.
And now it just looks like both Nvidia and AMD are price gouging GPU buyers instead of competing with each other. They are deliberately keeping prices high and creating artificial shortages of GPUs, because that's what kept prices high during covid/crypto mining.
I used to really cheer for AMD as the underdog, but I guess it's true that none of these companies are your friend; they're just there to shake you down.
AMD had a chance to really pull ahead of Nvidia by being "the good guy" and actually offering end users great value for money, but instead they've chosen to emulate Nvidia.
Substantial customer good will has been lost by AMD.
Their graphics division has certainly been lacking vs Nvidia's cards, but are still far ahead of Intel's. Their desktop and server CPUs are crushing the market.
> They are deliberately keeping prices high and creating artificial shortages of GPUs
That's simply not true. Check out TechTechPotato's youtube channel for the explanation on this, but under-shipping isn't price fixing. They're just shipping less to distributors because there's less demand, allowing the distributors and retailers to keep a stable amount on hand.
One big reason that demand is low is that the current generation of GPUs is way too expensive, with top-end GPUs about twice what they used to cost a few years ago, very poor improvements in performance-per-$, and anything below the top end offering even more dubious value for money, especially on the AMD side. Originally those high prices were a result of high demand compared to supply, but the companies seem to have gotten greedy and decided they can permanently keep prices high and just throttle back supply to keep them there.
I’m consistently surprised just how good the RX580 really is. It can handle most games on medium at 1440p, but I tend to just chill on ultra at 1080p. Plus I play on TV, so the smaller resolution is actually better from where I’m sitting
RDNA2 (particularly, the narrower range from 6600 to 6800), is a huge step up in performance per watt. The lowest end one in there, 6600, is faster than a vega64 (which is much faster than rx580), yet uses less than half the power.
RDNA3's lower end chips, once they hit the market, are expected to further improve on this.
Most gamers won't upgrade to the current RDNA3 chips, because the current RDNA3 chips are top of the line, expensive, ~300w monsters.
Amen to that. The need for gamers to be on the hardware treadmill is no longer relevant. Five+ year old hardware can still run basically anything, albeit at reduced fidelity.
I keep eyeing a new build, but realistically, I know it’s just a vanity project because so few games will take full advantage of the better hardware. My favorite games in the past years could have run on ten year old hardware.
I think "service" is the key word here: I have an RX 580, and while it's kind of an incredible card in its longevity, it's really creaky at 1440p even with older games.
Performance per watt has really come a long way since GCN.
> Their desktop and server CPUs are crushing the market.
I'm seeing plenty of people choose 13th gen over Zen 4; the platform pricing for Zen 4 just wasn't very attractive. AMD [had to] significantly cut prices across the lineup by 20-30%.
Also worth remembering that for the vast majority of people, any performance differences between Intel and AMD are utterly and absolutely insignificant as to be completely meaningless.
Nobody needs two-digit CPU core counts and 5~6GHz clock speeds to do their emails, communicate on Skype/Discord/Teams/Slack/Zoom/whatever, browse Facebook and Twitter, watch Youtube, and even play some vidja gaemz. An i3 or even a god damn Celeron is perfectly fine.
So at that point, Intel's superior stability (read: less jank) wins out by a hair and otherwise nobody really cares because there's no practical difference. The vast majority of people will just buy whatever's cheaper or just happens to be on the display table that day.
> The Nvidia 4090 turned out to be far ahead of the AMD 7900XTX.
The 7900XTX is on par with or better than the 4090 in a lot of games, with some games favouring Nvidia more than AMD... (obviously not talking about ray tracing).
Coupled with a lower price...
I'm not quite sure what you're talking about, since you're saying the opposite of all the reviews I've seen.
It really depends on the games. In Modern Warfare 2, for example, the 7900XTX outperforms the 4090 at every resolution, while in Fortnite the 4090 outperforms the 7900XTX by even more.
At 4K the 4090 is on average /much/ better than the 7900XTX, but at 1080p/1440p that lead diminishes a lot.
edit: At the end of the day tho, it's all too damn expensive now.
you put "supply glut" in scare quotes, but again, do you know that their motive is cynical?
they're talking to the public, i.e. investors, and they could easily be saying "our current sales figures are lower not because our product is not popular, but because there is currently a large inventory downstream to meet current demand. When that glut is cleared, expect our sales to resume."
if downstream sellers have sufficient inventory, the only way to induce them to buy more would be for AMD to drop prices. If AMD cards are in hot demand and selling out immediately, restricting supply would be artificially boosting prices. But if the downstream pipeline is full, it's not right to say that reduced demand from wholesalers while that glut clears is AMD artificially boosting prices.
They had a chance to offer much better price-to-performance, but IMO Nvidia showed that the market is willing to pay a premium for GPUs, and both major players are exploiting that.
Nvidia relaunched the 4080 12gb as the 4070ti reducing the price by $100. There has to be one hell of a profit margin on the high end cards.
> They are deliberately keeping prices high and creating artificial shortages of GPUs,
I'm not sure if you're referring to AMD "undershipping", but if you read how they use that word it's pretty clearly a bad thing. AMD has been shipping less (to retailers) than what they could, or would like to.
> The Nvidia 4090 turned out to be far ahead of the AMD 7900XTX.
I just don't get why most people care? It's 60% more expensive, too. Even the 7900XTX is ludicrous at $1000.
Give me a $400 card from this generation that competes with a $500 card from the previous generation, and I'll call it a win.
A $1600 card winning anything seems like an irrelevant battle. Is the volume / sales for those cards high enough to be the real focus, when cards like the GTX 1060 were the volume leaders by a long shot?
This is not a completely hashed-out thought. But I'll share it and see what others think.
My impression is that the simplest way to improve energy efficiency is to simplify hardware. Silicon is spent isolating software, etc. Time is spent copying data from kernel space to user space. Shift the burden of correctness to compilers, and use proof-carrying code to convince OSes a binary is safe. Let hardware continue managing what it's good at (e.g., out-of-order execution.) But I want a single address space with absolutely no virtualization.
Some may ask "isn't this dangerous? what if there are bugs in the verification process?" But isn't this the same as a bug in the hardware you're relying on for safety? Why is the hardware easier to get right? Isn't it cheaper to patch a software bug than a hardware bug?
A good reason why memory virtualization has not been "disrupted" yet seems to be fragmentation. Almost all low-level code relies on the fact that process memory is contiguous, that it can be extended arbitrarily, and that data addresses cannot change (see Rust's `Pin` trait). This is an illusion ensured by the MMU (aside from security).
A "software replacement for MMU" would thus need to solve fragmentation of the address space. This is something you would solve using a "heavier" runtime (e.g. every process/object needs to be able to relocate). But this may very well end up being slower than a normal MMU, just without the safety of the MMU.
> This is an illusion ensured by the MMU (aside from security).
Even in places where DMA is fully warranted, IOMMU gets shoe-horned in. I don't think there's any running away from costs to be paid for security (not the least for power-efficiency reasons).
But in this case the job of the hardware is to prevent the software from doing things, and it pays a constant overhead to do so whereas static verification as integrated into a compiler would be a one-time cost.
Arbitrarily complex programs make even defining what is and isn't a bug arbitrarily complex.
Did you want the computer to switch off at a random button press? Did you want two processes to swap half their memory? Maybe, maybe not.
A second problem to consider is that verification is arbitrarily harder than simply running a program -- often to the extent of being impossible, even for sensible and useful functionality. This is why programs that get verified either don't allocate or do bounded allocations. But unbounded allocation is useful
It is possible to push proven or sandboxed parts across the kernel boundary. Maybe we should increase those opportunities?
Also, separate address spaces simplify separate threads -- since they do not need to keep updating a single shared address space. So L1 and L2 cache should definitely give address separation. Page tables are one way to maintain that illusion for the shared resource of main memory... probably a good thing.
That's not to say there isn't a lot of space to explore your idea. It is probably an idea worth following
One final thought: verification is complex because computers are complex. Simplifying how processes interact at the hardware level shifts the burden of verification from arbitrarily long-running, arbitrarily complex, and changing software to verifying fixed, predefined limitations on functionality. That second one has got to be the easier thing to verify.
I like this idea, and given today's technology it feels like something that could be accomplished and rolled out in the next 30 years.
If the compiler (like rust) can prove that OOB memory is never accessed, the hardware/kernel/etc don't need to check at all anymore.
And your proof technology isn't even that scary: just compile the code yourself. If you trust the compiler and the compiler doesn't complain, you can assume the resulting binary is correct. And if a bug/0day is found, just patch and recompile.
The reality is that we do want to run code developed and compiled and delivered by entities we don't fully trust and who don't want to provide us the code or the ability to compile it ourselves. And we also want to run code that can dynamically generate other code while it's doing so - e.g. JIT compilers, embedded scripting languages, javascript in browsers, etc.
Removing these checks from the hardware is possible only if you can do without them 100% of the time; being able to trust 99% of the binaries executed isn't enough, you still need this 'enforced sandboxing' functionality.
Perhaps instead of distributing program executables, we can distribute program intermediate representations and then lazily invoke the OS's trusted compiler to do the final translation to binary. Someone suggested a Vale-based OS along these lines, it was an interesting notion.
I do not believe such OSes can ever be secure, given how often vulnerabilities are found in web browsers' JS engines alone. Besides, AFAIK the only effective mitigation against all Spectre variants is using separate address spaces.
My understanding is that's more or less what Microsoft was looking at in their Midori operating system. They weren't explicitly looking to get rid of the CPU's protection rings, but ran everything ring 0 and relied on their .NET verification for protection.
eBPF does this, but its power is very limited and it has significant issues with isolation in a multi-tenant environment (like in a true multi-user OS). Beyond this one experiment, proof-carrying code is never going to happen on a larger scale: holier-than-thou kernel developers are deathly allergic to anything threatening their hardcore-C-hacker supremacy, and application developers are now using Go, a language so stupid and backwards it's analogous to sprinting full speed in the opposite direction of safety and correctness.
Put another way: if AMD (and especially Intel) don't do something about this they're going to get completely eaten alive by ARM.
The amount of processing power available in a modern smartphone is truly mind-boggling. I'd love to see a chart showing the chip cost and energy cost of the compute power of an M1 chip in each previous year. I would guess that 30+ years ago you'd be in the millions of dollars and watts of power, but that's just a guess.
As we see from the modern M1/M2 Macbooks, these lower TDP SoCs are more than capable of running a computer for most people for most things. The need for an Intel or AMD CPU is shrinking. It's still there and very real but the waters are rising.
> Put another way: if AMD (and especially Intel) don't do something about this they're going to get completely eaten alive by ARM.
AMD’s latest parts are actually quite close to M1/M2 in computing efficiency when clocked down to more conservative power targets.
They crank the power consumption of their desktop CPUs deep into the diminishing returns region because benchmarks sell desktop chips. You can go into the BIOS and set a considerably lower TDP limit and barely lose much performance.
Where they struggle is in idle power. The chiplet design has been great for yields but it consumes a lot of baseline power at idle. M1/M2 have extremely efficient integration and can idle at negligible power levels, which is great for laptop battery life.
People keep repeating that Zen4 and M1 are close in efficiency but what is the source with actual benchmarks and power measurements?
At any rate, using single points to compare energy efficiency isn't a good comparison unless either the performance or the power consumption of the data points is comparable. For instance, the M1's little cores are another 3-5x more efficient, but they operate in an entirely different power class, and Apple's own marketing graphs show the M1 reaches max efficiency well below its max performance [1]
Those perf/power curves are the basis of actually useful comparisons; has anyone plotted some outside of marketing materials? It might even be possible under Asahi.
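Even without Asahi, the shape of a fair comparison is simple: measure several (power, score) points per chip and interpolate to compare at matched performance or matched power. The numbers below are made up purely to show the method:

    import numpy as np

    # Made-up (package power in W, benchmark score) samples for two chips.
    chip_a = [(5, 900), (10, 1400), (20, 1900), (35, 2200), (65, 2450)]
    chip_b = [(8, 800), (15, 1500), (30, 2100), (60, 2500), (120, 2700)]

    def perf_at_power(samples, watts):
        p, s = zip(*samples)
        return np.interp(watts, p, s)

    def power_at_perf(samples, score):
        p, s = zip(*samples)
        return np.interp(score, s, p)

    # Compare at a matched performance target instead of at whatever
    # operating point each vendor happened to ship.
    target = 2000
    print("chip A needs ~%.0f W for score %d" % (power_at_perf(chip_a, target), target))
    print("chip B needs ~%.0f W for score %d" % (power_at_perf(chip_b, target), target))
    print("at 15 W: A=%.0f, B=%.0f" % (perf_at_power(chip_a, 15), perf_at_power(chip_b, 15)))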
Their results are invalid because they used Cinebench. Cinebench uses Intel Embree engine which is hand optimized for x86, not ARM instructions. In addition, Cinebench is a terrible general purpose CPU benchmark.[0]
Imagine if you're testing how energy efficient an EV and a gas car is. But you only run the test in the North pole, where the cold will make the EV at least 40% less efficient. And then you make a conclusion based solely on that data for all regions in the world. That's what using Cinebench to compare Apple Silicon and x86 chips is like.
Cinebench/4D does have "hand-optimized" ARM instructions. It would be a disaster for the actual product if it didn't. That's what makes it interesting as a benchmark: that there's a real commercial product behind it and a company interested in making it as efficient as possible for all customer CPUs, not just benchmarking purposes.
Albeit for later releases this is less true since most customers have switched to GPUs...
> Cinebench/4D does have "hand-optimized" ARM instructions.
It doesn't. As far as I know, everything is translated from x86 to ARM instructions - not direct ARM optimization.
Cinema4D is a niche software within a niche. Even Cinema4D users don't typically use CPU renderer. They use the GPU renderer.
The reason Cinebench became so popular is because AMD and Intel promote it heavily in their marketing to get nerds to buy high core count CPUs that they don't need.
Generally you see this in the lower class chips that aren’t overclocked to within an inch of instability. It’s not uncommon to see a chip that uses 200w to perform 10% worse at 100w, or 20% worse at 70w.
I can’t be bothered to chase down an actual comparison, but usually you’ll see something along those lines if you compare the benchmarks for the top tier chip with a slightly lower tier 65w equivalent.
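Taking those hypothetical figures at face value, the perf-per-watt arithmetic looks like this:

    # Perf-per-watt for the made-up example above (stock = 100% perf at 200 W).
    base_perf, base_power = 1.00, 200.0

    for perf, power in [(0.90, 100.0), (0.80, 70.0)]:
        gain = (perf / power) / (base_perf / base_power)
        print("%.0f%% perf at %3.0f W -> %.1fx perf/W vs stock" % (perf * 100, power, gain))

So "10% worse at half the power" is roughly 1.8x the efficiency, and "20% worse at 70 W" is about 2.3x.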
It's actually this idle power that defines battery drain for most people. All these benchmarks about how fast it can complete a certain compute-intensive task aren't that important, considering that most of the time a laptop is doing almost nothing.
We just stare at an article in a web browser. We look at a text document. We type a bit in the document. An app is doing an HTTP request. The CPU is doing nothing basically.
Once in a while it has to redraw something, do some intense processing of an image or text, but it takes seconds.
It's the 99% spent idling that counts, and there most laptop CPUs suck.
Even when watching a video the CPU is not (should not be) doing much as there are HW co-processors for MPEG-4 decoding built in.
It's quite embarrassing how AMD and Intel have screwed up honestly.
And that's why, so far, AMD's mobile processors have been monolithic and not chiplet-based. That is supposed to change with Zen 4's Dragon Range; however, most of the mobile lineup will still be monolithic, and these high-power/high-performance processors should go exclusively to "gaming" notebooks.
I care a lot about idle power, even on my desktop PC. It seems crazy to me that in 2023 I still need to consider whether maybe I should shut down my computer when I'm not using it.
What should I be buying to not have to ask myself that question?
If you take a Zen 3 running at optimal clocks for efficiency (such as the 5800U), its performance per watt is competitive with the M1 if you account for the difference in node (which TSMC claims gives 30% less power consumption at the same performance). As the article points out, the real efficiency gains will come from domain-specific changes such as shifting to 8-bit for more calculations.
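On the 8-bit point, here's a minimal sketch of symmetric int8 quantization, the kind of domain-specific trick that trades a little precision for 4x less memory traffic and much cheaper multiply-accumulates. Illustrative only:

    import numpy as np

    def quantize_int8(x):
        # Symmetric per-tensor quantization: map [-max|x|, max|x|] onto [-127, 127].
        scale = np.max(np.abs(x)) / 127.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(1024).astype(np.float32)
    q, s = quantize_int8(w)

    print("bytes fp32:", w.nbytes, "bytes int8:", q.nbytes)        # 4096 vs 1024
    print("max abs error:", np.max(np.abs(w - dequantize(q, s))))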
How would x86-64 be as efficient with the same transistor & power budget when they have to run an extra decoder and ring within that budget? Seems physically impossible.
All else being equal, they can't. But the difference isn't as big as some people like to think. For a current high end core, probably low single digit %. And x86-64 has had a lot more effort going into software optimization.
As I understand it, the actual processing part of most chips nowadays is fairly bespoke, with a decoder sitting on top. I doubt decode can make up that large a portion of a chips power consumption (probably negligible next to the rest of the chip?), so other improvements can make up for the difference.
The latest ARM Cortex CPUs (models X2, A715 and A510) drop 32-bit support. Qualcomm actually includes two older Cortex-A710 cores in the Snapdragon 8 gen 2 for 32-bit support. Don't know much about Apple Silicon but didn't they drop 32-bit a couple of years back?
Google has purged 32-bit apps from the official Android app store, but as I understand it the Chinese OEMs that ship un-Googled AOSP ROMs with their own app stores haven't been as aggressive about moving to 64-bit.
Because the more complex decoder is traded in this case for a denser instruction set, which means they can get away with less instruction cache (which is power hungry).
Honestly I don't understand why there's not something like a 256 core ARM laptop with 4TB RAM.
The benefit of ARM is scale of multitasking due to not requiring the same kind of lock states that Intel's architecture requires, and can additionally scale much better than only one physical+virtual core pair.
I guess the only thing that's holding back ARM is Microsoft, as laptops are expected to run an desktop OS that people are comfortable with. Windows RT wasn't really a serious desktop OS and rather a joke made only for some IoT enterprises instead of end-users.
I wish there was more serious hardware than the standard broadcom or MediaTek chips, I'd definitely want some of that...be it as a mini ATX desktop/server format (e.g. as a competitor to Intel NUC or Mac Mini) or as a laptop.
With the ongoing energy crisis something like solar powered servers would be so much more feasible than with x86 hardware.
> Honestly I don't understand why there's not something like a 256 core ARM laptop
The high power ARM cores aren’t that small. If you took the M2 and scaled it up to 256 cores, it would be almost 7 square inches. You can’t just scale a chip like that, though, so the interconnects would consume a huge amount of space as well. It would also consume over 1000W.
The latest ARM chips are great, but sometimes I think the perception has shifted too far past the reality.
7 square inches would also include an enormous GPU and tons of accessories.
The actual cores are about .6/2.3 mm², and local interconnects and L2 roughly double that.
So with just those parts, 256 P-cores would be about 1.5 square inches, and 256 E-cores would be about half a square inch. And in practical terms you can fabricate a die that's a bit more than a square inch.
Of course it wouldn't use 1000 watts. When you light up that many cores at once you use them at lower power. And I doubt a 256 core design would have all that many P cores either.
As a rough estimate, you could take the 120mm² M1 chip, add 28 more P-cores with 110mm², 220 more E-cores with 300mm², 128 more MB of L3 cache with 60mm², 100mm² of miscellaneous interconnects, and still be on par with a high end GPU.
That sounds doable but is pushing it. A 128 core die, though, has nothing stopping it except market fit.
even a 128 core part made like that will perform pretty atrociously. scaling up the core count without scaling the cache count means you have a lot of cores waiting for memory. also when you have 128 cores, you almost certainly need more memory channels to have enough bandwidth.
Could we make the chips go slower like around 1Ghz? Maybe that is not feasible with the current software architecture to achieve great user experience.
> The benefit of ARM is scale of multitasking due to not requiring the same kind of lock states that Intel's architecture requires
I have no idea what you mean by this. The only x86 feature I can think of that might qualify as a 'lock state' is a bus lock that happens when an atomic read-modify-write operation is split over two cache lines. That has a very simple solution ('don't do that'--you have no reason to), and anyway, one can imagine more efficient implementation strategies
> can additionally scale much better than only one physical+virtual core pair
I have no idea what you mean by this either. Wider hyperthreading? It can be worthwhile for some workloads (and e.g. some ibm cpus have 4-way hyperthreading), but is not a panacea; there are tradeoffs involved.
The largest number of high-performance ARM cores you can get in a single socket is the Ampere Altra Max with 128 ARM Neoverse-N1 cores. At 2.6 GHz the processor consumes 190 W, and at 3.0 GHz up to 250 W. This is a server chip, not something you can put in a laptop.
I think because general compute is hard to parallelize, so 256 cores doesn't help much in practice. (Compute that does parallelize well already runs on GPU).
>I guess the only thing that's holding back ARM is Microsoft
It's not Microsoft holding it back. It's Qualcomm.
Apart from their very latest SoC (designed by a bunch of ex-Apple employees, no less), their CPUs are significantly worse than x86 in terms of general performance and have persistently lagged 4 years behind Apple in terms of performance (3 years behind x86). They sell for the same price per unit as x86 CPUs do, so there aren't very many OEMs that take them up on the offer, given the added expense of having to design a completely different mainboard for a particular chassis.
As such, x86 is the only game in town if you're buying a non-Apple machine; Qualcomm's products aren't cheaper and perform much worse outside of having more battery life. Sure, Qualcomm owns Nuvia now, but that acquisition will still take some time to bear fruit.
>> I would guess that 30+ years ago you'd be in the millions of dollars and watts of power but that's just a guess.
30 Years ago I don't think the compute power of a modern phone chip was available at any price, even in super computers.
On a tangential note, there are economists who think this increase in compute is somehow an increase in one of their measures - I don't recall which one. I disagree, because with that logic we all have trillion dollar tech in our pocket. Making a better product over time is expected, it's not some kind of increase in output.
The Top500 supercomputer list started in June 1993, just about 30 years ago. At the top is the CM-5/1024 by Thinking Machines Corporation at Los Alamos National Laboratory with 1,024 cores and peaking at 131.00 GFlop/s (billion floating point operations per second).
It's an Apples to ThinkingMachine Oranges comparison but CPU-Benchmark[1] ranks the Apple A16 Bionic used in the latest iPhones, its GPU - in the "iGPU - FP32 Performance (Single-precision GFLOPS)" section - at 2000 GFlop/s.
GadgetVersus[3] reports a GeekBench score of the A16 Bionic at 279.8 GFlop/s. - SGEMM test of matrix multiplication, it seems.
AnandTech[4] was reporting the A15 architecture ARMv7 came in at 6.1 GFlops in the "GeekBench 3 - Floating Point Performance" table, SGEMM MT test result, in 2015.
Interesting. I would have thought a few GFLOPs today would have been faster than the old super computer, but nope. The GPU is faster though. Still, the phone has both and can run on battery power while fitting in your pocket ;-)
In my experience talking to semiconductors folks, ARM is just not a concern anymore. The future is RISC-V, and ARM is already being seen as legacy tech. ARM's progress in the server space has stalled, the ARM Windows ecosystem is dead, Android has laid the groundwork for a move to RISC-V, and ARM has never and will never touch the desktop market.
> It proves my point beautifully when the only response to my comment
Your comment was beyond ignorant, and wrong. Most folks here are too smart, or busy, to reply to such nonsense. I am neither.
ARM is by far the most shipped and used arch every year. Amazon is even going in heavier on it. It's not legacy tech at all. So a person decided to show you how wrong you were by listing what's probably the most impressive chip in all of our lifetimes, and guess what, it's ARM.
> Obviously I was referring to Linux/Windows workstations
The creator of Linux is using an ARM machine as a workstation today, AFAIK.
> if everyone was smart enough to pick up on that I wouldn't be paid as much as I am
If you're making more than a burger flipper at Wendy's, the world just isn't fair.
The one viable server ARM CPU core is now tied up in a Qualcomm-ARM legal spat and probably won't see the light of day, which made it pretty clear to anyone not grandfathered in like Apple that it's not worth designing your own ARM core. ARM itself has been hemorrhaging employees, both because of better offers from Apple and the RISC-V stealths, and because, since the SoftBank push to get their money back, it has simply become a worse and worse place to work. Their ability to execute is extremely compromised.
Because of the long tail of the hardware industry, the writing can be on the wall long before it's clear based on what you can go out and buy off a shelf today.
I get it that you want RISC-V to succeed - so do I - and to advocate for it but I really don’t understand why it needs this sort of comment about Arm. I see exaggerated criticism of the Arm ISA elsewhere from people who ought to know better too - it’s really CISC, it’s 5000 pages vs 2 for RISC-V etc. It’s just not necessary.
I mean, nothing I said here is exaggerated. ARM doesn't even have a viable server core that can compete with x86, even as vaporware. SoftBank ruins everything they touch, and is super focused at the moment on robbing Peter to pay Paul to get something out of the upcoming ARM IPO since their attempt to sell it off to Nvidia fell through. The rumor is they've been cutting R&D funding hard to temporarily boost profitability. If anything, this is more a dig at how vulture capitalism ruins productive companies.
As an aside, ARM has always been a hybrid CISC/RISC core. It has nothing to do with the number of instructions, but with the fact that not having an I$ on the ARM1 forced it to have microcoded instructions, mainly to support LDM/STM. That's not a dig at ARM. It's a valid design, particularly at that gate count.
You jumped in in support of a comment that said Arm is ‘legacy’ tech. You said they don’t ‘even have a viable server core’. They are ‘haemorrhaging’ staff. Softbank have ‘ruined’ them.
Sounds more apocalyptic than exaggerated tbh.
I still don’t know why you think this is necessary.
The M1/M2 Macs run Linux pretty well. It's not perfect yet, but perfectly usable (especially as a desktop machine!) and support is improving every day.
I believe you're trying to move goalposts to avoid admitting you're wrong.
Graviton, M1/M2, Ampere etc but I’m sure you’ll be able to explain why Arm is seen as ‘legacy’ tech when billions of smartphones are being shipped every year with Arm CPUs.
Oh look, you named 4 areas where ARM development has already peaked. Hyperscalers are already looking to evolve from ARM in the near future, just look at how much attention Ventana got at RISC-V Summit. M1/M2 are Apple ecosystem specific phenomenon that haven't inspired any copycat products. Ampere has been a massive disappointment to everyone in the industry, see the fact that Nuvia had their entire business dead-to-rights pre-acquisition. ARM simply isnt at the cutting edge of the semiconductor industry anymore. Just because Apple and Qualcomm use it to great effect doesn't mean ARM is making any major innovative strides relative to the competition.
> ARM simply isnt at the cutting edge of the semiconductor industry anymore.
What you really mean is Arm isn’t the hot new thing anymore. Well it hasn’t been that for 20 years. Meanwhile billions of arm devices in leading edge nodes are being shipped. Oh well.
If RISC-V support by Microsoft is as bad as it has been for ARM, then I'm afraid RISC-V will never touch the desktop market, at all. Contrary to ARM, which is being pushed there with great success by Apple. Server-wise of course it's a different story...
If great success to you is that they put the M1 and M2 in a tower, I don't know what to tell you. Intel, AMD, and the x86 industrial complex don't care in the slightest what instruction set your Mac runs
Might I suggest taking a step back, re-reading your first comment and all the replies under it, and asking yourself "is it possible I might not be 100% correct, and maybe other opinions have enough merit to be worth considering why people aren't agreeing with me, rather than just changing my argument to make sure I'm still the winner of this thread"?
I’m not sure I expressed my point clearly. It wasn’t quite about Apple. So I will reformulate it here: the fate of any instruction set on the desktop is primarily decided by Microsoft.
Do you have any information that Microsoft is planning to support RISC-V at least as well as x86/x64? (That is to say, not with something like Windows RT, or Windows CE)
That would be tremendously good news, I shall add.
>In my experience talking to semiconductors folks,
Most, if not all, semiconductor “folks” I know are very pragmatic. As in how a Real Engineer should be, unlike software engineers. And in my experience, only HN and the Internet are suggesting that ARM is dead and everything will be RISC-V.
RISC-V is free and open as in libre, in contrast to x86 and ARM, which must be licensed from Intel/AMD and ARM respectively and are thus subject to potential Western economic sanctions.
Now, yes, China will just espionage and kangaroo court their way through and around such legalities anyway, but nonetheless RISC-V is less effort for more reward for China if it becomes at least on par with x86 and ARM.
Put more basically, it's a matter of national security. China can have an entire RISC-V ecosystem indigenously, unlike x86 and ARM.
If the US and/or UK place sanctions on exporting microprocessor technologies to China then that's that. Intel/AMD and ARM are subject to US and UK laws and regulations respectively.
RISC-V by contrast is much, much harder for any given country to regulate because of its free and open nature. At most the US and UK can embargo individual developments made within their jurisdictions, but they can't regulate the entire architecture. RISC-V doesn't have a kill switch named Intel/AMD or ARM.
ARM China is a wildly different animal than ARM. They went rogue a few years back and though SoftBank/ARM did a lot to get things back in line, it still shows up like this:
I'm loving it. I've used cheap RISC-V boards for several of my projects, most notably a GD32V in my keyboard. The equivalent STM32 boards weren't too expensive, mostly in the $10-20 range, but weren't as easily available (and $10 is still 3x the price of the Chinese RISC-V board).
Though the RP2040 has largely ended my cheap RISC-V addiction.
As an experiment quite a few years ago I got a laptop with a special version of the Intel CPU that was not as fast but much more power efficient.
ASUS UL30A-X5
Really an excellent computer, ran linux great (games didn't really exist yet though), and with tuning was coming in under 10W if the display brightness was turned down. First time I was able to get through flights without the system running dead.
I think what's going on in this case is that higher temperatures increase leakage and resistance in a chip, and therefore lower its efficiency. If you can keep it cool, you can keep it more efficient. The move seems like a necessary one: a computer as powerful as that UL30A is probably inside your phone now (if you turn off the radio and display), and that laptop still had a giant battery and only lasted 10-12 hours.
I've seen AMD do some pretty impressive things, I wouldn't count them out. They're at least willing to attempt to compete on price.
From what I've seen, the CPU/ALU/decode at the center being ARM or x86 may make less difference than you think. The amount of circuitry and silicon area (correlated with power) outside the core is very significant: MMUs, vector units, a complex cache hierarchy, high-speed IO (DDR, PCIe, you name it), and the extremely complex network on chip (Infinity Fabric) that enables CPU interconnectivity. Look at the IO die size vs the CCD size. As one poster pointed out, chiplets have great advantages, but there is a power hit. Thankfully newer tech is bringing that power down too. I'd love to see a power breakdown of a full chip to see what % is attributed to the CPU core itself.
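For a rough first pass at that breakdown, here's a minimal sketch that samples the energy counters some Linux kernels expose through the powercap/RAPL sysfs interface. The domain names, their granularity, and the 5-second interval below are assumptions that vary a lot by platform and kernel, so treat it as a starting point rather than a proper tool:

    # Sample RAPL energy counters via Linux's powercap interface and print
    # average power per domain over a short interval. Counter wraparound and
    # unreadable domains are ignored for brevity.
    import glob, time

    def read_domains():
        domains = {}
        for d in glob.glob("/sys/class/powercap/intel-rapl*"):
            try:
                name = open(d + "/name").read().strip()
                energy_uj = int(open(d + "/energy_uj").read())  # microjoules
                domains[d] = (name, energy_uj)
            except (OSError, ValueError):
                continue
        return domains

    INTERVAL = 5.0
    before = read_domains()
    time.sleep(INTERVAL)
    after = read_domains()

    for path, (name, e0) in sorted(before.items()):
        if path in after:
            watts = (after[path][1] - e0) / 1e6 / INTERVAL
            print(f"{name:12s} {watts:6.2f} W")

It won't separate the USB controllers from the PCIe lanes, but on machines that expose both a core domain and a package domain, comparing the two at idle already says a lot.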
In my own experience, the supposed ARM chip superiority claims are almost entirely marketing. I get significantly better performance (15-50%) from nearly all of my CPU workloads on modern Intel/AMD hardware vs the ARM Apple devices.
When I was an undergrad, one of my professors was exploring “approximate” computation (I forget what it was technically called). The gist was that you build mathematical circuitry that approximates an answer instead of giving you an exact one (kinda like floating point, but also applied to Boolean algebra, integer math, etc.). The reasoning was that the approximation could let you reduce the power. I wonder where that line of research has gone.
It's used inside Google's TPUs [0]. Useless for regular logic programming (you wouldn't want a nuclear power plant, or your payment processor, to work only approximately), but it has found a use case in scaling ML pipelines, where the accuracy of individual values matters less than billions of parameters in aggregate.
I don't know that it's “useless” for logic programming. I agree it's less likely to be used for normal everyday stuff, but it could see more proliferation (e.g. video decoding). I think TPUs are an interesting first step, but the holy grail (which indeed may be impossible) is to be able to reduce large-scale programs to run on approximate circuitry. For example, if Chrome used a fraction of the memory, CPU, and power, that would be significant, even though, if executed well, no one would notice that the page renders slightly imperfectly. As the paper notes, TPUs are the first application, although I think what we're doing there is probably quite primitive compared to what the researchers in the area are working on long term.
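For anyone who hasn't seen the idea before, here's a toy software model of one common flavor of approximate arithmetic: a "lower-part OR" style adder that simply skips carry propagation in the low bits (in hardware, that means a shorter critical path and less switching). The bit widths and error metric are arbitrary choices for illustration, not any particular published circuit:

    # Approximate adder: OR the low k bits instead of adding them, add the
    # high bits exactly, then measure how much accuracy that actually costs.
    import random

    def approx_add(a, b, k=8):
        mask = (1 << k) - 1
        low = (a | b) & mask                 # no carry chain in the low bits
        high = ((a >> k) + (b >> k)) << k    # exact add, low-part carry dropped
        return high | low

    rel_errors = []
    for _ in range(100_000):
        a, b = random.getrandbits(31), random.getrandbits(31)
        exact = a + b
        rel_errors.append(abs(exact - approx_add(a, b)) / max(exact, 1))

    print(f"mean relative error: {sum(rel_errors) / len(rel_errors):.2e}")

The answer is slightly wrong in the low-order bits, which is the trade being described: fine for pixels or neural-network weights, completely unacceptable for your payment processor.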
Happy to hear that AMD takes these issues seriously. The power draw of current Intel and Nvidia chips is seriously getting out of hand. There is absolutely no justification for drawing 40-50 watts of power for rendering a website!
They could score a quick win for power efficiency by making the Ryzen 9 5900 and Ryzen 9 3900 12-core 65W CPUs generally available instead of only selling them to OEMs.
If your production process and binning don't create enough supply of these parts, you only sell them to OEMs, because OEMs can silently discontinue or limit availability of these models when parts can't be found.
Spotty availability at retailers, however, would be a PR disaster and would cause wild rumors to spread, from "AMD doesn't want you to have this CPU" to "AMD is going bankrupt".
Sure. But since AMD has been producing these CPUs for a while, their yield should be good by now, shouldn't it?
Anyway, I hear some people saying you can just buy the 105W "X" variant and limit the power usage to 65W in the BIOS. Does it really give the same result, including for idle power consumption?
Chips that run at a stable low frequency at low power and chips that peak at high frequency at low power come from different bins. The former go to servers and the latter to enthusiast laptops.
It's amazing how you can power-limit the newest Ryzen to only 105 W and get big energy savings for much smaller performance reductions.
But the review game (run by the same reviewers who complain about power usage) drives AMD/Intel to chase the last 5% of performance for a 20% increase in power.
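A quick back-of-envelope shows why that last few percent is so expensive. Using the usual dynamic-power approximation P ≈ C·f·V², with voltage scaled roughly linearly with frequency so power goes roughly as f³ (a simplification; real V/f curves are even worse near the top of the range):

    # Illustrative only: power cost of small frequency bumps under P ~ f^3.
    for bump in (0.05, 0.10, 0.15):
        power_increase = (1 + bump) ** 3 - 1
        print(f"+{bump:.0%} clock -> ~+{power_increase:.0%} power")
    # +5% clock  -> ~+16% power
    # +10% clock -> ~+33% power
    # +15% clock -> ~+52% power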
I don't think Intel or Nvidia got the memo on this. Both are pumping out seemingly more and more desperate products, where their solution to more performance is just to throw more power at it, causing obscene levels of heat in the process.
Meanwhile, down in the mid-range, you've got Apple chipping away at them with ultra-efficient ARM chips.
What? The 40-series cards are the most energy-efficient graphics cards ever made. Ignore the 4090; they just scaled the top end to have high power and high performance. Look at the lower tiers: they all have significant power-requirement reductions compared to the 30-series cards. Watts per fps is down.
When I worked on the London Underground, we investigated regenerative braking and found it wasn't worth it for the power savings, BUT it was worth it to reduce temperatures on platforms (LU applied a £ value to passenger comfort for modelling purposes).
Interesting to see the same fundamental issue at the nano scale...
Right? Like, isn't that one of the reasons people have been raving about Apple's MX chips? Because they're very energy efficient and have good performance characteristics?
Looking forward to this. It seems lately you can only get <35W x86 CPUs through laptops or some very niche vendors, or even mystery sellers on AliExpress. I'd like something that idles at sub-5W like ARM SBCs do, but without the whole OS or boot jank.
I'm pretty sure people will suggest x86 options that fit that description, but again, it looks like you have to scrape the internet for stuff like the Intel 1[2,3][1-9]00T or whatever the AMD alternative is.
Anecdotally, I use a CPU from an AliExpress mystery seller in my desktop machine. It's an i9-9980HK with about a 40W TDP...
Which I unlocked to ~300W and installed a water-cooling system.
It can idle at around 5W, but a synthetic load (e.g. prime95) quickly makes it draw the full 300W and then throttle a bit. Fun stuff, but I definitely wouldn't do it again due to the enormous strain on the PSU and an unreasonably high power draw relative to performance.
There are many, many machines out there with soldered x86 CPUs, all of them idling at less than 10W, e.g. the ASUS PN series with Celeron CPUs. You can even find them in physical stores. I have a PN40 that idles at less than my RPi 4B...
Has the environmental impact of less efficient processing been studied? I've been thinking about this since seeing those stories a while back about bitcoin miners collectively having the same power consumption as some smaller countries.
If anyone knows a dataset suggesting what % of world energy usage is used by computing hardware (and what proportion of that hardware is idle vs fully utilised) I'd love to see it.
Also, less on topic, what's the environmental impact of writing your app in something programmer friendly but power inefficient? I strongly suspect some big tech companies will be suffering with this (namely anyone who is still on RoR at scale).
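I don't have a dataset, but the shape of the estimate is simple to sketch. Every number below (fleet size, average draw, PUE, grid carbon intensity) is a placeholder assumption, not data:

    # Back-of-envelope: yearly energy and CO2 for a hypothetical server fleet.
    servers = 1_000                 # assumed fleet size
    avg_power_w = 200               # assumed average draw per server at the wall
    pue = 1.4                       # assumed datacenter power usage effectiveness
    kg_co2_per_kwh = 0.4            # assumed grid carbon intensity

    kwh_per_year = servers * (avg_power_w / 1000) * 24 * 365 * pue
    tonnes_co2 = kwh_per_year * kg_co2_per_kwh / 1000
    print(f"{kwh_per_year:,.0f} kWh/yr ~= {tonnes_co2:,.0f} t CO2/yr")
    # ~2,450,000 kWh/yr ~= ~980 t CO2/yr with these assumptions.
    # A runtime that is 2x less efficient roughly doubles both figures,
    # to the extent the workload is actually CPU-bound.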
AMD CEO, Dr. Lisa Su, has identified energy efficiency as the next challenge for the company. With the growing demand for high-performance computing, it is crucial to ensure that energy consumption is minimized to reduce the environmental impact and operating costs. AMD has already made significant progress in this area with their latest processors that deliver excellent performance while consuming less power. Dr. Su's focus on energy efficiency underscores AMD's commitment to sustainability and innovation, and we can expect to see further developments in this area in the coming years.
The last time a chip company talked about "efficiency" in this way, we got 10+ years of 5% increments in performance and 50% price increases for 2- and 4-core CPUs.
I know it's my PTSD; the words make sense, but I really hope this is not code for "we can't get much more performance out of our architectures, so we'll focus on selling 'efficiency'..."
I really hope I'm wrong, since the power consumed by these chips has gotten a bit out of hand (though not as much as the GPUs...).
Yes, but real-world performance (aka compute speed) has been like desktop Linux: "it's here, but not really" (we're talking about general desktop, laptop, and server here, not mobile).
Having said that, Apple's M1 was the thing that really changed matters, and ARM on the server is starting to actually be mainstream, so I get why AMD and Intel are sweating. I just hope they can pull it off, because a world where you need to buy a $1000+ aluminum box attached to the CPU you want is not a better world... or worse: doing business with Qualcomm.
Desktop Linux has been better than Windows out of the box for more than a decade now. Unless you're talking about market share, which will never increase without a hundred million dollar marketing blitz behind it.
My ThinkPad X1 Nano Gen 2 with Alder Lake gets only 50% of the battery runtime the X1 Nano with the 1160G7 managed, for pretty much the same performance. Power-constrained to, say, 5 or 7W, the 1160G7 feels faster.
I hope Lenovo can offer something as light as an X1 Nano with AMD inside. Technically it should be both possible and feasible, given that the AMD CPUs are much more efficient/performant at low power levels.
So.... that means an Apple M1 killer is in the works? Anybody? That's the type of energy efficiency everyone can get onboard with.
Also, IMO software needs to take a lot of responsibility for energy efficiency, we just pawn that off on the hardware vendors. I wonder what the carbon cost of javascript is, I don't think I'd want to see the results, or python for that matter.
We can see in practice the gains in energy efficiency that come from ever-closer integration: from short-distance interconnects, to chiplets, to everything on one die. That's our mobile phones, and now the Apple M1/M2 laptops, whose motherboards pack everything into something close to the size of a phone board.
Will neuromorphic chips ever go mainstream amongst AI practitioners? The neurons on these chips are spiking, which is a whole different paradigm from what is currently used in neural networks. These chips are, however, a thousandfold more efficient.
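For anyone curious what "spiking" means in practice, here's a minimal leaky integrate-and-fire neuron, the standard toy model behind most neuromorphic hardware. The constants are made up for illustration and don't correspond to any particular chip; the key point is that downstream work only happens when a spike fires, which is where the claimed efficiency comes from:

    # Leaky integrate-and-fire: the membrane potential leaks toward the input,
    # and a spike is emitted (then the potential reset) on threshold crossing.
    def lif_neuron(input_current, dt=1.0, tau=20.0, v_thresh=1.0, v_reset=0.0):
        v = 0.0
        spike_times = []
        for t, i in enumerate(input_current):
            v += (dt / tau) * (-v + i)     # leaky integration toward the input
            if v >= v_thresh:
                spike_times.append(t)
                v = v_reset
        return spike_times

    # A constant drive above threshold yields a regular spike train:
    print(lif_neuron([1.5] * 100))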
I've been using a Zen 2 notebook for the past few years, and was honestly surprised by the processor's performance for the first... 3 or 4 months. Then typical updates happened, and some quirks got in the way too, like it having a decent iGPU that I can't use because the BIOS will only let me pick one GPU (down to it not having a mux to switch between them? Not sure what's going on there, just that I can't do it). In general it's a great machine, but there's a host of little details that make the experience a bit worse than it should be.
Per my usual upgrade schedule, I'm looking down the line to at least a Zen 5(+?) upgrade in a few years' time, so I hope they improve these kinds of things in the future. However, that's largely up to OEMs deciding to make a good product, and a bit out of AMD's grasp.
The only reason ARM is competitive with x86 is its heavy borrowing of x86's tricks (e.g. out-of-order execution). At the end of the day, Apple M2 and AMD Ryzen cores are not all that dissimilar.
I have never seen a realistic proposal for how this would make a practical consumer PC work better.
Having a clock that every component can agree upon means that components don't have to worry about each other anymore. Physics and information theory would suggest that removal of this centralized clock signal necessarily introduces additional latency in order to safely determine or modify system state.
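As a toy illustration of that latency argument, compare the cost per word of a clocked transfer against a four-phase request/acknowledge handshake, which needs four control-wire transitions to safely complete each transfer. All delays below are invented numbers, just to show the shape of the trade-off:

    # Clocked transfer: one clock period per word.
    # Four-phase handshake: req up, ack up, req down, ack down, each costing
    # one control-wire delay before the next transfer can start.
    CLOCK_PERIOD_NS = 1.0      # assumed synchronous clock period
    WIRE_DELAY_NS = 0.4        # assumed one-way control-wire delay

    def clocked_ns(words):
        return words * CLOCK_PERIOD_NS

    def handshake_ns(words):
        return words * 4 * WIRE_DELAY_NS

    for n in (1, 64, 4096):
        print(f"{n:5d} words: clocked {clocked_ns(n):8.1f} ns, "
              f"handshake {handshake_ns(n):8.1f} ns")

Async designs only come out ahead when the handshake delays are much shorter than the worst-case clock period, which is part of why the idea is attractive on paper but has been hard to make pay off in a general-purpose PC.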
It's typical for a company that is stuck to call out what it should have been doing for the last decade as the thing to focus on now. Basically she's promising more of the same.
What she should be doing is announcing the next big thing. I imagine that might include trendy things like tackling AI with a hardware/software stack that is energy- and cost-efficient and competitive with Nvidia, or a non-Intel-architecture chip intended for high-end gaming/AR/VR devices, where energy efficiency and performance are going to matter more than compatibility with legacy PC hardware. AR is going to suck if you have to be tethered to a huge power supply or battery and carry a liquid-cooling apparatus with you. This requires a different approach.
And even those things really should have been the focus for the last ten years. A slightly faster version of the thing they've been selling for the last ten years is not going to turn things around, and there are only so many people still assembling PCs from parts who actually know and appreciate AMD as a brand.