AMD has done extremely well with multi-chip(let) modules. Zen cores & zen clusters (on Core Chiplet Die, CCD) are wonderfully small, and a huge amount of the regular stuff cores do is relegated to the IO Die (CCX), which is not as cutting edge.
But wow there's a bunch of power burned on interconnect between CCDs and CCX. And now AMD's new southbridge, Promontory 21, made by ASMedia, is another pretty significant power hog, and the flagship X670 tier is powered by two of these.
There's absolutely a challenge to bring power down. I'm incredibly super impressed by AMD's showing, & they've done very well. But they've been making trade-offs that have pretty large net impacts, especially if we measure at idle power.
Whoof, oops, thanks. I'd been using CCD and IOD as terms until this post, but had an "omgosh, I've been doing it wrong" panic & changed to what we have here. My mistake. Thank you for correcting us back!!
Manufacturing costs force them to go down the chiplet path. It's actually impressive they can remain competitive at all given TSMC's margin of 67%. [0]
If Intel Foundry manages to keep up with TSMC, that's a lot of pricing-power advantage over AMD. Or they could make lower-power CPUs that would be uneconomical for AMD.
Intel can't figure out how to make the next generation of chips. They used to be able to innovate on a steady regular basis, but they're still trying to get 7nm right and they've been working on that for many years while TSMC is on to 3nm. Makes you wonder what happened to that company that they got so far behind.
Why is EBITDA margin better in this context than net margin? Wouldn’t manufacturing facilities have a ton of depreciation and amortization that should be incorporated into the costs?
Does TSMC even have non-bulk customers at 7nm/5nm/4nm? I would have thought the mask costs are so high that it isn't economical except for the biggest companies.
Cerebras has probably bought less than 100 7nm wafers, total. Though they may have gotten many more sales since I last checked. Tesla Dojo is probably about the same. I've seen loads of random chips like that on 7nm. I guess it depends what you consider "bulk".
I'm not up-to-speed on modern chipsets, but WTF is the Southbridge doing that it needs that much power? Is Thunderbolt going through it or something? I think of the Southbridge as a mostly ignorable part of the chipset (up until something goes horribly wrong).
Lots of fast PCIe lanes eat a lot of energy. A server motherboard contains many more PCIe devices than a consumer desktop system, and they're not add-in cards but the small embedded controllers that enable the advanced server features, sitting on the motherboard itself.
Thunderbolt is just a PCIe encapsulator of some sort, which can also do a plethora of other things.
The one contradiction I have to point out here is that server motherboards don't need big southbridges: the CPUs themselves have gobs of PCIe. 1P and 2P AMD Epyc servers get 128 lanes of PCIe straight from the CPUs.
I wish modern chips did a better job of breaking down where power goes. It'd be so interesting to know how much power is going to USB controllers and how much is going to PCIe. I'd also hope that they could do things like shut down parts of the chip if there are no USB or PCIe devices plugged in. But these chips seem to have a pretty high floor of power consumption. Although maybe it's in part because the first examples were flagship motherboards with a whole bunch of extra things peppered across the board - fancy NIC chips, supplementary Thunderbolt controllers, sound cards, wifi - so maybe there was just an unusual amount of extra stuff going on. But it has been shocking seeing idle power rise so much on the modern platforms. It feels like there's a lot of room for improvement in power-down.
There's very little difference in the number of PCIe and M.2 slots between Intel and AMD on their consumer platforms. The only real difference between AM5 and LGA 1700 motherboards is that a lot more AMD boards have one M.2 PCIe 5.0 slot, while only the very top Intel boards have this feature.
AMD has 24 PCIe 5.0 lanes directly from the CPU available for the user, while Intel has 16 5.0 + 4 4.0. Cheaper motherboards might not expose all of those, or downgrade some to 4.0 to save on on-board components. In addition, both have more lanes on the chipset, which is connected via (additional, dedicated) PCIe 4.0 lanes. The best AMD chipsets have 12 4.0 and 8 3.0 lanes, while the best Intel ones have 20 4.0 and 8 3.0 lanes. An important point is that the connection between the chipset and CPU is twice as wide on Intel (8x vs 4x).
So overall, AMD has more and faster IO available directly from the CPU, but fewer lanes from the chipset, and a weaker connection to the chipset. If PCIe 5.0 drives become available and transfer speed to storage is important, I'd say AMD is better; otherwise I'd say Intel has more IO.
Where "top" means "most expensive". The Z790 board I recently purchased for around $300 was pretty barebones and lackluster (no TB, meager IO from ports and headers, wattage constrained VRM relative to 13th gen TDP, etc), but it was the least costly way to work with an Intel proprietary technology.
It'll last another four or five years, but it was my first Intel build since the slocket days, and likely my last.
Bulldozer [0] was a major flop. It sounded too good to be true: two cores sharing the same FPU, on the theory that integrated GPUs should be executing those instructions much faster anyway. Maybe it was just ahead of its time and they simply couldn't deliver.
But it failed hard, which coincided with Intel releasing a major winner with their last planar architecture, Sandy Bridge.
As a result, AMD spent years circling the drain and their stock dipped below 2 dollars. Some people made good money buying around that time.
Energy efficiency and improved packaging are things I can readily agree with. The last thing though - “hybrid algorithms leveraging AI efficiency” sounds an awful lot like a buzzword sales pitch.
This article reminds me of this other one [1] posted about a month ago.
It’s an interview with some guys that just got done building an exascale supercomputer, in which it was originally estimated to need 1000 megawatts but ultimately only needs 60. The reporter asks about zettascale and the power requirements; they wave it off and say that the big question about whether it will even be possible in the next 10 years is getting the chip lithography small enough so that you can physically build a working zettascale supercomputer.
Re: the “hybrid algorithms” bit:
I was at this talk. The example she gave was a physics sim like CFD, iterating between a fast/approximate ML-based algorithm and a slow/accurate classical physics algorithm, with the output of each feeding in as the starting point of the next round. But this was just an example; clearly there are lots of areas where you could apply a similar approach.
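To make that concrete, here's a minimal sketch of that kind of hybrid loop in Python. Everything in it is a made-up stand-in (fast_ml_guess for the learned surrogate, accurate_physics_step for the classical solver); it only shows the alternation she described, not any real CFD or AMD code.

    import numpy as np

    def fast_ml_guess(state):
        # Stand-in for a cheap learned surrogate model (hypothetical).
        return state + 0.1 * np.tanh(state)

    def accurate_physics_step(state, dt=1e-3, steps=100):
        # Stand-in for an expensive classical solver (hypothetical):
        # a few explicit time steps of a toy diffusion-like update.
        for _ in range(steps):
            state = state + dt * (np.roll(state, 1) - 2 * state + np.roll(state, -1))
        return state

    def hybrid_solve(state, rounds=5):
        # Alternate: the ML model gets close quickly, the classical solver
        # refines, and its output seeds the next ML pass.
        for _ in range(rounds):
            state = fast_ml_guess(state)
            state = accurate_physics_step(state)
        return state

    x0 = np.random.rand(64)
    print(hybrid_solve(x0)[:5])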
This has been something I've been incredibly pleased with on the Apple Silicon SoCs. Being able to load large datasets or Blender scenes on a portable, efficient laptop, albeit slowly, and still being able to use the GPU is a nice touch.
Of course performance-wise it doesn't touch the $1k+ graphics cards with crazy amounts of RAM, but for students, and when I need to do something quick on the go, it's a really useful tool.
I expect Xilinx's AI engines to never be integrated into anything AMD. Because Xilinx AI engines are VLIW - SIMD machines running their own instruction set.
----------
AMD is doing the right thing with Xilinx tech: they're integrating it into ROCm, so that Xilinx AI engines / FPGAs can interact with CPUs and GPUs. But there's no reason why these "internal core bits" should be shared between CPU, GPU, and FPGA.
Having worked in HPC, one area where this can be employed is error/bias correction of CFD models. E.g. weather models have various biases that need to be corrected for - so far this is just done with some relatively simple statistics, afaik.
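For anyone curious, the "relatively simple statistics" version can be as basic as fitting a linear map from forecast to observation and applying it as the correction. A toy sketch, with all numbers made up for illustration:

    import numpy as np

    # Toy linear bias correction for a forecast variable (e.g. 2m temperature).
    rng = np.random.default_rng(0)
    truth = rng.normal(15.0, 5.0, size=500)                 # "observed" values
    forecast = 1.1 * truth + 2.0 + rng.normal(0, 1, 500)    # biased model output

    # Fit forecast -> observation with ordinary least squares.
    a, b = np.polyfit(forecast, truth, deg=1)
    corrected = a * forecast + b

    print("raw bias:      ", np.mean(forecast - truth))
    print("corrected bias:", np.mean(corrected - truth))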
AMD already uses a Perceptron for their branch prediction. I would say they're talking about support for ML speed ups in hardware but maybe the plans also include a more complex complete neural net in hardware for data prediction.
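For reference, the textbook perceptron predictor (Jiménez & Lin style) really is tiny: a table of weight vectors indexed by branch address, dotted with the global history. A toy Python sketch of that scheme, not AMD's actual implementation:

    # Toy perceptron branch predictor; illustrative only.
    HISTORY_LEN = 16
    THRESHOLD = 32          # training threshold
    NUM_PERCEPTRONS = 1024  # table indexed by branch address

    weights = [[0] * (HISTORY_LEN + 1) for _ in range(NUM_PERCEPTRONS)]
    history = [1] * HISTORY_LEN  # +1 = taken, -1 = not taken

    def predict(pc):
        w = weights[pc % NUM_PERCEPTRONS]
        y = w[0] + sum(wi * hi for wi, hi in zip(w[1:], history))
        return y, y >= 0  # (confidence, predicted taken?)

    def train(pc, taken):
        y, pred = predict(pc)
        outcome = 1 if taken else -1
        w = weights[pc % NUM_PERCEPTRONS]
        # Only update when wrong or not yet confident enough.
        if (pred != taken) or abs(y) <= THRESHOLD:
            w[0] += outcome
            for i, hi in enumerate(history):
                w[i + 1] += outcome * hi
        history.pop(0)
        history.append(outcome)

    # Example: a branch (hypothetical address) that is usually taken.
    for _ in range(100):
        train(0x400AF0, taken=True)
    print(predict(0x400AF0))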
- Half "hard maths" algorithms. i.e. cominbatorics, geometry, etc.
- Half "fuzzy maths" algorithms. i.e. heuristics, approximation, machine learning.
The idea being to solve the parts that can be easily solved by hard maths with those hard maths so that you can reduce the problem space for when you apply the fuzzy maths to solve the rest of the problem.
In other words, it's taking the problem, breaking out discrete pieces to solve with well established hard maths, using heuristics & numerical solutions to tackle the remaining known problems without "easy" analytical solutions, then using ML to fill in all the gaps and glue the whole thing together.
The article gives a concrete example in the same paragraph: "For example, AI algorithms could get close to a solution quickly and efficiently, and then the gap between the AI answer and the true solution can be filled by high-precision computing."
Interestingly the example is backwards (statistical reasoning first, hard reasoning second) compared to traditional usage of "hybrid" in AI and control contexts.
From the article it looks like half AI guessing the solution, half some static algorithm fixing the result to be better. Not sure how it's supposed to really work.
One variant of this is SciML approaches, where you use an ODE solver wrapped around a NN. The ODE solver guarantees you get the right conservation laws, which NNs don't do well, and the NN is more accurate than the hand-written model since it doesn't ignore the higher-order effects.
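Roughly, the structure is a classical integrator whose right-hand side is "hand-written physics plus a learned residual". The sketch below uses plain RK4 and a random, untrained stand-in for the NN just to show the shape; a real SciML setup would train the residual against data and might use a structure-preserving solver to actually enforce the conservation laws.

    import numpy as np

    def known_physics(x):
        # Hand-written model: simple linear decay (illustrative).
        return -0.5 * x

    # Tiny untrained "neural net" standing in for the learned residual term.
    rng = np.random.default_rng(1)
    W1, b1 = rng.normal(size=(8, 1)), np.zeros((8, 1))
    W2, b2 = rng.normal(size=(1, 8)) * 0.01, np.zeros((1, 1))

    def nn_residual(x):
        h = np.tanh(W1 @ np.atleast_2d(x) + b1)
        return (W2 @ h + b2).item()

    def rk4_step(x, dt):
        # The solver, not the net, dictates how state evolves between steps.
        f = lambda s: known_physics(s) + nn_residual(s)
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

    x, dt = 1.0, 0.1
    for _ in range(50):
        x = rk4_step(x, dt)
    print(x)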
> The last thing though - “hybrid algorithms leveraging AI efficiency” sounds an awful lot like a buzzword sales pitch.
Oddly, to me this just sounds like efficiency gains by potentially introducing massive security holes i.e. the vector of the Meltdown/Spectres of the future. It also seems like they're trying to sell AI as some sort of secular qubit that they'll be error-correcting.
Not a CPU designer, but I remember AMD has been using AI for branch prediction for a long time. Hopefully they still mean AI in that sense and not only as branding.
From the perspective of a consumer, my MacBook Pro is basically the perfect laptop, at least in theory.
I love the battery life and performance of the hardware, not to mention the unrivaled build quality of the MacBook (screen, trackpad, keyboard).
In practice, however, MacOS limits the capabilities of the hardware such that I cannot daily drive my MacBook Pro as a work or personal computer (poor containerization support, an annoying development toolchain, and no _real_ support for video games).
When Asahi Linux is mainlined, stable and features full hardware acceleration - the MacBook running Linux will likely be the best laptop money could buy. Until then, please AMD, Intel, release some mobile hardware that's at least as good. It sucks so bad seeing what is possible with today's technology but that being exclusive to a company unsuccessfully determined to ring fence you into their API ecosystem.
My biggest complaint is that I have a "16-core Neural Engine" in my MacBook, but nothing that can be run on it. Sure, internal macOS tools such as the fingerprint reader or even the webcam might use it, but not a single ML project on GitHub makes use of it. I'm lucky if ML projects don't depend on CUDA to run.
Honestly, I think I'll sell my MacBook Pro and just buy a new one with less cores.
Bought it expecting to be able to abuse it towards ML, but it's barely even supported.
From my experience developing C++ on it, as long as you don't use anything exotic, everything is portable (i.e. just recompile).
On the other hand, for the missing apps and other stuff, I'm running a Linux VM via VMWare Fusion Pro. It works efficiently, and interoperability is good. Just add another internal network card to the VM, and keep an always on SSH/SFTP connection. Then everything works seamlessly.
Never had any problems for 7 or so years when developing that application and writing my Ph.D. in the process.
From my experience developing C++ on it, 99+% of everything is portable, but the ~1% that isn't, causes a disproportionate amount of annoyances and extra work. We still have a program that we have to run in an x86 docker container (with all the problems that brings), just to get it to work on our M1 Macs.
That being said, the M1 Pro is a more than good enough piece of hardware that I'm willing to put up with this.
From my experience, the newest VMWare versions and Linux kernels are very good at conserving power when the VM is mostly idle.
I'm doing that on my 2014 MBP, and it didn't cut the endurance in half, but I need to re-test it for exact numbers. However, it doesn't appear in the "power hungry applications" list unless you continuously compile something or run some service at 100% CPU load. Also, you can limit the resources it can use if you want to further limit it.
I tried a few apple watches, and almost got used to the routine of daily charging, but other annoyances made me try others. Eventually I settled for a Garmin smartwatch and the difference that having to charge once every 10-15 days is huge (you give up a few features, but surprisingly little, at least for my use case). I hope this new emphasis on energy efficiency enables this on laptops (and cellphones)
Works great until you're the author of an application written in Rust and want to distribute MacOS binaries which are automatically generated in CI/CD.
The only (legal) way to compile a Rust binary that targets MacOS is on a Mac. So your CI needs a special case for MacOS running on a MacOS agent. Annoyingly, cross-compiling between CPU architectures doesn't work, so you need both an Intel and an arm64 Mac CI agent - the latter being unavailable via Github actions.
To make things even more bizarre, Apple doesn't offer a server variant of MacOS or Mac hardware, which seems to indicate they expect you to manually compile binaries on your local machine for applications you intend to distribute.
Asahi Linux is introducing support for some brand new Apple Silicon features faster than macOS. The M1 has a virtual GIC interrupt controller for enhanced virtualization performance: Linux supports it, macOS does not. M2 introduced nested virtualization support; the patches for supporting that on Linux are in review, while macOS still doesn't support it.
Apple appears to have a one-of-a-kind special license for ARM (due to being a founder of the company), so they can pick and choose otherwise "required" extensions to support, and add their own extensions as well. You can't directly compare an Apple design to a specific ARM version because of this.
Kernel integration and a virtualized filesystem that isn't bottlenecked by APFS. Docker is excruciating on Darwin systems.
> What are you talking about exactly?
Apple makes hundreds of weird concessions that are non-standard on UNIX-like machines. Booting up a machine with zsh and pico as your defaults is not a normal experience for most sysadmins, nevermind the laundry-list of MacOS quirks that make it a pain to maintain. For personal use, I don't think I'd ever go back to fixing Mac-exclusive issues in my free time.
> no _real_ support for video games
Besides Resident Evil and No Man's Sky (this generation's Tomb Raider and Monument Valley), nobody writes video games for Metal unless Apple pays them to.
For a while, MacOS had a working DirectX translation stack for Windows games, too. Not since Catalina though.
> Kernel integration and a virtualized filesystem that isn't bottlenecked by APFS. Docker is excruciating on Darwin systems.
Docker is great as long as you don't use bind mounts. I use it daily for development in dev containers.
> Besides Resident Evil and No Man's Sky (this generation's Tomb Raider and Monument Valley), nobody writes video games for Metal unless Apple pays them to.
There are plenty of great games on macOS. Factorio, Civilization, League of Legends, Minecraft. But you're right that there aren't too many AAA games.
> the unrivaled build quality of the MacBook (screen, trackpad, keyboard)
Typing this on a ThinkPad X13, after years of MBPs, I beg to differ. The screen on the ThinkPad is better, even at a slightly lower resolution of 1920 x 1200. It's IPS like the Mac of course, but also Anti-Reflective (matte), which Apple hasn't offered since 2008?
The ThinkPad has Left, Right, and Scroll mouse buttons, as well as a TrackPoint stick. Not so on the Mac.
Finally, the keyboard. You're going to laud Apple for their quality keyboards, really? The ThinkPad has a nicer keyboard feel (subjective, I know), has actual Home, End, PageUp, PageDn, and Delete keys, two Ctrl keys, and praise Jesus, gaps between the function keys, so you can use them confidently w/o looking.
Apple has released Rosetta for Linux. I believe the use case is running x86 binaries inside of an Arm virtual machine on Apple Silicon, instead of emulating an entire x86 CPU and running the entire OS as x86. Apparently it works pretty well, and some people have even used it on non-Apple Arm chips. Anyway, I wonder if it could be used in combination with something like Proton to emulate x86 on Linux.
> poor containerization support, an annoying development toolchain
Just run an always-on (headless!) Linux VM in the background, and don't use the host macOS for anything besides desktop apps (Slack, VSCode, Browser, mpv, terminal emulator but always ssh into the VM, etc). The same way you deal with a Windows machine.
This works well enough unless you work on hypervisors or other bare-metal-only tech. But hey, that's currently non-existent on M1 Macbooks (or undocumented and locked to the Apple ecosystem) anyway.
> and no _real_ support for video games
That's the real deal breaker if you are into gaming.
The Windows machine (X86) has WSL2 though, which seamlessly integrates with VSCode. Add Windows Terminal and for me it made Windows the better dev platform. I still prefer Linux though.
Sometimes working locally is just the easiest/fastest/most convenient.
> Just run an always-on (headless!) Linux VM in the background, and don't use the host macOS for anything besides desktop apps (Slack, VSCode, Browser, mpv, terminal emulator but always ssh into the VM, etc).
This is what I do. Specifically, I use Canonical's Multipass, and treat tmux as my "window manager," -- I even mapped my iTerm profile's Command+[key] to the hex code for tmux-prefix+[key], so that Command essentially feels like the Super key in, say, i3. For example, rather than having to type Control+p h, Command+h selects the pane to the left.
The Multipass VM is flawless. Closing the laptop doesn't shut it down, and with the tmux resurrect plugin, sessions persist between Mac restarts (which are rare). If I didn't know better, I'd think it was just a native terminal session.
If I need proper x86_64, I just ssh into my super beefy Linux NAS at home via Tailscale. Both Linux machines are identical in terms of dotfiles/etc, so it feels exactly the same.
I've truly never been happier with Linux. I no longer obsess over my window manager (which used to be a serious time sink for me), and I still get what is IMO the best desktop experience via my Mac. The only tinkering I do is checking out the occasional new neovim plugin, but I really enjoy doing that, as it has a tangible benefit to my dev workflow, and kind of feels like gardening, in a way -- I like the slow but persistent act of improving and culling my environment.
I still have a PC, but I actually recently uninstalled WSL2. It never felt truly finished or "right", and Windows Terminal can be incredibly sluggish -- keystrokes have far more latency than iTerm. I've actually started to embrace just letting "Windows be Windows," even learning Powershell (and enjoying it more than I'd expected).
I've also pretty much moved from gaming on a PC to PS5. So, for me personally, I don't really see a place for Windows anymore. Every single time I boot into my PC, something is wrong -- most recently, it literally won't shut down unless I execute `shutdown /s`, and no amount of troubleshooting has been able to fix it. I know Windows like the back of my hand, and still it's a constant feeling of death by a thousand cuts.
MacOS has seemingly gotten worse over the years not better. Features removed. Interface has been changed for the worse. The drive to unify the desktop experience with iOS. I could go on.
i've used a mac for development for various projects which have been globally impactful for over a decade. it's literally unix, man.
docker on mac can be improved, but if i'm developing for other architectures it's much easier to just test natively. toolchain for all the languages i use is exactly the same as any other *nix.
gaming i'll give you a point, but that's why i have windows dual boot at home. /shrug
Been using Mac OS X since the PowerBook days up to a first-gen MacBook - and I finally gave up: I installed Linux on it instead.
Pretty much all servers were running Linux even back then. Using a non-Linux Unix on your work computer/laptop causes way too many encounters with various quirks and glitches. It continuously drags down your productivity, many times every day.
After much stress, I installed Linux on my Macbook instead - but then I encountered various hardware-related glitches instead.
It’s Unix but it’s not Linux. When Linux is your target it makes sense to use it on your development stack entirely. The irony being that Microsoft seem to make the best Linux OS for development for my use cases of docker / server side / cloud operations.
Yup, been doing that for decades. It's called "vendor lock-in". And its unethicality has been discussed comprehensively for decades as well.
And when countries tried to move to an open document format - they'd bully the country, using the strong arm of Uncle Sam, until they're back to MS Office again.
So when people are amazed by Bill Gates' charities - I don't. His money comes from the sufferings of countries.
I understand that corporations are bastards and built on foundations of the crushed skulls of children and involuntary human sacrifice but Office on the Mac is not a bad product and is improving. But the windows version has just been around longer and had more work done on it. And don’t get me started on LibreOffice - it’s buggy as hell. Even more than Office on a bad day
I’ve met Bill. He wouldn’t be out of place among HN’s defective half: the dubious pro SaaS VC funded US university alumni…
I wasn't talking about the product - I was talking about MS using the product to lock the whole world into its own =proprietary= format (so no one could reliably open & process it - and then others got blamed for it, not Microsoft), AND then aggressively attack those who try to escape from its lock, even countries.
While WSL2 is great for what it is, it just doesn't compare to working in a Linux distro. There are a lot of pain points where external tools won't always work well with WSL.
I also like the productivity customizations possible in Linux, while the same are difficult or impossible on Windows.
Of course, if your work is tied into the Windows ecosystem, having WSL is good.
Controversially, I have a better development experience on Windows using msys2 + zsh (basically "git-bash" on steroids). I would put that development experience almost on par with MacOS.
WSL2's virtualized workflow just causes too many issues for me. WSL1 was better IMO but it wasn't significantly better than msys2 and also had issues (like you still need remote development tools to mount codebases inside editors) - unless you want to run/develop Linux binaries while on Windows.
For anything that isn't making basic non containerized applications (simple web applications, web servers), Windows is pretty good.
For anything more involved, requiring multiple containers/compose/etc, I prefer Linux, as it has the tools I need available natively and no gotchas or performance penalties.
That said, credit to Microsoft on WSL2. The auto-scaling hardware provisioning inside the VM has made containerized workflows on Windows much better. To me, it's just not better than running Linux inside VMWare/Hyper-V/VBox and "DIY"ing WSL2 yourself, something I had been doing for years before WSL2 anyway. WSL2 is more fool-proof than hand-rolling a Linux VM, so there is that.
That's interesting to hear. I've not used Windows in over a decade (I primarily work on mac) but I've heard that WSL was quite compromising and not a great experience for people who primarily use Linux tools.
bash (and other shells), coreutils, pipes, git, text/cli utilities etc. work just fine on WSL, I'd call them Linux tools. My in-shell workflow consists of using mainly those + VSCode (with the WSL plugin) + ssh'ing somewhere now and then, and it's entirely sufficient for this purpose. I haven't tried running typical webserver/db services on it though.
Mac being a certified Unix just means that Unix certification is meaningless, bought and paid for drivel. OS X was a certified Unix before it had atomic renames!
Su is just talking up AMD’s strengths, many of which come care of TSMC and for which the original R&D was largely funded by Apple. She is not wrong, and AMD has certainly made large gains in HPC recently, but AMD does not monopolize all possible paths to success here.
I agree that a lot of AMD's wins have come from TSMC. That being said, I feel like their biggest win over the past 5 years (well, 6 now) was moving to chiplets, which all started at GlobalFoundries. Having the same chiplet, from the lowest-end consumer chip all the way to the top-end server chip, means they can spend less time and cost developing processors for every segment, and they just bin the chips by quality.
Intel is starting to make changes towards this structure but they haven't fully committed to it yet.
A move to chiplets alone would just have prolonged their terminal suffering if they hadn't also roughly replicated Haswell in the first Ryzen chips, making them somewhat performance-competitive and, thanks to chiplets, also economically viable, with the ability to wow consumers with the first 8-core desktop chips and later 12/16/24/32-core ones.
In particular, the RDNA3 graphics card line launch has been a dud.
The Nvidia 4090 turned out to be far ahead of the AMD 7900XTX.
Then there turned out to be an overheating issue on the AMD cards.
And now it just looks like both Nvidia and AMD are price gouging GPU buyers instead of competing with each other. They are deliberately keeping prices high and creating artificial shortages of GPUs, because that's what kept prices high during covid/crypto mining.
I used to really cheer for AMD as the underdog, but I guess it's true that none of these companies are your friend; they're just there to shake you down.
AMD had a chance to really pull ahead of Nvidia by being "the good guy" and actually offering end users great value for money, but instead they've chosen to emulate Nvidia.
Substantial customer good will has been lost by AMD.
Their graphics division has certainly been lacking vs Nvidia's cards, but are still far ahead of Intel's. Their desktop and server CPUs are crushing the market.
> They are deliberately keeping prices high and creating artificial shortages of GPUs
That's simply not true. Check out TechTechPotato's youtube channel for the explanation on this, but under-shipping isn't price fixing. They're just shipping less to distributors because there's less demand, allowing the distributors and retailers to keep a stable amount on hand.
One big reason that demand is low is that the current generation of GPUs is way too expensive, with top-end GPUs about twice what they used to cost a few years ago, very poor improvements in performance-per-$, and anything below the top end offering even more dubious value for money, especially on the AMD side. Originally those high prices were a result of high demand compared to supply, but the companies seem to have gotten greedy and decided they can permanently keep prices high and just throttle back supply to keep them there.
I’m consistently surprised just how good the RX580 really is. It can handle most games on medium at 1440p, but I tend to just chill on ultra at 1080p. Plus I play on TV, so the smaller resolution is actually better from where I’m sitting
RDNA2 (particularly, the narrower range from 6600 to 6800), is a huge step up in performance per watt. The lowest end one in there, 6600, is faster than a vega64 (which is much faster than rx580), yet uses less than half the power.
RDNA3's lower end chips, once they hit the market, are expected to further improve on this.
Most gamers won't upgrade to the current RDNA3 chips, because the current RDNA3 chips are top of the line, expensive, ~300w monsters.
Amen to that. The need for gamers to be on the hardware treadmill is no longer relevant. Five+ year old hardware can still run basically anything, albeit at reduced fidelity.
I keep eyeing a new build, but realistically, I know it’s just a vanity project because so few games will take full advantage of the better hardware. My favorite games in the past years could have run on ten year old hardware.
I think "service" is the key word here: I have an RX 580, and while it's kind of an incredible card in its longevity, it's really creaky at 1440p even with older games.
Performance per watt has really come a long way since GCN.
> Their desktop and server CPUs are crushing the market.
I'm seeing plenty of people choose 13th gen over Zen 4; the platform pricing for Zen 4 just wasn't very attractive. AMD [had to] significantly cut prices across the lineup by 20-30%.
Also worth remembering that for the vast majority of people, any performance differences between Intel and AMD are utterly and absolutely insignificant as to be completely meaningless.
Nobody needs two-digit CPU core counts and 5~6GHz clock speeds to do their emails, communicate on Skype/Discord/Teams/Slack/Zoom/whatever, browse Facebook and Twitter, watch Youtube, and even play some vidja gaemz. An i3 or even a god damn Celeron is perfectly fine.
So at that point, Intel's superior stability (read: less jank) wins out by a hair and otherwise nobody really cares because there's no practical difference. The vast majority of people will just buy whatever's cheaper or just happens to be on the display table that day.
> The Nvidia 4090 turned out to be far ahead of the AMD 7900XTX.
The 7900XTX is on par with or better than the 4090 in a lot of games, with some games favouring Nvidia more than AMD... (obviously not talking about ray tracing).
Coupled with a lower price...
I'm not quite sure what you're talking about, since you're saying the opposite of all the reviews I've seen.
It really depends on the games. In Modern Warfare 2, for example, the 7900XTX outperforms the 4090 at every resolution, while in Fortnite the 4090 outperforms the 7900XTX by even more.
At 4K the 4090 is on average /much/ better than the 7900XTX, but at 1080p/1440p that lead diminishes a lot.
edit: At the end of the day tho, it's all too damn expensive now.
you put "supply glut" in scare quotes, but again, do you know that their motive is cynical?
they're talking to the public, i.e. investors, and they could easily be saying "our current sales figures are lower not because our product is not popular, but because there is currently a large inventory downstream to meet current demand. When that glut is cleared, expect our sales to resume."
if downstream sellers have sufficient inventory, the only way to induce them to buy more would be for AMD to drop prices. If AMD cards are in hot demand and selling out immediately, restricting supply would be artificially boosting prices. But if the downstream pipeline is full, it's not right to say that reduced demand from wholesalers while that glut clears is AMD artificially boosting prices.
They had a chance to offer much better price-to-performance, but IMO Nvidia showed that the market is willing to pay a premium for GPUs, and both major players are exploiting that.
Nvidia relaunched the 4080 12gb as the 4070ti reducing the price by $100. There has to be one hell of a profit margin on the high end cards.
> They are deliberately keeping prices high and creating artificial shortages of GPUs,
I'm not sure if you're referring to AMD "undershipping", but if you read how they use that word it's pretty clearly a bad thing. AMD has been shipping less (to retailers) than what they could, or would like to.
> The Nvidia 4090 turned out to be far ahead of the AMD 7900XTX.
I just don't get why most people care? It's 60% more expensive, too. Even the 7900XTX is ludicrous at $1000.
Give me a $400 card from this generation that competes with a $500 card from the previous generation, and I'll call it a win.
A $1600 card winning anything seems like an irrelevant battle. Is the volume / sales for those cards high enough to be the real focus, when cards like the GTX 1060 were the volume leaders by a long shot?
This is not a completely hashed-out thought. But I'll share it and see what others think.
My impression is that the simplest way to improve energy efficiency is to simplify hardware. Silicon is spent isolating software, etc. Time is spent copying data from kernel space to user space. Shift the burden of correctness to compilers, and use proof-carrying code to convince OSes a binary is safe. Let hardware continue managing what it's good at (e.g., out-of-order execution.) But I want a single address space with absolutely no virtualization.
Some may ask "isn't this dangerous? what if there are bugs in the verification process?" But isn't this the same as a bug in the hardware you're relying on for safety? Why is the hardware easier to get right? Isn't it cheaper to patch a software bug than a hardware bug?
A good reason why memory virtualization has not been "disrupted" yet seems to be fragmentation. Almost all low-level code relies on the fact that process memory is contiguous, that it can be extended arbitrarily, and that data addresses cannot change (see Rust's `Pin` trait). This is an illusion ensured by the MMU (aside from security).
A "software replacement for MMU" would thus need to solve fragmentation of the address space. This is something you would solve using a "heavier" runtime (e.g. every process/object needs to be able to relocate). But this may very well end up being slower than a normal MMU, just without the safety of the MMU.
> This is an illusion ensured by the MMU (aside from security).
Even in places where DMA is fully warranted, IOMMU gets shoe-horned in. I don't think there's any running away from costs to be paid for security (not the least for power-efficiency reasons).
But in this case the job of the hardware is to prevent the software from doing things, and it pays a constant overhead to do so whereas static verification as integrated into a compiler would be a one-time cost.
Arbitrarily complex programs make even defining what is and isn't a bug arbitrarily complex.
Did you want the computer to switch off at a random button press? Did you want two processes to swap half their memory? Maybe, maybe not.
A second problem to consider is that verification is arbitrarily harder than simply running a program -- often to the extent of being impossible, even for sensible and useful functionality. This is why programs that get verified either don't allocate or do bounded allocations. But unbounded allocation is useful
It is possible to push proven or sandboxed parts across the kernel boundary. Maybe we should increase those opportunities?
Also, separate address spaces simplify separate threads -- since they do not need to keep updating a single shared address space. So L1 and L2 cache should definitely give address separation. Page tables are one way to maintain that illusion for the shared resource of main memory... probably a good thing.
That's not to say there isn't a lot of space to explore your idea. It is probably an idea worth following
One final thought: verification is complex because computers are complex. Simplifying how processes interact at the hardware level shifts the burden of verification from arbitrarily long-running, arbitrarily complex, and changing software to verifying fixed, predefined limitations on functionality. That second one has got to be the easier thing to verify.
I like this idea, and given today's technology it feels like something that could be accomplished and rolled out in the next 30 years.
If the compiler (like rust) can prove that OOB memory is never accessed, the hardware/kernel/etc don't need to check at all anymore.
And your proof technology isn't even that scary: just compile the code yourself. If you trust the compiler and the compiler doesn't complain, you can assume the resulting binary is correct. And if a bug/0day is found, just patch and recompile.
The reality is that we do want to run code developed and compiled and delivered by entities we don't fully trust and who don't want to provide us the code or the ability to compile it ourselves. And we also want to run code that can dynamically generate other code while it's doing so - e.g. JIT compilers, embedded scripting languages, javascript in browsers, etc.
Removing these checks from the hardware is possible only if you can do without them 100% of the time; being able to trust 99% of the binaries executed isn't enough, you still need this 'enforced sandboxing' functionality.
Perhaps instead of distributing program executables, we can distribute program intermediate representations and then lazily invoke the OS's trusted compiler to do the final translation to binary. Someone suggested a Vale-based OS along these lines, it was an interesting notion.
I do not believe such OSes can ever be secure, given how often vulnerabilities are found in web browsers' JS engines alone. Besides, AFAIK the only effective mitigation against all Spectre variants is using separate address spaces.
My understanding is that's more or less what Microsoft was looking at in their Midori operating system. They weren't explicitly looking to get rid of the CPU's protection rings, but ran everything ring 0 and relied on their .NET verification for protection.
eBPF does this, but its power is very limited and it has significant issues with isolation in a multi-tenant environment (like in a true multi-user OS). Beyond this one experiment, proof-carrying code is never going to happen on a larger scale: holier-than-thou kernel developers are deathly allergic to anything threatening their hardcore-C-hacker supremacy, and application developers are now using Go, a language so stupid and backwards it's analogous to sprinting full speed in the opposite direction of safety and correctness.
Put another way: if AMD (and especially Intel) don't do something about this they're going to get completely eaten alive by ARM.
The amount of processing power available in a modern smartphone is truly mind-boggling. I'd love to see a chart showing the chip cost and energy cost of the compute power of an M1 chip in each previous year. I would guess that 30+ years ago you'd be in the millions of dollars and watts of power, but that's just a guess.
As we see from the modern M1/M2 Macbooks, these lower TDP SoCs are more than capable of running a computer for most people for most things. The need for an Intel or AMD CPU is shrinking. It's still there and very real but the waters are rising.
> Put another way: if AMD (and especially Intel) don't do something about this they're going to get completely eaten alive by ARM.
AMD’s latest parts are actually quite close to M1/M2 in computing efficiency when clocked down to more conservative power targets.
They crank the power consumption of their desktop CPUs deep into the diminishing returns region because benchmarks sell desktop chips. You can go into the BIOS and set a considerably lower TDP limit and barely lose much performance.
Where they struggle is in idle power. The chiplet design has been great for yields but it consumes a lot of baseline power at idle. M1/M2 have extremely efficient integration and can idle at negligible power levels, which is great for laptop battery life.
People keep repeating that Zen4 and M1 are close in efficiency but what is the source with actual benchmarks and power measurements?
At any rate, using single points to compare energy efficiency isn't a good comparison unless either the performance or the power consumption of the data points is comparable. For instance, the M1's little cores are another 3-5x more efficient, but they operate in an entirely different power class, and Apple's own marketing graphs show the M1 reaches max efficiency well below its max performance [1]
Those perf/power curves are the basis of actually useful comparisons; has anyone plotted some outside of marketing materials? It might even be possible under Asahi.
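Even without Asahi, the shape of a fair comparison is simple: measure several (power, score) points per chip and interpolate to compare at matched performance or matched power. The numbers below are made up purely to show the method:

    import numpy as np

    # Made-up (package power in W, benchmark score) samples for two chips.
    chip_a = [(5, 900), (10, 1400), (20, 1900), (35, 2200), (65, 2450)]
    chip_b = [(8, 800), (15, 1500), (30, 2100), (60, 2500), (120, 2700)]

    def perf_at_power(samples, watts):
        p, s = zip(*samples)
        return np.interp(watts, p, s)

    def power_at_perf(samples, score):
        p, s = zip(*samples)
        return np.interp(score, s, p)

    # Compare at a matched performance target instead of at whatever
    # operating point each vendor happened to ship.
    target = 2000
    print("chip A needs ~%.0f W for score %d" % (power_at_perf(chip_a, target), target))
    print("chip B needs ~%.0f W for score %d" % (power_at_perf(chip_b, target), target))
    print("at 15 W: A=%.0f, B=%.0f" % (perf_at_power(chip_a, 15), perf_at_power(chip_b, 15)))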
Their results are invalid because they used Cinebench. Cinebench uses Intel Embree engine which is hand optimized for x86, not ARM instructions. In addition, Cinebench is a terrible general purpose CPU benchmark.[0]
Imagine if you're testing how energy efficient an EV and a gas car is. But you only run the test in the North pole, where the cold will make the EV at least 40% less efficient. And then you make a conclusion based solely on that data for all regions in the world. That's what using Cinebench to compare Apple Silicon and x86 chips is like.
Cinebench/4D does have "hand-optimized" ARM instructions. It would be a disaster for the actual product if it didn't. That's what makes it interesting as a benchmark: that there's a real commercial product behind it and a company interested in making it as efficient as possible for all customer CPUs, not just benchmarking purposes.
Albeit for later releases this is less true since most customers have switched to GPUs...
> Cinebench/4D does have "hand-optimized" ARM instructions.
It doesn't. As far as I know, everything is translated from x86 to ARM instructions - not direct ARM optimization.
Cinema4D is a niche software within a niche. Even Cinema4D users don't typically use CPU renderer. They use the GPU renderer.
The reason Cinebench became so popular is because AMD and Intel promote it heavily in their marketing to get nerds to buy high core count CPUs that they don't need.
Generally you see this in the lower class chips that aren’t overclocked to within an inch of instability. It’s not uncommon to see a chip that uses 200w to perform 10% worse at 100w, or 20% worse at 70w.
I can’t be bothered to chase down an actual comparison, but usually you’ll see something along those lines if you compare the benchmarks for the top tier chip with a slightly lower tier 65w equivalent.
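Taking those hypothetical figures at face value, the perf-per-watt arithmetic looks like this:

    # Perf-per-watt for the made-up example above (stock = 100% perf at 200 W).
    base_perf, base_power = 1.00, 200.0

    for perf, power in [(0.90, 100.0), (0.80, 70.0)]:
        gain = (perf / power) / (base_perf / base_power)
        print("%.0f%% perf at %3.0f W -> %.1fx perf/W vs stock" % (perf * 100, power, gain))

So "10% worse at half the power" is roughly 1.8x the efficiency, and "20% worse at 70 W" is about 2.3x.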
It's actually this idle power that defines battery drain for most people. All these benchmarks about how fast it can complete a certain compute-intensive task aren't that important, considering that most of the time a laptop is doing almost nothing.
We just stare at an article in a web browser. We look at a text document. We type a bit in the document. An app is doing an HTTP request. The CPU is doing nothing basically.
Once in a while it has to redraw something, do some intense processing of an image or text, but it takes seconds.
It's the 99% spent idling that counts, and there most laptop CPUs suck.
Even when watching a video the CPU is not (should not be) doing much as there are HW co-processors for MPEG-4 decoding built in.
It's quite embarrassing how AMD and Intel have screwed up honestly.
And that's why, so far, AMD's mobile processors have been monolithic and not chiplet-based. That is supposed to change with Zen 4's Dragon Range; however, most of the mobile lineup will still be monolithic, and these high-power/high-performance processors should go exclusively to "gaming" notebooks.
I care a lot about idle power, even on my desktop PC. It seems crazy to me that in 2023 I still need to consider whether maybe I should shut down my computer when I'm not using it.
What should I be buying to not have to ask myself that question?
If you take a Zen 3 running at optimal clocks for efficiency (such as the 5800U), its performance per watt is competitive with the M1 if you account for the difference in node (which TSMC claims gives 30% less power consumption at the same performance). As the article points out, the real efficiency gains will come from domain-specific changes such as shifting to 8-bit for more calculations.
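On the 8-bit point, here's a minimal sketch of symmetric int8 quantization, the kind of domain-specific trick that trades a little precision for 4x less memory traffic and much cheaper multiply-accumulates. Illustrative only:

    import numpy as np

    def quantize_int8(x):
        # Symmetric per-tensor quantization: map [-max|x|, max|x|] onto [-127, 127].
        scale = np.max(np.abs(x)) / 127.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(1024).astype(np.float32)
    q, s = quantize_int8(w)

    print("bytes fp32:", w.nbytes, "bytes int8:", q.nbytes)        # 4096 vs 1024
    print("max abs error:", np.max(np.abs(w - dequantize(q, s))))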
How would x86-64 be as efficient with the same transistor & power budget when they have to run an extra decoder and ring within that budget? Seems physically impossible.
All else being equal, they can't. But the difference isn't as big as some people like to think. For a current high end core, probably low single digit %. And x86-64 has had a lot more effort going into software optimization.
As I understand it, the actual processing part of most chips nowadays is fairly bespoke, with a decoder sitting on top. I doubt decode can make up that large a portion of a chips power consumption (probably negligible next to the rest of the chip?), so other improvements can make up for the difference.
The latest ARM Cortex CPUs (models X2, A715 and A510) drop 32-bit support. Qualcomm actually includes two older Cortex-A710 cores in the Snapdragon 8 gen 2 for 32-bit support. Don't know much about Apple Silicon but didn't they drop 32-bit a couple of years back?
Google has purged 32-bit apps from the official Android app store, but as I understand it the Chinese OEMs that ship un-Googled AOSP ROMs with their own app stores haven't been as aggressive about moving to 64-bit.
Because the more complex decoder is traded in this case for a denser instruction set, which means they can get away with less instruction cache (which is power hungry).
Honestly I don't understand why there's not something like a 256 core ARM laptop with 4TB RAM.
The benefit of ARM is scale of multitasking due to not requiring the same kind of lock states that Intel's architecture requires, and can additionally scale much better than only one physical+virtual core pair.
I guess the only thing that's holding back ARM is Microsoft, as laptops are expected to run an desktop OS that people are comfortable with. Windows RT wasn't really a serious desktop OS and rather a joke made only for some IoT enterprises instead of end-users.
I wish there was more serious hardware than the standard broadcom or MediaTek chips, I'd definitely want some of that...be it as a mini ATX desktop/server format (e.g. as a competitor to Intel NUC or Mac Mini) or as a laptop.
With the ongoing energy crisis something like solar powered servers would be so much more feasible than with x86 hardware.
> Honestly I don't understand why there's not something like a 256 core ARM laptop
The high power ARM cores aren’t that small. If you took the M2 and scaled it up to 256 cores, it would be almost 7 square inches. You can’t just scale a chip like that, though, so the interconnects would consume a huge amount of space as well. It would also consume over 1000W.
The latest ARM chips are great, but sometimes I think the perception has shifted too far past the reality.
7 square inches would also include an enormous GPU and tons of accessories.
The actual cores are about .6/2.3 mm², and local interconnects and L2 roughly double that.
So with just those parts, 256 P-cores would be about 1.5 square inches, and 256 E-cores would be about half a square inch. And in practical terms you can fabricate a die that's a bit more than a square inch.
Of course it wouldn't use 1000 watts. When you light up that many cores at once you use them at lower power. And I doubt a 256 core design would have all that many P cores either.
As a rough estimate, you could take the 120mm² M1 chip, add 28 more P-cores with 110mm², 220 more E-cores with 300mm², 128 more MB of L3 cache with 60mm², 100mm² of miscellaneous interconnects, and still be on par with a high end GPU.
That sounds doable but is pushing it. A 128 core die, though, has nothing stopping it except market fit.
even a 128 core part made like that will perform pretty atrociously. scaling up the core count without scaling the cache count means you have a lot of cores waiting for memory. also when you have 128 cores, you almost certainly need more memory channels to have enough bandwidth.
Could we make the chips go slower like around 1Ghz? Maybe that is not feasible with the current software architecture to achieve great user experience.
> The benefit of ARM is scale of multitasking due to not requiring the same kind of lock states that Intel's architecture requires
I have no idea what you mean by this. The only x86 feature I can think of that might qualify as a 'lock state' is a bus lock that happens when an atomic read-modify-write operation is split over two cache lines. That has a very simple solution ('don't do that'--you have no reason to), and anyway, one can imagine more efficient implementation strategies
> can additionally scale much better than only one physical+virtual core pair
I have no idea what you mean by this either. Wider hyperthreading? It can be worthwhile for some workloads (and e.g. some ibm cpus have 4-way hyperthreading), but is not a panacea; there are tradeoffs involved.
The largest number of high-performance ARM cores you can get in a single socket is the Ampere Altra Max with 128 ARM Neoverse-N1 cores. At 2.6 GHz the processor consumes 190 W, and at 3.0 GHz up to 250 W. This is a server chip, not something you can put in a laptop.
I think because general compute is hard to parallelize, so 256 cores doesn't help much in practice. (Compute that does parallelize well already runs on GPU).
>I guess the only thing that's holding back ARM is Microsoft
It's not Microsoft holding it back. It's Qualcomm.
Apart from their very latest SoC (designed by a bunch of ex-Apple employees, no less), their CPUs are significantly worse than x86 in terms of general performance and have persistently lagged 4 years behind Apple in terms of performance (3 years behind x86). They sell for the same price per unit as x86 CPUs do, so there aren't very many OEMs that take them up on the offer, given the added expense of having to design a completely different mainboard for a particular chassis.
As such, x86 is the only game in town if you're buying a non-Apple machine; Qualcomm's products aren't cheaper and perform much worse outside of having more battery life. Sure, Qualcomm owns Nuvia now, but that acquisition will still take some time to bear fruit.
>> I would guess that 30+ years ago you'd be in the millions of dollars and watts of power but that's just a guess.
30 Years ago I don't think the compute power of a modern phone chip was available at any price, even in super computers.
On a tangential note, there are economists who think this increase in compute is somehow an increase in one of their measures - I don't recall which one. I disagree, because with that logic we all have trillion dollar tech in our pocket. Making a better product over time is expected, it's not some kind of increase in output.
The Top500 supercomputer list started in June 1993, just about 30 years ago. At the top is the CM-5/1024 by Thinking Machines Corporation at Los Alamos National Laboratory with 1,024 cores and peaking at 131.00 GFlop/s (billion floating point operations per second).
It's an Apples to ThinkingMachine Oranges comparison but CPU-Benchmark[1] ranks the Apple A16 Bionic used in the latest iPhones, its GPU - in the "iGPU - FP32 Performance (Single-precision GFLOPS)" section - at 2000 GFlop/s.
GadgetVersus[3] reports a GeekBench score of the A16 Bionic at 279.8 GFlop/s. - SGEMM test of matrix multiplication, it seems.
AnandTech[4] was reporting the A15 architecture ARMv7 came in at 6.1 GFlops in the "GeekBench 3 - Floating Point Performance" table, SGEMM MT test result, in 2015.
Interesting. I would have thought a few GFLOPs today would have been faster than the old super computer, but nope. The GPU is faster though. Still, the phone has both and can run on battery power while fitting in your pocket ;-)
In my experience talking to semiconductors folks, ARM is just not a concern anymore. The future is RISC-V, and ARM is already being seen as legacy tech. ARM's progress in the server space has stalled, the ARM Windows ecosystem is dead, Android has laid the groundwork for a move to RISC-V, and ARM has never and will never touch the desktop market.
> It proves my point beautifully when the only response to my comment
Your comment was beyond ignorant, and wrong. Most folks here are too smart, or busy, to reply to such nonsense. I am neither.
ARM is by far the most shipped and used arch every year. Amazon is even going in heavier on it. It's not legacy tech at all. So a person decided to show you how wrong you were by listing what's probably the most impressive chip in all of our lifetimes, and guess what, it's ARM.
> Obviously I was referring to Linux/Windows workstations
The creator of Linux is using an ARM machine as a workstation today, AFAIK.
> if everyone was smart enough to pick up on that I wouldn't be paid as much as I am
If you're making more than a burger flipper at Wendy's, the world just isn't fair.
The one viable server ARM CPU core is now tied up in a Qualcomm-ARM legal spat and probably won't see the light of day, which made it pretty clear to anyone not grandfathered in like Apple that it's not worth designing your own ARM core. ARM itself has been hemorrhaging employees, both because of better offers from Apple and the RISC-V stealths, and because, since the SoftBank push to get their money back, it has simply become a worse and worse place to work. Their ability to execute is extremely compromised.
Because of the long tail of the hardware industry, the writing can be on the wall long before it's clear based on what you can go out and buy off a shelf today.
I get it that you want RISC-V to succeed - so do I - and to advocate for it but I really don’t understand why it needs this sort of comment about Arm. I see exaggerated criticism of the Arm ISA elsewhere from people who ought to know better too - it’s really CISC, it’s 5000 pages vs 2 for RISC-V etc. It’s just not necessary.
I mean, nothing I said here is exaggerated. ARM doesn't even have a viable server core that can compete with x86, even as vaporware. SoftBank ruins everything they touch, and is super focused at the moment on robbing Peter to pay Paul to get something out of the upcoming ARM IPO since their attempt to sell it off to Nvidia fell through. The rumor is they've been cutting R&D funding hard to temporarily boost profitability. If anything, this is more a dig at how vulture capitalism ruins productive companies.
As an aside, ARM has always been a hybrid CISC/RISC core. It has nothing to do with the number of instructions, but with the fact that not having an I$ on the ARM1 forced it to have microcoded instructions, mainly to support LDM/STM. That's not a dig at ARM. It's a valid design, particularly at that gate count.
You jumped in in support of a comment that said Arm is ‘legacy’ tech. You said they don’t ‘even have a viable server core’. They are ‘haemorrhaging’ staff. Softbank have ‘ruined’ them.
Sounds more apocalyptic than exaggerated tbh.
I still don’t know why you think this is necessary.
The M1/M2 Macs run Linux pretty well. It's not perfect yet, but perfectly usable (especially as a desktop machine!) and support is improving every day.
I believe you're trying to move goalposts to avoid admitting you're wrong.
Graviton, M1/M2, Ampere etc but I’m sure you’ll be able to explain why Arm is seen as ‘legacy’ tech when billions of smartphones are being shipped every year with Arm CPUs.
Oh look, you named 4 areas where ARM development has already peaked. Hyperscalers are already looking to evolve from ARM in the near future, just look at how much attention Ventana got at RISC-V Summit. M1/M2 are Apple ecosystem specific phenomenon that haven't inspired any copycat products. Ampere has been a massive disappointment to everyone in the industry, see the fact that Nuvia had their entire business dead-to-rights pre-acquisition. ARM simply isnt at the cutting edge of the semiconductor industry anymore. Just because Apple and Qualcomm use it to great effect doesn't mean ARM is making any major innovative strides relative to the competition.
> ARM simply isnt at the cutting edge of the semiconductor industry anymore.
What you really mean is Arm isn’t the hot new thing anymore. Well it hasn’t been that for 20 years. Meanwhile billions of arm devices in leading edge nodes are being shipped. Oh well.
If RISC-V support by Microsoft is as bad as it has been for ARM, then I'm afraid RISC-V will never touch the desktop market, at all. Contrary to ARM, which is being pushed there with great success by Apple. Server-wise of course it's a different story...
If great success to you is that they put the M1 and M2 in a tower, I don't know what to tell you. Intel, AMD, and the x86 industrial complex don't care in the slightest what instruction set your Mac runs
Might I suggest taking a step back, re-reading your first comment and all the replies under it, and asking yourself "is it possible I might not be 100% correct, and maybe other opinions have enough merit to be worth considering why people aren't agreeing with me, rather than just changing my argument to make sure I'm still the winner of this thread"?
I’m not sure I expressed my point clearly. It wasn’t quite about Apple. So I will reformulate it here: the fate of any instruction set on the desktop is primarily decided by Microsoft.
Do you have any information that Microsoft is planning to support RISC-V at least as well as x86/x64? (That is to say, not with something like Windows RT, or Windows CE)
That would be tremendously good news, I shall add.
>In my experience talking to semiconductors folks,
Most, if not all, semiconductor “folks” I know are very pragmatic. As in how a Real Engineer should be, unlike software engineers. And in my experience, only HN and the Internet are suggesting that ARM is dead and everything will be RISC-V.
RISC-V is free and open as in libre, in contrast to x86 and ARM, which must be licensed from Intel/AMD and ARM respectively and are thus subject to potential Western economic sanctions.
Now, yes, China will just espionage and kangaroo court their way through and around such legalities anyway, but nonetheless RISC-V is less effort for more reward for China if it becomes at least on par with x86 and ARM.
Put more basically, it's a matter of national security. China can have an entire RISC-V ecosystem indigenously, unlike x86 and ARM.
If the US and/or UK place sanctions on exporting microprocessor technologies to China then that's that. Intel/AMD and ARM are subject to US and UK laws and regulations respectively.
RISC-V by contrast is much, much harder for any given country to regulate because of its free and open nature. At most the US and UK can embargo individual developments made within their jurisdictions, but they can't regulate the entire architecture. RISC-V doesn't have a kill switch named Intel/AMD or ARM.
ARM China is a wildly different animal than ARM. They went rogue a few years back and though SoftBank/ARM did a lot to get things back in line, it still shows up like this:
I'm loving it. I've used cheap RISC-V boards for several of my projects, most notably a GD32V in my keyboard. The equivalent STM32 boards weren't too expensive, mostly in the $10-20 range, but weren't as easily available (and $10 is still 3x the price of the Chinese RISC-V board).
Though the RP2040 has largely ended my cheap RISC-V addiction.
As an experiment quite a few years ago I got a laptop with a special version of the Intel CPU that was not as fast but much more power efficient.
ASUS UL30A-X5
Really an excellent computer, ran linux great (games didn't really exist yet though), and with tuning was coming in under 10W if the display brightness was turned down. First time I was able to get through flights without the system running dead.
I think what's going on in this case is that higher temperatures increase leakage and resistance in a chip, and therefore lower its efficiency. If you can keep it cool, you can keep it more efficient. The move seems like a necessary one: a computer as powerful as that UL30A is probably inside your phone now (if you turn off the radio and display), and that laptop still had a giant battery and only lasted 10-12 hours.
I've seen AMD do some pretty impressive things, I wouldn't count them out. They're at least willing to attempt to compete on price.
From what I've seen, the CPU/ALU/decode at the center being ARM or x86 may make less difference than you think. The amount of circuitry and silicon area (correlated with power) outside the core is very significant: MMUs, vector units, a complex cache hierarchy, high-speed IO (DDR, PCIe, you name it), and the extremely complex network on chip (Infinity Fabric) that enables CPU interconnectivity. Look at the IO die size vs the CCD size. As one poster pointed out, chiplets have great advantages, but there is a power hit. Thankfully newer tech is bringing that power down too. I'd love to see a power breakdown of a full chip to see what % is attributed to the CPU core itself.
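For a rough first pass at that breakdown, here's a minimal sketch that samples the energy counters some Linux kernels expose through the powercap/RAPL sysfs interface. The domain names, their granularity, and the 5-second interval below are assumptions that vary a lot by platform and kernel, so treat it as a starting point rather than a proper tool:

    # Sample RAPL energy counters via Linux's powercap interface and print
    # average power per domain over a short interval. Counter wraparound and
    # unreadable domains are ignored for brevity.
    import glob, time

    def read_domains():
        domains = {}
        for d in glob.glob("/sys/class/powercap/intel-rapl*"):
            try:
                name = open(d + "/name").read().strip()
                energy_uj = int(open(d + "/energy_uj").read())  # microjoules
                domains[d] = (name, energy_uj)
            except (OSError, ValueError):
                continue
        return domains

    INTERVAL = 5.0
    before = read_domains()
    time.sleep(INTERVAL)
    after = read_domains()

    for path, (name, e0) in sorted(before.items()):
        if path in after:
            watts = (after[path][1] - e0) / 1e6 / INTERVAL
            print(f"{name:12s} {watts:6.2f} W")

It won't separate the USB controllers from the PCIe lanes, but on machines that expose both a core domain and a package domain, comparing the two at idle already says a lot.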
In my own experience, the supposed ARM chip superiority claims are almost entirely marketing. I get significantly better performance (15-50%) from nearly all of my CPU workloads on modern Intel/AMD hardware vs the ARM Apple devices.
When I was an undergrad, one of my professors was exploring “approximate” computation (I forget what it was technically called). The gist was that you build mathematical circuitry that approximates an answer instead of giving you an exact one (kinda like floating point, but also applied to Boolean algebra, integer math, etc.). The reasoning was that the approximation could let you reduce the power. I wonder where that line of research has gone.
It's used inside Google's TPUs [0]. Useless for regular logic programming (you wouldn't want a nuclear power plant, or your payment processor, to work only approximately), but it has found a use case in scaling ML pipelines, where the accuracy of individual values matters less than billions of parameters in aggregate.
I don't know that it's “useless” for logic programming. I agree it's less likely to be used for normal everyday stuff, but it could see more proliferation (e.g. video decoding). I think TPUs are an interesting first step, but the holy grail (which indeed may be impossible) is to be able to reduce large-scale programs to run on approximate circuitry. For example, if Chrome used a fraction of the memory, CPU, and power, that would be significant, even though, if executed well, no one would notice that the page renders slightly imperfectly. As the paper notes, TPUs are the first application, although I think what we're doing there is probably quite primitive compared to what the researchers in the area are working on long term.
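For anyone who hasn't seen the idea before, here's a toy software model of one common flavor of approximate arithmetic: a "lower-part OR" style adder that simply skips carry propagation in the low bits (in hardware, that means a shorter critical path and less switching). The bit widths and error metric are arbitrary choices for illustration, not any particular published circuit:

    # Approximate adder: OR the low k bits instead of adding them, add the
    # high bits exactly, then measure how much accuracy that actually costs.
    import random

    def approx_add(a, b, k=8):
        mask = (1 << k) - 1
        low = (a | b) & mask                 # no carry chain in the low bits
        high = ((a >> k) + (b >> k)) << k    # exact add, low-part carry dropped
        return high | low

    rel_errors = []
    for _ in range(100_000):
        a, b = random.getrandbits(31), random.getrandbits(31)
        exact = a + b
        rel_errors.append(abs(exact - approx_add(a, b)) / max(exact, 1))

    print(f"mean relative error: {sum(rel_errors) / len(rel_errors):.2e}")

The answer is slightly wrong in the low-order bits, which is the trade being described: fine for pixels or neural-network weights, completely unacceptable for your payment processor.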
Happy to hear that AMD takes these issues seriously. The power draw of current Intel and Nvidia chips is seriously getting out of hand. There is absolutely no justification for drawing 40-50 watts of power for rendering a website!
They could score a quick win for power efficiency by making the Ryzen 9 5900 and Ryzen 9 3900 12-core 65W CPUs generally available instead of only selling them to OEMs.
If your production process and binning don't create enough supply of these parts, you only sell them to OEMs, because OEMs can silently discontinue or limit availability of these models when parts can't be found.
Spotty availability at retailers, however, would be a PR disaster and would cause wild rumors to spread, from "AMD doesn't want you to have this CPU" to "AMD is going bankrupt".
Sure. But since AMD has been producing these CPUs for a while, their yield should be good by now, shouldn't it?
Anyway, I hear some people saying you can just buy the 105W "X" variant and limit the power usage to 65W in the BIOS. Does it really give the same result, including for idle power consumption?
Chips that run at a stable low frequency at low power and chips that peak at high frequency at low power come from different bins. The former go to servers and the latter to enthusiast laptops.
It's amazing how you can power-limit the newest Ryzen to only 105 W and get big energy savings for much smaller performance reductions.
But the review game (run by the same reviewers who complain about power usage) drives AMD/Intel to chase the last 5% of performance for a 20% increase in power.
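A quick back-of-envelope shows why that last few percent is so expensive. Using the usual dynamic-power approximation P ≈ C·f·V², with voltage scaled roughly linearly with frequency so power goes roughly as f³ (a simplification; real V/f curves are even worse near the top of the range):

    # Illustrative only: power cost of small frequency bumps under P ~ f^3.
    for bump in (0.05, 0.10, 0.15):
        power_increase = (1 + bump) ** 3 - 1
        print(f"+{bump:.0%} clock -> ~+{power_increase:.0%} power")
    # +5% clock  -> ~+16% power
    # +10% clock -> ~+33% power
    # +15% clock -> ~+52% power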
I don't think Intel or Nvidia got the memo on this. Both are pumping out seemingly more and more desperate products, where their solution to more performance is just to throw more power at it, causing obscene levels of heat in the process.
Meanwhile, down in the mid-range, you've got Apple chipping away at them with ultra-efficient ARM chips.
What? The 40-series cards are the most energy-efficient graphics cards ever made. Ignore the 4090; they just scaled the top end to have high power and high performance. Look at the lower tiers: they all have significant power-requirement reductions compared to the 30-series cards. Watts per fps is down.
When I worked on the London Underground, we investigated regenerative braking and found it wasn't worth it for the power savings, BUT it was worth it to reduce temperatures on platforms (LU applied a £ value to passenger comfort for modelling purposes).
Interesting to see the same fundamental issue at the nano scale...
Right? Like, isn't that one of the reasons people have been raving about Apple's MX chips? Because they're very energy efficient and have good performance characteristics?
Looking forward to this. It seems lately you can only get <35W x86 CPUs through laptops or some very niche vendors, or even mystery sellers on AliExpress. I'd like something that idles at sub-5W like ARM SBCs do, but without the whole OS or boot jank.
I'm pretty sure people will suggest x86 options that fit that description, but again, it looks like you have to scrape the internet for stuff like the Intel 1[2,3][1-9]00T or whatever the AMD alternative is.
Anecdotally, I use a CPU from an AliExpress mystery seller in my desktop machine. It's an i9-9980HK with about a 40W TDP...
Which I unlocked to ~300W and installed a water-cooling system.
It can idle at around 5W, but a synthetic load (e.g. prime95) quickly makes it draw the full 300W and then throttle a bit. Fun stuff, but I definitely wouldn't do it again due to the enormous strain on the PSU and an unreasonably high power draw relative to performance.
There are many, many machines out there with soldered x86 CPUs, all of them idling at less than 10W, e.g. the ASUS PN series with Celeron CPUs. You can even find them in physical stores. I have a PN40 that idles at less than my RPi 4B...
Has the environmental impact of less efficient processing been studied? I've been thinking about this since seeing those stories a while back about bitcoin miners collectively having the same power consumption as some smaller countries.
If anyone knows a dataset suggesting what % of world energy usage is used by computing hardware (and what proportion of that hardware is idle vs fully utilised) I'd love to see it.
Also, less on topic, what's the environmental impact of writing your app in something programmer friendly but power inefficient? I strongly suspect some big tech companies will be suffering with this (namely anyone who is still on RoR at scale).
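I don't have a dataset, but the shape of the estimate is simple to sketch. Every number below (fleet size, average draw, PUE, grid carbon intensity) is a placeholder assumption, not data:

    # Back-of-envelope: yearly energy and CO2 for a hypothetical server fleet.
    servers = 1_000                 # assumed fleet size
    avg_power_w = 200               # assumed average draw per server at the wall
    pue = 1.4                       # assumed datacenter power usage effectiveness
    kg_co2_per_kwh = 0.4            # assumed grid carbon intensity

    kwh_per_year = servers * (avg_power_w / 1000) * 24 * 365 * pue
    tonnes_co2 = kwh_per_year * kg_co2_per_kwh / 1000
    print(f"{kwh_per_year:,.0f} kWh/yr ~= {tonnes_co2:,.0f} t CO2/yr")
    # ~2,450,000 kWh/yr ~= ~980 t CO2/yr with these assumptions.
    # A runtime that is 2x less efficient roughly doubles both figures,
    # to the extent the workload is actually CPU-bound.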
AMD CEO, Dr. Lisa Su, has identified energy efficiency as the next challenge for the company. With the growing demand for high-performance computing, it is crucial to ensure that energy consumption is minimized to reduce the environmental impact and operating costs. AMD has already made significant progress in this area with their latest processors that deliver excellent performance while consuming less power. Dr. Su's focus on energy efficiency underscores AMD's commitment to sustainability and innovation, and we can expect to see further developments in this area in the coming years.
The last time a chip company talked about "efficiency" in this way, we got 10+ years of 5% increments in performance and 50% price increases for 2- and 4-core CPUs.
I know it's my PTSD; the words make sense, but I really hope this is not code for "we can't get much more performance out of our architectures, so we'll focus on selling 'efficiency'..."
I really hope I'm wrong, since the power consumed by these chips has gotten a bit out of hand (though not as much as the GPUs...).
Yes, but real-world performance (aka compute speed) has been like desktop Linux: "it's here, but not really" (we're talking about general desktop, laptop, and server here, not mobile).
Having said that, Apple's M1 was the thing that really changed matters, and ARM on the server is starting to actually be mainstream, so I get why AMD and Intel are sweating. I just hope they can pull it off, because a world where you need to buy a $1000+ aluminum box attached to the CPU you want is not a better world... or worse: doing business with Qualcomm.
Desktop Linux has been better than Windows out of the box for more than a decade now. Unless you're talking about market share, which will never increase without a hundred million dollar marketing blitz behind it.
My ThinkPad X1 Nano Gen 2 with Alder Lake gets only 50% of the battery runtime the X1 Nano with the 1160G7 managed, for pretty much the same performance. Power-constrained to, say, 5 or 7W, the 1160G7 feels faster.
I hope Lenovo can offer something as light as an X1 Nano with AMD inside. Technically it should be both possible and feasible, given that the AMD CPUs are much more efficient/performant at low power levels.
So.... that means an Apple M1 killer is in the works? Anybody? That's the type of energy efficiency everyone can get onboard with.
Also, IMO software needs to take a lot of responsibility for energy efficiency, we just pawn that off on the hardware vendors. I wonder what the carbon cost of javascript is, I don't think I'd want to see the results, or python for that matter.
We can see in practice the gains in energy efficiency that come from ever-closer integration: from short-distance interconnects, to chiplets, to everything on one die. That's our mobile phones, and now the Apple M1/M2 laptops, whose motherboards pack everything into something close to the size of a phone board.
Will neuromorphic chips ever go mainstream amongst AI practitioners? The neurons on these chips are spiking, which is a whole different paradigm from what is currently used in neural networks. These chips are, however, a thousandfold more efficient.
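For anyone curious what "spiking" means in practice, here's a minimal leaky integrate-and-fire neuron, the standard toy model behind most neuromorphic hardware. The constants are made up for illustration and don't correspond to any particular chip; the key point is that downstream work only happens when a spike fires, which is where the claimed efficiency comes from:

    # Leaky integrate-and-fire: the membrane potential leaks toward the input,
    # and a spike is emitted (then the potential reset) on threshold crossing.
    def lif_neuron(input_current, dt=1.0, tau=20.0, v_thresh=1.0, v_reset=0.0):
        v = 0.0
        spike_times = []
        for t, i in enumerate(input_current):
            v += (dt / tau) * (-v + i)     # leaky integration toward the input
            if v >= v_thresh:
                spike_times.append(t)
                v = v_reset
        return spike_times

    # A constant drive above threshold yields a regular spike train:
    print(lif_neuron([1.5] * 100))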
I've been using a Zen 2 notebook for the past few years, and was honestly surprised by the processor's performance for the first... 3 or 4 months. Then typical updates happened, and some quirks got in the way too, like it having a decent iGPU that I can't use because the BIOS will only let me pick one GPU (down to it not having a mux to switch between them? Not sure what's going on there, just that I can't do it). In general it's a great machine, but there's a host of little details that make the experience a bit worse than it should be.
Per my usual upgrade schedule, I'm looking down the line to at least a Zen 5(+?) upgrade in a few years' time, so I hope they improve these kinds of things in the future. However, that's largely up to OEMs deciding to make a good product, and a bit out of AMD's grasp.
The only reason ARM is competitive with x86 is its heavy borrowing of x86's tricks (e.g. out-of-order execution). At the end of the day, Apple M2 and AMD Ryzen cores are not all that dissimilar.
I have never seen a realistic proposal for how this would make a practical consumer PC work better.
Having a clock that every component can agree upon means that components don't have to worry about each other anymore. Physics and information theory would suggest that removal of this centralized clock signal necessarily introduces additional latency in order to safely determine or modify system state.
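As a toy illustration of that latency argument, compare the cost per word of a clocked transfer against a four-phase request/acknowledge handshake, which needs four control-wire transitions to safely complete each transfer. All delays below are invented numbers, just to show the shape of the trade-off:

    # Clocked transfer: one clock period per word.
    # Four-phase handshake: req up, ack up, req down, ack down, each costing
    # one control-wire delay before the next transfer can start.
    CLOCK_PERIOD_NS = 1.0      # assumed synchronous clock period
    WIRE_DELAY_NS = 0.4        # assumed one-way control-wire delay

    def clocked_ns(words):
        return words * CLOCK_PERIOD_NS

    def handshake_ns(words):
        return words * 4 * WIRE_DELAY_NS

    for n in (1, 64, 4096):
        print(f"{n:5d} words: clocked {clocked_ns(n):8.1f} ns, "
              f"handshake {handshake_ns(n):8.1f} ns")

Async designs only come out ahead when the handshake delays are much shorter than the worst-case clock period, which is part of why the idea is attractive on paper but has been hard to make pay off in a general-purpose PC.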
It's typical for a company that is stuck to call out what it should have been doing for the last decade as the thing to focus on now. Basically she's promising more of the same.
What she should be doing is announcing the next big thing. I imagine that might include trendy things like tackling AI with a hardware/software stack that is energy- and cost-efficient and competitive with Nvidia, or a non-Intel-architecture chip intended for high-end gaming/AR/VR devices, where energy efficiency and performance are going to matter more than compatibility with legacy PC hardware. AR is going to suck if you have to be tethered to a huge power supply or battery and carry a liquid-cooling apparatus with you. This requires a different approach.
And even those things really should have been the focus for the last ten years. A slightly faster version of the thing they've been selling for the last ten years is not going to turn things around, and there are only so many people still assembling PCs from parts who actually know and appreciate AMD as a brand.