More

cepth · 2024-06-17T07:08:09 1718608089

A couple of thoughts here.

* AMD's traditional target market for its GPUs has been HPC as opposed to deep learning/"AI" customers.

For example, look at the supercomputers at the national labs. AMD has won quite a few high profile bids with the national labs in recent years:

- Frontier (deployment begun in 2021) (https://en.wikipedia.org/wiki/Frontier_(supercomputer)) - used at Oak Ridge for modeling nuclear reactors, materials science, biology, etc.

- El Capitan (2023) (https://en.wikipedia.org/wiki/El_Capitan_(supercomputer)) - Livermore national lab

AMD GPUs are pretty well represented on the TOP500 list (https://top500.org/lists/top500/list/2024/06/), which tends to feature computers used by major national-level labs for scientific research. AMD CPUs are even moreso represented.

* HPC tends to focus exclusively on FP64 computation, since rounding errors in that kind of use-case are a much bigger deal than in DL (see for example https://hal.science/hal-02486753/document). NVIDIA innovations like TensorFloat, mixed precision, custom silicon (e.g., the "transformer engine") are of limited interest to HPC customers. It's no surprise that AMD didn't pursue similar R&D, given who they were selling GPUs to.

* People tend to forget that less than a decade ago, AMD as a company had a few quarters of cash left before the company would've been bankrupt. When Lisa Su took over as CEO in 2014, AMD market share for all CPUs was 23.4% (even lower in the more lucrative datacenter market). This would bottom out at 17.8% in 2016 (https://www.trefis.com/data/companies/AMD,.INTC/no-login-req...).

AMD's "Zen moment" didn't arrive until March 2017. And it wasn't until Zen 2 (July 2019), that major datacenter customers began to adopt AMD CPUs again.

* In interviews with key AMD figures like Mark Papermaster and Forrest Norrod, they've mentioned how in the years leading up to the Zen release, all other R&D was slashed to the bone. You can see (https://www.statista.com/statistics/267873/amds-expenditure-...) that AMD R&D spending didn't surpass its previous peak (on a nominal dollar, not even inflation-adjusted, basis) until 2020.

There was barely enough money to fund the CPUs that would stop the company from going bankrupt, much less fund GPU hardware and software development.

* By the time AMD could afford to spend on GPU development, CUDA was the entrenched leader. CUDA was first released in 2003(!), ROCm not until 2016. AMD is playing from behind, and had to make various concessions. The ROCm API is designed around CUDA API verbs/nouns. AMD funded ZLUDA, intended to be a "translation layer" so that CUDA programs can run as a drop-in on ROCm.

* There's a chicken-and-egg problem here.

1) There's only one major cloud (Azure) that has ready access to AMD's datacenter-grade GPUs (the Instinct series).

2) I suspect a substantial portion of their datacenter revenue still comes from traditional HPC customers, who have no need for the ROCm stack.

3) The lack of a ROCm developer ecosystem means that development and bug fixes come much slower than they would for CUDA. For example, the mainline TensorFlow release was broken on ROCm for a while (you had to install the nightly release).

4) But, things are improving (slowly). ROCm 6 works substantially better than ROCm 5 did for me. PyTorch and TensorFlow benchmark suites will run.

Trust me, I share the frustration around the semi-broken state that ROCm is in for deep learning applications. As an owner of various NVIDIA GPUs (from consumer laptop/desktop cards to datacenter accelerators), in 90% of cases things just work on CUDA.

On ROCm, as of today it definitely doesn't "just work". I put together a guide for Framework laptop owners to get ROCm working on the AMD GPU that ships as an optional add-in (https://community.frame.work/t/installing-rocm-hiplib-on-ubu...). This took a lot of head banging, and the parsing of obscure blogs and Github issues.

TL;DR, if you consider where AMD GPUs were just a few years ago, things are much better now. But, it still takes too much effort for the average developer to get started on ROCm today.

vegabook · 2024-06-17T09:26:26 1718616386

Summary: AMD works if you spend 500m USD+ with them. Then they'll throw an army of their own software engineers into the contract who will hold your hand every step of the way, and remove all the jank for you. By contrast, since at least 10 years ago, I could buy any GTX card and CUDA worked out of the box, and that applied right down to a $99 Jetson Nano.

AMD's strategy looks a lot like IBM's mainframe strategy of the 80s. And that didn't go well.

cepth · 2024-06-17T09:41:16 1718617276

No, not really?

The customers at the national labs are not going to be sharing custom HPC code with AMD engineers, if for no other reason than security clearances. Nuclear stockpile modeling code, or materials science simulations are not being shared with some SWE at AMD. AMD is not “removing jank”, for these customers. It’s that these customers don’t need a modern DL stack.

Let’s not pretend like CUDA works/has always worked out of the box. There’s forced obsolescence (“CUDA compute capability”). CUDA didn’t even have backwards compatibility for minor releases (.1,.2, etc.) until version 11.0. The distinction between CUDA, CUDA toolkit, CUDNN, and the actual driver is still inscrutable to many new devs (see the common questions asked on r/localLlama and r/StableDiffusion).

Directionally, AMD is trending away from your mainframe analogy.

The first consumer cards got official ROCm support in 5.0. And you have been able to run real DL workloads on budget laptop cards since 5.4 (I’ve done so personally). Developer support is improving (arguably too slowly), but it’s improving. Hugging Face, Cohere, MLIR, Lamini, PyTorch, TensorFlow, DataBricks, etc all now have first party support for ROCm.

jedbrown · 2024-06-17T17:49:39 1718646579

> customers at the national labs are not going to be sharing custom HPC code with AMD engineers

There are several co-design projects in which AMD engineers are interacting on a weekly basis with developers of these lab-developed codes as well as those developing successors to the current production codes. I was part of one of those projects for 6 years, and it was very fruitful.

> I suspect a substantial portion of their datacenter revenue still comes from traditional HPC customers, who have no need for the ROCm stack.

HIP/ROCm is the prevailing interface for programming AMD GPUs, analogous to CUDA for NVIDIA GPUs. Some projects access it through higher level libraries (e.g., Kokkos and Raja are popular at labs). OpenMP target offload is less widespread, and there are some research-grade approaches, but the vast majority of DOE software for Frontier and El Capitan relies on the ROCm stack. Yes, we have groaned at some choices, but it has been improving, and I would say the experience on MI-250X machines (Frontier, Crusher, Tioga) is now similar to large A100 machines (Perlmutter, Polaris). Intel (Aurora) remains a rougher experience.

kkielhofner · 2024-06-17T11:13:26 1718622806

> The customers at the national labs are not going to be sharing custom HPC code with AMD engineers, if for no other reason than security clearances. Nuclear stockpile modeling code, or materials science simulations are not being shared with some SWE at AMD. AMD is not “removing jank”, for these customers.

I work closely with OLCF and Frontier (I have a job running on Frontier right now). This is incorrect. The overwhelming majority of compute and resource allocation are not "nuclear stockpile modeling code" projects or anything close to it. AMD often gets directly involved with various issues (OLCF staff has plenty of stories about this). I know because I've spoken with them and AMD.

Speaking of Frontier, you get fun things like compiling an AWS project just to get RCCL to kind of work decently with Slingshot interconnect via libfabric[0] vs NCCL that "just works", largely due to Nvidia's foresight with their acquisition of Mellanox over five years ago.

> Let’s not pretend like CUDA works/has always worked out of the box.

It is and has been miles beyond the competition and that's clearly all you need. Nvidia has > 90% market share and is worth ~10x AMD. 17 years of focus and investment (30% of their R&D spend is software) when your competitors are wandering all over the place in fits and starts will do that. I'm also of the personal opinion that AMD just doesn't have software in their DNA and don't seem to understand that people don't want GPUs, they want solutions that happen to work best on GPUs and that entails broad and significant investment in the accompanying software stacks.

AMD has truly excellent hardware that is significantly limited by their lack of investment in software.

> There’s forced obsolescence (“CUDA compute capability”).

Compute capability is why code targeting a given lineage of hardware just works. You can target 8.0 (for example) and as long as your hardware is 8.0 it will run on anything with Nvidia stamped on it from laptop to Jetson to datacenter and the higher-level software doesn't know the difference (less VRAM, which is what it is). Throw in "+PTX" when building and it will run on anything up too (albeit not taking full advantage of new hardware). With official support, without setting various environment variable and compiler hacks to end up with code that often randomly crashes (I know from personal experience). It is extremely common for projects to target SM 7.x, 8.x and 9.x. The stack just figures it out from there.

This is the PTX intermediary available with CUDA and the driver that makes this possible, where in AMD land you have some pretty drastic differences within CDNA or RDNA families not to mention CDNA vs RDNA in the first place.

IMO it's an elegant solution that works and makes it simple, even more so than CPUs (AVX, etc). How would you suggest they divide something like eight year old Pascal vs Blackwell? In terms of obsolescence, Pascal is a great example - it's supported by up to and including latest drivers, CUDA 12, and everything in their frameworks support matrix[1] of which AMD doesn't have an equivalent. Like we saw with CUDA 11, CUDA 12 will be supported by major projects for years, resulting in at least a decade of support for Pascal. Please show me an AMD GPU with even eight years of support. Back to focus, ROCm isn't even that old and AMD is infamous for removing support for GPUs, often within five years if not less.

> CUDA didn’t even have backwards compatibility for minor releases (.1,.2, etc.) until version 11.0.

Yes but they have it and CUDA 11 is four years old. They also do nice things like when they added Hopper support in 11.7 so on the day of release it "just worked" with whatever you were already running (PTX again). Same for their consumer GPUs, it "just works" the day of release. AMD took over a year to officially support their current flagship desktop GPU (7900 XTX) and even that is dicey in practice due to CDNA vs RDNA. Even when they did they were doing bizarre things like supporting Python 3.10 with ROCm 5.7 docker containers and Python 3.9 in ROCm 6 docker containers for the first few months.

Python 3.10 is pretty much the de-facto standard for these stacks, cue my surprise when I was excited for ROCm 6 only to find out Python code with popular projects was blowing up all over the place because 3.9. It just screams "we don't get this".

> The distinction between CUDA, CUDA toolkit, CUDNN, and the actual driver is still inscrutable to many new devs (see the common questions asked on r/localLlama and r/StableDiffusion).

Yes, and AMD has direct equivalents that are even less clear. The reddit communities you mention are not the best examples (I would not call those users "devs"). Even so, look at any post of someone coming along asking what hardware to buy. The responses are overwhelmingly "AMD is a world of pain, if you want for it to just work buy Nvidia". IMO the only "AMD is fine, don't believe the FUD" responses are an effect of the cult-like "team red vs team green" bleeding over from hobbyist/gamer subs on Reddit because it's just not accurate. I don't know a single dev or professional in the space (who's livelihood depends on it) who agrees.

They will also often point out that due to significantly better software AMD hardware is often bested by previous generation Nvidia hardware with dramatically inferior paper specs [2]. I like to say that AMD is at the "get it to work" stage while Nvidia and the broader CUDA ecosystem has been at the "squeeze every last penny out of it" stage for many years.

> And you have been able to run real DL workloads on budget laptop cards since 5.4 (I’ve done so personally).

Depends on what you mean by "real DL workloads". Vanilla torch? Yes. Then start looking at flash attention, triton, xformers, and production inference workloads...

> Developer support is improving (arguably too slowly), but it’s improving.

Generally agree but back to focus and discipline it's a shame that it took a massive "AI" goldrush over the past ~18 months for them to finally take it vaguely seriously. Now you throw in the fact that Nvidia has absurdly more resources, their 30% R&D spend on software is going to continue to rocket CUDA ahead of ROCm.

For Frontier and elsewhere I really want AMD to succeed, I just don't think it does them (or anyone) any favors by pretending that all is fine in ROCm land.

[0] - https://www.olcf.ornl.gov/wp-content/uploads/OLCF_AI_Trainin...

[1] - https://docs.nvidia.com/deeplearning/frameworks/support-matr...

[2] - https://www.reddit.com/r/LocalLLaMA/comments/191srof/amd_rad...

cepth · 2024-06-17T18:14:45 1718648085

(Split into two parts due to comment length restrictions)

> I work closely with OLCF and Frontier (I have a job running on Frontier right now). This is incorrect. The overwhelming majority of compute and resource allocation are not "nuclear stockpile modeling code" projects or anything close to it. AMD often gets directly involved with various issues (OLCF staff has plenty of stories about this). I know because I've spoken with them and AMD.

I don't have any experience running a job on one of these national supercomputers, so I'll defer to you on this. (Atomic Canyon looks very cool!)

Just two follow-ups then: is it the case that any job, small or large, enjoys this kind of AMD optimization/debugging support? Does your typical time-grant/node-hour academic awardee get that kind of hands-on support?

And, for nuclear modeling (be it weapons or civilian nuclear), do you know if AMD engineers can get involved? (https://insidehpc.com/2023/02/frontier-pushes-boundaries-86-... this article claims "86% of nodes" were used on at least one modeling run, which I imagine is among the larger jobs)

> It is and has been miles beyond the competition and that's clearly all you need. Nvidia has > 90% market share and is worth ~10x AMD. 17 years of focus and investment (30% of their R&D spend is software) when your competitors are wandering all over the place in fits and starts will do that.

No dispute here that NVIDIA is the market leader today, deservedly so. NVIDIA to its credit has invested in CUDA for many years, even when it wasn't clear there was an immediate ROI.

But, I bristle at the narrative fallacy that it was some divine inspiration and/or careful planning (“focus”) that made CUDA the perfect backbone for deep learning.

In 2018, NVIDIA was chasing crypto mining, and felt the need to underplay (i.e., lie) to investors about how large that segment was (https://wccftech.com/nvidia-sued-cryptocurrency-mining-reven...). As late as 2022, NVIDIA was diverting wafer supply from consumer, professional, and datacenter GPUs to produce crippled "LHR" mining cards.

Jensen has at various points pumped (during GTC and other high profile events):

- Ray tracing (2018) (https://www.youtube.com/watch?v=95nphvtVf34)

- More ray tracing (2019) (https://youtu.be/Z2XlNfCtxwI)

- "Omniverse" (2020) https://youtu.be/o_XeGyg2NIo?list=PLZHnYvH1qtOYOfzAj7JZFwqta...)

- Blockchain, NFTs, and the metaverse (2021) (https://cointelegraph.com/news/nvidia-ceo-we-re-on-the-cusp-...) (https://blockonomi.com/nvidiz-ceo-talks-crypto-nfts-metavers...)

- ETH (2021) (https://markets.businessinsider.com/currencies/news/nvidia-c...)

- "Omniverse"/digital twins (2022) (https://www.youtube.com/watch?v=PWcNlRI00jo)

- Autonomous vehicles (2022) (https://www.youtube.com/watch?v=PWcNlRI00jo)

Most of these predictions about use cases have not panned out at all. The last GTC keynote prior to the "ChatGPT moment" took place just 2 months before the general availability of ChatGPT. And, if you click through to the video, you'll see that LLMs got under 7 minutes of time at the very end of a 90 minute keynote. Clearly, Jensen + NVIDIA leadership had no idea that LLMs would get the kind of mainstream adoption/hype that they have.

On the business side, it hasn't exactly always been a smooth ride for NVIDIA either. In Q2 2022 (again right before the "ChatGPT moment"), the company missed earnings estimates by 18%(!) due to inventory writedowns (https://www.pcworld.com/article/828754/nvidia-preannounces-l...).

The end markets that Jensen forecasts/predicts on quarterly earnings calls (I’ve listened to nearly every one for the last decade) are comically disconnected from what ends up happening.

It's a running joke among buy-side firms that there'll always be an opportunity to buy the NVDA dip, given the volatility of the company's performance + stock.

NVIDIA's "to the moon" run as a company is due in large part to factors outside of its design or control. Of course, how large is up for debate.

If/when it turns out that most generative products can't turn a profit, and NVIDIA revenues decline as a result, it wouldn't be fair to place the blame for the collapse of those end markets at NVIDIA’s feet. Similarly, the fact that LLMs and generative AI turned out to be hit use cases has little to do with NVIDIA's decisions.

AMD is a company that was on death’s door until just a few years ago (2017). It made one of the most incredible corporate comebacks in the history of capitalism on the back of its CPUs, and is now dipping its toes into GPUs again.

NVIDIA had a near-monopoly on non-console gaming. It parlayed that into a dominant software stack.

It’s possible to admire both, without papering over the less appealing aspects of each’s history.

> Depends on what you mean by "real DL workloads". Vanilla torch? Yes. Then start looking at flash attention, triton, xformers, and production inference workloads...

As I mentioned above, this is a chicken-and-egg phenomenon with the developer ecosystem. I don't think we really disagree.

CUDA is an "easy enough" GPGPU backbone that due to incumbency and the lack of real competition from AMD and Intel for a decade led to the flourishing of a developer ecosystem.

Tri Dao (sensibly) decided to write his original Flash Attention paper with an NVIDIA focus, for all the reasons you and I have mentioned. Install base size, ease of use of ROCm vs CUDA, availability of hardware on-prem & in the cloud, etc.

Let's not forget that Xformers is a Meta project, and that non-A100 workloads (i.e., GPUs without 8.0 compute capability) were not officially supported by Meta for the first year of Xformers (https://github.com/huggingface/diffusers/issues/2234) (https://github.com/facebookresearch/xformers/issues/517#issu...). This is the developer ecosystem at work.

AMD right now is forced to put in the lion's share of the work to get a sliver of software parity. It took years to get mainline PyTorch and Tensorflow support for ROCm. The lack of a ROCm developer community (hello chicken and egg), means that AMD ends up being responsbile for first-party implementations of most of the hot new ideas coming from research.

Flash Attention for ROCm does exist (https://github.com/ROCm/flash-attention) (https://llm-tracker.info/howto/AMD-GPUs#flash-attention-2), albeit only on a subset of cards.

Triton added (initial) support for ROCm relatively recently (https://github.com/triton-lang/triton/pull/1983).

Production-scale LLM inference is now entirely possible with ROCm, via first-party support for vLLM (https://rocm.blogs.amd.com/artificial-intelligence/vllm/READ...) (https://community.amd.com/t5/instinct-accelerators/competiti...).

> Compute capability is why code targeting a given lineage of hardware just works. You can target 8.0 (for example) and as long as your hardware is 8.0 it will run on anything with Nvidia stamped on it from laptop to Jetson to datacenter and the higher-level software doesn't know the difference (less VRAM, which is what it is).

This in theory is the case. But, even as an owner of multiple generations of NVIDIA hardware, I find myself occasionally tripped up.

Case in point:

RAPIDS (https://rapids.ai/) is one of the great non-deep learning success stories to come out of CUDA, a child of the “accelerated computing” push that predates the company’s LLM efforts. The GIS and spatial libraries are incredible.

Yet, I was puzzled when earlier this year I updated cuSpatial to the newest available version (24.02) (https://github.com/rapidsai/cuspatial/releases/tag/v24.02.00) via my package manager (Mamba/Conda), and started seeing pretty vanilla functions start breaking on my Pascal card. Logs indicated I needed a Volta card (7.0 CC or newer). They must've reimplemented certain functions altogether.

There’s nothing in the release notes that indicates this bump in minimum CC. The consumer-facing page for RAPIDS (https://rapids.ai/) has a mention under requirements.

So I’m led to wonder, did the RAPIDS devs themselves not realize that certain dependencies experienced a bump in CC?

cepth · 2024-06-17T18:18:19 1718648299

(Part 2 of 2)

> Please show me an AMD GPU with even eight years of support. Back to focus, ROCm isn't even that old and AMD is infamous for removing support for GPUs, often within five years if not less.

As you yourself noted, CDNA vs RDNA makes things more complicated in AMD land. I also think it’s unfair to ask about “eight years of support” when the first RDNA card didn’t launch until 2019, and the first CDNA “accelerator” in 2020.

The Vega and earlier generation is so fundamentally different that it would’ve been an even bigger lift for the already small ROCm team to maintain compatibility.

If we start seeing ROCm removing support for RDNA1 and CDNA1 cards soon, then I’ll share your outrage. But I think ROCm 6 removing support for Radeon VII was entirely understandable.

> Generally agree but back to focus and discipline it's a shame that it took a massive "AI" goldrush over the past ~18 months for them to finally take it vaguely seriously. Now you throw in the fact that Nvidia has absurdly more resources, their 30% R&D spend on software is going to continue to rocket CUDA ahead of ROCm.

> For Frontier and elsewhere I really want AMD to succeed, I just don't think it does them (or anyone) any favors by pretending that all is fine in ROCm land.

The fact is that the bulk of AMD profits is still coming from CPUs, as it always has. AMD wafer allotment at TSMC has to first go towards making its hyperscaler CPU customers happy. If you promise AWS/Azure/GCP hundreds of thousands of EPYC CPUs, you better deliver.

I question how useful it is to dogpile (not you personally, but generally) on AMD, when the investments in people and dollars are trending in the right decision. PyTorch and TensorFlow were broken on ROCm until relatively recently. Now that they work, you (not unreasonably) ask where the other stuff is.

The reality is that NVIDIA will likely forever be the leader with CUDA. I doubt we’ll ever see PhD students and university labs making ROCm their first choice when having to decide where to conduct career-making/breaking research.

But, I don’t think it’s really debatable that AMD is closing the relative gap, given the ROCm ecosystem didn’t exist until at all relatively recently. I’m guessing the very credible list of software partners now at least trying ROCm (https://www.amd.com/en/corporate/events/advancing-ai.html#ec...) are not committing time + resources to an ecosystem that they see as hopeless.

---

Final thoughts:

A) It was completely rational for AMD to focus on devoting the vast majority of R&D spend to its CPUs (particularly server/EPYC), particularly after the success of Zen. From the day that Lisa Su took over (Oct 8, 2014), the stock is up 50x+ (even more earlier in 2024), not that share price is reflective of value in the short term. AMD revenue for calendar year 2014 was $5.5B, operating income negative 155 million. Revenue for 2023 was $22.68B, operating income $401 million. Operating income was substantially higher in 2022 ($1.2B) and 2021 ($3.6B), but AMD has poured that money into R&D spending (https://www.statista.com/statistics/267873/amds-expenditure-...), as well as the Xilinx acquisition.

B) It was completely rational for NVIDIA to build out CUDA, as a way to make it possible to do what they initially called "scientific computing" and eventually "GPU-accelerated computing". There's also the reality that Jensen, the consummate hype man, had to sell investors a growth story. The reality is that gaming will always be a relatively niche market. Cloud gaming (GeForce Now) never matched up to revenue expectations.

C) It’s difficult for me to identify any obvious “points of divergence” that in an alternate history would’ve led to better outcomes with AMD. Without the benefit of “future knowledge”, at what point should AMD have ramped up ROCm investment? Given, as I noted above, in the months before ChatGPT went viral, Jensen’s GTC keynote gave only a tiny mention to LLMs.

D) If anything, the company that missed out was Intel. Beyond floundering on the transition from 14nm to 10nm (allowing TSMC and thus AMD to surpass them), Intel wasted its CPU-monopoly years and the associated profits. Projects like Larrabee (https://www.anandtech.com/show/3738/intel-kills-larrabee-gpu...) and Xe (doomed in part by internal turf wars) (https://www.tomshardware.com/news/intel-axes-xe-hp-gpus-for-...) were killed off. R&D spending was actually comparable to the amount spent on share buybacks in 2011 (14.1B in buybacks vs 8.3B in R&D spending), 2014 (10.7B vs 11.1B), 2018 (10.8B vs 13.B), 2019 (13.5B vs 13.3B) and 2020 (14.1B vs 13.55B). (See https://www.intc.com/stock-info/dividends-and-buybacks and https://www.macrotrends.net/stocks/charts/INTC/intel/researc...).

vegabook · 2024-06-17T19:58:22 1718654302

lol AMD flogged its floundering foundry waaay before Intel ran into any problems.

in fact most of your points about AMD's lack of dough can be traced back to that disaster. The company wasn't hit by some meteorite. It screwed up all by itself.

Then lucky it had that duopolistic X86 licence to lean on or it would have gone the way of Zilog or Motorola. 'Cos it sure can't rely on its janky compute offering.

cepth · 2024-06-17T20:25:25 1718655925

Assuming you're not just here to troll (doubtful given your comment history, but hey I'm feeling generous):

> lol AMD flogged its floundering foundry waaay before Intel ran into any problems.

Not wanting/being able to spend to compete on the leading edge nodes is an interesting definition of "floundering". Today there is exactly 1 foundry in the world that's on that leading edge, TSMC. We'll see how Intel Foundry works out, but they're years behind their revenue/ramp targets at this point.

It's fairly well known that Brian Krzanich proposed spinning out Intel's foundry operations, but the board said no.

The irony is that trailing edge fabs are wildly profitable, since the capex is fully amortized. GloFo made $1 billion in net income in FY2023.

> in fact most of your points about AMD's lack of dough can be traced back to that disaster. The company wasn't hit by some meteorite. It screwed up all by itself

Bulldozer through Excavator were terrible architectures. What does this have to do with what's now known as Global Foundries?

GloFo got spun out with Emirati money in March 2009. Bulldozer launched in Q4 2011. What's the connection?

AMD continued to lose market share (and was unprofitable) for years after the foundry was spun out. Bad architectural choices, and bad management, sure. Overpaying for ATI, yep. "Traced back" to GloFo? How?

> Then lucky it had that duopolistic X86 licence to lean on or it would have gone the way of Zilog or Motorola. 'Cos it sure can't rely on its janky compute offering.

"Janky" when? "Rely" implies present tense. You're saying AMD compute offerings are janky today?

KeplerBoy · 2024-06-17T08:14:36 1718612076

Small correction: CUDA was first released in 2007 and of course Nvidia was also aiming at HPC before the AlexNet moment.

fulafel · 2024-06-17T07:56:03 1718610963

Good summary. There was also the 2010's multivendor HSA and OpenCL software evolution directions that ended up losing other vendors on the way and many customers turned out to accept the proprietary Cuda.

cepth · on July 27, 2021

I think the general idea that "lawmakers are ex-lawyers, and therefore write extremely bureaucratic laws" is a common sentiment, but there are a number of reasons why I think this is not quite the case.

Few (if any) US representatives or senators at the federal level are actually writing out legislation themselves. They hire staffers (see the "Personal staff" section https://en.wikipedia.org/wiki/Congressional_staff), sometimes lift language straight from lobbyist proposals (aka "model legislation" https://en.wikipedia.org/wiki/Model_act, which is widely used at the state level), or defer to committee staffers (who are subject matter experts) to do the heavy lifting. For example, Lina Khan served from 2019-2020 as counsel to the House's "Subcommittee on Antitrust, Commercial, and Administrative Law", and you can see her fingerprints all over the written work that the committee produced. The framing, and sometimes direct language, of committee report sections are clearly lifted from her legal academia work.

This is comparable to the fact that US federal judge at all levels (including SCOTUS) lean on their clerks to write the first drafts of their opinions, and serve primarily as editors of the final text.

---

In regards to the empirical claim about the backgrounds of lawmakers, see page 8 of this report from the Congressional Research Service (https://crsreports.congress.gov/product/pdf/R/R46705). They say that 144 House members (32.7% of the total), and 50 senators (50%) hold law degrees. While I think you may have been using a bit of hyperbole, it is worth pointing out that there are not enough lawyers in Congress for "ALL of the democrats" to be lawyers.

In terms of occupation (page 3), 85 reps and 28 senators were previously educators; 14 reps and 4 senators were physicians; etc.

Yes there are plenty of law degree holders, but it's also worth considering what law-related job they held. Per the CRS report, 29 reps and 9 senators were previously prosecutors, and 1 rep and 6 senators were previously attorney generals. It's unclear to me why a career in the criminal side of our legal system would have much bearing on how someone drafts laws affecting taxation, provision of government services, etc.

There's also the fact that many law degree holders practiced law for not long at all before winning elected office, or had more substantial "chapters" of their life not related to their degree. Take Jason Crow (https://en.wikipedia.org/wiki/Jason_Crow). He spent as much time as an Army Ranger as he did as a lawyer. One could easily construct a narrative that Mr. Crow, who has complained often about the bureaucracy of accessing veterans' healthcare, should be allergic to red tape and bureaucracy. But with the crude taxonomy of "he has a law degree", the other parts of his life would be overlooked.

---

IMO, a big part of the bloated and inhumane parts of the bureaucracy have to do with the outgrowth of "administrative law" and "rulemaking" (https://en.wikipedia.org/wiki/Administrative_law; https://en.wikipedia.org/wiki/United_States_administrative_l... https://www.everycrsreport.com/reports/RL32240.html). Once a bill has been signed into law, the rulemaking process begins. These are where the actual details of a new law are hashed out. A bill may designate $X in funding for a program. Which contractors receive those contracts, hours of service, the amount of paperwork required, etc. are all handled at the administrative level. And it's certainly the case that for 99.9% of citizens, no one is submitting public comments during this period, and the input of ordinary people is often lacking.

So when the IRS makes a free-file options for taxes difficult to use, in large part due to Intuit's lobbying (https://www.propublica.org/article/inside-turbotax-20-year-f...), this is not the result of a carve-out or giveaway spelled out in actual legislation's text. It's the result of an actor exploiting the opacity of the rulemaking and administrative law practices.

jellicle · on July 27, 2021

This is an excellent comment.

The law-degree-heavy nature of Congress should be understood not as a Congress full of lawyers, but as a Congress full of the children of rich people checking boxes on their career path to high status jobs.

The modern rich kid angling to be a Senator one day might want a short stint in the military. They might want a degree in either political science or law. They might want a job as a public prosecutor or one as a TV personality (either one gives public exposure). These are essentially box-checking to be done in the years from 18-30, and after 30 you've checked a bunch of the boxes and are ready for your run for local, state, or federal office.

At no point in most of these people's lives did they have any intent of becoming a lawyer as a full-fledged career, and the convoluted nature of much US law-making is not due to the inherent nature of lawyer-politicians. The average politician spends approximately zero seconds per day writing legislation.

cepth · on July 12, 2021

It’s a much more fully featured note taking app, probably most directly comparable to Roam Research.

E.g., tagging, graph views, back links between notes, note/document templates.

All docs are (more or less) plain markdown as well, so if for some reason the open source community ever abandoned Obsidian, in theory it’s easy to export/transfer your notes.

cepth · on June 9, 2021

They’re different ratings systems, and not meant to be directly comparable.

Chess.com uses Glicko with an initial rating of 1200 (https://support.chess.com/article/210-how-do-ratings-work-on...), while Lichess uses Glicko-2, and sets their initial rating to 1500 (https://lichess.org/page/rating-systems).

Just anecdotally, I’m an ~1800 rated player on chess.com, and ~2050 on Lichess. Percentile rank wise, I’m in the top 3.5% on chess.com, and closer to top 10% on Lichess.

I think this matches the general perception that Chess.com has many more casual players. This is likely a function of all the cross promotion they’ve done to grow the game, especially on Twitch. Neither site is “good” or “bad”, but my friends who play casually seem more likely to play on Chess.com.

qsort · on June 9, 2021

> I’m in the top 3.5% on chess.com, and closer to top 10% on Lichess

You are correct that ratings are not comparable, but percentiles are not really comparable either. IIRC, chess.com computes the percentile rank on the overall player population, while lichess only compares the score of players who have been active in the last 7 days.

> Just anecdotally, I’m an ~1800 rated player on chess.com, and ~2050 on Lichess

I have similar ratings to yours on both sites, can confirm the anecdata.

> I think this matches the general perception that Chess.com has many more casual players.

I guess it depends on what you define as "casual". Chess is a game where ratings work beautifully because it's individual and there's not a lot of randomness, so your rating will converge on your skill in short order.

What I noticed is that there are a lot of casual players on both sites, but Chess.com has a "wider" distribution if that makes sense, with lots of absolute beginners (think <1100 chess.com), while lichess seems to have less of those.

Also, IIRC lichess has an absolute rating floor at 800, while chess.com doesn't have one.

OhNoMyqueen · on June 9, 2021

Thanks for the explanation. Do you know which one would be closer to FIDE or club ratings?

iends · on June 9, 2021

https://chessgoals.com/rating-comparison/

OhNoMyqueen · on June 9, 2021

Thanks. TL;DR: chess.com is closer to FIDE than Lichess.

cepth · on April 26, 2021

In 2017, Ars Technica did a deep dive into computation in Formula 1 (https://arstechnica.com/cars/2017/04/formula-1-technology/).

Some relevant quotes:

> For example, each Formula 1 team is only allowed to use 25 teraflops (trillions of floating point operations per second) of double precision (64-bit) computing power for simulating car aerodynamics.

> Oddly, the F1 regulations also stipulate that only CPUs can be used, not GPUs, and that teams must explicitly prove whether they're using AVX instructions or not. Without AVX, the FIA rates a single Sandy Bridge or Ivy Bridge CPU core at 4 flops; with AVX, each core is rated at 8 flops. Every team has to submit the exact specifications of their compute cluster to the FIA at the start of the season, and then a logfile after every eight weeks of ongoing testing.

> Everest says that every team has its own on-premises hardware setup and that no one has yet moved to the cloud. There's no technical reason why the cloud can't be used for car aerodynamics simulations—and F1 teams are investigating such a possibility—but the aforementioned stringent CPU stipulations currently make it impossible. The result is that most F1 teams use a somewhat hybridised setup, with a local Linux cluster outputting aerodynamics data that informs the manufacturing of physical components, the details of which are kept in the cloud.

> Wind tunnel usage is similarly restricted: F1 teams are only allowed 25 hours of "wind on" time per week to test new chassis designs. 10 years ago, in 2007, it was very different, says Everest: "There was no restriction on teraflops, no restriction on wind tunnel hours," continues Everest. "We had three shifts running the wind tunnel 24/7. It got to the point where a lot of teams were talking about building a second wind tunnel; Williams built a second tunnel.

With the new cost cap in F1 (https://www.autoweek.com/racing/formula-1/a35293542/f1-budge...) (which notably excludes driver salaries), it would be interesting to know how much these on-prem clusters cost to operate.

Someone · on April 26, 2021

To level the field even more, I think the FIA should require teams to release the design of their computer hardware after X time. That way, investments by one team on improving the system architecture spread to teams with lower budgets after a while.

Also, I didn’t find it in the article, but I guess they have programmers who can work for months to speed up their software by a few percent.

RaceWon · on April 26, 2021

>> to level the field even more, I think the FIA should require teams to release the design of their computer hardware after X time.

But that is Not what Formula One is about... it is Not a Spec series where the cars are equal to each other. It is a competition where each team builds their own race car to compete against the other iterations of race cars built by the opposing teams. It is Not meant to be fair or equitable. We have Indycar and NASCAR for that.

Ditto with the drivers: Is Max or Lewis comparable to say a Mazepin or even a Hulkenberg? No they are Not.

It's a Spectacle, it's a Circus... that is what F1 is about. And I tell you, as a racer there is nothing else that is its equal in terms of pure audacity both from a standpoint of driving talent and car performance.

capableweb · on April 26, 2021

F1 is definitely trying to make the teams and their engineering more similar than different, why do you think the whole regulation part exists even? [1] If they were to be allowed to build whatever they want, F1 would have looked very different than how it looks today.

F1 (FIA really) has been using regulation to improve the sports safety, but lately they also used regulation in order to regulate how much each team spends on engineering, both money-wise and time-wise. This is to make things more equal between the teams.

- [1] - https://en.wikipedia.org/wiki/Formula_One_regulations#Techni...

RaceWon · on April 26, 2021

>> F1 is definitely trying to make the teams and their engineering more similar than different, why do you think the whole regulation part exists even?

I agree, especially under the new owners. And for sure the cars are built according to each teams interpretation of the rules (which are of course subject to scrutineering). But that still leaves massive room for innovation.

Lewis is sitting in the same cockpit as Bottas is... their results are frequently vastly different due to their individual interpretation of events.

moralestapia · on April 26, 2021

>each Formula 1 team is only allowed to use 25 teraflops

That ... doesn't make much sense, honestly.

jecel · on April 27, 2021

The problem is the sloppy use of technical terms. MIPS means "millions of instructions per second" so the "p" is "per" and the "s" is "second". So it is natural for people to use FLOPS in the exact same way, but it is more correct for this to be "FLoating point OPerationS" where the "s" is used to indicate a plural.

That makes MIPS the equivalent of power (Watts) and FLOPS the equivalent of energy (Jouls). In that case limiting each Formula 1 team to a maximum of 25 teraflops of computation does make sense.

If instead you use teraflops as the equivalent of "trillion floating point operations per second" as many people do then it indeed makes less sense.

moralestapia · on April 29, 2021

My thoughts exactly.

But then, the article explicitly states that they are talking about "trillion floating point operations per second".

It's probably just a mistake from the journalist.

cepth · on Feb 23, 2021

You’d be paying for earnings several years out, for sure. But I think it’s hardly fair to compare to Nikola.

Trevor Milton’s experience prior to starting Nikola was selling home security systems. Lucid’s leadership team features various Tesla, Audi, Ford, VW, etc. veterans. (See page 9: https://www.lucidmotors.com/files/lucid-investor-deck-februa...)

Lucid has finished building a factory that can produce roughly 30k cars per year, expandable to 400k.

They’ve given rides in their launch vehicle to various auto journalists (https://youtu.be/gqSN2QNgO5k).

Their battery pack technology is a component of the Formula E drivetrain system (https://lucidmotors.com/media-room/atieva-powers-season-6-fo...).

This isn’t a “Nikola rolling a non functional truck down a hill” situation. They have a working product.

The car could suck, the company could be overvalued. But I think hard to compare Nikola to Lucid.

EDIT:

I should also add for comparison, that at the time that Tesla IPOed in 2010, it was a 1.7B market cap company. Only ~2450 Roadsters (their only car at the time) would be sold in total. By November 29, 2010 Tesla had not yet sold 1400 cars (https://www.tesla.com/blog/race-champions-2010-motorsport-go...).

Tesla's Fremont factory was opened in October 2010. In other words, when the company went public on June 29th, 2010 you would have been buying into a car company without a factory.

The first Model S wasn't delivered until June 2012 (https://www.tesla.com/blog/tesla-motors-begin-customer-deliv...).

Not to say that Lucid will or won't ever reach Tesla's heights, but assigning a 12B valuation to the company isn't loony. The SPAC price though is a different story.

EDIT 2:

There are some fun short videos of Lucid CEO Peter Rawlinson in the workshop from his Tesla days (https://www.youtube.com/watch?v=TrbOLHW8Pec, https://www.youtube.com/watch?v=8YxHp2ot61Y, https://www.youtube.com/watch?v=NGKqPYvtqXE). It's pretty awe inspiring to see where Tesla and the global EV industry as a whole was in 2011 vs today.

cepth · on Jan 20, 2021

Yes, a HN member with a profile created in December of 2008, 5800+ Karma, with links to his Twitter and name of his employer in his profile page is a 五毛(50 center).

The CCP is truly all powerful!

If my dripping sarcasm wasn’t clear, you’re way off base here.

Edit 1: to the downvoters, please feel free to explain why it’s appropriate to accuse a 12-year HN member of being a paid shill for the CCP when all evidence points to the contrary.

ncann · on Jan 20, 2021

The one you're replying to sounds like sarcasm to me. Or a very ironic comment given the comment it's replying to. It's hard to tell.

cepth · on Jan 20, 2021

I might’ve thought so too, but the account I’m replying to seems to be a genuine CCP critic (which is totally fine!).

A post about Putin and the CCP exploiting US domestic political turmoil: https://news.ycombinator.com/item?id=25830075

A book recommendation by a longtime China-skeptical writer: https://news.ycombinator.com/item?id=25843422 (https://en.wikipedia.org/wiki/Bill_Gertz)

My personal preference is for a HN community where dissenting views don’t lead to assigning ulterior motives (in this case being paid to shill for the CCP) to our debating “opponents”.

lxe · on Jan 20, 2021

Keybase, twitter, normal twitter tech posts stuff... followers... why randomly throw conspiracies at people?

cepth · on Dec 25, 2020

Given that a typical garbage truck weighs 25+ short tons (https://www.reference.com/world-view/much-garbage-truck-weig...), it seems likely that you start to run into the legal limits for local roads.

E.g. in NYC (https://www1.nyc.gov/html/dot/html/motorist/sizewt.shtml), the max weight for any vehicle is 80,000 lbs, likely much less if you’re the length of a typical garbage truck.

Can you really go electric? A 2016 bullish Quartz article (https://qz.com/749622/the-economics-of-electric-garbage-truc...) says the typical garbage truck travels 130 miles a day. Unclear what the additional weight of a capable battery pack would be, even after accounting for the saved weight by removing what must be a pretty hefty combustion engine. It’s certainly interesting that the company (Wrightspeed) profiled in the article seems to be doing more than just garbage trucks now. Their Route 1000 powertrain/platform only quotes 24 miles of pure EV range (https://www.wrightspeed.com/the-route-powertrain), so it seems like there are certainly trade offs between range and weight regulations.

I imagine once long and medium haul trucks make it to market (perhaps Tesla or Volvo), maybe we can say we’ve reached the energy densities in packs that make hauling around weight equivalent to an 18 wheeler possible at a reasonable cost.

Other reasons that maybe no one has done this already:

* Garbage trucks cost $250k+ (https://bigtruckrental.com/front-loader-garbage-truck-rental...). I’m guessing there are regulatory requirements around collision safety, and maybe longevity requirements that become a factor.

* Given that a brand new sleeper semi goes for 50% less (https://youngtrucks.com/new-trucks/2020-volvo-vnl64t860-860-...), there are probably significant costs not typical of a combustion engine vehicle. Maybe those front loading bins require powerful pneumatics, or those on-vehicle compactors need to be able to exert tremendous amounts of force, and so on and so on.

* How large is the market for garbage trucks? Some press release claimed that the global garbage truck market in 2019 was ~$22 billion (https://www.globenewswire.com/news-release/2020/07/29/206925...). If we use a unit price of $250k (probably on the low end), that would mean only ~88000 total units sold globally. Of course, I’m sure different countries may have larger and smaller trucks, but you get the gist.

* The customers for these trucks are likely to be municipal governments and a handful of private waste management companies (https://craft.co/waste-management/competitors). I think it’d be a huge risk to build out a plant to put together a garbage truck, have maybe a few hundred plausible decisions makers to pitch on the product, and potentially risk burning hundreds of millions in capital between labor, regulatory certification, endurance testing, battery pack development, power train development, etc.

bigbubba · on Dec 25, 2020

They would surely use hydraulics not pneumatics, but this is a good point. Garbage trucks are half truck, half front end loader. And loaders aren't exactly cheap either.

rwmj · on Dec 25, 2020

Thanks - very interesting answer.

cepth · on Dec 24, 2020

I have to commend the author for taking on a beast of a task. It's not easy to be learning/relearning chess while also trying to program an engine!

I think programming a chess engine that can beat most club players (let's say sub-2000 rating) is not too hard. At that level, human players will make inaccurate moves (as well as blunders at lower ratings). It is however much harder to develop engines that are competing at GM level (~2500+).

This wiki has some good high level articles (https://www.chessprogramming.org/Main_Page). Even something like a chess bitboard representation is non-trivial to code.

Implementing minimax and comparable algorithms may be straightforward, but the actual evaluation of positions for traditional engines is a distillation of thousands of pieces of expert knowledge. See for example the parameters that can be tweaked in Fritz (http://help.chessbase.com/Fritz/16/Eng/index.html?000038.htm).

If you look at the history of chess engines (https://www.youtube.com/watch?v=wljgxS7tZVE), by the 1990s and later most of the top chess engines have had masters, international masters, and grandmasters intimately involved with development. For example, Deep Blue had several consulting grandmasters. Rybka (the world's best engine 2007-2010) had IM Vasik Rajlich as primary author and GM Larry Kaufman closely involved with tweaking its evaluation functions. Kaufman also went on to write the (still) very strong engine Komodo (https://ccrl.chessdom.com/ccrl/4040/).

Traditional chess engines used to have glaring weaknesses like playing poorly in closed positions, being poor at avoiding disadvantageous endgames, etc. By the mid-2000s many of these weaknesses disappeared, but as recently as Fritz 9 there were well known opening sequences where engines could be tricked into playing losing lines.

cepth · on Dec 5, 2020

The paper referenced ("Gender shades: Intersectional accuracy disparities in commercial gender classification") has been cited 1000+ times per her Google Scholar page (https://scholar.google.com/citations?user=lemnAcwAAAAJ). For a 2-year old paper, this is easily a top 1% most cited paper.

Take for example a retrospective look at 2017 NeurIPS papers done in 2019 (https://archive.is/wip/77YrB).

You can disagree with how she and/or Google has handled this whole situation, but please do not denigrate work that has been cited (https://scholar.google.com/scholar?oi=bibs&hl=en&cites=14954...) by papers accepted at the most competitive/prestigious ML conferences.

EDIT: I also do not see how in good faith you can say that VentureBeat, a company who makes the bulk of its revenue from running conferences catering to C-suite execs who can shell out thousands of dollars for a ticket, is "leftist".

bzb6 · on Dec 5, 2020

Are you really making the argument that people with money can’t be leftists? Do you know liberals are on average richer? https://www.cnbc.com/2019/09/19/economic-divide-in-the-us-is...

Most executives of tech and news companies would bend over backwards just to show how leftist they are in 2020.

(I’m only replying to the part of your comment that answers mine)

jeromenerf · on Dec 5, 2020

"Left" and "right" carry little meaning when it comes to analyzing the divide between people. The "with/without money" bit is of a much higher order than the global "left/right" bit to me.

I don’t really care about the paper author’s fate, but it seems unsettling to me that she is discussed more than the paper itself.

cepth · on Dec 5, 2020

In the United States, liberal and leftist are distinct terms.

Even someone like Ben Shapiro recognizes a difference between liberals and leftists (https://twitter.com/benshapiro/status/966081078166421504).

Silicon Valley types would hardly be described as leftists. Numerous studies have been done on the attitudes of Silicon Valley founders and execs (https://www.vox.com/2015/9/29/9411117/silicon-valley-politic...). The distinctions are dramatic.

We see that on average, tech founders are less likely to support vs. even Democrats generally (not just progressives):

* Banning the Keystone XL pipeline (60% vs 78%)

* The individual healthcare mandate (59% vs 70%)

* Labor unions being good (29% vs 73%)

This is to say, the average Silicon Valley type, particularly the C-suite exec or founder, tends not to be on the left wing of the Democratic party.

During the 2020 Democratic primary, even the Silicon Valley billionaires who are openly Democratic-leaning donated to candidates who were not to the left of the field (i.e. Elizabeth Warren and Bernie Sanders) (https://www.cnbc.com/2019/08/13/2020-democratic-presidential...):

* Eric Schmidt -> Cory Booker and Joe Biden

* Reed Hastings -> Pete Buttigieg

* Marc Benioff -> Cory Booker, Kamala Harris, and Jay Inslee

* Reid Hoffman -> Cory Booker, Kirsten Gillibrand, Amy Klobuchar

* Jack Dorsey -> Andrew Yang, Tulsi Gabbard

* Ben Silbermann -> Pete Buttigieg

I'm engaging with you in good faith, and because I was intrigued that in a previous comment you mentioned that you live in Spain (though who's to say you're not a US ex-pat). But calling US tech companies "leftist" is a stretch at best.

bzb6 · on Dec 5, 2020

I see, they are the same thing to me, hence the confusion.