Nvidia announces financial results for second quarter fiscal 2024 (nvidia.com)
331 points by electriclove on Aug 23, 2023 | 428 comments



What's also pretty interesting is that they didn't actually sell more chips this quarter - they ... just pretty much doubled the prices (hence the huge margin).

This is what having a monopoly looks like!

This is also why the companies that manufacture their cards didn't report any uptick in profits. I'm wondering how this will play out in a few months. Do they have any pricing power with respect to NVidia? Or could NVidia just switch to another manufacturer?


> This is what having a monopoly looks like!

As someone who has been in the AI/ML space for over a decade, and even had an AMD/Radeon card for more than half of that, I can't help but feel that this is partially AMD's own fault.

For many, many years it seemed to me that AMD just didn't take AI/ML seriously whereas, for all its faults, NVIDIA seemed to catch on very early that ML presented a tremendous potential market.

To this day getting things like Stable Diffusion to run on an AMD card requires extra work. At least from my perspective it seems like dedicating a few engineers to getting ROCm working on all major OSes with all major scientific computing/deep learning libraries would have been a pretty good investment.

Is there some context I'm missing for why AMD never caught up in this space?


Until very recently, AMD was struggling for survival. Rather than making the big bet on AI, they went for the sure thing by banking on revolutionary CPU tech. I'm sure if they were in a better financial position 5 years ago, they would have gone bigger on AI.


And arguably their bet on CPU tech worked! AMD is in a much better position today than they were 5+ years ago. They have some catching up to do, but that doesn't mean they're completely out of the game.


"much better" is an understatement! "AMD predicted to go bankrupt by 2020"[0]

[0] https://www.overclock3d.net/news/cpu_mainboard/amd_predicted...


Great achievement! Of course, they can also thank Intel for waiting for them to catch up.


They also focused on game consoles, where they won contracts for both major platforms this generation.


The other, non-major, Nvidia-based console has higher sales numbers than both of the so-called major consoles combined.


It's ... difficult to compare a low-power SoC released in 2015 (so, design dating back to 2014 if not earlier) with high-power consoles developed in 2019 onwards.


They belong to the same generation. Until Switch 2 is released.


That doesn't mean they're meaningfully comparable (in the context we are talking about). A $100 budget Android phone released at the same time as the latest iPhone also belongs to the same generation, but that doesn't mean the chip manufacturers profit equally from the two (of course Apple makes their own chips, so in a literal sense this comparison doesn't work, but I'm sure you understand what I mean).

And I don't mean to disparage the Switch, just to point out that the way it's designed is very different, which makes the comparison questionable.


Previous one too. Xbox One and Series and PS4 and PS5 are all AMD, and there’s several revisions of each (e.g. PS4 OG, Slim, Pro).


Yes, this is true.

However, a lot of this has to do with the fact that AMD was on the brink of bankruptcy before the launch of Zen in 2016 (when their share price was ~$10). They simply did not have the capital to do the kind of things Nvidia was doing (since '08?).

The bet on OpenCL and the 'open-source' community failed. However, ROCm/HIP etc. really seem to be catching up (I even see them packaged on Arch Linux).


> However, ROCm/HIP etc. really seem to be catching up (I even see them packaged on Arch Linux).

There are now distro-provided packages on Arch, Gentoo, Debian, Ubuntu, and Fedora.


What really strikes me is that Nvidia was already putting serious, practical work into their GPUs 10~15 years ago with PhysX, while both Intel and AMD just existed.

Nvidia's dominance today is the product of at least over a decade of work and investments to make better products. Today they are finally reaping their rewards.


I remember meeting NVIDIA in the late aughts (2007?) when they were first launching their CUDA efforts. The product was really just a re-branded 780GTX or whatever their high-end gaming card was at the time, but they had already laid out a clear pathway to today (more or less).


I remember meeting with them in the mid aughts when they were first talking to HPC folks about using their cards for science. I'll never forget what the chief scientist from nVidia said: "What is the color of a NaN? That is, when you render a texture with a NaN value, what does it look like? I'll tell you: it's nvidia green."


That is a funny way to signal their commitment to HPC! But compared to other (non-GPU) tooling, CUDA is still really clunky. Way ahead of everything else in the GPGPU space, but still surprisingly clunky. Also, I don't get the "account required for download" gating (e.g. for cuDNN): what are they afraid of? And is it really worth the trade-off for the pain it causes for dev environments and CI pipelines? It really seems like Intel and AMD have to step in and break this monopoly to force them to improve the situation for everyone.


No, you're not missing anything, NVIDIA's software is super clunky by the standards of most of the software world. However, for the last decade, the competition has been much worse: OpenCL development on AMD would be riddled with VRAM leaks, hard lockups, invisible limits on things like function length and registers that would cause the hard lockups when you tripped over them without any indication as to what you did wrong or how to fix it, that sort of thing. Cryptic error messages would lead to threads scattered around the internet, years old, with pleas for help and no happy endings.

The thing that caused me to ragequit the AMD ecosystem was when I took an OpenCL program I had been fighting for two days straight and ran it on my buddy's Nvidia system in hopes of getting an error message that might point me in the right direction. Instead, the program just ran, and it ran much faster, even though the nvidia card was theoretically slower.

In terms of quality, I expect the competition to catch up in a generation or two, but then there is still the decade+ of legacy code to consider. Hopefully with how fast AI/ML churns that isn't actually an insurmountable obstacle.


Years ago I gave up on OpenCL (1.2 on an AMD card) because of those hard lockups, with no way to debug it. nVidia didn't even support OpenCL 1.2 (and IIRC didn't support the synchronisation primitives I wanted in CUDA either -- AMD was more capable on paper). Thanks, I feel better to hear just how bad it was -- so it wasn't just my fault for quitting.


It's a quality meme but I'm having trouble figuring out the settings that make it work. It looks like RGBA8 would be blue:

    >>> import math, struct
    >>> struct.pack('f', math.nan)
    b'\x00\x00\xc0\x7f'
maybe that becomes green if you composite over white or something? Or maybe there is a common type of NaN that fills some of the unspecified bits? ("Just use the particular NaN that makes it green" is cheating unless you have an excuse)


They mean big-endian NaN, taking only the first 3 bytes. No alpha channel.

https://encycolorpedia.com/76b900 says Nvidia green #76b900.
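
For anyone who wants to check, a quick sketch (assuming Python's default quiet NaN bit pattern, 0x7FC00000):

    >>> import math, struct
    >>> nan_be = struct.pack('>f', math.nan)   # big-endian: b'\x7f\xc0\x00\x00'
    >>> r, g, b = nan_be[:3]                   # keep the first 3 bytes, drop the 4th
    >>> f"#{r:02x}{g:02x}{b:02x}"
    '#7fc000'

So you get #7fc000 rather than #76b900 exactly, but it lands in the same yellow-green family, which is close enough for the joke.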


Encycolorpedia looks like a great resource, thank you very much. A similar one would be Colorhexa. Not affiliated.

https://colorhexa.com

[Edit] Could not find it there under that name, but it shows how color-blind users perceive it. And it loads much faster.

https://www.colorhexa.com/76b900

[Edit] Encycolorpedia has a color blindness simulator too. Have to check on desktop.


Cool, thanks!


It was an arbitrary decision by the engineers who made the early GPUs; they just mapped NaN to an RGB value.

It was a nice way to debug tensors: render them to the screen, the green sticks out.


32-bit NaN is encoded: s111 1111 1xxx xxxx xxxx xxxx xxxx xxxx

Where the sign (s) can be anything, and the x bits can be anything except all zeros (that would be infinity), and it will still be treated as a NaN.

There are lots of ways to encode colour, but with RGBA there would be too much red, and with ARGB it could be almost any opaque colour except that the red channel has to be at least 0x80, which is still too much red.

So NaNs are too red to encode nvidia green.
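
A quick way to poke at that claim (a sketch; the bit patterns below are just arbitrary valid NaNs, i.e. exponent all ones and mantissa nonzero):

    import struct

    for bits in (0x7FC00000, 0xFFC00000, 0x7F800001, 0xFFFFFFFF):
        b0, b1, *_ = struct.pack('>I', bits)   # big-endian bytes of the pattern
        print(f"bits={bits:08x}  RGBA red={b0:#04x}  ARGB red={b1:#04x}")

    # RGBA: red is the first byte, 0x7f or 0xff depending on the sign bit.
    # ARGB: red is the second byte, whose top bit is the low exponent bit
    # (always 1 for a NaN), so it is always >= 0x80.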


I once accidentally ended up with NaNs being interpreted as 32-bit colors, and it made everything red and white, like Christmas decorations.

I wonder what caused the difference - why the latter bits ended up being all 1s or all 0s together.


I remember a uni course on GPGPU, and only discovering during the first lecture that Nvidia had donated the hardware to make sure it would be a CUDA-only course.


>doing practical work ... even just 10~15 years ago with PhysX

practical work with PhysX 13 years ago: https://www.realworldtech.com/physx87/

"For Nvidia, decreasing the baseline CPU performance by using x87 instructions and a single thread makes GPUs look better."

Nvidia magically released PhysX compiled with multithreading enabled and without the flags disabling SSE a week after this publication. But a couple of days before that release they made these funny statements:

"It's fair to say we've got more room to improve on the CPU. But it's not fair to say, in the words of that article, that we're intentionally hobbling the CPU," Skolones told Ars.

"nobody ever asked for it, and it wouldn't help real games anyway because the bottlenecks are elsewhere"

>Nvidia's dominance today is the product of at least over a decade of work

Nvidia's decade of work:

Ubisoft comments on Assassin’s Creed DX10.1 controversy https://techreport.com/news/14707/ubisoft-comments-on-assass...

AMD says Nvidia’s GameWorks “completely sabotaged” Witcher 3 performance https://arstechnica.com/gaming/2015/05/amd-says-nvidias-game...

AMD Dubs Nvidia’s GameWorks Tragic And Damaging, Fight Over The Developer Program Continues https://wccftech.com/fight-nvidias-gameworks-continues-amd-c...

"Number one: Nvidia Gameworks typically damages the performance on Nvidia hardware as well, which is a bit tragic really. It certainly feels like it’s about reducing the performance, even on high-end graphics cards, so that people have to buy something new.

"That’s the consequence of it, whether it’s intended or not - and I guess I can’t read anyone’s minds so I can’t tell you what their intention is. But the consequence of it is it brings PCs to their knees when it’s unnecessary. And if you look at Crysis 2 in particular, you see that they’re tessellating water that’s not visible to millions of triangles every frame, and they’re tessellating blocks of concrete – essentially large rectangular objects – and generating millions of triangles per frame which are useless."

"The world's greatest virtual concrete slab" https://web.archive.org/web/20121002034311/http://techreport... (images "somehow" vanished from original article at techreport where Nvidia runs marketing campaigns)

"Unnecessary geometric detail slows down all GPUs, of course, but it just so happens to have a much larger effect on DX11-capable AMD Radeons than it does on DX11-capable Nvidia GeForces. The Fermi architecture underlying all DX11-class GeForce GPUs dedicates more attention (and transistors) to achieving high geometry processing throughput than the competing Radeon GPU architectures."


> But GameWorks' capabilities are necessarily Nvidia-optimized; such code may perform poorly on AMD GPUs.

From the arstechnica article about Witcher 3.

How dare Nvidia optimize their game-enhancing effects for Nvidia hardware and forget to do it for their competitors' hardware as well! And as for a lot of these complaints, could it be that a lot of companies only optimize for hardware that has the largest market share?

According to the Steam hardware survey in July of 2023, Nvidia accounts for 75% of the GPUs[0]. Nvidia and AMD have a lot of incompatibilities, and it can be hard to make the same code performant on both. It makes sense, as a game company, to prioritize optimizations for the largest market. No collusion and evil corporate mega-lord scheming needed for this.

Edit: Also, Nvidia does put out a lot of research efforts for free. Path rendering on the GPU for example (PhysX being another). You can find research papers and videos published by Nvidia for these things. I would consider that practical work. You can hate on Nvidia for lots of things, but this is one thing I find weird to be combative over.

Second Edit: Also, why do you find the statements Nvidia made about the PhysX improvements funny? They're right. Most games 13 years ago left a lot of idle time on the GPU while the CPU worked overtime to do logic, physics, sound, culling, etc. Lots of that stuff has since been moved to the GPU to minimize the amount of idle time on either the CPU or GPU. Nothing funny about what they said there.

[0]: https://store.steampowered.com/hwsurvey/


> could it be that a lot of companies only optimize for hardware that has the largest market share?

Yes, but also Nvidia partners directly with companies too (money can/does change hands).

Now the flip side is: so does AMD.


>Now the flip side is: so does AMD.

And their antics shutting out Nvidia (eg: FSR only, no DLSS) aren't being received well, not least because their offerings are objectively inferior to Nvidia's.


While Nvidia isn’t doing these silly antics today, they’ve absolutely done them in the past. None of the large consumer silicon companies have clean hands with respect to anticompetitive/anti-consumer behaviours. They’ve all got too much power frankly.


This is factually not the case, as confirmed by Crytek developers at the time. Wireframe mode turns off clipping and cranks the LOD up to max; normally neither the water table (under the ground) would be visible, nor would that block be rendered at that LOD.

https://old.reddit.com/r/pcgaming/comments/3vppv1/crysis_2_t...


100% this. I, and many others, bought multiple AMD cards out of dislike for NVidia and tried to get ROCm set up, to no avail. It just never worked except under hard-to-maintain configurations. I switched to an Nvidia card and within the hour "import tensorflow" just worked.


To be fair, Nvidia drivers are also a nightmare under Linux.


Not anymore


Until they are. When the system breaks it breaks real bad. Nvcc/gcc/cuda/kernel mismatches are a pain to match up right. It gets gnarly super fast.

All systems hit snags. In most, you skid the tires a bit, maybe lose balance. With nvidia you're flying over the handlebars on to the asphalt.

I got snagged by this just about 2 weeks ago. It gets nasty. Not as bad as CUPS, but probably #2.


Eh, depends what you want out of them.

Do you want solid dependability and settings that are right first time, because you've only got one PC and if the graphics get broken you've got no browser to google for a fix?

Do you want the absolute most up-to-date drivers, to support the very newest GPUs, while running an LTS version of your OS?

Do you want to always run the latest driver version and upgrade without testing or worrying, like we do for web browsers?

Do you want to run CUDA and ML stuff, but also want to run Steam which for some reason wants 32-bit support available?

Do you want to run on a laptop with hybrid graphics, and have suspend/resume work reliably every time?

Do you have a small /boot/ partition, because you expected initrd.img to be 50MB or less?

Do you want to support Secure Boot?

If you want to achieve all these things at once, it'll take you a few tries to get it right :)


Catching up in this space requires a significant, sustained investment over multiple years and competent software engineers. It's not a simple thing for a hardware company to suddenly become competitive with Nvidia in AI/ML.

Instead, they've been going after the CPU market (and winning), HPC/scientific computing (high FP64 performance, in contrast to Nvidia's focus on low-precision ML compute), and integrating Xilinx.

However, I agree that it's an unfortunate situation, and I hope AMD becomes competitive in this space soon.


I think their hardware is comparable with nvidia. The problem is the software is awful by comparison. It’s hard to run any of the AI workloads with AMD, and even when you can the performance is poor. The software investment just hasn’t been made. Until then they are not even in the game.


AMD has an entire line specifically for AI/ML... https://www.amd.com/en/graphics/instinct-server-accelerators

They just don't have those capabilities in their consumer GPUs.

AMD is also nearly 50/50 with nVidia for supercomputers in the Top500 (and dominates at the top)

It took a few years after completing the massive purchase of Xilinx to get going, but they are picking up speed rapidly.


AMD should do a high-memory-density MCD variant of 7900XT/XTX with a MCD that has 4 PHYs instead of 2. You could get 7900XTX to 48GB with no clamshell and 96GB with clamshell, which is getting into H100 territory.


Look at the good thing instead: they are catching up, and open source devs are starting to be serious about AMD because of its price/performance.

I believe it's a highly undervalued stock right now.


AMD gave up on the market for parallel compute entirely


Nvidia, and really all chip designers, are limited by the fab companies, who are trying to scale as fast as they can. But all the cutting-edge fabs are limited by one single supplier: ASML. ASML makes the lithography machines and has a total monopoly. Even they cannot make lithography machines fast enough to satisfy demand - their machines are sold out 2 years in advance.


The current limitations are not about litho at all but actually about CoWoS stacking capacity.


> CoWoS stands for Chip on Wafer on Substrate. It is a high-density packaging technology for high-performance chips. TSMC developed CoWoS in 2012. In CoWoS, multiple silicon dies are placed on a silicon interposer, which is an intermediate layer on the package board. The interposer acts as a communication layer for the active die on top. CoWoS is a 2.5D packaging technology. It is widely used in high performance computing.


There probably isn't another manufacturer they can switch high end stuff to. They recently tried moving at least some of their cards to Samsung but switched back last generation due to yield issues.


You have to distinguish between fabs and AIBs.


If they treat their AIBs for their enterprise stuff anything like they do in the consumer space, they don't really have anything to worry about there (aside from the rest of them giving up on dealing with Nvidia's BS, I guess).


For my sake, what number did you look at to come to this conclusion? I'm not used to reading these quarterly reports.


You can look at cost of revenue to get an idea.
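
For example, a rough back-of-the-envelope (figures are approximate and rounded from the linked release; treat them as ballpark):

    # Billions of USD, rounded.
    quarters = {
        "Q2 FY24": (13.51, 4.05),   # (revenue, cost of revenue)
        "Q2 FY23": (6.70, 3.79),
    }
    for label, (rev, cogs) in quarters.items():
        print(f"{label}: gross margin ~{(rev - cogs) / rev:.0%}")
    # Q2 FY24: ~70%, Q2 FY23: ~43% -- revenue roughly doubled year over year
    # while cost of revenue barely moved, which is what the "they just raised
    # prices" reading is pointing at.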


> just pretty much doubled the prices

The prices were already double; it was just scummy resellers capturing that value rather than Nvidia.


Nvidia deserves their monopoly, and it's in the US's best interest to let it continue.

The other companies have to up their game to compete with a company that has been executing well for 20+ years.


Isn't this more a case of supply and demand? Huge ramp in chip demand from every FAAMG and every dev and their grandmother for AI, with a mostly inelastic supply (foundry constrained and very specialized atoms tech involved).

It's not like Intel and AMD don't exist, but if everyone is pushing each other at the door for Nvidia chips..


Amazon, Alphabet, Meta. Who are the F and G, and where is apple?


You know, Meta and Alphabet, which everyone outside of their respective CEOs still calls Facebook and Google.

And by saying FAAMG instead of FAANG I make the statement that Cramer was high as a kite when he put Netflix in there instead of Microsoft. Today you might make a case for an N, but not at Microsoft's expense.


Facebook, Amazon, Apple, Microsoft, and Google


While I hate using the stupid "meta" name, MAGMA is a much more fun acronym than FAAMG, or GMFAA, or MAGAF, etc.


It's cornier and yet still technically incorrect by not using Alphabet. And MAMAA is even cornier without putting in another vowel; arguably Nvidia's N but I feel like it doesn't quite belong. So I stand by my FAAMG, it sounds right, and everyone uses Facebook and Google rather than Meta and Alphabet anyway.


Meta Apple Microsoft Alphabet Amazon

MAMAA


I'm curious: where did you find the data point that they sold an equal number of units quarter over quarter?


>This is what having a monopoly looks like!

Yep and we are suffering as a result. Want the best in computing and CUDA? Give M$ and Nvidia money.

Linux and Nvidia don't play well. Apple doesn't even attempt to try.

The absolute state of computing right here.


How does the Nvidia stranglehold on AI compare to US Steel, Standard Oil and Bell Telephone etc., other monopolies that were broken up?


There isn't a monopoly. Their competitors just are really bad at it.


[flagged]


> Raising prices means you are a monopoly?

Not sure if you're intentionally choosing to ignore their point, but what they meant is that Nvidia can unilaterally choose to raise prices and customers can't do anything, since they're a monopoly. You can't just say, well, I'll go to the next shop and buy something cheaper.


Not that it's much better, but wouldn't it be a duopoly considering that AMD is also a big player?

Hopefully Intel continues to improve its GPU offerings.


> Not that it's much better, but wouldn't it be a duopoly considering that AMD is also a big player?

Not sure AMD would be considered a big player, what would be the percentage threshold for that?

According to the Steam Hardware (& Software) Survey (https://store.steampowered.com/hwsurvey/Steam-Hardware-Softw...), ~75% of computers running Steam have an NVIDIA GPU, while ~15% have an AMD GPU.

AMD is the closest thing to a competitor NVIDIA has, but they are still nowhere near NVIDIA's market share.

I'm sure NVIDIA holds an even higher market share in AI/ML spaces due to CUDA and the rest of the ecosystem; at least in gaming things are pretty much "plug and play" when switching between AMD/NVIDIA hardware, but no such luck in most cases with AI/ML.


> ~75% of computers running Steam have an NVIDIA GPU, while ~15% have an AMD GPU.

And that's the consumer market, which let's say is 30% the size of the B2B enterprise market, which is probably an even higher % Nvidia.


AI cloud is something like 95% Nvidia.


> the B2B enterprise market, which is probably even higher % nvidia

I'd wager the enterprise market is 90% intel integrated graphics.

It's the cheapest option, the best for laptop battery life, and modern integrated graphics can run excel and a web browser at 4k just fine.


And then there's the console market, with more units sold than PC gaming market, with 100% AMD GPUs.


The Switch is based on Nvidia hardware.


And the high budget consoles aren't.


> wouldn't it be a duopoly considering that AMD is also a big player?

I don't think GPUs are commoditized. You can't swap an Nvidia GPU for an AMD GPU and get the same performance/results.


AMD seem to be catching up quickly lately. I'm running Stable Diffusion, Llama-2, and Pytorch on a 7900XTX right now. Getting it up and running even on an unsupported Linux distro is relatively straightforward. Details for Arch are here: https://gitlab.com/-/snippets/2584462

The HIP interface even has almost exact interoperability with CUDA, so you don't have to rewrite your code.
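
As a small illustration (a sketch, assuming a ROCm build of PyTorch, which reuses the torch.cuda namespace for HIP devices, so the exact same script targets either vendor):

    import torch

    # On a ROCm build torch.version.hip is a version string; on a CUDA build it's None.
    print("HIP runtime:", getattr(torch.version, "hip", None))
    print("GPU available via torch.cuda:", torch.cuda.is_available())

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.randn(1024, 1024, device=device)
    y = x @ x          # runs on the AMD GPU under ROCm, the NVIDIA GPU under CUDA
    print(y.device)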


Inference and training are not the same things. AMD has basically no market share in training.


Now try doing the same on Windows.


I interned at NVIDIA in 2009 on the kernel mode driver team. Was super fun there in terms of the project work and the people. If the code still exists, I created the main class that schedules work out to the GPU on Windows.

That level of programming gave such rewarding moments in between difficult debugging sessions. When I wanted to test a new kernel driver build I needed to walk into a massive room with all of these interconnected machines that emulated the not-yet-fabricated GPU hardware. One of the full-time people on my team spent the entire time I was there going insane trying to track down a memory corruption issue between GPU memory and main memory when things paged out.

Back then the stock was around $7/share and the CEO announced a 10% paycut across the board (even including my intern salary) and had an all hands with everyone in the cafeteria. It's pretty cool they went from that vulnerable state, with Intel threatening to build in GPU capabilities, to the powerhouse they are today.


> the CEO announced a 10% paycut across the board

Which is still better than a 10% layoff, anyway!


Only for 10% of the workers.


I guess that depends on whether you want to do 11% more work, and probably the tedious stuff.


Morale after layoffs goes down for all workers. So no, not only for 10% of the workers.


More than having their salary cut by 10%?


Interned there Summer of 08. Remember mentions of “this CUDA thing” then, that was during its infancy.

Midway through, our intern friend group found out one of the smaller buildings had a buffet lunch and started taking the shuttle there often.

Saw this tweet just now and that lanyard holder really brought back memories, hasn’t changed at all: https://x.com/jimcramer/status/1694465908234699243?s=46&t=NA...


Do you still own stock?


He probably didn’t get any as an intern.


I joined as a new college grad (full time) the same year. I remember the 10% pay cut announcement. I didn't get any stock granted to me in the 4 years I was there as a SWE working on CUDA. They had an ESPP you could put 10% of your paycheck into or something like that though.


I vaguely remember that I could participate in the employee purchase program. This was during the financial crisis though and I was an undergrad intern with a paltry ability to invest.

Even if I had received some great stock award, in all likelihood I would have sold it for index funds at my first opportunity. I imagine some of my old coworkers made out quite well though.


I do wonder though: why has Moore's law stopped in its tracks? Using a CS:GO benchmark, a 1070 got 218 FPS, while a 4090 is at 477 FPS. Only a ~2.2x increase in FPS in 6 years? :(


Between those 2 GPUs, the fp32 perf went up 12.7x according to TechPowerUp's specs. The SM count went up 8.5x (which is what represents Moore's law, and note that it's almost exactly in line with Moore's prediction), and the clock rate went up 1.5x. The FPS of CS:GO (or any game) is not a good measure of Moore's law. Games have all kinds of complexities and caveats that will prevent them from scaling linearly. I used to write some of those bottlenecks :P What are the 2080 and 3080 data points for CS:GO? Did it approach 400-500 fps on the 2080 and never get any faster after that?
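
For reference, the rough arithmetic behind those ratios (spec numbers are approximate, so treat them as ballpark):

    # Approximate specs: GTX 1070 (2016) vs RTX 4090 (2022).
    gtx1070 = {"sms": 15,  "fp32_tflops": 6.5,  "boost_mhz": 1683}
    rtx4090 = {"sms": 128, "fp32_tflops": 82.6, "boost_mhz": 2520}

    print("FP32 throughput: %.1fx" % (rtx4090["fp32_tflops"] / gtx1070["fp32_tflops"]))  # ~12.7x
    print("SM count:        %.1fx" % (rtx4090["sms"] / gtx1070["sms"]))                  # ~8.5x
    print("Boost clock:     %.1fx" % (rtx4090["boost_mhz"] / gtx1070["boost_mhz"]))      # ~1.5x
    # A doubling every ~2 years over 6 years predicts ~8x more units, which is
    # roughly what the SM scaling shows; a CPU- or engine-bound game's FPS won't track it.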


Wow, didn't expect to get so many comments. It's just interesting to me that we have 500Hz monitors, yet games are still coming out that only run at ~70 fps on a 3080, etc. I used CS:GO as a comparison just because it was an example capable of coming near 500 FPS. Average FPS is not going up fast enough, in my opinion. I understand some games purposefully limit FPS for physics calculations, but there are many that do not.


> Avg FPS are not going up fast enough in my opinion.

I've never had hardware to run above 1080p at 60fps (monitors, GPUs), so my desires are a bit orthogonal - I wish more games let you customize the graphics settings so that you could maintain a consistent framerate (be it 30 on a low spec laptop, 60 on a regular PC or more fps) at a resolution of your choice.

Things like switching between baked or dynamic shadows, their type/filtering, things like ambient occlusion and tessellation, various post-processing effects and filters, model LOD limits and texture resolution, resolution scaling, and so on.

More so, it would actually be nice if games let you download a version with lower fidelity assets (like War Thunder sort of does, for example), so you don't need 100 GB of models and textures if you're realistically only ever going to see 20 GB of those on your hardware.

Thankfully, most modern game engines scale both up and down decently, for example Unity's URP (though it was historically a bit half baked and fragmented the community/assets). It's just up to the developers to get over the hubris of wanting "low" settings to still be pretty; that choice should be up to the user.


>There’s a good reason to never go above 70fps: human perception experts tend to agree that our visual system gives us quickly diminishing returns above 60fps, we can’t really see things any faster than that

No. Please don't spread this nonsense rumor. It hasn't ever been true and still isn't.

There are always diminishing returns, but human vision is perfectly capable of noticing the difference between 60 and say, 120. You can literally try this yourself just moving your mouse around on a 120Hz monitor and then capping the refresh rate. In general human vision is really complicated and not tied to any specific "speed".

Also, displays are more complicated. There isn't just a refresh rate; there are differing pixel responses at differing brightnesses and a variety of different processing delays all the way from the moment you put in some hardware input (like the mouse) to the final result. It's a huge chain.

The end result is that two 120hz displays can behave completely differently in terms of motion clarity.


We are pretty quickly approaching the point where a "frame scanout" is going to lose meaning. Areas of the image will be rendered and scanned out optimally ala "foveated rendering" and also sent over the pipe in this fashion too.

TAA fundamentally decouples the sample generation from the output generation. TAA samples (even native res) have no direct correspondence to the output grid, they are subpixel sampled and jittered. The output grid takes the samples nearby and uses them to generate an output for each pixel.

TAAU/DLSS2 take this further and decouple the input grid from the output grid resolutions entirely. So now you have a 720p grid feeding a 1080p grid or whatever. Thinking of it as "input/output frames" is clunky especially when (again) the input frame doesn't even correspond directly to the input samples - they're still jittered etc. Think of it as a 720p grid overlaid over a 1080p grid, and DLSS is the transform function between these two (real/continuous) spaces. Samples are randomly (or purposefully) thrown onto that grid.

OLEDs are functionally capable of individual pixel addressing if we wanted to. Current ones probably can scan out lines in non-linear order already, so you could scan out the center more than the top/bottom, for example. And OLEDs are already capable of effectively >1000hz real-world response time from their pixels. They just are absolutely critically bottlenecked by the ability to shove pixels through the link and monitor controller quickly enough (give me 540p 2000hz mode, you cowards)

This all leads to a question of why you are still sampling and transmitting the image uniformly. If there's parts of the image that are moving faster, render those areas faster, and with more samples! Maybe you render the sword moving at 200fps but the clouds only render at 8fps. And Optical Flow Accelerator can also allow you to identify movement within the raster output and correlate this with input - if you think about oculus framewarp, what happens if we framewarped just one object? Translate it around against the background, stretch it to simulate some motion aspect, etc. And if you further refine that to a 1x1 region, then you have individual pixel framewarp.

You can also have a neural net which suggests which regions are best to render next, for "optimal" total-image quality after the upscaling/framewarp stages, based on object motion across the frame and knowledge of the temporal buffer history depth in a particular pixel/region (and this is the sort of abstract, difficult optimization problem ML is great at). And those pixels actually might be rendered at multiple resolutions in the same frame - there might be "canary pixels" rendered at low resolution that check whether an area has changed at all, before you bother slapping a bunch of samples into it, or you might render superfine samples around a high-motion/high-temporal-frequency area (fences!). So now you have "profile guided rendering" based on DLSS metrics/OFA analysis of the actual scene being rendered right now.

Another random benefit is that "missed frames" lose all meaning. The input side just continues generating input for as long as possible, as many samples at whatever places will be most efficient. If it's not fully done generating input samples when it comes time to start generating output... oh well, some cloud has less temporal history/fewer samples. But DLSS does fine with that! As long as you still have some history for that region you will get some output, and I'm sure if it was important then it'll be first thing scheduled in the next frame.

You can use this idiom with traditional scanout/raster-line monitors, but ideally to take advantage of OLED's ability to draw specific pixels, you probably end up with something that looks a lot like a realtime video codec - you're sending macroblocks/coding tree units that target a specific region of the image with updates. Or you use something like Delta Compression encoding, like with lossless texture compression.

https://en.wikipedia.org/wiki/Coding_tree_unit

This already is sort of what Display Stream Compression is doing, but that's just runlength encoding, and if the monitor can draw arbitrary pixels at will (rather than being limited to lines) then you can do better.

And again, you can't transmit the whole image at 1000fps, but you can target the "minimum error approximation" so that on the whole the image is as correct as possible, considering both high-motion and low-motion areas. This is sort of an example of the concept - notice the scanline errors in high-motion areas. Great demo but linked to the direct example:

https://youtu.be/MWdG413nNkI?t=176

https://trixter.oldskool.org/2014/06/20/8088-domination-post...

This also raises the extremely cursed idea of "compiled sprites" inside the monitor controller - instead of just running a codec, you could put a processor on the other side (like the G-Sync FPGA module) and what you send is actually the program that draws the video you want. Executable Stream Compression, if you will. ;)

(but there's no reason you couldn't put THUMB or RISC-V style compressed instructions in this - and you certainly could change up the processor architecture however you want, as long as you don't mind doing a Gsync-like compatibility story. It makes upgrading capabilities a lot easier if you control both ends of the pipe, that's why NVIDIA did g-sync in the first place! And there is probably nothing more flexible or powerful than allowing arbitrary instruction streams to operate on the framebuffer (or a history buffer, whether frame history or macroblock/instruction history) or to draw directly to the frame itself. With the bottlenecks that DP2.0 and HDMI2.1 present, this is probably the way forward if you want lots more bandwidth out of a given link speed.)

I said a long time ago that this is basically a "JIT/runtime" sort of approach and people kinda laughed or said they didn't get it. But it's funny that trixter actually used the same analogy there for his demo. But basically DLSS is an engine unto itself already, DLSS is what does the rasterizing, and the engine just feeds samples (that are entirely disconnected from the output, they go into the black box and that's the end of it). And with fractional rendering you can basically view that as a big JIT/runtime. DLSS provides quality-of-service to pixels by choosing which ones to schedule for "execution" and with what time quanta, to produce optimal image quality ("total QOS") over the whole image. And then the resulting macroblocks are individually sent over when ready.

Effectively this works like the biological eye - there is no "frame", changes happen dynamically across the whole frame constantly. Or another analogy would be "what if we could 'chase the beam' but across arbitrary pixels in the frame"? If the motion is predictable then you can schedule the rendering so that the output and blit happens at the exact proper time, down to 0.1ms accuracy. It's Reflex for Pixels.

DLSS knows when a person is running and about to peek out from behind a wall. DLSS knows when someone was sitting there and can render the lowest-latency-possible update when they suddenly pop out of cover and AWP you (can't read minds/make network latency disappear, but it can update it as quick as it can). That already shakes out of the motion vector data and just needs to be generalized to a world where you can render out specific regions/lines at 1000hz.

(I guess technically "frame" still exists as a notional concept inside the game loop, you are generating game state in 60fps intervals or whatever, but rendering is totally decoupled from that and you run the GPU on whatever parts of the image would benefit the most from touchups at a particular moment.)

But again you can see how this whole idea fundamentally inverts control of the game engine - Reflex already tells the game when to sleep and when to start processing the next game-loop, now DLSS will tell the engine what pixels it wants sampled, and handles rasterizing the samples and compressing/blitting them to the monitor. The engine is just providing some "ground truth" visual input samples and calling hooks in DLSS.

(sorry, long post, but I've been musing about this elsewhere and I looooove to chitchat lol)


This is a very good point. It would be ideal to spend the budget rendering only fractional updates to the image, and allow those updates to happen much faster than what would be 60fps. This way we could get 1000hz updates without it costing 10x more than 100hz. While I’m skeptical about the supposed perceptual benefits of full frame rates above 120hz or 240hz outside of the latency argument, foveated fractional rendering could end up being the fastest and best and cheapest.

> you are generating game state in 60fps intervals or whatever

This is also an excellent point that might question my suggestion that high frame rate is being used to reduce latency. Game state updates are already decoupled from rendering in lots of games. Having an extremely high render refresh might not mean that the latency between controls and visuals is reduced proportionally. Or maybe it helps, but has a limit to how much.

DLSS is an interesting topic here. Do you see it eventually working for fractional updates? We’d possibly need a new style of NN or of inference? DLSS currently operates on a full frame, and the new version even hallucinates interpolated frames to boost fps artificially. This doesn’t help with control latency at all, in fact it makes it worse.


> DLSS is an interesting topic here. Do you see it eventually working for fractional updates? We’d possibly need a new style of NN or of inference? DLSS currently operates on a full frame, and the new version even hallucinates interpolated frames to boost fps artificially. This doesn’t help with control latency at all, in fact it makes it worse.

Yeah I've been playing fast and loose with terminology here, there's several overlapping but synergistic ideas that aren't the same things. To try and clean this up:

DLSS1 required the full image, it actually was an image-to-image transformer that "hallucinated" a full-res image from an input image. This sucked and NVIDIA gave it up (except for the 500mb of DLSS models that live eternally in the driver for the 2 games that opted for driver-level model distribution). Nobody cares about this anymore at all.

DLSS2 does not need a full image, because it is a TAAU algorithm that weights samples using an ML model. If there aren't enough samples in an area, oh well, you just get crappy output (like immediately after a scene change). It manifests as either obvious resolution pop/detail pop, or visual artifacts on moving/high-res things. IIRC this can be assisted by drawing invisible (1% alpha) objects in motion/fences/etc to "warm up" DLSS sample history on the (invisible) edges before just popping them into existence, lol.

You need at least some samples near that area, it can't render from nothing, but DLSS2 is not dependent on rendering out the full image to work - if some unrelated part of the image doesn't have samples, oh well. (and this may allow mGPU scaling with reasonable correctness for partitioning an image!). Personally I consider this "loosy goosy correctness" attribute of ML models to be extremely desirable for GPGPU programming - if some edge case messes up 0.1% of samples, the ability of ML models to just ride over it and spit out a reasonable output is super desirable. This includes things like camera noise, dead pixels, etc. Extremely tolerant of data ingest etc. Like if 10 threads aren't quite finished with their sample output because you don't want to wait for kernelfence sync when it's time to start rendering the output buffer... just start going. It'll be fine.

Fractional rendering is a separate and unrelated idea, but I think the time is right with OLEDs here, and with everyone searching for a way to extend perf/tr with costs spiraling it makes sense to see if you can render "better" imo.

Variable rate sampling is another concept that builds on fractional rendering. Render some areas at a higher rate than others. And again this is something that DLSS2 plays nicely with.

--

DLSS3 is actually the successor to DLSS2 and is supported by all RTX cards (yup). Framegen is one of the features in this, and that is only supported on Ada. Supporting framegen requires the inclusion of Reflex, which does benefit everyone hugely.

Reflex basically flips the "render+wait" model to be a "wait+render", by adding a wait at the start of the game loop that delays until the last possible second to start processing the frame, so it's as fresh (input latency) as possible. And this does legitimately cut latency significantly (by ~half) in highly gpu-bound scenarios. And that gives NVIDIA some headroom to play with in framegen tbh. Igor's Lab and Battlenonsense both found NVIDIA to have much lower click-to-photon latency than AMD Antilag, by like 20-30ms in overwatch f.ex.
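
A conceptual sketch of that "wait+render" scheduling (not the actual Reflex SDK, just the pattern, with made-up numbers):

    import time

    REFRESH_S = 1 / 60        # hypothetical 60 Hz display interval
    work_estimate = 0.004     # running estimate of sim+render CPU time per frame
    deadline = time.perf_counter() + REFRESH_S

    for _ in range(5):        # a few frames, just to show the scheduling pattern
        # Sleep so this frame's work finishes right at the deadline, which means
        # input gets sampled as late (i.e. as fresh) as possible.
        slack = deadline - time.perf_counter() - work_estimate
        if slack > 0:
            time.sleep(slack)
        start = time.perf_counter()
        # ... poll input, update game state, record and submit rendering here ...
        work_estimate = 0.9 * work_estimate + 0.1 * (time.perf_counter() - start)
        deadline += REFRESH_S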

Framegen as currently implemented is interpolation, and yeah, that does increase latency. But NVIDIA do have some headroom to play with there, in CPU-bound situations (which is different!). And tbh most people who have actually used it generally seem to find it not too bad; it's the "eew I tried it at the store and it was awful!" / "I've never tried it" crowd who are most vocal about the latency. It's at least an option in the toolbox (see again: Starfield).

I think it is possible to move to extrapolation and I hope the current framegen is only an intermediate step. And I think Optical Flow Accelerator is a really cool building block for that. The performance and precision has improved a bunch over the gens, and now it can support 1x1 object tracking (which I mentioned above as seeming like a significant threshold/milestone) so it's flowing pixels really. I see that as being a Tensor Core-like moment that people scoff at but has big implications in hindsight. Being able to incorporate realtime image data back into the upscaling/TAA pipeline seems big even beyond just framegen itself, I don't doubt DLSS3.5 will make further progress too.

You don't need extrapolation (or interpolation) at all, but if you can extrapolate per-pixel, the ability to do a low-cost "spacewarp" that accomplishes most of the squeeze of a full re-render (in terms of moving edges/texture blocks) at much lower cost would be very interesting. And the OFA could end up being a key building block in that sort of thing.

--

Again kind of a topic shift but there's also this issue of display connection (there is never enough bandwidth) and whether it's lines or macroblocks etc. That's a capability that's offered by OLEDs in theory, and could be explored with a similar FPGA approach/etc. If you can do that, it pairs with variable rate sampling concepts (and ML input tolerance for bad data) very nicely - render out the regions you're updating whenever they're ready, or whenever is optimal for that element to be drawn (to get minimum error).

And in fact quite a few of these ideas synergize nicely together. If you put it all together.

--

These are all kinda separate in general but quite a few of them synergize if you put them together. And I think the zeitgeist is ripe on some, brian heemskirk was talking about some similar ideas on MLID's show a few months ago (not the most recent appearance).


This is great stuff. I don’t have anything useful to add, just wanted to say thank you, TIL. You don’t have links to any write ups about the latency testing by Igor’s Lab & Battlenonsense do you? I’d be interested to learn more about what typical click-to-photon latencies are today for given refresh rates, and how the latency changes wrt refresh rate. The latencies must be absolutely horrendous if we’re talking about differences of 20-30ms? That would tend to justify super high frame rates (assuming they actually reduce latency!), but it’s funny to me that rendering frames faster and faster is seen as the solution, rather than attacking the issue of a render+display pipeline with insane and growing latency.


https://www.igorslab.de/en/radeon-anti-lag-vs-nvidia-reflex-...

https://www.igorslab.de/wp-content/uploads/2023/04/Overwatch...

https://youtu.be/7DPqtPFX4xo?t=727

(highly pronounced in this game but)

Yes I agree that driver overhead+latency reduction in the pipeline matters a lot and that's the tool Intel just dropped (and has probably optimized their own driver for of course). Classic Tom Peterson, lol, just like FCAT.

https://www.youtube.com/watch?v=8ENCV4xUjj0


I don’t disagree with your points, vision is indeed complex and not discrete, displays and games can all be different, but there is plenty of scientific perception research to back up my statement that 500 fps is not 5x better than 100 fps to a human. We don’t need 500fps movies, ever. Games want high fps because there’s a feedback loop.

Do you have sources that show otherwise and back up your claim that this idea is “nonsense”? I will dig up some scientific sources. Are you perhaps reading into my comment and not responding to what I said literally?

You aren’t really addressing what was my main point: that fps throughput isn’t the reason for high frame rates in games. The primary reason for this happening is to decrease latency.

We wouldn’t need 500fps for games if we lowered the latency. Or at the very least, the benefits would be much lower. Reducing latency is a great reason to want high fps, but there are other ways to reduce latency.

You replied to the wrong comment, btw. I almost didn’t catch your reply.

Edit: links that discuss the measured speed of human perception:

http://web.cs.wpi.edu/~claypool/papers/fr/fulltext.pdf

(Study is limited to 60 fps, but clearly shows the trend of diminishing returns.)

https://www.healthline.com/health/human-eye-fps

(Argues it’s higher than 60 for some tasks… up to 90 or 100fps.)

https://www.rtings.com/monitor/learn/60hz-vs-144hz-vs-240hz

“there are diminishing returns when it comes to the refresh rate. Most people can perceive improvements to smoothness and responsiveness up to around 240Hz; however, the difference between a 240Hz and 360Hz panel is so small that even competitive gamers might have a hard time telling them apart. If you have a choice between a 1440p 240Hz and a 1080p 360Hz monitor, you're probably better off getting the 1440p option, as the increase in resolution has a much larger impact on the overall user experience.”

https://www.neurotrackerx.com/post/5-answers-to-the-speed-li...

(Mentions seeing a single frame of color at 500fps. This is true! Humans can see a flash of light that’s much shorter than 2ms. For perceiving imagery and tracking motion, the available evidence shows little benefit to going higher than 120fps.)


You're twisting my words and changing what you claimed.

>my statement that 500 fps is not 5x better than 100 fps to a human

I never said that. I also stated that I know about diminishing returns. You claimed "There’s a good reason to never go above 70fps". There absolutely is a large difference in motion clarity between 70 and something like 120/240/etc. Is the difference from 70->120 as large as 30->60? No, absolutely not. But it is significant and can be seen easily with a cheap monitor.

>You aren’t really addressing what was main point: that fps throughput isn’t the reason for high frame rates in games. The primary reason for this happening is to decrease latency.

>We wouldn’t need 500fps for games if we lowered the latency. Or at the very least, the benefits would be much lower. Reducing latency is a great reason to want high fps, but there are other ways to reduce latency.

While there is a latency improvement and some people care about that, the motion clarity is also significantly improved. (again, just drag some windows around on a 120hz monitor) That's true of movies just as well as games. Movies have pulled a lot of tricks to mask this issue over the years, but 60 is quickly becoming the standard over 30. (And once bandwidth and processing improves, it's likely some day decades from now it will jump even higher)

>You replied to the wrong comment, btw. I almost didn’t catch your reply.

Yeah, not sure how that happened.


Alright we’re in a cycle of misunderstanding each other, and rabbit holing on something that is rather tangential to my original point. I acknowledge I should not have used the word “never”. I meant rarely, and I meant for “most” games, not literally never, and not all games.

When you said “No. Please don't spread this nonsense rumor. It hasn't ever been true and still isn't.”, combined with the downvote, I assumed you were referring and objecting to everything I said including diminishing returns (even though I see you acknowledging it next paragraph.)

We are mostly agreeing violently, I acknowledge that there’s no known hard fps threshold above which nobody can see something. I acknowledge that there are benefits above 60fps, even if they grow smaller.

But it’s still true that the primary reason games are going to 500fps is for the latency benefits, not for the smoothness or high flicker rate. A frame rate that high isn’t generally perceptible, while the latency of today’s games - a latency of multiple frames - is actually well inside the known measurable threshold of response times. The problem isn’t generally the need for more frames per second, the big problem is the time between input and the visible change on screen.

The other topic that would be nicer to discuss is the quality trade offs. High frame rate takes away from other options.


> games are still coming out that only run at ~70 fps on a 3080, etc.

Graphical fidelity in games improves until it maxes out current hardware. This is natural. You can always turn the settings down to get higher FPS, but you can't turn down the high FPS you get in older games into more modern graphics.

> Avg FPS are not going up fast enough in my opinion.

FPS isn't the one and only metric. Audiences generally care more about graphical fidelity. FPS just has to be good enough and stable. Typically that sweet spot is 60fps, though we're slowly moving to 144fps being the standard.


Also note that FPS is a non-linear value; you want to compare frame times instead.


> It’s just interesting to me that we have 500hz monitors, yet games are still coming out that only run at ~70fps on a 3080, etc.

Of course new games are running at 70 fps. Most games are absolutely not designed for 500Hz, and have zero reason to do that. CS:GO wasn't designed for 500fps; I bet the designers of CS:GO never intended people to play at 500fps, or even imagined that would ever happen; it wasn't possible when the game came out.

There’s a good reason to never go above 70fps: human perception experts tend to agree that our visual system gives us quickly diminishing returns above 60fps, we can’t really see things any faster than that, so I’m quite skeptical that 500fps is necessary.

The whole reason 500fps is useful for competitive esports gaming is to reduce latency, not really to keep increasing frame rates forever. Because games are triple-buffering and sometimes monitors are too, and there’s another frame of latency for controller inputs to be recognized, you might still have 10ms or more of latency between controller input and changes on-screen, even if your game is rendering at 500hz. This means you could get away with 100fps if you had zero latency. When your fps is 60hz and there’s 5 frames of latency, you don’t see responses to your controller until almost 100ms later(!).

It’s been about a decade since I was a game dev, but it’s wild to me that anyone would want 500 fps, or that any games would aim for that. This gives you a grand total of 2 milliseconds to do everything: gameplay + animation + physics + audio + rendering. It’s not a lot of time, and you have to compromise your visuals and rendering (by 10x!) in order to achieve that frame rate. Looks like a lot of new gaming monitors are 144Hz, and these new 240/360/480Hz monitors are pretty extreme and still a bit rare.

> I understand that some games purposefully limit FPS for physics calculations, but there are many that do not.

FWIW, this isn’t the way I’d frame it, it’s kinda misleading. Very few games are purposefully limiting FPS for the sake of slowing it down. All games, however, have a budget. There are always limited compute & render resources, and both game devs and players want the highest quality available for their budget. Until very recently most games aimed for 30fps, and only really fast twitchy games went for a relatively very smooth 60fps. The games that went for 60 had to cut their polygon counts and physics and gameplay in half in order to achieve high frame rate, so you are totally trading away a richer experience in favor of high frame rates. Only certain kinds of games should even try to do that.


Looks like you were specifically a console game dev? PC games have not aimed for 30fps, ever - or at least not for the last 2 decades. Consoles are hilariously far behind PC in this aspect.

The whole "science says the human eye cannot see beyond 30fps :)" is actually a very old meme at this point. You are correct that high FPS/Hz is about decreasing latency, but you underestimate the importance of it. 144hz is a very clear improvement over 60hz and the PC ecosystem is going to move over to it as a new standard relatively soon. You don't need to be playing a hyper-competitive FPS to notice it either, any game with significant movement will do.

> Because games are triple-buffering and sometimes monitors are too, and there’s another frame of latency for controller inputs to be recognized, you might still have 10ms or more of latency between controller input and changes on-screen, even if your game is rendering at 500hz. This means you could get away with 100fps if you had zero latency. When your fps is 60hz and there’s 5 frames of latency, you don’t see responses to your controller until almost 100ms later(!).

You're assuming VSync or something. There's no guarantee that the input will line up with the next frame, so an input latency of 10ms while running at 100 fps equals a worst-case latency of roughly 20ms, not 10ms. That's why the higher fps & hz numbers really matter.
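
The back-of-the-envelope for that (hypothetical numbers; the 5-frames-at-60Hz case is the one from upthread):

    def worst_case_latency_ms(fps, input_latency_ms=0, frames_in_flight=1):
        # Input lands just after a frame started, so it waits for the next one(s).
        return input_latency_ms + frames_in_flight * 1000 / fps

    print(worst_case_latency_ms(100, input_latency_ms=10))    # ~20 ms, as above
    print(worst_case_latency_ms(60, frames_in_flight=5))      # ~83 ms: 5 frames at 60 Hz
    print(worst_case_latency_ms(500, input_latency_ms=10))    # ~12 ms: why high fps/Hz helps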


Guilty as charged, yes you’re right I was a console game dev. Fair enough, yes PCs are way ahead of consoles and I overstated the perceptual limits of frame rates. I didn’t really mean to suggest that competitive twitchy PC games have no reason to go above 60fps, I meant that there’s a whole swath of other kinds of games that don’t need it (consoles, mobile, puzzle games, etc. etc.). I was reacting more to the idea of 500fps which is really far above 144hz, and the implied suggestion by top comment that maybe even 500 isn’t enough and that everything should be trying go there.

> an input latency of 10ms while running at 100fps equals a worst-case latency of roughly 20ms, not 10ms. That’s why the higher fps & hz numbers really matter.

Yes exactly, I agree and this was the point I was trying and I guess failing to make. Latency is typically multiple frames, so at 100hz, latency can easily be longer than known science-meme perception times. Even with vsync there’s up to one frame to recognize inputs, then another frame to produce new game state & submit the render, then 1 or 2 more for double or triple buffering, then maybe supersampling and/or denoising, then whatever the monitor does, and I might be missing some steps that add latency. So I suspect 500hz has nothing to do with seeing smoother motion compared to 144hz, and everything to do with getting overall control latency down to below, say, 10ms.
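Rough sketch of that arithmetic in Python, with the five frame-aligned stages being my own guess rather than anything measured:

    # Toy model only: count whole-frame stages between input and photons.
    STAGES = 5  # input poll, sim + submit, double/triple buffering, scanout (my guess, varies per game/monitor)

    def input_to_photon_ms(hz, stages=STAGES):
        return (1000.0 / hz) * stages

    for hz in (60, 144, 500):
        print(f"{hz:4d} Hz: ~{input_to_photon_ms(hz):.0f} ms worst-case input-to-photon")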

Does that really work though? Does CS:GO and do other modern games poll the controller at the display refresh rate? I know it’s pretty common for a lot of games to decouple rendering from game state. That can mean a lot of different things, but one of the implications might be that controller latency is limited to one or two frames of, say, 60hz game state updates, followed by 3-5 frames of 500hz display refresh. Is latency in today’s PC games limited by the game state update, regardless of what the display refresh rate is?

I’m curious where the perceptual limits really are, and what the max framerate we actually need is. Suppose an imaginary world where worst case input-to-screen latency is 2 frames. Then in that case, how high should the FPS be? What if there was zero latency - like if the next rendered frame magically reflected mid-frame controller inputs and magically displayed with no latency - then what should the ideal FPS be? Would there be any benefit to going higher than 144, or would we be wasting electrons?


> Even with vsync there’s up to one frame to recognize inputs, then another frame to produce new game state & submit the render, then 1 or 2 more for double or triple buffering, then maybe supersampling and/or denoising, then whatever the monitor does, and I might be missing some steps that add latency.

VSync introduces latency, so it always gets disabled, same with double/triple buffering. I don't think gaming monitors do anything to process the image, they're usually advertised with ~1ms response times.

> Does CS:GO and do other modern games poll the controller at the display refresh rate?

Modern gaming mice and possibly keyboards use 1000hz polling.

> I know it’s pretty common for a lot of games to decouple rendering from game state.

I think all modern games do. Some old games have issues running at fps other than 60 due to this coupling.

> I’m curious where the perceptual limits really are, and what the max framerate we actually need is.

I don't know exactly; there are definitely diminishing returns, like you've mentioned:

- 30fps: 33ms / frame

- 60fps: 17ms / frame (2x cost for -16 ms)

- 144fps: 7ms / frame (2.4x cost for -10 ms)

- 240fps: 4ms / frame (1.6x cost for -3 ms)

- 500fps: 2ms / frame (2x cost for -2 ms)

- 1000fps: 1ms / frame (2x cost for -1 ms)

Just from looking at those numbers, 144fps is still a clear win and 240fps probably makes sense too, if you've optimized your setup for latency. Everything beyond is probably pointless, unless it's your job to play competitive FPS games ;)
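For what it's worth, that whole table is just 1000/hz plus the delta from the previous step; a quick sketch that regenerates it (rounding may differ by a hair):

    rates = [30, 60, 144, 240, 500, 1000]
    prev_hz = prev_ms = None
    for hz in rates:
        ms = round(1000 / hz)
        if prev_hz is None:
            print(f"{hz:4d} fps: {ms:2d} ms/frame")
        else:
            print(f"{hz:4d} fps: {ms:2d} ms/frame ({hz / prev_hz:.1f}x cost for -{prev_ms - ms} ms)")
        prev_hz, prev_ms = hz, ms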


At low frame rates, the GPU is (usually) the bottleneck most of the time. At very high frame rates, there are other, more significant bottlenecks.

A much better benchmark would be to take a game designed for 120 or fewer FPS on a 4090 and try it on a 1070.


That's because single thread perf of CPUs hasn't progressed and ~500fps is where CPUs cap out on that game still to this day. GPUs are doing fine.


For anyone that doesn't understand why this matters, the CPU still needs to prepare the full scene before sending it over to the GPU for polygons to get rasterized and shaders to get calculated. Most of the time the CPU does all of the physics as well. So even if GPU render time goes to 0ms, 2ms spent by the CPU per frame means 500fps is the best you'll get without a better CPU or better code. And I suspect game devs aren't looking to shrink their budgeted CPU time just so someone benchmarking can see 1000fps.
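In code, the toy model of that is just "the slower of the two sides caps the frame rate" (illustrative numbers, and it assumes CPU and GPU work for consecutive frames overlap fully in the pipeline):

    def fps_cap(cpu_ms, gpu_ms):
        # Throughput is limited by the slower pipeline stage per frame.
        return 1000.0 / max(cpu_ms, gpu_ms)

    print(fps_cap(cpu_ms=2.0, gpu_ms=1.0))    # ~500 fps: CPU-bound
    print(fps_cap(cpu_ms=2.0, gpu_ms=0.001))  # still ~500 fps, even with a near-free GPU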


CS:GO also doesn't use any of the new approaches that reduce CPU time in those operations and more effectively saturate the GPU. Compare CS:GO against Doom:Eternal, which runs at high fps while providing massive improvements to visuals.


Does Doom: Eternal have multiplayer? How do people feel about it as a competitive shooter?


It has multiplayer, but I admit I never really spent any time with it.


Fps in CSGO isn’t as dependent on the GPU as it is in more modern games, so comparing with a different game might be more accurate.


I'm curious, do you know of a study that has proven that single core CPU or other is the bottleneck in CS:GO?


This is essentially folk wisdom - but I think you can trust the other people in this thread that it's accurate. I doubt anyone has done a scientific study on this, but you could see this for yourself by setting your computer's clock rate lower and re-running the benchmark.

I expect that if you have a 4090, you have an Intel or AMD CPU that exposes a core clock multiplier. You could run this benchmark with whatever value it's at, then reduce the multiplier by, say, half. That should halve your CPU's clock rate, and I'm guessing you'll see the frame rate decline similarly. You can conclude that the game is "CPU bound", then.

Even if your GPU was infinitely fast, you have to remember that game developers are not optimizing for the game event loop to run in sub-millisecond times. If the core loop in the game takes under 16ms to run on a common CPU, such as one in a console: that's better than 60hz and the overwhelming majority of video game players will never see a benefit.

Some game developers, I see someone in another thread mentioned Doom Eternal, pride themselves on that optimization. With a fast enough CPU and GPU, you could probably reach 1000 FPS on Doom Eternal. A quick search suggests this has in fact been done, with a liquid nitrogen cooled PC and a 6.6GHz CPU: https://www.pcgamer.com/heres-doom-eternal-running-at-1000-f...


Actually kinda yes, look at AMD’s 3D V-Cache. There’s a very recent review by The Verge of an Asus laptop with that AMD chip, and it does hit like 600-700 fps. CPUs can absolutely make such a difference.


Study, lol. Run the game with uncapped FPS and see your CPU vs GPU usage. Your modern GPU is probably going to be chilling at 30% while your CPU will be pegged in some way (most likely at least one core at 100%).
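If anyone wants to eyeball this without fancy tooling, something like this works while the game runs uncapped (assumes an Nvidia card, so nvidia-smi is on the path, and that psutil is installed):

    import subprocess
    import psutil

    for _ in range(10):
        gpu = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
            capture_output=True, text=True).stdout.strip()
        cores = psutil.cpu_percent(interval=1.0, percpu=True)  # per-core, so one pegged core stands out
        print(f"GPU {gpu}% | busiest CPU core {max(cores):.0f}%")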


At that high a framerate, it might even be the transfers to the GPU that are the bottleneck.


CS:GO aside, the 1070 is said to do 6.5 TFlops to the 4090's 1,321 TFlops, for a 203x improvement in 6 years. Not bad!


FPS of one specific game isn't a great indicator of GPU grunt.


Moore's law is about transistors, not CS:GO. CS:GO benchmarks stopped advancing as quickly because Moore's law is - objectively, despite the protestations of semiconductor companies - dead.


You can't just use an arbitrary game to test your hypothesis like that. You need to test with something that caps out the 4090 at 100% usage and then see how a 1070 performs in comparison. It's not quite that simple of course, so just look up some benchmarks ;)


Admittedly knowing nothing, I’m going to assume that a great deal of the advantages of the latest generation aren’t going to improve performance in a decade old game since the engine won’t touch them.

How does a 1070 fare versus a 4090 in Hogwarts Legacy or another modern game at 1440 or 4K?


I’m more worried about how human development has seemingly stopped in its tracks, backtracking more like it


Physics stopped Moore’s law


My guess is that we've reached a peak in the amount of new investment that can be made annually, based on tech nearing total proliferation throughout society. The tech can sort of only advance as fast as tech companies can grow, and they can't grow exponentially after they account for some percentage of the global workforce.

My guess is that we'll see improvements at closer to the current rate rather than at an increasing rate.


The good news is that Nvidia's high GPU prices motivate everyone (Intel, AMD, ARM, Google, etc.) to try and tackle the problem by making new chips, making more efficient use of current chips, etc. For all the distributed computing efforts that have existed (prime factorization, SETI@Home, Bitcoin, etc.), I'm surprised there isn't some way for gamers to rent out use of their GPU's when idle. It wouldn't be efficient, but at these prices it could still make sense.


They’re all pretty motivated, they’ve been motivated for years, and almost nothing is happening. This situation isn’t exactly a poster child for the Efficient Markets Hypothesis.

Every year just sounds like “Nvidia’s new consumer GPUs are adding new features, breaking previous performance ceilings, running games at huge resolutions and framerates. Their datacenter cards are completely sold out because they can spin straw into gold, and Nvidia continues to develop new AI and graphics techniques built on their proprietary CUDA framework (that no one else can implement). Meanwhile AMD has finally sorted out raytracing, and their consumer GPUs are… well not as good as Nvidia’s but they’re a better value if you’re looking for a competitor to one of Nvidia’s 60 or 70 line GPUs!”


Efficient market hypothesis is unrelated to Nvidia’s competitors being unable to offer a competing product so far.

https://www.investopedia.com/terms/e/efficientmarkethypothes...

> The efficient market hypothesis (EMH), alternatively known as the efficient market theory, is a hypothesis that states that share prices reflect all information and consistent alpha generation is impossible.


you know what he meant...


I honestly don't. I'm trying to think of what principle they could have meant with "This situation isn’t exactly a poster child for X" where X is any economic principle, and I'm coming up empty.

Best I've got is "central planning". One firm being able to handily outperform others despite them also being both motivated and well capitalised points pretty heavily towards markets being good, but I hardly think they were referring to "central planning" when they wrote "efficient market hypothesis".

If it's so obvious to you that you're dropping ellipsis, care to clue us in?


My impression was that they used "Efficient Market Hypothesis" to mean "the theory that free-market competition rapidly drives down prices and breaks up monopolies on its own".


> the theory that free-market competition rapidly drives down prices and breaks up monopolies on its own

You mean a theory that no economic school actually believes? Not even Austrians would sign that.


You sound very confused. That’s economics 101 just about everywhere. The friction is generally termed “barriers to entry” if that helps reorient you.


Well, patents are certainly a good way to create monopolies, which wouldn't exist in an actually free market with no patents. In addition, there's interventions in place which don't allow certain tech exports to China, for example. Not sure how much better the situation would be without these.

I think the principle is called perfect market or perfect competition, but it's only a theoretical concept. However, it's certainly possible that the market is less perfect than it could be due to interventions and regulations.


Not endorsing that theory, just offering my best guess at what the person probably meant when they used "Efficient Market Theory" in their comment, based on the context.


That theory is still valid, the issue is that the competition can't or won't even try to make a better product or a cheaper one. The rule only applies if there exist competing products in the first place.

If there was any, the prices would go down as we have seen a billion times.


So, why’s that not happening, and what’s it imply for the rest of the theoretical framework?


It’s not happening because 10,000 people who have more intimate knowledge of the business than you or I ever will have made decisions to best suit their current conditions. This isn’t an exception to the rule, you’re just looking at a small timeframe and a remarkably performant company. Why is it so bad for a company to be successful when they have provided so much back to society in the form of R&D? Besides, if I’m doing ML my boss has paid for the card anyway so the price doesn’t concern me.


This is the point.

This is a textbook situation that would be perfect for a competitor to come in and undercut. However not only is that not happening, nobody is even trying.

Making the “theory” pretty worthless if it’s not even applicable in cases that would naturally produce this market entrant.

The reality is that private equity does not actually want to compete with large global brands.


> This is a textbook situation that would be perfect for a competitor to come in and undercut.

Is it? A competitor can enter the market and undercut by producing a cheaper and otherwise undifferentiated commodity-type product. Nvidia's focus is adding moats that prevent competing on pure specs such as CUDA, design, and so on.


Nvidia started working on CUDA back in the mid-2000s. Nobody else cared about GPU compute back then.


I didn't know CUDA was in development for that long, thanks! I'm curious if there were serious attempts to create something similar over the years and how far they managed to progress. I've never used OpenCL but from looking at it from afar its adoption always seemed limited.

Regardless, though, I don't see how a competitor can replicate the context and tacit knowledge associated with building something like CUDA for close to 20 years without putting in a similar amount of time.


The first version of CUDA was released in 2007 so I would not be surprised if they started working on it in the 90s. These things take time.

Nvidia also made sure that CUDA runs on gaming GPUs and supports Windows. This is why its tools are so good. You don't need to buy a datacenter GPU, no need to mess with Linux. Just buy any gaming GPU, install CUDA SDK and you're good to go.
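That "any gaming GPU + CUDA SDK" workflow really is about this short (a minimal sanity check, assuming a CUDA-enabled PyTorch build is installed):

    import torch

    print(torch.cuda.is_available())          # True once the driver + toolkit are set up
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # e.g. a consumer GeForce card
        x = torch.randn(4096, 4096, device="cuda")
        print((x @ x).sum().item())           # the matmul actually runs on the GPU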

AMD wasn't like that. Their alternative, ROCm, didn't even work on gaming GPUs. Their datacenter GPUs didn't even support Windows. Basically the opposite approach from Nvidia. Now AMD is rushing to add Windows and consumer GPU support to ROCm, but it's a bit too little, too late.


> This is a textbook situation that would be perfect for a competitor to come in and undercut.

Do you have any idea how much R&D it takes to make a GPU, let alone something that could possibly compete with NVIDIA?

This is not something a dozen people in a large garage are going to do.

> However not only is that not happening, nobody is even trying.

AMD has been trying to compete for years. If you're a gamer that doesn't care about getting the top performance and just wants to optimize price/performance, their offerings aren't bad.

> The reality is that private equity does not actually want to compete with large global brands.

Because the barrier of entry into the GPU market is probably at least a couple billion dollars.


> the barrier of entry into the GPU market is probably at least a couple billion dollars.

There are $BN Private Equity deals done everyday, there's plenty of money to fund a competitor.

No, I think people are just scared and don't care about the existence of monopolies.


> This is a textbook situation that would be perfect for a competitor to come in and undercut.

They already are undercutting.

Some businesses that are really into hyper-scaling are already pouring man-hours into making those "undercut" alternative products work. Specifically because sometimes you can't afford Nvidia at that scale. Or because they have a large enough scale that they can make it work with suboptimal tech.

It's just that for most businesses the man-hours required to make "undercut" products work still aren't cheap enough to win on cost-benefit.


> However not only is that not happening, nobody is even trying.

A lot of companies are trying. It's just that nvidia is really good and significantly ahead.


No, spell it out because it’s complete nonsense as it is. The efficient market hypothesis has zero relationship to barriers of entry, network effects, or any of the other dozen concepts that maybe are actually at play here.


> This situation isn’t exactly a poster child for the Efficient Markets Hypothesis.

I'm unsure why you're criticizing the Efficient Markets Hypothesis or even using it here, but you need to also analyze this with some time horizon because the market and marketplaces are not static.


Their description could be used to describe the situation in 2023, 2022, 2021, 2020, 2019, 2018, 2017, and 2016.


Tech design and development seems, to me at least, pretty much naturally opposed to the "being kept in check with competition" state - as design isn't really a cost that scales per-unit, the company that sells slightly more can afford to put more into development at the same per-unit margin, which snowballs. At some point, they own the entire market - or enough that they functionally control it, and start leveraging this position. I'd argue we're seeing this from Nvidia right now.

People talk about AMD being competition - but from most stats I've seen, they're ~10% of dGPU sales, with Nvidia being the other 90% (with new Intel offerings being pretty much noise for now). That means that if both invest the same proportion into development, Nvidia has nearly 10x the resources.

It may be that tech companies like this would "naturally" form a monopoly without outside (IE government) interference, as the only reason that multiplier of development resources doesn't completely crush new entrants is rather extreme mismanagement, or a new segment is created where the design resource don't really cross over that much.

I don't see anything like that happening in the short term. If anything there seems to be more opportunity for cross-pollination of development within these corporations, as there's a fair bit of design similarity between various silicon (GPUs, CPUs, accelerators for the current ML techniques, etc.) that may encourage more consolidation in the whole semi market to take advantage of that, not less. But again, the only thing standing in the way of that seems to be governments trying to protect national interests, like the blocking of Nvidia buying ARM to pull in one of the big CPU players. Plus all the other IP Nvidia might benefit from, like the low-power GPU designs and other accelerators ARM has designed.


At the end of the day, what Nvidia produces is just IP. All manufacturing is done by other companies.

This means that Nvidia’s capital is spent on testing/development infrastructure and creative labor.

So you don’t need monopoly-breaking to handle Nvidia if it keeps growing; instead, rethink IP laws.

Testing infrastructure requires less capital investment than production manufacturing (see ASML and TSMC being fully booked), and humans can be persuaded to work elsewhere.

This means Nvidia cannot fall asleep, even if it keeps snowballing. In a few years a rival can always arrive, or current ones could snag key people or have a dev/test infrastructure breakthrough, or IP law could change as its critics grow louder every year.

Sure, if Nvidia keeps up its good game it will keep growing and get an even bigger share, but if it doesn’t, what happened to Intel will happen here: Intel got greedy on its position, and the company got fat.

As long as nvidia keeps its game, it’ll be good for customers even if they swallow more market share, as prices are always limited by the value business customers get from AI, and the investment needed for a competitor is not even close to infrastructure or resource extraction stuff like high end chip fabs, oil extraction, energy grids, telecom grids.

Again, Nvidia does not produce any physical goods, only designs.


I think Intel having in-house manufacturing was one of the big causes of them getting "too fat and slow to respond" - from the outside, much of their fall from grace was due to massively delayed production improvements rather than designs and IP. As far as I can see, the architectures were pretty much done and ready to go; the targeted process just missed its mark.

With Nvidia and other GPU competitors being IP-focused, effectively outsourcing all this "manufacturing stuff" (to the same third party much of the time), that's one less thing for them to keep up with, and one less thing that'll hurt if they do start "falling asleep". I can't see this happening to Nvidia in quite the same way right now. My point was that not having manufacturing makes the advantages of consolidation larger, not smaller.

I wonder what would have happened if Intel had realized its manufacturing wasn't hitting targets and "quickly" added TSMC as an option. Would AMD even have had a chance with Ryzen? There was clearly a time when AMD had a superior manufacturing process through TSMC; if Intel's designs of the time had been on the same process, would they have managed to grab the headlines?

And no, I strongly disagree that Nvidia running unchecked over the entire market would be "good for consumers", and I'm not sure the capital expenditure of getting over this moat is really much smaller than things like resource acquisition or infrastructure; they have $billions in current software ecosystems and hardware designs. Those $billions could probably buy you a fair bit of infrastructure investment on the scale you mentioned. Look how much Intel is burning right now just to get a toe into the market and not get laughed out the door - and they're still clearly behind their competitors. Their chips aren't anywhere near competitive from a performance-per-area point of view, and their software is rather poor for the vast majority of use cases.


It's not like designing this kind of product is easy; or that Nvidia's designers are sitting idle; or that everybody else's design team is not busy building something else. There are in fact many competent design teams, chipping at their own business.

There are in fact startups, also doing what they can (and probably not trying to go head on against the most productive competitor they can find.) And it has been reported countless times that some of the biggest customers of Nvidia are actually trying to design their own.

If you want to point out a market with broken competition, this isn't it.


To play the free market advocate:

The situation is created by artificial restrictions on free market (namely state enforced monopolies on "IPR", or as some call it, imaginary property).


Aren't AMD's competing against nvidia's 80 series GPUs these days?


Are they motivated? Seems like a massive coincidence how the big two of the GPU world are cousins, and one has been having massive success on the CPU, the other on GPU/AI, and every attempt from both side to enter the other's niche has been pretty weak.

AMD compute is nowhere compared to NVIDIA. NVIDIA wanted to buy ARM, has got its finger in RISC-V, but apart from that, they don't really care. To be fair AMD has done decent with GPUs, but never enough to dethrone NVIDIA, whose playbook for the past few gens is "just make everything bigger than last gen and increase the frequency." Surely AMD could have chosen the same lazy approach to surpass the 4090 only just, but instead they didn't, so it's still NV undefeated in its space because AMD forgot to squeeze the last 1% out of their card.

The market is powerless if the competitors aren't really competing. Intel is the only chance, unless they manage to get their own Taiwanese CEO somehow related to Huang and Su.


Leaving aside the weird conspiracy stuff, I don't know how you can see stuff like the nvlink C2C that makes Grace Hopper possible and think they went "just make everything bigger".


> motivate everyone (Intel, AMD, ARM, Google, etc.) to try and tackle the problem by making new chips

Yes, there has been repeated efforts to chip at Nvidia's market share, but there's also a graveyard full of AI accelerator companies that fail to find product market fit due to lack of software toolchain support - and that applies even for older Nvidia GPUs and their compatible toolchains, let alone other players like AMD. This isn't a hit on Nvidia, I'm just saying things move so quickly in the space that even the only-game-in-town is trying to catch up.

Nvidia is also leading by being one or two hardware cycles ahead of their competition. I'm pretty confident AI workloads in enterprise is their next major focus [1]. I think this more than anything else will accelerate AI adoption in enterprise if well executed.

To your point, I think the industry needs to focus more on the toolchains that sit right between the deep learning frameworks (PyTorch, Tensorflow etc.) and hardware vendors (Nvidia, AMD, Intel, ARM, Google TPU etc.) Deep learning compilers will dictate if we allow all AI workloads run on just Nvidia or several other chips.

[1] - https://www.nvidia.com/en-us/data-center/solutions/confident...


> I'm surprised there isn't some way for gamers to rent out use of their GPU's when idle.

https://rendernetwork.com/

"The Render Network® Provides Near Unlimited Decentralized GPU Computing Power For Next Generation 3D Content Creation."

"Render Network's system can be broken down into 2 main roles: Creators and Node Operators. Here's a handy guide to figure out where you might fit in on the Render Network:

Maybe you're a hardware enthusiast with GPUs to spare, or maybe you're a cryptocurrency guru with a passing interest in VFX. If you've got GPUs that are sitting idle at any time, you're a potential Node Operator who can use that GPU downtime to earn RNDR."


Also the Horde for Stable Diffusion, pretty good concept: https://github.com/Haidra-Org/AI-Horde/blob/main/FAQ.md


I am certain that several years ago, I was given an ad for exactly such a service and even tried it out, but I cannot for the life of me remember its name. It had some cute salad motif, and its users are named "chefs".

EDIT: It was just named Salad. https://salad.com/ https://salad.com/download


You can do that for inference, but most gamers have a single GPU with <24GB VRAM which kinda sucks for training. 3090 or 4090 is the minimum to use reasonable batch sizes


The good news is that Nvidia's high GPU prices motivate everyone (Intel, AMD, ARM, Google, etc.) to try and tackle the problem by making new chips...

Or their dominance leads to competition throwing in the towel and investing resources in a market with less stiff competition.

I wouldn't be surprised to see AMD start to pare back investment in high-end GPUs if things continue down this path. I would say Intel likely keeps pushing, but I'm less convinced they can actually make much headway in the near future.


As was mentioned in another thread on a slightly different topic, it wouldn't be surprising to see all the non-Nvidia parties unite around some non-CUDA open standard.


Do you mean something like OpenCL?


Exactly. More resources might get applied to improving it.


I must admit my previous comment was mildly sarcastic. What I was after is: OpenCL is that language/framework that is consortium driven and open. It’s been there since the start of time. Still it cannot dethrone CUDA… Nvidia struck gold with CUDA and its lock-in


The main problem with OpenCL is most people would rather work with C++.


> I would say Intel likely keeps pushing, but I'm less convinced they can actually make much headway in the near future.

It seems that Intel is making great headway on their fabs and may somehow pull off 5 nodes in 4 years. Intel 3 is entering high volume production soon and according to Gelsinger 20A is 6 months ahead of schedule and planned for H2 2024.

If they do pull this off and regain leadership that would change outlook.


With interconnect being the biggest limitation these days I don’t think this would work.


I'm not familiar with all the varied uses of GPUs but it seems like image generation could feasibly be distributed: large upfront download of models, then small inputs of text and settings, and small output of resulting images.


For inference I agree! But training requires centralized gradient steps


If you're in a data center running large training jobs, then RDMA over Nvidia Mellanox InfiniBand cards or high-speed Ethernet (like 100Gb/s) is used to ship coefficients around without that transfer bottlenecking in the CPU.
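At the framework level, "shipping coefficients around" is basically an all-reduce. A minimal sketch (my own example, assuming a PyTorch/NCCL setup launched with torchrun and one GPU per process; NCCL picks the RDMA transport when it's available):

    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="nccl")
    local_gpu = dist.get_rank() % torch.cuda.device_count()
    grads = torch.randn(1_000_000, device=f"cuda:{local_gpu}")  # stand-in for this worker's gradients
    dist.all_reduce(grads, op=dist.ReduceOp.SUM)                # summed across all workers over the interconnect
    grads /= dist.get_world_size()                              # average, as a data-parallel step would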


100 gig, that's considered cute nowadays.

https://aws.amazon.com/blogs/aws/new-amazon-ec2-p5-instances...

3.2 terabits.


I don't think that machine has a single nic with that bandwidth- I'd guess it's 8 400Gbps cards or something similar.


Correct, it's 8x 400G cards, one per GPU.


> I'm surprised there isn't some way for gamers to rent out use of their GPU's when idle.

The main reason you need massive amounts of fast VRAM in the first place is that the main limitation of AI is memory bandwidth. You can't take an algorithm that is already limited by memory bandwidth, distribute it across links with awful latency and bandwidth, and hope for any improvement.
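Back-of-envelope version of that argument (illustrative numbers, not measurements): each generated token has to stream the full set of weights once, so tokens/s is roughly bandwidth divided by model size.

    def tokens_per_second(params_billion, bytes_per_param, bandwidth_gb_s):
        model_gb = params_billion * bytes_per_param
        return bandwidth_gb_s / model_gb

    print(tokens_per_second(70, 2, 1000))  # ~7 tok/s on a ~1 TB/s VRAM card
    print(tokens_per_second(70, 2, 30))    # ~0.2 tok/s if the weights had to cross a "fast" network link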


vast.ai allows you to rent out gpu


In bitcoin mining, the GPU phase lasted only two years before being outcompeted, first by specialized FPGAs and then by ASICs. Nobody has used GPUs for bitcoin mining since 2013. Maybe ML will follow a similar path. But that computation is much different from ML; it doesn't need memory at all.


Aren't tensorcores basically asics for ml?


If they have transistors for gaming, that's extra unneeded cost from the matrix multiplication point of view. If they don't, they're not GPUs anymore.


The only way to compete with Nvidia is to supply a drop in replacement for CUDA with the same (or better) performance for price.

Good luck with that.

In the current state of things, Nvidia is like a car manufacturer that exclusively owns the concept of tires.


The larger language models now employ a trillion parameters. This is faster when memory and compute are tightly coupled, not distributed. Cerebras's million-core super-wafer addresses this.


There have been various attempts but you need a workload that's basically public and also runs on a single GPU (because you don't have NVLink or similar).


Incredible company. It’s absolutely insane how far ahead they are with the investments they made over a decade ago.

So nice to see a “hard” engineering (from silicon to software) SV-founded company getting all this recognition. Especially after what has felt like a decade of SV hype software companies dominating the mainstream financial markets pre-pandemic with a spate of overpriced IPOs or large ad-revenue generating mega corporations.


The moniker of "hard" engineering is neither precise nor useful. What makes engineering hard? Is solving problems with distributed systems, even if these systems are for ads, hard? Or do you mean hardware? In that case even Nvidia is not hard enough since they don't fabricate their own chips. Or do you mean designing hardware? Then what makes writing system verilog at a desk hard but writing Python not hard?


I admit that was a glib comment and unnecessary.

I’m really speaking about Nvidia’s ability to perform well in both hardware and software, at chip scale and datacenter scale. Also speaking of their product/business direction that revolutionizes multiple industries (leaders in graphics with ray tracing and AI frame/resolution scaling; leaders in AI infra and datacenter systems, etc.), all resulting in big impacts on their respective industries.

You’re right that many of those software-only companies do very real engineering with distributed systems and such. I should’ve been more precise; I was really complaining about the SV hype of the 2010s focusing on regulation-breaking companies like Airbnb, Uber, WeWork, etc. and on companies like Meta and Google who focus on pushing ads for their revenue.


I suppose the difference is engineering something deterministic (i.e., physics, electronics, logic) versus something soft and indistinct (SEO, ad impressions, customer conversion rate).


It's hard to get complex systems correct. There's far less margin for error when you get a hardware design wrong. Correcting a Python software mistake is orders of magnitude easier and cheaper to resolve, it doesn't cost multiple billions and take 6 months to iterate. You might consider the hardware design harder in that respect.


Yeah. NVidia was a docile looking company and in 2012, they were merely a gaming oriented hardware shop.

These companies exist today. Which small or ignored companies do you think have a bright future?


Are they so far ahead?

AMD GPUs get comparable results as of late on Stable Diffusion.

Software and hardware from competitors will catch up, crunching 4/8/16 bit width numbers is no rocket science.


> Software and hardware from competitors will catch up, crunching 4/8/16 bit width numbers is no rocket science.

I used to think like that, until I got a job there and... Oh, boy! I left five years later still amazed at all the ever more mind bending ways you can multiply two damn matrices. It was the most tedious yet also most intellectually challenging work I've ever done. My coworkers there were also the brightest group of engineers I've ever met.


> was the most tedious yet also most intellectually challenging work I've ever done

What does this mean? Tedious is pretty much the opposite of “intellectually challenging” when I think of careers.


Trying to beat a good chess engine at 2500 Elo is probably both tedious and intellectually challenging.


> Software and hardware from competitors will catch up, crunching 4/8/16 bit width numbers is no rocket science.

I made the mistake of buying an A770 from Intel, based on the spec sheet. Hardware is comparable to what Nvidia is selling, for 70% of the price.

It's basically a useless paperweight. The AI software crashes constantly, and when it's not crashing, it performs at half the level of Nvidia's cards.

Turns out that drivers and software compatibility are a big deal, and Intel is way way behind in that arena.


Sure, but there's lots of room for improvement and plenty of financial benefits to do so.


The problem is that HW companies like Intel are dominated by EEs and bean counters. Both see software as a cost center.


It's a similar story with iPhone/iOS vs Android. Endless talk about how Android was about to get so much better in performance compared to the iPhone didn't yield much. I guess lately people have accepted the perf will never match up. At least Android has massive market share in the rest of the world, so with perf/pricing/compatibility/localization it will remain competitive in some sense.

I hope, but don't really foresee, that AI/ML systems will get a competitive stack similar to Nvidia's.


I haven’t crossed over to the other side in a while, what do you find lacking in Android performance? I was under the impression that these days it’s solidly in “good enough not to notice” territory.


Has it been improving?


Nvidia has a small lead on the industry in a few places, adding up to super attractive backend hardware options. They aren't invincible, but they profit off the hostility between their competitors. Until those companies gang up to fund an open alternative, it's open season for Nvidia and HPC customers.

The recent Stable Diffusion results are great news, but also don't include comparisons to an Nvidia card using the same optimizations. Nvidia claims that Microsoft Olive doubles performance on their cards too, so it might be a bit of a wash: https://blogs.nvidia.com/blog/2023/05/23/microsoft-build-nvi...

Plus, none of those optimizations were any more open than CUDA (since it used DirectML).

> crunching 4/8/16 bit width numbers is no rocket science.

Of course not. That's why everyone did it: https://onnxruntime.ai/docs/execution-providers

The problem with that "15 competing standards" XKCD is that normally one big proprietary standard wins. Nvidia has the history, the stability, the multi-OS and multi-arch support. The industry can definitely overturn it, but they have to work together to obsolete it.
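Concretely, that execution-provider layer is the pluggable part: the same exported model can be pointed at different vendors' backends. A minimal sketch; "model.onnx" and the provider choice are placeholders:

    import onnxruntime as ort

    session = ort.InferenceSession(
        "model.onnx",
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],  # could be ROCm, DirectML, etc.
    )
    print(session.get_providers())  # shows which backend actually got picked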


Perhaps RDNA3 GPUs get comparable results, but RDNA2 GPUs are behind.

I bought an RX 6800XT to do some AI work because of the 16GB VRAM, and while the VRAM allows me to do stuff that my 6GB RTX 2060 wasn't able to, on the performance side it's actually a downgrade in many aspects.

But the main issue is software support. To get acceptable performance you need to use ROCm, which is Linux only. There was a Windows release of ROCm a few weeks ago, but I am not sure how usable it is and none of the libraries have picked up on it yet.

Even with Linux installed, most frameworks still assume CUDA and it's an effort to get them to use ROCm. For some tools all it takes is uninstalling PyTorch or Tensorflow and installing a special ROCm-enabled version of those libraries. Sometimes that was enough, sometimes it wasn't. Sometimes the project uses some auxiliary library like bitsandbytes which doesn't have an official ROCm fork, so you have to use unofficial ones (which you have to compile manually, and whose Makefiles quickly get out of date). Which, once again, may work or may not.
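For reference, the swap itself is roughly this (the index URL/version below is just an example, check pytorch.org for the current one). The ROCm wheels expose the GPU through the torch.cuda namespace via HIP, which is why most downstream code doesn't notice:

    #   pip uninstall torch
    #   pip install torch --index-url https://download.pytorch.org/whl/rocm5.6
    import torch

    print(torch.cuda.is_available())  # True on a supported Radeon with ROCm set up
    print(torch.version.hip)          # set on ROCm builds, None on CUDA builds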

I have things set up for stable diffusion and text generation (oobabooga), and things mostly work, but sometimes they still don't. For example I can train stable diffusion embeddings and dreambooth checkpoints, but for some reason it crashes when I attempt to train a LORA. And I don't have enough expertise to debug it myself.

For things like video encoding most tools also assume CUDA will be present, so you're stuck with CPU encoding, which takes forever. If you're lucky, some tools may have a DirectML backend, which kinda works under Windows for AMD, but its performance is usually far behind a ROCm implementation.


Video encoding doesn’t use CUDA but rather NVENC.

However AMD is still as terrible at H264 as ever, and their AV1 encoder also has a hardware defect they’ve patched by forcing it to round up to 16 line multiples, so it is incapable of encoding a 1080p video and instead outputs 1082p that won’t be ingested properly when streaming.

http://freedesktop.org/mesa/mesa/-/issues/9185#note_1954937

Also the AV1 quality is not as good as intel+nvidia even with resolutions that aren’t glitched. AMD seemingly went big on HEVC (supposedly because of stadia?) but everything else is a mess.

And most places won’t touch HEVC because of the licensing costs. Microsoft makes it a windows store plugin you have to buy separately etc. somewhat odd that google picked HEVC for stadia but I guess those customers are actually directly paying you vs YouTube being a minus on their balance sheet (at least until recently possibly)


Hard times make hard companies. Hard companies make good times. Good times make soft companies. Soft companies make hard times.


Whelp, I guess those September NVDA call options I sold are going to get exercised. Who woulda guessed after the crypto fallout that "AI" would come along and bump the price back up.

Record revenues, and a dividend of $0.04 on a $450 stock? That's not even worth the paperwork. For example, if you bought 100 shares, that's $45K. From that, around September $4 will show up in your account, which you have to pay taxes on. So $3 or so net on a $45,000 investment. Sure, there were stock buybacks, but why keep the token dividend around?


Jensen is one of the largest shareholders. With over 80 million shares that's an over 3 million dollar dividend for him.


Wait, 80M shares? At $450 a share he's worth about $36B then. Not bad.



This website is completely unusable on mobile.


just wait until it’s $4bn and another $3m!


I sold 600C for this Friday an hour or so before earnings. Free money with 168% IV.


It is free money, until it isn’t!


collecting pennies in front of a steamroller


Especially in a post-gamestonk world


very different dynamics.

- a GME ape :)


risk/reward.


I sold the Fri 560C as a covered call. The high IV was free money for little risk.


I sold the 590 :)


It's probably good for the long term share price if they can say in 20 years they've had a dividend for 20 years, even if that dividend was actually measly.


There's a $25billion buyback this coming quarter. That's how they distribute profits these days.


A stock buyback trades certainty of wealth transfer (you're not sure the price will go up, or by how much!) for flexibility in when the investor takes the gains for purposes of fiscal planning.


Should have sold a call credit spread instead!

For large shareholders, the dividend would still be worthwhile. From what I could find, Jensen has 1.3 million shares, so he'd receive over $200k in dividends this year. You might think that's chump change, but another source lists his salary at just under $1m; another 20% bump in liquid income is nothing to sneeze at.


> Should have sold a call credit spread instead!

Why?

> From what I could find, Jensen has 1.3 million shares, so he'd receive over $200k in dividends this year. You might think that's chump change, but another source lists his salary at just under $1m; another 20% bump in liquid income is nothing to sneeze at.

Jensen Huang is worth $42 billion and has been a billionaire for probably a decade or so now? Any CEO with that net worth would use stock-secured loans/LOCs for liquidity. 200k is very much chump change.


Should have sold a call credit spread instead!

I'll get right on that...after I go look up what that means. :-) I'm but a simple options trader who sells calls to unload stock I didn't want anymore anyway, and the premium is the icing on that cake. Left some money on the table this time, but I otherwise would have just sold the shares outright, and I did make some bank regardless.

Gonna be missing that sweet, sweet $0.04 dividend, though.


A call credit spread simply means buying an even more out-of-the-money call along with the one you sold. It would have reduced the premium collected, but the long call would appreciate on sudden moves like today's.
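A toy payoff-at-expiry, per share, ignoring fees and assignment mechanics (the strikes and credit are made-up numbers); the loss is capped at the strike width minus the credit collected:

    def call_credit_spread_pnl(price, short_strike, long_strike, net_credit):
        short_loss = max(price - short_strike, 0.0)  # obligation from the call you sold
        long_gain = max(price - long_strike, 0.0)    # protection from the call you bought
        return net_credit - short_loss + long_gain

    for price in (550, 600, 650, 700):
        print(price, call_credit_spread_pnl(price, short_strike=600, long_strike=620, net_credit=5.0))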


Hmm…that actually sounds like a nice hedge. I’ll keep that in mind next time a similar situation comes up. Thanks.


Theta gang ftw. But I would advise you to stay away from NVDA, as soon as the first quarter with flat or decreasing revenues comes (and it WILL come), the fall would be one to tell your grandchildren about.


The stock is up 9% or $45/share after hours. Jensen just made $58 million. $200k doesn't pay his dry cleaning bill.


Getting all those leather jackets cleaned is expensive!


This benefit is basically only to large shareholders who can't sell stock. Which might be insiders like Jensen and... anyone else? Everyone else can just sell, like, 0.0001% of their stock or whatever.


many times what a lot of people make in a year is nothing to sneeze at.

especially when it is awarded for merely having a stack of papers.


How durable do we think their Revenue is?

To remind us all, they're selling capitalized assets, not contracts or services.

Is the marginal demand for GPU chips over the next 3 years enough to sustain (or grow?) current revenues and keep this valuation afloat? To me, it feels like a comparatively fragile situation to find themselves in, to convince the world of 2025 that they need even more chips, unless "everybody needs to train their own LLM" is a secular trend.

I'm not sure if investors fully appreciate the nuances of this boom, or if I'm not fully appreciating how many GPUs "need" to be held by different companies (and sovereign entities, if you've read the headlines in the last couple weeks) to train LLMs in the coming decade.


it will be sticky as long as there's a cambrian explosion of AI innovations happening. NVIDIA built the best swiss-army-knife for handling GPGPU problems in general, spent 15 years building the ecosystem and adoption around it, and then tailored it to AI specifically.

Once the tech settles down a bit, Google and Amazon and others can absolutely snipe the revenue at a lower cost, just like they did with the previous TPUs/gravitons. But then some new innovation comes out that the ASICs (or, ARM+accelerator) don't do, and everyone's back to using NVIDIA because it just works.

AMD potentially has a swiss-army knife too, but, they also have a crap software stack that segfaults just running demos in the supported OS/ROCm configurations, and a runtime with a lot of paper features and feature gaps. And NVIDIA's just works and has a massive ecosystem of libraries and tools available. And moreover they just have a mindshare advantage. Innovation happens on NVIDIA's platform (because NVIDIA spent billions of dollars building the ecosystem to make sure it happens on their platform). And it actually does just work and has a massive codebase etc. Sure it's a cage but it's got golden bars and room service.

https://github.com/RadeonOpenCompute/ROCm/issues/2198

So I guess I'd say it's sticky until the technology settles. Steady-state, I think competitors will capture a lot of that revenue. But during the periods of innovation everyone flocks back to NVIDIA. AMD could maybe break that trend but they'll have to actually do the work first, they have tried the "do nothing and let the community write it" strategy for the last 15 years and it hasn't worked. You gotta get the community to the starting line, at least. Writing a software ecosystem is one thing, writing runtime/drivers is another.


It seems pretty similar to Tesla’s valuation with a product that won’t be as sticky as electric cars


They are not similar at all. Tesla P/E is 67, Nvidia's P/E is 244.


Which is - excuse my French - fucking insane. The current PE ratio of 244 is pants on head ridiculous. That's 244 years worth of exceptionally high profit just to break even. Your great great great great great great great great grandchildren could take over your shares at the same value as today and the company would still be overpriced. That's centuries worth of technological change and high volatility baked into the price. There is no way on Earth the company is worth anywhere near that. No one can impute that kind of extraordinary risk effectively. This is now officially a meme stock. Nvidia is proof positive we currently have an equities bubble. Not even Tesla was this overpriced, and they're still down 40% since their all time high.


Also, Tesla has a lot more room to grow in terms of revenue (they're still a small % of cars sold) than Nvidia (which already dominates gaming and datacenter).

I admit I'm biased in favor of Tesla however.


No I agree. Tesla still has a realistic path to fairly significant increase in revenue. I still think it's too high, but compared to Nvidia, it's downright reasonable.


I meant the sudden increase seems to be similar to what happened to Teslas valuation in the past. Both have products/profits/valuation that have a lot of hype and huge margin partially driven by shortages.


I think it's more interesting to see how undervalued AMD is instead of focusing on NVIDIA which seems reasonably priced.

AI is not showing any sign of slowing down so far.


Up more than 10% after hours compared to close yesterday. I really thought NVDA had hit its ceiling at $1+ trillion, apparently not. Really does feel like a huge opportunity for Intel to me. They have the fab capacity to pump out at least reasonably competitive GPUs if they can figure out the software side of things.

P/E still above 50 even after the AI craze 9x'd EPS this quarter. Still hard for me to see how that valuation ever makes sense, but what do I know.


Intel doesn't seem to be able to execute. It's not just pumping out GPUs - for AI you need drivers, and the equivalent of CUDA and all the various libraries built on CUDA like cuDNN. They do have OneAPI, but it hasn't caught on like CUDA in that space. It's kind of too bad since OneAPI is open and CUDA is not.


Right but the market is saying that a dominant GPU business is worth more than a trillion dollars. Just hard for me to believe that they can't get the business off the ground with that kind money on the table. Can't they just hire all of nvidia's developers and pay them 5x as much?


>Can't they just hire all of nvidia's developers and pay them 5x as much?

As time goes on I don’t see how you break the CUDA moat even if you had all of Nvidia’s AI engineers.

Breaking CUDA means you need everyone in AI to target your new (hopefully open) platform, and that platform needs to be faster than CUDA. Given how most frameworks of the last 10 years have been optimized for CUDA, you would need to turn around a global-sized cruise ship.

If Intel’s GPUs are only 3% faster, will that be enough to rewrite my entire software stack for something not CUDA? If intel opts for a translation layer, could they ever match nvidia’s performance?


Well, I'm not super experienced with GPU development, but aren't most people using packages built on top of CUDA like PyTorch etc.? Would it be impossible to throw tons of resources at those packages so they handle whatever Intel comes up with as well as they handle CUDA?

If Intel is 10% slower but 50% cheaper, and the open source stack you use has been heavily updated to work well with Intel drivers, would that not be an enticing product?
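Part of why that seems plausible to me: most user code sits above the vendor layer, so the call sites don't change. A hedged sketch - "cuda" and "cpu" are PyTorch's built-in device strings; other vendors plug in through their own backends/extensions, which I'm not naming specifically:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"  # a vendor backend would slot in here
    model = torch.nn.Linear(1024, 1024).to(device)
    x = torch.randn(64, 1024, device=device)
    print(model(x).shape, device)  # identical call regardless of which backend runs it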


> aren't most people using packages built on top of CUDA like pytorch etc?

Yes, and in fact both AMD and Intel have libraries. You can run Stable Diffusion and suchlike on AMD GPUs today, apparently. And you can export models from most ML frameworks to run in the browser, on phones and suchlike.

> If Intel is 10% slower but 50% cheaper [...] would that not be an enticing product?

Sometimes, yes. Some of the largest models apparently cost $600,000 in compute time to train [1], so halving that would be pretty appealing.

However, part of the reason for nvidia's dominance is that if you're hiring an ML engineer for $160,000/year spending $1,600 to give them an RTX 4090 is chump change.

[1] https://twitter.com/emostaque/status/1563870674111832066


Intel's been trying this for several years now (OneAPI and OpenVINO), but so far they haven't gotten the traction. CUDA is just really entrenched at this point.


> Can't they just hire all of nvidia's developers and pay them 5x as much?

Lol... Intel is famously stingy when it comes to salaries.


You have no idea. There are a lot of senior engineers at NVDA making 7 figures total comp annually. How many are there at Intel?


Intel has senior engineers who received two and three digit retention bonuses this year, on top of the 20% pay cut.

Principals usually broke into the three digit range from what I hear.


for a trillion dollars though... eventually you have to believe Pat gets fired and replaced by someone who is 100% all in on GPUs if he can't figure this out


I'd argue that Intel being stingy with salaries is a big part of why they're so behind here. They just don't seem to be very serious about this. Intel has made several runs at the GPU market over the years and they just keep ending up where they are. And now NVDA has such a huge advantage (software and hardware) that it just gets harder and harder (and more expensive) to overcome.

Probably Intel's best bet now would be to try to be the fab for NVDA.


Maybe everything changes at $1 trillion, but I definitely see smaller (but public) companies leaving money on the table because it would require cultural change.


- Can't they just hire all of nvidia's developers and pay them 5x as much?

No.


The market is also saying that Tesla is worth more than BMW, VW, Audi, Mercedes, Toyota, Hyundai, Fiat, Ford, and dozens of others combined. Mehh, I don't know.


Exactly, the market isn't always rational. There's a lot of work on quantization in neural nets, for example, that can allow them to work sufficiently well on less capable hardware. It could be that some breakthrough there would obviate the need for NVDA hardware (or at least reduce how many are needed).


It is rational if you interpret market capitalization and share price movements as “the market is saying Tesla WILL be worth more than x,y z combined between time now and time whenever you want to sell it.”

For different people, the timespan between now and when they may want or need to sell it is different, and thus different people will arrive at different conclusions.

And note that “worth more” above simply means growth in market capitalization, so as long as someone is willing to buy the shares at a price supporting that increased market cap, then it does not matter if Tesla is still selling fewer cars than the others combined.


"In the short run, the market is a voting machine but in the long run, it is a weighing machine."


Exactly this !

And as somebody with a significant short position, I would add - "The market can stay irrational longer than you can stay solvent" !

Their numbers for this and next Q are absolutely amazing. It's also quite "refreshing" - a company with great product, almost without competition (so far - it will come real quick). And fun part being their main advantage is probably CUDA and not even the chips itself (which by the way they don't manufacture - they "only" do the design).

But still - even with those numbers, and even with this pace of growth (both being absolutely not sustainable, and will probably reverse hard next year) - the valuation doesn't make any sense, especially given the current interest rates.


Yeah, there is no world where Nvidia grows into this valuation. It would have to be bigger than the entire semiconductor business today by itself. If you think AI can bring that much growth, well then it could happen, but it seems extremely unlikely to me.


And Rivian has a greater market cap than Nissan.


I can really see Intel figuring this out. A lot of people on HN talking about Intel as an also-ran just like they spoke about AMD before Zen.

Raptor Lake is at 7nm and incredibly competitive there (~2700 single core on geekbench, taken with a pinch of salt). They’re still planning on being on 1.8nm/18A within 2 years, while at the same time ramping up their GPU efforts (albeit using TSMC for 4nm). Nvidia is very much in the lead, but this is just the beginning.

tldr; I ain’t hear no bell.


The problem with Intel is:

1. They don't pay - Nvidia/Google/Apple easily pay 1.5-2x Intel before stock appreciation.

2. They're cheap/bureaucratic. The office sucks, your laptop sucks.

3. They suck at software. https://pharr.org/matt/blog/2018/04/18/ispc-origins

4. They can't develop/retain talent. Half the ML-HW/FW teams at AMD/Google/Nvidia/Apple are ex-Intel.


How is that indicative of not developing talent? Sounds like they have a fairly robust pipeline if so many end up working at other companies...

(Retention notwithstanding :P)


They lost most of their talent in the early/mid 2010s but you're right to an extent for non-software. For software they're not a terrible place to accumulate domain specific knowledge but don't really retain people long enough or promote quick enough to make a difference.


Raptor Lake uses the "Intel 7" node which is actually 10nm, not 7 nm. It does have roughly the same density as TSMC's 7nm node.


In reality it literally is as simple as “intel says it’s 7nm so it’s 7nm”, but density is what determines the node number, insofar as anything determines it beyond marketing.

So, if Intel 7 is a 10nm-class node then so is TSMC N7. But the whole industry cooked their node names a decade ago and Intel is allowed to use the same terminology. To do anything otherwise would be a needless marketing disadvantage. If someone’s cheating on marketing, everyone has to - and the reality was everyone but Intel was already cheating.


This, but also if we took it at face value (we shouldn't for the reasons in the reply above), then this means Intel is even further ahead when adjusting for the difference in node sizes.

i.e. This would mean Intel will be getting a four-node-scale jump in performance (10nm->7nm->5nm->3nm->1.8nm) from where they are with Raptor Lake by 2025 if they can stay on track.

I don't have any insight into this but I would hope Pat Gelsinger is righting the salary, perks and incentive structure as a matter of priority to stop the shedding of talent.


> I don't have any insight into this but I would hope Pat Gelsinger is righting the salary, perks and incentive structure as a matter of priority to stop the shedding of talent.

Absolutely not and this is one of the strongest headwinds intel faces. They just did a big round of layoffs and then everyone who didn’t get cut got a surprise 20% pay cut after, and retention bonuses for seniors were typically low-3-digits to high-2-digits. Principal engineers were solidly 3 digits though. And then they did stock buybacks that exceeded the size of the layoffs and pay cuts.

Intel is fucking broke and they’re gonna have to cut every cost they can to focus on their key markets, and do it with personnel that were already underpaid and have better opportunities elsewhere. Not that labor is the straw that’s breaking intel’s back here but the situation is really AMD-in-2012 level dire, they have an incredibly long and expensive road to return to profitability and they will have to do it with whatever they can scrape up out of a staff that was already abused even in the “good times”.

Oh and they want you to move from desirable west-coast locations (well, except Arizona) to Ohio. And they’re massively behind on their tech stack (fighting uphill is not what everyone wants) in a failing org with a ton of middle-management rot and neglect. And they’re still failing to execute a huge amount of the time, causing massive delays, etc.

It’s really really dire and just because intel has staunched the worst of the financial bleeding doesn’t mean there’s much light at the end of the tunnel. Like Boeing I don’t think they will be allowed to go under, they’re strategically too important, but they’re gonna have a bad time for a while and probably legitimately do need a bunch of subsidy and “pity contracts” for national labs HPC computers to stay on their feet - just like AMD did.


What’s wrong with Arizona?


Hot, literal desert in the climate change era with few local water supplies and powerful stakeholders with river water rights, mediocre politics in an unstable era, not really a tech hotspot outside TSMC/Intel. It's not awful but it's not folsom or hillsboro or seattle, it's kind of ohio-lite (with slightly less culture wars, but much worse water problems).

(oh and to be clear by "2 to 3 digits" I don't mean 10-999% bonus... I mean they gave senior engineers a $75-125 retention bonus and principals might get 2 or 3 hundred. Those are the people who will build Intel's next foundational architectures and the nodes and the packaging technologies that lead themselves and their foundry customers to success. Have fun with that.)


Over the past decade Intel seems to have become more interested in social causes than in technology, maybe with a side of government backrubbing to keep some income flowing.


Nah, the biggest problem is that Intel became very risk averse. Yeah, they'll talk a good game on taking risks, but when it comes down to it people who took risks that failed tend to not be at Intel and other employees see that and think that maybe they need to play it safe.


> Yeah, they'll talk a good game on taking risks, but when it comes down to it people who took risks that failed tend to not be at Intel and other employees see that and think that maybe they need to play it safe.

I worked at Sears corporate when Amazon was getting big, about 25 years ago.

Always made me chuckle when armchair quarterbacks on TV would wonder why Sears couldn't do what Amazon did.

Bezos took tremendous risks in the late 90s and early 00s, while Sears was trying to figure out how to wring a few more pennies out of their stores. Sears Corporate was 110% focused on taking the existing business and maximizing profits, not on innovation of any kind whatsoever.


If you melted the moon to turn it into GPUs, we’d still find use for that compute.

If they can hold the market and not implode for internal reasons, not sure this has any upper limit?


I wonder if this is what Nancy Pelosi's husband actually bought all his NVDA in preparation for, way back then. Their prices have been shit for consumers since then, but it's still been good for them.


Absolutely monster numbers. The aftermarket trading is up over 8% as of right now, roughly $41 USD to approximately $513 a share. Insane.

Anyone who is a lot more versed in company valuation methodology see this as being near peak value, or does Nvidia have a lot more room to run?


The fundamental model of company valuation is that the company's equity is worth the sum of the company's future cash flows, typically discounted by some rate so that a cash flow predicted 5 years from now is worth less than the same cash flow today, with the intent of pricing in uncertainty.
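
For anyone who wants to see the mechanics, here is a minimal sketch of that discounting logic in Python. The cash-flow numbers and the 10% discount rate are made-up placeholders for illustration, not Nvidia estimates.

    # Toy discounted-cash-flow (DCF) sketch: value equity as the sum of
    # forecast free cash flows, each discounted back to today.
    # All numbers below are made-up placeholders, not Nvidia forecasts.
    def present_value(cash_flows, discount_rate):
        """cash_flows[i] is the forecast for year i+1, in $B."""
        return sum(cf / (1 + discount_rate) ** (year + 1)
                   for year, cf in enumerate(cash_flows))

    forecast = [20, 30, 40, 50, 60]          # hypothetical free cash flow, $B/year
    print(present_value(forecast, 0.10))     # ~144 ($B), vs. 200 undiscounted

The higher the discount rate (e.g. when interest rates rise), the less those distant cash flows are worth today, which is why rate changes hit growth stocks hardest.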

Nvidia has had a watershed moment of revenue growth, because they're the only significant player in the space of top-of-the-line GPUs for training LLMs.

Their current valuation bakes in the assumption that not only is this unprecedented level of pricing power durable, but that top line revenue will also grow significantly over the next few years.

Reminder that Nvidia is selling chips, mostly to datacenters, whose purchasing habits are primarily driven by "customer" demand (where customers are, in this case, tech companies wanting to train neural nets).

So a bet that they go up from here is a bet that datacenters will want at least as many chips in the next N continuous quarters, and will be willing to pay the current premium that Nvidia is charging, today. The corollary is you're betting that "customer demand" for ever-improving GPUs is climbing an incredibly steep, secular (read: permanent, non-cyclical) trajectory, such that it outpaces the assumptions of these already-lofty expectations.

I think I betray my own opinion here, but the price is what it is. :)


So you think demand for AI driven software and tools is going to stop growing over the next few years? It’s a big call. I think it’s just the beginning personally but time will tell.


Note it isn’t the demand for AI driven software and tools that has to be sustained or increased, it’s the demand for massive GPUs that are used to train AI tools.

To make a flawed analogy: if people want oil (AI), NVDA sells drills (the means of AI production). Demand for each is obviously correlated, but the profile of demand for the two products is really different.


Nvidia is the pickaxe seller in a gold rush. Their valuation is very much tied to how big AI grows in the next several years, and how quickly competitors can arise. I could easily see them continuing to go up from here, especially if AI keeps on expanding utility instead of leveling off as some fear.


Very much so. Nvidia was lucky with the perfect sequence of video and compute farms, then cryptocurrency and model training, and now the model training direction is flowering into application (hopes) left and right. But they did great with their luck. And now they are yet again in the position of selling tools to soak up everyone else's capital investment power. They are still now (and yet again) at the perfect spot for a giant new market.

But that's still a high valuation - that is, even if that new market grows to the sky, it's not clear that it can justify the new valuation.

Is Nvidia failing anytime soon? No. Is it the best investment you can find? That's harder to tell which is why the complaint of "very high valuation already". It's not in doubt that it's a great business. It's less easy to decide whether it's a great investment to get in right now.

But everything is relative: a PE around 40-60 is NOT historically crazy high for a company on the verge of a giant new market. And yet it is very high for a trillion-dollar market cap. This is exciting: a trillion-dollar market cap on the verge of a giant new market!


It's pretty overpriced already if you're looking at the fundamentals, and has been for a while. But fundamentals haven't really mattered in tech stocks for a long time.

If you want the responsible advice, it's overpriced. If you want my personal advice, well I bought more yesterday afternoon.


I don't see it. It would have been overpriced if not for this insane report. 854% YoY earnings growth. A PE that's now below 50 (even taking into account the $500 share price).

It's not overpriced anymore. In fact, if there's anything left in the tank or this is at all sustained, it's cheap.


Can you add a little more color on which fundamentals make it overpriced? Have you looked at their QoQ growth (not even YoY) for the last few quarters? I would say the stock price is just trying to keep up with the numbers they are putting out.


Their quarterly dividend is only $0.04 per share, total assets at 50B (vs 350B+ for other 1T companies), they cannot easily sell more chips and increasing price can only go so far.


A lot of stocks don't pay dividends at all. They are problematic with tax and theoretically the same benefit can be achieved with buybacks.


In my opinion it's likely mostly pull forward demand. Companies are racing to buy as many chips as possible and hoard them.

I already saw a few posts here on HN from companies that threw down insane amounts of $$ on H100s and are now looking to rent out their excess capacity. I'm guessing we'll be seeing a lot more posts like that soon.


Looking to rent out, or fully booked for the next year and looking to buy more GPUs?


This incredible growth was already priced in at $250.

Now it's just crazy.


Valuation fundamentals don't justify current prices. That said it could easily go higher (much higher). Passive investing has created a constant bid that has significantly distorted price discovery compared to pre passive era.


I wouldn't try to predict a peak when I didn't predict this rise (which most people didn't?). A new crypto that requires GPU mining, a continued AI boom, GPUs being used for something else, etc. - their price could keep going up indefinitely.


It’s basically a meme stock now. I don’t think anyone should be surprised by wide swings and irrational pricing going forward into the next few months.


I don't think the market leader for graphics cards -- a technically complex product compared to a bunch of brick-and-mortar stores selling video games -- is what you can consider a meme stock


What makes it a meme stock? It's printing money from an industry that is only starting. This isn't crypto nonsense.


Yeah, every company of any note is planning how to use AI, and a lot of the use cases are already proved out. This isn’t speculative nonsense. The question is how big does it get, not will it be big.

Crypto and blockchain never had an actual proved out use case. There was an interesting idea but no one ever could figure out a way it was useful. The costs associated were much higher than the risks of not using it.

People who think this is a meme aren’t paying attention, and they’re certainly not in the rooms of power where AI planning is happening at megacorps. I’ve been in them, and it’s serious and material and we are just now beginning to scratch the surface.


I’m saying all of that growth has been priced in for multiple years now. It’s likely to have very solid fundamentals and new relationships in enterprises everywhere for years to come. As such it’s ripe for overvaluation by both retail and institutional traders. If I had a horse in this race I would ride the wave for a while as others piled on, then take a nice honest profit before one of their many competitors turns in a healthy AI driven quarter like this. It will still be a strong stock, but expect some significant flux in a correction.


100% it’ll be an overshoot. Always is. But, I think the top is really hard to gauge this time.


(Not sure if it's true), but a meme stock is one whose price is propped up by retail traders, and spreads through social media / word-of-mouth, as memes do.

How we prove it's one is probably another matter.


Retail investors make up a low single digit percent of individual stock ownership. /r/wallstreetbets is not putting even a dent in a $1T company's stock price.


Top line growing 100% a year, faster recently... Doesn't take long for $50B p.a. to turn into $1 trillion p.a. at that rate...
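
As a back-of-the-envelope check on that claim (the $50B/year starting point is just the figure quoted above, and 100% growth is obviously not a forecast):

    # How long does $50B/year take to reach $1T/year at 100% annual growth?
    revenue, years = 50.0, 0      # $B per year
    while revenue < 1000.0:
        revenue *= 2.0            # 100% year-over-year growth
        years += 1
    print(years, revenue)         # 5 1600.0 - it crosses $1T during year 5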


> The aftermarket trading is up over 8% as of right now, roughly $41 USD to approximately $513 a share. Insane.

8% is close to nothing in stocks. Biotech stocks go up and down more than that without earnings announcements.

> Anyone who is a lot more versed in company valuation methodology see this as being near peak value, or does Nvidia have a lot more room to run?

As long as fine-tuning, training or even running these models stays this compute-hungry, and there's no efficient alternative to these GPUs, Nvidia will remain unchallenged.

EDIT: It is true like it or not AI bros. There are too many to list. For example, just yesterday:

Fulcrum Therapeutics, Inc. (FULC) 38% up.

China SXT Pharmaceuticals (CM:SXTC) down 25%.

Regencell Bioscience Holdings (RGC) 28% up.

NanoViricides (NNVC) up 20%.

Armata Pharmaceuticals (ARMP) down 23%.

[0] https://simplywall.st/stocks/us/pharmaceuticals-biotech


Biotechs are lottery tickets, not stocks. You're just gambling on binary results.


Stocks are lottery tickets…


Not if you understand what stocks are and how betting on biotech stocks is not a wise investment.


My point is, 8% on earnings is hardly volatile.

> Not if you understand what stocks are and how betting on biotech stocks is not a wise investment.

So you're giving investment advice for putting money in NVDA stock at the top or all time highs, right now on earnings as a 'wise investment' to make 8% (after hours) when others are clearly taking their money out of the market.

Unless you already invested in NVDA stock last year, that move is gone and you're just telling retail late comers to throw money at NVDA at the top for others to take their profits.


If you think Nvidia isn’t a volatile stock then you have no business discussing stocks at all. By any standard industry measure, Nvidia stock is highly volatile and it can be seen mathematically by just looking at the IV over time.


> If you think Nvidia isn’t a volatile stock then you have no business discussing stocks at all.

After warning not to buy stocks like NVIDIA at the top when other investors with larger capital are taking their money off the table?

An 8% move in reaction to earnings is close to nothing to what I've seen as 'volatile' compared to the stocks shown in my original comment, which were up / down > 20% in ONE day without earnings. 8% is close to nothing for retail investors like almost everyone here and likely yourself.

Also I'm not the one giving advice over looking at the IV or telling retail investors to jump into Nvidia stock at the top as a 'wise investment' or to use 0DTE option gambling strategies, a popular favourite with lots of retail traders.

Now it is 10% down after the smart money started taking their profits off the top as I warned days ago whilst many retail investors who bought in at the top are now left stuck in the market.


Exactly.

Nvidia is just one of many lottery tickets and 8% in one day is hardly volatile in stocks.


Volatility in stocks is measured in terms of “IV” (implied volatility) and Nvidia (and Tesla) have some of the highest IV in the stock market which is why they are so interesting for people trading in options making money from theta decay.

Also I would add, volatility is measured over periods of time, not single days. Most pharma stocks are essentially static for months or years until they announce one good result and the stock jumps 30% in a day. This doesn't mean they are volatile stocks; compare Gilead's IV to Tesla's, for example.


> which is why they are so interesting for people trading in options making money from theta decay.

Exactly, using the IV for supporting their highly risky 0DTE option trading strategies and gambling on earnings rather than investing long term?

> Most pharma stocks are essentially static for months or years until they announce one good result and the stock jumps 30% in a day.

That is the point on long term 'investing'.

A much lower-risk and more patient strategy is to buy such low-priced stocks at the bottom for the long term and take profit after the price jump, rather than gambling on ridiculous short-term option trades or the retail favourite of 0DTE strategies, which is just plain gambling.


The point is that 8% is volatile relative to the market cap. That's a massive amount of (paper) wealth being created.


> Biotechs are lottery tickets, not stocks.

Please.

Stocks are lottery tickets and Biotech stocks are stocks.

> You're just gambling on binary results.

The risks are no better than most of the AI bros buying Nvidia and overpriced stocks at the very top or all time highs or extremely risky 0DTE strategy trades on earnings announcements.

Do AI bros who jumped in late really have to be married to their stocks that are already overpriced to make 8% on earnings when the very early folks start selling to take their profits?


It's come to the point that people are begging competitors to do something in the space. Who knows, maybe some cheap Chinese ASIC that can do matrix multiplication ends up eating their lunch.

You'd think that, at the level of capitalization of tech companies, competition would be cutthroat


You're kinda underselling what exactly Nvidia is doing right now. If any Chinese company could compete with something like the DGX GH200, they would be building GPUs for the PRC, not exporting them.

There's also the problem of industry hostility, anyways. Even if Nvidia was dethroned in the hardware-space, it's unlikely their successor would improve the lock-in situation. It will take an intersectional effort to change things.


There's already non-nvidia ML silicon in phones (Pixel, iPhone, etc) and datacenters (TPUs) that is more efficient than GPU silicon.


There are a bunch of startups trying to develop AI GPUs. Someone linked them in a comment a few days ago.


Capitalization in itself is meaningless. If you have 50% of NVidia outstanding shares, and you try to sell 10% of that, the capitalization would crater.

What really counts is the profit. It is pretty huge now, but not 'that' huge (at least yet).


Try and buy 10% and see what happens!

Your point stands, but in both directions: the market cap is not "the value of the company" - you can neither buy nor sell it at that price.


Historical acquisition premiums give you a distribution of what you can expect to pay for the whole company given the market cap.


So gaming is now less than 20% of their business? Holy shit.


Well their earnings went up 854% YoY and that's basically all in the data center segment so it makes sense.

People are having a really hard time grasping that sometimes established companies 10x.


I miss the small graphics company that used to care about gamers :(


Nvidia is dragging the entire gaming industry forward with ray/path tracing and AI-based resolution and frame scaling. Everyone else (I.e., AMD) is following Nvidia’s lead.

In what way has Nvidia “forgotten” gamers with the rise of their datacenter business?


The card prices have gone up pretty significantly and availability has been bad for the last few years, they also have been segmenting their product line in ways where some of the lower tier cards are not very compelling vs previous release cycles. I don't know if that's attributable to them "forgetting about gamers" but it's what people are upset about.


Compare price performance and it isn't so bad, assuming you add in an adjustment for AMD's lack of features.


Prices are high, but the cards are fantastic. The only thing you can really complain about is that they're stingy with VRAM.

Supply issues should be gone too; I got a 4070 Ti shortly after launch, no problem.


So even without crypto, GPUs are still expensive regardless.

The hoarding isn't going to stop unless there are efficient alternatives that are competitive on performance and price.

Perhaps that is why I keep seeing gamers crying over GPU prices and unable to find cheap Nvidia cards due to the AI bros hoarding them for their 'deep learning' pet projects.

So they settle with AMD instead.


Raytracing isn't a thing no matter how much nvidia wants to push it. The performance penalty is too big for what amounts to something that takes a trained eye to notice. AI-resolution scaling is nice to have on lower end devices but the max resolution people actually use is 4k and I can only think of VR where having more than 4k would be nice to have.

My main gripe is that at 4k resolution, top of the line GPUs shouldn't be using AI frame scaling to get decent fps unless you are taking the raytracing penalty for funsies.


Feels like this comment is stuck in 2019 or something. Have you seen DLSS3.5 announced yesterday with ray reconstruction? Have you seen path tracing in CP2077?

Seems like you’re really dismissing the massive speed ups these past few years. Agreed that ray tracing in games is only at the beginning. A lot of that is gated by the consoles/AMD but that’s generally how it goes. Would love to see Nvidia in one of the powerful consoles to accelerate adoption of these technologies.


Agree on raytracing, disagree on upscaling. DLSS/FSR are gamechangers. They do annoyingly muddy the waters for all the claims of "we run X game at Y FPS on Z resolution" though.


This comment will age like milk, when in a few years (nvidia 50 series I'd say) path tracing becomes the norm for high end gaming.


Raw performance isn't increasing much; price/performance under $700 has barely improved, both now and with the 2000 series.


It's a combination of algorithms and hardware, but raytracing has gone from path traced quake 1 to path traced Cyberpunk 2077 in just a few years. The raytracing side of things hardware wise has doubled in perf for the same tier card each generation.


Yes, the RT part did improve but the raw part not much.

Thus only gamers who care about RT, and only in the games that make good use of it (virtually none), get any serious benefit.


In a world where transistor-per-$ growth is running at most 10-20% per generation, you have to get more out of the transistors you have, or else prices go up. Or both.

So raw performance has stagnated and gains have been concentrated in things like DLSS 2.5 that let you get native quality at a 30% speedup, better-than-native DLAA at 0% speedup, or FSR2 Quality-level quality at a 50-70% speedup. Because that gets you more performance out of a small increase in transistors/cost.


Gamers don't have datacenter budgets


Gamers like to ignore inflation and increasing fab costs and pretend cards should cost the same forever with double performance gains every 1.5 years.


Why do GPUs get so much more expensive compared to the rest of the system? CPUs, hard drives, prebuilt systems, laptops etc. haven't gotten ridiculously more expensive.

It used to be that $200 would get you a low-range GPU, $400 a midrange, and $600 a pretty darned good one. The 1070 launched at $379 (or $480 in 2023 dollars), the 2070 at $499, the 3070 at $499, then all of a sudden the 4070 now is $599. That's a big difference, even accounting for inflation.

The 1080 Ti was $699, an already-unfathomable price back then that cost more than a whole console setup. Today? The 4080 is $1200. That's often more than the rest of the system put together.
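
For reference, the "$379 (or $480 in 2023 dollars)" kind of adjustment above is just a CPI ratio. A rough sketch, where the cumulative CPI factors are approximate assumptions (roughly +27% from 2016 to 2023 for US CPI), not exact figures:

    # Rough inflation adjustment: launch price * (CPI_2023 / CPI_launch_year).
    # The factors below are approximate cumulative US CPI ratios, used only
    # for illustration; launch prices are the ones quoted above.
    launch_prices = {"GTX 1070 (2016)": 379, "RTX 2070 (2018)": 499,
                     "RTX 3070 (2020)": 499, "RTX 4070 (2023)": 599}
    cpi_factor_to_2023 = {"GTX 1070 (2016)": 1.27, "RTX 2070 (2018)": 1.21,
                          "RTX 3070 (2020)": 1.18, "RTX 4070 (2023)": 1.00}
    for card, price in launch_prices.items():
        # Approximate launch price expressed in 2023 dollars
        print(card, round(price * cpi_factor_to_2023[card]))

Even after that adjustment, the jump from ~$480-600 to the 4070's $599 and the 4080's $1200 is real, which is the point being made.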


It’s because CPUs tend to be fundamentally limited in the ways they can efficiently utilize transistors to scale performance, by things like cache latency, reorder depth, branch prediction, etc. While gpus have always been god’s strongest soldier for putting transistors on silicon scalably. They were the perfect machine for a world where transistors per dollar doubled every 18 months.

On the other hand now that it’s stopped, so has raw performance scaling. There was a few generations of cleanup, but you can’t squeeze blood from a stone forever. Maxwell arguably cut too far and pascal had to start adding functionality back. Gpus are like e-cores, large powerful cores mean you get fewer of them and that often works out to a lower PPA. There aren’t many opportunities for cool tricks and the model doesn’t favor using lots of area. The coding is written for extreme parallelism already, so, that’s not a problem, it’s inherent to the platform.

Transistor per $ growth hasn’t completely stopped but it’s certainly nowhere near what it was 10 years ago, even if you factor in things like packaging/stacking the total wafer area still runs up a big bill that offset most of the density gains. And that’s what we’ve been seeing over the last 10 years in gpus too. 4070 is about the same area and cutdown as GTX 1070 but wafers cost like 8x as much.

What you have to do in this operating regime is find ways to get more performance per transistor - and that’s exactly what Jensen made a big bet on 5 years ago with dlss. 7% more transistors that with DLSS 2.5 give 30% speedup at native quality and 50-70% speedup at iso-quality with FSR2 Quality mode.

That’s what the future looks like - rewriting your applications to take advantage of new accelerators that provide large speedups. And the rewriting is very minimal for any application that uses TAAU already. It sucks but if cost/transistor is not going to come down you have to get more out of the transistors you have. Work smarter not harder.


So if they're running into hardware fab limits, how does running deep learning on that limited hardware equate to a doubling of price? I don't quite follow the logic there?

Yeah the upscaling is nice, but I dunno about $1200 nice...

It was cool when they instead focused on things like creating mobile smaller versions of the cards with laptop level power draws, or better cooling systems that weren't as noisy etc. While staying within reason price points. Just the mundane stuff, not necessarily real time raytracing (gimmicky and not very noticeable IMO, even as someone who uses 4080s on Geforce Now). Something like the Steam Deck package with its integrated console-like APU is much more consumer friendly, but also much much weaker than a 4090 I guess. Different priorities that Nvidia might've reconsidered if not for the AI supersampling stuff also bleeding into crypto and ML, justifying their bet.

Edit: Yeah their AI profits are skyrocketing, while gaming is a has-been. Good bet for the company, bad omens for gamers :/

https://www.pcgamer.com/nvidias-record-breaking-profits-are-...


> So if they're running into hardware fab limits, how does running deep learning on that limited hardware equate to a doubling of price? I don't quite follow the logic there? Yeah the upscaling is nice, but I dunno about $1200 nice...

Because moore's law wasn't just about transistor count but about the economic impact of exponential growth in transistors-per-$. In a world without moore's law, using more transistors will result in a higher-cost product. If you want to hold product cost fixed, or even contain the cost spiral, you need to do more with the same amount of transistors - performance-per-transistor is the metric that matters now.

AMD and NVIDIA have already stripped down their pure-raster implementation as far as they can go, with RDNA1 and Maxwell respectively. Maxwell actually cut too far (software scheduling, "minimal" DX12 support, etc) honestly. So where do you keep making perf/tr gains after that?

The gaming world has already pretty well settled on TAA (although some people will never accept it) and upscaling is already common in the console world. So, do TAA upscaling better such that you get the performance gains but not the reduction in visual quality that usually comes with it.

Tensor makes up a relatively small amount of die area (5.9% of total Turing die area, based on comparisons between Turing Major/RTX and Turing Minor/GTX SM engine die shots). And that gets you to about 30% faster than FSR2 for a given level of visual output quality. So the perf-per-transistor metric increases. Also, unlike a fixed-function accelerator, it can be used for all kinds of other stuff too. It's basically a whole programmable sub-processor, an accelerator for your accelerator.

https://www.reddit.com/r/hardware/comments/baajes/rtx_adds_1...

Now, why ML as opposed to just running it on shaders? Same logic as adding an AVX unit, math density is a lot higher and it can do a lot of work for applications that are specifically tailored to it. DLSS2 uses a relatively standard TAAU (similar to FSR2) but determines the weighting of the samples using a neural net. This produces a lot higher quality than a procedural algorithm currently can - especially under "bad conditions" like higher degrees of upscaling, low framerate/limited sample count, or temporally unstable/high-temporal-frequency areas of the image.

http://behindthepixels.io/assets/files/DLSS2.0.pdf

https://raw.githubusercontent.com/NVIDIA/DLSS/main/doc/DLSS_...
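
To make the "TAAU with learned sample weights" idea concrete, here is a toy temporal blend in Python. Real DLSS/FSR2 work on reprojected, jittered subpixel samples with far more sophisticated logic, so treat this purely as an illustration of where a learned per-pixel weight would slot in, not as either vendor's actual algorithm.

    import numpy as np

    # Toy temporal accumulation: new_frame = lerp(history, current, w).
    # In an FSR2-style TAAU the per-pixel weight w comes from hand-written
    # heuristics (disocclusion, luminance clamping, ...); the claim above is
    # that DLSS2 predicts that weighting with a neural net instead.
    def temporal_blend(history, current, weight):
        """history/current: HxW arrays (already reprojected/upsampled);
        weight: HxW array in [0,1], 1 = trust the new sample fully."""
        return (1.0 - weight) * history + weight * current

    h, w = 4, 4
    history = np.zeros((h, w))        # accumulated previous frames
    current = np.ones((h, w))         # this frame's jittered samples
    weight = np.full((h, w), 0.1)     # static region: lean on history
    weight[0, 0] = 1.0                # e.g. a disoccluded pixel: use new sample
    print(temporal_blend(history, current, weight))

Getting that weight right is exactly where quality is won or lost: too much history smears motion, too little reintroduces aliasing and noise.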

FSR2 does ok at 4K quality mode, but at 1440p and (especially) 1080p output resolutions and in performance modes it does much worse. FSR2 quality 1080p is more like DLSS2 performance mode or maybe balanced mode, so NVIDIA gets more speedup at a given level of visual quality. And DLAA can produce a better-than-native image when running with a native input quality.

The neural weighting just is a lot more efficient at using its samples, it understands what is going on in the scene (moving edges/occlusion etc) and can extract a higher signal-to-noise ratio from the samples and the textures. It's like an op-amp, the ratio of input:output pixels is the "gain factor", and FSR2 and other traditional TAAU algorithms are simply noisier at any given level of gain, whether that's unity or extreme gain, and have other edge-cases like turn-on threshold (bad performance with low samples). ML is the "schottky diode" of graphics amplification (dangerously mixed metaphor, lol), it's simply a lot more agile at shaping the signal than what came before.

(and while on paper plenty of people have argued that procedural programs should be able to do anything ML can, it's not like AMD and others haven't tried to improve TAAU with FSR2, and many others before them. DLSS2 is better, just like LLMs and Stable Diffusion are a lot better than procedural algorithms in their own niches.)

--

All of this exists completely orthogonally to actual wafer costs or packaging or other things. Packaging may boost that transistors/$ metric a little bit but that just gives you a little more to play with. It does allow you to make chips with twice the transistors at twice the cost and have them yield at high rates, but fundamentally 2x400mm2 is still 800mm2 of silicon even if you yield at 100% - you're using more wafer, which drives up costs. Wafer costs have been increasing nearly as fast as density (and predicted to match/pass at 3nm) but there has been a small gain in tr/$, certainly nowhere near the rate of moore's law days. But if wafers cost 8x what they did for 28nm, and are continuing to increase at ~50% per generation, and you keep using more wafer area to compensate for slowing shrinks, then costs will go up (even more than they have).

There is no direct link between "running deep learning" and costs going up. That is just happening independently and affects AMD too, even when they didn't go in on deep learning. TSMC prices keep going up (as do their margins, even now) and even if they went to zero, the design+validation costs are still going up too. A lot of these costs are driven by hard physics problems and not just TSMC profit margin (although it doesn't help).

--

> It was cool when they instead focused on things like creating mobile smaller versions of the cards with laptop level power draws, or better cooling systems that weren't as noisy etc.

I think a 30% performance boost at native visual quality, without power increase is pretty cool. Don't laptops benefit from having 30% higher perf/w just from turning on a setting? And Ada itself is a ~60% perf/w increase over previous generations too.

Like Ada is one of the most efficiency-focused generations ever, much moreso than Ampere or Turing with their trailing nodes. DLSS just stacks on top of this - and unlike FSR2, NVIDIA doesn't fall apart at 1080p resolutions that laptops tend to be using.

Cost is higher than people want, but on the other hand (a) that's going to be the reality unless there is a breakthrough in transistors-per-$, you can't make a fixed number of transistors infinitely fast, there is some asymptotic limit. And (b) people are cherrypicking favored examples or comparing against trailing-node products that had larger dies on slower, less energy efficient nodes to keep costs down.

GTX 970 at $329 was an outlier and the lowest x70 product of all time, on a trailing node (28nm again after 20nm fell through). GTX 670 launched at $399 for a similarly sized die over 10 years ago. GTX 1070 launched at $449 7 years ago with another similarly-sized (~300mm2) die. Turing, Ampere, and Maxwell were all abnormally cheap and large due to the trailing node, but you paid for this with worse efficiency. There has never been a x70 product launched at $299 and you are welcome to check this!

First x70 product: https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_proces...

And yes I think the consensus is that nodes like 8nm probably are "good enough" especially if they are much cheaper, especially at the low end where PHY size is becoming a problem. PHYs don't shrink, so you can't scale a product arbitrarily small - the logic may shrink by 70% but those PHYs are just as big as ever. So there is a de-facto "minimum die size" that is ever worth producing, because there is a fixed PHY area that you simply cannot eliminate. And in a world where wafer costs are going up, that area costs more and more every generation.

A 3060 Ti 16GB wouldn't even need clamshell (it has 8 PHYs, 8x2GB per module=16GB) and could probably have hit $299 or $329 launch cost, if NVIDIA had gone down that road. And it's Good Enough for 1080p, and avoids some weird compromises that shake out of the need to trim PHY area.

AMD already did exactly this with the 7600 - which is held back on 6nm (N7 family) rather than using N5P (N5 family) like the rest of the RDNA3 lineup. Why? Cost.


For much of tech hardware world, declining costs and increasing performance have been the general trends for as long as most of us have been alive.


Off the top of my head the only thing I can think of that did that was TVs.


As far as I can tell computers are still cheaper than they were in the 90s. And that's in absolute numbers, not even accounting for inflation.


CPUs are cheaper adjusted for inflation before even considering performance. RAM and SSDs are much, much cheaper than they were 5 years ago.


RAM was much cheaper 7 years ago than 5 years ago. There was another DRAM cartel action and prices tripled in less than 6 months back in 2017.


Personal computers have plummeted in price over the decades


That's fine, and gamers can no longer be expected to upgrade every generation or two, like they used to, if the improvements are going to be marginal between generations. The enthusiast market will just shrink and of course that no longer matters to Nvidia as they are playing in bigger leagues now.

GPUs will be bought like TVs. It took the 4060 about 6 years to double the Passmark score over the 1060.


I think both this comment and GP comment are true in their own ways. Nvidia is still pushing the gaming / 3D industry faster than its competitors and I would still recommend an Nvidia card for reliability and performance over others.

BUT that comes at a price - Nvidia consumer chips are notoriously expensive. If you want best-of-breed for gaming, you pay for it.

I am hoping that AMD and Intel will be able to compete with Nvidia someday but I'm not holding my breath.


I actually have current-gen GPUs from all 3 manufacturers through my job, and I'm glad there are choices now, but I'd still recommend Nvidia over AMD or Intel to anyone. Of course it depends on the budget, the games you play, etc., but DLSS alone is such a difference that AMD still couldn't catch up with it. I really hope Starfield will deliver, because it will be the first game with FSR 3.0 and will introduce the technology, yet DLSS 3.5 was just revealed yesterday. It's a huge gamble for sure going all in on Starfield, but tbh that's one of the hypest games of the year, so worth it. And Intel is nowhere near that (apart from the price and getting 16GB for cheap).


Intel has entered the chat. If you wanna game on a budget with lots of VRAM, go for the A750 or A770.


Intel has left the chat.


Intel is very much in the game. Every one of their recent big driver updates adds double-digit performance boosts on AAA titles.


Well, they still make gamer cards. As a company with more than one employee they are able to multitask, and the knock-on benefits of all the investment will improve their gaming products as well. I think there are a fair amount of dual-use cards being sold - I know I’ve got a 4090 that I use for local AI stuff, and it renders RTX Witcher 3 like a beast.


Basically no one can afford a 4080 let alone a 4090. Even 4070 is out of reach for many.


Ok? But the other RTX models are fine - I know I had a 3070 until recently. Yes, there is a continuum of models with progressively more power. That’s good - the high end blazes a trail for the low end. The architecture that supports AI also support gaming. It’s the same overall architecture just different scales and price points. That’s good, it ensures enormous investment at the high end which is scaled down for different price points. If it were not for gaming and crypto and others driving money at nvidia they would have a lot less capital to dedicate towards improving the architecture. The fact they continue to drive forward real time ray tracing and other features is indicative of their continued focus on gaming.


Yes, it's fine to have a large product line with different price points for different consumers, but I don't think it's quite fair to add the last generation's cards in there too (because they won't be around forever). Typically they only keep 2-3 generations alive in the marketplace at any given time, and if the prices keep going up, once the 30xxs disappear, so too will the affordability. Even an entry-level 4060 is $299 these days and doesn't have much RAM.

My fear is that Nvidia, seeing record-high profits for their datacenter cards, will gradually phase out budget/midrange gaming cards in favor of the higher end stuff. Maybe AMD and Intel will step up their game to tackle that segment, who knows. Or maybe Apple Silicon gaming will gradually take off. But if neither of those happen, there might just be a huge void left in the "affordable PC gaming" segment (which used to be most of it, not so long ago).

On the other hand... I do have to give them credit and say that GeForce Now is AMAZING -- limited library aside. If they can keep growing that segment, hell, maybe the future of PC gaming is just in the cloud, like everything else, and nobody would want to buy expensive GPUs for home use that just become obsolete in a couple years anyway.


Perhaps - but as long as there are profitable SKUs I don’t see that happening. The lower and mid range build scale for them. If they need more scale for the high end they can simply produce fewer, but by making as many as they can at every price point they improve economics. Also, often there is a probability distribution of component quality, and by having a low end to absorb the excess that is below high-end grade they can improve margins and efficiency.


You mean things like binning?

But aren't most hardware producers these days fab-limited, with Apple, Samsung, Nvidia, etc. all competing for the same few foundries, especially at the smaller processes?

If they only have X amount of production available a year, it seems like they'd want to focus those on the super-high-margin high-end datacenter cards rather than the leftover low-end stuff. Or maybe the high/low stuff are manufactured on different pipelines altogether, that don't suffer from the same production limits? I'm not sure how that works...


When was that? Surely it must have been at least a decade before the GTX 970 "4GB" but maybe after all the driver cheating in the late 90s and early 2000s.

I no longer buy nvidia hardware but I do enjoy stock price getting higher. I just wish I had the sense to buy more, a lot more, stock when it was much cheaper. How does a chicken shit like me make big money :(


The tantrum over 970 has gotten even sillier since Microsoft used exactly this same bus structure (with fast/slow segments) on the Xbox series X and S.

That’s a product designed and manufactured by AMD, right?


It would be consistent to complain about both, although games have to be specifically ported to Xbox so they can work around hardware quirks. And I would give the blame to MS since they control the high-level design.


The 970 was amazing for gaming; the 3.5GB problem was just for CUDA.


I literally just replaced my GTX 970 a week ago. Fantastic GPU, lasted me a good ten years. Baldur's Gate 3 finally got me to upgrade, which it could run, but I dunno, a new game got me excited. It ran Elden Ring just fine last year though.


10-year GPUs will be the norm from here onwards.


It's a tough game. Gotta have the guts to get in and stay in


On the flip side, I often wonder if the current AI revolution would have stalled if not for the mature and abundant supply of high powered graphics cards that just happen to also be great for ML.

The gamers paved the way!


I've seen a regular stream of reports on HN about people "sort of" getting AI done on laptops and non and lowly GPU machines. Is it unreasonable or far-fetched to imagine that someone figures out how to efficiently get it all done without GPUs and pull the rug out from under Nvidia?


Just like existence of MariaDB does not prevent Snowflake from being worth $50B, just being good enough on laptop is not enough to replace the need for the cutting edge.


If this happens we will just get more things done with the same amount of compute (see: Blinn's law). The demand for GPUs does not really come from algorithmic compute requirements but from social expectation of progress in the field of AI. People will use all the compute they can get doing research using the budget they are given. What matters is how this budget is set.


Training is very expensive and requires GPUs. What you read about is running trained model on consumer devices (even phones!).


Let's assume for a moment you could sort of trade parallel computation for vast space, fast search and retrieval. So in this hypothetical computational theory, you could build a lookup machine from CPUs and SSDs... squeezing the parallel cores into one CPU, by squeezing the shaders' work into a million hashes... And before you know it you're simulating a microverse, trying desperately to find out how to avoid climate change. What if God hates recursion?


The current giants have shown that "it" can be done. From now on we can reasonably hope for massive progress in efficiency at the low end - as well as massive capability improvements at the high end. That goes both ways, probably.


I have an options strategy that is riding on this possibility right now.

All you have to do is take 5 seconds in a typical code base to determine that the way we write software today isn't exactly... ideal. Given another 6-12 months, I cannot comprehend another ~OOM not being extracted somewhere simply by making the software better.


Their GPUs have been very performant for my research (DNN training). However, their VRAM could be much larger. In my mind CPU RAM is very cheap (for 32GB, it is currently around $65). But the GPUs I use for DNNs on HPC always have less than 32GB, and they are very pricey. Does anyone know why they don't increase their VRAM capacity so I can test models that require more VRAM? Is VRAM considerably more expensive to attach to a GPU than RAM is to a CPU?


Folks on Reddit's /r/pcmasterrace have been discussing this for years - the consensus seems to be that Nvidia could add more VRAM to its GPUs without too much additional cost - but they don't want to, in order to push higher-spending businesses and consumers to buy their more expensive chips to get more VRAM at exponentially higher cost.


Yeah their high VRAM cards (80gb h100) cost $40k.

That's their price differentiator. It's the reason for their profits. nVidia is now <20% gaming in terms of revenue, and that gaming revenue has much tighter margins than the 80% datacenter/professional market revenue, which is almost all profit since 80GB of RAM doesn't cost $40k more.

So yes it'd be nice if nVidia made it so there was no reason for their $40k AI cards to exist. But they aren't going to do that.


I hope Intel, as the underdog, can do mid-tier cards, but with lot more VRAM. I can see myself using multiple LLM models at once, not necessarily running at once, but interleaved.


Agreed. I think people would accept the <50% perf as long as there's plenty of RAM to play in. I think that's the biggest risk to Nvidia right now.

No one's catching up to the topline perf or even power efficiency (Intel actually suck at perf/power right now) but they might be able to compete with raw GPU costs at least. Give us a $2k card with 256gb ram and we'll probably take it over the $40k 80gb cards despite the power efficiency and topline perf that the $40k card has.


You’re looking at it as if there’s some sort of technical bottleneck. Is not, it’s business. VRAM capacity is how they segregate data center/AI workloads versus gaming.

It will be this way until they get a competitor.


Yes, it is quite expensive. The issue is that memory bandwidth is a lot more constrained when you're routing to hundreds or thousands of cores instead of the handful you need to support on a CPU.
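
A rough way to see the coupling: on a GPU, both capacity and bandwidth hang off the same memory controllers (PHYs), so adding capacity usually means a wider bus or denser/clamshelled modules rather than just dropping in more DIMMs as on a CPU. A back-of-the-envelope sketch with illustrative GDDR6-class numbers, not any specific card's spec:

    # Back-of-the-envelope GPU memory sketch. Numbers are illustrative
    # GDDR6-class figures, not any specific product's specification.
    def gpu_memory(bus_width_bits, gbps_per_pin, gb_per_32bit_module, clamshell=False):
        channels = bus_width_bits // 32                    # one module per 32-bit PHY
        capacity_gb = channels * gb_per_32bit_module * (2 if clamshell else 1)
        bandwidth_gbs = bus_width_bits * gbps_per_pin / 8  # GB/s
        return capacity_gb, bandwidth_gbs

    # 256-bit bus, 16 Gbps pins, 2 GB modules -> (16, 512.0): 16 GB at ~512 GB/s
    print(gpu_memory(256, 16, 2))
    # Doubling capacity without widening the bus needs denser or clamshelled
    # modules; doubling bandwidth needs a wider bus (more PHY area) or faster pins.
    print(gpu_memory(256, 16, 2, clamshell=True))          # (32, 512.0)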


So the higher cost from higher VRAM comes from the implementation of higher bandwidth to accommodate more data flow because there are more cores versus CPU?


A tangent to this I've been thinking about quite a bit is how big a moat drivers are to the software/hardware ecosystem.

They're a major moat/hurdle (depending on your perspective) for operating systems, new hardware platforms, graphics cards, custom chips, and more.

It's interesting to think that we're not _that_ far from being able to generate decent drivers for things on the fly with the latest code gen advancements. Relevant to this, that could reduce the monopolies here, but perhaps as interesting is we can have more new complete OSes with more resources allocated to the user experience vs hardware compatibility.


A blockbuster quarter for sure with eps up 854%.


I’ve been a longtime Nvidia investor (since ~2014 or so) and have no plans on selling anytime soon. Their market dominance will be tough to unseat, and the only other company making strides in silicon is Apple.


Isn't Google the main competitor in AI training? Maybe Tenstorrent, Rivos & others too.


when everyone is digging for gold you want to be the one selling shovels


selling shovels for a few different gold rushes seems to be profitable


Is there any scuttlebutt on AMD investing in making their GPUs/ROCm work as seamlessly with the standard ML tooling as Nvidia?

It seems like it would be "easier" now that the path has been cleared and all they have to do is reach parity.


Good for the shareholders and for all the people who worked and devoted themselves to researching and solving all of these new, challenging problems.

We also need more Nvidias. More competition in the chipset industry.


I thought Nvidia just made graphics cards for games. How did they end up making general-purpose CPUs, like Intel, that are actually more powerful than Intel's?


Their revenues are seriously supply restricted. ~2x revenue if chip manufacturing could keep up with demand. Packaging seems to be the bottleneck just now.


Did Nvidia get lucky even though Intel was bigger at the time, or could it have been predicted that AI would work better on GPUs?


"The more you buy, the more you save." - Jenson Huang


It's such a shame that these big companies are allowed to make so much profit, 'the shareholder is the customer' only furthers disproportionate wealth distribution.


2024?


Seems like the shovel seller is on top of this AI thing?


Why, and what does it mean, for Nvidia to announce fiscal results a year ahead of time?

Is it just promise to sell chips in advance, so that's how far it's booked, do they own a Time Machine...?


Financial years are named by the calendar year that they end in, so FY24 is the financial year ending in 2024.


I have never seen it referred to as financial year until now, but I guess it makes sense too. Fiscal year is the typically used term.


Looks like it depends on where you are in the world. "Financial year" appears to be the preferred phrase over in the UK.


Financial year is the common term in other locales (Australia for example)


They announced Q2 results, ending July 31. Their fiscal year is a little unusual, it ends at end of Jan. So their 2024 year ends Jan 31 2024.
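
A tiny sketch of that mapping (fiscal year ends Jan 31 and is named for the calendar year it ends in, with quarters counted from February), just to make the naming concrete. It ignores the exact week-based fiscal calendar, so treat it as approximate:

    from datetime import date

    # Approximate mapping of a calendar date to an Nvidia-style fiscal
    # year/quarter: the fiscal year ends Jan 31 and takes the name of the
    # calendar year it ends in; quarters are counted from February.
    def fiscal_year_quarter(d: date) -> str:
        fy = d.year + 1 if d.month >= 2 else d.year
        q = (d.month - 2) % 12 // 3 + 1
        return f"Q{q} FY{fy}"

    print(fiscal_year_quarter(date(2023, 7, 31)))  # Q2 FY2024 - this report
    print(fiscal_year_quarter(date(2024, 1, 31)))  # Q4 FY2024 - fiscal year end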


Every company I know of estimates future revenue.

It's not black magic, they have contracts in place and know both how many GPUs will be produced and sold give or take few %s.


Nvidia's undervalued.

Once enterprise adoption of AI picks up, demand for chips will increase 2-3 times further.

I'm told Nvidia's building their own fab in Southeast Asia over the next few years. This will massively boost their output.


> will increase 2-3 times further.

That and possibly way more than that is already priced in. Nvidia's stock is extremely expensive not because of they are making now (which is not a lot relative to valuation, they just barely surpassed Intel this quarter in revenue) but because investors expect pretty much exponential growth over the next few years..


It remains debatable whether mass enterprise adoption of AI would happen first, or Nvidia's competitors coming up with equivalent chips would happen first.


It's hard to imagine Nvidia will maintain what is right now effectively 100% market share for training forever, especially given the $ being thrown around.


On the surface, it's not debatable. Enterprises are going full steam ahead on AI. Building out an ecosystem to challenge Nvidia seems like a decade long battle, if it's even possible.


What is full steam ahead for enterprises? It's not like they're throwing autoregressive LLMs into production any time soon.

In any case Nvidia is expecting to ship ~550k H100s in 2023, hardly enough to satisfy every user.

Tesla decided to in-house. TPUv4 and Gaudi2 exceeded A100 performance, they just never hit scale or the market and then Hopper added optimization for transformers rendering these chips relatively obsolete.

Nvidia's lead is not unassailable and it seems incredibly unlikely that they would not face serious competition within the next 2-3 years given the $ being thrown around.


Large enterprises are already putting them into production. I have direct experience with it.

It's not unassailable. But it's going to take a lot to make any difference to Nvidia's volume or pricing, let alone a meaningful difference. They already face serious competitors in google and aws with TPU and inferentia, but those competitors are at a pretty big disadvantage for now (and others too). The cuda ecosystem is a big advantage. Nvidia has a lot of leverage with semi manufacturers because of volume. They spend way more on chip R&D than their competitors in the space. They have brand recognition. You can buy and own Nvidia chips v tpu and inferentia. It's... a tough road ahead for competitors.


> Large enterprises are already putting them into production. I have direct experience with it.

For what function have you experienced LLMs being used at significant scale in production right now? It seems unlikely most enterprises have built up sufficient technical know-how to run these workloads in-house already.

> They already face serious competitors in google and aws with TPU and inferentia

TPUv4s aren't widely available and are not competitive with Hopper. Inferentia isn't for training and is also not competitive on throughput.

> The cuda ecosystem is a big advantage.

Is it really though? A year ago it wasn't the case (see TPUs). Nvidia hasn't maintained a stranglehold on the software stack.

> Nvidia has a lot of leverage with semi manufacturers because of volume.

With TSMC*. If we believe Pat Gelsinger, Intel is starting to catch-up and their roadmap would place them ahead of TSMC in H2 2024.

> They have brand recognition. You can buy and own Nvidia chips v tpu and inferentia.

Intel, AMD and Google don't have brand recognition? There isn't a competitive alternative today for sale or rent, that doesn't mean this can't/won't change. It also doesn't mean they won't become available for enterprise purchase when we're talking about 60B/year in revenue.

> It's... a tough road ahead for competitors.

While hard I disagree that their moat is as wide as being touted. At the end of the day, whoever gets the most FLOPS, can deliver product faster and/or is cheaper will win as long as it's trivial to migrate workloads.

I really don't believe anyone actually cares who the manufacturer is.


Customer support. I've never seen so many people get up to speed so fast on something. It's unlike anything ever.

Yep.

I believe so.

I don't believe Pat.

With the relevant audience, Nvidia is the brand which is the strongest. Google could just drop support for TPU, AMD isn't viewed as currently credible for people doing the work, nor Intel.

You may be right, only time will tell. I've been involved with GPU use for general purpose workloads since Cuda was launched. At every step of the way there was apparently credible competition at different layers of the stack just around the corner. That's over 15 years. OpenCL, ASICs, FPGA, Intel this that and the other, AMD this that and the other. TPU. Others I've forgotten.


> Customer support. I've never seen so many people get up to speed so fast on something. It's unlike anything ever.

Are you saying that you've seen enterprises developing, training and running their own in-house LLMs from scratch directly on large (i.e. 100s to 1000s) GPU clusters, whether on-prem or cloud, for this to be relevant to Nvidia?

Pardon my skepticism but it seems odd that a generic Fortune 500 co has the in-house talent and will-power to manage large distributed training runs when much easier and cheaper alternatives like OpenAI/open-source models or one of the Google/MS/AWS MLaaS options are available.

> With the relevant audience, Nvidia is the brand which is the strongest. Google could just drop support for TPU, AMD isn't viewed as currently credible for people doing the work, nor Intel.

I think we're confusing some things here. Right now, there is no good alternative for anyone requiring H100s for loyalty to even matter, this could very easily change with the next generation of accelerator chips.

Intel had the strongest CPU "brand" for a while and enterprises/datacenters readily switched to AMD when it became the better option.

> You may be right, only time will tell. I've been involved with GPU use for general purpose workloads since Cuda was launched. At every step of the way there was apparently credible competition at different layers of the stack just around the corner. That's over 15 years. OpenCL, ASICs, FPGA, Intel this that and the other, AMD this that and the other. TPU. Others I've forgotten.

The TAM, and profit margin, for enterprise-grade GPUs (or accelerators) is several orders of magnitude larger than it has ever been including the crypto craze.


You seem to be moving the goalposts. First it was using in production, now it's building foundation models.. But, to answer: Not foundation models (although, Bloomberg did that for another use case). Fine tuned versions of falcon, now llama 2, open assistant and others. They are running both the fine tuning (not that it's much) and inference on A100s, inferentia and TPU. The skillset required to fine tune a model for a customer support use case is night and day from building a (good) foundation model.

Right now it doesn't matter, but you are suggesting there will be real competition soon. I'm saying, it will matter.

Intel did have a great brand. And they proceeded to screw up and it caught up with them, but it took a long time. A company with a moat can destroy it with time and effort.

It is huge market. And it's a huge lift to build a product that can compete with Nvidia at this point. I'd handicap it at 2-1 against someone being a real competitor within a decade.


> You seem to be moving the goalposts. First it was using in production, now it's building foundation models..

I'm not moving the goal posts but maybe I wasn't clear. When we talk about corporations adopting fine-tuning and inference that's not especially relevant to H100 sales which is the main cash-cow (~10B of revenue at 80-90% margins) and Nvidia's massive market cap growth. What is relevant is corporations like Inflection AI building 22k H100 clusters.

I work in academia so PyTorch is more common but are people in industry fine-tuning LLMs actually working directly with CUDA that much for this to be a big moat?


Companies are using both A100 and H100 for inference. The datacenter numbers include both, and I don't believe they break it down further. And no, by and large, large enterprises are not building out large clusters - but they are much of the demand for all those cloud provider build outs.

No, no one is using CUDA directly. But if you are a vendor with a new equivalent, it's no small feat integrating it with a framework like PyTorch. There is a lot of work required by various parties.


It would be interesting to see what the breakdown is on H100/A100 users. I would expect most inference users are similar to my lab and max out at a DGX node rather than being the bulk of users.

PyTorch has gotten a lot better on TPUs this year, I don’t believe there’s much of a performance hit now. Jax and TF (I don’t use the latter anymore) of course work. I never used Gaudi2 but it apparently works.

All of this to say, is it possible that Intel gets their fabs working and strategically partners with their long-time partner MS and OpenAI to extend Triton to Gaudi3 or 4 and be a threat to Nvidia within 2-3 years? Absolutely.

Is it similarly possible that Google increases development on Jax and TPUv5? Sure.

Neither of these possibilities, regardless if you think they’re probable or improbable, would need a decade to catch up to Nvidia.


You may well be right. It's going to be an interesting few years.


There’s no competitor to Nvidia for the next 10 years.

They’ve got a monopoly. And with AI’s coming explosion, I’d wager 50/50 odds Jensen becomes the world’s first trillionaire.


this is a totally insane statement to make. 10 years? it will take at _most_ 2-3 years for the software and hardware ecosystem to rearrange around a competitive landscape. The only reason nvidia has a large time window of opportunity is because of the lead time required to create such a complex ecosystem and the "suddenness" of how meaningful particular category of workloads are (which nvidia happened to be good at, because... pixels)


> Once enterprise adoption of AI picks up, demand for chips will increase 2-3 times further.

Possibly their greatest asset, as an investment, is their crazy high margins. Nvidia in 2023 is where Intel was in 2007, where they could basically charge almost any price because they were so dominant in the market. I remember when E5s were selling for $2000 a pop and data centers were using thousands of them.



