Ask HN: Is Apple's R1 a Discrete GPU?

CharlesW · 2024-01-22T20:39:20.000000Z

The R1 is an SPU (Sensor Processing Unit) which works in concert with the M2's integrated GPU (the R1 being the innie, and the M2 being the outie). It is doing an enormous amount of image signal processing, which seems to explain the 256GB/s memory bandwidth.

https://techcrunch.com/2023/06/05/apple-r1-chip-apple-vision...: "The specialized chip was designed specifically for the challenging task of real-time sensor processing, taking the input from 12 cameras, five sensors (including a lidar sensor!) and six microphones. The company claims it can process the sensor data within 12 milliseconds — eight times faster than the blink of an eye — and says this will dramatically reduce the motion sickness plaguing many other AR/VR systems."

giantrobot · 2024-01-22T23:44:12.000000Z

I think this is the best answer. The R1 is directly connected to the device's sensors and can write out state to the shared main memory. Processing the sensor state in 12ms gives 4ms to consume the world state and draw a video frame and hit a solid 60fps.

Any lag from the position of your head/body and the eye display is going to mess with your proprioception. The worse that lag the more likely you are to get motion sickness.

jazzyjackson · 2024-01-23T09:30:42.000000Z

60hz is great while looking at a screen but may not be enough for keeping a good illusion going while you move around space. hololens has a similar setup that renders triangles at whatever rate it can but updates the position on the display at 240hz (that is, even if the next frame of an animation is lagging, its position in space will be adjusted as you move)

imo the illusion was rock solid, extremely challenging given that the display was transparent so you had to keep up with the real backdrop moving - vision pro and all passthrough devices get to fake it but at the cost of proprioception as you said

naikrovek · 2024-01-23T12:51:37.000000Z

The HoloLens was amazing and underrated among the wider internet. Like a lot of Microsoft stuff, it's regarded as a joke until Apple copies it, then suddenly everyone takes it seriously.

iandanforth · 2024-01-23T13:35:56.000000Z

Nope, still a joke. Like many MS products individual pieces of tech may be incredible but the overall experience is severely lacking. So while you might say "this one aspect of the display technology is incredible!" everyone who's actually used it will reply, "yeah and if it didn't feel like looking through a postage stamp then it would have been great!"

naikrovek · 2024-01-27T06:48:26.000000Z

> everyone who's actually used it

I have four of them at work. I have thousands of hours on them. They’re amazing.

halJordan · 2024-01-23T01:36:29.000000Z

Apple's claim is "photon to photon" for the 12ms timeline, iirc. Which jives better with the actual frame rate of the headset.

alpaca128 · 2024-01-23T11:20:35.000000Z

Those 12ms affect the latency, not the framerate. The thing will definitely not render at just 60Hz as that's too low for VR, the standard is usually 90Hz or 120Hz.

nindalf · 2024-01-23T13:23:56.000000Z

If you divide 1 second by 60, you get 16ms. So to hit 60 Hz, you need to complete all processing and draw the frame within 16ms. For 120Hz, like you're claiming, all processing needs to be completed in half the time, or 8ms. And yet, Apple says the R1 takes 12 ms to process input from the sensors? You can draw your own conclusions.

alpaca128 · 2024-01-23T20:21:08.000000Z

You forget that the processing doesn't have to finish within the same frame. Latency is not throughput.

Not even the most expensive high-end gaming setups can finish the entire input-to-screen processing within just one frame, and yet they can easily render some games at 500Hz or more.

jffry · 2024-01-23T13:56:24.000000Z

Nothing about end-to-end latency of the R1 tells you anything about how pipelined it might be. It very well could have multiple frames in-flight at the same time in different stages of processing.

giantrobot · 2024-01-24T20:30:31.000000Z

To provide a comfortable experience the frame pipeline can't be very deep. The older the frame state compared to the wearer's current proprioception the more likely they are to experience discomfort or outright motion sickness.

That's why I assume the R1 is trying to provide the render pipeline with "current" positional state and then tries to finish the drawing in the remaining 4ms (for 60fps) then the display is only going to lag the wearer's perception by 16ms which is less likely to cause discomfort.

This could be mitigated more if the objects in the VR scene are tagged with motion vectors. If the R1 state update doesn't land in time the renderer can interpolate the "current" position by applying those motion vectors to objects in the scene.

toasterlovin · 2024-01-23T04:19:40.000000Z

> and can write out state to the shared main memory

I think it would actually make more sense for the M1 to treat the R1 as a display that it writes final composited frames to, then the R1 integrates the output from the M1 into the rest of the scene it’s rendering from the other sensors. IE, the output of the M1 is essentially another camera input to the R1 (well, camera plus multi-channel audio).

KaiserPro · 2024-01-23T13:46:03.000000Z

Just to back this point up, the single biggest challenge to headset makers is power management.

Every time you move a byte one stage away from the sensor you (general) incur an order of magnitude of power cost. There are many many caveats, and its not a hard and fast rule, but its a useful illustration. Don't get up about precise numbers, just understand that moving data between devices, even inside a processor has a power cost.

e.g. reading out a pixel from a camera costs 1, moving it to a UART/interface costs 10, reading it into a register costs 100, l1 1000 etc etc etc.

The closer you can do the processing to the sensor the greater the (potential) power saving, and in theory the lower the latency.

For example the eye cameras you only really want the direction the eyes are looking. So you don't want to ship the entire image at 120fps to a processor, do some maths to then get that vector. Ideally you want the sensor to do it for you and just ship four floats every 1/120th of a second.

I'm also not sure where the main processor is for the goggles, so data compression and coordination of the sensors also becomes critical. The images sensors almost certainly don't run at a high enough framerate to be "magical" so you need IMU to generate a fudge factor. They need to run at a known sample rate, one that doesn't deviate. Using the CPU to do that is bad, because that'll require a lot of interrupts, leaving little room for other stuff that's important (like other sensors)

now, I've hinted that the "SPU" does processing, which is almost certainly does. This means that it might actually look like a GPU after all, as its doing a lot of calculations that are very "graphics-y" All that computer vision: SLAM for head orientation, HAnd tracking for, well, hands, Image stitching/warping and correction to make the pass through work, plus the audio shit, it all very computer vision. So its highly likely that it looks a bit like a GPU, because its doing similar tasks

kylixz · 2024-01-25T07:50:01.000000Z

You seem knowledgeable about this stuff — do you work in this space? I’m trying to do some feature extraction from multiple global shutters right now on limited hardware and its a learning experience!

raverbashing · 2024-01-23T11:50:15.000000Z

You can probably "do miracles" if you optimize for fp8/fp16

baq · 2024-01-23T07:37:25.000000Z

12 ms is 83 Hz, one of them sounds better than the other on the marketing copy I guess

jhgg · 2024-01-23T09:11:17.000000Z

Saying that the input latency is 12ms does not mean that it can only update once every 12ms. It just says that there's a 12ms delay from what happens in the real world to what your eyes see through the device.

mort96 · 2024-01-23T12:50:37.000000Z

That's not how this works. Nothing in the vision pro is "83Hz".

nicolas_17 · 2024-01-22T22:21:50.000000Z

Maybe instead of speculating based on Apple's marketing and based on articles from other people speculating, we can make some guesses from actual data.

Here's an obviously-incomplete list of source files part of the R1 firmware, probably referenced from asserts or other logging messages, thus present as strings in the firmware binary:

https://transfer.archivete.am/inline/Ydfxb/bora.txt

It seems it's handling data from cameras (CImageSensor*), LIDAR (SensorMgr/Tof = time of flight?), and display (DCP). I also see mentions of accel, gyro, bmi284 (IMU from Bosch?).

phire · 2024-01-23T12:37:00.000000Z

Interesting.

The H13 codename suggests it's actually M1 derived.

And no sign of the GPU, which goes by the name AGX, seems to be more IPU than anything else.

rmorey · 2024-01-22T19:50:48.000000Z

I'm very interested in it as well, I think we can mostly just speculate. I don't think it's a GPU, I think it's an independent SOC running some sort of real-time OS (https://en.wikipedia.org/wiki/Real-time_operating_system) I suppose it would necessarily have GPU cores in it to do image processing, but I don't think it itself is just a GPU, or gives any indication about discrete GPU's coming to the Mac. I think this is a very Vision-specific piece of silicon.

Edit: many are saying that is just an Image Signal Processor (https://en.wikipedia.org/wiki/Image_processor). I don't think that's quite the case because 1) The M-series chips are already known to have ISP's packaged into them. and 2) My understanding is that the R1's job is to provide continuity of passthrough even in the event of a kernel panic by the M-series chip. To my thinking, this means the R1 chip must have a level of independence beyond that of a traditional coprocessor. I think it is an entire SOC.

nicolas_17 · 2024-02-03T18:27:15.000000Z

Here we go, kernel panic in the M-series chip: https://twitter.com/0xjprx/status/1753575170101461266

toasterlovin · 2024-01-22T20:40:51.000000Z

Alright, here's an idea for you: Vision Pro is shipping with the previous generation CPU. What if it was meant to ship earlier, but Apple's discrete GPU efforts got delayed? That could also explain why the M2 Mac Pro shipped without a discrete GPU, which is a glaring omission for that product.

rmorey · 2024-01-22T21:16:47.000000Z

I don't think you should hold your breath for discrete GPUs coming back any time soon: https://www.owc.com/blog/why-the-m2-ultra-mac-pro-doesnt-sup...

toasterlovin · 2024-01-23T04:52:27.000000Z

I know. But on the other hand:

1) There are plenty of pro users they can’t serve without discrete GPUs, since they’ll never have the power or transistor budget in an integrated offering that would allow them to compete with AMD and Nvidia.

2) Why do a Mac Pro that doesn’t support discrete GPUs when the Mac Studio exists?

3) They’re doing hardware stuff in the M3 GPU that indicates a fairly serious GPU effort.

4) If they’re willing to put this much effort into a co-processor for a low-volume experimental product, putting similar effort into a co-processor (discrete GPU) for high volume sure-thing products (Pro laptops and desktops) seems at possible.

aurareturn · 2024-01-23T08:55:00.000000Z

Put a discrete GPU and most of the magic of unified memory goes out the window. The M3 Ultra should be able to use up to 256GB of high bandwidth RAM. That blows away any discrete GPU. Yes, it doesn't have the processing power of a dedicated professional Nvidia card but I don't think Apple is trying to compete there. Apple is trying to fill the high VRAM niche. They might have a real advantage there considering LLMs require a ton of memory.

If Apple ever makes an "M3 Extreme" by gluing 4x Max dies together, they could have up to 512GB of VRAM. You'd need to 7x H100 GPUs to match total RAM size which will cost you $210k.

rmorey · 2024-01-23T18:51:55.000000Z

I heard a really sad report (via anonymous apple employee on the podcast ATP.fm, I'll try to find the episode) that they have totally canned the "Extreme" i.e 4xMax project entirely, for at least the next several revision cycles.

rmorey · 2024-01-23T18:48:49.000000Z

You're arguing that they could and should bring back support for discrete GPU's, which I don't disagree with at all. What I'm saying is, I just don't think they are going to, in the near future.

bhouston · 2024-01-22T19:52:53.000000Z

I don't think it is a GPU, rather it is probably very similar to a DSP/Neural Network chip. Lots of matrix multiplication capabilities and streaming data.

https://en.wikipedia.org/wiki/Digital_signal_processor

(Of which https://en.wikipedia.org/wiki/Image_processor are a subset of, which someone else in this forum mentions.)

DSPs are similar to GPUs in many respects but they are also similar to CPUs in many respects, but it doesn't make them the same.

toasterlovin · 2024-01-22T20:13:42.000000Z

A few points:

1. GPUs are good at matrix multiplication!

2. GPUs are programmable, so you can do software updates, which seems like it would be useful if they're also using the R1 to do object recognition and other machine vision tasks.

3. GPUs would be useful in other parts of their product lineup.

snom380 · 2024-01-22T21:39:07.000000Z

Lots of processor types are programmable, not just GPUs. Apple is well known for making purpose built chips for their hardware. Main point is that they reuse the cores across different chips. A chip with 12 camera interfaces doesn’t sound like a general purpose GPU to me.

toasterlovin · 2024-01-23T05:21:44.000000Z

Just did some reading on image signal processors and then noticed that Apple calls out that the A series chips do ISP on-die rather than via a dedicated co-processor, so it seems like you’re right and a significant part of this chip is just to be a DSP of raw sensor input.

I guess the unknown is what exactly happens on this chip vs the M2. Clearly some computer vision is being done to do object/people recognition and to separate out objects in order to place the content in 3D space. And then there’s stuff like deciding when to allow the external environment to “break through” immersive content. And of course placing the content in the scene.

It seems like those would make the most sense on the R1, since it avoids contention with “userland” processes. But I don’t know if those tasks are more GPU tasks or DSP tasks. If more GPU, then I guess this thing is a SoC with a lot of die space dedicated to DSP.

_kulang · 2024-01-22T23:31:57.000000Z

I thought it was fairly clear from Apple’s comments that it is essentially single function hardware that does all of the processing for AR/VR. I mean, they say that it is all done inside 12ms. There’s no way that it is just a GPGPU. Modern GPGPUs, particularly the NVIDIA stack, have a bunch of single function units internally which do things like optical flow on the image buffers etc. but similarity sort of stops there

You can do a lot with not much if it is all specialised hardware. Some of the wider features of the chip are due to the huge data bandwidth. But general purpose processing is a little slow for this task, certainly at this power envelope

cududa · 2024-01-22T19:52:04.000000Z

“ Apple shows it as being roughly the same size as the M2 processor in the Vision Pro marketing[3], indicating it's a peer to the M2”

Errr, that’s not how this works. The size of them being similar means absolutely nothing.

The R1 is a coprocessor, I mean it’s basically a big signal processor with a decent amount of RAM to handle all the camera inputs, and does some GPU like stuff, but isn’t a GPU

toasterlovin · 2024-01-22T20:26:31.000000Z

The R1 has a lot of memory bandwidth. According to this article at Tom's Hardware[0], the M2 has 68 GB/s of memory bandwidth. So the R1 has several times that at 256GB/s. Coincidentally (or not), that is similar to the discrete Nvidia and AMD GPUs they compare the M2 against in that article.

I know discrete GPU comparable memory bandwidth doesn't necessarily mean it's discrete GPU or a peer to the M2, but there's clearly a lot of something going on. Plus isn't it kinda weird to specifically mention a spec that is applicable to programmable devices like CPUs or GPUs on a signal processor? Like, who cares?

0: https://www.tomshardware.com/news/apple-m2-gpu-analysis

snom380 · 2024-01-22T21:30:08.000000Z

Apple likes mentioning whatever is a competitive advantage. M2 has 100GB/s, M2 Pro 200GB/s. They put in the bandwidth they think is needed for the purpose. Likewise with the 12 camera interfaces. It’s a safe bet that there’s some sort of GPU on board but there’s zero indication that this is Apples first discrete GPU. If anything you would expect such a chip to appear first in a desktop product. The R1 is likely programmable as well.

coder543 · 2024-01-22T21:15:41.000000Z

M1 was 68GB/s (and the article says this too).

M2 was 100GB/s.

toasterlovin · 2024-01-22T21:35:45.000000Z

You're right, my mistake.

rowanG077 · 2024-01-22T19:43:33.000000Z

I doubt it's a discrete GPU. It doesn't make sense to split a GPU and CPU in such a tiny device. Especially with the gains you can get with completely shared memory. I think it's much more likely to be a SoC with dsp tiles for the sensors and a huge GPU block and standard apple p and e cores.

wtallis · 2024-01-22T21:59:11.000000Z

A 1 Gigabit DRAM chip sounds far smaller than anything that's been used with GPUs in a very long time. That's a capacity from 15 years ago.

phire · 2024-01-23T11:57:25.000000Z

The capacity might be small, but with 256GB/s of bandwidth, that's right up there at modern GPU levels, roughly comparable to a RTX 4060.

And that's apparently from single chip, the RTX 4060 needs 6 chips get equivalent bandwidth.

Though for that exact reason, I'm really not sure I trust the "single chip" part of the rumour. That's a lot of bandwidth per chip and it seems more likely to have four memory chips, which would result in 2 GB of capacity.

wtallis · 2024-01-23T17:19:26.000000Z

Having GPU-like bandwidth doesn't make a chip a GPU.

> it seems more likely to have four memory chips, which would result in 2 GB of capacity.

Four times 1 Gigabit is 4 Gigabit, which equals half a Gigabyte: 512 MB. Which was less than the high-end for a desktop GPU 15 years ago. I'm still not seeing any way that this memory configuration is remotely plausible for a GPU today even though it is comparable on one metric.

utopcell · 2024-01-23T11:46:48.000000Z

they probably don't use a single chip.

wtallis · 2024-01-23T15:29:15.000000Z

That doesn't alleviate the capacity issue. Even if it's a 1Gbit chip with an 8-bit interface, then a 256-bit bus would only add up to 4 GB of DRAM at most. The ratio of capacity to bandwidth is just not anywhere close to what makes sense for ordinary GPU usage.

utopcell · 2024-01-23T18:35:10.000000Z

This doesn't seem to be an ordinary GPU however. It needs to process input from all the sensors and produce some low-bw output (positioning, object detection from lidar etc). They probably have enough RAM to buffer/post-process whatever they need from the sensors.

wtallis · 2024-01-23T19:57:52.000000Z

Right, I'm using the memory chip as strong evidence to argue that the R1 cannot be an ordinary GPU.

utopcell · 2024-01-23T20:02:20.000000Z

Makes sense.

phire · 2024-01-23T12:16:24.000000Z

There is no way Apple designed the R1 from scratch, it will contain a bunch of components from their other SoC designs. Obviously it's going to have quite a few more IPUs than a standard SoC, which only have to handle ~4 cameras. It also clearly has the display scan-out components.

And it's entirely possible that one of the components Apple took from their grap-bag of SoC component is the GPU compute cores. Properly not the rasterizer or texture samplers, but I could see the compute cores being useful for running tracking algorithms. The massive amount of memory bandwidth does kind of suggest GPU compute cores.

But even if it does, that doesn't make it a discrete GPU. Just a dedicated SoC with some GPU components, and a bunch of other things.

But it simply can't be used as in indicator of discrete GPUs in Macs, because all those other SoC components would be a waste of silicon and it's unlikely Apple would reuse the R1 die as a dedicated GPU.

toasterlovin · 2024-01-23T21:46:53.000000Z

> But even if it does, that doesn't make it a discrete GPU. Just a dedicated SoC with some GPU components, and a bunch of other things.

Yeah, this is where I'm at after reading all the comments on this thread.

agnokapathetic · 2024-01-22T19:50:55.000000Z

it’s a discrete Image Processing Unit

https://en.wikipedia.org/wiki/Image_processor

rmorey · 2024-01-22T19:55:06.000000Z

I don't think this is correct. The M-series chips themselves have ISP's in them.

choilive · 2024-01-22T19:52:50.000000Z

No, likely a dedicated DSP chip to handle the aforementioned 12 cameras + 5 sensors + 6 mics.

polyomino · 2024-01-22T19:55:47.000000Z

It's basically Apple's version of Microsoft's Holographic Processing Unit

amelius · 2024-01-23T11:46:32.000000Z

It's proprietary magical undocumented locked-down tech that I don't want anything to do with, regardless of how well it performs.