
The R1 is an SPU (Sensor Processing Unit) which works in concert with the M2's integrated GPU (the R1 being the innie, and the M2 being the outie). It is doing an enormous amount of image signal processing, which seems to explain the 256GB/s memory bandwidth.

https://techcrunch.com/2023/06/05/apple-r1-chip-apple-vision...: "The specialized chip was designed specifically for the challenging task of real-time sensor processing, taking the input from 12 cameras, five sensors (including a lidar sensor!) and six microphones. The company claims it can process the sensor data within 12 milliseconds — eight times faster than the blink of an eye — and says this will dramatically reduce the motion sickness plaguing many other AR/VR systems."




I think this is the best answer. The R1 is directly connected to the device's sensors and can write out state to the shared main memory. Processing the sensor state in 12 ms leaves roughly 4 ms to consume the world state and draw a video frame while still hitting a solid 60 fps.
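Rough numbers, if the 12 ms really were spent serially inside each frame (a Python sketch; later replies argue the pipeline is overlapped instead):

    frame_budget_ms = 1000 / 60   # ~16.7 ms per frame at 60 fps
    sensor_ms = 12                # Apple's quoted sensor-processing time
    render_ms = frame_budget_ms - sensor_ms

    print(f"frame budget: {frame_budget_ms:.1f} ms")        # 16.7 ms
    print(f"left to render/composite: {render_ms:.1f} ms")  # ~4.7 ms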

Any lag between the position of your head/body and what the displays show is going to mess with your proprioception. The worse that lag, the more likely you are to get motion sickness.


60 Hz is great for looking at a screen but may not be enough to keep a good illusion going while you move around a space. HoloLens has a similar setup that renders triangles at whatever rate it can but updates their position on the display at 240 Hz (that is, even if the next frame of an animation is lagging, its position in space will be adjusted as you move).
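A minimal sketch of that kind of late-stage reprojection, reduced to a pure-yaw pixel shift (the names and the px-per-radian scale are made up for illustration, not HoloLens's actual pipeline):

    import numpy as np

    def reproject(frame: np.ndarray, render_yaw: float, current_yaw: float,
                  px_per_radian: float = 1000.0) -> np.ndarray:
        """Shift the last rendered frame to compensate for head rotation
        that happened after rendering (small-angle, yaw-only)."""
        shift_px = int((current_yaw - render_yaw) * px_per_radian)
        return np.roll(frame, -shift_px, axis=1)  # horizontal pixel shift

    # Display loop at 240 Hz reuses the last frame rendered at, say, 60 Hz:
    # shown = reproject(last_frame, last_frame_yaw, head_tracker_yaw_now)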

IMO the illusion was rock solid, which is extremely challenging given that the display was transparent, so the rendering had to keep up with the real backdrop moving. Vision Pro and all passthrough devices get to fake it, but at the cost of proprioception, as you said.


The HoloLens was amazing and underrated by the wider internet. Like a lot of Microsoft stuff, it's regarded as a joke until Apple copies it, and then suddenly everyone takes it seriously.


Nope, still a joke. Like many MS products, the individual pieces of tech may be incredible, but the overall experience is severely lacking. So while you might say "this one aspect of the display technology is incredible!", everyone who's actually used it will reply, "yeah, and if it didn't feel like looking through a postage stamp it would have been great!"


> everyone who's actually used it

I have four of them at work. I have thousands of hours on them. They’re amazing.


Apple's claim is "photon to photon" for the 12 ms timeline, IIRC. Which jibes better with the actual frame rate of the headset.


Those 12 ms affect the latency, not the framerate. The thing will definitely not render at just 60 Hz; that's too low for VR, where the standard is usually 90 Hz or 120 Hz.


If you divide 1 second by 60, you get about 16.7 ms. So to hit 60 Hz, you need to complete all processing and draw the frame within that window. For 120 Hz, like you're claiming, all processing needs to complete in half the time, or about 8.3 ms. And yet Apple says the R1 takes 12 ms to process input from the sensors. You can draw your own conclusions.


You forget that the processing doesn't have to finish within the same frame. Latency is not throughput.

Not even the most expensive high-end gaming setups can finish the entire input-to-screen processing within just one frame, and yet they can easily render some games at 500Hz or more.
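A toy model of why (the stage counts are assumed, just to show that latency and throughput are independent knobs):

    def pipeline_stats(stage_ms: float, stages: int):
        latency_ms = stage_ms * stages   # input-to-photon delay per frame
        throughput_hz = 1000 / stage_ms  # a new frame completes every stage
        return latency_ms, throughput_hz

    # Four overlapped 3 ms stages: 12 ms total latency, ~333 Hz output.
    print(pipeline_stats(3, 4))  # (12, 333.33...)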


Nothing about the end-to-end latency of the R1 tells you anything about how pipelined it might be. It very well could have multiple frames in flight at the same time in different stages of processing.


To provide a comfortable experience, the frame pipeline can't be very deep. The older the frame state relative to the wearer's current proprioception, the more likely they are to experience discomfort or outright motion sickness.

That's why I assume the R1 tries to provide the render pipeline with "current" positional state so the renderer can finish drawing in the remaining ~4 ms (for 60 fps). Then the display only lags the wearer's perception by about 16 ms, which is less likely to cause discomfort.

This could be mitigated further if the objects in the VR scene are tagged with motion vectors. If an R1 state update doesn't land in time, the renderer can extrapolate the "current" position by applying those motion vectors to objects in the scene.
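Something like this (hypothetical names; just dead reckoning from the last confirmed state):

    from dataclasses import dataclass

    @dataclass
    class TrackedObject:
        pos: tuple       # last confirmed position (x, y, z) in metres
        vel: tuple       # motion vector (m/s)
        stamp_ms: float  # when pos was last updated by the R1

    def extrapolate(obj: TrackedObject, now_ms: float) -> tuple:
        """Dead-reckon a "current" position when the R1 update is late."""
        dt = (now_ms - obj.stamp_ms) / 1000.0
        return tuple(p + v * dt for p, v in zip(obj.pos, obj.vel))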


> and can write out state to the shared main memory

I think it would actually make more sense for the M2 to treat the R1 as a display that it writes final composited frames to; the R1 then integrates the M2's output into the rest of the scene it's rendering from the other sensors. I.e., the output of the M2 is essentially another camera input to the R1 (well, camera plus multi-channel audio).


Just to back this point up: the single biggest challenge for headset makers is power management.

Every time you move a byte one stage away from the sensor you (in general) incur an order of magnitude of power cost. There are many, many caveats, and it's not a hard and fast rule, but it's a useful illustration. Don't get hung up on precise numbers; just understand that moving data between devices, even inside a processor, has a power cost.

E.g. reading out a pixel from a camera costs 1, moving it to a UART/interface costs 10, reading it into a register costs 100, into L1 cache 1,000, and so on.

The closer you can do the processing to the sensor, the greater the (potential) power saving, and in theory the lower the latency.

For example, with the eye cameras you only really want the direction the eyes are looking. So you don't want to ship the entire image at 120 fps to a processor and then do some maths to get that vector. Ideally you want the sensor to do it for you and just ship four floats every 1/120th of a second.
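Back-of-the-envelope (resolution and bit depth are guesses, purely illustrative):

    w, h, bytes_per_px, fps = 400, 400, 1, 120  # hypothetical eye camera
    raw_bps  = w * h * bytes_per_px * fps       # shipping full frames
    gaze_bps = 4 * 4 * fps                      # four 32-bit floats

    print(f"raw frames: {raw_bps / 1e6:.1f} MB/s")    # 19.2 MB/s per eye
    print(f"gaze vector: {gaze_bps / 1e3:.2f} kB/s")  # 1.92 kB/s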

I'm also not sure where the main processor is for the goggles, so data compression and coordination of the sensors also become critical. The image sensors almost certainly don't run at a high enough framerate to be "magical", so you need an IMU to generate a fudge factor. The sensors need to run at a known sample rate, one that doesn't deviate. Using the CPU to coordinate that is bad, because it'll require a lot of interrupts, leaving little room for other important stuff (like other sensors).
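The IMU fudge factor, at its simplest, is just dead reckoning between camera fixes (a real system would use a Kalman or complementary filter; this is only the bare idea, with made-up numbers):

    def predict_yaw(last_camera_yaw, gyro_rate_rad_s, dt_since_fix_s):
        """Extrapolate head yaw from the last camera fix using gyro rate."""
        return last_camera_yaw + gyro_rate_rad_s * dt_since_fix_s

    # Camera fix 8 ms old, gyro reading 2 rad/s of yaw:
    # predict_yaw(0.50, 2.0, 0.008) -> 0.516 rad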

Now, I've hinted that the "SPU" does processing, which it almost certainly does. This means it might actually look like a GPU after all, as it's doing a lot of calculations that are very "graphics-y". All that computer vision: SLAM for head orientation, hand tracking for, well, hands, image stitching/warping and correction to make the passthrough work, plus the audio stuff. It's all very computer vision. So it's highly likely that it looks a bit like a GPU, because it's doing similar tasks.


You seem knowledgeable about this stuff — do you work in this space? I'm trying to do some feature extraction from multiple global-shutter cameras right now on limited hardware, and it's a learning experience!


You can probably "do miracles" if you optimize for fp8/fp16.


12 ms is 83 Hz; one of them sounds better than the other in marketing copy, I guess.


Saying that the input latency is 12 ms does not mean that it can only update once every 12 ms. It just says that there's a 12 ms delay from what happens in the real world to what your eyes see through the device.


That's not how this works. Nothing in the Vision Pro is "83 Hz".



