The author persistently conflates AR and MR throughout the post, evaluating the Quest Pro as an AR device but then concluding that it's a terrible "MR" device because it failed his AR tests. He seems to treat these terms as synonymous, but they aren't. AR aims to enable display of, and interaction with, the real world plus augmentations, while MR aims to bring virtual objects into the real world and let you interact with those. You really don't need such a high-fidelity view of the real world to do the latter; the virtual objects themselves are rendered with very high fidelity.
I am fairly confused by the low-res cameras used in the QPro. I have to assume Meta was heavily constrained by the need for them to do double duty as tracking cameras and passthrough cameras, so higher-quality cameras just weren't an option. But why only a single color passthrough camera? That doesn't make much sense, unless the processor on the Quest Pro is so limited that it couldn't process two color feeds even if it had them.
I think it's actually great news that there is so much optimization left to do. Seems like they should be able to improve it a lot over time, and in the meantime it's pretty adequate to enable exploration of some unique experiences.
As someone close to this in the industry, I definitely agree.
I also really don't know why he decided to deemphasize the perspective and depth correctness so much. He mentions it here:
> In this case, they were willing to sacrifice image quality to try to make the position of things in the real world agree with where virtual objects appear. To some degree, they have accomplished this goal. But the image quality and level of distortion, particularly of “close things,” which includes the user’s hands, is so bad that it seems like a pyrrhic victory.
I don't think this is even close to capturing how important depth and perspective correct passthrough is.
Reprojecting the passthrough image onto a 3D representation of the world mesh to reconstruct a perspective-correct view is the difference between a novelty that quickly gives people headaches and something that people can actually wear and look through for an extended period of time.
Varjo, as a counterexample, uses incredibly high-resolution cameras for their passthrough. The image quality is excellent, text is readable, contrast is good, etc. However, they make no effort to reproject their passthrough in terms of depth reconstruction. The result is a passthrough image that is very sharp, but is instantly, painfully, nauseatingly uncomfortable when walking around or looking at closeup objects alongside a distant background.
The importance of depth-correct passthrough reprojection (essentially, spacewarp using the depth info of the scene reconstruction mesh) absolutely cannot be overstated, and it is make-or-break for general adoption of any MR device. Karl is doing the industry a disservice with this article.
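For the curious, the core of that reprojection can be sketched with basic pinhole-camera math. This is a simplified, hypothetical illustration (not Meta's actual pipeline): unproject each camera pixel to a 3D point using depth from the scene mesh, then project that point into the eye's view.

```python
import numpy as np

def reproject_pixel(u, v, depth, K_cam, T_cam_to_eye, K_eye):
    """Map one camera pixel into the eye's image, given the scene depth.

    u, v         -- pixel coordinates in the camera image
    depth        -- distance of that pixel's surface (from the scene mesh)
    K_cam, K_eye -- 3x3 pinhole intrinsics (illustrative values)
    T_cam_to_eye -- 4x4 rigid transform from camera frame to eye frame
    """
    # Unproject: pixel + depth -> 3D point in the camera's frame.
    ray = np.linalg.inv(K_cam) @ np.array([u, v, 1.0])
    p_cam = ray / ray[2] * depth
    # Move the point into the eye's frame (accounts for the camera offset).
    p_eye = (T_cam_to_eye @ np.append(p_cam, 1.0))[:3]
    # Project into the eye's image plane.
    q = K_eye @ p_eye
    return q[:2] / q[2]
```

Because the cameras sit a few centimeters in front of your eyes, the same pixel lands in a different place depending on its depth, which is exactly why a single global warp that ignores the mesh can't be correct for both your hands and the far wall.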
I’ll admit, I have completely cooked breakfast (eggs and toast, rummaging, flipping and all) with the Quest Pro passthrough, while watching a video on a giant virtual screen floating in my kitchen. The depth accuracy is very impressive.
Yes, I think the success of the reprojection is a much under-appreciated achievement. The author repeatedly questions their claim of "seamless," but it's one of the most stunning aspects: you can, for example, see your real arm outside the headset map exactly onto your virtual arm and hand inside it, as do other objects that bridge from external reality into the virtual view. This "seamless" quality is really impressive when you consider that the whole view is being completely reprojected, not just displayed natively from the camera feeds.
Yeah, I really don't think people understand how critical the reprojection is. A lot of folks, apparently Karl Guttag included, think that the resolution of the passthrough is the most important factor for usefulness, and/or that the resolution of the cameras is even directly related to the quality of the passthrough to begin with! The pixel count is just one of dozens of quality metrics, and arguably one of the less important ones.
I also can't tell if KGuttag believes that you can simply pass the video feed through to the user and not have an absolute usability disaster. His paragraph about the "pyrrhic victory" sort of implies that forgoing depth-correct scene reprojection was an option. In my experience, it is absolutely non-negotiable for shipping a device that's usable by the majority of consumers.
This is another reason folks don't understand why passthrough is so energy-intensive: you're running an entire additional pass of stereoscopic spacewarp on what are essentially four different camera feeds, against an incredibly high-resolution scene reconstruction/SLAM mesh.
In a regular Quest 2 (not the pro as discussed here) you can use pass-through and actually reach out and grab a glass of water and drink it. It's super impressive and actually makes the pass-through useful for the things that you use pass-through for.
Would it be nice to be able to actually read text and not have distortion when objects get too close? Yes, of course. But it's not necessary.
We are actively using passthrough in VRWorkout (to be aware of your surroundings while jumping and doing knee-high running), and I have to say that even the low-quality Quest 2 passthrough feels magical for that [1]. If you look at 0:20, you can see that it really shines when the virtual hands interact with real objects.
Exactly. Reading text or having 20/20 vision is so, so, so not the most important part of passthrough! Comfort, accuracy when reaching out and grabbing things, walking/running, avoiding obstacles on the ground, pets/humans, etc. Non-depth-correct passthrough risks all of these things.
Yeah, I've been really enjoying the passthrough-as-homescreen-background feature they added. It now works on WebXR pages too, which has been fun for the small experiments/toys I like to make in my downtime.
Just so I understand correctly: the fact that the cameras are placed a couple of inches in front of where your actual eyeballs are causes a lot of discomfort? And so you need to warp the camera feed to look like it would from your eyeballs' perspective?
I'm not sure if it's entirely due to the camera position or if there are other factors (e.g. differences between the camera optics and your eyes'), but I can confirm that using a device without reprojection results in quite extreme distortion for objects closer than an arm's length.
There's also the distance between the cameras not matching the distance between your eyes, and the fact that the virtual 3D content, which isn't from the cameras, is rendered from your eyes' actual positions rather than the cameras'.
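To put numbers on it, here's a quick back-of-the-envelope calculation (pure geometry, assuming a hypothetical 5 cm forward camera offset) showing why close objects are the problem:

```python
import math

def apparent_shift_deg(lateral_m, dist_m, cam_forward_m=0.05):
    """Angular error, in degrees, from showing the raw feed of a camera
    mounted cam_forward_m in front of the eye without reprojection.

    lateral_m -- object's sideways offset from the optical axis
    dist_m    -- object's distance from the eye
    """
    true_angle = math.atan2(lateral_m, dist_m)                  # as seen by the eye
    cam_angle = math.atan2(lateral_m, dist_m - cam_forward_m)   # as seen by the camera
    return math.degrees(cam_angle - true_angle)
```

For an object 10 cm off-axis, the error is over 3 degrees at hand distance (30 cm) but only a few hundredths of a degree at 3 m, so any fixed warp tuned for the room makes your hands land in the wrong place.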
Personally, I actively reject this distinction between AR & MR. For as long as I've been aware of the concept, the term "Augmented Reality" covered both use cases, including virtual objects embedded into the real world. The distinction never struck me as interesting or useful; virtual objects embedded into the real world being the most useful way to interact with augmentations to the real world.
Moreover, "Mixed Reality" strikes me as marketing-speak. Compare the search trends for AR [1] vs MR [2]. While both terms have been in use for a long time, AR was always more prevalent, with a notable rise in interest around 2009. My first encounter with the term MR was in marketing materials for Microsoft's HoloLens circa 2015. Now compare the search trends for MR [2] with HoloLens [3]. Searches for MR don't begin to trend upward until after the introduction of HoloLens, where Microsoft's marketing materials tried to make a big deal of said distinction and the rest of the world started to run with it. As a supporting argument for the original broader usage of the term AR, I'll cite discussions of a 2007 anime series, Dennō Coil [4], which explored various examples of fantastical AR usage. Probably just a coincidence, but the search trends for Dennō Coil [4] do precede the search trends for AR [1] rather closely.
Naturally, all the usual caveats about word meanings changing over time apply, but this one is a pet peeve of mine.
I feel the same way. I worked at Microsoft on HoloLens, and most of the folks I knew there felt like Mixed Reality was modern marketing spin on AR. It makes sense from a marketing perspective, since HoloLens was a very different beast than something like Google Glass, but I find the distinction needlessly confusing nowadays. To me, AR is just augmenting reality on a spectrum from 2D fixed overlay to depth-aware and beyond. I find naming arbitrary detents on the spectrum unproductive. The general public has enough confusion around VR vs AR, especially when passthrough enters the picture. Throwing MR into the mix furthers the confusion.
I don't begrudge people getting grumpy about terms evolving through gradual misuse over time. It's certainly annoying. What I would say, though, is that if you have specific subsets of functionality that demarcate clearly along various dimensions - how hard they are to do, the type of technology needed, the typical use cases and real-world value they create - then you really need some way to distinguish them if we are going to talk about them.
Alternatively, if you are going to just stick to one hyper general term then you can't really complain that product X is bad at Y because it doesn't do every single thing the general term Y encompasses. Language is fluid for a reason - it adapts to our needs to define and distinguish things. In this case, the Quest Pro does a subset of things quite well, while completely leaving other aspects of AR on the table. Whatever we refer to that subset as, it's not really valid to then say it's generically bad just because we don't have good words to describe what it does do.
Mixed reality as a term dates at least back to 1994 - https://en.wikipedia.org/wiki/Reality%E2%80%93virtuality_con.... Microsoft adopted it because they were doing surface mapping and occlusion culling, something that early AR (circa 2008-2012ish) really couldn't do.
Yes, the term "Mixed Reality" had some academic use going back that far, but I'd interpret that article and its citations quite differently.
These early MR references use the term to mean a spectrum from "Augmented Reality" to "Augmented Virtuality". The AV term never really caught on, and the Wikipedia link for AV [1] actually just redirects to their page for MR [2]. I can't think of many notable examples of AV at the moment, though something like "Mario Kart Live: Home Circuit" [3], as mentioned on the Wikipedia page for MR [2], might fit the bill, in the sense that reality is being used to augment the game. Uses of haptic feedback in VR might also count as AV.
This usage of MR has nothing to do with how Microsoft began using the term. It's almost as if they embraced an academic term for the sake of marketing, extended the meaning, and successfully extinguished the original academic definition.
AR will never be able to mix depth to objects with the real world due to the physics of image planes. MR could do that, so it might be one key difference.
You're mixing up the term "AR" with specific implementations of it.
People are now using "MR" to mean something distinct from AR but GP and myself are arguing that the term AR was broad enough to encompass passthrough, overlay and any future developments.
If someone built a VR headset that used small projectors rather than flat panel displays, we wouldn't need to invent a new term for it. The AR <> VR spectrum was all we needed and MR is just muddle on top of that.
Full lightfield AR displays would have no issue with this. Even some hypothetical giga-sandwich of waveguide planes like a Magic Leap 1 on steroids could do an effective approximation of this.
(Assuming by "mix depth to objects with the real world" you mean "having virtual objects alongside real objects with convincing depth alignment")
> While AR aims to enable display and interaction with the real world but with augmentations, MR aims to bring virtual objects into the real world and allow you to interact with those.
Would you mind giving user-facing examples to illustrate the difference? As in something you consider AR but not MR and vice-versa.
I've googled a bit but I don't understand the distinction between "enable display and interaction with the real world but with [virtual] augmentations" and "bring virtual objects into the real world and allow you to interact with them". As someone with little AR/VR/MR experience these sound like two ways of phrasing the same thing. Thanks!
Consider something like Gravity Sketch [0] as MR. You are designing a virtual object and it is useful to bring it into the real world to (a) collaborate on it with others in a shared workspace and (b) envision it within its real context. For example, you can design a chair and actually place it in a living room while you are working on it to see how it looks.
An AR example would be the classic overlay of text on a sign you are looking at with a version that is translated into your language.
Not working in the field, I'm also having a hard time grasping the difference. Is one MR because it's 3D and the other AR because it's 2D? And is the collaboration part a defining feature?
Where would you place Pokemon Go's "AR view" [0] kind of implementation, for instance? (The first link I found was for a headset, but I'm actually also curious how you'd qualify the phone-only version of it.)
Or what would change if, in your Gravity Sketch example, each user saw colors or rendering adapted to their vision/preferences, making it a separate experience for each of them?
An example of how I think of it: AR is a graph/dashboard overlaid on real-world objects, processing and commenting on them, without forgetting that the result is on a 2D screen. MR is more like a cube that falls onto the table you see through the screen and reacts if you punch it with your hand (all as seen by the camera); it doesn't care that your experience of the MR world is bound to a 2D screen - it virtually exists in the space around you.
This blog post is written by Karl Guttag, who is interested in AR but not very interested in VR. The reason he comes at it from an AR angle is that that's the kind of device he is looking for. If you look at the rest of his blog, you can see many articles related to AR.
That's fine, but it's okay to call out the author's bias. They contributed wonderful tests of the limitations of the headset, but the headset is only a failure for the author's use case (AR).
Having used Hololens vs several passthrough-AR devices (Quest 2 etc) I'd take "poor image quality" over "limited FOV and the need to use it in a darkened room" any day.
Until AR display tech gets significantly better I think AR based on passthrough cameras is the only sensible approach.
You are correct that MR today implies that the Virtual image is locked to the real world with some form of SLAM.
I was a bit sloppy in that regard; I was more focused on what was going on. But when combined with the word "passthrough" for seeing the real world, I tend to slip back to the term "AR passthrough," which is what this type of thing was called for years. MR, as I remember it, is a more recent term used to distinguish different types of AR. Then things flipped, and XR (= AR/VR/MR) came to mean what AR used to mean.
My guess is that late in the program, the importance of passthrough was elevated, perhaps in response to Apple rumors, but they were stuck with the hardware already in the pipeline. The passthrough available is a big improvement over the Quest 2 for seeing your surroundings at a gross level, but it is not up to regular use in full mixed reality.
There are probably decades of "optimizations" left to do.
I also think there is a lot to be said for a VR headset with even mediocre passthrough. I find VR very engaging and I think it can be very useful -- you can have the screen space of many monitors if the headset has good enough resolution (not good enough on a Quest), plus there is a ton you can do with modeling and visualization.
But it is a big turn-off to have no visual idea of what is going on around you. Just being able to press a button and see that there is someone standing near you, or that you are too close to a couch, is a big win for a VR headset in my opinion. Obviously the visual fidelity will improve, in both the display and the passthrough, but just starting to get some passthrough is very nice.
I can't help but view the (myriad) odd choices in the Quest Pro in the light of Carmack's resignation letter. It really does have the camel feeling of a horse designed by committee, instead of anything filled with 'give a damn'.
I hope the folks there can figure out the operational difficulties, and find a way past whatever org-chart drama is the source of the dysfunction. The Quest 2 has been a delight, but enough time has passed that the limitations are starting to chafe. I would love to see a true Quest 3 made by people who actually like these things.
It's definitely a compromise device. My gut feeling is that various factors delayed its delivery, and they were ultimately left with the choice of shipping it now, as is, or potentially losing the whole business market to Apple, HTC, or others in that arena. Nearly all the interesting tech in the Quest Pro will be outdated within a year - it would be impossible for them to ship it if they waited even another 6 months. So I think they said "ship it," and honestly I think they were right - as people say, ultimately "shipping is everything" [0]. Witness Apple struggling year after year to get their AR/VR product out. It's actually really hard, and if you wait for all your problems to be solved, you will never ship anything. The product may be full of compromises, but the process of shipping it will put Meta miles ahead of those that still haven't got anything out the door.
I mean, historically, "it isn't very good, but at least it exists" has _not_ been a particularly effective approach to consumer product launches, not if the competition is waiting a bit longer to bring out something good.
The problem is that we don't exactly need AR/VR: we can put screens around us for a fraction of the cost, and video games are nice to play sitting down. If we want some movement fun, we can run after each other, giggling, in the forest; we don't exactly need anything overlaid.
I can understand that they are trying to see far ahead, anticipating a future where they'd make a lot of money being useful, but there's such low latency and richness of perception in normal, atomic, real-world interactions that I'm not sure we can, or want to, replace them with something that simply adds a paid intermediary.
It's like adding a blockchain to transact cash, for instance.
> While AR aims to enable display and interaction with the real world but with augmentations, MR aims to bring virtual objects into the real world and allow you to interact with those.
That's literally the same thing in terms of hardware. If you want to bring virtual objects in the real world, and not have weird z-sorting issues all over the place, you need to have an understanding of the real world, which in turn gives you the possibility to do augmentation.
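To make the z-sorting point concrete: per-pixel occlusion is just a depth comparison between the virtual render and the reconstructed real-world depth. A minimal sketch (hypothetical arrays, not any vendor's API):

```python
import numpy as np

def composite(passthrough_rgb, real_depth, virtual_rgb, virtual_depth):
    """Show each virtual pixel only where it lies in front of the
    reconstructed real-world surface; otherwise keep the passthrough."""
    virtual_in_front = virtual_depth < real_depth   # per-pixel depth test
    # Broadcast the boolean mask across the RGB channels.
    return np.where(virtual_in_front[..., None], virtual_rgb, passthrough_rgb)
```

Without that real-world depth map, a virtual object would always draw on top of your hand or a table edge, instantly breaking the illusion - which is why "bringing virtual objects into the real world" requires the same world understanding that augmentation does.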
Besides, MR is just a confusingly overloaded term these days. It can refer to:
* Microsoft's Mixed Reality brand of VR headsets (that completely lack any AR/MR features)
* filming people playing VR games in third person and mixing the game footage into it with LIV or similar software
* doing AR with VR headsets and pass-through cameras, instead of see-through optics (e.g. Lynx R1)
> But why only a single color pass through camera?
Pico4 makes the same mistake. I really don't get it. Lenovo Mirage Solo back in 2018 used two front facing cameras at a proper IPD, and while the image is still quite low resolution, having actual 3D with zero distortion made the thing feel like wearing actual glasses, not like looking at a weird camera feed. It's a ginormous quality improvement for very little extra effort. Using pass-through on that thing is still the only time I could literally forget I was wearing a headset.