That is doable right now; I'm only sort of being glib here. I just want to outline what's actually involved in what people think of as a simple task in a spatial system.

TLDR: As of right now, your wildest dreams are pretty possible. In the next 2 years, how we compute is going to get strange. Nothing in an XR environment is anywhere near as simple as it is in traditional computing, and right now most people don't really want what they'd ask for. Building a simple (useful) XR app makes launching a web product look like assembling a Lego set vs. landing on the moon.

---------

ImageNet[0] works just fine for the object identification. You would probably want to use RGB-D sensors like the Kinect[1] or Intel RealSense[2] instead of regular cameras, but tracking like what the Vive[3] uses could also work. The thing you just proposed would involve a network of server processes handling the spatial data and feeding extracted, relevant contextual information to a wireless headset at a pretty crazy rate. Just to give you an idea, a SLAM[4] point cloud from a stereo set of cameras, or a cloud from a Kinect2 or RealSense, produces a stream of data that runs about 200 MB a second. Google Draco[5] can compress that and help you stream it at 1/8 the size without any tuning.
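
To put numbers on that, here's a back-of-envelope sketch in Python. The per-point layout (float32 XYZ plus RGB) and Kinect2 depth resolution are my assumptions, not measured figures:

    # Rough bandwidth estimate for an uncompressed RGB-D point cloud stream.
    DEPTH_W, DEPTH_H = 512, 424      # Kinect2 depth resolution
    FPS = 30
    BYTES_PER_POINT = 3 * 4 + 3      # xyz float32 + rgb uint8 = 15 bytes

    points_per_frame = DEPTH_W * DEPTH_H
    mb_per_sec = points_per_frame * BYTES_PER_POINT * FPS / 1e6
    print(f"single sensor: ~{mb_per_sec:.0f} MB/s")       # ~98 MB/s
    print(f"stereo pair:   ~{2 * mb_per_sec:.0f} MB/s")   # ~195 MB/s
    # Draco's untuned ~8x compression would bring the stereo pair
    # down to roughly 25 MB/s.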

Extracting skeletal information from that is really something that only Microsoft has reliably managed to deliver, and it's at the core of the Kinect/HoloLens products. NuiTrack[6] is the next best thing, but registering a human involves a T-pose and gets tricky. You could definitely roll something specific to the application: maybe just put a fiducial marker[7] or two on a person and extrapolate the skeleton from knowing where it sits on their shirt. You will also want to stream the RGB-D, IMU, hand, and skeletal tracking from the headset back to the server. This could help inform and reduce the tracking requirements of the surrounding sensors.
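
For the fiducial route, the marker-detection half is pretty approachable. Here's a minimal sketch using OpenCV's ArUco module (needs opencv-contrib-python; the dictionary choice and camera index are arbitrary, and newer OpenCV releases moved this API into an ArucoDetector class):

    import cv2

    # Detect ArUco fiducial markers in a camera feed; pin one or two to a
    # shirt and you have stable anchor points to hang a skeleton estimate on.
    aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    cap = cv2.VideoCapture(0)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)
        if ids is not None:
            # corners[i] is the four image-space corners of marker ids[i];
            # with camera intrinsics, solvePnP turns that into a 6DOF pose.
            cv2.aruco.drawDetectedMarkers(frame, corners, ids)
        cv2.imshow("markers", frame)
        if cv2.waitKey(1) == 27:  # Esc to quit
            break
    cap.release()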

Out of the box, you'd probably need a base i7, 64 GB+ of RAM, and a couple of GTX 1080s to power 4 sensors in one room. The task of syncing the cameras and orienting[8] them is something you'd have to solve independently. After having all of that, you would have an amazing lab to reduce the problem further and maybe require less bandwidth, but very probably, to get where you're going, you'd need to scale that up by 2x for dev headroom and run some sort of cluster orchestration[9] to manage your GPU processes and pipeline. Keeping everything applicable in memory for transport + processing would be desirable, so you'd want to look at something like Apache Arrow[10]. At this point you are on the edge of what is possible at even the best labs at Google, Microsoft, or Apple. The Arrow people will gladly welcome you as a contributor! Hope you like data science and vector math, because that's where you live now.
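
To make the Arrow part concrete, here's a minimal pyarrow sketch that packs one point-cloud frame into a RecordBatch and ships it over Arrow IPC. The xyz-only schema and frame size are my assumptions:

    import numpy as np
    import pyarrow as pa

    # Pack one point-cloud frame into an Arrow RecordBatch so downstream
    # processes can consume it without per-point (de)serialization.
    n = 512 * 424  # one Kinect2-resolution frame of points
    batch = pa.RecordBatch.from_arrays(
        [pa.array(np.zeros(n, dtype=np.float32)),   # x
         pa.array(np.zeros(n, dtype=np.float32)),   # y
         pa.array(np.zeros(n, dtype=np.float32))],  # z
        names=["x", "y", "z"],
    )

    # Stream it over Arrow IPC; the sink here is an in-memory buffer, but
    # in the real pipeline it would be a socket or shared-memory segment.
    sink = pa.BufferOutputStream()
    with pa.ipc.new_stream(sink, batch.schema) as writer:
        writer.write_batch(batch)

    # Consumer side: read the frame back without copying the columns.
    reader = pa.ipc.open_stream(sink.getvalue())
    table = reader.read_all()
    print(table.num_rows, table.schema.names)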

After getting all of this orchestrated, you now have to stream an appropriate networked "game" environment[11] to your application client on the HoloLens. But congrats! You've made a baby step! Battery life is still an issue, but Disney Research has demonstrated full-room wireless power[12].
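
On the wire, streaming that environment mostly means the server pushing compact state updates while the client rebuilds the scene locally. Here's a toy sketch of the server half in Python over UDP; the address, tick rate, and message shape are all made up, and a real build would use Unity's networking stack[11] with a binary, delta-compressed protocol:

    import json
    import socket
    import time

    # Toy state-sync loop: push tracked-object poses to the headset client
    # at a fixed tick rate.
    HEADSET = ("192.168.1.50", 9000)  # hypothetical headset address
    TICK_HZ = 30

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tick = 0
    while True:
        state = {
            "tick": tick,
            "objects": [
                {"id": "car_keys", "pos": [0.4, 0.9, 1.2], "rot": [0, 0, 0, 1]},
            ],
        }
        sock.sendto(json.dumps(state).encode(), HEADSET)
        tick += 1
        time.sleep(1 / TICK_HZ)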

Now all you have to do is figure out the UI/UX, gesture control, text entry/speech recognition, art assets, textures, models, particle effects, post-processing pipelines, spatial audio systems[13], internal APIs, cloud service APIs, and the application build pipeline. The Unity Asset Store has a ton of that stuff, so you don't have to get in the weeds making it yourself, but you will probably have to do a big lift to get your XR/holodeck cluster processing pipeline to produce the things you want as instantiated game objects.

Once that's done, you literally have a reality-warping metaverse overlay platform to help people find their car keys.

What's crazy is that you can probably have all of it for under $15,000 in gear. Getting it to work right is where the prizes currently live, and they are huge prizes.

[0] https://en.wikipedia.org/wiki/ImageNet

[1] https://azure.microsoft.com/en-us/services/kinect-dk/

[2] https://github.com/IntelRealSense/librealsense

[3] https://www.vive.com/us/vive-tracker/

[4] http://webdiis.unizar.es/~jcivera/papers/pire_etal_iros15.pd...

[5] https://github.com/google/draco

[6] https://nuitrack.com/

[7] https://www.youtube.com/watch?v=JzlsvFN_5HI (markers are on the boxes not the robot)

[8] https://realsense.intel.com/wp-content/uploads/sites/63/Mult...

[9] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus...

[10] https://arrow.apache.org/

[11] https://docs.unity3d.com/Manual/UNetClientServer.html

[12] https://www.youtube.com/watch?v=gn7T599QaN8

[13] https://en.wikipedia.org/wiki/Ambisonics




This sounds like something a warehouse could afford to buy, but for most people, organizing their stuff properly is going to be the winner. (Even at a high salary like $300k a year, $15k is still 100+ hours of your time.)


Yeah, it's not a consumer product yet. But did you miss that this is a lab for building one, and that even the most "basic" solution requires a functional holodeck for the price of a used sedan? That's pretty bonkers.



