The image quality (depth and RGB) is staggering compared to everything else I've used (ZED, RealSense, Kinect v2, and a couple of others).
They can be chained together for near microsecond timing accuracy.
The mic array is awesome.
The IMU is awesome.
But right now I've set them all back in their boxes.
Why? Because without being a vision specialist, there's nothing I can do with these devices.
The SDK and sample code are so incredibly bare bones, it's almost laughable.
There's no way to make use of those mics for anything. It's literally not in the SDK.
There's no way to make use of multiple devices in any practical manner. No point cloud merging, no calibration or shared space alignment.
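For anyone wondering what "shared space alignment" would even look like, here's a rough sketch of the missing piece (my own hack, nothing from the SDK), assuming you've already calibrated the 4x4 rigid transform between the two devices yourself:

    # Sketch only: merge point clouds from two devices into one shared frame.
    # Assumes T_a_from_b (4x4), mapping device B's frame into device A's frame,
    # has been calibrated by hand (e.g. with a checkerboard); the SDK gives you
    # no help with that part.
    import numpy as np

    def merge_clouds(points_a, points_b, T_a_from_b):
        """points_a, points_b: Nx3 arrays in each device's own frame (metres)."""
        homo_b = np.hstack([points_b, np.ones((points_b.shape[0], 1))])  # Nx4
        b_in_a = (T_a_from_b @ homo_b.T).T[:, :3]   # express B's points in A's frame
        return np.vstack([points_a, b_in_a])        # one merged cloud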
Then there's the problem that buried deep in the SDK is a binary blob: the depth engine. No source, no docs, just a black box.
Also, these cameras require a BIG GPU; seemingly nothing is happening onboard. And you're limited to at best 2 Kinects per USB 3 controller.
All that said, I'm still a very happy early adopter and will continue checking in every month or two to see if they've filled in enough critical gaps for me to build on top of.
If any devs in Seattle want to collaborate (or know computer vision well enough to fill in some of these gaps for the OSS community) let me know :)
> The SDK and sample code are so incredibly bare bones, it's almost laughable.
This is why I held off ordering a few. I took an hour looking through the docs and concluded the overall offering isn't fully baked yet. The hardware looks incredible, but the software looks anemic.
One of the key features for me is the hardware chaining to make realtime point cloud merging easier to resolve. All of the scenarios I care about are realtime rather than post-processing reconstruction.
It's just a matter of following the money. A bigger and bigger slice of Microsoft's income is coming from Azure, and an increasing proportion of Azure users are running Linux. This gives them strong incentive to be the business leaders in open source software.
The other increasing slice is services, where they get you to buy stuff from their storefront, subscribe to Game Pass or Office 365, or interact personally with Bing and Windows (so they can sell targeted ads). This naturally gives Microsoft the incentive to know everything about you.
Following the money is exactly why I am skeptical. As Linux becomes a revenue stream for Microsoft it will also become a Microsoft product. From the company that was built on Embrace, Extend, Extinguish I think that is cause for concern.
This is really cool! Back in college I used the Kinect dev kit to build proof-of-concept special effects for live theatre as an independent study project. I pointed the Kinect at the face of an actor off stage, ran the resulting 3D data points through some cool algorithms, and then projected the result onto a screen on stage. The idea was that, for example, in Hamlet instead of having an actor in makeup play the ghost of King Hamlet, you could have this larger-than-life projection on stage.
This camera is way better quality, so it'll be neat to see the sorts of projects that can be done now.
How does it compare to the iPhone sensors used for Face ID? I'm wondering if, mounted to a workstation, it could be used to implement a Face ID system under Linux.
I was just checking the specs on the two devices and it seems the new Kinect's depth sensor has even higher resolution, so it looks like a go in terms of raw input data.
So my next question would be: why would it be significantly harder than regular facial recognition approaches as found in, say, OpenCV? Naively one would think more data makes it easier, not harder (neglecting hardware requirements/performance; just from an accuracy perspective, with a trivial refactor of current facial identification algorithms).
I'm not talking about identifying people who are moving or far away, just someone looking straight at the camera from a fairly close distance.
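To make the question concrete, here's the sort of thing I'd naively try (purely a sketch, not any official sample; it assumes the RGB frame and a pixel-aligned depth frame in millimetres have already been captured by whatever driver/wrapper you use): run a stock OpenCV face detector on the RGB image and use the depth patch as a cheap range/flatness gate so a printed photo doesn't pass.

    # Hypothetical sketch: RGB face detection gated by depth.
    # `rgb` is HxWx3 uint8 (BGR), `depth_mm` is HxW uint16 in millimetres,
    # assumed already aligned to the colour camera.
    import cv2
    import numpy as np

    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def find_live_faces(rgb, depth_mm, max_range_mm=1200, min_relief_mm=15):
        gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        live = []
        for (x, y, w, h) in faces:
            patch = depth_mm[y:y + h, x:x + w].astype(np.float32)
            patch = patch[patch > 0]                      # drop invalid depth pixels
            if patch.size == 0:
                continue
            close_enough = np.median(patch) < max_range_mm
            # a photo held up to the camera is nearly flat; a real face has relief
            has_relief = (np.percentile(patch, 90) - np.percentile(patch, 10)) > min_relief_mm
            if close_enough and has_relief:
                live.append((x, y, w, h))
        return live

The depth gate is only a crude liveness check; actually deciding "this is the enrolled user" would still need a recognition model on top.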
From what I understand, this is nice if you are trying to use multiple devices together. The Kinect's API makes that very easy with Azure (read: multiple cameras scanning a single location in real time, via a robot).
If you just need the same sensors for depth, but significantly cheaper, then look at Occipital.
Thinking about using vision technology for small-scale aquaculture stock management: sorting the fish by weight while eliminating the scale. A conventional camera would probably suffice with a little ML training. Length does not correlate well with weight in the fish species we are using, but a 2D picture of the fish body might do it. A 3D picture with the new Kinect would do the job for sure.
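For the 2D route, something like this back-of-the-envelope sketch is what I have in mind (purely illustrative; it assumes a fixed camera height over a light-coloured tray so pixel area maps consistently to real area, plus a small sample of fish weighed once on a real scale to fit against):

    # Illustrative only: regress weight against silhouette area from a top-down photo.
    import cv2
    import numpy as np

    def silhouette_area_px(bgr_image):
        """Pixel area of the largest dark blob (the fish) on a light tray."""
        gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        return max((cv2.contourArea(c) for c in contours), default=0.0)

    def fit_area_to_weight(areas_px, weights_g):
        """Least-squares weight ~ a*area + b from the weighed calibration sample."""
        a, b = np.polyfit(areas_px, weights_g, deg=1)
        return a, b

    def estimate_weight_g(bgr_image, a, b):
        return a * silhouette_area_px(bgr_image) + b

If area alone isn't enough, the same idea extends to a few more shape features (length, width, convexity) and a slightly fancier regressor.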
Classic Kinect sensors are used extensively for gesture detection in the niche world of digital interactive exhibitions. You see them a lot in things like science museums or popup events or digital art gallery kinda places.
>It’s meant to give developers a platform to experiment with AI tools and plug into Azure’s ecosystem of machine learning services (though using Azure is not mandatory).
Agreed. Doubt the SDK will be OS/Linux friendly either. Previous Linux Kinect drivers (libfreenect) were all community driven; they worked, but weren't as polished as the official stuff.