There are visual tracking techniques that are not only more stable and robust than Meanshift, but can also handle occlusions, adapt to changes in the appearance model, etc. Have a look at the VOT challenge[1] for the state of the art in visual tracking.
Thanks for the link, I'm aware of some of the methods cited. This first PoC was just to check how fast and how well Meanshift would work with that image quality. I'll try to implement a more robust approach, especially some method that can handle occlusions.
Christian, please also be aware that some of the techniques do not incorporate changes in scale of the tracked object. If you are working with aerial footage then this might be an important consideration.
I've recently worked on a similar problem, so if you need any help, please feel free to email me (email address is given in my profile).
Thanks for the feedback! I'm planning to use more robust, scale-invariant features for this, so the problem of different zoom levels should be solved (at least to the limits of what can be done).
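For what it's worth, here's a rough sketch of that direction using OpenCV's ORB (which tolerates scale changes via its image pyramid); the file names and the crop are placeholders for illustration, not anything from the actual project:

```python
import cv2

# Match scale-invariant keypoints between a first-frame crop of the target
# and a later frame, instead of relying on a colour histogram alone.
reference = cv2.imread('building_crop.png', cv2.IMREAD_GRAYSCALE)   # placeholder file
frame = cv2.imread('current_frame.png', cv2.IMREAD_GRAYSCALE)       # placeholder file

orb = cv2.ORB_create(nfeatures=1000)
kp_ref, des_ref = orb.detectAndCompute(reference, None)
kp_frm, des_frm = orb.detectAndCompute(frame, None)

# Brute-force Hamming matching with cross-check to drop weak matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_ref, des_frm), key=lambda m: m.distance)

# The matched keypoint locations in the new frame give a rough position
# estimate of the target, even if the zoom level has changed.
pts = [kp_frm[m.trainIdx].pt for m in matches[:30]]
```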
Out of interest, what kind of hardware specs does the box doing the image processing have (CPU speed, #cores, and memory in particular). Do you expect to be limited by the compute capabilities?
You're correct, this is a post-processing operation; however, I took care to build something that works in real time, so the same algorithm can be used on any FPV transmission, which is the next step I'm planning to implement (but with more robust features).
This isn't exactly "real time" since he loads an MP4 file (`cv2.VideoCapture('upabove.mp4')`). I think it would be a lot more impressive if the drone were able to stream the footage to a server of some kind, process it there, and then make decisions about its flight path.
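For reference, `cv2.VideoCapture` accepts a camera index or a network stream URL just as well as a file path, so the same loop could in principle consume a live downlink; the RTSP address below is made up purely for illustration:

```python
import cv2

# Swap the file path for a live source; the URL here is a made-up example.
cap = cv2.VideoCapture('rtsp://192.168.1.10:8554/fpv')
# cap = cv2.VideoCapture(0)  # or a locally attached camera / capture card

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # ... run the tracking step on `frame` here ...

cap.release()
```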
A few days ago, there was a post here on HN about an add-on board that lets a Raspberry Pi control a copter[1]. I think your idea would be a very good use case for it, because it's capable of running OpenCV, can control the camera position, and avoids downlink latency since all processing happens on board.
You're correct; however, I took care to make the algorithm work in real time, so you can use it with any FPV transmission. I'm going to prepare a good FPV system to run this with video transmission, but since the algorithm is very fast, it can also be executed by a Raspberry Pi or a BeagleBoard, for instance (embedded in the drone), to avoid transmission quality loss and issues like that.
That's what I was thinking too. Also, isn't this sort of thing pretty common now? As in, aren't there startups that have drones that follow you to take cool video of you [1]. I'd imagine they are doing some sort of realtime tracking.
None of the "follow me" drone folks are really doing object tracking yet. They basically just "fake it" by sending a stream of guided waypoints offset from the phone or bracelet GPS location (in AirDog's case) and hope it looks OK because of the GoPro's really wide-angle lens. There is a lot of room for improvement here that could come from feeding that real-time tracking data back into the flight controller, as you suggest.
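For anyone curious, a minimal sketch of what that "fake it" approach looks like with DroneKit; the connection string, altitude, and offsets are illustrative values, not taken from any real product:

```python
from dronekit import connect, LocationGlobalRelative

# Connect to the flight controller over a telemetry link
# (connection string is an example; adjust for your radio/serial setup).
vehicle = connect('udp:127.0.0.1:14550', wait_ready=True)

def follow(phone_lat, phone_lon, alt=20, lat_offset=0.0001, lon_offset=0.0):
    """Send a guided waypoint offset from the phone/bracelet GPS fix.
    No vision involved: the drone just chases a GPS coordinate and the
    wide-angle camera hides the error, as described above."""
    target = LocationGlobalRelative(phone_lat + lat_offset,
                                    phone_lon + lon_offset,
                                    alt)
    vehicle.simple_goto(target)
```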
The one startup in the drone space that is really putting computer vision stuff to work on drones is http://skyd.io - some ex google project wing folks. The demos are freaking amazing. Wait long enough on the website background video and you can see a six foot wingspan UAV flying itself through a parking garage.
Source: I live and breathe drone stuff and write a UAV ground station app for iOS to go along with our Apple MFi certified hardware for the long range telemetry connection from iOS. It works with OP's flight controller, by the way :-)
OP's video doesn't really have anything to do with drones as such; it just loads an MP4 of drone footage. The fact that it's drone footage is completely incidental.
But lots of people are doing real computer vision and object tracking on drones and robots, and have been doing so for years.
I know. I was specifically addressing the "follow me" class of drones mentioned that are getting a lot of press these days, such as AirDog, Hexo+, and 3DR IRIS+ (all APM:Copter based), and no one outside university or military settings that I know of has really integrated that data back into the flight controller as well as the Skydio demo suggests.
Sorry, it gets confusing since APM:Copter autopilot (I guess that's what they are calling it now) can run on PX4 hardware running NuttX too. So they are using the full PX4 flight stack as well?
Yep, although as I understand it they've worked so hard on it since the Kickstarter I'm not sure how much is still recognisable as the core ETH code. At the very least their software is rooted firmly in the PX4/Lorenz design ethos. One of the guys behind the project is also the author of some very nice PX4 and MAVLink tools which you may well have used: https://github.com/DrTon
[we have a nightmare at work trying to distinguish between original PX4FMU running PX4, Pixhawks running PX4, and Pixhawks running APM codebase in conversations]
Ah, I have seen DrTon's commits on the project. He has been rather prolific in the last several months. I'd love to connect with you outside of HN and hear what you are working on. Shoot me an email (in profile) if you see this.
Are there many drones out there carrying enough processing power to do serious machine vision (SLAM, obstacle avoidance) in real time from the drone's onboard sensors?
I'm sure there are... It's a problem begging to be "solved". The majority of the juice on a drone is taken up by the propellers, not the board running it. Though, to be fair, at that point weight is a bigger concern than electricity usage.
I don't know the exact hardware of these drones, but I managed to make SLAM work (with near-realtime dense mapping) on an iPhone 4S and up, so I think it's feasible.
Agree, the headline makes it sound like it was done live. In fact I don't see why the drone bit is relevant here; any shaky footage would have done.
There are some more challenges involved in using live footage to control a drone (using a companion computer and talking to APM over MAVLink to control the drone in real time).
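A rough sketch of what the companion-computer side might look like, using pymavlink to turn the tracker's pixel offsets into GUIDED-mode velocity setpoints; the serial port, gain, and pixel-to-velocity mapping are assumptions for illustration only:

```python
from pymavlink import mavutil

# A companion computer (e.g. a Raspberry Pi on the drone) runs the tracker
# and nudges the copter over MAVLink. Port and baud rate are examples.
master = mavutil.mavlink_connection('/dev/ttyAMA0', baud=57600)
master.wait_heartbeat()

def nudge_towards(offset_x_px, offset_y_px, gain=0.005):
    """Convert the tracked object's pixel offset from the image centre into
    a body-frame velocity setpoint (GUIDED mode). Gains are illustrative."""
    vy = gain * offset_x_px   # right/left
    vz = gain * offset_y_px   # down/up (NED: positive is down)
    master.mav.set_position_target_local_ned_send(
        0,                           # time_boot_ms (not used)
        master.target_system,
        master.target_component,
        mavutil.mavlink.MAV_FRAME_BODY_NED,
        0b0000111111000111,          # type_mask: use velocity fields only
        0, 0, 0,                     # position (ignored)
        0.5, vy, vz,                 # vx, vy, vz in m/s (slow forward creep)
        0, 0, 0,                     # acceleration (ignored)
        0, 0)                        # yaw, yaw_rate (ignored)
```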
This works better than I expected, but I'm wondering how well it will work if the angle changes (flying around the building) or other parameters change. But looks impressive for how little code it is!
If the angle changes, the details of the building's facade change as well. You can see in the code that he isolates the building's details from the first frame, then compares later frames against it to do the "tracking".
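For those who haven't seen it, this is essentially the standard OpenCV Meanshift recipe (histogram of the target region from the first frame, back-projection on later frames); the bounding-box coordinates below are made up, not OP's actual values:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture('upabove.mp4')  # the drone footage from the post

# Take the first frame and mark the building's bounding box
# (x, y, w, h are illustrative coordinates).
ret, frame = cap.read()
x, y, w, h = 300, 200, 100, 80
track_window = (x, y, w, h)

# Build a hue histogram of the region of interest from the first frame;
# this is the appearance model that later frames are compared against.
roi = frame[y:y+h, x:x+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Stop Meanshift after 10 iterations or when the window moves less than 1 px.
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    # Shift the window toward the peak of the back-projection.
    ret, track_window = cv2.meanShift(back_proj, track_window, term_crit)
    x, y, w, h = track_window
    cv2.rectangle(frame, (x, y), (x + w, y + h), 255, 2)
    cv2.imshow('tracking', frame)
    if cv2.waitKey(30) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()
```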
[1] http://www.votchallenge.net/vot2014/download/vot_2014_paper....