Speaking of CV + birds, I knew the guys behind a small company called Ornicept.
Basically, wind turbines kill birds. To protect birds, you have to collect data on birds in the area. The standard way of doing this was literally putting a guy in a lawn chair for a few hours each day to count birds, and then extrapolating the numbers.
They put some cameras in the field and ran CV on the footage to count the birds instead. Cameras are cheaper than people in lawn chairs, apparently. I don't know if they still do this, but it was their initial product.
Nice. They use deep convolutional neural nets, the class of algorithm that has dominated the ILSVRC since 2012. This computer vision contest included, among other challenges, recognizing different bird (and dog, cat, spider, ...) breeds. There is a blog post about ILSVRC that even uses the same xkcd comic ;) http://blog.a9t9.com/2014/10/amazing-progress-of-computer-vi...
I'm seeing a lot of pictures of flowers and bees in the bird pictures. I'm guessing this is due to the way the training is done, anyone from Flickr who can comment on that?
Also, the exact question you're trying to answer makes a difference. If the question is, "Is this a picture of a bird?" where "of a bird" means that the primary focus of the picture is a single bird, then your picture should return "No". That's different than, "Are there one or more birds in this picture?"
I'm not sure which question Flickr's team actually seeks to answer here.
It's cool that a team of deep learning researchers can pull this off quickly. Anyone know of an "image2vec" (word2vec for images) that would empower others to try out similar things? (unfortunately it would need a better name, because "vectorize" means something different for images.)
I think you'd need something slightly more complex than a "word2vec", since images already have a well-defined "word vector", i.e. a pixel. What you want is a "parser" that can take in an image and spit out the significant parts of it. Stanford might have the code from this paper (http://machinelearning.wustl.edu/mlpapers/paper_files/ICML20...) up on their site.
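For what it's worth, the usual "image2vec" trick is to take the activations from a late layer of a trained CNN as a fixed-length descriptor. Here's a toy sketch of the idea in plain NumPy, with random filters standing in for learned ones (so it's only illustrative, not what Flickr or anyone else actually ships): convolve the image with a small filter bank, ReLU, then global-average-pool each response map down to one number. The result is a fixed-length vector regardless of image size, and near-duplicate images land close together.

```python
import numpy as np

def image2vec(img, filters):
    """Toy image embedding: convolve a grayscale image with a filter
    bank, apply ReLU, then global-average-pool each response map to a
    single number. Output length = number of filters, independent of
    image size."""
    h, w = img.shape
    k = filters.shape[1]  # filters are (n, k, k)
    vec = []
    for f in filters:
        # naive "valid" convolution (no padding)
        out = np.empty((h - k + 1, w - k + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i + k, j:j + k] * f)
        vec.append(np.maximum(out, 0).mean())  # ReLU + global average pool
    return np.array(vec)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

rng = np.random.default_rng(0)
filters = rng.standard_normal((8, 3, 3))  # 8 random 3x3 filters

img_a = rng.random((16, 16))
img_b = img_a + 0.01 * rng.standard_normal((16, 16))  # near-duplicate

va = image2vec(img_a, filters)
vb = image2vec(img_b, filters)
print(va.shape)             # 8-dimensional, whatever the image size
print(cosine(va, vb))       # near-duplicates embed very close together
```

In a real system you'd replace the random filters with the learned layers of a network trained on something like ImageNet, and take a much deeper layer's activations; the pooling-to-fixed-length idea is the same.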
Do you just want the neural net, or the tags too? This might not be what you want, but it looks like it could be fun to play with: http://clarifai.com/
What would be very cool would be some kind of trigger app where you leave your phone in a tree for an hour, and it takes pictures of all the birds that come within range.
Detecting whether a given image has a bird in it, while certainly difficult and "interesting" from a CS point of view, doesn't seem very interesting from a user perspective. (Photographers can always tag their images when they submit them.)
The advantage from a user perspective is that, in practice, people don't tag their images, but others still want to find e.g. CC-licensed pictures of blue jays in flight.
I'm not sure this refutes the main point - it does appear to be a team of research scientists, they have likely been working on it for a year or more, and (since upthread mentions it not ID'ing Big Bird or a peacock) it's not finished yet.
Someone else mentioned in another thread that "check whether they're in a national park" is only easy now, thanks to 30 years of monumental effort (launching satellites etc). To bring this back to Hacker News, I wonder how long until vision processing will be in the same place? Where it can be taken for granted?
Hate this jokey style of response. Is it stating that the image is from Rocky Mountain? Or is it just a random comment-bot statement about national parks? Bleurgh.
Yes, I'm sure there's a demographic that enjoys inane comments written as if the presentation layer was conscious.