Speaking of CV + birds, I knew the guys behind a small company called Ornicept.
Basically, wind turbines kill birds. To protect birds, you have to collect data on birds in the area. The standard way of doing this was literally putting a guy in a lawn chair for a few hours each day to count birds, and then extrapolating the numbers.
They put some cameras in the field and ran CV on the footage to count the birds instead. Cameras are cheaper than people in lawn chairs, apparently. I don't know if they still do this, but it was their initial product.
Nice. They use deep convolutional neural nets, the class of algorithm that has dominated the ILSVRC since 2012. This computer vision contest included, among other challenges, recognizing different bird (and dog, cat, spider, ...) breeds. There is a blog post about ILSVRC that even uses the same xkcd comic ;) http://blog.a9t9.com/2014/10/amazing-progress-of-computer-vi...
I'm seeing a lot of pictures of flowers and bees in the bird pictures. I'm guessing this is due to the way the training is done, anyone from Flickr who can comment on that?
Also, the exact question you're trying to answer makes a difference. If the question is, "Is this a picture of a bird?" where "of a bird" means that the primary focus of the picture is a single bird, then your picture should return "No". That's different than, "Are there one or more birds in this picture?"
I'm not sure which question Flickr's team actually seeks to answer here.
It's cool that a team of deep learning researchers can pull this off quickly. Anyone know of an "image2vec" (word2vec for images) that would empower others to try out similar things? (unfortunately it would need a better name, because "vectorize" means something different for images.)
I think you'd need something slightly more complex than a "word2vec", since images already have a well-defined "word vector", i.e. a pixel. What you want is a "parser" that can take in an image and spit out the significant parts of it. Stanford might have the code from this paper (http://machinelearning.wustl.edu/mlpapers/paper_files/ICML20...) up on their site.
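For what it's worth, the usual "image2vec" trick is to take the activations from a late layer of a trained CNN as a fixed-length descriptor. Here's a toy sketch of the idea in plain NumPy, with random filters standing in for learned ones (so it's only illustrative, not what Flickr or anyone else actually ships): convolve the image with a small filter bank, ReLU, then global-average-pool each response map down to one number. The result is a fixed-length vector regardless of image size, and near-duplicate images land close together.

```python
import numpy as np

def image2vec(img, filters):
    """Toy image embedding: convolve a grayscale image with a filter
    bank, apply ReLU, then global-average-pool each response map to a
    single number. Output length = number of filters, independent of
    image size."""
    h, w = img.shape
    k = filters.shape[1]  # filters are (n, k, k)
    vec = []
    for f in filters:
        # naive "valid" convolution (no padding)
        out = np.empty((h - k + 1, w - k + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i + k, j:j + k] * f)
        vec.append(np.maximum(out, 0).mean())  # ReLU + global average pool
    return np.array(vec)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

rng = np.random.default_rng(0)
filters = rng.standard_normal((8, 3, 3))  # 8 random 3x3 filters

img_a = rng.random((16, 16))
img_b = img_a + 0.01 * rng.standard_normal((16, 16))  # near-duplicate

va = image2vec(img_a, filters)
vb = image2vec(img_b, filters)
print(va.shape)             # 8-dimensional, whatever the image size
print(cosine(va, vb))       # near-duplicates embed very close together
```

In a real system you'd replace the random filters with the learned layers of a network trained on something like ImageNet, and take a much deeper layer's activations; the pooling-to-fixed-length idea is the same.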
Do you just want the neural net, or the tags too? This might not be what you want, but it looks like it could be fun to play with: http://clarifai.com/
What would be very cool would be some kind of trigger app where you leave your phone in a tree for an hour, and it takes pictures of all the birds that come within range.
Detecting whether a given image has a bird in it, while certainly difficult and "interesting" from a CS point of view, doesn't seem very interesting from a user perspective. (Photographers can always tag their images when they submit them.)
The advantage from a user perspective is that, in practice, people don't tag their images, but others still want to find e.g. CC-licensed pictures of blue jays in flight.
I'm not sure this refutes the main point - it does appear to be a team of research scientists, they have likely been working on it for a year or more, and (since upthread mentions it not ID'ing Big Bird or a peacock) it's not finished yet.
Someone else mentioned in another thread that "check whether they're in a national park" is only easy now, thanks to 30 years of monumental effort (launching satellites etc). To bring this back to Hacker News, I wonder how long until vision processing will be in the same place? Where it can be taken for granted?
Hate this jokey style of response. Is it stating that the image is from Rocky Mountain? Or is it just a random comment-bot statement about national parks? Bleurgh.
Yes, I'm sure there's a demographic that enjoys inane comments written as if the presentation layer was conscious.