Visual question answering using CNN+RNN (github.com/abhshkdz)
108 points by abhshkdz on Nov 30, 2015 | 9 comments



Amazing. I only had a chance to read the README.md but my question is this. What happens if you ask it questions that it could not possibly answer, as in if it were given a picture of the man playing tennis and you asked it what the score was? Is it capable of discerning between questions that cannot be answered (given a particular input) and those that can?


Priors from the language play a much bigger role in the answers that are predicted than the image itself. So for example, if you ask 'What color is ...?', irrespective of the image, it is more likely to spit out colors as the answer. The answers are usually well-aligned with the question that is being asked. 'Yes/no' for binary questions, 'red/blue/etc' for 'What color...', 'tennis/baseball/etc' for 'What sport...' and so on.
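To make the point about language priors concrete, here is a minimal sketch of a question-only baseline that never looks at the image. It is not the model from the repository; the function names (`question_type`, `fit_prior`, `predict`), the two-word prefix heuristic, and the toy question/answer pairs are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

def question_type(question, n_words=2):
    """Crude question type: the first n_words, lowercased (an assumed heuristic)."""
    return " ".join(question.lower().split()[:n_words])

def fit_prior(pairs):
    """Map each question type to its most frequent training answer."""
    by_type = defaultdict(Counter)
    for question, answer in pairs:
        by_type[question_type(question)][answer.lower()] += 1
    return {qtype: counts.most_common(1)[0][0] for qtype, counts in by_type.items()}

def predict(prior, question, default="yes"):
    """Answer from the question alone; fall back to a default for unseen types."""
    return prior.get(question_type(question), default)

# Toy, invented training pairs (not taken from the VQA dataset).
toy_pairs = [
    ("What color is the car?", "red"),
    ("What color is the sky?", "blue"),
    ("What color is the apple?", "red"),
    ("What sport is this?", "tennis"),
    ("Is the man smiling?", "yes"),
]

prior = fit_prior(toy_pairs)
print(predict(prior, "What color is the ball?"))
print(predict(prior, "What sport is being played?"))
```

A baseline like this always produces an answer of the right kind (a color for 'What color...', a sport for 'What sport...'), which is why well-aligned answers alone say little about whether the image was used.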


Is there a catch to the effectiveness of this?

I haven't seen it before and it seems pretty magical.


There is no catch, but it's far from perfect and hardly magical. Its accuracy tops out at ~55% on the VQA dataset (http://visualqa.org/), which is ~7% short of the state of the art.


Did you look at the examples? Many of them are wrong, and others are guessable from the question alone (e.g. "What shape is the plate?" I would say "round" without seeing a picture).


I have seen sites using captchas which ask such visual questions thinking that only a human can answer them. This project really makes me doubt the effectiveness of such techniques.


As it stands currently, it's quite far off from cracking captchas. :-)


How do you measure accuracy? Is this a new baseline?


Accuracy is measured as min((number of humans that provided that answer)/3, 1) i.e. 100% accurate if at least 3 humans provided that exact answer, as outlined here: http://visualqa.org/evaluation.html.

No, it's not a new baseline; this model is from the NIPS 2015 paper by Ren et al. (http://arxiv.org/abs/1505.02074).
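The accuracy rule quoted above can be sketched in a few lines. This is only an illustration of the min(count/3, 1) formula as stated in the comment; the function name `vqa_accuracy`, the lowercase/strip normalization, and the example answers are assumptions, and the official evaluation script may normalize answers differently.

```python
from collections import Counter

def vqa_accuracy(predicted, human_answers):
    """min(number of humans who gave the predicted answer / 3, 1)."""
    # Assumed normalization: case-insensitive, surrounding whitespace ignored.
    counts = Counter(a.strip().lower() for a in human_answers)
    return min(counts[predicted.strip().lower()] / 3.0, 1.0)

# Hypothetical human answers for a single question.
humans = ["round", "round", "oval", "circle", "circle",
          "circle", "oval", "square", "circular", "plate"]

print(vqa_accuracy("circle", humans))  # 3 humans agree, so full credit
print(vqa_accuracy("round", humans))   # 2 humans agree, so 2/3 credit
print(vqa_accuracy("red", humans))     # nobody gave this answer, so no credit
```

Averaging this per-question score over the whole test set gives the overall accuracy figure.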



