The two main problems with this approach are that it is not rotation-invariant and that it does not work well if the image is damaged or added to. A more robust system (that, admittedly, will take longer) is to use one of the affine-invariant feature detection algorithms pioneered by SIFT. SURF is a faster variant of SIFT that has many open-source implementations. Essentially, it scans chunks of the image at different scales and identifies features that remain peaks even as the chunk around them gets bigger. Once these are identified, they are described in a way that normalizes them to the same size and orientation for lookup. Since these features should presumably be scattered throughout the image, the image can be recognized even if some features are obscured or modified. It's certainly not as straightforward as a DCT metric on a downsampled image, but the nature of widespread image capture, creation, and manipulation usually requires this robustness.
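
To make this concrete, here is a rough sketch of local-feature matching using OpenCV (my own illustration, not Tineye's code; file names are placeholders). ORB is used because it ships freely with opencv-python, but cv2.SIFT_create() in recent builds works the same way: keypoints are detected across scales, described in a scale- and orientation-normalized form, and then matched between images.

    # Sketch of scale/rotation-robust matching with local features.
    # Assumes opencv-python; ORB stands in for SIFT/SURF here.
    import cv2

    def match_images(path_a, path_b, min_matches=10):
        img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
        img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

        detector = cv2.ORB_create(nfeatures=1000)
        kp_a, desc_a = detector.detectAndCompute(img_a, None)
        kp_b, desc_b = detector.detectAndCompute(img_b, None)
        if desc_a is None or desc_b is None:
            return False, []

        # Brute-force Hamming matcher with cross-checking; each descriptor
        # summarizes a small normalized patch, so matches survive rotation,
        # scaling, and partial occlusion of the image.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(desc_a, desc_b), key=lambda m: m.distance)
        return len(matches) >= min_matches, matches

    same, matches = match_images("original.jpg", "rotated_crop.jpg")
    print("probable match" if same else "no match", len(matches))

Because the decision rests on many independent local matches, cropping, rotating, or painting over part of the image still leaves enough matches to recognize it.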



As a followup (and this is rather unrelated to the original post), you can combine a feature detector with a statistical clustering algorithm to automatically identify the generic visual properties of objects in an unsupervised manner. One of the first papers attempting this is http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.... but many have followed it. From what I understand, these still simply check for the presence of features in an image rather than their spatial relationships ('a picture of a cat has two ears, a tail, and a blob for a body' versus 'a picture of a cat has two ears positioned on one end of a blob with a tail on the other end'). Nevertheless, they represent the current state of the art in automatic object classification and recognition.
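
As a rough sketch of that pipeline (my own illustration, assuming opencv-python and scikit-learn; file names are placeholders): local descriptors from a training set are clustered into a "visual vocabulary", and each image is then represented as a histogram of which visual words it contains - which, as noted above, records the presence of features but not their spatial layout.

    # Bag-of-visual-words sketch: cluster local descriptors into a vocabulary,
    # then describe each image by which "visual words" appear in it.
    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    def extract_descriptors(paths, detector):
        sets = []
        for p in paths:
            img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
            _, desc = detector.detectAndCompute(img, None)
            if desc is not None:
                sets.append(desc.astype(np.float32))
        return sets

    def build_vocabulary(descriptor_sets, k=200):
        stacked = np.vstack(descriptor_sets)
        return KMeans(n_clusters=k, n_init=5, random_state=0).fit(stacked)

    def bow_histogram(desc, vocab):
        words = vocab.predict(desc.astype(np.float32))
        hist, _ = np.histogram(words, bins=np.arange(vocab.n_clusters + 1))
        return hist / max(hist.sum(), 1)   # normalize away image size

    detector = cv2.ORB_create()
    train = extract_descriptors(["cat1.jpg", "cat2.jpg", "car1.jpg"], detector)
    vocab = build_vocabulary(train)
    histograms = [bow_histogram(d, vocab) for d in train]

The histograms can then be fed to any off-the-shelf classifier or clustering step; the spatial-relationship information mentioned above is exactly what gets thrown away here.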


I went to a talk by Yann LeCun [1] a few months back, "Learning Feature Hierarchies for Vision." The current state of the art in this field is mind-blowing to an outsider. The final demo was a program that he trained in a matter of seconds to recognize and distinguish faces of various random audience members, in real time, from different viewing angles.

[1] http://yann.lecun.com/


Is any of that software available?


Some of the libraries are available [1]. Probably not the whole kit and caboodle. Interestingly, there was a focus on special-purpose hardware that made convolutional network learning possible in real time. One of the other demos was an autonomous driving robot that learned to recognize obstacles in video. Again, just mind-blowing.

[1] http://www.cs.nyu.edu/~yann/software/index.html


How serious a problem is the lack of rotational invariance, given the purposes for which Tineye may be using perceptual hashing?

In Tineye's data set, wouldn't rotational issues tend to be edge cases, since the vast majority of images on the web are already properly oriented? And given that most of the edge cases are likely to involve 90, 180, or 270 degrees of rotation, the additional computational requirements to cover those would appear to be significantly less than those required for true rotation-invariant matching - i.e. wouldn't rotating the low-frequency image 90 degrees and generating a second hash, plus creating reverse-order hashes of the original and second hash, work?
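
For illustration only (this is not Tineye's code; it assumes Pillow and scipy are available), the rotation part of that suggestion could look like the sketch below: the usual DCT-of-a-downsampled-grayscale hash from the article, computed for the original plus its 90/180/270-degree rotations, with a query counted as a match if it is close to any of the stored hashes.

    # Sketch: cover 90/180/270-degree rotations by storing a hash per rotation.
    # Assumes Pillow and scipy; the hash is the standard DCT perceptual hash.
    import numpy as np
    from PIL import Image
    from scipy.fftpack import dct

    def phash(img, hash_size=8, highfreq_factor=4):
        size = hash_size * highfreq_factor
        small = img.convert("L").resize((size, size), Image.LANCZOS)
        pixels = np.asarray(small, dtype=np.float64)
        # 2-D DCT; keep only the low-frequency top-left block
        freqs = dct(dct(pixels, axis=0, norm="ortho"), axis=1, norm="ortho")
        low = freqs[:hash_size, :hash_size]
        return (low > np.median(low)).flatten()

    def rotation_hashes(path):
        img = Image.open(path)
        return [phash(img.rotate(a, expand=True)) for a in (0, 90, 180, 270)]

    def hamming(a, b):
        return int(np.count_nonzero(a != b))

    stored = rotation_hashes("original.jpg")
    query = phash(Image.open("maybe_rotated_copy.jpg"))
    print(min(hamming(query, h) for h in stored))   # small distance => match

This only covers the rotation half of the idea; mirrored or "reverse order" copies could be handled the same way, at the cost of a few more stored hashes per image.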

It seems to me that for applications such as cryptography, where the costs of false positives are high, a SIFT-based algorithm makes sense. But for a free, consumer-oriented search tool, might it be considered overkill?

Link to SIFT: http://en.wikipedia.org/wiki/Scale-invariant_feature_transfo...


The difference is that you're throwing ad-hoc enhancements on top of an underlying framework that can't be modified beyond its basic principle. Yes, if you are looking for exact or simply scaled duplicates on other websites, a vector search over downsampled images would work fine. However, SIFT/SURF is a much more principled approach that can be extended to handle more cases "when you need it." It's the difference between hand-writing HTML and using a template engine. At one scale, of course it makes sense to just copy and paste some HTML. But as things get hairier over time, you have to switch to something more powerful.


>approach that can be extended to handle more cases "when you need it."

I guess my take on TinEye is that the cases "when you need it" may lie outside their current target market segment. Getting people to use their service is probably more important than using a sophisticated algorithm.



