The two main problems with this approach are that it is not rotation-invariant and that it does not work well if the image is damaged or added to. A more robust system (that, admittedly, will take longer) is to use one of the affine-invariant feature detection algorithms pioneered by SIFT. SURF is a faster variant of SIFT that has many open-source implementations. Essentially, it scans chunks of the image at different scales and identifies features that remain peaks even as the chunk around them gets bigger. Once these are identified, they are described in a way that normalizes them to the same size and orientation for lookup. Since these features should presumably be scattered throughout the image, the image can be recognized even if some features are obscured or modified. It's certainly not as straightforward as a DCT metric on a downsampled image, but the nature of widespread image capture, creation, and manipulation usually requires this robustness.
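
To make this concrete, here is a rough sketch of local-feature matching using OpenCV (my own illustration, not Tineye's code; file names are placeholders). ORB is used because it ships freely with opencv-python, but cv2.SIFT_create() in recent builds works the same way: keypoints are detected across scales, described in a scale- and orientation-normalized form, and then matched between images.

    # Sketch of scale/rotation-robust matching with local features.
    # Assumes opencv-python; ORB stands in for SIFT/SURF here.
    import cv2

    def match_images(path_a, path_b, min_matches=10):
        img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
        img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

        detector = cv2.ORB_create(nfeatures=1000)
        kp_a, desc_a = detector.detectAndCompute(img_a, None)
        kp_b, desc_b = detector.detectAndCompute(img_b, None)
        if desc_a is None or desc_b is None:
            return False, []

        # Brute-force Hamming matcher with cross-checking; each descriptor
        # summarizes a small normalized patch, so matches survive rotation,
        # scaling, and partial occlusion of the image.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(desc_a, desc_b), key=lambda m: m.distance)
        return len(matches) >= min_matches, matches

    same, matches = match_images("original.jpg", "rotated_crop.jpg")
    print("probable match" if same else "no match", len(matches))

Because the decision rests on many independent local matches, cropping, rotating, or painting over part of the image still leaves enough matches to recognize it.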



As a followup (and this is rather unrelated to the original post), you can combine a feature detector with a statistical clustering algorithm to automatically identify the generic visual properties of objects in an unsupervised manner. One of the first papers attempting this is http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.... but many have followed it. From what I understand, these still simply check for the presence of features in an image rather than their spatial relationships ('a picture of a cat has two ears, a tail, and a blob for a body' versus 'a picture of a cat has two ears positioned on one end of a blob with a tail on the other end'). Nevertheless, they represent the current state of the art in automatic object classification and recognition.
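
As a rough sketch of that pipeline (my own illustration, assuming opencv-python and scikit-learn; file names are placeholders): local descriptors from a training set are clustered into a "visual vocabulary", and each image is then represented as a histogram of which visual words it contains - which, as noted above, records the presence of features but not their spatial layout.

    # Bag-of-visual-words sketch: cluster local descriptors into a vocabulary,
    # then describe each image by which "visual words" appear in it.
    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    def extract_descriptors(paths, detector):
        sets = []
        for p in paths:
            img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
            _, desc = detector.detectAndCompute(img, None)
            if desc is not None:
                sets.append(desc.astype(np.float32))
        return sets

    def build_vocabulary(descriptor_sets, k=200):
        stacked = np.vstack(descriptor_sets)
        return KMeans(n_clusters=k, n_init=5, random_state=0).fit(stacked)

    def bow_histogram(desc, vocab):
        words = vocab.predict(desc.astype(np.float32))
        hist, _ = np.histogram(words, bins=np.arange(vocab.n_clusters + 1))
        return hist / max(hist.sum(), 1)   # normalize away image size

    detector = cv2.ORB_create()
    train = extract_descriptors(["cat1.jpg", "cat2.jpg", "car1.jpg"], detector)
    vocab = build_vocabulary(train)
    histograms = [bow_histogram(d, vocab) for d in train]

The histograms can then be fed to any off-the-shelf classifier or clustering step; the spatial-relationship information mentioned above is exactly what gets thrown away here.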


I went to a talk by Yann LeCun [1] a few months back, "Learning Feature Hierarchies for Vision." The current state of the art in this field is mind-blowing to an outsider. The final demo was a program that he trained in a matter of seconds to recognize and distinguish faces of various random audience members, in real time, from different viewing angles.

[1] http://yann.lecun.com/


Is any of that software available?


Some of the libraries are available [1]. Probably not the whole kit and caboodle. Interestingly, there was a focus on special-purpose hardware that made convolutional network learning possible in real time. One of the other demos was an autonomous driving robot that learned to recognize obstacles in video. Again, just mind-blowing.

[1] http://www.cs.nyu.edu/~yann/software/index.html


How serious a problem is the lack of rotational invariance, given the purposes for which Tineye may be using perceptual hashing?

In Tineye's data set, wouldn't rotational issues tend to be edge cases, since the vast majority of images on the web are already properly oriented? And given that most of the edge cases are likely to involve 90, 180, or 270 degrees of rotation, the additional computational requirements to cover those would appear to be significantly less than those required for true rotation-invariant matching - i.e. wouldn't rotating the low-frequency image 90 degrees and generating a second hash, plus creating reverse-order hashes of the original and second hash, work?
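
For illustration only (this is not Tineye's code; it assumes Pillow and scipy are available), the rotation part of that suggestion could look like the sketch below: the usual DCT-of-a-downsampled-grayscale hash from the article, computed for the original plus its 90/180/270-degree rotations, with a query counted as a match if it is close to any of the stored hashes.

    # Sketch: cover 90/180/270-degree rotations by storing a hash per rotation.
    # Assumes Pillow and scipy; the hash is the standard DCT perceptual hash.
    import numpy as np
    from PIL import Image
    from scipy.fftpack import dct

    def phash(img, hash_size=8, highfreq_factor=4):
        size = hash_size * highfreq_factor
        small = img.convert("L").resize((size, size), Image.LANCZOS)
        pixels = np.asarray(small, dtype=np.float64)
        # 2-D DCT; keep only the low-frequency top-left block
        freqs = dct(dct(pixels, axis=0, norm="ortho"), axis=1, norm="ortho")
        low = freqs[:hash_size, :hash_size]
        return (low > np.median(low)).flatten()

    def rotation_hashes(path):
        img = Image.open(path)
        return [phash(img.rotate(a, expand=True)) for a in (0, 90, 180, 270)]

    def hamming(a, b):
        return int(np.count_nonzero(a != b))

    stored = rotation_hashes("original.jpg")
    query = phash(Image.open("maybe_rotated_copy.jpg"))
    print(min(hamming(query, h) for h in stored))   # small distance => match

This only covers the rotation half of the idea; mirrored or "reverse order" copies could be handled the same way, at the cost of a few more stored hashes per image.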

It seems to me that for applications such as cryptography, where the costs of false positives are high, a SIFT-based algorithm makes sense. But for a free, consumer-oriented search tool, might it be considered overkill?

Link to SIFT: http://en.wikipedia.org/wiki/Scale-invariant_feature_transfo...


The difference is that you're throwing ad-hoc enhancements on top of an underlying framework that can't be modified beyond its basic principle. Yes, if you are looking for exact or simply scaled duplicates on other websites, a vector search over downsampled images would work fine. However, SIFT/SURF is a much more principled approach that can be extended to handle more cases "when you need it." It's the difference between hand-writing HTML and using a template engine. At one scale, of course it makes sense to just copy and paste some HTML. But as things get hairier over time, you have to switch to something more powerful.


>approach that can be extended to handle more cases "when you need it."

I guess my take on TinEye is that the cases "when you need it" may lie outside their current target market segment. Getting people to use their service is probably more important than using a sophisticated algorithm.



