
I have a feeling there is a deep connection between perceptual hashes and compressed sensing. Could someone more familiar with the latter weigh in?



Kinda sorta not really. Compressed sensing and sparse coding show that, under certain sparsity assumptions, you can perfectly reconstruct your original data from fewer measurements than previously thought. It is a coding principle.

Perceptual hashes (or hashes in general) are used for fast indexing and retrieval. You cannot recreate the original data from a hash, pretty much by definition.

So hashes and coding algorithms both provide smaller representations of data. But hashes are used for indexing and do not provide the original data, or even an approximation thereof, while compressed sensing can.
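To make that contrast concrete, here is a minimal sketch (mine, not from the article) of the compressed-sensing side: a k-sparse signal recovered exactly from far fewer random measurements than its length, using a bare-bones orthogonal matching pursuit in NumPy. The sizes and the solver choice are illustrative assumptions, not a production CS pipeline.

    # Minimal sketch: sizes and solver are illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)

    n, m, k = 256, 64, 5          # signal length, number of measurements, sparsity
    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)  # a k-sparse signal

    A = rng.standard_normal((m, n)) / np.sqrt(m)  # random measurement matrix
    y = A @ x                                     # m << n linear measurements

    def omp(A, y, k):
        """Greedy orthogonal matching pursuit: recover a k-sparse x from y = A x."""
        residual, support = y.copy(), []
        for _ in range(k):
            support.append(int(np.argmax(np.abs(A.T @ residual))))
            coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            residual = y - A[:, support] @ coef
        x_hat = np.zeros(A.shape[1])
        x_hat[support] = coef
        return x_hat

    x_hat = omp(A, y, k)
    print("recovery error:", np.linalg.norm(x - x_hat))  # ~0 with high probability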


The key thing here is that compressed sensing is an attempt to throw away redundant data as early as possible in the process. Any data that is redundant in the actual data stream, or that can be inferred from prior knowledge about the stream, does not need to be measured.

Perceptual hashing is instead an attempt to make the matching problem easier by throwing away data that is seen as irrelevant. In the case of the described algorithm, low frequencies are chosen as relevant and high frequencies as irrelevant. As you point out, this involves losing the ability to recover the original signal and adds the risk of mismatching in cases where data thought to be irrelevant to the task is actually relevant.
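For reference, a rough sketch of a DCT-based perceptual hash along those lines (shrink, grayscale, 2-D DCT, keep the low frequencies, threshold). The function names and parameters here are my own, and the article's exact steps may differ in detail; for example, some variants threshold against the mean excluding the DC term rather than the median.

    import numpy as np
    from PIL import Image
    from scipy.fft import dct

    def perceptual_hash(path, hash_size=8, highfreq_factor=4):
        # Steps and parameters are assumptions; the article's variant may differ.
        size = hash_size * highfreq_factor                         # e.g. 32x32
        img = Image.open(path).convert("L").resize((size, size))   # shrink + grayscale
        pixels = np.asarray(img, dtype=np.float64)
        coeffs = dct(dct(pixels, axis=0, norm="ortho"), axis=1, norm="ortho")  # 2-D DCT
        low = coeffs[:hash_size, :hash_size]       # keep only the low frequencies
        return (low > np.median(low)).flatten()    # threshold -> 64-bit hash

    def hamming(h1, h2):
        return int(np.count_nonzero(h1 != h2))     # small distance ~ "looks like it"

    # usage (hypothetical file names):
    # print(hamming(perceptual_hash("cat.jpg"), perceptual_hash("cat_scaled.jpg")))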


If we take the principles from compressed sensing and use a random-lens approach to subsampling the original image, we can create a fingerprint of the image from which the original can also be reconstructed.

Both techniques are forms of compression that rely on the sparse properties of images to divine which bits are meaningful and which are redundant. It appears to me that using compressed sensing is just a smarter way of doing it. Maybe a hash that starts with a random subsample is inherently slower for comparing millions of images, but I shouldn't think so.
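Purely as an illustration of that idea (not the article's method, and the names here are made up): keep a handful of random linear measurements of a shrunken image as the fingerprint, and compare measurement vectors directly. Under the usual sparsity assumptions, the very same measurements could also be handed to a sparse solver, like the OMP sketch above with a DCT dictionary, to approximately reconstruct the image.

    # Illustrative only: a compressed-sensing-style fingerprint. The same random
    # measurements used for matching could, under sparsity assumptions, also be
    # fed to a sparse solver to approximately reconstruct the image.
    import numpy as np
    from PIL import Image

    rng = np.random.default_rng(42)
    SIZE = 32                                  # 32x32 grayscale thumbnail
    M = 64                                     # number of random measurements
    A = rng.standard_normal((M, SIZE * SIZE))  # fixed "random lens", shared by all images

    def cs_fingerprint(path):
        img = Image.open(path).convert("L").resize((SIZE, SIZE))
        x = np.asarray(img, dtype=np.float64).ravel()
        x = (x - x.mean()) / (x.std() + 1e-9)  # normalize so distances are comparable
        return A @ x                           # M real-valued measurements

    def distance(f1, f2):
        return float(np.linalg.norm(f1 - f2))  # nearby images -> small distance

    # usage (hypothetical file names):
    # d = distance(cs_fingerprint("cat.jpg"), cs_fingerprint("cat_cropped.jpg"))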


I agree. Briefly looking at the description, it is some sort of compressed sensing. The differences from traditional CS are in fact minimal, and the scheme is in line with some of the work undertaken in manifold signal processing. The differences are:

- The proposed hash is deterministic. In CS you generally want to rely on random projections (though there are some results for deterministic schemes) in order to get some sort of universality and, by the same token, some sort of robustness.

- Steps 3 and 4 are the most fascinating, because they are clearly one of the approaches used in manifold signal processing for images. To summarize: in order for pictures to be close to each other on a manifold, you really want to defocus them. I'll put something on my blog on the matter. This is why the hashes of two images that sit next to each other are close in the "hash" or manifold space.

- For one image, the hash seems to provide 16 measurements (16 bits of the hash result). That would be OK if the initial picture were at the size and color of the picture after steps 1 and 2, so in effect that information is lost. However, in CS you also have "lossy" schemes such as the 1-bit compressed sensing approach, where you retain only the sign of each measurement (a little bit like step 6). The reconstructions from these 1-bit measurements are not the originals, but they are close. (A toy sketch of both ideas follows below.)
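Here is that toy sketch: defocus (blur) before measuring, then keep only the sign of each random projection, 1-bit compressed sensing style. It is not the exact scheme from the article or the comment above, just an illustration of why a small perturbation of an image produces a small Hamming distance between the resulting bit strings.

    # Sketch of the two ideas above: defocus before measuring, then keep only
    # the sign of each random projection (1-bit CS style). Sizes are assumptions.
    import numpy as np
    from scipy.ndimage import gaussian_filter, shift

    rng = np.random.default_rng(7)
    SIZE, BITS = 32, 64
    A = rng.standard_normal((BITS, SIZE * SIZE))  # shared random projections

    def one_bit_hash(img):
        blurred = gaussian_filter(img.astype(np.float64), sigma=2.0)  # "defocus"
        x = blurred.ravel()
        x = (x - x.mean()) / (x.std() + 1e-9)
        return np.signbit(A @ x)                  # keep only the sign of each measurement

    # a toy image and a slightly shifted copy of it
    img = rng.random((SIZE, SIZE))
    img_shifted = shift(img, (1, 0), mode="nearest")

    h1, h2 = one_bit_hash(img), one_bit_hash(img_shifted)
    print("Hamming distance:", int(np.count_nonzero(h1 != h2)), "of", BITS, "bits")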

(ps: I write a small blog on CS).


I developed my answer here:

Are Perceptual Hashes an instance of Compressive Sensing ? http://nuit-blanche.blogspot.com/2011/06/are-perceptual-hash...


Just an interesting note: from an implementation perspective, random projection is used in both cases (LSH and CS).
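For instance (an illustrative sketch with made-up names, not any particular library's API), the same sign-of-random-projection primitive can serve as an LSH index: the leading bits of the code pick a bucket, and only candidates in the same bucket get a full Hamming-distance comparison.

    # Illustrative sketch: one random projection matrix, two uses.
    # sign(A @ x) gives LSH bits; A @ x alone would be CS-style measurements.
    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(0)
    DIM, BITS, PREFIX = 1024, 64, 12            # feature size, code length, bucket bits
    A = rng.standard_normal((BITS, DIM))

    def code(x):
        return np.signbit(A @ x)                # 64-bit signed random projection

    index = defaultdict(list)                   # bucket key -> [(item_id, full code), ...]

    def add(item_id, x):
        bits = code(x)
        index[tuple(bits[:PREFIX])].append((item_id, bits))

    def query(x, max_dist=8):
        bits = code(x)
        candidates = index.get(tuple(bits[:PREFIX]), [])
        return [i for i, b in candidates if np.count_nonzero(b != bits) <= max_dist]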



