As you can see, binaries submitted for analysis are
identified by their MD5 sums and no sandboxed execution is
recorded if there is a duplicate (thus the shorter time
delay). This means that if I can create two files with the
same MD5 sum – one that behaves in a malicious way while the
other doesn’t – I can “poison” the database of the product
so that it won’t even try to analyze the malicious sample!
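So it's a technique to get the scanner to skip a malicious binary: you construct a harmless file with the same MD5 sum and submit that one first, so the malicious file is later treated as a duplicate that has already been analyzed. This would be much harder if the scanner used SHA-1, and impractical with a hash that has no known collision attacks, such as SHA-256.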
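To make the mechanism concrete, here is a minimal sketch of a verdict cache keyed by a file digest. Everything in it is hypothetical: `VerdictCache`, `toy_analyze` and `weak_digest` are names I made up, and this is not how any particular product is implemented. Since I can't generate a real MD5 collision in a few lines, `weak_digest` truncates MD5 to 8 bits as a stand-in for "a hash where collisions are feasible", so a colliding harmless file can be found by brute force.

```python
import hashlib
from typing import Callable, Dict


def weak_digest(data: bytes) -> str:
    # ASSUMPTION: stand-in for a broken hash. Only the first 2 hex chars
    # (8 bits) of MD5 are kept so that collisions are trivial to find.
    return hashlib.md5(data).hexdigest()[:2]


def strong_digest(data: bytes) -> str:
    # Full SHA-256, for which no practical collision attack is known.
    return hashlib.sha256(data).hexdigest()


def toy_analyze(sample: bytes) -> str:
    # Hypothetical "sandbox": flags anything containing the marker b"EVIL".
    return "malicious" if b"EVIL" in sample else "clean"


class VerdictCache:
    """Hypothetical scanner front end that skips analysis for known digests."""

    def __init__(self, digest: Callable[[bytes], str]) -> None:
        self.digest = digest
        self.verdicts: Dict[str, str] = {}
        self.analyses = 0  # how many times the expensive analysis ran

    def scan(self, sample: bytes, analyze: Callable[[bytes], str]) -> str:
        key = self.digest(sample)
        if key in self.verdicts:
            # Duplicate by digest: reuse the stored verdict, no new analysis.
            return self.verdicts[key]
        self.analyses += 1
        self.verdicts[key] = analyze(sample)
        return self.verdicts[key]


def find_colliding_benign(target: bytes, digest: Callable[[bytes], str]) -> bytes:
    # Brute-force a harmless input with the same digest as the target.
    # Only feasible here because weak_digest has just 256 possible values.
    wanted = digest(target)
    counter = 0
    while True:
        candidate = b"harmless-%d" % counter
        if digest(candidate) == wanted:
            return candidate
        counter += 1


malicious = b"EVIL payload"
benign = find_colliding_benign(malicious, weak_digest)

poisoned = VerdictCache(weak_digest)
poisoned.scan(benign, toy_analyze)            # harmless file goes in first
print(poisoned.scan(malicious, toy_analyze))  # verdict reused from the harmless file

robust = VerdictCache(strong_digest)
robust.scan(benign, toy_analyze)
print(robust.scan(malicious, toy_analyze))    # digests differ, so it is analyzed
```

With the weak digest the malicious sample never reaches `toy_analyze`; with SHA-256 the two files get different keys and both are analyzed.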
virustotal.com allows you to upload files to be scanned by a whole range of anti-virus programs. Before uploading, it calculates the hash of your file client-side to decide whether the file needs to be uploaded at all, or whether a previously uploaded file (from someone else) with the same hash should simply be re-scanned with newer versions of the anti-virus engines.
I don't know which hashing algorithm they use; I mention it only as an example of hash-based deduplication in a situation where no whitelist is involved.
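As a rough illustration of that "hash first, upload only if unknown" flow, here is a small sketch. It is not VirusTotal's actual code or API: `file_digest`, `decide_action` and the `known_digests` set are hypothetical, and SHA-256 is purely my assumption since I don't know which algorithm the site uses.

```python
import hashlib
import io
from typing import BinaryIO, Set


def file_digest(stream: BinaryIO, chunk_size: int = 65536) -> str:
    # Hash the file in chunks so large files need not fit in memory.
    # ASSUMPTION: SHA-256; the real service may use something else.
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def decide_action(stream: BinaryIO, known_digests: Set[str]) -> str:
    # Hypothetical decision: if the digest is already known, offer a re-scan
    # of the stored copy; otherwise the file has to be uploaded.
    if file_digest(stream) in known_digests:
        return "rescan"
    return "upload"


# Hypothetical server-side set of digests for files someone already submitted.
known = {hashlib.sha256(b"already seen").hexdigest()}

print(decide_action(io.BytesIO(b"already seen"), known))
print(decide_action(io.BytesIO(b"brand new"), known))
```

The point is only that the stored copy is trusted to be "the same file" purely on the strength of the digest, which is why the choice of hash matters.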
Yes, I think that's what the author was alluding to here, although I'm not sure:
The approach may work with traditional AV software too as
many of these also use fingerprinting (not necessarily MD5)
to avoid wasting resources on scanning the same files over
and over (although the RC4 encryption results in VT 0/57
anyway…).