The simplest way to improve MD5 would be to add more rounds (but as others have mentioned, there are much better choices for any practical purpose).
RE your last point, I'm not quite sure what attack you're defending against there, but most file formats do not have a well-defined "canonical interpretation", much less so one that is serializable into bytes. (If you think it sounds easy, you haven't thought about it hard enough :P )
Well, for images I still think that there is such a thing, even if it's the least interesting example: the bitmap they ultimately render. (For formats that are allowed to produce different bitmaps, it gets more difficult and maybe impossible.)
For the general case of any file format: I agree this approach is the least simple/dumb/trivial and might even violate the spirit of my original comment itself. But it's still interesting. By "[canonical] interpretation", I just meant some way to fingerprint the content while understanding the format. e.g., if it's a tarball, sum the total number of files and directories inside it. Concatenate all their names in a well-defined order and hash that. I know you can't prevent collisions entirely, but it may be relatively cheap to make it so that 2 files that are different (with respect to the file format) are likely to be represented differently.
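To make the tarball idea concrete, here is a minimal Python sketch of that kind of format-aware fingerprint. The function names (`tar_fingerprint`, `make_tar`) are made up for illustration, sorting the names is just one possible "well-defined order", and SHA-256 is used purely for convenience; none of this is from the article.

```python
import hashlib
import io
import tarfile


def tar_fingerprint(tar_bytes: bytes) -> str:
    """Hash the member count plus the sorted member names of a tar archive.

    Illustrative only: member contents and metadata are ignored.
    """
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        names = sorted(member.name for member in tar.getmembers())

    h = hashlib.sha256()
    # Total number of files and directories comes first.
    h.update(str(len(names)).encode("ascii") + b"\n")
    for name in names:
        encoded = name.encode("utf-8", "surrogateescape")
        # Length-prefix each name so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(encoded).to_bytes(8, "big"))
        h.update(encoded)
    return h.hexdigest()


def make_tar(files: dict[str, bytes]) -> bytes:
    """Hypothetical helper: build an uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Two archives holding the same names in a different member order get the same fingerprint even though their bytes differ, and adding or removing a member changes it. Since it only looks at names, two archives with identical names but different file contents also fingerprint the same, so it could only ever supplement a hash of the raw file, not replace one.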
If for a moment we assume that you can do it reliably (which I personally doubt, even for "simple" formats) - what's the point? Why not just hash the original file? What's the benefit here?
For example, I could create two different PNGs that decode to the same bitmap. Or I could create one PNG that decodes to multiple different bitmaps, depending on which implementation decodes it (due to implementation bugs and/or under-defined areas of the specification). Or I could create a PNG that is also a valid ZIP archive.
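The first case is easy to see without building real PNGs. PNG stores its pixel data as a zlib/deflate stream, and the same bytes can be deflated in more than one way. A small Python sketch, where `pixels` is a made-up stand-in for raw bitmap data and not an actual PNG:

```python
import hashlib
import zlib

# Hypothetical raw bitmap data (not a real PNG, just bytes to compress).
pixels = bytes(range(256)) * 16

# Two valid encodings of the same data: stored (level 0) and max compression.
stored = zlib.compress(pixels, 0)
packed = zlib.compress(pixels, 9)

# The encoded bytes differ, so any hash of the "file" differs too...
file_hashes_differ = hashlib.md5(stored).digest() != hashlib.md5(packed).digest()

# ...but both decode to exactly the same content.
same_content = zlib.decompress(stored) == zlib.decompress(packed) == pixels
```

So a hash of the file and a hash of the decoded content answer different questions: the first distinguishes these two encodings, the second cannot.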
There's no security benefit, and I would have a hard time coming up with a practicality benefit. It's mostly just interesting to think about, especially in response to the article. The article is demonstrating fast MD5 second preimage attacks for various file formats (EDIT: apparently not preimage attacks, just collision attacks), so in response to that I'm wondering how these MD5-specific attacks might be mitigated, for fun. Consider it alternate history fiction in which we never discovered anything better than MD5 :)
In your examples, though:
> two different PNGs that decode to the same bitmap
But would the PNGs also have the same MD5 hash?
> one PNG that decodes to multiple different bitmaps, depending on which implementation decodes it
Yeah, that would be a challenge. Relying on implementation details, or results which are allowed to vary, wouldn't work. But since this is meant to supplement an existing MD5 hash, the idea is that the format consumer/interpreter would be in a good position to produce some format-aware fingerprint that is statistically likely enough to be different when the inputs are different.
Ah, OK, I think I misunderstood the article. If you are supplying both images to me, you could do that with the MD5 hashes. Although, I think if you could get them to generate the same bitmap, then the attack has been at least partially mitigated, by definition. Not completely, I admit, but I think it wouldn't qualify as the same attack shown in the article.