
Nope, it's perfectly acceptable for Goldman to keep the $100. Craig tried to play a game of semantics and lost, because semantics let Goldman weasel out by pointing to OS metadata.

The challenge clearly and repeatedly stated that Goldman would give $5000 to anyone who could compress a datafile such that the combined size of the file and its decompressor was smaller than the original datafile.

Craig did not compress the stored data. He split the file up into multiple smaller files, using "5" as the boundary. However, by creating 218 files he increased the amount of space required to store the data. Ergo, the combined size of the "compressed" files and the decompressor exceeded the size of the original file. It is only by excluding the space taken up by the metadata that the "compressed" files are smaller. Furthermore, elsewhere it was noted that "The FAQ clearly outlines that filesystem exploits are not true compression."
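In case the mechanics aren't obvious, here's a minimal sketch of the trick (Python; the random stand-in for the challenge file is mine):

    import os

    # Sketch of the trick described above: not real compression.
    # Drop every "5" byte and split the file there; the "decompressor"
    # re-inserts the byte by joining the pieces back together.
    DELIM = b"5"
    original = os.urandom(3 * 1024 * 1024)  # stand-in for the challenge file

    parts = original.split(DELIM)           # the "compressed" files
    assert DELIM.join(parts) == original    # the "decompression" step

    # The parts really do total fewer bytes than the original...
    print(sum(len(p) for p in parts), "<", len(original))
    # ...but only because the per-file metadata (directory entries,
    # the file count itself) is not charged against the output.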


This interpretation is inconsistent with Goldman's own statement about the original data that "the file size is 3145728". He didn't say "the file size is 3145728 plus some filesystem overhead", so by "file size" he meant the number of bytes in the file ... until he was outsmarted.

It's hardly a filesystem exploit if - again by Goldman's own statement - gunzip is allowable.


I suspect that if the challenge had been solved with a single file, Goldman would have tried to get out of paying by claiming that the program's size should include the size of the interpreter for its language, the libraries linked to it, the size of the command line needed to invoke it (including the pointer vector and null termination), not to mention the underlying kernel ...


I don't know Goldman, and I bet you don't either - but there's a pretty big difference between this solution (which clearly cheats the aim of the challenge) and a solution that actually compresses. People hate to reward cheaters, even if it's a fun kind of cheat from the outside. But that doesn't mean he wouldn't have paid out for a real solution, which likely would have been quite interesting (and not quite as impossible as it's being made out to be, since we don't know whether his random source is truly random).


What does "actually" compressing mean?

Replacing every "5" with EOF is apparently bad.

What if he replaced every "5z" with EOF? Fewer bytes there.

What if he had a variant of LZ77 doing dictionary encoding, followed by a range encoder over an alphabet of symbols -1 through 255 (257 symbols, with -1 serving as EOF)? Even counting the EOF as a character, this would give an output 2K characters smaller. Sounds like compression to me: it finds common and uncommon sequences and rescales them by probability to remove redundancy.
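For scale, a rough back-of-envelope (my arithmetic, not from the thread) on what carrying EOF as a 257th symbol costs over a uniformly random 3 MB file:

    import math

    # Widening the alphabet from 256 byte values to 257 (bytes + EOF)
    # costs log2(257) - log2(256) bits per symbol under a uniform model.
    n = 3 * 1024 * 1024                    # 3145728 bytes
    extra = math.log2(257) - math.log2(256)
    print(extra)                           # ~0.0056 bits per symbol
    print(extra * n / 8)                   # ~2.2 KB over the whole file

So the EOF bookkeeping by itself is on the order of 2 KB across the file.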


"compress a datafile such that ..." appears to give a self-contained definition of what it means to compress in the context of the challenge.

In any case, no compression algorithm makes all inputs smaller. So the fact that an encoding doesn't shrink one particular input doesn't imply that it isn't a compression algorithm.
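That's just the pigeonhole principle; a quick sketch:

    # There are 2**n files of exactly n bits, but only 2**n - 1 files
    # shorter than n bits, so no lossless compressor can map every
    # n-bit input to a strictly shorter output.
    n = 16
    shorter = sum(2 ** k for k in range(n))  # files of length 0 .. n-1
    assert shorter == 2 ** n - 1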



