Mike Goldman originally wrote the challenge such that it calls for one file and one decompressor.
However, when subsequently asked whether there can be multiple files, he agreed; thereby he was arguably duped. He didn't say "okay, but there will be a 256 byte size penalty per additional file", he just plainly agreed.
This means that the original formula for adding the size of the solution applies: just the file sizes added together.
Goldman should accept that he foolishly rushed into a careless amendment of his original challenge and pay the money.
That said, it obviously is cheating to have the archive format or file system hide the representation of where the removed bytes are! If a single file is produced, it has to include a table of where to insert the bytes that were taken out. If multiple files are produced, the archive format or file system stores that information for you at considerable expense.
If both people are wrong, the contest should be declared invalid and Goldman should return the $100.
If only Goldman is wrong, he should pay $5000.
Under no interpretation is Goldman strictly right and the contestant strictly wrong.
> He didn't say "okay, but there will be a 256 byte size penalty per additional file", he just plainly agreed.
Exactly. IMO the challenge-setter is the one being more unfair here, since his metadata-based reason to reject the solution applies to submissions made under the original rules too. He accepted an amendment without counter-amending to cover that loophole, so if he stands by his word, he has to pay up.
It casts doubt on whether he would pay if somebody won by the original rules with some luck. Maybe he has an implicit allowance, that merely two files can't "encode" a big enough advantage, but again, then it's his fault for not addressing that in the new rules.
It's effectively a prop bet. If you're a world class table tennis champion and you bet someone you can beat them at table tennis on the proviso that they get to pick the bats, you can't really complain if you find out later that they have spent six months practicing with frying pans. You just have to pay up.
If you don't see the loophole before you agree, you pay. Actually there's a great little bit about this in the movie Guys And Dolls (1955), and what to do if someone bets you that he can make a jack of spades squirt cider in your ear.
Note that he never guaranteed to leave the files with the same file names, and without that, Patrick's trick wouldn't have worked.
Given the fact that Patrick made the rules concerning multiple files, and that they were intentionally tricky, I think it's fair to interpret them in that spirit - as a trick.
Notably, even without the ordering there is still some information leakage purely in terms of file sizes, but that's a little harder to exploit (likely still possible with a 3mb file). Of course, the rules don't explicitly state the directory must be otherwise empty...
Frankly, I think it's pretty reasonable that Patrick "lost" the bet, even if Mike reasoning wasn't sound. For Patrick to have won that bet would have required quite a list of requirements of how those "other" files were to be passed to the decompressor. Such requirements were not presented, and if they had been they would have been obviously fishy.
He could have ordered the files by size pretty easily. Trick still would have worked. No fancy requirements needed. Just making the files available in any manner.
And yes there needs to be a way to know which file(s) are the compressed data, that loophole would disqualify any entry.
Yep, file sizes alone leak sufficient information to be an issue, but it's a bit of a hypothetical because that's not what actually happened. Clearly, Mike shouldn't have allowed multiple files (or should have specified some kind of overhead for multiple files), and clearly there are other tricks Patrick could have used, but with these unfortunate rules and this submission nothing seems to require keeping file order intact.
It's a good point about needing to know which files are compressed. On a philosophical level, I'm not sure whether that means there actually should be a way to know which file(s) are the compressed data, or that the rules are simply broken...
What I'm trying to argue is that any justifiable objection Mike could have made would have failed.
Unjustifiable objections could have been made but screw those, because they would cause legitimate compression to fail.
Unjustifiable objections will pretty much always exist if the rules are written in English. That doesn't mean the rules are broken, it means you use judgement and follow the purpose of the rules.
Mike's fate was sealed when he allowed multiple files without an overhead penalty. He could have removed all the metadata possible, simply having the files exist as separate entities was enough to ruin him.
Nope, perfectly acceptable for Goldman to keep the $100.
Craig tried to play a game of semantics, and lost because semantics let Goldman weasel out by pointing to O/S meta data.
The challenge clearly, repeatedly, stated that Goldman would give $5000 to anyone who could compress a datafile such that the combined size of file and decompressor was smaller than the original datafile.
Craig did not compress the data stored. He split the file up into multiple smaller files and using "5" as the boundary. However, by creating 218 files, he increased the amount of space required to store this data. Ergo, the combined size of the "compressed" files and the decompressor exceeded the size of the original file. It is only by excluding the space taken up by the meta data that the "compressed" files are smaller. Furtheremore, elsewhere it was noted that: "The FAQ clearly outlines that filesystem exploits are not true compression."
This interpretation is inconsistent with Goldman's own statement about the original data that "the file size is 3145728". He didn't say "the file size is 3145728 plus some file system overhead", so by file size he was thinking of the number of bytes in the file ... until he was outsmarted.
It's hardly a filesystem exploit if - again by Goldman's own statement - gunzip is allowable.
I suspect that if the challenge had been solved with a single file, Goldman would try to get out of paying by claiming that the program's size should include the size of the interpreter for its language, and the libraries linked to that, the size of the command line needed to invoke it (including the pointer vector and null termination), not to mention the underlying kernel ...
I don't know goldman, and I bet you don't either - but there's a pretty big difference between this solution (which clearly cheats the aim of the challenge, and a solution that actually compresses. People hate to reward cheaters, even if it's a fun kind of cheat from the outside. But that doesn't mean he wouldn't have payed out for a real solution, which likely would have been quite interesting (and not quite as impossible as it's being made out to be, since we don't know whether his random source is truly random).
What if he replaced every "5z" with EOF? Fewer bytes there.
What if he had a variant of LZ77 doing dictionary encoding followed by a range encoder that outputs symbols in the range -1 through 255? Even counting the EOF as a character, this would give an output 2K characters smaller. Sounds like compression to me. It's finding common sequences and uncommon sequences and rescaling them based on probability to remove redundancy.
"compress a datafile such that ..." appears to give a self-contained definition of what it means to compress in the context of the challenge.
In any case, no compression algorithm makes all inputs smaller. So, just because a data encoding doesn't make a particular input case smaller doesn't imply that it's not a compression algorithm.
I admit I'm surprised to see this comment is still the top comment. I wrote it off when it was first written as being far too pedantic to be at all meaningful, but I guess other people are falling for the same trick.
This is pure pedantry, and it's bad pedantry at that. Your argument is that Mike agreed that multiple files could be submitted, and from that you're drawing the completely baseless conclusion that the multiple files would be considered purely based on their filesize. According to a straightforward reading of the challenge, only the filesize of the decompressor and compressed file would matter. But when Mike agreed to multiple compressed files, nowhere did anyone state that only the filesize of the additional compressed files would matter.
And it should be pretty obvious to anyone that you cannot simply accept just the filesize of the multiple compressed files. Because the only point in using multiple files is to try and hide extra data in either the number of files or their filenames. And that obviously violates the entire point of the challenge.
If you want to take Patrick's multiple files approach to its logical conclusion, you may as well submit a bunch of zero-sized files where all of the data is stored in the filename (perhaps with an integral prefix for sorting purposes) and have your decompressor just be some variant on `echo` that strips the prefix and echoes the rest of the filename (without space separation). That way you could claim infinite compression!
---
If you're still not convinced, then how about this: nowhere did Mike agree that he would not rename the decompressor or compressed file(s) prior to running the program. And a trivial renaming would have broken Patrick's "decompressor". That alone should cause his entry to obviously be a failure.
> completely baseless conclusion that the multiple files would be considered purely based on their file size
You mean, like...
> I meant can I send you a compressor and several compressed files whose total file size is less than the original uncompressed file and from which I can regenerate the original uncompressed file.
Since this is still a pedantry argument: Mike agreed that Patrick could send multiple files, but Mike never agreed that these files would be accepted as a winning submission. Furthermore, Mike never agreed that the filenames of the files would be considered to be meaningful data (without contributing to the filesize calculation).
Huh? Patrick asked if he could send multiple files. Mike said yes. Again, since the context of this is pedantic arguments, agreeing that Patrick could send multiple files is not the same as agreeing that Mike would consider the multiple files to be a winning submission.
I have to logically agree with you. Indeed, Mike never issued a statement like, "I am amending the contest rules so that multiple files decoded by a single program are admitted as a valid solution."
Only, this trickerly was not Mike's intent, otherwise, of course, he would have played this card immediately. For instance:
"I said you could send that to me because that is a true statement with which I therefore agree. It is undeniably true that you can send me anything you like. You can send me an e-greeting card for my birthday, and you can send me a Britney Spears MP3 as a MIME attachment. You can send me questions asking about what you can send me. Obviously, though, the only material which you can send me which also happens to conform to the contest rules is a single program and a single data file. Please re-read the contest statement; I will gladly explain any portion of it that is not clear."
Mike had no need to complain that the program isn't a decompressor. He should have played it as above.
Instead, he demonstrated acceptance of the the multi-file structure of the solution by complaining that the algorithm in the program file isn't a form of compression.
In fact this solution doesn't use either the number of files or their filenames to encode additional information - it uses the sizes of those files to encode the additional information.
We probably agree that any program should be allowed to uniquely identify the files making up the input data. And if Mike would have spotted the (other) more obvious loophole, which is to store input data in only the filenames, he could have given this (possibly the only) sensible answer: "Yes, multiple files whose summed up size + the size of the decompressor are smaller than the original file are ok. But the files must be uniquely named inputfile.0 ... inputfile.NNN." Which is exactly what Patrick delivered.
But that's not what happened. Patrick, not Mike, changed the rules, and he intentionally introduced a loophole. But in doing so, he failed to specify that file names were to be left unaltered. I think it's fair that if you make intentionally tricky rules, you not be surprised when your own loopholes are exploited.
Providing multiple files with the (deceptive) intention of storing data via filesystem metadata, but then failing to specify how filesystem metadata is to be treated is just asking for it.
This is exactly why such challenges are always entirely unsatisfactory and nothing but a pointless distraction. There might be some point in providing entertainment to both parties if the pedantry war weren't entered into, but the nature of the challenge all but guarantees that will happen.
Mike will always be able to find an excuse not to pay. If Parrick had imposed the 'no file renaming' condition, Mike could have found another way in which to break his submission with a counter loophole.
This type of challenge should never be made or accepted, if financial incentives are involved, without an impartial adjudicator.
It's only a pointless distraction when people see these challenges and treat them as a game to be tricked. Mike already made it clear the challenge is not about financial compensation (and offered to give Patrick the $100 back even though Patrick did not win, under the very generous assumption that Patrick innocently misunderstood the challenge instead of deliberately tried to subvert it).
The point of a challenge like this is the same as the point of the Randi challenge: to provide a means to demonstrate that impossible claims are impossible (in this case, that of being able to compress truly random data, and in the case of the Randi challenge, supernatural/magical abilities such as ESP). After all, if any such claim were valid, then the claimant would be able to defeat the challenge, assuming that the stated rules are fair (and they are).
Which is to say, the point of a challenge like this is not actually to have anyone enter. It's just to be.
> This type of challenge should never be made or accepted, if financial incentives are involved, without an impartial adjudicator.
Did you read the whole page? Mike did offer an impartial adjudicator:
> I would gladly submit any such submission to an impartial ombudsman to determine the question of
whether data compression has occurred.
I'm not sure I'd go that far, but it's definitely not a surprise to see it fizzle out. It's particularly uninteresting here because a real solution is either impossible, or only possible because Mike's random source isn't entirely random (which is kinda boring).
Then again, the purpose apparently was to make crackpots put their money where their mouth is rather than spam the usegroup, so perhaps fairness isn't the best success criterium.
I don't think that's relevant - in a sense the ordering information is only required because using individual files has otherwise destroyed some information (by allowing reordering within the output data).
The same process would work with a single stream input to the decompressor, as long as the length of each block where an additional '5' must be output is available - the list of file sizes - so that's where the additional information is stashed.
However, when subsequently asked whether there can be multiple files, he agreed; thereby he was arguably duped. He didn't say "okay, but there will be a 256 byte size penalty per additional file", he just plainly agreed.
This means that the original formula for adding the size of the solution applies: just the file sizes added together.
Goldman should accept that he foolishly rushed into a careless amendment of his original challenge and pay the money.
That said, it obviously is cheating to have the archive format or file system hide the representation of where the removed bytes are! If a single file is produced, it has to include a table of where to insert the bytes that were taken out. If multiple files are produced, the archive format or file system stores that information for you at considerable expense.
If both people are wrong, the contest should be declared invalid and Goldman should return the $100.
If only Goldman is wrong, he should pay $5000.
Under no interpretation is Goldman strictly right and the contestant strictly wrong.
So he is wrong to keep the $100 in any case.