That one's really damning. It implies that they're comparing solutions by some sort of syntactic equivalence, which is a big no-no. It's not that difficult to generate a bunch of tests and run both solutions through an interpreter, and in this case it's not even that hard to compare them on every possible input by running both through an abstract interpreter.
Not to mention, their example said that the return value would be (3 + 5) / 2, as opposed to just returning 4.
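For what it's worth, here is a minimal sketch of the test-based comparison, assuming two made-up solutions (`solution_a` and `solution_b` are hypothetical, not the site's actual code) that differ in syntax but not in behavior. The helper `first_disagreement` is also just an illustrative name.

```python
import random


def solution_a(x, y):
    # Hypothetical reference solution: the average written out in full.
    return (x + y) / 2


def solution_b(x, y):
    # Hypothetical submission: same behavior, different syntax.
    return x / 2 + y / 2


def first_disagreement(f, g, inputs):
    """Return the first argument tuple where f and g differ, or None."""
    for args in inputs:
        if f(*args) != g(*args):
            return args
    return None


# Exhaustive check over a small integer domain.
small_domain = [(x, y) for x in range(-20, 21) for y in range(-20, 21)]

# Seeded random check over a larger integer domain.
rng = random.Random(0)
random_inputs = [
    (rng.randint(-10**6, 10**6), rng.randint(-10**6, 10**6))
    for _ in range(1000)
]

if __name__ == "__main__":
    print(first_disagreement(solution_a, solution_b, small_domain))
    print(first_disagreement(solution_a, solution_b, random_inputs))
```

A grader built this way would also accept a plain 4 where the reference computes (3 + 5) / 2, since it compares values rather than source text.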
Here I was, trying to figure out how to write a class named Stack that behaved like a list and had a push(self,x) method that did an append, and also how to adjust the "op()" function so that, instead of calculating the result of a mathematical operation, it returned a string holding the operation (like "(3 + 5)"), being sure to add parentheses if there was a precedence issue.
...all in one line of code.
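Written out over more than one line, the approach described above might look roughly like this. It is only a sketch under my own assumptions: the challenge's real `op()` signature isn't shown here, so the `op(stack, symbol)` form, the `PRECEDENCE` table, the `ATOM` constant, and the `to_infix` driver are all hypothetical.

```python
class Stack(list):
    """A list with a push method that simply appends."""

    def push(self, x):
        self.append(x)


# Hypothetical precedence table; a higher number binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
ATOM = 3  # precedence given to bare numbers, which never need parentheses


def op(stack, symbol):
    """Pop two (text, precedence) pairs and push their combination as text."""
    right_text, right_prec = stack.pop()
    left_text, left_prec = stack.pop()
    prec = PRECEDENCE[symbol]
    # The left operand needs parentheses only if it binds more loosely.
    if left_prec < prec:
        left_text = f"({left_text})"
    # The right operand also needs them at equal precedence,
    # because - and / are left-associative.
    if right_prec <= prec:
        right_text = f"({right_text})"
    stack.push((f"{left_text} {symbol} {right_text}", prec))


def to_infix(tokens):
    """Turn a postfix token list into an infix string."""
    stack = Stack()
    for tok in tokens:
        if tok in PRECEDENCE:
            op(stack, tok)
        else:
            stack.push((str(tok), ATOM))
    return stack.pop()[0]


if __name__ == "__main__":
    print(to_infix([3, 5, "+", 2, "/"]))  # prints "(3 + 5) / 2"
```

Carrying the precedence alongside each string is one way to decide when parentheses are needed; squeezing all of this into a single line is another matter.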
Yeah, that one particular challenge was screwed up. The other ones were much better.