The example directly below that, "Justify the margins" versus "The end justifies the means", is the one I find dubious. Obviously the former could mean to format a document, but those exact words in that structure could also be a demand for someone to justify a financial margin, for example. It is both true and false depending on the context.
It sounds like you're talking about garden-path sentences [0], and in particular: "time flies like an arrow; fruit flies like a banana" [1]. These are sentences whose structure tricks the reader into making an incorrect parse. My favourite of these has always been: "The horse raced past the barn fell".
I've always enjoyed the multiple valid parses of "Time flies like an arrow". I can't wait for AI to generate more Escher sentences like "More people have been to Russia than I have" ( https://en.m.wikipedia.org/wiki/Comparative_illusion )
You know, I only just now got the second interpretation of that sentence. I always thought of it like "Time flies like an arrow (straight and in one direction), Fruit flies like a banana (when thrown)"
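To make the "multiple valid parses" point concrete, here is a minimal sketch of a CKY-style chart that counts parse trees for "time flies like an arrow". The lexicon, the rule set, and the names `LEXICON`, `RULES`, `UNARY` and `count_parses` are all made up for illustration; this is a toy grammar, not a claim about how any real parser is configured.

```python
from collections import defaultdict

# Toy lexicon (an assumption): each word maps to the tags it may take.
LEXICON = {
    "time": {"N", "V"},
    "flies": {"N", "V"},
    "like": {"V", "P"},
    "an": {"Det"},
    "arrow": {"N"},
}

# Toy binary rules (an assumption): (left child, right child) -> parent.
RULES = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("N", "N"): "NP",    # noun compound: "time flies" as a kind of fly
    ("V", "NP"): "VP",
    ("VP", "PP"): "VP",
    ("P", "NP"): "PP",
}

# Single-word promotions: a bare noun can be an NP, a bare verb a VP.
UNARY = {"N": "NP", "V": "VP"}


def count_parses(words, root="S"):
    """Count distinct trees rooted at `root` that span the whole sentence."""
    n = len(words)
    # chart[i][j][symbol] = number of trees for `symbol` over words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[i][i + 1][tag] += 1
            if tag in UNARY:
                chart[i][i + 1][UNARY[tag]] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for left, left_count in chart[i][k].items():
                    for right, right_count in chart[k][j].items():
                        parent = RULES.get((left, right))
                        if parent is not None:
                            chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(root, 0)


sentence = "time flies like an arrow".split()
print(count_parses(sentence))         # declarative readings
print(count_parses(sentence, "VP"))   # imperative reading: "time (the) flies ..."
```

Under this toy grammar the chart finds two declarative parses ("time / flies like an arrow" and "time flies / like an arrow"), plus a bare verb phrase for the imperative reading. A grammar with broader coverage would find more.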
"The horse raced past the barn fell, which has been haunted since all those teenagers were murdered there."
(Noun-adjective is a rare formation, but amusingly more common in the same situations where the author uses rare and archaic definitions like the adjective "fell".)
"I eat my rice with butter." could mean that you use butter as a utensil to eat your rice with. There is often an unlikely way of parsing the sentence that gives an alternate meaning. The point is to test the computer to see if it can distinguish the likely parse from an unlikely one.
These aren't really alternate _parses_ though (in the sense that they don't give different parse trees). They do highlight the different possible meanings of "with" though.
I think "I eat my rice with chicken" vs "I eat my rice with children" vs "I eat my rice with chopsticks" is the canonical example here.
There's a whole field in NLP involved in showing what changes happen to entities mentioned in a sentence as a side effect of the sentence, and this example shows it pretty well.
I think it's clearer if you say "I usually eat X with Y", i.e. Y is either the company, the tool or the condiment that you eat with (contrasted with "I'm eating my X", where X is a dish like "rice with chicken")
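A rough sketch of the company / tool / condiment distinction: the structure of the sentence stays the same, and only the kind of thing after "with" changes the reading. The word lists and the names `CATEGORY`, `ROLE` and `with_role` are hypothetical, hand-written for this example; a real system would have to learn this kind of knowledge rather than look it up in a table.

```python
# Hypothetical hand-written world knowledge: what kind of thing each noun is.
CATEGORY = {
    "chicken": "food",
    "butter": "food",
    "children": "person",
    "chopsticks": "utensil",
}

# What "with Y" most plausibly means for each kind of Y.
ROLE = {
    "food": "condiment",
    "person": "company",
    "utensil": "tool",
}


def with_role(sentence):
    """Guess the role of the noun that directly follows 'with'."""
    words = sentence.lower().rstrip(".").split()
    noun = words[words.index("with") + 1]
    return ROLE.get(CATEGORY.get(noun), "unknown")


for noun in ("chicken", "children", "chopsticks", "spite"):
    print(noun, "->", with_role(f"I eat my rice with {noun}."))
```

The lookup always picks the likely reading, so it never considers butter as a utensil. That is the point of the test: the unlikely reading is grammatical, but background knowledge rules it out.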
Not to mention something that almost all NLP systems are resoundingly terrible at: short-term memory. If we've been talking about corporate financials for an hour and I say 'Justify the margins', it should be crystal clear what I mean. But most automated systems try to operate without a hint of memory or 'state' being tracked.
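As a minimal sketch of what tracking 'state' could look like, the snippet below keeps a sliding window of recent words and uses them to pick a sense for "justify the margins". The cue words, the window size, and the names `SENSE_CUES` and `Conversation` are assumptions invented for the example, not how any particular system works.

```python
from collections import Counter, deque

# Hypothetical cue words for each sense of "justify the margins".
SENSE_CUES = {
    "typesetting": {"document", "paragraph", "font", "page", "text"},
    "finance": {"revenue", "profit", "quarter", "costs", "financials"},
}


class Conversation:
    """Remembers the last `window` words heard so far."""

    def __init__(self, window=50):
        self.recent = deque(maxlen=window)

    def hear(self, utterance):
        # Lowercase and strip simple punctuation before remembering each word.
        for word in utterance.lower().split():
            self.recent.append(word.strip(".,!?'\""))

    def sense_of_justify(self):
        """Return the sense best supported by recent context, or None."""
        seen = Counter(self.recent)
        scores = {
            sense: sum(seen[cue] for cue in cues)
            for sense, cues in SENSE_CUES.items()
        }
        best = max(scores, key=scores.get)
        # With no evidence, or a tie, decline to guess.
        if list(scores.values()).count(scores[best]) > 1:
            return None
        return best


chat = Conversation()
chat.hear("Revenue was flat last quarter, but costs rose.")
chat.hear("Justify the margins.")
print(chat.sense_of_justify())
```

With the financial chatter in the window the finance sense wins. A fresh `Conversation` with no history returns `None`, which is the no-context situation the test sentence puts the system in.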
I'm guessing this is intentional. To a human, although this could be somebody being asked to justify their financial margins, that's not a very likely reading. The human can easily see that, while it's possible they have the same meaning, given the lack of any other context the answer is that they don't.
The enemy could have landed several of our aircraft on one of their runways. Agassi may have beaten Becker over the head with his tennis racket. I suspect part of the test is that there can be other meanings that do technically work.