There is a difference between poor reasoning and no reasoning. SOTA LLMs correctly answer a significant number of these questions; the likelihood of doing so without any reasoning is astronomically small.
Reasoning in general is not a binary or global property. You aren't surprised when high-schoolers, after learning how to draw 2D shapes, don't immediately go on to draw 200D hypercubes.
Granting that, the original point was that they're not excited about this particular paper unless (for example) it improves the networks' general reasoning abilities.
The problem was never "my LLM can't do addition" - it can write Python code for that!
The problem is "my LLM can't solve hard problems that require reasoning."
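To make that first point concrete, here's a minimal Python sketch of the "delegate arithmetic to generated code" idea: instead of trusting the model's in-context addition, you run the snippet it writes. `ask_llm` is a hypothetical stand-in for whatever model client you'd actually use, with its reply hard-coded here so the example is self-contained.

```python
def ask_llm(prompt: str) -> str:
    # Hypothetical placeholder for a real LLM call; a model asked for code
    # would typically return a snippet like the one below.
    return "result = 31415926 + 27182818"

def compute_via_code(prompt: str) -> int:
    code = ask_llm(prompt)
    scope: dict = {}
    exec(code, scope)  # execute the generated Python instead of parsing prose
    return scope["result"]

print(compute_via_code("Write Python that sets `result` to 31415926 + 27182818."))
# -> 58598744
```

The arithmetic becomes trivially reliable once it's offloaded to the interpreter, which is exactly why "can't do addition" was never the interesting failure mode.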