I think the last paragraph largely makes sense. It does seem "true" that some kind of reasoning capability emerges as LLMs get bigger, which makes them quite useful and blew a lot of people's minds at first. But I think the fundamental training goal of LLMs--guessing what the next word should be--essentially pushes the model toward being a plausible-nonsense generator, and the reasoning capability emerges because it helps the model make things up more convincingly. So we should be cautious about the results these LLMs generate. They might sound reasonable, but making up the next word is still their real top priority.