> There's plenty of evidence that humans reason differently than ML models; namely basically any human intellectual discovery in history versus the (approximately) zero randomly generated ones by ML.
This reasoning is invalid. For fun, I checked if GPT4 would catch the logical errors you made, and it did. Specifically, it correctly pointed out that absence of evidence is not evidence of absence. But even if there had been evidence of absence, this reasoning is invalid because it presumes that human reasoning must result in intellectual discovery irrespective of how it is employed, and therefore that if we can't find intellectual discoveries, there must be an absence of human reasoning. In other words, it invalidly assumes that a difference in outcomes must represent a difference in the structure of reasoning. This is trivially invalid because humans think without making intellectual discoveries all the time.
However, it's also a strawman because I did not claim that humans and ML models reason the same way. I claimed there is no evidence of 'some sort of grand distinction between "probabilistically serializing tokens" and "deliberate human reasoning" other than scale'.
1) This explicitly recognizes that there is a difference, but that it might be just scale, and that we don't have evidence that it isn't. Your argument fails to address this entirely.
2) Even at scale, it does not claim they would be the same, but argues that we don't have evidence that "probabilistically serializing tokens" must be inherently different from "deliberate human reasoning" to an extent sufficient to call it "some sort of grand distinction". We can assume with near 100% certainty that there are differences - the odds of us happening upon the exact same structure are near zero. That does not, however, mean that we have any basis for saying that human reasoning isn't just another variant of "probabilistically serializing tokens".
I'll note that unlike you, GPT4 also correctly interpreted my intent when asked to review the paragraph and say whether it implies the two must function the same. I *could* take that to imply that LLMs are somehow better than humans at reasoning, but that would be logically invalid for the same reasons as your argument.
> We don't know exactly how human reasoning works, but the observational evidence clearly indicates it is not by randomly piecing together tokens already known.
Neither do LLMs. Piecing together tokens stochastically based on a model is not "randomly piecing together" - the model guides the process strongly enough that this is a wildly misleading characterization, as you can trivially demonstrate by actually piecing together words at random.
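To make the contrast concrete, here is a minimal toy sketch (a hypothetical bigram table, nothing like how any real LLM is trained or sized) showing the difference between uniform random word selection and stochastic sampling guided by a probability model:

```python
import random

# Toy illustration only: contrast truly random word selection with
# sampling that is stochastic but guided by a probability model.
vocab = ["the", "cat", "sat", "on", "mat", "purple", "quantum"]

# Hypothetical "model": next-word probabilities given the previous word.
# Real LLMs learn vastly richer, context-wide distributions, but the
# principle - weighted, context-conditioned sampling - is the same.
bigram_probs = {
    "the": {"cat": 0.6, "mat": 0.3, "quantum": 0.1},
    "cat": {"sat": 0.9, "purple": 0.1},
    "sat": {"on": 1.0},
    "on":  {"the": 1.0},
    "mat": {"the": 1.0},
}

def random_words(n):
    """'Randomly piecing together' words: uniform choice, no model."""
    return [random.choice(vocab) for _ in range(n)]

def model_guided(n, start="the"):
    """Still stochastic, but strongly guided by the model's distribution."""
    out, prev = [start], start
    for _ in range(n - 1):
        dist = bigram_probs.get(prev, {"the": 1.0})
        words, weights = zip(*dist.items())
        prev = random.choices(words, weights=weights)[0]
        out.append(prev)
    return out

print(" ".join(random_words(6)))   # e.g. "quantum mat purple the sat cat"
print(" ".join(model_guided(6)))   # e.g. "the cat sat on the mat"
```

Both are "probabilistic", but only the first deserves to be called random piecing together; the second is what "stochastic but model-guided" means in the argument above.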
But even if we assume a less flippant and misleading idea of what LLMs do, your claim is incorrect. Observational evidence does nothing of the sort. If anything, the rapidly closing gap between human communication and LLM output shows that while there are almost certainly structural differences at the low level, it is increasingly unclear whether they amount to a material distinction. In other words, it's unclear whether the hardware, or even the hardwired network, matters much relative to the computational structure the trained model itself creates.
You're welcome to your beliefs - but they are not supported by evidence. We also don't have evidence the other way, so it's not unreasonable to hold beliefs about what the evidence might eventually show.