I'm not proposing a change in model size; rather, I'm suggesting a higher dimensionality within the current structure. There’s an interesting paper on LLM explainability, which found that individual neurons often represent a superposition of various data elements.
What I’m advocating is a substantial increase in this aspect—keeping model size the same while expanding dimensionality. The "curse of dimensionality" illustrates how a modest increase in dimensions leads to a significantly larger volume.
While I agree that backpropagation isn’t a complete solution, it’s ultimately just a stochastic search method. The key point here is that expanding the dimensionality of a model’s space is likely the only viable long-term direction. To achieve this, backpropagation needs to work within an increasingly multidimensional space.
A useful analogy is training a small model on random versus structured data. With structured data, we can learn an extensive amount, but with random data, we hit a hard limit imposed by the network. Why is that?
What I’m advocating is a substantial increase in this aspect—keeping model size the same while expanding dimensionality. The "curse of dimensionality" illustrates how a modest increase in dimensions leads to a significantly larger volume.
While I agree that backpropagation isn’t a complete solution, it’s ultimately just a stochastic search method. The key point here is that expanding the dimensionality of a model’s space is likely the only viable long-term direction. To achieve this, backpropagation needs to work within an increasingly multidimensional space.
A useful analogy is training a small model on random versus structured data. With structured data, we can learn an extensive amount, but with random data, we hit a hard limit imposed by the network. Why is that?