Interesting. I suppose what you're proposing is that a model could, in some abstract way, extrapolate research results: take ideas A and B that it "knows" from its training and combine them into a new idea AB. Then we assert that there is some "validation system" that can check said result, thereby creating a new data point that the model can be retrained on.
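To make sure I'm picturing the same thing, here's roughly the loop I have in mind, as a sketch. All of the propose/validate/retrain interfaces are made up just to fix ideas, not any particular system's API:

```python
from typing import Any, Callable, List

def discovery_loop(
    propose: Callable[[List[Any]], Any],   # hypothetical: model combines known ideas into a candidate
    validate: Callable[[Any], bool],       # hypothetical: the "validation system"
    retrain: Callable[[List[Any]], None],  # hypothetical: fold new data back into the model
    known_ideas: List[Any],
    rounds: int = 100,
) -> List[Any]:
    """Propose -> validate -> retrain, as described above."""
    knowledge = list(known_ideas)
    for _ in range(rounds):
        candidate = propose(knowledge)     # e.g. combine ideas A and B into AB
        if validate(candidate):            # the load-bearing step
            knowledge.append(candidate)    # a new, trusted data point
            retrain(knowledge)             # the model is updated on it
    return knowledge
```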
I can see how such a pipeline could exist. The problematic bit, I imagine, is the "validation system". In closed systems like mathematics, a proof can be checked against our current understanding of mathematics. However, I wonder whether all systems have such a property. If, in some sense, you need to know the underlying distribution to check that a new data point belongs to it, then the system described above cannot find new knowledge without already knowing everything.
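Here's a toy example of the distinction I'm gesturing at (everything below is invented purely for illustration): a rule-based checker, like a proof checker, only needs the axioms and inference rules, whereas a "membership" checker needs to already contain the answer and so can never certify anything new.

```python
# Toy closed system: even numbers derived from a seed and a single rule.
SEED = 0                          # the "axiom"
RULE = lambda x: x + 2            # the "inference rule"

def rule_based_check(candidate: int, max_steps: int = 1000) -> bool:
    # Like a proof checker: only the axiom and the rule are needed,
    # not a list of every true statement.
    value = SEED
    for _ in range(max_steps):
        if value == candidate:
            return True
        value = RULE(value)
    return False

KNOWN_TRUTHS = {0, 2, 4, 6}       # what we happen to know already

def membership_check(candidate: int) -> bool:
    # The problematic kind of validator: it can only confirm what we
    # already know, so "new knowledge" is rejected by construction.
    return candidate in KNOWN_TRUTHS

print(rule_based_check(20))   # True  -- a "new" result, verified from the rules alone
print(membership_check(20))   # False -- rejected simply because we didn't know it yet
```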
Moreover, if we did have such a perfect "validation system", I suppose the only thing the ML model is buying us is a more effective search over candidates, right? (e.g., we could also just brute-force candidates through such a "validation system" to find new results, just far more slowly).
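A contrived comparison of what I mean (the validator and candidate strings are invented): with a perfect validator, brute force finds the same results; the model just cuts down how many candidates you have to run through it.

```python
import itertools
import string

def validator(s: str) -> bool:
    # Stand-in for a perfect validation system (purely illustrative).
    return s == "abz"

def brute_force(max_len: int = 3):
    # Enumerate every lowercase string up to max_len; guaranteed to find
    # anything the validator accepts, just slowly as the space grows.
    for n in range(1, max_len + 1):
        for tup in itertools.product(string.ascii_lowercase, repeat=n):
            cand = "".join(tup)
            if validator(cand):
                return cand
    return None

def model_guided(proposals):
    # Same validator, but candidates come from a (hypothetical) learned model
    # that concentrates on promising regions of the space.
    for cand in proposals:
        if validator(cand):
            return cand
    return None

print(brute_force())                        # finds "abz" after several hundred validator calls
print(model_guided(["abc", "abz", "xyz"]))  # finds it in two calls
```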
Feel free to ignore my navel-gazing; it's fascinating to discuss these things.