>Yes, we don't really understand where emergent capabilities are coming from, at...

HarHarVeryFunny · 2024-03-19T19:09:56 1710875396

I don't see that the difficulty of predicting/anticipating emergent capabilities is really related to undecidability, although there is perhaps a useful computer analogy... We could think of the trained LLM as a computer, and the prompt as the program, and certainly it would be difficult/impossible to predict the output without just running the program.

The problem with trying to anticipate the capabilities of a new model/training-set is that we don't even know what the new computer itself will be capable of, or how it will now interpret the program.

The way I'd tend to view it is that an existing trained model has some set of capabilities which reflect what can be done by combining the set of data-patterns/data-manipulations ("thought patterns" ?) that it has learnt. If we scale up the model and add more training data (perhaps some of a different type than has been used before), then there are two unknowns:

1) What new data-patterns/data-manipulations will it be able to learn ?

2) What new capabilities will become possible by using these new patterns/manipulations in combination with what it had before ?

Maybe it's a bit like having a construction set of various parts, and considering what new types of things could be built with if it if we added some new parts (e.g. a beam, or gear, or wheel), except we are trying to predict this without even knowing what those new parts will be.

nyrikki · 2024-03-20T15:18:32 1710947912

While it is an open question, I found the paper that while not directly related is a reason to be concerned.

Soft attention, applying a probabilistic curve across multiple neurons is why I think it is related.

The Problem with Probabilistic DAG Automata for Semantic Graphs

https://aclanthology.org/N19-1096/