Letting it decide how much time to spend computing an answer, instead of spending a fixed amount per text token, could help a lot. Right now the model burns the same amount of compute on every token, which doesn't make sense: "hello, how are you" should take basically no processing, while a hard logical problem should take a lot.
Maybe these language models could be way smaller and cheaper if we added a recurse symbol that lets them iterate on a token as many times as it needs. Hard tasks would still take a lot of compute, but most banal conversation would be very cheap (rough sketch of the idea below).
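This is close to what Graves' Adaptive Computation Time (2016) and the Universal Transformer do with a learned per-token halting unit. Here's a minimal PyTorch sketch of that flavor; every name (AdaptiveBlock, max_steps, threshold) is illustrative, not anyone's actual API, and it skips the state-mixing that real ACT does:

    # Hypothetical sketch: re-apply one shared block per token until a learned
    # halting unit says "stop", so easy tokens exit after one step and hard
    # tokens keep iterating. Not the real ACT algorithm, which also mixes
    # intermediate states weighted by their halting probabilities.
    import torch
    import torch.nn as nn

    class AdaptiveBlock(nn.Module):
        def __init__(self, d_model: int, max_steps: int = 8, threshold: float = 0.99):
            super().__init__()
            self.step_fn = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.halt = nn.Linear(d_model, 1)  # per-token halting probability
            self.max_steps = max_steps
            self.threshold = threshold

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq, d_model)
            halted = torch.zeros(x.shape[:2], dtype=torch.bool, device=x.device)
            cum_halt = torch.zeros(x.shape[:2], device=x.device)
            out = x
            for _ in range(self.max_steps):
                out_new = self.step_fn(out)
                p = torch.sigmoid(self.halt(out_new)).squeeze(-1)  # (batch, seq)
                cum_halt = cum_halt + p * (~halted)
                # freeze tokens that have already halted; keep updating the rest
                out = torch.where(halted.unsqueeze(-1), out, out_new)
                halted = halted | (cum_halt >= self.threshold)
                if halted.all():
                    break  # every token is done: trivial inputs exit early
            return out

    block = AdaptiveBlock(d_model=64)
    h = block(torch.randn(2, 10, 64))  # (batch=2, seq=10, d_model=64)

The cumulative halting probability is what makes the cost adaptive: the network can learn to stop after a single step on trivial tokens, so inference cost scales with difficulty rather than being the same flat price everywhere.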