Ever heard of something called diminishingly returns?
The value improvement between 17.5b parameters and 175b parameters is much greater than the value improvement between 175b parameters and 18t parameters.
IOW, each time we throw 100 times more processing power at the problem, we get a measly 2 time increase in value.
The value improvement between 17.5b parameters and 175b parameters is much greater than the value improvement between 175b parameters and 18t parameters.
IOW, each time we throw 100 times more processing power at the problem, we get a measly 2 time increase in value.