If you include multimodal data then I think it's pretty obvious that training is compute limited.
Also current SOTA models are good enough that you can generate endless training data by letting the model operate stuff like a C compiler, python interpreter, Sage computer algebra, etc.
Also current SOTA models are good enough that you can generate endless training data by letting the model operate stuff like a C compiler, python interpreter, Sage computer algebra, etc.