I think these are the main drivers behind the progress:
- Unsupervised learning techniques, e.g. transformers and diffusion models. You need unsupervised techniques in order to utilize enough data. There have been other unsupervised techniques in the past, e.g. GANs, but they don't work as well.
- Massive amounts of training data.
- The belief that training these models will produce something valuable. It costs anywhere from hundreds of thousands to millions of dollars to train these models, so the people doing the training need to believe they're going to get something interesting out at the end. More and more people and teams are starting to see training a large model as something worth pursuing.
- Better GPUs, which enable training larger models.
- Honestly, the fall of crypto probably also contributed, because miners were eating up a lot of the GPU supply.
I don't think transformers or diffusion models are inherently "unsupervised", especially not the way they're used in Stable Diffusion and related models (which are very much trained in a supervised fashion). I agree with the rest of your points though.
I disagree. Diffusion models are trained to approximate the probability distribution of their training data, like other generative models (GANs, VAEs, etc.). The fact that the architecture is a Transformer (or a CNN with attention, as in Stable Diffusion) is orthogonal to the generative-vs-discriminative divide.
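To make that concrete, here is the standard DDPM-style training objective (a sketch in the notation of Ho et al., 2020, not specific to any one codebase): the network $\epsilon_\theta$ is trained to predict the noise $\epsilon$ that was mixed into a training sample $x_0$ at timestep $t$,

$$\mathcal{L}_{\text{simple}} = \mathbb{E}_{x_0,\; \epsilon \sim \mathcal{N}(0, I),\; t}\left[\,\left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\; t\right)\right\rVert^2\right].$$

The regression target $\epsilon$ is manufactured from the data itself rather than from human annotation, which is why this sits on the generative/unsupervised side regardless of the backbone architecture.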
Unsupervised is a confusing term, since there is always an underlying loss being optimized that acts as a supervision signal, even for good old k-means. But generative models are generally considered part of the unsupervised family.
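For example, k-means is "just" minimizing the within-cluster sum of squared distances,

$$J(\mu_1, \dots, \mu_K) = \sum_{i=1}^{N} \min_{k \in \{1, \dots, K\}} \lVert x_i - \mu_k \rVert^2,$$

so there is a loss and a supervision signal, but that signal comes from the geometry of the data rather than from labels, and we still file it under unsupervised.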
> The belief that training these models will produce something valuable
Exactly. The growth in the next decade is going to be unimaginable, because governments and multinational corporations now believe there will realistically be progress made in this field.