Read the last paragraph. You still have humans, but their input is more akin to a movie reviewer's than a movie director's/writer's/actor's. It still takes skill, but it takes a lot less time.
RLHF typically employs humans, and that can be time consuming in itself, but it's less time consuming than creating content. And their efforts can be amplified if the raters are actually unpaid humans, that is, users who are willing to provide feedback while they're already prompting the system. Plenty of people are happy to do this for free, and some of it happens just as a byproduct of them doing what they're already doing: creating content and choosing which outputs come out good and which don't. Every time I'm working through a coding problem with ChatGPT and it makes mistakes and I tell it about those mistakes, it can be learning from that.
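To make that concrete, here's a toy sketch of how a user-level "this answer was better than that one" signal can turn into something trainable. It's just a standard pairwise preference (Bradley-Terry style) loss with made-up reward numbers, not anything specific to how any particular lab actually does it:

```python
import math

# Toy version of how pairwise human feedback can become a training signal
# (an RLHF-style preference loss). The reward scores would normally come
# from a learned reward model; the numbers below are invented.
def preference_loss(reward_chosen, reward_rejected):
    """Loss is small when the response the human preferred
    gets the higher reward score."""
    return -math.log(1 / (1 + math.exp(-(reward_chosen - reward_rejected))))

# e.g. a user marks response A as helpful and response B as wrong
print(preference_loss(reward_chosen=2.1, reward_rejected=0.3))  # small loss
print(preference_loss(reward_chosen=0.3, reward_rejected=2.1))  # large loss
```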
People can also come up with coding problems that it can run and test itself on. As a simple example, imagine it's trying to write a sorting algorithm. It can also write a testing function that simply checks that the output is correctly sorted. It can also time its results and count how many steps it had to take. In that sense it can work just like AlphaZero: there is an objective goal, which is to do it in the fewest clock cycles, and there's a way to test whether and how well it is achieving that goal. While that may only work for a limited number of programming problems, by practicing on that type of problem it will presumably get better at other types of problems, just like humans do. A rough sketch of what that self-checking setup could look like is below.
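Here `candidate_sort` just stands in for whatever code the model wrote (it's a plain insertion sort), and the correctness check, step counter, and timer are the kind of objective signals it could optimize against:

```python
import random
import time

def candidate_sort(items):
    """Stand-in for model-written code whose correctness and cost
    we want to score automatically (here: insertion sort)."""
    result = list(items)
    steps = 0
    for i in range(1, len(result)):
        j = i
        while j > 0 and result[j - 1] > result[j]:
            result[j - 1], result[j] = result[j], result[j - 1]
            j -= 1
            steps += 1
    return result, steps

def is_sorted(items):
    """The self-written test: verifying order is trivial,
    even when writing a good sort is not."""
    return all(a <= b for a, b in zip(items, items[1:]))

data = [random.randint(0, 1000) for _ in range(500)]
start = time.perf_counter()
output, steps = candidate_sort(data)
elapsed = time.perf_counter() - start

# Objective, automatically checkable reward signal:
# correct or not, plus how expensive it was.
print(is_sorted(output), steps, f"{elapsed:.4f}s")
```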
This is exactly what large language models do: they have a way to objectively test their writing ability, which is by having them predict words in text they've never seen before. In a sense it's different from actually writing new creative content, but it is practicing skills that you need to tap into when you are creating new content. Interestingly, a lot of people will dismiss them as simply being word predictors, but that's not really what they're doing. They're predicting words when they're training, but when they're actually generating new content, they're not "predicting" words (you can't predict your own decisions, that doesn't make sense), they are choosing words.
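A toy way to see that training objective: the "test" is just how often you guess the next word in text you didn't train on. Real models use neural nets over sub-word tokens, obviously; the bigram counts below are only there to show the scoring idea:

```python
from collections import Counter, defaultdict

# Toy illustration: given text the model has "seen", predict the
# next word in text it hasn't. The bigram counts are a stand-in
# for a real model; only the scoring idea is the point.
train_text = "the cat sat on the mat and the cat slept".split()
held_out   = "the cat sat on the rug".split()

following = defaultdict(Counter)
for prev, nxt in zip(train_text, train_text[1:]):
    following[prev][nxt] += 1

correct, total = 0, 0
for prev, actual_next in zip(held_out, held_out[1:]):
    guesses = following.get(prev)
    if guesses and guesses.most_common(1)[0][0] == actual_next:
        correct += 1
    total += 1

# The "test score" is simply how often the prediction matched
# text the model never trained on.
print(f"next-word accuracy on unseen text: {correct}/{total}")
```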
The same way we do it. Verifying that an output is good is far easier than producing a good output. We can write a first draft, see what's wrong with it, make changes, and iterate on that until it's a final draft. And along the way we get better at writing first drafts.
> With AlphaGo, you have a clear objective -- to win a game. How does that work for creative outputs?
There are still tons of potentially valuable applications with a clear objective: beating the stock market, creating a new material or design that maximizes some metric, etc.
> The discriminator in a GAN is simply a classifier. It tries to distinguish real data from the data created by the generator. It could use any network architecture appropriate to the type of data it's classifying.
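Something like this, sketched in PyTorch with an arbitrary small architecture (the layer sizes and input dimension are placeholders, not anything canonical):

```python
import torch
import torch.nn as nn

# Minimal sketch of the quoted idea: the discriminator is just a
# binary classifier over real vs. generated samples.
class Discriminator(nn.Module):
    def __init__(self, input_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.LeakyReLU(0.2),
            nn.Linear(128, 1),  # one logit: real vs. generated
        )

    def forward(self, x):
        return self.net(x)

disc = Discriminator()
real = torch.randn(16, 784)   # stand-in for real samples
fake = torch.randn(16, 784)   # stand-in for generator output
loss_fn = nn.BCEWithLogitsLoss()
loss = (loss_fn(disc(real), torch.ones(16, 1))
        + loss_fn(disc(fake), torch.zeros(16, 1)))
print(loss.item())
```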
I think similar to humans, creativity will be an emergent behavior as a result of the intelligence needed to pass other tests. Evolution doesn't care about our art, but the capabilities we use to produce it also help us with survival.
This isn’t directly about creativity, but I suspect a lot of training will happen in simulated environments. A sandboxed Python interpreter is a good example. There are plenty of programming questions to train on.
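Something like this, minus real sandboxing: run the model-written code in a separate process with a timeout and score whether it printed the expected answer. The toy problem and expected output are made up, and a real setup would lock things down much harder (containers, no network, resource limits):

```python
import subprocess
import sys
import tempfile
import textwrap

# Rough sketch of "interpreter as training environment": execute
# model-written code in a separate process and turn the result
# into a reward. Not a real sandbox, just process isolation.
candidate_code = textwrap.dedent("""
    nums = [5, 3, 8, 1]
    print(sum(nums))
""")

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(candidate_code)
    path = f.name

result = subprocess.run(
    [sys.executable, "-I", path],  # -I: Python isolated mode
    capture_output=True, text=True, timeout=5,
)

# Score by comparing stdout to the known answer for this toy problem.
reward = 1.0 if result.stdout.strip() == "17" else 0.0
print("reward:", reward)
```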