> The new text and image generative models can now be used to synthesize training datasets.
No. Just no. Dear god, no.
This isn't too different from GPT-4 grading itself (looking at you MIT math problems)!
Current models don't accurately estimate the underlying probability distribution of the data, so they can't be relied on for dataset synthesis. Yes, synthesis can help, but remember that these models typically generate the highest-likelihood data, which is precisely the data that's already abundant. Getting non-mean data is the hard part, and without good density estimation you can't do that reliably. Explicit density estimation networks are rather unpopular and haven't received nearly as much funding or research attention. I'd highly recommend the area, though I'm biased: explicit density estimation and generative modeling is what I work on.
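To make the "highest-likelihood data" point concrete, here's a toy sketch (my own illustration, not anyone's actual pipeline): stand in for a generative model with a single Gaussian fit by maximum likelihood to heavy-tailed "real" data, sample a synthetic dataset from it, and count how often rare, far-from-mean examples show up in each.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "real" data: heavy-tailed (Student-t, 5 degrees of freedom).
real = rng.standard_t(df=5, size=100_000)

# A mis-specified generative model: one Gaussian fit by maximum likelihood.
# It captures the bulk of the distribution but underestimates the tails.
mu, sigma = real.mean(), real.std()

# Synthetic dataset sampled from the fitted model.
synthetic = rng.normal(mu, sigma, size=100_000)

# How often does each dataset contain rare, far-from-mean examples?
threshold = mu + 3 * sigma
real_tail = (real > threshold).mean()
synth_tail = (synthetic > threshold).mean()
print(f"tail frequency, real data:      {real_tail:.5f}")
print(f"tail frequency, synthetic data: {synth_tail:.5f}")
```

The synthetic set reproduces the common, high-likelihood bulk just fine but turns up tail events several times less often than the real data does, which is exactly the failure mode: the interesting, scarce examples are the ones a poor density estimate can't synthesize for you.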