It is fascinating that it is now able to continue both piano and speech coherently. At least for a while, it is indistinguishable to me (in the examples) from what a human would do.
When reading a text or listening to a speaker, there is the underlying assumption that the speech act was intentional and was meant to convey something. We use that assumption to fill in the blanks and to correct anything that is unclear. We do it automatically. This is why lawyers are so annoying. They are trained to suppress this instinct, and so they find loopholes and gaps.
When I read a GPT-3 generated text, I catch myself doing the same thing. I'm very forgiving. Interestingly, with these speech continuations the illusion of somebody being there on the other side is much stronger. It's like listening to an old Feynman lecture or the like. You don't quite get it, but surely it must make sense, right? It doesn't, of course. The box is empty. Or is it?
If the internet is flooded with believable machine-generated nonsense, how many human brain hours will be spent wondering whether we just don't quite get it, or whether it really makes no sense?
This one suffers from the same problem that previous audio generation methods had. It correctly mimics the piano timbre, but there is no global structure in the generated melodic lines. There is some style imitation but no melodic coherence.
The paper is here: https://arxiv.org/pdf/2209.03143.pdf