
Authors here: Fun to wake up to this surprise! We are rushing to add GPUs so you can all experience the app in real-time. Will update asap



Awesome, there is another project out there that does this on CPU: https://github.com/marcoppasini/musika. Maybe mix the two, i.e. take the initial output of Musika, convert it to a spectrogram, and feed it to Riffusion to get more variation...
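A minimal sketch of the suggested first step, rendering an audio clip (e.g. Musika output) to a spectrogram image that an img2img diffusion pass could start from. This assumes librosa and Pillow; the file names, mel parameters, and 512x512 target size are illustrative guesses, not Riffusion's actual settings.

  import numpy as np
  import librosa
  from PIL import Image

  def audio_to_spectrogram_image(wav_path: str, out_path: str) -> None:
      # Load audio and compute a mel-scaled power spectrogram.
      y, sr = librosa.load(wav_path, sr=44100, mono=True)
      mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=4096,
                                           hop_length=512, n_mels=256)
      # Convert to decibels and normalize to the 0-255 range of an 8-bit image.
      db = librosa.power_to_db(mel, ref=np.max)
      norm = (db - db.min()) / (db.max() - db.min() + 1e-9)
      # Flip so low frequencies sit at the bottom of the image.
      img = Image.fromarray(np.flipud((norm * 255).astype(np.uint8)))
      # Resize to the square canvas a diffusion model expects (assumed 512x512).
      img = img.resize((512, 512), Image.BILINEAR).convert("RGB")
      img.save(out_path)

  # Hypothetical file names for illustration only.
  audio_to_spectrogram_image("musika_clip.wav", "seed_spectrogram.png")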


"fine-tuned on images of spectrograms paired with text"

How many paired training images/texts did you use, and what was the source of your training data? Just curious how much fine-tuning was needed to get these results, and how broad the original sources had to be to achieve sufficient musical diversity.
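For reference, one common layout for such image/text pairs when fine-tuning Stable Diffusion (e.g. with the Hugging Face diffusers training scripts) looks roughly like the sketch below. This is not the authors' dataset; the file names and captions are made up.

  import json

  # Each record pairs one spectrogram image with a text caption.
  examples = [
      {"file_name": "spectrogram_0001.png", "text": "funk bassline with a jazzy saxophone solo"},
      {"file_name": "spectrogram_0002.png", "text": "acoustic folk fingerpicking"},
  ]

  # metadata.jsonl sits next to the image files; one JSON object per line.
  with open("metadata.jsonl", "w") as f:
      for ex in examples:
          f.write(json.dumps(ex) + "\n")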


Fascinating stuff.

One of the samples had vocals. Could the approach be used to generate vocals alone?

Could it be used for speech? If so, could the speech be directed or would it be random?



