We took a look at encoding phase, but it's very chaotic and looks like Gaussian noise; the lack of spatial structure makes it very hard for the model to generate. I think there are tons of promising avenues to improve quality, though.
Phase itself looks random, but what makes the sound blurry is that the phase doesn't line up the way it should across frequencies at transients. Maybe something the model could get a better grip on is phase discontinuity (the deviation from the phase you'd expect given the previous slices), or the relative phase between peaks, encoded as colour?
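For concreteness, here's a rough numpy sketch of that deviation channel (the STFT settings and the toy click-plus-tone signal are just placeholders, not anything from the actual pipeline): a steady partial advances its phase by a predictable amount each hop, so subtracting that expected advance leaves something near zero except at transients.

```python
import numpy as np
import librosa

# Assumed (arbitrary) STFT settings; swap in whatever the pipeline uses.
sr, n_fft, hop = 22050, 1024, 256

# Toy signal: a steady 440 Hz tone with a click halfway through.
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440 * t)
y[sr // 2] += 1.0

S = librosa.stft(y, n_fft=n_fft, hop_length=hop)
phase = np.angle(S)

# A steady sinusoid in bin k advances its phase by 2*pi*f_k*hop/sr per frame.
bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
expected_advance = 2 * np.pi * bin_freqs * hop / sr

# Deviation from that expected advance, wrapped back into (-pi, pi].
dphase = np.diff(phase, axis=1) - expected_advance[:, None]
dphase = np.angle(np.exp(1j * dphase))

# dphase sits near zero on steady partials and spikes at the click, so it has
# visible structure (unlike raw phase) and could be mapped to a colour channel.
```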
But the same thing could be done as a post-processing step, finding points where the spectrum is changing fast and resetting the phases to make a sharper transient.
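Something like the sketch below, assuming you already have the complex STFT. Using spectral flux as the "spectrum is changing fast" detector and zeroing the phase at those frames are just illustrative choices, not a fixed recipe.

```python
import numpy as np

def sharpen_transients(S, flux_thresh=2.0):
    """S: complex STFT of shape (bins, frames). Returns a modified STFT."""
    mag = np.abs(S)
    phase = np.angle(S)

    # Spectral flux: total positive magnitude change from one frame to the next.
    flux = np.sum(np.maximum(np.diff(mag, axis=1), 0.0), axis=0)
    flux = np.concatenate([[0.0], flux])
    onsets = flux > flux_thresh * (np.median(flux) + 1e-8)

    # Reset every bin to a common (zero) phase at detected onset frames, so the
    # partials start in alignment and the attack comes out sharper on resynthesis.
    phase[:, onsets] = 0.0
    return mag * np.exp(1j * phase)

# e.g. (hypothetical hop size):
#   S = librosa.stft(y, n_fft=1024, hop_length=256)
#   y_sharp = librosa.istft(sharpen_transients(S), hop_length=256)
```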