It's called hallucination. Because the model is trained on noisy, weakly supervised data, such errors do occasionally happen: the model picks up that certain phrases occur in translations and inserts them even when they do not appear in the source audio. This is described in the paper.
I came across it during a silent/instrumental portion of the song I was testing. I asked only because I'm curious how frequently the error shows up; I don't expect it to be very common. The model produces phrase-level rather than word-level timestamps, which is going to make it hard to tokenize music. I asked simply because the parent comment also tested on Japanese.
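If anyone wants to flag these silent-section hallucinations after the fact, here's a rough sketch of one way to do it, not anything from the paper: it assumes the openai-whisper Python package and its per-segment "no_speech_prob" and "avg_logprob" fields; the file name and thresholds are just illustrative guesses.

    import whisper

    # Sketch: flag segments that look like hallucinated text over silence.
    # "song.mp3", the thresholds, and the translate/Japanese settings are
    # assumptions for illustration, not taken from the comments above.
    model = whisper.load_model("base")
    result = model.transcribe("song.mp3", task="translate", language="ja")

    for seg in result["segments"]:
        # High no-speech probability plus low average log-probability often
        # means the model is "filling in" text over a silent passage.
        suspicious = seg["no_speech_prob"] > 0.6 and seg["avg_logprob"] < -1.0
        flag = " [possible hallucination]" if suspicious else ""
        print(f'[{seg["start"]:.1f}-{seg["end"]:.1f}] {seg["text"].strip()}{flag}')

This only flags segments rather than dropping them, since a quiet verse can trip the same heuristic as true silence.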