I came across it during a silent/instrumental portion of the song I was testing. I asked only because I'm curious how frequently the error might show up; I don't expect it to be very common. It's looking at phrase-level instead of word-level timestamps, which is going to make it hard to tokenize music. I asked simply because the parent comment also tested on Japanese.