I've been doing some things with Whisper and find the accuracy very good, BUT I've found the timestamps to be pretty bad. For example, using the timestamps directly to clip words or phrases often clips off the end of word (even simple cases where is followed by silence). Since this emphases word timestamps, I may give it a try.