I'm not preprocessing the recording. What I mentioned about considering VAD (voice activity detection), if that's what you're referring to, is that I should somehow trim the files.
To minimize my interaction with my phone during the bike ride, I press a button that records 1 minute of audio. If I know that I need more time, I press it again before the minute ends; this starts a second recording in parallel, which also lasts one minute. So I just have to press a button and can forget about it. I do it this way because I noticed that I usually don't need more than one minute to record a thought, and if I have several, I can put them in separate files.
But since my recordings then usually consist of 20 seconds of speech, the 30 seconds at the end are only silence (with wind and tire noise). Whisper splits the files into 30-second segments and apparently tries to find voice in each segment, so the remaining segment, which has no voice, causes Whisper issues: it starts hallucinating. This is why I would like to trim the files.
I now noticed that the service doesn't add punctuation or capitalization. The funny thing is that I took that output and pasted it into ChatGPT like this: "Correct the following: '[text from whisper]'", and it does an incredible job of fixing even words that Whisper got wrong.
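To make the trimming idea concrete, here is a rough sketch of how I imagine it could work. It is not a real VAD, just a plain energy threshold, and the function names, frame size, threshold, and padding are all made-up values for illustration. It assumes the audio is already decoded into float samples in the range -1 to 1. Loud wind gusts could easily exceed the threshold, so a proper voice activity detector would probably do better.

```python
import math
from typing import Sequence


def frame_rms(samples: Sequence[float], frame_len: int) -> list[float]:
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return energies


def trim_trailing_noise(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,      # hypothetical frame size
    threshold: float = 0.05,  # hypothetical energy threshold, needs tuning
    pad_ms: int = 300,        # keep a little audio after the last loud frame
):
    """Cut everything after the last frame whose energy reaches the threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    energies = frame_rms(samples, frame_len)

    # Index of the last frame that is loud enough to count as speech.
    last_active = -1
    for i, energy in enumerate(energies):
        if energy >= threshold:
            last_active = i

    if last_active < 0:
        # Nothing above the threshold: return an empty slice.
        return samples[:0]

    end = (last_active + 1) * frame_len + sample_rate * pad_ms // 1000
    return samples[:min(end, len(samples))]
```

The point would be that a one-minute file with 20 seconds of speech becomes short enough to fit into a single 30-second Whisper segment, so there is no speech-free segment left over.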
-
Whisper:
ich habe gestern erste tests mit open ai whisper gemacht um nozizen [sic!] zu transkribieren
[ Yesterday I did my first tests with open ai whisper to transcribe nozizen [sic!]. ]
es waren teilweise recht gute ergebnisse vor allem mit medium
[ there were some really good results, especially with medium ]
latsch [sic!] natürlich besser aber da sind die anforderungen zu hoch
[ latsch [sic!] better of course, but the demands are too high ]
wenn ich da einen server draus mache könnte ich mal eine zeit lang ausprobieren ob sich das lohnt
[ If I make a server out of it I could try it out for a while to see if it's worth it ]
auch für anrufe der anruf der antworten
[ also for calls the call of the answers ]
-
then ChatGPT:
Ich habe gestern erste Tests mit OpenAI's "Whisper" gemacht, um Notizen zu transkribieren. Die Ergebnisse waren teilweise recht gut, vor allem mit "Medium". "Large" funktioniert natürlich besser, aber die Anforderungen sind zu hoch. Wenn ich einen Server dafür bereitstelle, könnte ich mal für eine Zeit lang ausprobieren, ob sich das lohnt, auch für Anrufe und Antworten.
[ Yesterday I made first tests with OpenAI's "Whisper" to transcribe notes. The results were sometimes quite good, especially with "Medium". "Large" works better, of course, but the requirements are too high. If I provide a server for it, I could try it out for a while to see if it's worth it, also for calls and answers. ]
I'm sorry that this is in German, but I don't have anything in English I've been testing on.
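If I wanted to script the correction step, building the prompt would be the easy part. This is only a sketch: `build_correction_prompt` is a name I made up, and joining Whisper's output lines into one paragraph is my own assumption about what works best. Sending the prompt to ChatGPT is left out, since I only pasted it in by hand.

```python
def build_correction_prompt(transcript: str) -> str:
    """Wrap raw Whisper output in the prompt wording quoted above."""
    # Collapse line breaks and repeated spaces so the text reads as one paragraph.
    cleaned = " ".join(transcript.split())
    return f"Correct the following: '{cleaned}'"


raw = """ich habe gestern erste tests mit open ai whisper gemacht
es waren teilweise recht gute ergebnisse vor allem mit medium"""
print(build_correction_prompt(raw))
```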
This is great! Thank you. We’re actually very similar: I also tried getting ChatGPT to correct transcripts for errors, but when I tried about 3 weeks ago it couldn’t manage. I just use the Voice Memos app and let it run. I just talk and don’t think about the file. So I only have 1 file, and my WisprNote app removes nearly all the non-speech and passes the rest to Whisper. I think there’s actually a Voice Memos setting on macOS that will cut silence automatically, but I don’t use it.