This is a small project I made years ago and updated to Whisper last year. I still use it from time to time and thought it might be useful to others, or at least put the idea out there for someone better than me to make a better implementation!
1. In a desktop web browser, visit a YouTube video with captions, which is almost all of them
2. Click the video description to expand it
3. Scroll down and click the tiny "Show Transcript" button near the bottom (whoever decided to bury it down here was very misguided)
4. Ctrl-F and search any word. Occurrences in the transcript will be highlighted and you can press enter to scroll the transcript to the next one. Click the transcript to seek the video.
I see that this extension shows occurrences on the seek bar, which is cool. There is also a slight problem with regular Ctrl-F: if you search for a multi-word phrase, you might not find it if the phrase happens to be split between two chunks of the transcript. So that could be better in this extension. And of course not every YouTube video has captions, but most do these days.
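To make the split-chunk problem concrete, here is a minimal sketch of a phrase search that ignores chunk boundaries. The `(start_seconds, text)` chunk format and the `find_phrase` name are assumptions for illustration, not how YouTube or the extension actually stores transcripts.

```python
import re

def find_phrase(chunks, phrase):
    """Return the start time of each chunk in which `phrase` begins.

    `chunks` is a list of (start_seconds, text) pairs (an assumed format).
    """
    # Flatten all chunks into one word list, remembering each word's chunk.
    words, owners = [], []
    for index, (_, text) in enumerate(chunks):
        for word in re.findall(r"[\w']+", text.lower()):
            words.append(word)
            owners.append(index)

    target = re.findall(r"[\w']+", phrase.lower())
    hits = []
    if not target:
        return hits
    # Slide a window over the flattened words so matches can span chunks.
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            hits.append(chunks[owners[i]][0])
    return hits

chunks = [
    (0.0, "welcome back to the"),
    (2.5, "channel today we are"),
    (5.0, "looking at the channel settings"),
]
print(find_phrase(chunks, "the channel"))  # [0.0, 5.0]
```

The first hit is the case plain Ctrl-F would miss, because "the" and "channel" sit in different chunks.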
> visit a YouTube video with captions, which is almost all of them
Depending on what you're watching, you might never come across a video with good subtitles but rather YouTube's auto-generated subtitles.
Whisper can do a better job in a lot of cases, but not all... I wonder if they've had multiple generations of auto-captioning and not gone back and redone the ones that were done earlier.
This extension is really interesting to me because in the past I've tried (and failed) to make a similar one that adds a new .vtt to the list of available subtitles for the video. I sometimes struggle with auditory processing, especially in a noisy environment, and following along with subtitles helps me out immensely, so it's frustrating when the auto-generated subtitles are poor quality. I've bookmarked the extension to see if I can fork it for that purpose in the future.
> Depending on what you're watching, you might never come across a video with good subtitles but rather YouTube's auto-generated subtitles.
Even with very little correspondence to the actual dialogue, if you already know what you're looking for, you can probably find it pretty easily in the auto-generated subtitles.
You are correct. YouTube didn't have this when I originally made it in 2019 with DeepSpeech. Now they do, but I always preferred having the matches on the timebar so you can just click and go right to them. tbh I should just make a simple add-on that takes the YouTube timestamps and slaps them onto the timebar. Split chunks wouldn't be a problem here either, since the transcript is actually stored in a JSON file, so any consecutive words will always be matchable as a phrase. ofc the downside is that you need to run the model lol
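Roughly, putting matches on the timebar comes down to dividing each match's start time by the video duration. The sketch below assumes a word-level JSON layout of `[{"word": ..., "start": ...}]`, which is a guess and not necessarily the file format this project uses; `marker_positions` is a hypothetical helper.

```python
import json

def marker_positions(transcript_json, phrase, duration):
    """Return seek-bar positions (0.0 to 1.0) where `phrase` starts."""
    # Assumed layout: [{"word": str, "start": float}, ...]
    words = json.loads(transcript_json)
    target = phrase.lower().split()
    spoken = [w["word"].lower() for w in words]
    positions = []
    if not target or duration <= 0:
        return positions
    # Words are stored individually, so any consecutive run can match.
    for i in range(len(spoken) - len(target) + 1):
        if spoken[i:i + len(target)] == target:
            positions.append(words[i]["start"] / duration)
    return positions

sample = json.dumps([
    {"word": "open", "start": 24.5},
    {"word": "the", "start": 25.0},
    {"word": "settings", "start": 25.5},
    {"word": "page", "start": 26.0},
])
print(marker_positions(sample, "the settings", 100.0))  # [0.25]
```

A position of 0.25 would correspond to a marker a quarter of the way along the bar.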
If that also saved a copy of the transcript, with metadata added (title, channel, URL, similar vids), as a text file on my local machine, I would actually use this as well.
Wait, is this using a cloud service in some way or is it all local / totally private? That would be a deal breaker or maker.
Oh, might as well grab a screenshot of the thumbnail and save it too.
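As a rough idea of what that local export could look like, here is a sketch that writes the metadata and transcript to a plain-text file with no network access. The metadata keys, the file naming, and the `save_transcript` helper are all made up for illustration and are not part of the extension.

```python
import tempfile
from pathlib import Path

def save_transcript(folder, meta, lines):
    """Write metadata and transcript lines to a text file; return its path."""
    # Replace characters that are awkward in filenames with underscores.
    safe = "".join(
        c if c.isalnum() or c in " -_" else "_" for c in meta["title"]
    ).strip()
    path = Path(folder) / f"{safe or 'untitled'}.txt"
    header = [f"{key}: {value}" for key, value in meta.items()]
    body = [
        f"[{int(start) // 60:02d}:{int(start) % 60:02d}] {text}"
        for start, text in lines
    ]
    path.write_text("\n".join(header + [""] + body) + "\n", encoding="utf-8")
    return path

# Hypothetical metadata and transcript lines, written to a temporary folder.
meta = {"title": "Demo: part 1", "channel": "Example", "url": "https://example.com/watch"}
with tempfile.TemporaryDirectory() as folder:
    saved = save_transcript(folder, meta, [(0.0, "hello"), (65.2, "world")])
    content = saved.read_text(encoding="utf-8")
print(content)
```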
I built an extension that injected a search bar into the transcript card. Worked by filtering the YouTube transcripts themselves, and manipulating their display attribute.
Didn't release it to the store because YouTube released a search feature and it looked exactly like mine.
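The filtering idea is simple enough to model outside the browser. This is only a Python stand-in for the DOM logic, with a hypothetical `display_values` helper; a real extension would set `style.display` on each transcript row instead.

```python
def display_values(rows, query):
    """Return the CSS display value for each transcript row.

    An empty string keeps the row visible; "none" hides it.
    """
    needle = query.strip().lower()
    # An empty query matches everything, so all rows stay visible.
    return ["" if needle in text.lower() else "none" for text in rows]

rows = ["welcome back", "today we test Whisper", "thanks for watching"]
print(display_values(rows, "whisper"))  # ['none', '', 'none']
```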
Would you prefer the timestamp hidden by default, since it takes up a bigass portion of the screen, or an option to hide it in the extension settings?
I think the timestamp is OK, my biggest complaint is the huge amount of whitespace between the rows and the small size of the box. If I designed YouTube I would put the transcript on the left side above the video description, with a button that expands it to full height so there's no separate scrollbar for the transcript anymore, it's just all directly in the page.
BTW when I went to look at a video just now, YouTube actually served me a "Search in Video" box at the top of the transcript. So I guess the feature exists, they just haven't rolled it out to everyone yet.
You can find a button for the transcript in the description (or the three-dot menu near the dislike button if it's still serving you the older interface). You have to open the transcript first, then Ctrl+F.
If we had an extension to skip all the filler garbage in YT videos, I would be ecstatic. Maybe that's doable now? YT captions -> identify fluff timestamps via a browser LLM -> insert segments onto the video timeline, which automatically skip, a la SponsorBlock.
We could slash through YouTubers repeating themselves, making hack jokes, narrating their video title & outline, vapid explanations of common knowledge, etc. Any of which can be customized to your taste via a system prompt!
This kinda semantic filter would actually be an immensely powerful UI tool for all webpages and media, now that I think about it...
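The skipping half of that pipeline is the easy part. Here is a minimal sketch, assuming the fluff timestamps already exist as `(start, end)` pairs in seconds from whatever classifier produced them; the function names are hypothetical and this is not how SponsorBlock is implemented.

```python
def merge_segments(segments):
    """Merge overlapping (start, end) segments into a sorted list."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous segment, so extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def next_position(current, segments):
    """Return where playback should be: past any segment containing `current`."""
    for start, end in merge_segments(segments):
        if start <= current < end:
            return end
    return current

fluff = [(30.0, 45.0), (40.0, 60.0), (120.0, 130.0)]
print(merge_segments(fluff))       # [(30.0, 60.0), (120.0, 130.0)]
print(next_position(42.0, fluff))  # 60.0
print(next_position(90.0, fluff))  # 90.0
```

A player hook would call something like `next_position` on each time update and seek whenever the returned value differs from the current time.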
Do check the settings too, SponsorBlock is best known for skipping sponsored segments but it also has markers for things like intros, previews, self-promotion, and filler jokes/skits which aren't skipped by default but can be if you want them to.
I developed https://www.videototextai.com/ for exactly this reason, as it was nearly impossible to search videos otherwise. You can also copy the transcript into an LLM and ask questions about the video content that way.
Yeah, I remember the whole YouTube transcript thing coming out a yearish after I made the first version of this in 2019. I still prefer the timebar highlighting, but that's just a preference thing.
The model.pth is a custom LSTM for detecting phonetic similarity. As long as you're running it from the pythons folder (I didn't manage file locations very well), it should work.
Ahh, never really looked, cause I built my original one in 2019 off of DeepSpeech haha, and just updated it for fun mostly. I know YouTube captions themselves are good, but one thing about his code is that not all videos have captions. Since mine actually downloads the audio and runs the model on it, it would still have results for those older videos that never got captions.
I searched for my YouTube username and then for the exact title of one video I posted and it didn't find either one.... instead it said the title of my video was not true because it didn't interpret it correctly (but it didn't link to the video).