
VideoMentions scrapes the video page markup and pulls out the "baseUrl" for the English caption track. It converts that XML caption track into JSON, then searches it for keyword matches. You're right that this particular search for "toxic" should find several spoken-word matches, but it doesn't. It seems the tool can't access the caption data for that video for some reason. I've made a note of this bug, and I'll look into fixing it. Thanks for pointing it out, and for checking out VideoMentions Search!
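For anyone curious what the XML-to-JSON-then-search step could look like, here is a minimal Python sketch. It is not the VideoMentions code. It assumes the caption track is a `<transcript>` document of `<text start="…" dur="…">` elements, and the sample data and the function names `captions_to_records` and `find_mentions` are made up for illustration.

```python
import html
import json
import xml.etree.ElementTree as ET

# Hypothetical sample track; the element and attribute names are an assumption.
SAMPLE_XML = """<transcript>
  <text start="0.5" dur="2.1">That behaviour is toxic</text>
  <text start="2.6" dur="1.8">and it&amp;#39;s hard to fix</text>
  <text start="4.4" dur="2.0">Toxic habits spread quickly</text>
</transcript>"""


def captions_to_records(xml_text):
    """Turn the caption XML into a list of JSON-serialisable dicts."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.iter("text"):
        records.append({
            "start": float(node.get("start", "0")),
            "duration": float(node.get("dur", "0")),
            # Caption text can carry HTML entities that survive XML parsing.
            "text": html.unescape(node.text or "").strip(),
        })
    return records


def find_mentions(records, keyword):
    """Return the records whose text contains the keyword, ignoring case."""
    needle = keyword.casefold()
    return [r for r in records if needle in r["text"].casefold()]


records = captions_to_records(SAMPLE_XML)
print(json.dumps(find_mentions(records, "toxic"), indent=2))
```

A plain substring match like this would also hit "toxicity" or "nontoxic"; a word-boundary regex is one way to tighten it if that matters.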



yt-dlp [1] has command-line options to download only the captions of a video, in available languages, if you want to skip the scraping for the link.
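A rough sketch of such a captions-only invocation, built as an argument list in Python so it can be handed to `subprocess.run`. The helper name `caption_only_command`, the `VIDEO_ID` placeholder, and the choice of `vtt` as the subtitle format are just assumptions for the example; check `yt-dlp --help` for the options your version supports.

```python
import shlex


def caption_only_command(video_url, langs="en"):
    """Build a yt-dlp argument list that fetches captions but not the video."""
    return [
        "yt-dlp",
        "--skip-download",      # do not download the video itself
        "--write-subs",         # uploader-provided subtitles, if any
        "--write-auto-subs",    # automatically generated captions, if any
        "--sub-langs", langs,   # comma-separated language codes
        "--sub-format", "vtt",  # assumed format choice for this example
        video_url,
    ]


cmd = caption_only_command("https://www.youtube.com/watch?v=VIDEO_ID")
print(shlex.join(cmd))
# To actually run it (requires yt-dlp on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```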

I built something similar [2] for a slightly different use case. I wanted to be able to search through all Ram Dass talks in the 'Here and Now' podcast series on YT. I'm obviously not as skilled at CSS. :) And the display of timestamps is still a bit shaky, but for me it fulfills its purpose.

Since I'm able to preload all caption files ahead of time, I'm just using pcregrep for the search, which does a pretty good job.

[1] https://github.com/yt-dlp/yt-dlp

[2] https://ramdass-search.net



