
How are you sourcing the text for the videos? This search [1] returns some results for my query, but it misses this video [2], which contains the searched keyword multiple times, as its subtitles show.

[1] https://videomentions.com/search?channelUrl=https%253A%252F%...

[2] https://www.youtube.com/watch?v=3denP7wX2XU&t=296s




VideoMentions scrapes the video page markup and pulls out the "baseUrl" for the English caption track. It converts that XML caption track into JSON, then searches it for keyword matches. You're right that this particular search for "toxic" should find several spoken-word matches, but it doesn't. It seems the tool can't access the caption data for that video for some reason. I've made a note of this bug and I'll look into fixing it. Thanks for pointing it out, and for checking out VideoMentions Search!
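For anyone curious how that pipeline might look, here is a minimal Python sketch. It is not VideoMentions' actual code: the `"baseUrl"`/`"languageCode"` layout of the embedded caption-track JSON and the `<text start dur>` shape of the caption XML are assumptions about YouTube's markup, and `extract_base_url`, `captions_xml_to_json`, and `find_mentions` are hypothetical names. It works on strings that have already been fetched, so the network step is left out.

```python
import json
import re
import xml.etree.ElementTree as ET
from html import unescape
from typing import Optional


def extract_base_url(page_html: str, lang: str = "en") -> Optional[str]:
    """Return the caption-track baseUrl for `lang`, or None.

    Assumes the page embeds JSON objects such as
    {"baseUrl":"...","name":{...},"languageCode":"en"}.
    """
    matches = list(re.finditer(r'"baseUrl":"([^"]+)"', page_html))
    for i, m in enumerate(matches):
        # Look for a languageCode between this baseUrl and the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(page_html)
        code = re.search(r'"languageCode":"([^"]+)"', page_html[m.end():end])
        if code and code.group(1) == lang:
            # Decode JSON string escapes such as \u0026 -> &.
            return json.loads(f'"{m.group(1)}"')
    return None


def captions_xml_to_json(xml_text: str) -> list:
    """Convert <transcript><text start=.. dur=..>..</text></transcript> to dicts."""
    root = ET.fromstring(xml_text)
    cues = []
    for node in root.iter("text"):
        cues.append({
            "start": float(node.get("start", "0")),
            "dur": float(node.get("dur", "0")),
            # Caption text may be HTML-escaped a second time inside the XML.
            "text": unescape(node.text or ""),
        })
    return cues


def find_mentions(cues: list, keyword: str) -> list:
    """Return cues containing `keyword` as a whole word, case-insensitively."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [cue for cue in cues if pattern.search(cue["text"])]


if __name__ == "__main__":
    xml = ('<transcript><text start="1.5" dur="2.0">this is toxic</text>'
           '<text start="3.5" dur="1.0">nothing here</text></transcript>')
    cues = captions_xml_to_json(xml)
    print(json.dumps(find_mentions(cues, "toxic")))
```

Matching per cue keeps the start time handy for deep links (the `&t=` parameter in a YouTube URL), but it also means a phrase split across two consecutive cues won't match.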


yt-dlp [1] has command-line options to download only a video's captions, in whichever languages are available, if you want to skip scraping the page for the caption link.
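A minimal sketch of the same idea through yt-dlp's Python API instead of the CLI. The option keys below are the embedding equivalents of `--skip-download`, `--write-subs`, `--write-auto-subs`, `--sub-langs`, and `--sub-format`. The helper names, the `captions/` output directory, and English-only VTT are assumptions for illustration, and `yt_dlp` has to be installed separately.

```python
def caption_only_options(langs=("en",), fmt="vtt", out_dir="captions"):
    """Build yt-dlp options that fetch subtitles but skip the video itself."""
    return {
        "skip_download": True,          # CLI: --skip-download
        "writesubtitles": True,         # CLI: --write-subs (uploaded captions)
        "writeautomaticsub": True,      # CLI: --write-auto-subs (auto-generated)
        "subtitleslangs": list(langs),  # CLI: --sub-langs en
        "subtitlesformat": fmt,         # CLI: --sub-format vtt
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",  # hypothetical output layout
    }


def download_captions(urls, **kwargs):
    """Download caption files for each URL; returns yt-dlp's exit code."""
    import yt_dlp  # third-party: pip install yt-dlp
    with yt_dlp.YoutubeDL(caption_only_options(**kwargs)) as ydl:
        return ydl.download(urls)


if __name__ == "__main__":
    download_captions(["https://www.youtube.com/watch?v=3denP7wX2XU"])
```

The subtitle files typically land as `<id>.<lang>.vtt` under that template, ready to be searched offline.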

I built something similar [2] for a slightly different use case: I wanted to be able to search through all Ram Dass talks in the 'Here and Now' podcast series on YT. I'm obviously not as skilled at CSS. :) The display of timestamps is still a bit shaky, but for me it fulfills its purpose.

Since I'm able to preload all caption files ahead of time, I'm just using pcregrep for the search, which does a pretty good job.
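For comparison, a Python sketch of the same preload-then-grep approach using the standard `re` module (a swapped-in technique, not what the site actually runs). It assumes the preloaded captions are WebVTT files in one directory; `search_vtt` and `search_dir` are hypothetical names. Because it remembers the most recent cue timing line, each hit comes back paired with the start time of its cue.

```python
import re
from pathlib import Path

# A WebVTT cue timing line, e.g. "00:01:02.500 --> 00:01:05.000".
TIMING = re.compile(r"^(\d{2,}:)?\d{2}:\d{2}\.\d{3} --> ")
# Inline markup such as <c>...</c> or <00:00:01.000> in auto-generated subs.
TAG = re.compile(r"<[^>]+>")


def search_vtt(text, pattern):
    """Return (cue_start, line) pairs for caption lines matching `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    current = None  # start time of the cue we are inside, if any
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current cue
        elif TIMING.match(line):
            current = line.split(" --> ")[0]
        elif current is not None:
            clean = TAG.sub("", line).strip()
            if regex.search(clean):
                hits.append((current, clean))
    return hits


def search_dir(directory, pattern):
    """Search every .vtt file in `directory`; map file name -> hits."""
    results = {}
    for path in sorted(Path(directory).glob("*.vtt")):
        hits = search_vtt(path.read_text(encoding="utf-8"), pattern)
        if hits:
            results[path.name] = hits
    return results
```

Usage: `search_dir("captions", r"\blove\b")` returns a dict keyed by file name, with each value a list of `(start_time, text)` pairs.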

[1] https://github.com/yt-dlp/yt-dlp

[2] https://ramdass-search.net



