Show HN: VideoMentions – Search YouTube based on the spoken words in videos (videomentions.com)
295 points by kellenmace on May 25, 2022 | 132 comments



One consequence of how effective click-bait titles are on YouTube is that finding videos you watched in the past is extremely difficult, because quite often the main subject of the video is never mentioned in the title. Descriptions are almost universally used to store links about the channel rather than the video, so click-bait-titled content is very hard to find after the fact. The other problem is that quite a lot of channels test one title and then switch to another. The video then won't be found under the initial title, so even if you do remember parts of it, that may be useless hours or days later once a better-performing title has replaced it.

I can see a need for search in the contents of the videos, but I don't want to specify the channel. I likely don't know it, especially since channels can also change their names.


Also, YouTube's built-in search often fails to match videos that have the word you searched for in the title.


YouTube search is almost completely useless, I just use Google to find YouTube videos. It does seem to search in the video transcript and the comments (I'm not sure but it seems that way).


YouTube's search results will show you like 10 results of the thing you actually searched for, then resort to a section of completely random video suggestions unrelated to your search in hopes some thumbnail will draw your attention and suck you back in to wasting your time watching videos and being served ads.


I also suffered from this issue for a while but then discovered "Unhook" browser extension, which in addition to many other amazing features (e.g. hide recommended videos/comments) has a feature to hide irrelevant search results. With the feature on I get an endless list of matching results and no junk.


I can't even find a specific channel when I spell it correctly, even with double quotes. I have to go into the filters and specify channels-only, and even then it puts a bunch of wrong results in front of it. Search on YouTube used to be better, showing results for what was actually searched for. But now it seems like they're trying to drive views to high-performing videos, since they give a single page of results for any query before backfilling the results with a bunch of suggestions that have nothing to do with the search.


Up until a few days ago I had a redirect for search that disabled the suggestions on the search page. Now, if I do that, the search only gives 1, irrelevant, result.


You should be able to just go directly to the channel using the URL if you know the channel name. youtube.com/c/<CHANNEL_NAME>. What on earth they use the distinction for, I don't know, but if it's a user with videos instead of a channel, then youtube.com/user/<USER_NAME>.


Youtube is clearly blacklisting certain channels and they can't be found via even precise searches.


Unreasonable of you to expect search to be good in a Google product.


lol- my thoughts exactly. How can Google search be so good, while YouTube search is so abysmal? Google bought YouTube in 2006, so they've had 16 years to get it right :D


I created a project exactly as you described: https://github.com/victor141516/YourArch


people also edit the titles of videos from time to time... so even the exact title doesn't help


Aye especially in modern day YouTube

Once you're big enough your stats start being useful, folks in the know (mrbeast's analytics guy, I can't think of more rn, etc) advise you rotate out the thumb/titles until the click through rate is decent


Hey HN! I just launched VideoMentions Search, a free tool that allows you to search YouTube to find videos that contain specific spoken words.

Here's how to quickly try it out -

Let's say you want to find every video on The Verge's YouTube channel within the last several months in which "MacBook Pro" is mentioned. Here's a pre-populated search to accomplish that:

https://videomentions.com/search?channelUrl=https%253A%252F%...

If you follow that ^ link, then click the button to perform the search, you'll see all matching videos, with every single mention of "MacBook Pro" highlighted.

My favorite part is that all the highlighted matches are timestamped. So you can click on any of them to jump to that exact moment in the video when that keyword was said!

VideoMentions Search is great for these scenarios:

- You want to find all videos within a YouTube channel that mention your brand, your product, or the topics you care about.

- You remember watching a video where a certain topic was discussed, and now you're trying to remember which video it was to rewatch it.

- You run your own YouTube channel and want to quickly find the exact moments in your videos where you cover certain topics, so you can link others to that content.

I have wanted a way to search YouTube based on spoken words for a while. I couldn't find a tool that provides that capability though, so I built it!

I hope you find VideoMentions Search useful! I'd love to hear any feedback you have on it. Please let me know!


How are you sourcing the text for the videos? This search [1] grabs some results for my query, but it does miss this [2] video which contains the searched keyword multiple times, and the video's subtitles indicates as much.

[1] https://videomentions.com/search?channelUrl=https%253A%252F%...

[2] https://www.youtube.com/watch?v=3denP7wX2XU&t=296s


VideoMentions scrapes the video page markup and pulls out the "baseUrl" for the English caption track. It converts that XML caption track into JSON, then searches it for keyword matches. You're right that this particular search for "toxic" should find several spoken word matches, but it doesn't. It seems like the tool isn't able to access the captions data for that video for some reason. I made a note of this bug, and I'll look into fixing it. Thanks for pointing it out, and for checking out VideoMentions Search!
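
For anyone curious what the "XML caption track into JSON, then search it" step might look like, here is a minimal sketch. It is not the author's code. It assumes the caption XML is a list of `<text start="..." dur="...">` elements, and the function names (`captionXmlToJson`, `findMentions`) are made up for illustration. The timestamped link uses the same `&t=...s` form as the YouTube URL quoted above.

```javascript
// Sketch only: the <text start="..." dur="..."> shape is an assumption
// about the caption XML, and all function names here are hypothetical.

// Decode the handful of XML entities that commonly appear in caption text.
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&'); // last, so "&amp;lt;" is not decoded twice
}

// Convert a caption track XML string into an array of { start, dur, text } cues.
function captionXmlToJson(xml) {
  const cues = [];
  const cuePattern = /<text\b([^>]*)>([\s\S]*?)<\/text>/g;
  let m;
  while ((m = cuePattern.exec(xml)) !== null) {
    const start = /start="([^"]+)"/.exec(m[1]);
    const dur = /dur="([^"]+)"/.exec(m[1]);
    cues.push({
      start: start ? parseFloat(start[1]) : 0,
      dur: dur ? parseFloat(dur[1]) : 0,
      text: decodeEntities(m[2]).replace(/\s+/g, ' ').trim(),
    });
  }
  return cues;
}

// Return the cues containing the keyword (case-insensitive),
// each with a link that jumps to that moment in the video.
function findMentions(cues, keyword, videoId) {
  const needle = keyword.toLowerCase();
  return cues
    .filter((cue) => cue.text.toLowerCase().includes(needle))
    .map((cue) => ({
      ...cue,
      url: `https://www.youtube.com/watch?v=${videoId}&t=${Math.floor(cue.start)}s`,
    }));
}
```

One limitation of this sketch: a phrase that is split across two consecutive cues would be missed, so a real implementation would probably join neighbouring cues before matching.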


yt-dlp [1] has command-line options to download only the captions of a video, in available languages, if you want to skip the scraping for the link.

I built something similar [2] for a slightly different use case. I wanted to be able to search through all Ram Dass talks in the 'Here and Now' podcast series on YT. I'm obviously not as skilled at CSS. :) And the display of timestamps is still a bit shaky, but for me it fulfills its purpose.

Since I'm able to preload all caption files ahead of time, I'm just using pcregrep for the search which does a pretty good job.

[1] https://github.com/yt-dlp/yt-dlp [2] https://ramdass-search.net


> - You remember watching a video where a certain topic was discussed, and now you're trying to remember which video it was to rewatch it.

But I need to remember which channel I watched? Maybe I'm missing something, but in my eyes it would make it tremendously more useful if I didn't have to specify channel.


Yeah, YouTube doesn't provide a way to search all of YouTube based on the spoken words in videos.

I could update VideoMentions Search to allow users to select multiple channels, and then perform the search across all of those (maybe importing all the channels they're subscribed to could be handy... ). One way or another though, it would still require selecting specific channels to search within. That limitation notwithstanding, I still think it's a useful tool, though!

Thanks for checking it out!


I get why you wouldn’t be able to index all of YouTube, that’s a big ask.

However, I don’t use YouTube enough to mess w channels much. I’m usually searching on a particular topic.

For example, “drop ceiling panel replacement.”

Perhaps you could help users limit the channel scope by making an intelligent channel selection by keyword.

So I would put in “home improvement,” and you could choose some appropriate channels to search for my search terms.


Hey @bredren! Yeah, I agree that this makes for a nice user experience. This is how the paid VideoMentions.com service works- users can search for channels by name or keywords, without the need to paste in URLs. It looks like this: https://cloudup.com/cUgKqErcx8G

That auto-complete lookup requires spending a finite number of API calls, though, which is why it’s restricted to customers and not available on this freely accessible VideoMentions Search page.

Thanks for checking it out, and for your feedback!


Your pricing model is really solid. I can see how useful this is to people or companies with a brand.

I don’t have one of those, but I can also easily imagine wanting to find “that one video that I watched a while ago” in some topical “power tools” channel where they talked about “parallel battery” tech, without recalling the channel name. If I really wanted to find that again in order to buy the tool, it’s easily worth $5. And then I have a whole month to chase down other vaguely-recalled references!

Well done. I hope this post gets the attention it deserves. You have solved a problem that many people have, and have griped about. Companies and people who maintain brands need to know about it. And your free tier (which I assume is funded by the paid options) has already helped me find something I’d been after for a while. I want this to stick around.


Ah! Okay. Makes sense. Thanks for the reply, maybe I’ll try out the pro version.


They don't offer direct search, but isn't this what "key moments" is in search results? Try eg how to change a lightbulb.

I believe SeekToAction works even if the uploader didn't put chapters in. This was a relatively recent update, to make it fully automatic. So it's presumably doing some audio/video analysis to figure it out. All you need to do is tell Google how to seek your video (so it also works with non youtube videos too).
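
As a rough illustration of "telling Google how to seek your video": the structured data is a schema.org `VideoObject` with a `SeekToAction` whose target URL contains a `{seek_to_second_number}` placeholder. The helper below is a hypothetical sketch that builds that object. The function name and the `seekUrlPrefix` parameter are assumptions, and the linked Google post is the authority on which fields are required.

```javascript
// Sketch only: builds a schema.org VideoObject with a SeekToAction.
// buildVideoJsonLd and seekUrlPrefix are hypothetical names.
function buildVideoJsonLd({ name, description, thumbnailUrl, uploadDate, contentUrl, seekUrlPrefix }) {
  return {
    '@context': 'https://schema.org',
    '@type': 'VideoObject',
    name,
    description,
    thumbnailUrl,
    uploadDate,
    contentUrl,
    potentialAction: {
      '@type': 'SeekToAction',
      // The placeholder is replaced with the number of seconds to skip to.
      target: `${seekUrlPrefix}{seek_to_second_number}`,
      'startOffset-input': 'required name=seek_to_second_number',
    },
  };
}
```

The result would be serialized with `JSON.stringify` into a `<script type="application/ld+json">` tag on the page hosting the video.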

https://developers.google.com/search/blog/2021/07/new-way-ke...


Oh, cool! I hadn't heard of the Key Moments/SeekToAction feature before. I'll have to dig in and explore that a bit. Thanks for the tip!


Could you do a search based on their watch history if they have it enabled?


This is a cool idea! It wouldn't apply to people who want to search all the videos on a given channel for specific keywords (including those they haven't watched). I can see it being useful for folks trying to locate a specific video they remember watching in the past, though.

One consideration is that getting the user's watch history would likely require calls to the YouTube API. So that means I would have to make this a paid service in order to offset the cost of those API requests. The beautiful thing about the current iteration is that it doesn't rely on YouTube's API at all. By scraping YouTube pages and leveraging a few NPM packages, I'm currently able to offer free and unrestricted access to it.

If enough people request that ability though, I'll consider incorporating it.

Thanks for checking out my project and for the great idea!


Sadly, the YouTube API doesn't allow you to retrieve a user's history, so you need to scrape that as well.

I created a project that does exactly that: https://github.com/victor141516/YourArch


If you're ambitious, I think a compelling rap genius style social media site could be built based on people commenting on specific segments of YouTube talks (and podcasts, whatever). Rap genius (now genius.com) seemed to be doing something really novel, but then....something happened. But I think the idea is rich with opportunity.

This project is doing a variation of it:

https://news.ycombinator.com/item?id=31527544

My vision of it would require some pretty strong UI skills, or maybe nowadays it's not so hard to accomplish impressive things.


That'd certainly be useful


Do you cache the channel, search term, and results for faster more efficient responses later?


Hey @hanniabu! I do client-side in-memory caching of videos data, only. No server-side caching. In fact, there is no database involved at all- the client-side app calls serverless function API endpoints to fetch the YouTube channel and video data it needs. Here are the tricks I'm using to make it fast:

- As soon as the "Channel URL" field loses focus, I start fetching the most recent 30 videos on that channel in the background. This way, by the time the user enters the keyword and date range, I've already fetched some (maybe even all!) of the data ahead of time, which means less wait time for them.

- Once a specific video's data (title, description, transcript, etc.) has been fetched once, it is saved in memory. All other searches the user performs from that point on will pull the video data from the in-memory cache, if it's there. Otherwise, it will fall back to fetching the video data over the network. This in-memory caching makes subsequent searches within the same date range (or a shorter date range) take <1 second.

- Network requests to fetch video data are processed concurrently rather than one at a time. So the browser fires off as many as it can in parallel to get them all resolved as quickly as possible.

- As soon as any matches are found, the UI updates to show the user. This way, the user can start scrolling through the matches and reviewing them while the search is still in progress– they don't have to wait until it finishes to start interacting with the matches.
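
The caching, concurrency, and show-matches-as-they-arrive tricks above can be sketched in a few lines. This is an illustration under assumptions, not the app's actual code: `fetchVideoData` stands in for whatever call retrieves one video's data, and `searchVideos` and `onMatch` are hypothetical names.

```javascript
// Sketch only. In-memory cache: video ID -> previously fetched video data.
const videoCache = new Map();

// fetchVideoData is a hypothetical async function: (id) => { id, title, transcript }.
// onMatch is called for each matching video as soon as it is found.
async function searchVideos(videoIds, keyword, fetchVideoData, onMatch) {
  const needle = keyword.toLowerCase();
  const matches = [];
  // Start every request at once instead of awaiting them one by one.
  await Promise.all(
    videoIds.map(async (id) => {
      // Serve from the cache when possible; otherwise go to the network.
      let video = videoCache.get(id);
      if (!video) {
        video = await fetchVideoData(id);
        videoCache.set(id, video);
      }
      if (video.transcript.toLowerCase().includes(needle)) {
        matches.push(video);
        onMatch(video); // lets the UI update before the whole search finishes
      }
    })
  );
  return matches;
}
```

Because the cache lives in a plain `Map`, it is gone after a page refresh, and `Promise.all` with no limit fires every request at once, so a real app would likely cap how many are in flight.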

Thanks for checking out my project!


>I do client-side in-memory caching of videos data, only.

This works only if the user does not navigate from and back to your website or refresh the page, but if they do, you make the same api calls all over again. You should set HTTP Cache-Control headers in your response from the server, so that the browser knows that it can serve that data from its cache and does not need to make those requests again. You would then probably not need the client-side in-memory cache at all.
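
A minimal sketch of that suggestion, assuming a Node-style `(req, res)` handler. The actual serverless platform isn't named in the thread, and `makeVideoHandler` and `loadVideoData` are hypothetical names.

```javascript
// Sketch only: wraps a hypothetical data loader in a handler that lets the
// browser cache the JSON response for maxAgeSeconds.
function makeVideoHandler(loadVideoData, maxAgeSeconds) {
  return async function handler(req, res) {
    const data = await loadVideoData(req.url);
    res.setHeader('Content-Type', 'application/json');
    // "private" keeps shared caches out of it; max-age is in seconds.
    res.setHeader('Cache-Control', `private, max-age=${maxAgeSeconds}`);
    res.end(JSON.stringify(data));
  };
}
```

A shorter `max-age` limits how stale a cached title, description, or transcript can get, at the cost of more refetching.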


I could do this, but it would introduce other cache invalidation issues. Take this sequence of events, for example:

* The user performs a search, reviews the matches, then closes the browser tab.

* The title or description for one or more of the videos on that YouTube channel are updated.

* 45 minutes later, the user remembers that they wanted to search for something else on that same YouTube channel. They return to VideoMentions Search to perform another search.

If I had sent cache-control headers in the first response telling the browser to cache the results of those HTTP requests, they would not be refetched for subsequent requests and the user would potentially be shown incorrect matches.

By fetching the data fresh whenever the user opens VideoMentions Search in a new browser tab, I guarantee that they're getting correct matches based on the titles, descriptions and spoken words that the videos have at the time the search is performed.

So as it's implemented currently, I'm choosing to take a performance hit in exchange for improved accuracy. And I'm comfortable with that tradeoff since searches seem decently speedy to me when I use the tool.

Thanks for your thoughts, and the alternative client-side caching idea!


I'm sure it would be, but then you're indexing whole youtube with respect to words spoken in each video. A thing that Google, arguably the best organization when it comes to indexing stuff, is working on.


Yes, my thoughts exactly. If Google decides that it's going to revamp YouTube search to include spoken word/transcription matches, that would make VideoMentions Search irrelevant– and I'd be okay with that!

In the meantime, I think this is a useful free tool for quickly finding spoken word matches within specific channels.

Thanks for checking it out!


I tried on this channel which is in Spanish [1] and it returned very fast with no results.

Then tried on GeoWizard channel (English) and it worked great.

There is a somewhat related tool called Youglish that works on subtitles, it's great for checking the pronunciation or usage of a word or expression in many languages, but it's based on a curated list of channels with known good subtitles. I thought yours could be a great complement to this as it works directly on the audio.

[1] https://www.youtube.com/c/Tercosmicqueen


Hey @jobigoud! Yeah, currently VideoMentions only uses the auto-generated English video transcriptions when searching for matches. If there was enough demand, it could be expanded to support other languages. Doing so would add complexity and would make searches take much longer, though. I'd probably need to add more UI fields to allow users to select the languages to target. Could be done, though!

Yes, youglish.com is a neat tool! I stumbled upon it when looking around for a way to search YouTube videos based on spoken words.

Thanks for checking it out! :)


Congratulations on the launch!

VideoMentions is addressing the need-gap: 'Searching YouTube videos with transcript'[1] posted on my problem validation forum.

I appreciate that rather than just building a browser add-on to search individual videos, you've built a business use case for brand monitoring. I hope you get profitable enough to afford the compute necessary to monitor top channels without having to explicitly mention them.

You're welcome to explain how it addresses that problem at needgap, so those who need it can find it easily.

[1]: https://needgap.com/problems/88-searching-youtube-videos-wit...


This looks super cool! There's a need for tools to make the millions of videos on the web actually useful for people like me who would much rather skim search and read than watch. Bravo to you for making a stab at it!

I'm guessing you need a specific channel as a way of getting a list of videos, then pulling the captioning, then searching? (That's the only reason I could think of for the channel restriction, which unfortunately removes 90% of the utility)


Hey @loxias! Yes, you're exactly right - VideoMentions scrapes the video page markup and pulls out the "baseUrl" for the English caption track. It converts that XML caption track into JSON, then searches it for keyword matches.

YouTube doesn't provide a way to search all of YouTube based on spoken words, unfortunately. Still, I think this is a useful tool for quickly locating spoken word matches within a certain channel.

Some Hacker News commenters have suggested that I could auto-import all the channels the user is subscribed to, or get videos from their watch history, which are interesting ideas. For this first iteration, I kept things simple with a single "Channel URL" input.

Thanks for checking out my project, and for your feedback!


> Still, I think this is a useful tool for quickly locating spoken word matches within a certain channel.

Absolutely, and I agree, however,.... it necessitates that the user use or understand "channels". When I went to your site my first thought was "what the hell's a channel?" ;-)

But yeah, for whatever percentage of youtube users use channels, I see this having a lot of use. My totally unsolicited advice is that you'll have to find some way to remove the channel requirement. Perhaps suggesting channels based on the search terms. (I've seen your other comments about adding features to a paid version)

I haven't yet managed to get VideoMentions to actually work, but that's fine, I assume you're getting hugged to death. :D Shipping anything is hard, and congrats for that.


"I assume you're getting hugged to death" -- lol

Yeah, maybe I'll consider having two versions of search - one version for paid customers that allows you to search for YouTube channels by name/keyword and allows you to search across all the channels you're subscribed to or across all the videos in your watch history, and a second version that's free and similar to the current iteration.

VideoMentions Search should work just fine for you - there is no server and no database behind it. When searches are performed, serverless functions are called that source data from YouTube, then return a response to the client. So it should scale just fine.

Can you please visit this link, click the button to perform a search and let me know if you see results pop up? https://videomentions.com/search?channelUrl=https%253A%252F%...

I'd love to know if you don't, or if you see any errors in the browser console. Thanks again for your thoughts on it!


(firstly, that search works fine and returns very quickly! Now that I know it's client side, I've been able to do a few searches of my own to see how it works. It works well! The ones I did earlier were looking at a large channel "PBS Newshour" for various news topics over "All time")

(secondly, DO consider putting your email address on your HN profile!)

Whoa! Client side!?! Far out man, far out. To be honest, I was a little unimpressed before -- but didn't feel like "your idea is _brilliant_, but implementation leaves a lot to be desired. makes me want to try my hand at writing my own." would be a constructive comment. ;-)

But, knowing this is all done client side?? I tip my hat, that's *clever*! The whole site could be a static and local set of files? I had no idea things like this could even be done without a server or other real program to do the work. (though now that I think about it, I have an idea how, duh. YouTube exposes a JS API I bet, so you can have the client call it each time, do the searching work, &c.)

That avoids scalability issues and probably legal ones as well! What a game changer...

Was there an existing framework used for building client side web apps like this, or did you roll your own?

While I've got you, and I admit that graphical UIs are ... an area where my opinion should carry no weight (I'm the sort who.. doesn't use icons, and when writing personal tools, the "user interface" is positional parameters to a command line program ;-)), here are some suggestions:

* The UI, while clean and unbusy (yay!!), feels BLOATED with white space. Why do the results occupy this tiny narrow column, forcing me to scroll way more than I should need to?

* In addition to the narrow column, the entries IN the column take up too much space, the UI could be arranged to be much more efficient.

* Since search results are within a channel, don't put the channel name after each entry, there's no point.

* Might be nice to have the time offset visible to the left of the text excerpt. I can see it on mouseover, but still. Might be nice to see.

* Can the videos be made to play inline, without redirecting to youtube, then navigating back, and redoing the search?

* Perhaps, after searching for a result, we could see a graph at the top showing all the videos containing that term over time, and easily click on it to go directly there.


Very cool! Well done!

I've not got a legit use for it right now but the few example searches I thought up returned the videos I was thinking of

Nice work!


Cool! Thank you!

Here are some example use-cases that I included in another comment, in case any of them are relevant to you:

1. You want to find all videos within a YouTube channel that mention your brand, your product, or the topics you care about.

2. You remember watching a video where a certain topic was discussed, and now you're trying to remember which video it was to rewatch it.

3. You run your own YouTube channel and want to quickly find the exact moments in your videos where you cover certain topics, so you can link others to that content.

Maybe #2 happens occasionally? If so, maybe you can bookmark VideoMentions Search for just such an occasion :)

In any event, thanks for checking out my project, and for the kind words!


Site's definitely bookmarked in my toolbox for sure! I imagine #2 will be my primary use case personally -- If not for me, to answer "Does anyone know the video where.."

e: thinking about it I know a few people who pull highlights out of their older content (sort of #3). Shared with them :)


They loved it, in case you see any stats along the lines of Scotland/UK + Discord referrals haha


Are the video transcripts searchable?


Yes, this is how VideoMentions Search works. It scrapes the video page markup and pulls out the "baseUrl" for the English caption track. It converts that XML caption track into JSON, then searches it for keyword matches. Is that what you're asking about?

If you want to search within the transcript of a single video, you can accomplish that with these steps: https://kb.swtc.edu/page.php?id=90230


Ah, got it. I thought there would be some API where the transcripts could be searched across videos. Maybe that requires way too many resources for Google to index.


Yeah, Google is of course the king of search, so they could certainly decide to revamp YouTube search to include spoken word/transcription matches. They have all the data required to make that happen. That would make VideoMentions Search irrelevant– and I'd be okay with that!

In the meantime, I think this is a useful tool for quickly locating videos based on spoken words.

Thanks for checking out my project!


Are we sure YouTube doesn't already do this? Often I have searched for a video while only remembering a certain phrase or even a certain comment left under it, and YouTube was able to actually find it


Hey @HidyBush! No, YouTube does not reliably include spoken word/transcript matches in search results. I've run tests where I open up the transcript of a video using these steps (https://kb.swtc.edu/page.php?id=90230), copy a few of the words, then perform a search using those words across that channel, and the video I copied them from doesn't appear on the list of results.

That's why VideoMentions Search exists. It provides a way to search the videos within a YouTube channel to reliably find spoken word matches (and it includes matches in the title & description, too).

Thanks for checking out my project! Please let me know if I can answer any questions about it.


They do... contents of speech are one of multiple input data to youtube indexing.

Also you can search individual videos by showing transcript and pressing ctrl-f


I have come to the opposite conclusion - YouTube does not reliably include spoken word/transcript matches in search results.

As I wrote in another comment, I've run tests where I open up the transcript of a video using these steps (https://kb.swtc.edu/page.php?id=90230), copy a few of the words, then perform a search using those words across that channel, and the video I copied the words from doesn't appear on the list of results.

If Google decides that it's going to revamp YouTube search to include spoken word/transcription matches, that would make VideoMentions Search irrelevant– and I'd be okay with that!

In the meantime, I think this is a useful free tool for quickly finding spoken word matches within specific channels.

Thanks for checking it out!


I didn't know this...how do you show transcript?


Hey @tomatowurst! You can search within the spoken words/transcript of a single YouTube video by following these steps: https://kb.swtc.edu/page.php?id=90230


This is so cool! I had no idea this was possible. I would often seek a particular segment manually. Thanks!


Welcome! :)


Very interesting tool, could see myself using it a lot.

Couldn't help but search the guy that uploaded 2 million vids to yt, sorry if it breaks anything.

It would be cool to search for a specific quote like "buddy of mine" (frequently said by Joe Rogan). Searching for parts of a word (similar to regex options?) might be difficult to implement, but would be super useful as an option. E.g. if I searched for any word that starts with "yc", using the right pattern (or an option in the UI), I could find results for both "yc" and "ycombinator".


Hey @wolongong942! Nice- I'm glad you're finding it useful!

VideoMentions Search is built entirely using serverless technologies, so it should scale really well. There's no single server or database to act as a bottleneck. When you perform a search, your browser fires off a number of network requests to serverless function API endpoints that fetch the data from YouTube, then return a response. So if you do an "All time" search on a channel with 2 million videos, your laptop fan may kick on while your browser works hard firing off thousands and thousands of requests until it's fetched all 2 million videos, you run out of memory, or you manually hit the "Cancel" button - whichever comes first :)

Your regex idea is interesting, and I'll consider implementing some more complex rules like that if enough people request it. For now, my answer would be to either perform separate searches (like one for "yc" and another for "ycombinator"), or perform one search, then use ctrl/cmd+f to search within the matches displayed on the page.
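
For what it's worth, the "starts with" idea doesn't need full regex support exposed to users. A rough sketch of one way to do it (all names here are hypothetical, and this is not how VideoMentions works today): escape the keyword, then optionally let it run on to the end of the word.

```javascript
// Sketch only: whole-word or prefix matching for a keyword or phrase.

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// With { prefix: true }, "yc" also matches any word that starts with "yc".
function buildKeywordPattern(keyword, { prefix = false } = {}) {
  const core = escapeRegExp(keyword);
  return new RegExp(`\\b${core}${prefix ? '\\w*' : ''}\\b`, 'gi');
}

// Return every matching word or phrase found in the text.
function findWords(text, keyword, options) {
  return text.match(buildKeywordPattern(keyword, options)) || [];
}
```

So `findWords(transcript, 'yc', { prefix: true })` would pick up both "yc" and "ycombinator", while a quoted phrase like "buddy of mine" works without the option. The `\b` anchors assume the keyword starts and ends with a letter or digit.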

Thanks for checking out my project, and for the great feedback!


This is such a boss response. Good inspiration to incorporate attacker attacking into systems I build going forward.

"Heh I might've just bricked you" -> "I don't think so, but you might've bricked yourself."

Next step- have the browser send a single request per video, directly to YouTube. Actually, make it 2 or 3 just for redundancy. Then do the caption processing in-browser. Or better yet, distribute an ~electron~ app that runs this locally in the background with a separate process to autorespawn if quit.

Separately and seriously, you should triple your pricing for this.


Hey @silax! Good thoughts. Some requests can't be sent from the browser directly to YouTube, though: the browser's same-origin policy blocks them, because YouTube doesn't send CORS headers that would allow other sites to read its pages.

For example, if you try to run this code to fetch a YouTube video page in the browser console from any non-YouTube site (like Hacker News or VideoMentions), you'll see that it errors out:

  (async () => {
    const response = await fetch('https://www.youtube.com/watch?v=irjc1nJ1eJs');
    console.log(response);
  })();

That's why my app uses serverless API endpoints as a middleman. It works like this:

1. Browser fires off a request to the API endpoint.

2. The serverless function Node.js process spins up, fetches the data from YouTube, returns the response, then spins back down.

3. My app takes the video data in the response, saves it in memory, searches it for matches, and re-renders the UI to show the matches, if any.

I built my app to do as much work as possible client-side, and to use serverless function API endpoints for anything that can't be done in-browser.
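
To make the middleman step concrete, here is a minimal sketch of what the server-side part of such an endpoint could look like. It is an assumption-laden illustration, not the app's code: `fetchWatchPage` is a made-up name, the `{ statusCode, body }` return shape is just one common serverless convention, and the global `fetch` needs Node 18 or newer.

```javascript
// Sketch only: runs on the server, where the browser's cross-origin
// restrictions don't apply. fetchImpl can be swapped out for testing.
async function fetchWatchPage(videoId, fetchImpl = fetch) {
  const url = `https://www.youtube.com/watch?v=${encodeURIComponent(videoId)}`;
  const response = await fetchImpl(url);
  if (!response.ok) {
    // Surface upstream failures instead of returning an empty page.
    return {
      statusCode: 502,
      body: JSON.stringify({ error: `upstream status ${response.status}` }),
    };
  }
  const html = await response.text();
  return { statusCode: 200, body: JSON.stringify({ videoId, html }) };
}
```

In practice the function would probably extract just the fields the client needs (title, description, caption track URL) rather than ship the whole page back.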

Thanks for checking out my project!


What're the (approximate) costs for monthly usage? (Or if you don't know yet: what're you budgeting for?)


Hey @messe! I built VideoMentions Search with a combination of custom code and a couple NPM packages that are able to scrape publicly accessible YouTube pages to get the channel and videos data the app relies on.

So because it doesn't use the YouTube API at all, the total monthly cost is $0 :)

That's why I'm able to offer free, unrestricted access to the tool and allow folks to perform unlimited searches.

Thanks for checking out my project! I hope you find it useful!


Hi, Great job. I was almost about to build such a thing before I realized the crazy amount of crawling and transcriptions that I have to index.

How exactly did you solve/approach the problem?

1. How did you crawl across those millions of videos from the platform?

2. How are you indexing stuff like that?


Based purely on the speed of the results, I believe that the crawling is happening in real time.

The search is scoped by channel, so the closed-caption files for all the videos in the channel are downloaded and searched for on the fly.

Edit: Wow, thanks to dev tools, I can see that the website is downloading the transcript and metadata for all the videos from the channel to the client. So the search is happening client-side!!


Hey @alex_smart- Yep! You're exactly right. My app performs as much of the fetching and searching work as possible on the client, then calls serverless function API endpoints for the few things that can't be done in-browser.

I replied to @lewisjoe's comment above this one with some more details about tricks I'm employing to make searches fast.

Thanks for checking out my project!


Hey @lewisjoe! YouTube doesn't provide a way to search across all of YouTube based on transcriptions/spoken words. My app performs on-the-fly searches, as @alex_smart noticed in another comment.

My app does as much of the fetching and match-finding work as it can client-side. For the few things that can't be done client-side, my app calls serverless function API endpoints to fetch the YouTube channel and video data it needs. Here are the tricks I'm using to make it fast:

- As soon as the "Channel URL" field loses focus, I start fetching the most recent 30 videos on that channel in the background. This way, by the time the user enters the keyword and date range, I've already fetched some (maybe even all!) of the data ahead of time, which means less wait time for them.

- Once a specific video's data (title, description, transcript, etc.) has been fetched once, it is saved in memory. All other searches the user performs from that point on will pull the video data from the in-memory cache, if it's there. Otherwise, it will fall back to fetching the video data over the network. This in-memory caching makes subsequent searches within the same date range (or a shorter date range) take <1 second.

- Network requests to fetch video data are processed concurrently rather than one at a time. So the browser fires off as many as it can in parallel to get them all resolved as quickly as possible.

- As soon as any matches are found, the UI updates to show the user. This way, the user can start scrolling through the matches and reviewing them while the search is still in progress; they don't have to wait until it finishes to start interacting with the matches.
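
The caching, concurrency and incremental-results tricks in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the app's real code: `VideoData`, `LoadVideo`, `getVideo` and `searchVideos` are hypothetical names, and `load` stands in for whatever network call fetches one video's data.

```typescript
// Hypothetical data shape and loader type (assumptions for illustration).
interface VideoData {
  id: string;
  title: string;
  description: string;
  transcript: string;
}

type LoadVideo = (id: string) => Promise<VideoData>;

// In-memory cache keyed by video ID. Caching the promise (not the value)
// lets overlapping searches share a single in-flight request.
const cache = new Map<string, Promise<VideoData>>();

function getVideo(id: string, load: LoadVideo): Promise<VideoData> {
  let pending = cache.get(id);
  if (pending === undefined) {
    // Evict on failure so a later search can retry over the network.
    pending = load(id).catch((err) => {
      cache.delete(id);
      throw err;
    });
    cache.set(id, pending);
  }
  return pending;
}

// Start all lookups at once and report each match as soon as its video resolves.
async function searchVideos(
  ids: string[],
  keyword: string,
  load: LoadVideo,
  onMatch: (video: VideoData) => void
): Promise<void> {
  const needle = keyword.toLowerCase();
  await Promise.all(
    ids.map(async (id) => {
      const video = await getVideo(id, load);
      const haystack =
        `${video.title}\n${video.description}\n${video.transcript}`.toLowerCase();
      if (haystack.includes(needle)) {
        onMatch(video);
      }
    })
  );
}
```

Prefetching on blur would then amount to calling `getVideo` for the most recent video IDs ahead of time, so that the later `searchVideos` call finds them already cached. Note that `Promise.all` rejects on the first failed request; a more forgiving version might use `Promise.allSettled`.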

Thanks for checking out my project, and for the kind words! I appreciate it.


I built something similar to this when I was first learning to program. Except mine lets you perform a one-time search for videos containing keywords, similar to how you would normally search YouTube. So there are no notifications or anything.

https://phrasefinder.net

I don’t know if it even works anymore and I’m sure the code is atrocious. But I remember that I would just scrape YouTube pages for video IDs, and then use an API that returned video captions for a given ID [1]. I could see how OP would do something similar.

[1] https://pypi.org/project/youtube-transcript-api/


Nice job on PhraseFinder, @SteveDR! I am indeed doing something similar for my app. The user enters the YouTube Channel URL, the keywords and the date range, then I perform an on-the-fly search to find the matches.

I replied to @lewisjoe's comment above this one with some more details about tricks I'm employing to make searches fast.

Thanks for checking out my project!


It's only for a single channel. That's the answer, it queries in real time.


Yep! You nailed it. My app performs an on-the-fly search on the client. I replied to @lewisjoe's comment above this one with some more details about tricks I'm employing to make searches fast.

Thanks for checking out my project!


Bump. Would like the lowdown too!


Hey @jonplackett! As @alex_smart noticed in another comment, I am performing on-the-fly searches on the client.

Some requests can't be sent from the browser directly to YouTube, though: the browser's same-origin policy blocks them, because YouTube's pages don't send the CORS headers that would permit cross-origin reads.

For example, if you try to run this code to fetch a YouTube video page in the browser console from any non-YouTube site (like Hacker News or VideoMentions), you'll see that it errors out:

(async () => { const response = await fetch('https://www.youtube.com/watch?v=irjc1nJ1eJs'); console.log(response); })();

That's why my app uses serverless API endpoints as a middleman. It works like this:

1. Browser fires off a request to the API endpoint.

2. The serverless function Node.js process spins up, fetches the data from YouTube, returns the response, then spins back down.

3. My app takes the video data in the response, saves it in memory, searches it for matches, and re-renders the UI to show the matches, if any.

I built my app to do as much work as possible client-side, and to use serverless function API endpoints for anything that can't be done in-browser.

I replied to @lewisjoe's comment above this one with some more details about tricks I'm employing to make searches fast.

Thanks for checking out my project!


This is amazing.

I just tested it against a smallish British channel with a video that I wanted to see again, and couldn’t remember which one of the 50-odd videos it was. It did not catch the full quote directly, because YT read it as “Marvin nature” instead of “Mother Nature,” a consequence of Alfie’s accent. But my search for “Mother Nature” picked up another reference close enough to it to show the text. Sort of unlucky with the miss and very lucky with the proximity to the hit.

I drew a blank regarding the other channels whose videos I want to revisit, but I know I will think of them later because there are so many. This is extremely useful. I’ve already bookmarked the site.


Hey @filoeleven! Great- glad to hear you're finding it useful!

Thanks for checking out my project! :)


Nice! My friends and I had this idea for a hackathon and we won! We skipped lectures and we needed a way to fast forward to relevant parts of a lecture


Cool @happy_pancake! Nice use-case! Is your project still online somewhere?


This is pretty amazing! Thank you for making it.

I'm not sure if this would be against Google's terms so you might want to check:

If you started indexing such that I could do a search and it would come back with any indexed content I think you would have invented a new search engine. Seems extremely useful.

I already use YouTube this way today but as you pointed out searching by title can be tricky.

Search engines supported by ads are also typically quite profitable.


I wanted this just yesterday! I was searching several channels to recall how many bolts are holding the battery for the electric F150. Sure enough, Top Gear told me how many bolts the F150 used for its battery pack!

Result:

SPOKEN WORDS "…kept things simple it recycled the same basic chassis and attach the battery which can be removed with just eight bolts from underneath and used carryover components wherever possible in fact…"


Nice! Perfect timing, then! Glad you're finding it useful! :)


This is a nice tool. I wish one didn't have to specify the channel, as some comments mentioned. If it's not possible to get the channel when providing the video url via an API, it is possible to get it from the video url's GET response data. The latter may be a little slower, but might be worth it in terms of UX.


Hey @atentaten! YouTube doesn't provide a way to search across all of YouTube based on transcriptions/spoken words, unfortunately. That's why specifying a channel is required. Once the user provides a channel URL, then VideoMentions Search can perform an on-the-fly search of the videos on that channel to identify matches.

Thanks for checking out my project!


FWIW Google already shows videos by their transcripts if the video has automatic captions or CC enabled in the Google search results. Try searching for a very specific phrase/sentence in quotes and it's likely you'll get the video in Google Search result page.


I have found that this method is unreliable. If I run a Google search for "the "paradox of choice. You have so many choices, how do you know which one to pick? Even when I recommend to newer developers in the React", no YouTube video results are displayed, despite the fact that that exact string of text exists in the transcript of this video: https://www.youtube.com/watch?v=uQntFkK8Z54

That's why I built VideoMentions Search. It pulls in the transcript for every video within the channel and date range you selected and shows you any and all matches.

Also - even when a Google search does work to find a video based on spoken words, all you're given is a link to that video's page. You can't see all the times that keyword was mentioned within the video and click on one of those mentions to jump to that exact moment in the video, like you can with VideoMentions Search.

If Google/YouTube added those capabilities, then VideoMentions Search would be irrelevant. In the meantime though, I think it's a useful free tool for quickly finding spoken word matches within a specific channel.

Thanks for checking it out!


Great job!

How do you do search result ranking? Any signals that you use to decide “this is a high quality video from a reputable channel and it’s relevant”? I would imagine # of followers / likes / comments / like-comment ratio or the like are used?


Thank you! I don't do anything fancy at the moment. The matching videos are presented in reverse chronological order. So you'll see the most recent videos that contain the keywords at the top, then older video matches as you scroll down the page. And the user provides the URL of the single YouTube channel they want to search within.
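
As a rough illustration of that ordering (a sketch only; the `Match` shape and the `publishedAt` field are assumptions, not the app's actual data model):

```typescript
// Hypothetical match record; publishedAt is assumed to be an ISO 8601 date string.
interface Match {
  videoId: string;
  publishedAt: string;
}

// Return a new array with the most recently published videos first.
function newestFirst(matches: Match[]): Match[] {
  return [...matches].sort(
    (a, b) => Date.parse(b.publishedAt) - Date.parse(a.publishedAt)
  );
}
```

Copying before sorting leaves the original list untouched, which is handy when matches arrive in arbitrary order from concurrent requests and the list is re-sorted on each update.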

Thanks for checking out my project!


How is this different from what Google currently does for in-video text search?


Currently, YouTube does not reliably include spoken word/transcript matches in search results. You would have to manually open each individual video in a browser tab, open the transcript, hit cmd/ctrl+f and search for the words you're interested in. And if you need to search tens/hundreds/thousands of videos, that would take a very long time.

VideoMentions Search allows you to specify the channel URL, keywords and date range, then does all that hard work for you, quickly locating all videos that contain those keywords.

I hope that's helpful. Thanks for checking out my project!


Thank you for responding. This looks like a very cool project.


Fantastic.

Somewhat related: a friend used to have a setup that automatically downloaded YT subtitles, low-bitrate audio, and video preview thumbnails to create a searchable archive that didn't use much space. He found it very useful for his studies, but the thumbnails weren't great for slides.


I wanted to try it but for some reason it doesn't accept Lex Fridman's channel as a valid URL: https://www.youtube.com/c/lexfridman


I tried with some other channels. Looks like it's not accepting all channels as valid URLs.


Hey @pks016 - if you recall any channel URLs that didn't work, can you please reply with them? Then I can look into why and fix that bug.

Thanks for checking out my project!


Thanks for bringing this up, @atleta - I'll look into why that particular channel URL doesn't work.


Dear @kellenmace - This seems like a great service! I tried to sign up and check out the site but didn't notice any contact info. Could you contact me? Curious if you offer an API facade and/or enterprise subscription plans?


Hey @TurningNYC! Thanks for reaching out. Yes, I'd be happy to speak with you regarding API access and enterprise subscription plans. You can reach out to me at kellenmace at gmail dot com. Thanks! Talk to you soon.


This is neat, but I would like it to search against the entire archive of all YouTube videos. I know you can't do that... but perhaps a Google employee could, making use of Google's Big Data offerings.


Pretty cool. I could see this as a useful automated tool for companies to monitor their brand, or competitors. Maybe a weekly report of channels that have mentioned specific keywords.


Thanks @ojiwan! Yeah, that's precisely what the paid offering on the VideoMentions.com homepage is. It currently emails the user once for each matching video, but in the future, I could certainly add support for grouping them and sending weekly digests, like you're describing.

Thanks for checking out my project!


I've been wanting to make a supercut of Civvie 11 mentioning John Carmack but have been putting it off. This just did half the work for me, amazing!


Nice! Glad to be of assistance :)


Nice! Is it possible to do it without fixing the channel?


No, they'd have to index all youtube videos then.


YouTube doesn't provide a way to search all of YouTube based on the spoken words in videos, unfortunately.

I could update VideoMentions Search to allow users to select multiple channels, and then perform the search across all of those. Maybe auto-importing all the channels they're subscribed to could be useful. One way or another though, it would still require selecting specific channels to search within. For this first iteration, I just kept things simple with a single channel URL input. Despite that limitation, I still think it's a useful tool, though; I plan to use it often myself.

Thanks for checking it out!


This field was a blocker for me as well. If the channel field was an async select that helped autocomplete lookup channel urls by channel name, this would be way more convenient.

My use case would be for ham radio. Lots of ham radio YouTubers film their QSOs (conversations) and mention the callsigns that they make contact with. I want to find the channels that mention my callsign and I'm sure lots of other hams would want to know. Anywho cool project. GL 73


I agree! This is how the paid VideoMentions.com service works- users can search for channels by name, without the need to paste in URLs. This auto-complete lookup requires spending a finite number of API calls, though, which is why it’s restricted to customers and not available on this freely accessible VideoMentions Search page. Thanks for checking it out!


Perhaps you could have channel owners register their channel if they want to be indexed. That would be super useful.


Hey @truly- this is a good thought, but I think it would be impossible to get buy-in from enough channels to make it useful. I could update the UI to support searching across multiple channels at once, though, or pull the channels from the user’s subscriptions or watched videos list. I’ll see if there’s enough demand for those features, then consider adding them.

Thanks for checking out my project!


I'm assuming it's using the API to download captions and scanning them, which is why it'd need the Channel. It would be so hard to know where to begin without it!

Potentially future updates could search a logged in user's history?


Yep, you're exactly right. The tool works by getting the channel's videos, then fetching the voice-to-text transcripts for them, then searching within the spoken words (along with the title and description) for any keyword matches. So there isn't a feasible way to do that across all of YouTube.
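
A rough sketch of that matching step, under stated assumptions: the `Video`, `Segment` and `Mention` shapes and the `findMentions` name are hypothetical, and the transcript is assumed to arrive as timed segments. The `t=` query parameter on watch URLs is what lets a link jump to a given second.

```typescript
// Hypothetical shapes; the real transcript format may differ.
interface Segment {
  start: number; // offset into the video, in seconds
  text: string;
}

interface Video {
  id: string;
  title: string;
  description: string;
  segments: Segment[];
}

type Mention =
  | { source: "title" | "description"; text: string }
  | { source: "transcript"; text: string; url: string };

// Case-insensitive keyword search over title, description and transcript segments.
function findMentions(video: Video, keyword: string): Mention[] {
  const needle = keyword.toLowerCase();
  const hit = (s: string): boolean => s.toLowerCase().includes(needle);
  const mentions: Mention[] = [];

  if (hit(video.title)) {
    mentions.push({ source: "title", text: video.title });
  }
  if (hit(video.description)) {
    mentions.push({ source: "description", text: video.description });
  }
  for (const seg of video.segments) {
    if (hit(seg.text)) {
      const seconds = Math.floor(seg.start);
      mentions.push({
        source: "transcript",
        text: seg.text,
        // Deep link that starts playback at the matching segment.
        url: `https://www.youtube.com/watch?v=${video.id}&t=${seconds}s`,
      });
    }
  }
  return mentions;
}
```

Tagging each mention with its `source` would also make it straightforward to filter spoken-word matches from title or description matches. One limitation of this sketch: a phrase that straddles two segments is missed unless neighbouring segments are joined before searching.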

I like your idea of searching the logged-in user's history! That could be handy. Another thing I've thought about is auto-importing all the channels they're subscribed to so they can search within those.

For the first version of VideoMentions Search, I kept things simple with a single "Channel URL" input field, but using their history/subscriptions is totally doable. I'll see if there's enough demand for that.

In any event, I still think that this simple first version has utility. I like being able to quickly pinpoint all the moments when a certain topic was mentioned across all of a channel's videos.

Thanks so much for taking a look, and for your feedback!


Do you keep the transcripts around on the server? It shouldn't matter much in terms of storage unless the site becomes crazy popular, so you could offer a "best effort search" or something along those lines that just searches everything you've got so far, so the site would get better and better over time.


YouTube kills your API key if you do this and make the data available (eg via API).

You're allowed to cache responses for a bit but not store them long term. "How would they know", etc of course, but if you're distributing the data they'll figure it out. Some smart cookies over there at Google.

My small website managed to get on their radar and I didn't even post it to HN!


would you add support for regexes?

I would like to search for rich evans saying "aiiiiiiiiids" with varying word length but now i can only search for "aids"

https://videomentions.com/search?channelUrl=https%253A%252F%...


Hey @dmead! Yeah, I would consider adding support for regex/advanced filtering options if there was enough demand for it. For now, I’d recommend either separate searches or performing a search, then doing ctrl/cmd+f to search within the matches on the page.

Thanks for checking it out!


no problem. here is a thread of people checking it out.

https://www.reddit.com/r/RedLetterMedia/comments/uxoegh/ever...

it's considered funny when one of these movie reviewers calls something "aids" but he says it with a varying number of "i"s in the word. so it would be cool if we could find all the instances of rich evans being crass.
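
For illustration, a regex-based match for that kind of stretched word could look like the sketch below. The tool does not support this today, and `findAll` is a hypothetical helper. It would also only help if the transcript preserves the stretched spelling, which auto-generated captions may not.

```typescript
// Hypothetical helper: return every substring of `text` that matches `pattern`.
function findAll(text: string, pattern: RegExp): string[] {
  // Ensure the global flag is set so match() returns all occurrences.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  return text.match(re) || [];
}

// One or more "i"s, case-insensitive, as a whole word.
const stretched = /\bai+ds\b/i;
```

With that pattern, `findAll(transcript, stretched)` would pick up the word regardless of how many "i"s were transcribed, while the word boundaries keep it from matching inside longer words such as "raids".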


Really useful tool! I watch a lot of cycling content and people do reviews of random products in the middle of videos, this helps narrow it down!


s/reviews/paid promotions/


red letter media likes to talk about star trek

https://videomentions.com/search?channelUrl=https%253A%252F%...


Haha- looks like it :)


This is amazing. I actually used yt-dlp to download all subtitles and search in them a few times already.

Thanks a lot!


Hey @baxuz! Nice- I’m glad you like it! I hope it continues to come in handy.

Thanks for checking out my project!


Make a filter for mentions vs description - many creators cross-link other videos for users to check out, and those are coming up in results.

Very useful website - thank you for the service


Is this based on acoustic keyword recognition or SRT search?


I believe both, in most cases. When a video is uploaded to YouTube, YouTube runs the audio through a speech-to-text algorithm to generate the transcript for the video. And the generated text is also what's used for video captions, when they're turned on. When a search is performed via VideoMentions Search, the tool gets that auto-generated transcript and searches it to find keyword matches (along with the video's title and description). When a transcript is manually uploaded for a video, that is used instead.

Thanks for checking out my project!


Nothing of note to say other than this is really cool!


Thanks @nop_slide! Glad you dig it! Thanks for trying it out! :)


Very cool, seems useful.


Thanks! Glad you think so.

Thanks for checking it out!



