I believe both, in most cases. When a video is uploaded to YouTube, YouTube runs the audio through a speech-to-text algorithm to generate a transcript for the video. That generated text is also what's shown as the video's captions when they're turned on. When you run a search in VideoMentions Search, the tool retrieves that auto-generated transcript and searches it, along with the video's title and description, for keyword matches. If a transcript has been manually uploaded for a video, that one is used instead.
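
To make that flow concrete, here is a rough Python sketch of the matching logic as described above. It is only an illustration and not VideoMentions' actual code: the `Video` record, its field names, and the `pick_transcript` and `find_mentions` helpers are all hypothetical, and plain case-insensitive substring matching is an assumption about how keywords are compared.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Video:
    # Hypothetical record; the field names are illustrative only.
    title: str
    description: str
    auto_transcript: Optional[str] = None    # speech-to-text output
    manual_transcript: Optional[str] = None  # uploaded by the creator, if any


def pick_transcript(video: Video) -> str:
    """Prefer a manually uploaded transcript; fall back to the auto-generated one."""
    return video.manual_transcript or video.auto_transcript or ""


def find_mentions(video: Video, keyword: str) -> List[str]:
    """Return the names of the fields that contain the keyword.

    Assumes simple case-insensitive substring matching.
    """
    needle = keyword.casefold()
    fields = {
        "title": video.title,
        "description": video.description,
        "transcript": pick_transcript(video),
    }
    return [name for name, text in fields.items() if needle in text.casefold()]
```

For example, if the auto-generated transcript misheard a brand name but a manual transcript spells it correctly, `find_mentions` would match on the manual one, since that is the text `pick_transcript` returns.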