Youtube2Webpage: Create Websites with Text from Videos (github.com/obra)
201 points by _Microft 11 months ago | 65 comments



It would be really neat if after pulling the captions, an LLM was used to reword the content into an idiomatic "blogpost" (since speech is typically different than writing). Using LLMs, we could even choose the level of summarization and the output tone!

As someone who strongly prefers reading to watching instructional videos, I'd pay for this service :)


I made something sorta like that, specific to recipe videos. Basically, it converts the recipe into an idiomatic format (inlining ingredients, detecting and rendering timers) and links each step in the recipe to its timestamp in the video for easy indexing while you're busy in the kitchen. (I spent too much time trying to scrub to that one spot where "how it's supposed to look" is shown while busy making it look that way.)

See example: https://rexipie.com/watch?v=JiJXdoTjw8M

Just s/youtube/rexipie/ in any recipe video URL.

(Full disclosure: the step/transcript linking is paid-only, as it requires a GPT-4 call; everything else is available to demo on the free tier.)
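For anyone curious how the timer part might work: here's a minimal Python sketch, assuming recipe steps are plain English with durations like "bake for 20-25 minutes". The regex and `find_timers` are hypothetical and not rexipie's actual implementation; for a range, it takes the upper bound.

```python
import re

# Hypothetical pattern: a number (or a range like "20-25" / "20 to 25")
# followed by a time unit.
DURATION = re.compile(
    r"(\d+)(?:\s*(?:-|to)\s*(\d+))?\s*(hours?|hrs?|minutes?|mins?|seconds?|secs?)\b",
    re.IGNORECASE,
)

# Map the first letter of the unit to seconds.
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def find_timers(step: str) -> list[tuple[str, int]]:
    """Return (matched text, duration in seconds) for each duration in a step.

    For a range such as "20-25 minutes", the upper bound is used.
    """
    timers = []
    for m in DURATION.finditer(step):
        amount = int(m.group(2) or m.group(1))
        unit = m.group(3).lower()[0]  # 'h', 'm', or 's'
        timers.append((m.group(0), amount * UNIT_SECONDS[unit]))
    return timers


print(find_timers("Bake for 20-25 minutes, then rest 10 min."))
```

Each match could then be rendered as a clickable countdown next to the step text.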


That's really cool! For comparison, this is the recipe written by the same guy: https://www.allrecipes.com/toy-box-tomato-ricotta-cheese-tor...

I've gotta say, your website might be easier to use during cooking, since it provides the information in-line (especially serving sizes etc.)!


Cool website. Much better than the SEO spam I came across earlier this week when I did a web search for "pear qwerty horse" after seeing it in the tags under a Binging with Babish video.

Love the timers and jumping to sections of the video. Though, the second video I tried viewing didn't have linked steps.


The hyperlink "Food Wishes" at the top of the page is broken. It'd be nice too if there was a way on that page to request a new recipe (via video ID or whatever).


This is neat! Do you plan to support videos containing multiple recipes?


As a way to make that easier, maybe it would be nice to support a user-specified set of timestamps? Say, recipe A: 0:00-7:46, recipe B: 7:47-15:33, and so on.
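A rough Python sketch of that idea, assuming the user types ranges as `label: start-end` separated by commas and that captions are already available as `(start_seconds, text)` pairs. `parse_ranges`, `split_cues`, and the input format are hypothetical.

```python
import re


def to_seconds(stamp: str) -> int:
    """Convert "M:SS" or "H:MM:SS" to whole seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Hypothetical input format: "<label>: <start>-<end>", comma-separated.
RANGE = re.compile(r"\s*([^:,]+?)\s*:\s*([\d:]+)\s*-\s*([\d:]+)\s*")


def parse_ranges(spec: str) -> dict[str, tuple[int, int]]:
    """Parse "recipe A: 0:00-7:46, recipe B: 7:47-15:33" into second ranges."""
    ranges = {}
    for chunk in spec.split(","):
        m = RANGE.fullmatch(chunk)
        if m is None:
            raise ValueError(f"could not parse range: {chunk!r}")
        label, start, end = m.groups()
        ranges[label] = (to_seconds(start), to_seconds(end))
    return ranges


def split_cues(cues, ranges):
    """Group (start_seconds, text) cues by the labeled range they start in.

    Ends are inclusive to the whole second (end + 1 is exclusive), so a cue
    starting at 7:46.5 still belongs to a range ending at 7:46.
    Cues outside every range are dropped.
    """
    grouped = {label: [] for label in ranges}
    for start, text in cues:
        for label, (lo, hi) in ranges.items():
            if lo <= start < hi + 1:
                grouped[label].append(text)
                break
    return grouped


ranges = parse_ranges("recipe A: 0:00-7:46, recipe B: 7:47-15:33")
print(ranges)
```

Each labeled group could then be fed through the normal recipe conversion as if it were its own video.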


I regularly use https://www.summarize.tech/ for this purpose.

Not exactly a blog post format, but it must've saved me a hundred hours, no joke!


You can absolutely already do what you describe with GPT-4 plugins (Plus membership required), using VoxScript and Video Insight:

https://chat.openai.com/share/229e3ac8-3924-48e4-abd5-35bcb2...


Except for the complaining GPT will do, and some censorship based on the whims of its programming group. No thanks; I'll stick to scripts, where the video dictates the content.


Just use something like llama2-uncensored then, they're on ollama.


It seems that many "my script can do [something] with [information in a different form]" tools can already be superseded by LLMs, or will be in the near future, and the quality is way better than what the scripts are capable of.

I just wonder what the price of this is. I can run most of these scripts on an old laptop. But for the LLM I need a pricey and beefy computer or (even worse) a paid subscription to a big tech company's service.


Honestly, I like the GPT 3.5 version I posted here way better.


There you go, friend:

Step right up, folks! Gather around and feast your eyes on the magnificent creatures before us – the elephants! Now, what makes these majestic beings so fascinating, you ask? Well, let me tell you – it's all about their incredibly, unbelievably long... um, trunks! Yes, you heard that right. These gentle giants sport trunks that seem to stretch on for ages, and let me tell you, it's nothing short of impressive. So, as we stand here in awe of these marvelous creatures, remember, it's the little (or should I say, not-so-little) things like their remarkable trunks that make them truly stand out. And that, my friends, wraps up the lowdown on our pachyderm pals – fascinating trunks and all!


I've been working on just such a tool [1] to help me digest podcasts and senate hearings.

[1] https://github.com/the-crypt-keeper/tldw


Did you consider directly taking the subs from yt?


>since speech is typically different than writing

Is a scripted video significantly different to a written blogpost? It might be a symptom of the type of YT videos I watch, but most of them seem to be essay-style "intro/thesis/points 1, 2, 3/counterpoint/conclusion", and the only thing that hints at speech is the umming-and-arring of the presenter.


It is to me. Here's an example from a CNN transcript:

"Former chief-of-staff, Mark Meadows asking a federal judge to put his surrender on hold, while deciding whether to move his trial to federal court, and former DOJ official Jeffrey Clark, seeking the same, making a pretty remarkable argument in his filing."

That's someone doing a sort of play-by-play explanation of what viewers are seeing in a video. Compare to a purposefully written story:

"A federal judge in Georgia rejected a request by former White House chief of staff Mark Meadows to postpone his surrender and arrest in Fulton County, Georgia, as an attempt to move the case to federal court is litigated, according to a court order issued Wednesday."

It seems like there could be some value in an LLM that would rewrite the first into something more like the second.


Or at least chunk the transcription into logical groups.

Chunking on the example webpage[1] is poor.

[1] https://obra.github.io/Youtube2Webpage/example/
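One crude way to do better, sketched in Python: start a new paragraph whenever there's a long pause between captions or the paragraph gets too long. The `Cue` type, `chunk_cues`, and the default thresholds (1.5 s gap, 400 characters) are assumptions to tune. Auto-generated captions often run back-to-back with no gaps, so the length cap may end up doing most of the work.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def chunk_cues(cues, max_gap=1.5, max_chars=400):
    """Group consecutive cues into paragraphs.

    Starts a new paragraph when the silence between cues exceeds max_gap
    seconds or the paragraph would grow past max_chars characters.
    Returns a list of (paragraph_start_seconds, paragraph_text).
    """
    paragraphs = []
    current = []
    para_start = 0.0
    prev_end = None
    length = 0
    for cue in cues:
        gap = 0.0 if prev_end is None else cue.start - prev_end
        if current and (gap > max_gap or length + len(cue.text) > max_chars):
            paragraphs.append((para_start, " ".join(current)))
            current, length = [], 0
        if not current:
            para_start = cue.start
        current.append(cue.text)
        length += len(cue.text) + 1  # +1 for the joining space
        prev_end = cue.end
    if current:
        paragraphs.append((para_start, " ".join(current)))
    return paragraphs
```

Keeping each paragraph's start time makes it easy to pair the paragraph with a screenshot or a link back into the video.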


Can I self-promote here? We're not doing exactly the same thing, but we transcribe videos ourselves (no auto-generated YT captions). If you want to read a high-quality transcript and summarize videos, you can do that at https://alphy.app


This is a blank page on iOS Chrome.


A question constantly pops into my head: can I search specific YouTube video captions?

OK, this may be an answer... but is there an online service that, given a YT URL, would spit the captions out for me? Or maybe a browser extension?

Maybe YouTube even has a hidden link somewhere where I could see all the text?

This submission prompted me to do some research, and I found this gem: https://filmot.com/

The guy who created it: https://www.reddit.com/r/linguistics/comments/oo8xbd/search_...


On the YouTube website, if you click the "・・・" button next to share/clip/save, there is a "Show transcript" option, and you can use your browser's in-page search to search in it.


That's for a specific video; the site shared above lets you search an index of over 760 million captions across 687 million videos and 52 million channels.


There’s a website designed for language learning from watching YouTube captions with inline translations and dictionary lookup. It also has support for searching videos by subtitle content. But it has a limited index and isn’t free for all features. I thought its source was available but I can’t find it now… https://languageplayer.io/


Show HN: YouTube Full Text Search – Search all of a channel from the commandline

https://news.ycombinator.com/item?id=36009774


Searching transcripts is really something YouTube itself should be doing as part of just regular search (and fed into Google search too). I have a feeling the regular search already does it to some extent, as the system presumably is tagging videos based on its caption extraction. However, that would only apply to somewhat broad topics, not specific combinations of words and the matching text is not surfaced in the UI.
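Doing the phrase search yourself on a downloaded caption file is straightforward. Here's a minimal Python sketch, assuming a WebVTT file with `HH:MM:SS.mmm` timestamps (e.g. one saved by yt-dlp with `--sub-format vtt`). `parse_vtt` and `search` are hypothetical helpers. It only matches within a single cue, so a phrase split across two cues won't be found, and auto-generated captions may repeat lines across cues.

```python
import re

TIMING = re.compile(r"(\d+):(\d\d):(\d\d)\.(\d{3})\s+-->")
TAG = re.compile(r"<[^>]+>")  # inline tags such as <c> or <00:00:01.280>


def parse_vtt(vtt: str) -> list[tuple[float, str]]:
    """Return (start_seconds, text) for each cue in a WebVTT string.

    Assumes hour-qualified timestamps (HH:MM:SS.mmm); inline tags are stripped.
    Blocks without a timing line (e.g. the WEBVTT header) are skipped.
    """
    cues = []
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = TIMING.match(line)
            if m:
                h, mnt, s, ms = map(int, m.groups())
                text = " ".join(TAG.sub("", t).strip() for t in lines[i + 1:])
                cues.append((h * 3600 + mnt * 60 + s + ms / 1000, text.strip()))
                break
    return cues


def search(cues, phrase: str) -> list[tuple[float, str]]:
    """Case-insensitive substring search; returns the matching cues."""
    needle = phrase.lower()
    return [(start, text) for start, text in cues if needle in text.lower()]


sample = """WEBVTT

00:00:01.000 --> 00:00:03.500
Welcome back to the channel

00:00:03.500 --> 00:00:06.000
today we talk about <c>basmati</c> rice
"""
print(search(parse_vtt(sample), "BASMATI"))
```

Returning the start time alongside the text is what lets a UI surface the match and jump straight to it.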


That's a good website you posted.

I encountered the "need" for this functionality a few years ago to find the video of a YouTuber specifically saying something.

Back then I used a website that's actually specifically dedicated to the YouTuber (Northernlion): https://babypig.men/nlss-search?q=Basmati

I'm surprised the website is still live!


YouTube just added a feature to search transcripts. Seems like I might be in an experiment, though.


Also try www.askYouTube.ai, not exactly pure text search but it can help you find videos that answer your query using LLMs


I wanted to, but when I pressed ENTER, it asked me to register... I clicked cancel and noticed the PRICING page. I clicked on it, and again it asked me to log in. That is NOT how one onboards users.


Despite the rude comments you left on my contact form, I have noted your point and am making changes to fix this.


I didn't leave any comments there at all :) And I hope my HN comment wasn't rude.


It's fixed now. Unfortunately, the email you provided on the contact form bounces.



You really have to wonder what YouTube are doing. Trying to play catch up with TikTok instead of innovating with what they have.

They’ve had transcriptions for ages - but it’s hidden away and practically useless.

The things they could do with a bit of creative thinking…


> creative thinking…

The death of creative thinking is management and processes. Assembly-line work doesn't permit creativity. Google is heading the same route as IBM and other once-great corporations built by creative people. When the discussion shifts from ideas to processes and trivialities, it's game over. Long live whatever replaces Google. It's over.


There is a business term for this that I can’t remember: basically, your money comes from business process X, so you protect X at all costs, which makes it very difficult to innovate away from supporting X. It’s almost impossible for a company to pivot to making its money from Y. You see this classically with companies like Sears and Woolworths, which were unable to keep up with the times.

The solution seems to be to start a new company within your company, more or less completely separate from your core business. Meta seems to have initially done this very well with their pivot from Facebook to Instagram. Perhaps less so with their metaverse pivot, but we’ll see. Google set themselves up for success at this with Alphabet, but I don’t think we’ve really seen them find something they feel they can pivot into yet, so business X continues to be the focus.


Hidden away yes, but it’s amazing. It creates a totally new way of consuming informational videos and everyone here should try it. You can scroll through the transcription vertically and tap on any bit and the video instantly jumps to that point - basically the transcription is the new seek bar. And about 1000x better at the job. No more skipping back to somewhere roughly where you stopped paying attention and just letting it play for a bit until you get back into the thread - now you can jump around with surgical precision. It’s like how you can easily skim back and forth in a text article, but with a video. It’s a total game changer and I find it bizarre that it’s so hidden away so most people won’t find it. It also works particularly well when casting from your phone to a TV, using the phone as the navigator. Oh and the transcriptions are about 99% accurate, which is good enough for me.


That’s really interesting - on my own website I’ve extracted the captions to my videos and I was thinking of wiring it up so you could navigate the videos. I may actually get round to doing this now.
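If it helps, the simplest version doesn't even need a player integration: render each caption as a link using YouTube's `t=` query parameter. Here's a Python sketch, assuming captions as `(start_seconds, text)` pairs; `transcript_html` and the markup are hypothetical. With the video embedded on the same page, the IFrame Player API's `seekTo` would avoid leaving the page.

```python
import html


def transcript_html(video_id: str, cues) -> str:
    """Render (start_seconds, text) cues as paragraphs linking to that moment.

    Uses YouTube's t= query parameter (whole seconds) to start playback there.
    Both the URL and the caption text are HTML-escaped.
    """
    rows = []
    for start, text in cues:
        secs = int(start)
        url = f"https://www.youtube.com/watch?v={video_id}&t={secs}s"
        label = f"{secs // 60}:{secs % 60:02d}"
        rows.append(f'<p><a href="{html.escape(url)}">{label}</a> {html.escape(text)}</p>')
    return "\n".join(rows)


print(transcript_html("VIDEO_ID", [(75.4, "Tom & Jerry")]))
```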


I really wish I could search transcripts easier across YT.


My friend and I made something similar a few years ago as a college hackathon project - it features automatic scene transition detection and a rough editor before publishing the final results.

(The demo site is down, but you can clone the repo and run the code locally)

https://gitlab.com/chocological00/bitcamp-2021


This is actually potentially helpful for me as a lawyer for generating a paper record, and something I was talking about (and meaning to write up a script for) the other day. Sometimes I want to use a Youtube video in a court filing (for example, as prior art in a patent case), and submitting a rough paper record of the video like this is helpful along with the actual video.


If I'm remembering correctly the example they've used is the first video uploaded to YouTube. https://obra.github.io/Youtube2Webpage/example/


That’s correct.


Related, this is a super useful end user app that summarises videos using an LLM:

https://www.summarize.tech/


This is interesting. I think it should be used for messages that aren't subtle; sarcasm, for example, trips it up. I gave it a try with KRAZAM's video, and the answer is hilarious when you consider that the video intended exactly the opposite.

https://www.summarize.tech/www.youtube.com/watch?v=_o7qjN3KF...

> In "The Hustle," the narrator shares their jam-packed daily routine that exemplifies their dedication to productivity. From an early morning workout to late-night preparations for the next day, their schedule is filled with various activities. They efficiently manage their time, incorporating work, social media updates, and even a well-deserved happy hour. The narrator's commitment to self-improvement is also evident through their habit of reading before bed and tweeting inspiring quotes. Overall, this video highlights the narrator's hustle and structured approach to maximizing their day.


I just had to write a research report for a funding agency covering many subprojects, and I couldn't get input for one of them. So I took a short video from a pitch presentation, converted it to captions using speech-to-text, and then turned those into a research summary. The result was really impressive.


I wonder if AI is soon going to be able to pull off the multimedia future I dreamed about as a kid, where no image is just an image, no video is just a video, no text is just text.


Cool to see some Perl in a new repo. The thing I find most appealing about Perl is its backticks for shelling out. I know Ruby supports them too. I understand that this can be a security issue, but I find quick shelling out so powerful for a script that also needs to do some text processing or real programming logic that the shell just isn't great for.
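For comparison, a rough Python analog of Perl's backticks is `subprocess.run` with `capture_output=True`. The sketch below (the `backticks` helper is hypothetical) passes an argument list instead of a shell string, which avoids shell injection. Unlike Perl's backticks, it raises `CalledProcessError` on a non-zero exit because of `check=True`.

```python
import subprocess
import sys


def backticks(argv: list[str]) -> str:
    """Run a command and return its stdout, roughly like Perl's `...`.

    Passing a list (no shell=True) means arguments are not re-parsed by a
    shell, so user input can't inject extra commands. check=True raises
    subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# Usage: run the current Python interpreter as a portable stand-in for a tool.
print(backticks([sys.executable, "-c", "print('hello')"]), end="")
```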


Watch out for portability, too. Personally, I would consider it a last resort, exclusively for local system automation too complex for bash when for some reason Python isn't available. I'm not going to pretend I understand everyone else's use case, though, so I'm glad it exists.


Author here: This was indeed a quick hack. And Perl is still my fastest language for banging out a quick prototype. (I used to be the Perl 5 project lead. I’ve spent a lot of time with Perl.)


And indeed, that's what Perl was built for!


Just last week I saw the opposite, a website that created videos from blogposts: https://www.rephrase.ai/blog-to-video

While searching for its name I found several other services like this.


This is great!

What I’ve wanted for a long time is to use scene detection. So instead of a screenshot at regular intervals, it’s only when the visual scene changes.

Maybe if the transcript were turned into paragraphs you could have a little gallery to the side for any scene changes during that paragraph.


Yes, that is exactly what I came here to comment!

It seems "trivial" enough to detect a total shot change, and then in cases of things like fast-moving action sequences you just collapse all short shots into a single longer shot (e.g. all shots 5 seconds or less get collapsed together with any neighboring shot 5 seconds or less).

And then pick a single frame from the exact middle, or else the most "still" frame that shows the least change from neighboring ones.
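A minimal Python sketch of that collapsing step, assuming shot cuts have already been detected as a sorted list of timestamps in seconds. The function names and the 5-second threshold are illustrative, and it picks the midpoint frame rather than the "most still" one.

```python
def shots_from_cuts(cuts: list[float], duration: float) -> list[tuple[float, float]]:
    """Turn sorted cut timestamps (seconds) into (start, end) shots."""
    bounds = [0.0, *cuts, duration]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def collapse_short(shots, min_len: float = 5.0):
    """Merge each run of adjacent shots that are all <= min_len seconds.

    Longer shots are kept as-is; a run of short shots becomes one shot.
    """
    merged = []  # entries: (start, end, is_run_of_short_shots)
    for start, end in shots:
        short = end - start <= min_len
        if merged and short and merged[-1][2]:
            prev_start, _, _ = merged[-1]
            merged[-1] = (prev_start, end, True)
        else:
            merged.append((start, end, short))
    return [(s, e) for s, e, _ in merged]


def representative_frames(shots) -> list[float]:
    """Pick the timestamp in the exact middle of each shot."""
    return [(s + e) / 2 for s, e in shots]


shots = collapse_short(shots_from_cuts([2, 3, 4, 20, 21], duration=30))
print(shots, representative_frames(shots))
```

Note that a single short shot sitting between two long ones stays on its own, which matches the "collapse with any neighboring short shot" rule.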


I've started playing with this, and ffmpeg already has a scene-change detection filter, so it should be fairly straightforward.
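For reference, the usual idiom is ffmpeg's `select` filter with the `scene` score, followed by `showinfo` to log the timestamps of the frames that pass. Here's a Python sketch that builds the command and pulls the `pts_time` values out of the log. The 0.4 threshold is a commonly used starting point rather than a recommendation, and the helper names are hypothetical.

```python
import re
import subprocess

PTS_TIME = re.compile(r"pts_time:\s*(-?[\d.]+)")


def scene_change_command(path: str, threshold: float = 0.4) -> list[str]:
    """Build an ffmpeg command that logs frames whose scene score > threshold."""
    return [
        "ffmpeg", "-hide_banner", "-i", path,
        # select keeps frames that differ enough from the previous one;
        # showinfo then logs one line (including pts_time) per kept frame.
        "-vf", f"select='gt(scene,{threshold})',showinfo",
        # Decode only; discard the output instead of writing a file.
        "-f", "null", "-",
    ]


def parse_showinfo(log: str) -> list[float]:
    """Extract pts_time (seconds) from showinfo lines in ffmpeg's stderr."""
    times = []
    for line in log.splitlines():
        if "showinfo" in line:
            m = PTS_TIME.search(line)
            if m:
                times.append(float(m.group(1)))
    return times


def detect_scene_changes(path: str, threshold: float = 0.4) -> list[float]:
    """Run ffmpeg (must be on PATH) and return scene-change timestamps."""
    proc = subprocess.run(scene_change_command(path, threshold),
                          capture_output=True, text=True, check=True)
    return parse_showinfo(proc.stderr)
```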


FYI, this isn't AI-powered or anything. It's not even doing speech-to-text transcription. It just uses yt-dlp (a youtube-dl fork) to download the video along with its existing [auto-generated] subtitles from YouTube.
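For the text-only case, something along these lines should fetch just the auto-generated captions without the video. This is a Python sketch, not necessarily the invocation the repo uses; `caption_command` and `fetch_captions` are hypothetical helpers, and with this `-o` template the captions should typically be saved as `<video id>.<lang>.vtt`.

```python
import subprocess


def caption_command(url: str, lang: str = "en") -> list[str]:
    """Build a yt-dlp command that saves only the auto-generated captions."""
    return [
        "yt-dlp",
        "--skip-download",       # don't fetch the video itself
        "--write-auto-subs",     # YouTube's auto-generated captions
        "--sub-langs", lang,
        "--sub-format", "vtt",
        "-o", "%(id)s.%(ext)s",  # output template for the saved files
        url,
    ]


def fetch_captions(url: str, lang: str = "en") -> None:
    """Run yt-dlp (must be installed and on PATH)."""
    subprocess.run(caption_command(url, lang), check=True)
```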


Yup! It’s basically a glorified shell script. Sometimes the old ways are useful.

(I’m the author)


I'm working on adding this functionality to my iOS/macOS Japanese reading app, Manabi Reader. However, since my app is essentially a web browser, it adds this functionality via userscripts instead of youtube-dl, which Apple doesn't like.

for future reference - https://reader.manabi.io


Finally, the web could be searchable again. If Google is clever, they should pay big bucks for this as a last resort before getting strangled by Microsoft's most important investment.


This is really cool, but why?


I can usually read 10 minutes of spoken content in less than 3 minutes. Accompanying video is often useless, unless there are specific diagrams or illustrations/photos. I don't need to see someone's face moving to absorb the info. Probably lots of people out there with similar gripes.


I didn't create the tool, so I can't claim to know the author's intentions. To me, this would be very useful in a variety of circumstances:

- the 15 minute video containing 3 minutes of useful information

- tutorials that you're trying to follow step-by-step that have been tightly edited such that actually doing each step while following along is impossible

- the video equivalent of listicles in which you're really only interested in the list, not the padding

- quickly getting an idea of whether or not it's worth your time to watch a lengthy video by quickly scrubbing through the content


Cool idea, but it's still pretty minimal and hard to read. My tool YOU-TLDR lets you search through the transcript in your language of choice, side by side with the video: https://www.you-tldr.com/



