Hacker News
What I've Learned in the Past Year Spent Building an AI Video Editor (makeartwithpython.com)
168 points by burningion 58 days ago | 67 comments



As someone who has worked as a video editor, the most helpful AI tool would be prompt based editing.

For example “find all the interview sections where people are talking about x and make a sequence”.

OpusClip claims to have this but it’s behind a waitlist.


As an outsider: sounds like the main value lies in the AI extracting detailed and accurate (but heuristic) metadata from video: audio transcriptions, text, people, environment and objects.

Once that’s there, you can use it for organizing, searching, filtering, or whatever you want. It does not need to be coupled with an LLM-based interface.

ML models for e.g. face and object recognition have been deployed in both local and cloud-based photo organization for at least a decade. I very much welcome transformers doing a much better job, but I also very much reject the everything-is-a-prompt hammer as a solution to all problems, especially in deep and professional workflows where details matter.
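A minimal sketch of that point, assuming the heuristic metadata (transcript text, recognized people, detected objects) has already been extracted per segment by some model: ordinary filtering is enough to answer a request like "find all the interview sections where people are talking about x and make a sequence". The `Segment` class, its field names, and the file names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """One stretch of footage plus metadata a model has already extracted (hypothetical schema)."""
    clip: str        # source file name
    start: float     # in point, seconds
    end: float       # out point, seconds
    transcript: str = ""
    people: set = field(default_factory=set)
    objects: set = field(default_factory=set)

def find_segments(segments, keyword=None, person=None, obj=None):
    """Plain filtering over the extracted metadata; no LLM involved."""
    hits = []
    for seg in segments:
        if keyword and keyword.lower() not in seg.transcript.lower():
            continue
        if person and person not in seg.people:
            continue
        if obj and obj not in seg.objects:
            continue
        hits.append(seg)
    return hits

def make_sequence(segments):
    """Order the matches into (clip, in, out) edit decisions for a rough cut."""
    ordered = sorted(segments, key=lambda s: (s.clip, s.start))
    return [(s.clip, s.start, s.end) for s in ordered]

footage = [
    Segment("interview_a.mov", 12.0, 30.5, "We talked about funding a lot", {"Ana"}),
    Segment("interview_b.mov", 3.0, 9.0, "The weather was bad", {"Ben"}),
    Segment("interview_a.mov", 1.0, 8.0, "Funding came late", {"Ana"}),
]
print(make_sequence(find_segments(footage, keyword="funding")))
# [('interview_a.mov', 1.0, 8.0), ('interview_a.mov', 12.0, 30.5)]
```

The output is only an ordered list of in/out points; turning it into a timeline would depend on whatever format the editor imports.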


Author here.

Yes, this is a big feature I've been working on, should be ready for a beta by the end of the month.

I allude to it in the post, but good search (for editing) is a challenge, and necessitates a mix of embeddings/vector search and text models.
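One common way to mix the two signals is a weighted hybrid score. The sketch below is an assumption, not the author's implementation: it uses toy three-dimensional vectors in place of real embeddings, a crude word-overlap score in place of a text model, and a hypothetical `alpha` weight to balance them.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, text):
    """Fraction of query words that appear in the text (a crude lexical signal)."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_search(query, query_vec, items, alpha=0.5, top_k=3):
    """Rank (clip_id, transcript, embedding) items by a weighted mix of both scores."""
    scored = []
    for clip_id, text, vec in items:
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text)
        scored.append((score, clip_id))
    scored.sort(reverse=True)
    return [clip_id for _, clip_id in scored[:top_k]]

# Toy vectors standing in for real embeddings (hypothetical values).
items = [
    ("clip1", "skater lands kickflip crowd cheers", [0.9, 0.1, 0.0]),
    ("clip2", "interview about sponsorship", [0.1, 0.9, 0.1]),
    ("clip3", "skater falls on kickflip", [0.8, 0.2, 0.1]),
]
print(hybrid_search("kickflip landed", [1.0, 0.0, 0.0], items, top_k=2))  # ['clip1', 'clip3']
```

Word overlap alone cannot tell a landed trick from a fall here, which is one reason to blend it with a semantic signal rather than rely on either one.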


Derushing in general is the most time-consuming part, so it needs not only language pattern recognition but also image recognition: "From the rushes, extract all the sequences with bicycle crashes to give me a pile of clips to use in my edit"!


Yes, agreed.

I film a bunch of skateboarding, and it can take tens of tries to land a trick. Similarly, there's usually a unique sound that signals a trick was finally landed.

Good multi-modal search and discovery is a huge part of cracking the editing problem.


Looks like https://kino.ai addresses that derushing stage, but as a specialized tool rather than as a function inside a video editor - which makes a lot of sense to me.


Tens? It sometimes takes my crew hundreds of tries (all on DV tapes).

How far have you been able to come with search for trick variations? It would be interesting to see a system that can reliably recognize what’s switch, nollie vs fakie etc. Then have it generate a list of all tricks for each skater and perhaps outstanding fails. Just some thoughts.


Detect the cheer everyone makes when the trick lands. Lots of proxy indicators to key off of.
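A minimal sketch of that proxy, assuming the cheer shows up as a sudden jump in loudness: compute RMS loudness per window and flag windows far louder than the median. The window size and the `factor` threshold are arbitrary illustrative values, and the test signal is synthetic; real footage would likely need something more robust, such as a trained audio classifier.

```python
import math
import statistics

def rms_windows(samples, window):
    """Root-mean-square loudness for consecutive, non-overlapping windows."""
    levels = []
    for i in range(0, len(samples) - window + 1, window):
        chunk = samples[i:i + window]
        levels.append(math.sqrt(sum(s * s for s in chunk) / window))
    return levels

def loud_moments(samples, sample_rate, window=1024, factor=4.0):
    """Start times (seconds) of windows much louder than the median window."""
    levels = rms_windows(samples, window)
    baseline = statistics.median(levels)
    return [i * window / sample_rate for i, level in enumerate(levels)
            if level > factor * baseline]

# Synthetic audio: a quiet tone with a loud burst from 2.0 s to 2.5 s.
sr = 8000
samples = [
    (0.8 if 16000 <= n < 20000 else 0.05) * math.sin(2 * math.pi * 440 * n / sr)
    for n in range(28000)
]
print(loud_moments(samples, sr, window=1000))  # [2.0, 2.125, 2.25, 2.375]
```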


> I allude to it

And that’s why I read the comments to see if anyone mentioned it.

To be able to literally take the source files used to put the video together and edit each piece individually would be great.

I wanted to create a car driving down a road covered in arches of greenery. I got lots of great options, but I wanted a particular combination of options. If I could do something like that with video, that would be terrific.


Not a personal jab, but I am astounded how every day, HN is full of discussion around how articles, newsletters, podcasts, and videos need to be aggregated and summarized for actual consumption. Repeat ad infinitum in both directions.

In my experience, I’ve always listened to live discussions or read long form blog posts, specifically for the story and obscure points being made. Summaries never capture that and always miss nuances.


It's approaching a very strange situation where people make overly wordy and bloated AI generated content and other people try to use AI to compress it back into useful pellets vaguely corresponding to the actual prompts used to generate the initial content. Which were the only bits anybody cared about in the first place.

One guy pays the AI to dig a hole, the other guy pays the AI to fill in the hole. Back and forth they go, raising the GNP but otherwise not accomplishing anything.


I haven't worried about search engines since I was trying to get my site into Yahoo, but my understanding is that they rank long, flowery prose far higher than text that gets straight to the point.

There's then the added "benefit" of being able to put more adverts in such long text.

One of the main appeals of ChatGPT is that it just gives you the answer.


*an answer

Not necessarily the answer


So no different from searching online and finding some random page, then. In my experience ChatGPT is usually far more accurate, and as it gets right to the point, you have far more time to judge whether the answer is reasonable.


No one searches online for a random page. You search for something you may or may not find. You don’t go in a library looking for Jules Verne and get out with any random book. I can agree that search engines may be bad, but they don’t create web sites out of thin air.


Hmmm, not entirely certain about that metaphor.

I do that sort of thing all the time. Sure it is nice to walk out with the Verne, but I am quite certain that I'll probably be walking out with several random books, with or without the one I was looking for.


It's common. I get annoyed at my wife all the time for jumping to conclusions from some random piece of web info.


I wanted to know when the clocks went back in the US and UK earlier.

---------

when do clocks go back uk and us

ChatGPT said:

In 2024, clocks go back on Sunday, October 27th in the UK and most of Europe, marking the end of Daylight Saving Time (DST). At 2:00 AM, clocks are set back one hour to 1:00 AM, giving people an extra hour of sleep. This marks the shift back to Standard Time and will last until spring when clocks go forward again.

In the United States, the clocks will go back a bit later, on Sunday, November 3rd, 2024.

---------

Compare that to using a search engine to find this out, which involves one search, then clicking through to a page, then finding the dates for the UK, then searching again for the US: multiple pages, multiple paragraphs of text.

The first result was the Evening Standard:

---------

What date do clocks go back in 2024 and when does British Summer Time end?

Brits will get an extra hour of sleep from next month as the days get shorter and shorter.

The temperatures are starting to drop, marking the end of summer – even if it’s not going quietly. Nonetheless, autumn is well and truly on the way and that also marks the end of British Summer Time (BST).

For those who aren’t a fan of dark mornings, that means you’ll gain one hour of sleep.

The custom of changing the clocks twice a year has been around in the UK for over a century, taking place once in March and once in October.

There’s still a little while until the clocks change but the date is already known, as it always happens on the last Sunday of October.

In 2019, the European Parliament voted to scrap mandatory daylight saving but Britain has no plans to, err, see the light.

This is what it all means for the UK.

When do the clocks go back?

The clocks go back on Sunday, October 27 at 2am.

---------

All that nonsense to parse, and I still haven't got the US date.


Because a search engine is not an answer engine. I just typed 'daylight saving time uk' and 'daylight saving time us' and the answer was right at the top [0].

You're supposed to give a query, not a question (even though Google et al. have worked hard to trick people into doing that). That's why search engines work for me even though there are lots of garbage-filled sites.

[0]: https://ibb.co/GpZ19nK (screenshot)


> Because a search engine is not an answer engine.

People have come to expect that though, and until a few years ago Google had actually gotten really good at it, partially because people finally started using structured metadata to give context.


Strange experience. I tried to replicate it by typing "US daylight savings time" into my URL bar, and DuckDuckGo's summary blurb at the top of the results says "Daylight Savings Time Ends Sunday, November 3rd, 2024" and the first result is Wikipedia. Without even following it, the summary on the search page says "in the US, daylight savings time begins on the second Sunday in March and ends on the first Sunday in November."

Hacker News commenters seem to consistently have far more trouble searching for things than I do and I don't get it.
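The date rules quoted in this thread (the last Sunday of October for the UK, per the Evening Standard excerpt; the second Sunday in March and the first Sunday in November for the US) can also be computed directly with Python's standard library. This sketch ignores the 2am local switchover time and assumes the rules themselves stay unchanged.

```python
import calendar
from datetime import date

SUNDAY = calendar.SUNDAY  # 6, since Python counts Monday as 0

def nth_sunday(year, month, n):
    """The n-th Sunday (1-based) of the given month."""
    first = date(year, month, 1)
    offset = (SUNDAY - first.weekday()) % 7
    return date(year, month, 1 + offset + 7 * (n - 1))

def last_sunday(year, month):
    """The last Sunday of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    back = (date(year, month, last_day).weekday() - SUNDAY) % 7
    return date(year, month, last_day - back)

def clocks_go_back(year):
    return {
        "UK": last_sunday(year, 10),    # end of British Summer Time
        "US": nth_sunday(year, 11, 1),  # end of US daylight saving time
    }

print(clocks_go_back(2024))
# {'UK': datetime.date(2024, 10, 27), 'US': datetime.date(2024, 11, 3)}
print(nth_sunday(2024, 3, 2))  # 2024-03-10, when US clocks went forward
```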


They do question-based, not query-based, search. The trick is knowing the right keywords, which is fairly easy.


Tiny nit: it's daylight saving time.


It’s clearly different in that ChatGPT sounds authoritative, but you still have to track down sources and make sure they’re correctly summarized and accurate. Search doesn’t give you the impression that you’re doing anything else, but ChatGPT always sounds authoritative even when it’s wrong, which makes it a hazard for the people who need it most, because they don’t have the expertise to recognize when it goes off track.


And webpages always sound authoritative even when they're wrong.


There’s a key difference to understand: web pages have individual reputations. If I see something about the moon landings on NASA.gov, I assign it a different trust level than something I read on youcanthandlethetruth.social, whereas LLM output comes with the imprimatur of the company that made the system. Some LLMs do generate citations, but those don’t always exist, come from authoritative sources, or say what they’re listed as saying, and users are notoriously prone to not checking unless they’re primed to be suspicious.


Insightful :)


Not sure about articles, but people keep recommending multi-hour-long podcasts and videos far beyond the ability of any employed person to keep up with what they might want, so a summary is a useful tool to extract the salient points and possibly consider if something meets the threshold of being better than all the other hour-long things I might want to spend my free hour on.

It sometimes feels like media has bifurcated into hyper-dense (let me explain a whole field of law in a 30 second tiktok) versus hyper-fluffy (documentary with 30 minutes of material spread out into six episodes, with a recap before and after each commercial break), depending on whether the target audience has a job or not.


Sounds like you're suffering from FOMO if you feel the need to consume summaries of multi-hour content you don't have time to consume.


It’s also a change in market dynamics. Professional podcasters sell ads, so they need lots of content, and the advertiser-driven pivot to video and podcasts means that things which a decade ago would have been a blog post taking 15 minutes to read are now a commitment of an hour or more for the same amount of information.

This is a common complaint here because HN is so text heavy that you’re not going to find many people here who can’t read much faster than the average speaker can present information.


Yeah that's what I meant by spam.


If that’s what you meant, you didn’t say it, and it’s not spam by the normal definition of that term.


Oh sorry I was talking about my other comment under this post, my bad.


Or they are just interested in the content?


I doubt it.


I generally agree with you when it comes to learning-focused content but there are definite cases where using an AI summary makes a lot of sense.

Imagine searching for a guide on how to disassemble your laptop. Unfortunately, you can only find a 30 minute video which is full of rambling, ads or other things irrelevant to you. You can at least in theory use AI to produce a textual summary which contains only the disassembly instructions and relevant snapshots of the video.

All professionals I've ever talked to seem to agree that videos are a terrible form of reference information (i.e. you need information to accomplish a task right now).

The same applies to recipe websites: an AI that can throw all the fluff away is useful considering the annoying habit of the authors to seemingly write about everything but ingredients and the steps necessary to cook the dish.

I think this relates to https://nick.groenen.me/posts/the-4-types-of-technical-docum... in that any documentation that serves immediate work rather than learning should be straight to the point, with as little clutter as possible.


>All professionals I've ever talked to seem to agree that videos are a terrible form of reference information

It really depends. For most software things, I'd prefer to have written documentation. If it's purely for reference, then yes I agree text is better.

For working on my bicycle or car, often I like watching videos because you pick up on little ways the pros make the jobs easier - for example, the steps might do a poor job of describing the angle and movement of tyre levers, but it's easily understood via video (just an example).

As a result, it can be a much richer experience when you are building skills as opposed to just following a checklist.


I totally agree. What is life if you live on summaries alone?

Podcasts and blog posts fall into "unique value/view/information I am learning" or entertainment "something that feels like a (parasocial) friend - content I can predictably expect and get some dopamine/sense of socialness from".

Summaries for the former remove the eureka moments and brain connections between ideas, replacing them with takeaways, and summaries for the latter are like summarizing a TV episode in text: no entertainment tends to really come from it.

I think it comes from having many messages at work, and I get that. When you have 50-100 messages/documents a day, quick summaries are a lifesaver, they help you filter, avoid, or get to the facts. But for things I select listening to.. for those hours of rest or (scientific) curiosity in my life.. summaries are not a virtue.

(for Parasocial - the feeling is: This person won't update me on their relationship problems, they'll explain a cool thing about castles to me and share their opinion, etc.)


It has a lot to do with the kinds of articles that appear on HN and across the internet. And also, that spending time on something requires being interested in it, and so, there's a larger audience for summaries.

I think, in general, most people have areas of interest to them where it would not occur to them to summarise what they're having fun engaging with.


People use these summaries to generate spam which they sell to advertising networks, that's why they keep talking about it.


That's fair, and there will always be people who want summaries.


I don't read much online drivel, but the way I would describe my interest in AI summary/model building is that I do read a few articles/books deeply, and these refer to many other things that it would be useful to have a general picture of in my mind, but I'm never going to put the manual effort into building that surrounding structure.

E.g. I'm interested in classical art, and come across a lot of "he painted this while he was in $X before he moved to $Y". I'd like information about $X and $Y to be also available, how far apart are they, were they ruled by the same people, etc. But I won't be doing that sort of digging myself, I'd like it to show up next to what I'm reading, because I (will) have an AI reading along and doing this work for me.


You don't understand! I need to procrastinate more efficiently!


that seems really hard


You should check out scenery.video (disclaimer: I have a relationship with the company)


Check out https://kino.ai (YC S23)


I agree that building AI on top of the video editor is probably a mistake. Maybe the format of the representation of the video can be something better than a series of matrices of pixel values.


Absolutely true! Reimagined AI-first products will kill legacy products with AI patched on.

Always.


I think this is sometimes true, and certainly after a ton of failure first.


It seems like we are currently in the "skeuomorphic" product design era for AI products. Which is to say we are building the same products but with AI tacked on. I appreciate that you are approaching this problem from first principles and attempting to break from the model of the previous generation. Kudos.


I’m a beginner video editor, and I’ve been using a few AI tools that make things easier. I use https://captions.media for adding captions to my videos; it’s super quick and easy. I also use makeshorts.ai to repurpose my videos into shorts and reels, which saves me a ton of time. These AI tools really help speed up editing, especially when you’re just starting out.


Impressive blog! I am building a professional web video editor - https://chillin.online and trying to embed various AI workflows into it. Your article has given me a lot of inspiration. Thank you!


Looks so interesting.


Good work on pushing through. It’s like you say, building anything is an achievement.


Seriously, every person needs the opportunity to really throw themselves into creating something for a year. I think so many people walk around thinking "if only I had time/money/space/whatever I could do something amazing".

It is really humbling to actually try it and realize how difficult making anything original is. You also realize that... you just might not be talented haha.


Love the author sharing their winding journey as well as the tools and things they learned along the way. You can tell the author did grow a lot through this process, and through the year. Great stuff, thanks for sharing these great tips :D


Because I don't see it mentioned elsewhere, I wanted to plug OpenTimelineIO, as a lot of the industry is building support around it as a format right now, and it would be great for any new video editor to support.

https://opentimelineio.readthedocs.io/en/stable/


What do you think of this versus the AI services that hire actors and then reuse them as models in videos driven by a script?


Author here. I imagine that being one of the components you can "plug in" to what I'm building.

Imagine taking in a prompt, which describes the video you'd like generated. At render time you pass along variables which get injected to describe the specifics for your audience.

We can then adjust the video edit according to that audience, including mixing generated and non-generated content.
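A minimal sketch of the render-time injection described above, assuming a hypothetical edit plan in which each step is either existing footage or a slot to be generated, with Python's `string.Template` handling the placeholders. The plan structure, file names, and variable names are illustrative, not the author's format.

```python
from string import Template

# Hypothetical edit plan: each step is either existing footage or a slot to be generated.
# Placeholders like $city are filled in per audience at render time.
EDIT_PLAN = [
    {"kind": "footage", "source": Template("intro.mov")},
    {"kind": "generated", "prompt": Template("A title card reading '$greeting, $city!'")},
    {"kind": "footage", "source": Template("main_$language.mov")},
]

def render_plan(plan, variables):
    """Resolve every placeholder; Template.substitute raises KeyError if one is missing."""
    resolved = []
    for step in plan:
        if step["kind"] == "generated":
            resolved.append(("generate", step["prompt"].substitute(variables)))
        else:
            resolved.append(("use", step["source"].substitute(variables)))
    return resolved

print(render_plan(EDIT_PLAN, {"greeting": "Hello", "city": "Lisbon", "language": "pt"}))
# [('use', 'intro.mov'), ('generate', "A title card reading 'Hello, Lisbon!'"), ('use', 'main_pt.mov')]
```

Failing loudly on a missing variable is a deliberate choice here: a half-filled prompt would otherwise reach the generator silently.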


Did I miss something, or is this "video editing was too hard, so I just made a Wikipedia-reading bot that generates drivel for Instagram and TikTok at the same time"?


Author here.

This is a genuine concern of mine! I don't want to build something that generates slop.

Rather, I think whenever we change the costs / process of things, new possibilities open up.

As an example, last night I re-watched Starship Troopers for the six-hundredth time. I'm a huge fan of Paul Verhoeven.

What if I could watch a custom edit of Starship Troopers on demand, and this edit surprised me with something new? I don't know exactly how this would look, but maybe it's interesting?

It's tough to predict the future and how things will change.

But I'd rather be participating in its creation, trying to make it better.


>What if I could watch a custom edit of Starship Troopers on demand, and this edit surprised me with something new? I don't know exactly how this would look, but maybe it's interesting?

this is not interesting whatsoever actually


lol alright, appreciate the skepticism.

What if, instead of an algorithm designed to hold your attention captive to sell you shit, there were a feed of videos created to help you focus on what you aspire to learn / be / do?

idk probably bullshit too, but why not?


We don't advance as a society unless people ask new questions. Having folk willing to spend some time answering those questions (in public, no less!) helps others. It's really, really damn hard to predict how advancements in one area can help another.

All that said, thanks for your interesting new question, and thanks for spending time on it :D


You have no idea what you're doing, do you?

Bfr


>What if I could watch a custom edit of Starship Troopers on demand, and this edit surprised me with something new? I don't know exactly how this would look, but maybe it's interesting?

Is this what you want to do? https://www.youtube.com/watch?v=6sUR6ylVH7E


There is a high time cost to actual video editing and managing the details well, which is quite different from generating one template/perception of it; the latter is closer to where social media slop is.



