Hacker News
Video editing with Python (github.com/zulko)
346 points by wilsonfiifi on Feb 3, 2018 | 48 comments



I've recently learned that 3blue1brown's [1] videos are generated with python:

https://github.com/3b1b/manim

[1] https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw


I've actually played around with this a bit and it was surprisingly pleasant to use even with all the "beta software" warnings


I'm the CTO at Kapwing (https://www.kapwing.com), where we use MoviePy for all of our products (all of which help users edit video online). Sometimes the documentation has been enigmatic but overall the library has made it really easy to create a video processing backend without having to dive deeply into combining ImageMagick and FFMPEG at a low level.

Funny enough, MoviePy makes it so easy to edit videos that we also use it to edit images. But the way it works is when a user uploads an image, we convert the image into a video format, edit it with MoviePy, then convert it back into an image. A little inefficient, but it works better for us than writing a completely different branch of code to handle images.

Having worked with MoviePy for a while, my main ideas for improvement would be: 1) create a callback for video editing progress so that it can be reported to the frontend, etc.; 2) provide predefined transitions (currently transitions need to be calculated manually; smarter presets would make it easier to create something like a slideshow); 3) improve threading over multiple CPUs. I'm not sure how feasible it would be for the library itself to handle this, but from our initial tests ffmpeg by itself seems to handle threading more effectively.


A few months ago I used MoviePy to create animated Magic: the Gathering cards (HN discussion: https://news.ycombinator.com/item?id=15449955 , technical writeup: http://minimaxir.com/2017/11/magic-the-gifening/)

Even though MoviePy is an "older" library (original HN discussion from back in 2014: https://news.ycombinator.com/item?id=7121104), it still works pretty well, although you may have to fuss around with the low-level FFMPEG settings to get what you want. I'm surprised that there hasn't been more usage of MoviePy actually.


Don’t forget to check Leif Andersen’s video language

  https://lang.video/
It is a Racket language for editing videos. Quite handy when you need to edit many, similar conference videos.


I love this tool and just started using it.

If this did DNxHD codec transcoding I would love it forever. I need to ask and look to see if this is possible to add. When you edit videos you SHOULD transcode your videos into a format that actually uses one image per frame (DNxHD or ProRes (on Mac)) and then deliver it back into a compressed format like H.264.


> When you edit videos you SHOULD transcode your videos into a format that actually uses one image per frame (DNxHD or ProRes (on Mac)) and then deliver it back into a compressed format like H.264.

Yes, you should transcode your videos to an intra-frame codec (I use ProRes at 540p). But (and this is important), this is only a proxy format. When you export your creation the video editor should automatically use the original files for rendering. Premiere does this very well nowadays[1].

[1] https://helpx.adobe.com/premiere-pro/how-to/proxy-media.html
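
A minimal sketch of the proxy step described above, assuming ffmpeg is installed and built with its `prores_ks` encoder. The file names and the helper `proxy_command` are made up for illustration, and the snippet only builds and prints the command line rather than running it.

```python
import shlex


def proxy_command(src, dst, height=540):
    """Build an ffmpeg command that writes a low-resolution ProRes proxy of src."""
    return [
        "ffmpeg",
        "-i", src,
        # Scale to the requested height; -2 keeps the aspect ratio with an even width.
        "-vf", f"scale=-2:{height}",
        # prores_ks is one of ffmpeg's ProRes encoders; profile 0 is the "proxy" flavour.
        "-c:v", "prores_ks",
        "-profile:v", "0",
        # Plain PCM audio, which is cheap to decode while editing.
        "-c:a", "pcm_s16le",
        dst,
    ]


# Hypothetical file names, for illustration only.
cmd = proxy_command("interview.mp4", "interview_proxy.mov")
print(shlex.join(cmd))
```

The final export would still be rendered from the original files, not from the proxy.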


If I have an H.264 (non-intra-frame) file and it plays back fine in preview, I try to just use that instead most of the time.

ProRes 540p is HUGE. Are you working with 4k? If not I would knock that down. Apple ProRes 422 is only 147 Mb/s. https://www.premiumbeat.com/blog/5-things-you-should-know-ab...


> ProRes 540p is HUGE.

540p means 540 pixels on the short side (height). That is very low res and they're not big.

> Are you working with 4k?

Very close, UHD resolution.


Clickable link https://lang.video/


It seems similar to Vapoursynth: http://www.vapoursynth.com/

AFAIK, Vapoursynth is more powerful thanks to its module system; however, MoviePy's ability to integrate with IPython notebooks seems nice.


Recently I discovered ffmpeg-python [1], also a nice Python frontend to FFmpeg. One of its best features is being able to show nice graphs [2] of which filters are applied where, from input to output. And all FFmpeg filters can be used: it is a thin layer over FFmpeg, just much more readable.

[1] https://github.com/kkroening/ffmpeg-python [2] https://github.com/kkroening/ffmpeg-python#complex-filter-gr...


Video editing APIs are a pretty neat way to edit video.

Apple provides a nice set of APIs to achieve this as well:

https://developer.apple.com/documentation/avfoundation/avmut...


A few years ago I used moviepy to make the animated user-test videos in [1]. The idea was to create science education videos using a software-like process of iterative development and guerrilla street usability testing.

Traditional science education videos are created by sort of a waterfall process. Throw the script/footage/video over the wall, and just hope it kind of works when it hits users. But it seems this usually doesn't work - prolifically creating misconceptions, while failing to provide transferable understanding. Ok for a "motivational speaker" role, but not for a "tutor". Someone at a leading educational video shop once told me 'sure, it would be great to make user testing part of our development - just as soon as we find anyone willing to fund that'.

Regrettably, my videos didn't user test well. The exercise didn't converge, and was abandoned. I had envisioned an incremental process, of informal testing for appeal, understanding and misconceptions, while improving clarity of explanation. Instead, what I found was a minefield. Bacteria and viruses get bad press, so for some "a fun story about bacteria!" has the emotional tone of "a fun story about genocide!". An elderly British gentleman was distressed by the mention of "millimeters", which brought back unhappy memories of learning metric as a child. An MIT student was disturbed by the viscerals of breaking off the head of a T4 phage (zoomed to chicken-size). Someone ran away exclaiming "how could you show me something so disgusting?!", for reasons unknown, but perhaps from having not understood anything, and thus seeing a pile of E. coli as a pile of poo. Lots of "oh my, I certainly didn't anticipate that failure mode". And this wasn't long-tail variance - I wasn't testing enough to see any. This kind of nonlinearity seemed to be the common case. I also hit some newbie "insufficient domain expertise on tap" mistakes, like underestimating the interference effects between spoken and written input - having a character speak visible text works startlingly poorly.

I wrapped moviepy in a framework to create lots of minor variants for testing...

    class NovX4 (FilmCommon2):
        default_shots = 'context','question','realsize','zooming','views','playset','buildings','measure','stories'
        question_shots = 'question_how','question_sizes','question_planets','question_atoms'
with the animations timed to recorded speech (the computer voices I had available didn't test well)... "How can you remember, the sizes of small things?"

        question_how_script = aka(am18.HowCanYouRemember)
        question_sizes_script = aka(am18.TheSizesOfThings)
and animation code sharing using inheritance and mixins and injection and ...

    class NovX6 (NovX4):

        def question_sizes_frame (self,t,d):
            ca = self.setup_table()
            ca.teacher().draw()
            ca.side_table()
            ca.room.mv(.2,.55).gs().otp().rotate(.2).draw_milli(image_of_book_life_size_zoo).gr()
            ca.room.mv(.0,.3).gs().otp().rotate(.4).draw_milli(image_of_pizza_slice).gr()
            if t > d * .5:
                ca.room.mv(.3,.4).gs().otp().rotate(-.2).draw_milli(image_of_walking_cat).gr()
            self.teardown()
Doing programmatic video production had some nice properties compared with a traditional video animation pipeline. It was easier to revise past decisions, to support a set of active and evolving tests, and thus to explore possibility space, without my being a magically fast, skilled animation artist who can "redraw this scene for the tenth time this morning". But some things would have been much easier with a normal direct-manipulation UI. And there was a slippery slope of reinventing wheels ("oh, I guess I need to add feature X to the character rigging to get that effect :("). If I were starting on it today, years later... I'd likely use python and Blender instead. For the ecosystem. I was pushing moviepy well outside its normal use case.

[1] The stick-figure videos, in the top How to remember sizes section, of the slowly-loading wasn't-intended-to-be-public page http://www.clarifyscience.info/part/Atoms .


If anyone wants to read the docs, here is a direct link:

http://zulko.github.io/moviepy/


The docs do appear to be somewhat incomplete, though: there's functionality used in the examples that the docs otherwise don't seem to mention even exists.


I've used Moviepy fairly extensively in a recent project and yes, the docs are incomplete. I've had to resort to actually reading the source code to figure out why, for example, my argument was being ignored. I still have some unresolved (though seemingly obscure) issues with transparency, and my question to the moviepy subreddit went unanswered.


I have this notion that in time, as interacting with computers and advanced computing platforms like Python becomes rote, computer interfaces will be more REPL-like and less application- or GUI-like. The principle of general computing will be applicable at the interaction level, not the application level, if you will.

Future users may consider today's restrictive GUI interfaces a quaint anachronism of the stupid ages.


I strongly disagree with this prediction. That may be true for some lines of work where people have a strong technical background and use a specific set of applications all the time (such as scientific computing). In all other cases, GUIs will always win because:

- Non-technical people lack the capacity to formalize their goal
- Many document creators want WYSIWYG applications
- Programming languages are powerful but hard to "discover". People don't want to read the docs for something they will use once in a while


I think there's a happy middle ground between extensibility/flexibility and simplicity. You can use a GUI for simpler stuff while still allowing code for the complex stuff.

In moviepy's case I wouldn't use python to describe simple video transformations as shown above, I'd use YAML. Unlike python, it's something that you can build by hand, that a GUI or script can easily output, and if you put it in version control you'll get readable diffs.

At the same time if you made every "video project" include the source media, the YAML file describing transformations used to build the output and by default include a stub 'plugin' python file, you could reduce the friction between people with coding skills seeing the need for an ad hoc custom transformation and creating it.
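
A rough sketch of that idea, under stated assumptions: the `project` dict stands in for what a YAML loader might return for a hypothetical project file, the step names (`trim`, `speed`, `fade_in`) are invented, and a "clip" is just a dict recording its duration and edit history rather than real video. The `fade_in` function plays the role of the stub plugin file.

```python
# What a YAML loader might hand back for a hypothetical project file.
project = {
    "source": "talk.mp4",
    "steps": [
        {"op": "trim", "start": 5.0, "end": 65.0},
        {"op": "speed", "factor": 2.0},
        {"op": "fade_in", "seconds": 1.5},  # supplied by the plugin stub below
    ],
}

STEPS = {}  # maps a step name to the function that implements it


def step(name):
    """Decorator that registers a function as the handler for a named step."""
    def register(func):
        STEPS[name] = func
        return func
    return register


@step("trim")
def trim(clip, start, end):
    return {**clip, "duration": end - start,
            "edits": clip["edits"] + [f"trim {start}-{end}"]}


@step("speed")
def speed(clip, factor):
    return {**clip, "duration": clip["duration"] / factor,
            "edits": clip["edits"] + [f"speed x{factor}"]}


# --- plugin stub: ad hoc steps a coder could add per project ---
@step("fade_in")
def fade_in(clip, seconds):
    return {**clip, "edits": clip["edits"] + [f"fade in {seconds}s"]}


def build(project, source_duration):
    """Apply every step of the project description in order."""
    clip = {"source": project["source"], "duration": source_duration, "edits": []}
    for spec in project["steps"]:
        params = {key: value for key, value in spec.items() if key != "op"}
        clip = STEPS[spec["op"]](clip, **params)
    return clip


result = build(project, source_duration=300.0)
print(result["duration"], result["edits"])
```

In a real tool the registered functions would call into a video library; the registry is the part that lets the declarative file and the custom code meet.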


While I never used it, it sounds like HyperCard actually did achieve both goals: Ease of use for non-technical users, and wysiwyg GUI design + use.

We used to have pretty small, simple languages in the past that were very effective in driving a simple text-based or stock-UI-component interface. These simple languages aren't difficult for people to be introduced to, preferably by accessible examples & demonstrations, and do just enough to be practical for common uses.


'Future people' will all be 'technically strong' by today's standards at least. They will expect their tools to be powerful and have a whole childhood to familiarize and internalise concepts and paradigms. Pointy clicky is not going to cut it, not for anything worth doing. Of course there will be toys, of which I enjoy many, I'm not talking about those.

The only intuitive interface is the nipple, everything else is learned. It also seems you got stuck on the notion that 'programming languages' are hard. That does not need to be the case, not for native talkers. And REPL and 'wysiwyg' are certainly not mutually exclusive things, http://nbviewer.jupyter.org/url/jakevdp.github.com/downloads... that seems pretty what you see is what you get to me.


I wonder where you get that prediction from. We've come from Windows GUIs, through single apps on mobile phones, to "just tell the machine what you want". Development is moving further away from consoles and "typing" of any kind.

Consumers want GUIs. Show and don't tell is the way it goes.


I believe that in order to grow strong enough programming skills, one has to have strong enough analytical skills. Those can also be learned but are much more of a personality trait.

As for notebooks being wysiwyg, I also disagree. That would be like having a markup editor with a rendering window. Document creators like the "painter/sculptor" metaphor, where they interact directly with their creation. Want to modify a part of a document? Click on it.


It's a nice thought but we aren't heading in that direction. The young are almost exclusively brought up on swipe interfaces. The future is likely to be a dystopia of idiot proof GUIs or even worse voice control.


True, but those interfaces are built by adults still stuck in the old paradigm. Kids have astonishing familiarity with tablets and phones. The notion that interfaces need to be dumbed down because users are 'stupid' might be true today, but that won't be the case for those raised with smartphones.

2007 was 11 years ago, in 10 years those iPhone reared people will enter the workplace. They will have instincts you and I do not. I suspect it will push general computing interface design to the 'power user' side.


A lot has been written on the “digital native” (people growing up with technology being good with technology). I encourage you to look it up.

As computer savvy people, I think we imagine how much better we could have been with more opportunity. It seems like the prevailing opinion is that the digital native is a myth and that just having technology doesn’t make the next generations experts in it.


Kids have strong familiarity with swipe UIs and iPhone games. This won't make them switch to programming-based UIs any more than growing up with a TV turned one into an electrician. In fact, the opposite is likely to be true - the more "magical" and abstracted away the UI is from the underlying technology - the less likely it is that its users will develop any intuitive understanding of what lies beneath.


We were going that route, and then I'd say Windows' and Mac's market acceptance proved otherwise.

I think the argument might be there for interfaces to scale more fluidly from GUI to text (e.g. every GUI command is mapped to text and shown to you). But the on-ramp to text is too difficult for new users...


Many professional tools have a strong GUI/Engine separation, AutoCAD famously used Lisp, and one could see and modify commands that were being dumped into the application REPL in real time.

Many applications on the Amiga had an ARexx Port that allowed scripts to drive all the same actions as the user.

http://docs.autodesk.com/ACDMAC/2013/ENU/PDFs/acdmac_2013_au...

http://www.pjhutchison.org/tutorial/arexx.html


Good point, but I tend to believe that no one kind of user interface can ever be complete, so the best kinds of programs will support multiple interfaces that can be used by people with varying levels of skill and familiarity to accomplish the program's tasks. Point and click GUIs will always have their place for those who have very little or no familiarity with software concepts.


Nonsense. Point and click is useful for anyone who uses a tool infrequently and wants to make progress without resorting to a manual. Honestly the average app is not so complex that its fundamentals cannot be represented as point-and-click. It's partly ego that keeps people on command lines, as expressed in that comment, which has a vibe of chest-beating.


If a task seems straightforward I will likely apply my past experiences with any similar task, and skip any provided manual until I become unsure or hit an obstacle. This is how normal people approach computing devices.


This is more or less the ongoing wider state of 3D software. There’s a pile of exceptions to every rule around each corner. If anything, coding has become less important as better software design evolves.


I highly doubt this would be the case. Well-designed GUIs make learning software significantly easier than it is without GUIs.

Just imagine going to a chemical plant that makes fertilizer. Now imagine you have an operator who stares at 5 or 6 screens showing each IO process for making said fertilizer: the drying process, the chemical reactions, heat, temperature, water, pressure, etc. It's a complex system to understand, and GUIs make it easier to break down.

Take away the GUI and throw in all python REPL and CLI. Things will go to shit so fast. Plants will shut down. Operators will need to be retrained and paid more to understand the inner workings of what the GUI was doing.

That's not to say that CLI and REPL are necessarily worse than their GUI counterparts. A good example outside of programming where CLI and REPL are significantly better than the GUI counterpart would be AutoCAD (Autodesk). Any competent architect uses a list of well-known shortcut commands through its CLI, such as "M" for moving an object or "O" for offsetting a line. It takes way longer to use a GUI to do this.

Arguably, some common GUI interfaces for programming tools like Git make it much easier to do things like branch off, rebase, merge, and pull. I personally prefer Atom's built-in GUI, or even a possible stand-alone one, as I simply cannot remember every Git command when I need to. Just the most common ones.

Lastly, Windows/Mac vs. Linux. It's thanks to their GUI-based systems that Windows and Mac are much more popular and appealing to programmers and non-programmers alike, letting them focus on things that actually matter (business functions, personal life, etc.).


We may be trading today's restrictive GUIs for badly documented application APIs...

Actually, thinking about fairly advanced non-developer applications, mapping/GIS work is a lot like what you describe: GUI tooling with toolboxes of little functions that you can chain together to get the desired outcome.

Though it's incredibly complex when you get to the advanced use cases, where it becomes not dissimilar to regular software development.


How much video/audio editing have you done?


I think about this from time to time, it's sort of the difference between using the computer and using an application.


The main issue with this is discoverability.

How would anyone know what commands are supported?


I don't agree with 'ageofwant’s prediction (though I really wish that I could), but command discovery is a solved problem:

Pharo Smalltalk has a nifty feature whereby you can find methods by example[1].

E.g. typing

    'hello, world', 'HELLO, WORLD' 
in the finder returns the method `asUppercase` on the String class.

In a statically-typed language, you can just search by function signature (e.g. Clip -> RotatedClip (an effect) or Clip -> Time -> Clip * Clip (a cut)).
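
A rough Python imitation of find-by-example, as a sketch rather than a description of how Pharo does it: the helper `find_methods` is hypothetical and simply tries every public method of the receiver and keeps those that produce the expected result. It is only safe on immutable values, since it really calls each method.

```python
def find_methods(receiver, expected, *args):
    """Return names of public methods where receiver.method(*args) == expected."""
    matches = []
    for name in dir(receiver):
        if name.startswith("_"):
            continue  # skip private and dunder attributes
        method = getattr(receiver, name)
        if not callable(method):
            continue
        try:
            if method(*args) == expected:
                matches.append(name)
        except Exception:
            # Wrong arity or argument types: this method is not a candidate.
            continue
    return matches


print(find_methods("hello, world", "HELLO, WORLD"))
```

For this example the result includes `upper`, Python's counterpart of `asUppercase`; `swapcase` also matches because the input happens to be all lowercase, so a second example pair would be needed to tell them apart.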

[1] https://www.youtube.com/watch?v=HOuZyOKa91o&app=desktop&t=30...


Python tab completion does wonders. I know when I was messing around with blender it worked really well when I didn't want to read the code (or figure out what was missing from the API for patches).

Not too sure the "artist types" will ever get very comfortable with the command line but they can usually describe what they need well enough for a techie to code up what they need given a good enough API.
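
For the curious, the standard library's `rlcompleter` module exposes the same completion logic the interactive prompt uses, so discoverability can be scripted. A small sketch, where the `Clip` class and its methods are invented stand-ins for a real API object:

```python
import rlcompleter


class Clip:
    """Invented stand-in for an object exposed by some editing API."""

    def subclip(self, start, end):
        return (start, end)

    def set_duration(self, seconds):
        return seconds


def completions(text, namespace):
    """Collect every completion the interpreter would offer for text."""
    completer = rlcompleter.Completer(namespace)
    found = []
    state = 0
    while True:
        candidate = completer.complete(text, state)
        if candidate is None:  # no more candidates
            break
        found.append(candidate)
        state += 1
    return found


# What pressing Tab after "clip.s" would suggest.
for candidate in completions("clip.s", {"clip": Clip()}):
    print(candidate)
```

The exact suffix on each suggestion (for example a trailing parenthesis on callables) varies between Python versions.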


Is there anything similar for non-destructive audio production except for Ecasound [1], Nama [2], Csound or SoX?

There was an interesting paper about SPED, a sound file editor [3], but I'm afraid this was just a proposal. There is also AudioRegent [4], and there were a few links, somewhere, on using makefiles in audio production, but I've lost track of these.

A quote from the SPED paper:

    "The editor should be usable without graphical user
    interface or it should not make any assumptions about 
    it.
    This feature may seem strange if we think of the modern
    computer based editing systems which basically always
    have some kind of graphical user interface. On the other
    hand when comparing to analogue tape splicing one
    might ask how mandatory the graphical user interface
    really is."
I've been thinking that a CLI audio editor (slicer, re-organizer) with a syntax similar to ed or sam [5] would be cool to experiment with. Imagine a CLI non-destructive digital audio editor built on top of an RPi or similar, running headless, only needing a tiny control panel with a few buttons; maybe hooked to a tiny e-ink display for emergencies.

(Yep, love sound editing; tired of screens or laptops in bags. I would love an "iPod Nano of non-destructive digital audio editing", but I am not the man to build one myself.)

1: http://nosignal.fi/ecasound/

2: https://freeshell.de/~bolangi/nama

3: https://tinyurl.com/y9s8mgme [pdf]

4: http://journal.code4lib.org/articles/2882

5: http://doc.cat-v.org/plan_9/4th_edition/papers/sam/


I use this for my Freeze Frame Bot (and many others). MoviePy is super powerful and I love playing with it.

http://twitter.com/freezeframebot


This looks cool. I am planning to write some tools for semi-automated processing of a video dataset soon. Are there any other strong alternatives to this library that I should consider? Am OK with any language.

An example task: automatically detect cuts between different shots with high sensitivity, e.g. many false positives, show the user a clip of +/- 2 seconds around each proposed cut, user presses a key to confirm/deny that it's actually a cut.
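
A toy sketch of that workflow, assuming each frame has already been reduced to one number (here its mean brightness) by whatever library reads the video. The functions `propose_cuts` and `review`, the threshold, and the synthetic data are all made up for illustration; a real detector would compare richer features such as histograms.

```python
def propose_cuts(frame_means, fps, threshold=10.0, context=2.0):
    """Return (cut_time, clip_start, clip_end) for each large jump between frames.

    A low threshold gives high sensitivity, at the cost of false positives.
    """
    duration = len(frame_means) / fps
    proposals = []
    for i in range(1, len(frame_means)):
        if abs(frame_means[i] - frame_means[i - 1]) > threshold:
            t = i / fps
            # Clip of +/- `context` seconds around the proposed cut, clamped to the video.
            proposals.append((t, max(0.0, t - context), min(duration, t + context)))
    return proposals


def review(proposals, is_real_cut):
    """Keep only the proposals the reviewer confirms (e.g. via a key press)."""
    return [p for p in proposals if is_real_cut(p)]


# Synthetic 6-second video at 10 fps: a dark shot followed by a bright one.
fps = 10
frame_means = [50.0] * 30 + [200.0] * 30
proposals = propose_cuts(frame_means, fps)
confirmed = review(proposals, lambda proposal: True)  # stand-in for the human
print(confirmed)
```

The `is_real_cut` callback is where the clip preview and the confirm/deny key press would go.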


It seems to me like what you're trying to do is analyze video, while Moviepy is mostly really good at programmatic editing. Have you looked into OpenCV?


Yeah I would detect the cuts with something else. I just need the library to read/write video files, display them on screen, etc.


I've used moviepy quite a lot and even submitted a PR or two. It's not perfect, and docs aren't super up to date. For many tasks a good nonlinear editor is way easier. BUT, it also makes some tasks really easy that seem surprisingly complex to do with (the admittedly simple) GUI tools I've tried. E.g. making a 3x3 montage of videos playing simultaneously, with a caption beneath each.


I think I'll just use this. I often need to cut up videos and I've found that Linux's video editing tools are horrendous. Even using ffmpeg from the command line doesn't work out due to its horrendous tolerance for choosing start and end points.



