It’s a bit less dry than the changelog, notably regarding the evolution of the APIs.
What’s also important are the changes to the release schedule that we’ve been pushing with the community: a major version every year at the beginning of the year, with ABI and API breaks, minor releases during the year, and an LTS every other year…
A bit off topic, but in your talk you said that the Intel, NVIDIA and AMD encoders are worse than SVT-AV1. How much worse, and which card is the best? I'm always encoding with ab-av1/av1an using SVT-AV1, but with the "right" preset it's still really slow. Which hardware encoder (in your opinion/experience) is the best to use (size/quality/speed don't matter as much)? If I want to convert my media server to AV1 only, should I stick with SVT-AV1 and eat the power/time cost?
What is the "right" preset? SVT-AV1 has a wide range of presets, starting from about as fast as x264 veryfast. And at any given speed level it should deliver significantly lower bitrates at the same quality than x264, x265 or libvpx.
For example, if you pick preset 6 on SVT-AV1 it should take about as much time as x264 veryslow while still coming in 30-40% lower in bitrate at the same quality.
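For reference, a minimal sketch of such an encode through ffmpeg's libsvtav1 wrapper (the CRF value and filenames are just illustrative assumptions):

    # preset 6: roughly x264-veryslow territory in encode time, much lower bitrate at the same quality
    ffmpeg -i input.mkv -c:v libsvtav1 -preset 6 -crf 30 -c:a copy output.mkv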
I don't have a card to test with, but the last VMAF benchmark I saw showed hardware AV1 coming out slightly better than x264 veryslow.
Which is way better than hardware h.264... but also terrible compared to svt-av1, lol.
Try using av1an to parallelize (and improve the quality of) your encode. And TBH you should not re-encode your library unless it holds raw rips or something like that and you are short on space.
I wouldn't be too worried. I was a lurker for a long time before joining, but it seems like there's an annual, if not bi-annual, flood from reddit or other sites that gets shut down because the low-effort comments and F-tier jokes/memes/puns get no engagement, get removed, or get downvoted to oblivion.
You're sweet, and we appreciate being a known secret, but I hardly worry about swathes of ffmpeg 6.0 nerds joining the community. Indeed they might be a welcome addition.
Fun fact of the day: ffmpeg was originally written by Fabrice Bellard, who, among other impressive things, wrote the JS PC emulator capable of running Windows 2000 in a browser over 10 years ago, the thing that got me fascinated with emulators in general.
Fabrice is superhuman for sure, but in 20 years the urban legend will go like ...
Fabrice Bellard once read all of TAOCP in 20 minutes before being absorbed into his own AI and is now ascended in the ethereal ethernet silently fixing bugs in your code. Blessed be the bits.
... before being absorbed into his own AI and is now ascended in the Ethernet plane silently generating endless frames of international scandi-noir thriller serials for Netflix.
If we studied Fabrice from the perspective of "how can all be more like him/as productive as him/as innovative/as smart", do you think the findings would basically come back "he's just genetically gifted"?
He's like the LeBron James of the tech world, from what I can deduce.
He’s good at creating useful MVPs that are architected well enough to attract other people to finish the remaining 99% of the work as he moves on to the next project.
I don't see anything wrong with this. It's like saying "Apple is built by Steve Jobs" even though obviously a lot more people are involved, and even if Jobs is no longer even around.
This mostly looks like they're adding support for new codecs; there are only 4 new options in there, which seems rather paltry for a dot-zero release. Should we consider ffmpeg feature complete?
At a prior job we pinned to 4.2.3 because after that version an internal change massively impacted high frame rates. With a 1080p mp4 file on disk, and that disk being an SSD, version 4.2.3 delivers 700-800 frames per second, while with every version after 4.2.3 - no other app or library changes - the same file may only reach 300 fps. I spent some time trying to find the cause of that cut of more than half in throughput, but never found enough time.
Nostalgia moment with the CrystalHD decoder deprecation. Those cards came along with the Intel Atom netbooks, one of the few ways we had to bump up their austere performance a bit.
I absolutely love ffmpeg, but for the life of me I cannot understand how its pipeline system works.
Each time I need to use it, I attempt to construct the command myself, but end up giving up and consulting StackOverflow. Amazingly, someone has usually done the exact thing I need to do and posted their command line to StackOverflow, so I'm never out of luck!
How do I actually start understanding how ffmpeg works? I want to be an ffmpeg power user.
I know a lot of people look down on ChatGPT. But I have been using it to create scripts to use with ffmpeg, and I was able to get most of the things I needed with very little massaging required. You can then ask ChatGPT what it did and why, and it will explain it as well, so you get some basic understanding of how things work. People don't realise what kind of tool ChatGPT is or how to properly utilise it, but it can be very useful for stuff like this.
This command uses FFmpeg to create a single output video file from two input video files. It starts by specifying a start and duration for each of the two files (input1.mp4 and input2.mp4). It then applies a filter complex to the two files, which combines the two videos and audio into one stream, and adds a brightness filter with a value of 0.3. Finally, it maps the output video stream to the output file (output.mp4).
-filter_complex specifies a series of filters that accept inputs and return outputs. Any label contained in brackets ([]) is a value that can be an input or output of a filter. [0:v], [0:a], [1:v], [1:a] are labels supplied by ffmpeg representing the video and audio streams of the 1st and 2nd inputs, in this case input1.mp4 and input2.mp4.
The first filter, concat, takes in a set of synchronized audio and video segments, concatenates them, and returns the resulting audio and video clips. n specifies the number of segments, v specifies the number of output video clips, and a specifies the number of output audio clips. The results are saved to the values of [v] and [a] for video and audio respectively.
The eq filter then takes the [v] video returned by concat and adjusts the brightness to 0.3. For reference, 0 represents no change to the brightness.
This [v] value is then mapped to the output video using -map.
That being said, this filter isn't correct, as the [a] value is never used or mapped, so the filter would fail. The correct way to write the filter, if the intended use is to discard the audio, would be:
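Spelled out as a full command, that would look something like this (the -ss/-t values are placeholders; treat it as a sketch rather than the exact original):

    ffmpeg -ss 0 -t 10 -i input1.mp4 -ss 0 -t 10 -i input2.mp4 \
        -filter_complex "[0:v][1:v]concat=n=2:v=1:a=0[v];[v]eq=brightness=0.3[vout]" \
        -map "[vout]" output.mp4

With only [vout] mapped explicitly, no audio ends up in output.mp4, which matches the "discard the audio" intent.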
But I also understand my sister doesn't need to know how her phone does any of what it does to play candy crush or read her emails.
Just like she doesn't need to know how a microwave works to reheat her meal.
If you want to know how things are done, of course get yourself involved in the details, but for most things in life you just want to use the thing without bothering with the details, so you can focus on the parts that are of interest to you.
(I know some people like to know the details of everything, and maybe you are one of them, and that's great, but the vast majority of people do not)
Yes, ChatGPT excels at comprehending and explaining things that have a consistent structure, at restructuring, and at synthesising variations. If you keep it in its lane, it’s an excellent tool.
It’s really really bad at counting though. For example, try asking it to produce a line of 40 asterisks.
It’s bad at counting because counting relies on a stateful O(N) algorithm you run in your brain.
GPT is trained to reproduce human text, which tends to simply have the output of this O(N) counting process, but not the process itself. So GPT “thinks” it should be able to just spit out the number just like human text implies we do. It doesn’t know we are relying on an offline O(N) algorithm.
If you have it emit a numbered list of 40 elements, it will succeed, because producing a numbered list embeds the O(N) process and state into the text, which is the only thing it can see and reason about.
That’s very interesting. I assumed it was something about the fact that it is a language model rather than a calculating machine. So printing 44 asterisks instead of 40 is kind of close.
I wonder if it would be possible to teach the machine to recognise the situations it’s better at and be less confident about other answers? Or does it need to be confident about everything in order to produce good answers where it does do well?
It’s kind of funny how confident ChatGPT is about giving out bullshit, and then even when you correct it, it says "oh, I’m terribly sorry, here is definitely the correct answer this time" and then it gives you another wrong answer. Just an observation; I realise it is just a tool whose limitations you have to understand.
> here is definitely the correct answer this time and then it gives you another wrong answer.
My favorite is when it gets into some weird context loop, apologizes and claims to have corrected an issue, but gives you literally, character-for-character, the same answer it gave before.
Fortunately, it mostly happens to me when I am asking particularly ambiguous or weird questions -- e.g., asking for any assembly in AT&T/GAS syntax seems to always go wrong, not necessarily in terms of the logic itself, but rather that it ends up mixing Intel and AT&T, or asking explicitly for POSIX-compliant shell often gives weird Bash/GNUisms, presumably since so many StackOverflow posts seem to conflate all shells with Bash and always expect GNU coreutils.
We can check our answers; we can spit out bullshit like it does, but then take the time to check it. It has no process for checking or analyzing its answers, and I'd rather not ask it how confident it is, because that's just not what I care about.
I find it amazing that it can actually sort of run code "in its head": none of the code output it produces is actually run through an interpreter, but it's still pretty close, if not perfect, each time. But running code with it is mostly for kicks; instead I asked it to produce a simple API for me and then a Python script that tests it. It had no bugs and I could check it myself fairly fast; certainly faster than it would've taken me to write all that code without any bugs. I'd have had to check my own code for bugs anyway.
So if you accept that ChatGPT is sort of like a guy who looked over millions of programmers' shoulders but never actually communicated with any of them to understand the code, and who has a perfect memory while not being able to compute much in his head, then it can still be a great tool. Just understand its limitations and its advantages. Just because it can't reverse a string in its head doesn't mean it's "dumb" or not useful for everyday tasks.
Note that language models get much better at pretty much any reasoning task when they are prompted to use chain-of-thought (CoT) reasoning. The difference between "Solve x" and "Solve x, let's think step by step" comes from the language model using the context window as short-term memory in some sense. Perhaps your explanation in terms of complexity is better, but I'm not sure whether it explains the effectiveness of CoT in general.
You cannot RL learn an O(N) algorithm in an O(1) feed forward neural network.
You could RL learn that when someone specifies a number, the appropriate thing to say is "Ok, 40 asterisks, let's count them, 1, *, 2, *, 3 , *, ..." and then it would indeed produce 40 asterisks. But not as a single string. Because producing them as a single contiguous string requires some offline state/memory/processing, and all the neural network has access to is the last ~page of text.
Embedding the counting process into the text itself kind of embeds the state of the O(N) algorithm in the O(N) text itself, that is, "unrolling the loop" externally.
It doesn’t have any logic; it just tries to complete strings in the most plausible way. Its training material probably did not have a lot of “write five at signs: @@@@@“. RLHF might help steer it in the right direction, but probably wouldn’t produce the concept of counting or loops.
So, this is where I guess I just don't understand. I've had ChatGPT produce code for me that there is absolutely no way it already had in its training set. I realize it can't actually "think", but then I also don't know how to describe what I'm seeing.
Agree! I used ChatGPT to explain some (uncommented) ffmpeg scripts I wrote a few years ago. The scripts were created by going through many websites and adapting things to my needs. The explanation from ChatGPT was spot on.
Here's a brief explanation of what each part of the command does:
- `-f v4l2`: specifies that the input device is a V4L2 (Video for Linux 2) device, which is typically what webcams use.
- `-framerate 30`: sets the framerate to 30 frames per second.
- `-video_size 640x480`: sets the video resolution to 640x480 pixels. You can adjust this to your desired resolution.
- `-i /dev/video0`: specifies the input device to use, which is typically /dev/video0 for the first connected webcam. If you have multiple webcams, you may need to adjust this to the appropriate device.
- `-c:v libx264`: sets the video codec to use, which is H.264 in this case.
- `-preset veryfast`: sets the encoding speed preset to "veryfast", which is a good balance between speed and quality.
- `-crf 28`: sets the video quality using the Constant Rate Factor (CRF) method, with a value of 28. Lower values mean higher quality and larger file sizes, while higher values mean lower quality and smaller file sizes. You can adjust this to your desired quality/file size.
- `-pix_fmt yuv420p`: sets the pixel format to YUV 4:2:0, which is a commonly used format for H.264 video.
- `output.mp4`: specifies the output file name and format.
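Putting those pieces back together, the command being described is presumably something along the lines of:

    ffmpeg -f v4l2 -framerate 30 -video_size 640x480 -i /dev/video0 \
        -c:v libx264 -preset veryfast -crf 28 -pix_fmt yuv420p output.mp4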
Never used the `crf` or `pix_fmt` flags personally, but it seems correct (that's the issue with ChatGPT).
crf: "Constant Rate Factor" for x264 among other encoders. Think it like a output quality (higher = lower quality). I'd say 28 is way too high especially when you're using preset=veryfast (too bad quality). Since it's only 640-480, most of computers can handle fast or even medium in real-time. If not sure, I'd say just use default (by not giving anything).
pix_fmt: pixel format. 99% of videos use this format, and it has the best compatibility. FFmpeg will try to keep the original format when transcoding, so most of the time you don't need to specify it. However, in this very case, since the input is from a webcam, the chance that it uses some weird format is high, and you don't want to keep that in your final result. So it's good to specify it here.
My 2c: ChatGPT is great, but I recommend reading the comments it gives about each parameter, trying to understand their purposes, and adjusting accordingly if needed.
Also, having a rough idea of how the FFmpeg pipeline works (mainly the order of inputs, outputs and their associated switches in the arguments; see the small example below) helps a lot.
Video processing is a very complex thing and a lot of the time it relies on experience. Just be prepared that sometimes your "typically works" command will break.
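A small made-up example of that ordering: input options go before the -i they apply to, and each output file picks up the output options that precede it:

    # -ss 10 sits before -i, so it seeks that input;
    # each output file gets its own -c:v/-b:v settings
    ffmpeg -ss 10 -i input.mp4 \
        -c:v libx264 -b:v 2M high.mp4 \
        -c:v libx264 -b:v 500k low.mp4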
Why do you trust the descriptions of `ffmpeg -help`? What if some evil daemon [1] went into the binary and completely changed the behaviour of the flags? Do you read the source code and verify the checksums for every program you run? In the real world, good or bad, very few people care beyond a first level of trust: does it work for my current issue? Great. No? Try something else.
Also, you disingenuously left out the parenthetical from the quote: using a tool means accepting its downsides, and if the downsides can be mitigated accordingly then the tool is useful. Millions of users put imperfect tools to good use daily.
ChatGPT is a stochastic parrot; it doesn't even pass the sniff test, or the first level of trust. Does it work? That's pretty much as meaningless as everything else ChatGPT spits out. The information contained in its answers is zero. If you read something ChatGPT wrote, you know exactly as much as you did before you read it. It sounds plausible, and it's basically a zero-day on human cognition. Be vigilant.
You linked Wikipedia: I don't trust that either. If I want to actually know something I will follow the sources and then evaluate their trustworthiness. Indeed, most of the Scots Wikipedia having been written by an American teenager who didn't speak Scots at all very strongly parallels ChatGPT.
YUV420p is the only pixel format that all H.264 compatible decoders MUST support. Other formats are optional.
In practice that means that most hardware decoders (not just iPhones, also other phones, TVs, older PC GPUs) won't be able to decode YUV 4:2:2 or YUV 4:4:4 videos.
It's kinda annoying, since YUV420 really messes up red text in screen captures for example.
You need this parameter to force ffmpeg to convert the color space if your input isn't YUV420P (it defaults to keeping the input pixel format to avoid quality loss).
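A quick way to check what you're actually getting, and to force the conversion (filenames here are placeholders):

    # see what pixel format the source or capture actually produces
    ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt -of csv=p=0 input.mov

    # force 4:2:0 so hardware decoders can handle the result
    ffmpeg -i input.mov -c:v libx264 -pix_fmt yuv420p output.mp4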
Yes, it’s because red is dim. Dark blue (like, RGB 0, 0, 255) also suffers, perhaps worse, but that shade of blue is hard to read even when it is reproduced perfectly.
The capture pixel format depends on device - I think 4:2:2 is pretty common for webcams, which in H.264 requires profiles that aren’t widely implemented outside of x264 and ffmpeg.
I wouldn't call myself a power user by any means, but for the most part the documentation is quite thorough, and if you're wrestling with some specific filter long enough, you might even begin to understand some of the magic incantations required to get it to work.
It's helpful to have some background in media container formats, compression algorithms, sound formats, and all the jargon and acronyms associated with the above. Easy!
I know this doesn't answer your question, but have a look at GStreamer pipelines. They take the basic idea of shell pipelines, but add typing, format negotiation, M:N connections, etc all while giving you optional control over low-level details (which demuxer?), but also high-level abstractions (just show me a window with a preview). Once prototyped on the CLI (gst-launch-1.0[1]), they're also very easy to start using within an application through the framework (gst_parse_launch[2]), where you can e.g. iteratively add things like a volume slider. You can also access most of FFmpeg's codecs through libav.
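As a taste of the CLI side, a minimal (video-only) playback pipeline looks roughly like this; decodebin picks the demuxer and decoder for you, and the filename is a placeholder:

    gst-launch-1.0 filesrc location=input.mp4 ! decodebin ! videoconvert ! autovideosink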
The ffmpeg CLI supports multiple inputs and multiple outputs, so there needs to be a way to unambiguously map an option to its target. Position is what's used: options for the same target are grouped together, before that input or output.
argument order is definitely a feature, with -ss being one that behaves differently depending on its location relative to -i. it's not an accidental thing, and the desired outcome dictates where you place it. not understanding that just means you're not using it enough to grok it.
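a concrete illustration of that -ss behaviour (same timestamps, different placement):

    # input seeking: jump near 60s via the index/keyframes before decoding starts
    # (fast; with -c copy the cut snaps to a keyframe)
    ffmpeg -ss 60 -i input.mp4 -t 10 -c copy clip_fast.mp4

    # output seeking: decode from the start and discard everything before 60s
    # (slow, but exact since it re-encodes)
    ffmpeg -i input.mp4 -ss 60 -t 10 clip_exact.mp4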
Even given an option it can be difficult to find the corresponding documentation, if only because of the many different submodules and encoders and decoders and filters that have oh-so-slightly different options. That said, I've just switched from pydub to ffmpeg-python (due to memory issues of the former[1]) and judging from the Jupyter notebook[2] it seems a much more intuitive method of constructing ffmpeg pipelines.
If you don't use it every day, then this will be the typical result. But that can be said about anything, not specific to ffmpeg.
Practice, practice, practice. Eventually, you'll start thinking like ffmpeg. Knowing how ffmpeg labels the various streams inside a file is a great place to start. For example [0:a:1] means the second audio stream inside the first input. This is key for stringing together complex filter chains in the appropriately named -filter_complex.
There are some filters that require you to merge streams together so the processing is done evenly, followed by a split to get back to the original stream layout. amerge/channelsplit is a common combo in most of my commands.
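A small sketch of that labeling, on a hypothetical file with one video stream and two audio tracks:

    # [0:a:1] = second audio stream of the first input
    ffmpeg -i input.mkv \
        -filter_complex "[0:a:1]volume=0.5[quiet]" \
        -map 0:v:0 -map "[quiet]" -c:v copy -c:a aac output.mkv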
I've been trying to get color space and bit-depth conversion to work with PNGs and BT.2020 video. Apparently any time I use a PNG with ffmpeg's AVIF encoder it comes out too bright.
Not parent but in my case yes, video production: ffmpeg to extract audio from captured video, process audio with Audacity, ffmpeg to cut video and merge it with the audio from Audacity.
Interesting, I wonder what this is / why you'd want it. In particular, when you have the DTS but not the PTS.
The recent gstreamer 1.22 release [2] had what I read as the opposite—calculate a plausible DTS from the order and PTS. They did a nice job of explaining why it's useful. AFAICT, this approach is the only viable way to get B frames to work properly from a received RTP stream.
> H.264/H.265 timestamp correction elements ... Muxers are often picky and need proper PTS/DTS timestamps set on the input buffers, but that can be a problem if the encoded input media stream comes from a source that doesn't provide proper signalling of DTS, such as is often the case for RTP, RTSP and WebRTC streams or Matroska container files. Theoretically parsers should be able to fix this up, but it would probably require fairly invasive changes in the parsers, so two new elements h264timestamper and h265timestamper bridge the gap in the meantime and can reconstruct missing PTS/DTS.
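In pipeline terms that sits between the parser and the muxer; an (untested) sketch for an RTSP source might look like:

    gst-launch-1.0 rtspsrc location=rtsp://camera.example/stream ! rtph264depay ! \
        h264parse ! h264timestamper ! mp4mux ! filesink location=out.mp4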
Looks like the ffmpeg thing is dts2pts_bsf.c. [3] I haven't really read the implementation; I was hoping the comment at the top would illuminate things, but "Derive PTS by reordering DTS from supported streams" isn't enough for me.
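If it does what the name suggests, the invocation would be something like the following when remuxing an H.264 stream whose container only carries DTS (a guess at the usage, not something lifted from the docs):

    ffmpeg -i input.avi -c:v copy -bsf:v dts2pts output.mp4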
Thanks. Looks like the input in question was an .avi file. That's a container format I don't know anything about (and don't particularly care to). I suspect then it's not relevant for anything more modern.
A surprising number of mp4 files are missing the ctts atom that contains the PTS, since it's sort-of optional outside the Apple stack (ffmpeg, for example, generated such files from raw h264, which is probably the actual motivation for finally fixing this). This should let you avoid generating such files, or fix existing ones.
It’s sort-of optional for most playback stacks because they leave frame reordering to individual decoders as a codec-specific implementation detail, but Apple’s stack actually cares about frame-accurate random access so it relies on the codec-independent container timestamps.
The inaccurate seeking you get without container pts is okay for playback but it falls apart with editing or stuff like av1an.
Thanks, that makes more sense! Raw (Annex B) h264 in particular isn't exactly an ideal media format but I've definitely gone through it while debugging stuff and can see how you'd need this plugin to get from it to a correctly muxed file with B frames.
I have a bit of a love-hate relationship with ffmpeg. Love because it does things nothing else does (e.g. has a main-profile software H.264 decoder) and is so comprehensive, hate because code quality varies (lots of long-unfixed bugs/quirks in trac) and documentation/API/CLI experience is poor. E.g., I wish the documentation for this new bsf explained what you just said.
You're right, this comment should just be added to the file header and maybe the docs about H.264 processing in FFmpeg; otherwise this comment will get lost in a couple of hours and the knowledge about all this will only be known by the people who made the change and not many more...
I really wonder why it costs people so much to write down the whys of everything technical they do, at least in a source code comment, but it is indeed an everyday uphill battle.
It’s used heavily by professional broadcasters and TV/movie studios. AWS runs it, satellite TV services use it, some VFX studios use it, etc. It’s become pretty foundational industry tech.
Firefox vendors a subset of ffmpeg, to decode royalty free codecs such as vp8/9, mp3, flac, on all OSes. We're using our own decoders/demuxers for other royalty free codecs for historical reasons.
It can also use the system's ffmpeg to decode patented formats such as h264 or AAC, on desktop Linux, if the copy of ffmpeg present on the host includes non-free codecs (same on other OSes, with their respective system libraries).
> YouTube does not recommend the RGB color matrix on uploads. In this case, YouTube initially sets the color matrix to unspecified before the standardization. It will then infer the color matrix using the color primaries during standardization. Note that sRGB TRC will convert to BT.709 TRC. YouTube re-tags the color primaries/matrix/TRC to BT.709 when it is not supported by FFmpeg colorspace conversion filter.
Since FFmpeg is LGPL 2.1 I thought they had to make it easier to know they're using it, like under a "licenses" section, but I don't see anything under studio.youtube.com indicating this.
> Since FFmpeg is LGPL 2.1 I thought they had to make it easier to know they're using it
Precisely; I doubt YouTube just brazenly ignored this, but their legal team might have buried it under some obscure link. I wonder if there's a license that requires it to be "highly visible" in some way.
Huh, I never really thought about it, but I guess that's one thing the GPL kinda fails at: if your server does all the work, you can leverage open source technology to do all sorts of stuff for your clients, so long as the code you actually serve them does nothing but collect data and present results from the server.
Like, they get to see the code they run, but there's no insight into the code you run to deliver whatever service you're providing. Maybe this is obvious to other people, lol; I just hadn't thought of it till now.
Yes, this is the reason the AGPL was created. With AGPL software you have to open your changes even if you don't distribute the software and only use it server-side. There are still nuances, but that's the main gist.
This is known as "Tivoisation" (after a dead set top box company), and is the driving motivation behind GPLv3 and some other licenses adding terms to cover this.
Do any of you know if this has Dolby AC-4 support? Ticket 8349 [1] has been open for years to add this, but it's not there yet. This would be very nice so we can watch ATSC 3.0 OTA broadcasts via FFmpeg-based things like Plex.
(Currently if one uses something like a SiliconDust HDHomeRun, viewing an ATSC 3.0 stream requires using their app/player, which uses a SiliconDust cloud service to do the decoding. It'd be really nice to have a not-network-dependent way to view/hear OTA broadcasts.)
ffmpeg is my go-to re-encoder. I didn't know ffprobe existed until I read some "howto", and it's also incredibly useful as a way to get fundamental video and audio stream data out as e.g. CSV or JSON.
So I can run an ffprobe call to get the x/y info out, decide if the video needs re-encoding, and pass it to an ffmpeg call which uses fast or veryfast settings to reset the x/y scale (for instance).
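For example, something like this returns exactly that kind of data as JSON:

    ffprobe -v error -select_streams v:0 \
        -show_entries stream=width,height,avg_frame_rate,duration \
        -of json input.mp4

(Depending on the container, the duration may only live in the format section, in which case -show_entries format=duration picks it up.)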
It's also unquestionably 'self-documenting' because all of Scheherazade's 1001 options are listed in --help. The problem is knowing which one will make the horse speak.
I've been so frustrated that FFprobe functionality is not part of FFmpeg.
My app extracts screenshots from videos to create a beautiful gallery of videos. But even though I include FFmpeg already, I need a 50mb FFprobe executable to be bundled with my app just so that I can determine the width, height, duration, and fps of a video file!
What is it that FFprobe does that FFmpeg couldn't do with a few extra pieces of exposed API?
ffmpeg and ffprobe are built against the exact same set of libraries, so they could certainly be combined into a single program if the ffmpeg maintainers chose to do so.
Two possible options to reduce the size of your application:
(1) Instead of using ffprobe, just call "ffmpeg -i <filename>" without specifying an output file, then parse stderr:
This is admittedly messy compared to parsing structured ffprobe output, but it does contain all the information you mentioned (assuming duration in centiseconds is sufficiently precise for your application).
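A rough illustration of that parsing (fragile by design; the structured ffprobe output really is the robust route):

    # pull the HH:MM:SS.cc duration out of ffmpeg's banner on stderr
    ffmpeg -i input.mp4 2>&1 | sed -n 's/.*Duration: \([0-9:.]*\),.*/\1/p'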
(2) Link both ffmpeg and ffprobe dynamically, in which case they'll share all but a few hundred kilobytes of on-disk code.
For example, consider ffmpeg and ffprobe as installed from package repositories on a variety of systems:
ffmpeg 4.4.2 installed by MacPorts on macOS Monterey (x86_64):
$ du -hA /opt/local/bin/ff{mpeg,probe}
339K /opt/local/bin/ffmpeg
260K /opt/local/bin/ffprobe
$ diff -s <(otool -L /opt/local/bin/ffmpeg) <(otool -L /opt/local/bin/ffprobe)
1c1
< /opt/local/bin/ffmpeg:
---
> /opt/local/bin/ffprobe:
72d71
< /usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 228.0.0)
ffmpeg 5.1.2 installed from RPM Fusion on Fedora 37 (x86_64):
I love ffmpeg; a single command like `ffmpeg -ss 01:15:42 -to 01:16:00 -i example.mp4 -c copy output.mp4` will create a video clip at a certain point in time from a larger video without decoding/re-encoding it.
i understand that this is the fastest way to extract a clip, but it is limited by keyframe availability in the original file, i.e. you can only start the output from a keyframe.
in this case, it is important to be aware that the times you specify may not be extracted exactly. it will be off by a few frames based on keyframe availability. the only way to extract exact frames is to re-encode. :)
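for the record, the frame-accurate version of the command upthread just drops -c copy and re-encodes (the codec choices below are placeholders):

    ffmpeg -ss 01:15:42 -to 01:16:00 -i example.mp4 -c:v libx264 -c:a aac output.mp4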
This is true. Do you have a use case where being off by a few frames might be a deal breaker?
Personally I've created dozens of clips using this method and it always turns out ok. It gives you roughly 1-second precision on where you want to make your cuts. After I create the clips I can play things back normally, complete with the ability to seek to specific points successfully.
>> Do you have a use case where being off by a few frames might be a deal breaker?
Yes, in action recognition tasks (machine learning), e.g., if you have a large video with temporal annotations (start/end times where an action occurs) you may want to extract clips to sense-check the annotations. Being exact is important.
yeah it probably doesn't matter in practice. you may run into some audio sync issues.
but the ~1 second precision that you see is an accident of the source file happening to have a keyframe every second. that may not always be the case. :)
I don't know if there's any work being done on this, but I wish ffmpeg had better support for modern ML-based filters, like super resolution, frame interpolation, segmentation, automatic subtitles, etc. There was an ML filter made years ago as part of a Google Summer of Code project, which includes super resolution, but it's difficult to use and you need to train or find pretrained model files. ML is where video and audio filter research is happening at the moment; hopefully ffmpeg can get a good pipeline going. And please use an inference library that can run on all computers.
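For what it's worth, the existing DNN-based sr filter is invoked roughly like this, which also shows the pain point: you have to bring your own trained model file, and the build needs DNN support (the model path below is an assumption):

    # requires an ffmpeg build with TensorFlow DNN support and a pre-trained model file
    ffmpeg -i input.mp4 -vf "sr=dnn_backend=tensorflow:model=espcn.pb" output.mp4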
I had an API breakage with audio one year ago. Is the "new" seeking API up and running (because I have segfaults while seeking in mp4 video files with the current one)? I am ready to make the changes in my media player.
As long as they keep away from C++ and the ISO planned obsolescence of the C language, and keep the SDK minimal, I guess I can tolerate their excessively heavy use of the nasm macro preprocessor.
If the video is mostly about versioning, well, I use a weekly ffmpeg git build anyway...
I wrote a large and messy script to encode a random number of videos of any aspect ratio so they can fit in a mosaic with the xstack filter.
I think there are still slight audio sync issues, because I should re-encode the video files individually, but I do a single pass instead so I don't have to deal with leftover files.
I'm quite happy with the result.
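Nothing like the full script, but the core of a 2x2 mosaic with xstack boils down to something like this (assuming four equally sized inputs, audio dropped for simplicity):

    ffmpeg -i a.mp4 -i b.mp4 -i c.mp4 -i d.mp4 -filter_complex \
        "[0:v][1:v][2:v][3:v]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0[v]" \
        -map "[v]" -an mosaic.mp4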
I wish I could write a script to do some basic effects on a video, like adding some moving text. The ideal would be to have an animated SVG file and make a video out of it.
You may want to check out AviSynth [1]. It has been quite a few years since I last used it, but it is able to do things like moving text with a script. It was very fun to work with, particularly being able to copy a script from a previous project and modify it for a new project. When I last used it, I used a text editor called AvsPmod [2] that had some nice integrations.
Yes, that's how I do it. I have a script that takes a list of cuts, source videos/images, and effects, and runs ffmpeg to produce the final result. That's also how many GUI video editing tools work - they just use the GUI to build the cut list.
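As a flavour of what such a script ends up emitting, a single moving-text overlay can be done with drawtext; the font settings and scroll speed below are just assumptions:

    # scrolls the text right-to-left along the bottom; add fontfile=... if your build lacks fontconfig
    ffmpeg -i input.mp4 -vf \
        "drawtext=text='Hello':fontsize=48:fontcolor=white:x=w-mod(t*120\,w+tw):y=h-2*lh" \
        -c:a copy output.mp4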
One that comes to mind is this AMCDX Video Patcher [1] made by Alex Mogurenko. I've played around with it and it's very cool. There is a frame editor to overlay a PNG, and a file-to-file mode where you can insert all or some of another video over your target file. I haven't used it in production, though. The file-to-file tool seemingly only touches the affected blocks of video and there's no re-rendering.
I just did a test where I was able to cover a person's face with an image. Alex has a video showing him blurring a section, but I think the blurring option was removed in favour of being able to overlay an image. [2]
The video needs to be ProRes, and I'm not able to guarantee it's not touching any other part of the frame, but it feels like it's just the affected section of the video frame that is getting changed.
If you are trying it out, make a copy of your original video before you start; it does the edits in place.
https://fosdem.org/2023/schedule/event/om_vlc/ (Video + Slides)