FFmpeg Drawtext Filter for Overlays, Scrolling Text, Timestamps on Videos (ottverse.com)
181 points by ponderingfish on Oct 23, 2020 | 79 comments



Here's mine for doing crab videos

    ffmpeg -ss 66 -i crab.mp4 -t 30 -crf 27 -preset veryfast -vf "drawtext=fontfile=/usr/share/fonts/truetype/liberation/LiberationSans-Bold.ttf:text='YOUR TEXT HERE':fontcolor=white:fontsize=240:shadowx=5:shadowy=5:box=1:boxcolor=black@0.0:boxborderw=5:x=(w-text_w)/2:y=(h-text_h)/2:enable='between(t,9,30)',scale=iw/3:ih/3" -c:v libx264 -filter:a "volume=0.2" -c:a libmp3lame output.mp4
You just need to get the original crab rave video from youtube: https://www.youtube.com/watch?v=LDU_Txk06tM

You might also need to supply your own font path; I'm not sure how universal the font locations are.
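If that Liberation path doesn't exist on your machine, fontconfig can list what is installed (just a quick sketch; the grep pattern is only an example):

    # list installed fonts and filter for bold faces
    fc-list | grep -i bold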


If you don't mind my asking... what's a crab video?


You affix a caption describing an unfortunate scenario onto the song for comical juxtaposition, since the Crab Rave song itself is jovial-sounding. For example, one might add the caption "I got fired".



When I switched to Linux and started using FFmpeg, the idea of automating video production (this thing that was so time intensive in my mind) was so appealing. This is an awesome little example. What a damn good piece of software.


Just wait until you try out Gstreamer

(and suffer a few weeks learning it! I don't even remember C++ being so painful)


I downloaded the video (720p) and had to tweak the command slightly to make it work on macOS Catalina (FFmpeg installed via brew):

  ffmpeg -ss 66 -i crab.mp4 -t 30 -crf 27 -preset veryfast \
  -vf "drawtext=fontfile=~/Library/Fonts/SF-Compact-Display-Bold.otf:text='YOUR TEXT HERE':fontcolor=white:fontsize=68:shadowx=5:shadowy=5:box=1:boxcolor=black@0.0:boxborderw=5:x=(w-text_w)/2:y=(h-text_h)/2:enable='between(t,9,30)'" \
  -c:v libx264 -filter:a "volume=0.2"  output.mp4
Thank you very much; it's crazy how powerful yet hard to use FFmpeg is.


Ah yes! Sorry, the settings I posted are for the original Crab Rave music video in 4K.


My experience with ffmpeg is that it's like a DSL embedded in another DSL. It's very powerful yet confusing at times. I saved a ton of ffmpeg one-liners in a notebook just in case.


FWIW, when I asked on IRC about doing something (splitting a stream, processing one copy, and overlaying it over the original at 20%), they were quite helpful.

It was: `split[a][b];[b]lut3d=${path}[c];[a][c]blend=all_mode=overlay:all_opacity=${opacity}`
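For context, the full command ends up looking something like this; the LUT path, opacity, and filenames here are just placeholders, not the actual values I used:

    # placeholder LUT and filenames: split the video, apply a 3D LUT to one copy, blend it back at 20%
    ffmpeg -i input.mp4 \
      -vf "split[a][b];[b]lut3d=file=grade.cube[c];[a][c]blend=all_mode=overlay:all_opacity=0.2" \
      -c:a copy output.mp4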


I find gstreamer (using gst-launch)[1] a more pleasant experience than ffmpeg.

You can't use some expressions on the command line, like `x=(w-text_w)/2:y=(h-text_h)`, but that can be worked around with some scripting, using the Python bindings for example.

Some (simple) text overlay can be accomplished with:

  gst-launch filesrc location=sample.mp4 \
    ! decodebin \
    ! videoconvert \
    ! textoverlay text="Hello" font-desc="Sans Bold 150px" y-absolute=0.5 \
    ! autovideosink

1. http://manpages.org/gst-launch


They also have very cool support for HTML overlays: https://base-art.net/Articles/web-overlay-in-gstreamer-with-...


Looks like BBC’s Brave project was the inspiration for this: https://github.com/bbc/brave (specifically the gst-WebRenderSrc). Brave is a real-time remote video/audio editing app. Looks neat.

Unrelated, but wow, BBC has quite a few interesting and relevant open source projects. Simorgh, their React SSR framework, caught my eye: “used on some of our biggest websites”. Encouraging for those looking to build out performant React/AMP platforms: https://github.com/bbc/simorgh


I used CasparCG[0] to do live HTML overlays with a major broadcaster out of Singapore about 5 years ago, and it's still going strong. The actual on-air graphics that were used were rather tame compared with the sample ones I did to prove the system.

[0] https://github.com/CasparCG/


Why do you say you can't do expressions like that on the command line? I do it all of the time. You have to escape the parentheses, but other than that it is totally doable.


You can't use expressions like that (from the ffmpeg command line) in gst-launch.

As far as I know, in gst-launch you can't access properties of one element (e.g. the video stream width) from other elements.


Yes, and the best part is everything is turned into command line arguments.

ffmpeg of 15 years ago was confusing, but now it's so simple - to compile, to add your own filters, and to use. They've done a great job.


Yes, same here. The big problem is that the filters DSL is probably too complex because it uses too many short expressions.

It is unbelievably powerful though.


Very true. And the parameters change over time, but the Stack Overflow answers using the old parameters remain on top. This IMO adds to the confusion :)


Yes, the stackexchange (implicit) model contributes to this. It assumes once a Q has been asked and answered, the issue is resolved. Due to the non-duplicates policy, any corrections or updates are supposed to be added to the same thread. But the original accepted answer will have a tick mark next to it and likely a high score. So naive users don't read further and the obsolete answers remain heeded.

Some sort of deprecation or salience decay needs to be added.


You can edit the accepted answer!


I don't like to mix voices - that's inorganic when the changes are more than minor and can lead to parts of the 'answer' being at odds with itself.

Except for typos, syntax, etc., I don't edit someone else's answer; I leave a comment and let the original respondent edit.


Indeed. I use libraries such as MoviePy* that wrap ffmpeg and allow you to implement effects on each frame.

* https://zulko.github.io/moviepy/index.html


Seems like everyone has a notebook of ffmpeg commands: https://hhsprings.bitbucket.io/docs/programming/examples/ffm...


Yes

And the worst part is that everything is turned into command line arguments.

It would be easier if you could specify it in something more human-readable like (sigh) YAML.


Someone needs to make a GPT-3-to-ffmpeg-commands converter.


I wrote a small Python front-end for ffmpeg just so I can get some of my most common video encoding tasks done without Google.


I am not a native English speaker and needed a translation for OTT (Over The Top). Here's a link from this very website (hard to find):

https://ottverse.com/what-is-ott-video-streaming/


It's confusing for native English speakers as well.


I'm a broadcast engineer with 15 years of experience, and before that I did web broadcasting before things like YouTube existed.

I still get confused by the terminology.


Does anyone have experience with this use case: I have an MP4 file and an SRT file with subtitles. Is there a convenient way to re-render the video with the subtitles drawn on it, so I can use it in places where subtitles are not supported? Is it a super CPU-intensive process that takes a long time for, let's say, a typical 2-hour movie?


Burning subtitles into the video implies re-encoding. And if you want to maintain video quality, you'll have to choose a pretty good quality setting like crf=24, and this will take a lot of time even with H.264/AVC.
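The basic form is something like this (the filenames are placeholders, and the crf/preset values are just starting points):

    # burn subs.srt into the picture; re-encodes video, copies audio untouched
    ffmpeg -i movie.mp4 -vf "subtitles=subs.srt" -c:v libx264 -crf 24 -preset medium -c:a copy burned.mp4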

You are looking at multiple hours (>2) for a 2-hour movie in my experience. One way to reduce this is to spin up a DigitalOcean droplet (an $80-per-month droplet) that'll give you a lot of processing power, finish your encoding in 2-3 hours, and shut it down. It'll save you a lot of headache and time.

Not sure I answered your question, but hope that helped!


> Burning subtitles into the video implies re-encoding

In principle, would it be possible to only re-encode the macroblocks affected by the subtitles, leaving the rest of the frame bit-for-bit identical?

It's been a long time since I read a video encoding specification, so I don't know if things still work this way.


Much harder than it sounds because future macroblocks depend on present ones, and motion vectors mean that over a few seconds, transitive dependency can reach the entire image (and does in panned and zoomed shots)

Can possibly work, but depending on movie content might not save all that much encoding.


What would be much better would be to encode an overlay raster stream, which the player can then composite at runtime, like they do with text subtitles.

Much faster to encode. Can be toggled. But players need to support it.


Yeah, but the whole point of the exercise is to see subtitles on players that don't even support subtitles; they're definitely not going to support an overlay stream.


Thanks. I'm surprised that there isn't something like Lambda where you can just get a really fancy CPU on demand for a quick encoding job.


Paralleling many of the other comments that give excellent advice, it's also worth seeing if your CPU has Intel Quick Sync support, which can give ffmpeg a hardware-accelerated encoding option. See the link below for a reference of the flags.

https://trac.ffmpeg.org/wiki/Hardware/QuickSync

Even the most low-end processors like the Celeron, if they have embedded GPUs, have this encoding acceleration built in, and it makes a huge difference versus using the general-purpose CPU portion. The generation of CPU determines the encoders/decoders available.
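As a rough sketch for the subtitle-burning case discussed above, assuming your ffmpeg build has QSV enabled (filenames and the quality value are placeholders):

    # software decode, burn in subtitles, encode on the iGPU via Quick Sync
    ffmpeg -i movie.mp4 -vf "subtitles=subs.srt" -c:v h264_qsv -global_quality 25 -c:a copy burned.mp4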

I learned about this from the post below, and have been using the functionality wherever I can, as it applies to many of the popular transcoding programs out there, for example HandBrake as well.

https://forums.serverbuilds.net/t/guide-hardware-transcoding...


> I have a MP4 file and a SRT file with subtitles is there a convenient way to re render the video with subtitles drawn on it so I can use it in places where subtitles are not supported?

You're going to have to reencode it, which is a lossy and CPU-intensive process (unless you manage to get hardware encoding working with your video card, which is possible but very fiddly IME).

> Is it a super cpu intensive process that takes a long time for let's say a typical 2 hour movie?

There are a lot of compute/filesize tradeoffs that you can make. If you don't mind a file that's 2-3x larger than your original then you can do it fairly quickly (say 1/3 of realtime). If you don't mind an effectively uncompressed video (so tens or hundreds of gigabytes) then you can do it as fast as your disks will write. If you want something similar to the original without sacrificing too much quality then it'll probably be slower than realtime.


If compression ratio or filesize isn't a priority, NVENC definitely is good enough to consider. The speed is just too good.

It can do 300fps+ (for simple transcoding; burning in subtitles will probably slow it down a little bit) already on my very dated GPU, and would likely be even faster on newer ones.
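For the subtitle case that looks roughly like this, assuming an ffmpeg build with NVENC support (filenames and the bitrate are placeholders):

    # burn in subtitles, encode on the GPU with NVENC
    ffmpeg -i movie.mp4 -vf "subtitles=subs.srt" -c:v h264_nvenc -preset slow -b:v 6M -c:a copy burned.mp4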


What some people might not know is that the transcoding results and quality are hardware dependent. You will get different results depending on whether you're using NVENC, intel's on chip stuff, actual encoding hardware, etc.

Edit: Obviously it's more complicated than this. But I think this is one of the reasons behind the need for things like Netflix's VMAF


Also generations of hardware matter - earlier GeForces and Intel CPUs have a worse encoding block than newer generations.

In any case, they massively lag behind software encoding.


I mean, obviously it would be the case for NVENC. Old(er) versions don't even support B frames.

But I guess you can call it "firmware" or "software" dependent too, because they don't really use/support the same version of NVENC to begin with.

And I doubt the result is significantly different, if at all, for software encoders like x264 - of course assuming you use exactly the same version and parameters.


That's right. I did not realize it before: if you do not mind the size, just "dumping" the stream will work quite fast, obviously.


ffmpeg -i video.mp4 -vf subtitles=subtitle.srt -t 300 out.mp4

The above line will hardcode the subtitles onto the video and stop after five minutes. This should give you an idea of how slow re-encoding is going to be on your machine, and what the quality of the result is going to look like.

What you ideally want is that your rendered video has approximately the same bitrate as the original, and the same visual quality. You can play with the -crf parameter to get the quality the same, and the -preset parameter to get the bitrate the same.

For reference, when I'm rendering 720p video I'm getting about 4x speed on my 10th gen Core i5. 1080p should take about twice the time.

I've played around a bit with using the GPU-accelerated encoders in ffmpeg, but I could never get the same quality or bitrate as the CPU encoder; it seems to me that they're tuned for quickly turning raw capture into something that can realistically be streamed and handled.


And when ingesting SRT files, ffmpeg expects the file to be UTF-8 encoded, but it is sometimes encoded in Windows-1252, the format's original encoding. So you might need to transcode it first.
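Something like this usually does it, assuming the file really is Windows-1252:

    # convert the subtitle file to UTF-8 before feeding it to ffmpeg
    iconv -f WINDOWS-1252 -t UTF-8 subs.srt > subs.utf8.srt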


Urghk. I had no idea, but then again I would never use .srt myself, it's a terrible format. I'm always using .ass, because then you can get proper left-aligned subtitles, proper font, and a proper box around them.


You can set the charenc option for other character sets.
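For example (the filenames are placeholders):

    # tell the subtitles filter the input file is CP1252-encoded
    ffmpeg -i video.mp4 -vf "subtitles=subs.srt:charenc=CP1252" out.mp4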


Is it that a separate subtitle file is not supported or subtitles are not supported at all? Otherwise you can use ffmpeg to merge the srt file and the mp4 file into either a single mp4 or mkv file without re-encoding the video (using the -c copy flag). Many players support subtitles embedded into a .mp4 file.


^this. I'd be curious what player GP has found that can play back the MP4 but not read embedded subtitles. There are a few that have poor support for external (separate file) subtitles.

See eg: https://mutsinzi.com/add-srt-subtitles-to-quicktime/

TL;DR:

ffmpeg -i yourVideo.mp4 -i yourSubtitle.srt -c:v copy -c:a copy -c:s mov_text -metadata:s:s:0 language=eng yourOutputVideo.mp4

I'm uncertain if there is some metadata you could toggle to hint at the default subtitle language and/or default to subtitles on (or off). I generally use VLC, so this isn't much of a problem for me.


You can hint the video player with metadata.

To make the first subtitle track default:

  -disposition:s:0 default
Or you can make it forced:

  -disposition:s:0 forced
And to clear the disposition of the second subtitle track (if it was default in the source stream)

  -disposition:s:1 0
The problem is that video players are inconsistent in terms of how they honour these parameters.


You can do this easily with Handbrake as well - using the "hard burn" option: https://handbrake.fr/docs/en/latest/advanced/subtitles.html


or with VLC.


It will probably take approx 1:1 time. Maybe a bit more, but not by much. As another sibling says, you need good-quality re-encoding. Obviously it depends on your CPU/GPU, but I guess that is a good approximation.


Disclaimer: I work for Transloadit

At my job we have an API that lets you do this to "burn in" subtitles onto a video.

A demo for this: https://transloadit.com/demos/video-encoding/add-subtitle-to...


I don't know how Plex does it, but it burns subtitles in near real time and I don't notice any quality hit


Plex uses FFmpeg for transcoding video and audio.

See https://support.plex.tv/articles/200250377-transcoding-media...


So does Airflow, which I'm a happy user of.

https://airflow.app/


Ooo that's interesting. I am going to check it out. What resolution btw? 1080p60? Same codec before and after?


I'm sorry, I'm not knowledgeable enough in that area to tell you what it does, but it works regardless of resolution and quality. 4K or 1080p, it doesn't matter.

What's also impressive is it can stream your media at a lower quality, say 4k to 720p to your phone, also in near real time, on a very old laptop that I repurposed into a NAS.


By the way, I have just donated to this awesome project (the comments I made previously have reminded me that I had not done so...).


Is there a SaaS that enables the power of ffmpeg filters through a user-friendly GUI?


There's this: https://github.com/mifi/lossless-cut

My startup is using ffmpeg filters to create video courses so the GUI is aimed towards that use case rather than being flexible: https://blog.modernlearner.org/product-update-split-content-...

The results aren't bad: https://www.youtube.com/watch?v=L-NjLrwTyxs

Great results depend on fine-tuning of the ffmpeg filters and having great quality video to work with.

The ffmpeg filter mini-language is easy to work with when you're targeting a particular use-case.


Probably not exactly what you mean, but the closest I can think of would be https://www.kapwing.com/


Does anyone know of an efficient way of overlaying lower-thirds graphics onto live video with ffmpeg? Ideally the overlay would be a webpage.


This is great, ffmpeg has a lot of great utilities within it.

I'm using it for creating short videos: https://www.youtube.com/watch?v=L-NjLrwTyxs

And I'm scratching the surface since ffmpeg has so many amazing filters: https://ffmpeg.org/ffmpeg-filters.html


FFmpeg has almost never worked for me even on simple conversion jobs, and now it's adding such advanced features.


I am curious if you would like to share your experiences. I use FFmpeg (almost) daily and I would go as far as saying it is one of the most reliable pieces of software I've ever used.


It all depends on what kind of data you're feeding it. I'd been using ffmpeg for years without ever seeing it segfault, until I started processing videos at an unusually small resolution, and suddenly it crashes a lot. (But nondeterministically, so I can put the command in a loop and it'll eventually finish successfully.) Well, I submitted a core dump via Ubuntu's Apport and hopefully someone will have a look at it.


For quicker triage, open a ticket at trac.ffmpeg.org.


What are you using it for? I have the opposite experience - where almost anything or everything else fails, ffmpeg is able to handle it.


Really? For simple conversions?

I've been running basically this, which is a simple conversion to ogv, mp4, and VP9 webm, with a tiny bit of audio cleaning:

    ffmpeg -y -nostdin -i $f -threads 8 -codec:v libtheora -qscale:v 4 -codec:a libvorbis -af afftdn -qscale:a 0 $f.ogv -codec:v libvpx-vp9 -qscale:v 4 -codec:a libvorbis -af afftdn -qscale:a 0 $f.webm -codec:v mpeg4 -codec:a aac -af afftdn -qscale:a 0 -qscale:v 4 $f.mp4
Across a huge dataset from a huge number of backgrounds, usually incredibly crap, for about 18 months now. Without downtime. And with decent quality output, or at least pretty close to equal to the input (did I mention it is often incredibly bad? Including near-broken files, and even some files with fewer than 20,000 pixels, total).

I'm not sure exactly what "never worked" means without more detail, but I'd say it's a workhorse.


It's actually super powerful and useful for simple one-off encodes or complex batch processes. I have worked for clients who required an end-to-end transcoding pipeline (ABR) with packaging, subtitles, multiple-audio, etc. and it took a 100 line shell script to get everything done.


It's true that if you simply use it in its simple form, such as "ffmpeg -i in.mp4 out.mov", the default bitrate will favour creating small files that will have artefacts; as soon as you set the right options you get very nice results.
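For example, something like this usually looks far better than the defaults (the crf value is just a common starting point, not a universal answer):

    # constant-quality H.264 encode instead of the default bitrate
    ffmpeg -i in.mp4 -c:v libx264 -crf 18 -preset slow -c:a aac out.mp4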


Can you share any typical settings you recommend? I’m experiencing small amounts of artifacts and would love to know someone’s “safe default”. Thanks!


It's difficult because it depends on the player you will use and the destination codec.

I keep a document with the arguments I use for each of my use cases.


  -c copy


I think a Magit-like interface for ffmpeg would make ffmpeg much easier to use


I usually steer newcomers to Handbrake. That's a great starting point.



