Hacker News
AutoFlip: an open-source framework for intelligent video reframing (googleblog.com)
89 points by jonbaer on Feb 15, 2020 | hide | past | favorite | 10 comments



"By detecting the subjects of interest, AutoFlip is able to avoid cropping off important visual content."

And they manage to demo it with a scene where a lady talks to a guy in a checkered shirt in front of a yellow background. The algorithm cuts off the lady completely and only keeps the guy. I guess she mustn't have been important. Next time she should try to look more like "interesting, salient content".


Yeah, I noticed that. While it's possible this is the result of the machine learning algorithm implicitly picking up a male bias in the data set that causes it to rank men more highly in terms of importance, I don't know how likely that is in practice. Maybe the explanation is as simple as the man taking up more pixels, or a bias toward lighter colors (like the man's white & blue shirt) while black colors (like the woman's clothing) are more likely to be considered unimportant background. Maybe the algorithm considered both approximately equally important and the result was essentially a coin flip.

Regardless, I agree that it's not a great look when the caption on that example talks about how AutoFlip "detects the subjects of interest" and "avoids cropping off important visual content". It's definitely not sending the intended message.


I interpreted it as just picking one visual entity to focus on for that cropped clip for design reasons, since the letterboxes necessary to get both entities in that portrait orientation would be huge and visually unappealing [1].

[1] https://i.imgur.com/yxxEnGe.png
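The "huge letterboxes" point is easy to check with a bit of arithmetic. A rough sketch (my own helper, not anything from AutoFlip) of how much of a 9:16 portrait canvas would be black bars if you scaled the full 16:9 frame to fit:

```python
def letterbox_bars(src_w, src_h, dst_w, dst_h):
    """Scale the full source frame to fit the destination width,
    and return the total bar height (top + bottom) in pixels."""
    scale = dst_w / src_w
    scaled_h = src_h * scale
    return dst_h - scaled_h

# Fitting a full 1920x1080 frame into a 1080x1920 portrait canvas:
bars = letterbox_bars(1920, 1080, 1080, 1920)
print(bars, bars / 1920)  # 1312.5 px of bars, ~68% of the frame
```

So keeping both people in a portrait crop means roughly two thirds of the output is padding, which makes picking one subject a defensible design choice.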


> AutoFlip’s configuration graph provides settings for either best-effort or required reframing. If it becomes infeasible to cover all the required regions (for example, when they are too spread out on the frame), the pipeline will automatically switch to a less aggressive strategy by applying a letterbox effect, padding the image to fill the frame.

Looks like that example was specifically demoing the "best effort" mode. If you look at the demo near that paragraph, it highlights the possibility of certain faces being cut off unless you mark all faces as required.
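The mode switch in the quoted paragraph can be sketched in a few lines. This is a hypothetical simplification in 1D (made-up function and region format, not AutoFlip's actual API): find a crop window covering all salient regions, and if that's infeasible, either letterbox (required mode) or drop regions (best-effort mode).

```python
def reframe(crop_w, frame_w, regions, required=False):
    """Pick a horizontal crop window of width crop_w.
    regions is a list of (x_min, x_max) salient spans.
    Returns ("crop", x_offset) or ("letterbox", None)."""
    lo = min(x0 for x0, _ in regions)
    hi = max(x1 for _, x1 in regions)
    if hi - lo <= crop_w:
        # All regions fit in one window: center the crop on them.
        center = (lo + hi) / 2
        x = min(max(center - crop_w / 2, 0), frame_w - crop_w)
        return ("crop", x)
    if required:
        # Required mode: can't drop any region, so fall back to letterboxing.
        return ("letterbox", None)
    # Best-effort mode: keep the widest region, drop the rest.
    widest = max(regions, key=lambda r: r[1] - r[0])
    center = (widest[0] + widest[1]) / 2
    x = min(max(center - crop_w / 2, 0), frame_w - crop_w)
    return ("crop", x)

# Two people far apart in a 1920-wide frame, portrait crop ~607 px wide:
print(reframe(607, 1920, [(200, 700), (1300, 1800)]))                 # crops to one person
print(reframe(607, 1920, [(200, 700), (1300, 1800)], required=True))  # letterboxes instead
```

With both subjects marked required, the same input flips from cropping one person out to padding the frame, which matches what the demo shows.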


They cover this in the "What's next" section:

> Like any machine learning algorithm, AutoFlip can benefit from an improved ability to detect objects relevant to the intent of the video, such as speaker detection for interviews


The results actually look really good. Only having to film in one format, such as widescreen 4K, means you just have to worry about getting the shot. I think it's good enough to build an emotion predictor vector.


Aside, but I've always wondered how pan and scan was done in practice for converting film to VHS. It seems either tedious or sloppy.


Pan and scan was historically done manually. Turner Classic Movies made a great 5-minute documentary about the practice and how much of the filmmakers' intention it loses. Definitely worth a watch: https://www.youtube.com/watch?v=5m1-pP1-5K8


This is a more scalable solution than the Spectacles solution (record both horizontally and vertically). I wonder if filmmakers will adapt their filming to cater to this algo - i.e. leaving a bigger margin to the edge so that AutoFlip gives even better results.


Anytime I see anything with the phrase "open source" + "google" I immediately think programmer API trap.



