My first reaction is how actors' faces look surprisingly like traditional caricatures by illustrators -- e.g. shrinking foreheads and chins, which are detail-light, while keeping eyes and ears, which are detail-heavy.
My second thought is that the extreme jumpiness between frames occurs because each frame is processed separately. If you treated each seam not as a "jagged line" from point A on one edge to point B on the opposite edge of a single frame, but as a "jagged plane" cutting through a series of frames -- all the frames in a single shot -- you could eliminate the jumpiness entirely.
You might need to build a bit more flexibility into it to allow for discontinuities generated from object movement and camera panning, but I wonder if anyone's tried to do something like that?
Though I imagine it would be quite a lot of programming for a tool that might only ever be used as a kind of video filter for entertainment purposes -- I have a hard time imagining a cinematographer ever using it for serious work.
Actually the authors of the seam carving paper went on to do just that [0]. From the abstract: "We present video retargeting using an improved seam carving operator. Instead of removing 1D seams from 2D images we remove 2D seam manifolds from 3D space-time volumes. To achieve this we replace the dynamic programming method of seam carving with graph cuts that are suitable for 3D volumes."
Son of a gun, this is why I love HN. Thank you! And it turns out the results are shockingly good, far better than I expected. They have demo videos at:
Gladly! And yeah, the results really are quite good. This is why I like optimization problems - if you can formally capture what you want as an objective, and if you can find a way to optimize it, you can get surprisingly good results. Of course these are two very big IFs...
>You might need to build a bit more flexibility into it to allow for discontinuities generated from object movement and camera panning, but I wonder if anyone's tried to do something like that?
An easier solution would probably be frame interpolation between the two separate frames.
Interesting. A couple of thoughts: this might be a lot smoother on cartoons, since there are usually fewer colors and less noise overall.
It seems as though there's an additional effect (for extra... effect) when they scream. Not sure if that is a natural result of the content of the visual scene being processed, or if there's some sort of audio input into the visual processing, or if they manually/intentionally applied some sort of parameter change (at 0:29 and 0:38 in the video) that causes the video to get all chaotic.
In the video comments he links to a tutorial, but essentially he dumps all the frames, runs a script to content-aware scale each frame down by 50%, and then merges everything back together.
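In other words, something like this (a rough sketch, not his actual script: `carve` stands in for whatever seam carving routine you use, and the frame rate is a guess):

    import glob
    import os
    import subprocess
    import imageio.v3 as iio

    os.makedirs("frames", exist_ok=True)

    # 1. Dump every frame to PNG (standard ffmpeg invocation).
    subprocess.run(["ffmpeg", "-i", "input.mp4", "frames/%05d.png"], check=True)

    # 2. Content-aware scale each frame down to 50% width, independently --
    #    which is exactly why the result jumps from frame to frame.
    for path in sorted(glob.glob("frames/*.png")):
        frame = iio.imread(path)
        iio.imwrite(path, carve(frame, frame.shape[1] // 2))  # `carve` assumed

    # 3. Merge the frames back into a video.
    subprocess.run(["ffmpeg", "-framerate", "24", "-i", "frames/%05d.png",
                    "-pix_fmt", "yuv420p", "output.mp4"], check=True)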
I wonder if further research on this has been done.
For example, what if some ML tagging mechanism were used to find the silhouettes of interesting objects in the image (people, animals, traffic signs, etc.), and then "freeze" them to prevent the energy function from operating on those areas, preserving those objects intact while resizing the rest of the image?
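Mechanically that part is easy to bolt onto the classic algorithm -- you just bias the energy map with the mask before looking for seams (a sketch; the penalty constant is arbitrary):

    import numpy as np

    PROTECT_PENALTY = 1e6  # arbitrary; just needs to dwarf real image energy

    def protected_energy(energy, protect_mask):
        """energy: (H, W) float; protect_mask: (H, W) bool from a segmentation model."""
        out = energy.copy()
        out[protect_mask] += PROTECT_PENALTY  # seams will route around these pixels
        return out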
You see lots of demos of this, I think because the algorithm is interesting but also pretty easy to implement.
As an approach it seems to do a decent job with either very small changes (e.g. a slight change of aspect ratio) or uninteresting images, but aesthetically the results seem bad on most interesting images; I suspect that's because removing "low information" regions also removes tension the composition needs. Often a simple crop is much better.
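For anyone curious how little code it takes: the core is one dynamic program over an energy map. A minimal numpy sketch (vertical seams only, a simple gradient-magnitude energy assumed, function names mine):

    import numpy as np
    from scipy import ndimage

    def find_vertical_seam(gray):
        """gray: (H, W) float array; returns one column index per row."""
        energy = np.abs(ndimage.sobel(gray, axis=0)) + np.abs(ndimage.sobel(gray, axis=1))
        H, W = gray.shape
        cost = energy.copy()
        for y in range(1, H):
            left = np.r_[np.inf, cost[y - 1, :-1]]
            right = np.r_[cost[y - 1, 1:], np.inf]
            cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)
        # Backtrack from the cheapest bottom-row pixel.
        seam = np.empty(H, dtype=int)
        seam[-1] = int(np.argmin(cost[-1]))
        for y in range(H - 2, -1, -1):
            x = seam[y + 1]
            lo, hi = max(x - 1, 0), min(x + 2, W)
            seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
        return seam

    def remove_seam(img, seam):
        """img: (H, W, C); drops one pixel per row along the seam."""
        H, W, C = img.shape
        mask = np.ones((H, W), dtype=bool)
        mask[np.arange(H), seam] = False
        return img[mask].reshape(H, W - 1, C)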
Totally unrelated to this great project, but I'm curious: do people really use the content-aware resizing feature in practice?
I use Photoshop frequently, and I use content-aware removal A LOT (super handy). But it never occurred to me, not even once, to use content-aware resizing, even though it's been there for years. If I really need to change the ratio of an image/photo I usually just crop.
Now that I know it exists, I could use it for blog headers where I'd like to maintain the same ratio; it would be useful for images that have details around the edges I don't want to lose.
One of the biggest complaints I have about HN is that it promotes really crappy "Look at me! I just learned a thing and wrote a 300 word blog doing a crappy job explaining it because I don't really get it but want to pad my CV..." posts.
> promotes really crappy "Look at me! I just learned a thing and wrote a 300 word blog doing a crappy job explaining it because I don't really get it but want to pad my CV..."
This is a broad brush; are you sure the intent is always resume padding? Some folks (like me) write poorly, but I find that writing tests what I know (and shows me what I don't). I share anyway so I can be corrected and learn more, and so others might benefit if they have a similar problem. Your comment felt like shaming.
> This article is exceptional. Thank you OP.
100% agree, OP’s writing and content are exemplary!
> Some folks (like me) write poorly, but I find that writing tests what I know (and shows me what I don’t)
That's fine, just don't have such a big ego that you need to share your crap with the world unless you have something important to say. That's why, when you try to google something to learn, you have to wade through pages and pages of half-baked crap: all the good stuff has been drowned out.
Simple solutions are great and I find seam carving elegant, but maybe that's an application where machine learning can shine?
Since globally the task can be defined as displacing pixels while minimizing a perceptual loss, it should be reasonably easy to express in a differentiable way. The benefits I see are higher-quality semantics preservation, and potentially faster inference (one pass only).
The recent development of transformer models might provide just the tool to tackle variable sizes efficiently; maybe I should give it a go.
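For what it's worth, the differentiable core isn't hard to sketch. Here's one rough take (entirely my own assumption, not an established method): learn a monotonic column-sampling schedule and warp with grid_sample, then optimize a perceptual loss through it:

    import torch
    import torch.nn.functional as F

    def retarget(img, col_logits):
        """img: (1, C, H, W) float; col_logits: (W_out,) learnable parameters."""
        _, _, H, _ = img.shape
        widths = F.softplus(col_logits)                      # positive column gaps
        xs = torch.cumsum(widths, 0) / widths.sum() * 2 - 1  # monotonic coords in (-1, 1]
        ys = torch.linspace(-1, 1, H)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")       # (H, W_out) each
        grid = torch.stack([gx, gy], dim=-1).unsqueeze(0)    # grid_sample wants (x, y)
        return F.grid_sample(img, grid, align_corners=True)

    # A perceptual loss (VGG features, LPIPS, or even simple edge preservation)
    # would then be minimized over col_logits with a standard optimizer.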
Edit: if you're interested too and want to play on it together, shoot me a message :)
I think this would make a very good data augmentation for training deep learning models, because the resulting images are both unique (not just linearly transformed) and still often look natural.
A good explanation of seam carving, but there are other facts worth noting:
1. It was first developed by Shai Avidan at MERL.
2. It was then introduced in a 2005 paper by Vidya Setlur, Saeko Takagi, Ramesh Raskar, Michael Gleicher, and Bruce Gooch, which won a 10-year impact award in 2015.
3. Adobe Systems acquired a non-exclusive license to the seam carving technology from MERL and implemented it in Photoshop CS4.
It worked well with Voyager's "Pale Blue Dot" regardless of which sub-region containing the dot I uploaded.
Then, when I uploaded the Solar System, it managed to capture each planet and its label without distorting them, removing only the space in between... except for Saturn's rings, which became wobbly :)
Architecture pictures tend to perform horribly because they contain so many straight lines and perspective cues. Faces come out too stretched regardless of aspect ratio.
Amazing! I tried it with a schematic (practically all right angles), and it did an impressive job (until it ran out of room and decided to mess with the text). Of course, images work great too :)
I do have one question: I see this is based on RGB, but how good is a "seam carving" implementation using RGB compared to one based on a color space more like human vision (such as CIELAB)?
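It's easy to experiment with, at least. A sketch of what swapping the color space might look like (skimage assumed; whether LAB actually helps here is exactly the open question):

    from skimage import color, filters

    def energy_rgb(img):
        """img: float RGB in [0, 1]; gradient-magnitude energy on luminance."""
        return filters.sobel(color.rgb2gray(img))

    def energy_lab(img):
        """Same energy, but summed over CIELAB channels so chroma edges count
        in units closer to perceptual difference."""
        lab = color.rgb2lab(img)  # L in [0, 100], a/b roughly [-128, 127]
        return sum(filters.sobel(lab[..., c]) for c in range(3))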
I first saw this algorithm in the mid 2000s, maybe when it was invented. It’s simple in retrospect, but it’s a beautiful fusion of physics, algorithms, and graphics. Amazing things happen at the intersection of different disciplines.
That's more related to a problem known as infilling. Generally, throwing out information is a lot easier than generating it. You can do some statistical things by sampling other points of the image into the newly created "gap", but it will probably look bad if fully automated, at least for big changes.
There's an obvious version of the algorithm in that direction. For a one-line "seam" it's easy enough: you just pull data from either side. But as you apply it repeatedly, your new "seams" end up next to pixels that were themselves estimated, so there's less and less real information to draw from - I suspect this becomes visually noticeable pretty fast.
Are you referring to image inpainting? I think that's what it's usually called (please correct me if I'm wrong though!)
Although I'm not really familiar with traditional algorithms for inpainting, I've seen some ML research do some stuff with it that I found to be really impressive.
The algorithm they describe is able to inpaint pixels AND depth information from existing RGB-D photos, enabling images to be viewed in 3D space and used with parallax effects. Really cool stuff!
It'd be pretty interesting to train an ML model. You could generate a bunch of training examples: downsize lots of images, then use the upsized versions as targets.
It's not quite the same thing as superresolution, since it's seam carving.
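One possible reading of that setup (my assumption, and `remove_n_seams` stands for any classical carve implementation): use the classical algorithm as a slow teacher and train the model to reproduce its output in a single pass:

    import random

    def make_pair(img, max_frac=0.25):
        """Generate one (input, target) training example from a single image."""
        k = random.randint(1, int(img.shape[1] * max_frac))  # seams to remove
        target = remove_n_seams(img, k)  # classical carve as ground truth (assumed)
        return (img, k), target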
Yeah, I believe the seam carving paper did this. Pretty simple: find the low-energy path and do linear interpolation between the two neighboring pixels (I think it also introduced some tricks to find the n lowest-energy paths at once, to avoid inserting into the same path again and again).
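The insertion step itself is only a few lines (numpy sketch, one seam at a time; the seam is one column index per row):

    import numpy as np

    def insert_seam(img, seam):
        """img: (H, W, C); seam: one column index per row. Returns (H, W+1, C)."""
        H, W, C = img.shape
        out = np.empty((H, W + 1, C), dtype=img.dtype)
        for y, x in enumerate(seam):
            out[y, :x + 1] = img[y, :x + 1]
            nbr = img[y, min(x + 1, W - 1)]
            # New pixel is the average of the seam pixel and its right neighbor.
            out[y, x + 1] = (img[y, x].astype(np.float64) + nbr) / 2
            out[y, x + 2:] = img[y, x + 1:]
        return out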
But as the top comment pointed out, this algorithm is easy to implement and interesting, yet on real-world examples it's often no better than salient object detection + cropping.
In this specific image there are even greater difficulties than in the Van Gogh painting: the unpainted wall portions on the left prevail over the "low energy" surfaces that surround them, and there are many rectilinear structures that demand very good preservation (court lines, wall edges, iron bars). Worst of all is the backboard, located in the middle of lower-energy regions (the low-contrast purple grid) that attract seams that bite off its corners.
This image should look better with a strong penalty against seams that depart from vertical or horizontal lines, but it wouldn't be enough: the purple pillars and panels would be straighter but still squeezed.
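That penalty is a one-line change to the usual cumulative-cost dynamic program (a sketch; `straightness` is a made-up tuning knob, not something from the paper):

    import numpy as np

    def seam_cost(energy, straightness=0.0):
        """Cumulative vertical-seam cost; taxes diagonal moves by `straightness`."""
        H, W = energy.shape
        cost = energy.astype(float).copy()
        for y in range(1, H):
            left = np.r_[np.inf, cost[y - 1, :-1]] + straightness
            right = np.r_[cost[y - 1, 1:], np.inf] + straightness
            cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)
        return cost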
My brother has a YouTube channel full of content-aware scaling videos:
https://youtu.be/a8k3b-QNbhs