Hacker News
Content-aware image resizing in JavaScript (trekhleb.dev)
489 points by mmazzarolo on April 16, 2021 | 61 comments



You can also apply this to each frame in a video for a rather interesting effect.

My brother has a YouTube channel full of content-aware scaling videos:

https://youtu.be/a8k3b-QNbhs


Wow, that's incredibly interesting!

My first reaction is how surprisingly the actors' faces resemble the caricatures that illustrators traditionally draw -- e.g. shrinking foreheads and chins, which are detail-light, but keeping eyes and ears, which are detail-heavy.

But my second thought is that the extreme jumpiness in frames occurs because each frame is processed separately. But if you considered each seam not to be a "jagged line" from point A on one edge to point B on the opposite edge of a single frame, but rather a "jagged plane" cutting through a series of frames -- all frames in a single shot -- you could eliminate the jumpiness entirely.

You might need to build a bit more flexibility into it to allow for discontinuities generated from object movement and camera panning, but I wonder if anyone's tried to do something like that?

Though I imagine it might be quite a lot of programming for a tool that might only ever be used as a kind of video filter for entertainment purposes -- I have a hard time imagining a cinematographer ever using it for serious purposes.


That's quite an insight!

Actually the authors of the seam carving paper went on to do just that [0]. From the abstract: "We present video retargeting using an improved seam carving operator. Instead of removing 1D seams from 2D images we remove 2D seam manifolds from 3D space-time volumes. To achieve this we replace the dynamic programming method of seam carving with graph cuts that are suitable for 3D volumes."

[0] https://faculty.idc.ac.il/arik/SCWeb/vidret/index.html


Son of a gun, this is why I love HN. Thank you! And it turns out the results are shockingly good, far better than I expected. They have demo videos at:

https://faculty.idc.ac.il/arik/SCWeb/vidret/results/video_re...

My favorite is:

Original: https://faculty.idc.ac.il/arik/SCWeb/vidret/results/videos/w...

Narrowed: https://faculty.idc.ac.il/arik/SCWeb/vidret/results/videos/w...

Widened: https://faculty.idc.ac.il/arik/SCWeb/vidret/results/videos/w...

Just wow.


Gladly! And yeah, the results really are quite good. This is why I like optimization problems - if you can formally capture what you want as an objective, and if you can find a way to optimize it, you can get surprisingly good results. Of course these are two very big IFs...


Interesting.


>You might need to build a bit more flexibility into it to allow for discontinuities generated from object movement and camera panning, but I wonder if anyone's tried to do something like that?

An easier solution would probably be frame interpolation between the two separate frames.


Interesting. A couple of thoughts... this might be a lot smoother on cartoons since there are usually fewer colors and less noise overall.

It seems as though there's an additional effect (for extra... effect) when they scream. Not sure if that is a natural result of the content of the visual scene being processed, or if there's some sort of audio input into the visual processing, or if they manually/intentionally applied some sort of parameter change (at 0:29 and 0:38 in the video) that causes the video to get all chaotic.


Oh that's horrifying/skin-crawling. I don't know what's wrong with me, but I can't stand to watch it. :)


Very interesting, but it is hard to watch for me because of extreme shaking. Maybe some morphing would smooth the transitions.


Really cool. Do you know what he’s using to do the scaling?


In the video comments he includes a link to a tutorial, but essentially he dumps all the frames, runs a script that content-aware scales each frame down by 50%, and then merges it all back together.


Damn that's trippy


There's an improvement to seam carving using something termed "forward energy", see: https://avikdas.com/2019/07/29/improved-seam-carving-with-fo...
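For anyone curious what "forward energy" changes, here is a rough sketch in Python (not the article's JavaScript, and not taken from the linked post). The idea is to charge each candidate pixel for the new edges that would appear once it is removed, rather than for the gradient at the pixel itself. The function name `forward_energy_seam`, the grayscale list-of-lists input, and the clamping at the image borders are my own assumptions.

```python
def forward_energy_seam(gray):
    """Find one vertical seam using forward-energy costs.

    gray is a list of rows of grayscale values (an assumed input format).
    Returns one x index per row, top to bottom.
    """
    h, w = len(gray), len(gray[0])

    def px(y, x):
        # Assumption: clamp x at the borders instead of treating them specially.
        return gray[y][min(max(x, 0), w - 1)]

    cost = [[0] * w for _ in range(h)]
    back = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Removing (y, x) makes its left and right neighbors adjacent.
            c_up = abs(px(y, x + 1) - px(y, x - 1))
            if y == 0:
                cost[y][x] = c_up
                continue
            # A diagonal step also creates a new vertical neighbor pair.
            c_left = c_up + abs(px(y - 1, x) - px(y, x - 1))
            c_right = c_up + abs(px(y - 1, x) - px(y, x + 1))
            best, best_x = cost[y - 1][x] + c_up, x
            if x > 0 and cost[y - 1][x - 1] + c_left < best:
                best, best_x = cost[y - 1][x - 1] + c_left, x - 1
            if x < w - 1 and cost[y - 1][x + 1] + c_right < best:
                best, best_x = cost[y - 1][x + 1] + c_right, x + 1
            cost[y][x] = best
            back[y][x] = best_x

    # Backtrack from the cheapest cell in the bottom row.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        x = back[y][x]
        seam.append(x)
    seam.reverse()
    return seam
```

On an image with a bright stripe down the left side, the seam should stay in the flat area on the right, since removing a pixel there creates no new edge.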


I wonder if further research on this has been done.

For example, what if some ML tagging mechanism were used to find the silhouettes of interesting objects in the image (people, animals, traffic signs, etc.), and those areas were then "frozen" so that no seams pass through them, preserving those objects intact while resizing the rest of the image?
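The "freezing" part is probably the easy bit once you have a mask from whatever detector you like. A minimal sketch, assuming the energy map and the mask are same-sized lists of rows; `protect_regions` and `PROTECT_PENALTY` are made-up names for illustration, not anything from the article or from a library:

```python
# Assumed constant: far larger than any gradient energy a real image produces,
# so a seam only crosses a protected pixel when there is no other path.
PROTECT_PENALTY = 10**6


def protect_regions(energy, mask, penalty=PROTECT_PENALTY):
    """Return a copy of energy with a large penalty added wherever mask is truthy.

    energy: list of rows of numbers (the usual seam carving energy map).
    mask:   list of rows of 0/1 flags, e.g. from a face or object detector
            (the detector itself is out of scope here).
    """
    return [
        [e + penalty if m else e for e, m in zip(energy_row, mask_row)]
        for energy_row, mask_row in zip(energy, mask)
    ]
```

The penalized map can then be passed to an ordinary seam finder unchanged, which is what makes this approach appealing.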


I don't remember where I saw it linked from, but someone did that with face detection: https://github.com/esimov/caire


You see lots of demos of this, I think because the algorithm is interesting but also pretty easy to implement.

As an approach it seems to do a decent job with either very small changes (e.g. slight change of aspect ratio) or uninteresting images, but aesthetically the results seem bad on most interesting images; I suspect because targeting "low information" regions of the image removes tension that is needed. Often a simple crop is much better, it seems.
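To back up the "pretty easy to implement" part, here is roughly what the core looks like. This is a bare-bones Python sketch, not the article's JavaScript: it assumes a grayscale image stored as a list of rows and a simple gradient-magnitude energy, and all function names are my own.

```python
def energy_map(gray):
    """Gradient-magnitude energy: |dx| + |dy|, with borders clamped (an assumption)."""
    h, w = len(gray), len(gray[0])
    energy = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left, right = gray[y][max(x - 1, 0)], gray[y][min(x + 1, w - 1)]
            up, down = gray[max(y - 1, 0)][x], gray[min(y + 1, h - 1)][x]
            energy[y][x] = abs(right - left) + abs(down - up)
    return energy


def find_vertical_seam(energy):
    """Dynamic programming: return one x index per row for the cheapest connected seam."""
    h, w = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # Backtrack from the cheapest cell in the bottom row.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def remove_vertical_seam(gray, seam):
    """Drop the seam pixel from every row, narrowing the image by one column."""
    return [row[:x] + row[x + 1:] for row, x in zip(gray, seam)]
```

Shrinking by n columns is then just a loop over these three steps, recomputing the energy each time. That loop is also why the naive version gets slow on large images.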


That's what I was amazed by. The algorithm is pretty ingenious.


Learned about this from the amazing Grant Sanderson (3b1b) in the Computational Thinking course using Julia.

https://www.youtube.com/watch?v=rpB6zQNsbQU


Unsurprisingly it fails when there are no low-risk paths: https://i.imgur.com/58d5AFM.png

Brilliant implementation anyway. Having a lot of fun!


Interesting. Has anyone found a method to handle images like this?


Totally unrelated to this great project, but I'm curious: do people really use content-aware resizing feature in practice?

I use Photoshop frequently, and I use content-aware removal A LOT (super handy). But it never occurred to me, not even once, that I needed content-aware resizing, even though it has been there for years. If I really need to change the ratio of an image/photo I usually just crop.


Yes, I use it daily. Here is my project:

https://github.com/jftuga/photo_id_resizer

I am using this content aware image resizing library, which is used for: "Face detection to avoid face deformation."

https://github.com/esimov/caire


Now that I am aware it exists, I can use it for blog headers where I'd like to maintain the same ratio. This would be needed for images that have details around the edges I don't want to lose.


It would be awesome if sharp (https://github.com/lovell/sharp) would implement this algorithm


On a side note, thanks for posting this. I didn't know sharp existed and had been doing things the hard way (process calls to ImageMagick).


I wish there were more of such interesting and educational articles with great examples and explanations.


Yaz, Right???

One of the biggest complaints I have about HN is that it promotes really crappy "Look at me! I just learned a thing and wrote a 300 word blog doing a crappy job explaining it because I don't really get it but want to pad my CV..."

This article is exceptional. Thank you OP.


> promotes really crappy "Look at me! I just learned a thing and wrote a 300 word blog doing a crappy job explaining it because I don't really get it but want to pad my CV..."

This is a broad brush, are you sure the intent is always resume padding? Some folks (like me) write poorly but I find writing tests what I know (and shows me what I don’t). I share anyway so I can be corrected and learn more, and so others might benefit if they have a similar problem. Your comment felt like shaming.

> This article is exceptional. Thank you OP.

100% agree, OP’s writing and content are exemplary!


> Some folks (like me) write poorly but I find writing tests what I know (and shows me what I don’t)

That's fine, just don't have such a big ego you need to share your crap with the world unless you have something important to say. That's why when you try to google something to learn, you have to wade through pages and pages of half-baked crap: all the good stuff has been drowned out.


Unfortunately HN is not immune from lowest common denominator content.


Tragedy of the commons. Private hacker communities tend to produce higher quality content.


This should be a prime example where WebAssembly could come into play, no?


WebGL would work really well for this assuming the constrained shader subset it has to work with can actually do the analysis and transforms


That example where you can upload your own image is amazing. It even shows the seams being carved in real time. Well done!


Simple solutions are great and I find seam carving elegant, but maybe that's an application where machine learning can shine?

As globally the task is defined as displacing pixels while minimizing a perceptual loss, it should be reasonably easy to express in a differentiable way. The benefits I see are higher quality semantics preservation, and potentially faster inference (one pass only).

The recent development of transformer models might provide just the tool to tackle variable sizes efficiently, maybe I should give it a go

Edit: if you're interested too and want to play on it together, shoot me a message :)


I think this would make a very good data augmentation for training deep learning models, because the resulting images are both unique, not just linearly transformed, and still often look natural.


One of the better explanations of seam carving, but there are other facts as well.

1. It was first developed by Shai Avidan at MERL.

2. It was then introduced in a paper by Vidya Setlur, Saeko Takagi, Ramesh Raskar, Michael Gleicher and Bruce Gooch in 2005, which won a 10-year impact award in 2015.

3. Adobe Systems acquired a non-exclusive license to seam carving technology from MERL and implemented it in Photoshop CS4.


Do you mean that this is patented?


As I said, the licence is non-exclusive, so anyone can use it, but it was mentioned that Adobe used it as Content-Aware Scaling.


Worked well with the "Pale Blue Dot" by Voyager regardless of which sub-region containing the dot I uploaded.

Then, when uploading the Solar System, it managed to capture each planet and its label without distorting them while only removing the space in-between... except for Saturn's rings which became wobbly :)

Architecture pictures tend to perform horribly because they contain so many straight lines and perspective cues. Faces are too stretched regardless of aspect ratio.


Amazing! I tried it with a schematic (practically all right angles), and it did an impressive job (until it ran out of room and decided to mess with the text). Of course, images work great too :)

I do have one question: I see this is based on RGB, but how good is a "seam carving" implementation using RGB compared to one based on a color space more like human vision (such as CIELAB)?


I did seam carving as part of Coursera's Algorithms course by Bob Sedgewick. It was gratifying.


I first saw this algorithm in the mid 2000s, maybe when it was invented. It’s simple in retrospect but it’s a beautiful fusion of physics, algorithms, and graphics. Amazing things happen at the cross section of different disciplines.


Has the author put this up on `npm` yet? I don't see it and want to use it in platforms like ObservableHQ easily.


Nice presentation. You can also add regions that are important to keep, so as not to distort Van Gogh's face.


I've been playing with this for half an hour now trying different images. Really fun. Nice work.


Gosh dang this is amazing.


I was very impressed with this algorithm when I first found it and am very happy to see this implementation that seems really polished.

What are the performance implications of this? Would it be possible and/or a good idea to implement this in WebASM?


Is there an opposite of this, where it will expand an image size?


That's more related to a problem known as infilling. Generally throwing out information is a lot easier than generating it. You can do some statistical things by sampling other points of the image into newly created "gap" but it will probably look bad if fully automated, at least on big changes.

There's an obvious version of the algorithm in that direction. For a single-line "seam" it's easy enough: you just pull data from either side. But when you apply it repeatedly, the more often your new "seams" end up next to something already estimated, the less real information there is. I suspect this becomes visually noticeable pretty fast.


Are you referring to image inpainting? I think that's what it's usually called (please correct me if I'm wrong though!)

Although I'm not really familiar with traditional algorithms for inpainting, I've seen some ML research do some stuff with it that I found to be really impressive.

One demo that really stood out to me was the following: https://shihmengli.github.io/3D-Photo-Inpainting/

The algorithm they describe is able to inpaint pixels AND depth information from existing RGB-D photos, enabling images to be viewed in 3d space and be used with parallax effects. Really cool stuff!


> Are you referring to image inpainting?

Yes, too late to edit but that's the more common name.


It'd be pretty interesting to train an ML model. You could generate a bunch of training examples: downsize lots of images, then use the upsized versions as targets.

It's not quite the same thing as superresolution, since it's seam carving.


Yeah, I believe the seam carving paper did this. It's pretty simple: find the low-energy path and do linear interpolation between the two neighboring pixels (I think it also introduced some tricks to find n low-energy paths at once, to avoid inserting into the same path again and again).

But as the top comment pointed out, this algorithm is easy to implement and interesting, but on real-world examples it is no better than salient object detection + cropping.
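For reference, the insertion step mentioned above might look something like this in Python. It is only a sketch of the single-seam case under my own assumptions: grayscale rows as lists, the seam given as one x index per row, and the new pixel averaged from the seam pixel and its right-hand neighbor. The function name is hypothetical, and the "n paths at once" trick is not shown.

```python
def insert_vertical_seam(gray, seam):
    """Widen the image by one column by duplicating along a seam.

    For each row, a new pixel is inserted just to the right of the seam
    position, with a value halfway between the seam pixel and its right
    neighbor (clamped at the right border, which is an assumption).
    """
    widened = []
    for row, x in zip(gray, seam):
        right = row[min(x + 1, len(row) - 1)]
        new_pixel = (row[x] + right) / 2
        widened.append(row[:x + 1] + [new_pixel] + row[x + 1:])
    return widened
```

Calling this repeatedly with a freshly computed lowest-energy seam would tend to pick the same path every time and smear it out, which is presumably why the selection of several seams up front matters.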


daaaang that's really nice. i sort of wondered if someone had done something like this in javascript. is really cool to see, nice work.


Strange, but someone is still writing about algorithms that are not deep learning these days!


Very cool for simple images like the demo ones provided. But images with detailed content don't resize well and are much worse than a naive resize.

Try: https://unsplash.com/photos/ZtRuoAKr9vM

Resize: 50% width, 70% height

The basketball hoop is heavily distorted, as is the court, the squares on the building and the 3 point line.


That is explained in the section with the van Gogh painting. It is not like they are advertising the algorithm as a jack of all trades.


In this specific image there are even greater difficulties than in the Van Gogh painting: the unpainted wall portions on the left prevail over the "low energy" surfaces that surround them, and there are many rectilinear structures that demand very good preservation, like court lines, wall edges, iron bars, and worst of all the backboard, located in the middle of lower energy regions (the low contrast purple grid) that attract seams that bite off the corners of the backboard.

This image should look better with a strong penalty against seams that depart from vertical or horizontal lines, but it wouldn't be enough: the purple pillars and panels would be straighter but still squeezed.
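A penalty like that is a small change to the usual dynamic programming step. Here is a sketch in Python, assuming an energy map stored as a list of rows; `find_straight_biased_seam` and `bend_penalty` are names I made up, and the flat per-step penalty is just one possible way to bias seams toward straight lines.

```python
def find_straight_biased_seam(energy, bend_penalty):
    """Vertical seam search where every diagonal step costs an extra bend_penalty.

    With bend_penalty = 0 this is the ordinary seam search; larger values
    push the result toward perfectly vertical seams.
    """
    h, w = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    back = [[0] * w for _ in range(h)]
    for y in range(1, h):
        for x in range(w):
            # Going straight up is free; diagonal moves pay the penalty.
            best, best_x = cost[y - 1][x], x
            for nx in (x - 1, x + 1):
                if 0 <= nx < w and cost[y - 1][nx] + bend_penalty < best:
                    best, best_x = cost[y - 1][nx] + bend_penalty, nx
            cost[y][x] = energy[y][x] + best
            back[y][x] = best_x

    # Backtrack from the cheapest cell in the bottom row.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        x = back[y][x]
        seam.append(x)
    seam.reverse()
    return seam
```

On a tiny checkerboard-like energy map the unpenalized seam zigzags through the zeros, while a large penalty forces it straight down one column. As noted above, that only keeps lines straighter; it does nothing about regions being squeezed.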


Jesus, JavaScript.

As a plumber once said to me; you can't flush an 8 inch shite down a 4 inch hole.



