Video Stabilization on YouTube (googleresearch.blogspot.in)
196 points by Garbage on May 6, 2012 | 48 comments



Seems to work well. I just tried it on a tiger cub vid which I've had for years, taken with an el-cheapo digital camera. Shaky version: http://youtu.be/g5P9WwHdOSI Stabilised version: http://youtu.be/wXfbRUk_1Bg

There is some motion blur (around 24 seconds, when the momma tiger lies down) which is a bit puzzling once the image has been stabilised. But hey, it's better than it was before.


The motion blur was there in the first place, you just don't notice it because there's also _motion_. When you remove the motion (shaking, jerking the camera), you're left with just the blur.


Yes, I know. But the viewer won't know that, because they don't know where the motion was before it was stabilised.


It also looks like the stabilized video is magnified a little bit. Regardless, these tools are really neat and they are still in their infancy. It's good to see that GooTube still believes in user-generated content.


Cropping and magnifying the video is required for all methods of video stabilization, because otherwise you would see the edge of the original frame move in and out of view.


Are there any stabilization algorithms that use other frames in the past and future to fill in the gaps with (possibly anachronistic) data?


As mentioned in the blogpost, the rolling-shutter version of this won the best paper prize at the International Conference on Computational Photography (ICCP), which was held last weekend in Seattle. This is a fairly new but very high quality conference. In many respects, I prefer it to the standard-bearing vision conferences like CVPR, ICCV, or ECCV -- although of course, ICCP is more narrowly focused on computational imaging and photography applications.

In their talk, the authors of this work showed many more video results and they were all quite impressive. In fact, they were good enough to fall into an "uncanny valley of motion", similar to the "uncanny valley" of faces or humans [1] that most people are familiar with. I.e., the motion correction was almost perfect, but just enough off that something felt vaguely surreal about the results. Nevertheless, it's a nice step forward.

Also, as others have pointed out, this is a fully uncalibrated method -- requiring no knowledge of how the video was captured. If you do have some knowledge, then you can often exploit it to do better. But the authors mentioned that most videos uploaded to youtube have either no calibration information, or if present, it's often incorrect. As such, it made sense for them to focus on the uncalibrated case.

Finally, I should point out that rolling shutter, standard on most mobile cameras, is causing all sorts of problems for traditional image and video analysis algorithms, which often make the assumption, sometimes implicitly, that the entire frame was captured at a single instant in time. This is not true anymore, and can lead to gross errors in many methods. Hence the many recent papers on correcting for, and in some cases exploiting, rolling shutter effects.
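
As a toy illustration of that row-by-row timing (my own made-up numbers, not from the blog post or any of the papers), here's roughly how a horizontal pan turns rolling-shutter readout into the shear people notice:

    import numpy as np

    # Hypothetical sensor and motion parameters, chosen only for illustration.
    rows = 1080                  # image height in pixels
    frame_readout_s = 1.0 / 60   # time to read the frame out, top row to bottom row
    pan_speed_px_s = 600.0       # horizontal camera pan, in pixels per second

    # Each row is exposed slightly later than the one above it, so the scene
    # has shifted horizontally by the time that row is sampled.
    row_times = np.arange(rows) * (frame_readout_s / rows)
    row_shift_px = pan_speed_px_s * row_times

    # A vertical edge in the scene therefore comes out slanted (sheared):
    print("shift at top row:    %.1f px" % row_shift_px[0])
    print("shift at bottom row: %.1f px" % row_shift_px[-1])   # ~10 px of shear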

A recent interesting work along these lines from my former lab at Columbia University is "coded rolling shutter photography: flexible space-time photography" [2]. This paper takes advantage of the fact that different rows in an image are seeing the world at slightly different instances in time to do things like high-speed photography, HDR imaging, etc.

[1] http://en.wikipedia.org/wiki/Uncanny_valley

[2] http://www.cs.columbia.edu/CAVE/projects/crsp/


> Finally, I should point out that rolling shutter, standard on most mobile cameras, is causing all sorts of problems for traditional image and video analysis algorithms, which often make the assumption, sometimes implicitly, that the entire frame was captured at a single instant in time. This is not true anymore, and can lead to gross errors in many methods. Hence the many recent papers on correcting for, and in some cases exploiting, rolling shutter effects.

This is not just true of mobile phones, but of any current imaging device with a CMOS sensor (most of those on the market), compact cameras and SLRs included.


You can try something similar on linux using transcode with these two lines:

    transcode -J stabilize --mplayer_probe -i $infile                       # pass 1: detect camera shake
    transcode -J transform --mplayer_probe -i $infile -y xvid4 -o $outfile  # pass 2: apply transforms and encode
I found this blog post with more info: http://kevin.deldycke.com/tag/transcode/


Just as a note for anyone using this, you'll want to read some of the options as well. The amount of smoothing is hard to get right automatically, and you will want different values for different effects.

Having said that, my results with this tool have been excellent in the past.


I really hope this keeps being optional, because otherwise a lot of the authentic value of the videos will be lost. Also, the demo they showed looked like it was algorithmically deteriorated to make the change more noticeable. There's shaky hands and then there's Parkinson's-level shaking, which is what the demo showed...


"...algorithmically deteriorated..."

Or the person holding the camera was shaking like mad to demonstrate the abilities of the algorithm. Artistically speaking, shake removal is also authenticity removal. But most times, personal videos are shot by people with no eye for framing and stability. And most times, artistic (and professional) videos are shot with an eye toward these things.

I can't foresee a reason not to keep this feature as an option rather than enforcing it on all uploads.


If you look carefully, the shakes appear too uniform to be a person shaking the camera. It looks like the camera always "shakes" in a circular motion around the center of the original camera location.


Probably a staged setup, inducing a regular shaking pattern, but I think generating rolling shutter algorithmically would be a stretch for them, so I'd call it physical, if perhaps deliberate, shake.


It's much easier to record it without much shake and then maximize the effect of their algorithm after the fact, instead of recording a bunch of versions with varying degrees of shake. They completely overdid it in my opinion; the example doesn't seem authentic at all to me (though the technology remains cool).


Or maybe it's due to the 'rolling' shutter causing shear stretching as mentioned in the article


But the point is that if they faked the shaking algorithmically, then the original signal would be shear-free, right?


Sometimes you want authentic value, and sometimes motion sickness isn't what you were aiming for in your video. This gives video makers more tools to express their vision.

Edit: also what's new here isn't the stabilization, it's that they will fix "rolling shutter" artifacts in each frame as well. Rolling shutter is something photographers generally dislike.


Of course it's optional. You can choose to apply it in the editor if you wish, and can vary how much shaking is removed.


If they used a telephoto lens then the shaking would be much more noticeable. Also, the motion hints at someone actually attempting to correct for the movements, but because it's zoomed so far in, every movement to counteract the shaking is greatly magnified.


Haven't you ever seen the Zapruder film?


That video demo was quite impressive.

I've used video stabilizer filters in VirtualDub, but I doubt they could fix as much as was done in that demo.

Another interesting stabilization demo http://www.youtube.com/watch?v=_Pr_fpbAok8


I'd like to see this come to Android so you can automatically shoot and record videos without the shaking in them. It would be a great selling point in my opinion.


You should already be starting to see this roll out: the iPhone 4S does video stabilisation, and I am sure high-end Android phones do or will start to do the same.

The algorithm being discussed here is specifically designed for when information about the camera or environment is not available: there are much better ways of carrying out digital image stabilisation on the device itself, such as using the accelerometer data to compensate, or in significantly advanced cameras (DSLRs, for example) compensating by moving the lens itself.


The iPhone 4S algorithm is far less aggressive than YouTube's.


I wonder when we will get this quality: http://pages.cs.wisc.edu/~fliu/project/3dstab.htm


If you read the Google paper, you'll notice that they actually refer to this and other work by Liu et al. The overall technique is the same: estimate the original camera path, calculate an optimal camera path, and retarget the input frames to a crop window that fits the optimal path.

The primary difference seems to be the estimation and calculation technique. Liu's work does a structure-from-motion reconstruction, i.e. rebuilding a 3D model of the original scene. Google's work uses something called pyramidal Lucas-Kanade to do 'feature tracking' instead. This is a sort of localized reconstruction; it only seems to care about the viewport differences from frame to frame. They then feed it through some linear programming voodoo to get the best path.
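
As a rough sketch of that estimate/smooth/crop structure (not their implementation: this uses OpenCV's pyramidal Lucas-Kanade for tracking, translation-only motion, and a simple moving-average smoother standing in for the linear-programming 'optimal path'; the stabilize() helper and its parameters are made up for illustration):

    import cv2
    import numpy as np

    def stabilize(in_path, out_path, smooth_radius=15, crop=0.9):
        cap = cv2.VideoCapture(in_path)
        ok, prev = cap.read()
        prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

        # 1. Estimate the original camera path as cumulative per-frame
        #    translations, using features tracked with pyramidal Lucas-Kanade.
        frames, steps = [prev], []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                          qualityLevel=0.01, minDistance=30)
            nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
            good = status.ravel() == 1
            steps.append(np.median(nxt[good] - pts[good], axis=0).ravel())
            frames.append(frame)
            prev_gray = gray
        path = np.cumsum(steps, axis=0)

        # 2. "Optimal" path: here just a moving average of the estimated path.
        kernel = np.ones(2 * smooth_radius + 1) / (2 * smooth_radius + 1)
        smooth = np.column_stack([np.convolve(path[:, i], kernel, mode='same')
                                  for i in range(2)])

        # 3. Retarget each frame into a crop window that follows the smooth path
        #    (this is why the stabilized result looks slightly magnified).
        h, w = frames[0].shape[:2]
        ch, cw = int(h * crop), int(w * crop)
        out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*'mp4v'),
                              30, (cw, ch))      # output fps assumed to be 30
        for i, frame in enumerate(frames[1:]):
            dx, dy = smooth[i] - path[i]         # correction toward the smooth path
            M = np.float32([[1, 0, dx], [0, 1, dy]])
            warped = cv2.warpAffine(frame, M, (w, h))
            x0, y0 = (w - cw) // 2, (h - ch) // 2
            out.write(warped[y0:y0 + ch, x0:x0 + cw])
        out.release()

The real system's quality comes from what this sketch leaves out: the linear-programming path optimization in step 2 and handling of more than pure translation; the sketch is only meant to show the estimate/smooth/crop structure.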

I don't understand either well enough to say why one is better than the other, although I'd guess it's because Lucas-Kanade is temporally and physically localized, making it easier to farm out to a parallel cluster than an SfM technique.

There also seems to be a difference at the back end of the technique: having feature detection allows them to add 'saliency' constraints, i.e. retargeting based on the inclusion of certain features, like a person's face. Again, the math is beyond my understanding, but it seems like this isn't part of Liu's work.


Have you tried it? I have and I'd say the quality is pretty close if not the same. The much bigger problem to solve now is that shaky videos shot in less-than-perfect lighting contain motion blur, which is extremely hard to remove. You'll notice that all of these demo videos were conveniently shot outside in direct sunlight and contain no motion blur at all.


I have only tried the one offered by YouTube, and not recently. I don't know if they have improved the algorithm in this respect, but from what I've seen in the past, the filter often creates a very eerie wobbly effect on the video, an effect that makes it look fake, like being underwater or drunk. It is slightly visible in the demo video if you observe the borders. This strange effect is totally absent from the link I posted, which I believe is on a different level of quality. But I imagine it's computationally very expensive and can't be offered to millions of users for free.


An eerie wobbly effect is definitely visible to me in the videos from that paper.


Thanks for that link, this should have a submission of its own. I can easily see how something like this would make a 'point and shoot' video camera really useful. Think "Flip Camera meets James Cameron"


This is amazing. Wonder if there is a commercial implementation yet?


These guys do some pretty amazing stuff, video stabilization is one of the applications: http://www.2d3.com/


Exactly, I also thought of 2d3 as soon as I saw that blog post. I came to know the company since it is linked by the (very valuable, if I may) book "Multiple View Geometry in Computer Vision", http://www.amazon.com/Multiple-View-Geometry-Computer-Vision... . IIRC the company is run by Andrew Fitzgibbon http://research.microsoft.com/en-us/um/people/awf/ , which now seems odd to me since he's a full-time researcher at Microsoft. I wonder how Andrew's algorithm in the 2d3.com product differs from the one published by the YouTube guys in the post.


They should apply this to movie fight scenes. Then we might actually see the fight instead of the blur caused by "exciting cameras".


Going off on a tangent here, but from the distant memory of my film studies degree days one of the reasons you get so much fast cutting and hard-to-make-out action in modern fight scenes is because choreographing and shooting a fight scene properly is hard work, particularly if your actors aren't that experienced in stage combat. It's a big cheat, designed to make shooting fight scenes much easier (this is especially true if you're shooting a fight-scene where one participant is CG'd in).


Higher frame rates would help a lot with the blur. I think Peter Jackson made a mistake shooting The Hobbit at 48 FPS for the entire movie. He should have shot most of it at the traditional 24 FPS but used 48 or 72 for fast motion shots. Hopefully his blunder won't poison high FPS forever in the minds of filmgoers.


> Higher frame rates would help a lot with the blur. I think Peter Jackson made a mistake shooting The Hobbit at 48 FPS for the entire movie. He should have shot most of it at the traditional 24 FPS but used 48 or 72 for fast motion shots. Hopefully his blunder won't poison high FPS forever in the minds of filmgoers.

You can't shoot parts of a film at 24FPS, and parts at 48FPS - the 48FPS parts would be transformed down to 24FPS and would appear to be in "slow motion".

Jackson, for what it's worth, is sticking to his guns re: 48FPS and believes that part of the dislike is because it's "change".


You absolutely can shoot parts of a film in 24 FPS and parts at 48 FPS. Instead of transforming the 48 FPS parts to 24 FPS, you do the other way around and transform the 24 FPS parts to 48 FPS, not by doubling the speed but by repeating each frame twice. In fact, film projectors have always displayed movies at 48 FPS with frame doubling to reduce the appearance of flicker: http://en.wikipedia.org/wiki/Frame_rate#Background
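
If it helps to see the frame doubling concretely, here's a minimal sketch (the filenames are made up, and it uses OpenCV just to keep it short; real post-production would do this in the edit/conform step, not with a script):

    import cv2

    cap = cv2.VideoCapture("clip_24fps.mp4")           # hypothetical 24 fps source
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter("clip_48fps.mp4",
                          cv2.VideoWriter_fourcc(*'mp4v'), 48, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out.write(frame)
        out.write(frame)    # each source frame becomes two 48 fps frames
    out.release()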


I wish my tv/movie playback devices had a "de-lensflare" filter.


Are you referring to JJ Abrams' recent flirtation with anamorphic lenses?


Protip: add ?m=1 to the end of blogger urls to remove the pointless javascript bloat.


Isn't quality inherently lost because the same video has to be re-encoded, just without the shakes? Also, I just tested it on a video and it looked slightly smudgy. OK, so if I am filming driving down a dirt road or after half a bottle of Jack Daniel's (or both) then it'd be good; otherwise it does more harm than good.


As I understand it, the motion blur is a product of lossy compression (the CCD has very short pixel-local exposure times; the shearing the article refers to appears when sweeping out the whole image), which means that yes, stabilisation algorithms would work best with source data that hasn't been compressed using a perceptual model of motion blur.


What's the next step? Include accelerometer data from the camera, synced with the video, to use as support for stabilizing algorithms.
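
A minimal sketch of that idea (entirely hypothetical: made-up gyro inputs and helper names, and only correcting roll about the lens axis, which is a big simplification of what real sensor-assisted stabilization would do):

    import numpy as np
    import cv2

    def roll_at_frames(gyro_t, gyro_z, frame_times):
        # Integrate angular velocity about the lens axis (rad/s) into a roll
        # angle (rad), then sample it at each frame timestamp. gyro_t, gyro_z
        # and frame_times would come from the recording device's metadata.
        roll = np.concatenate([[0.0], np.cumsum(np.diff(gyro_t) * gyro_z[:-1])])
        return np.interp(frame_times, gyro_t, roll)

    def counter_rotate(frame, roll_rad):
        # Apply the opposite rotation to cancel the measured camera roll
        # (the sign convention depends on the sensor's axis orientation).
        h, w = frame.shape[:2]
        M = cv2.getRotationMatrix2D((w / 2, h / 2), -np.degrees(roll_rad), 1.0)
        return cv2.warpAffine(frame, M, (w, h))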


That's a really good idea, actually. Maybe the video container format could add that metadata support.


Submitting video to a streaming service makes your content look better.

Shows how far we are in this whole cloud era.


Automatically?

sigh



