The Technology Behind Hyperlapse from Instagram (instagram-engineering.tumblr.com)
167 points by peterhunt on Aug 27, 2014 | 48 comments



Before we repeat the same discussion[1] about Microsoft's recent hyperlapse prototype, note that the term hyperlapse was not coined by Microsoft[2].

Also note that they are using entirely different methods: Microsoft's approach analyzes the content of individual frames to model the camera's position and rotation in 3D space, while Instagram uses the iPhone's gyroscope.

[1] https://news.ycombinator.com/item?id=8227321

[2] http://en.m.wikipedia.org/wiki/Hyperlapse

via https://news.ycombinator.com/item?id=8227330

edit: On a side note, I'd like to point out that each approach has very interesting merits:

Microsoft's method can be applied to any video file, but it requires substantially more CPU time. Since Instagram isn't analyzing video content, but instead cropping the input video based on gyroscope data, it requires much less processing and can be done on the fly on a mobile device.

Microsoft's method can even render an entire scene from its 3D model (see their mountain-climbing example, where the helmet-mounted camera looks in an entirely different direction), while Instagram's method crops input video frames based on gyroscope data. This makes Microsoft's method more robust, but much less practical on a mobile device. Each method makes different trade-offs.
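
To make the gyro-based approach concrete, here is a minimal Swift sketch of the idea (an editorial illustration, not Instagram's actual code; the type, function, and parameter names are made up): integrate the gyroscope into a per-frame orientation, low-pass filter it into a smooth virtual camera path, and turn the angular difference into a per-frame pixel crop offset.

  // Hypothetical illustration, not Instagram's code.
  struct FrameOrientation {
      var yaw: Double    // radians, integrated from gyroscope samples
      var pitch: Double
  }

  func cropOffsets(for orientations: [FrameOrientation],
                   focalLengthPixels: Double,
                   smoothing: Double = 0.95) -> [(dx: Double, dy: Double)] {
      guard let first = orientations.first else { return [] }
      var smoothYaw = first.yaw
      var smoothPitch = first.pitch
      return orientations.map { o in
          // Exponential low-pass filter stands in for the smooth virtual camera path.
          smoothYaw = smoothing * smoothYaw + (1 - smoothing) * o.yaw
          smoothPitch = smoothing * smoothPitch + (1 - smoothing) * o.pitch
          // Small-angle approximation: angular error times focal length ~ pixel shift.
          return (dx: (o.yaw - smoothYaw) * focalLengthPixels,
                  dy: (o.pitch - smoothPitch) * focalLengthPixels)
      }
  }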


The Microsoft project is technically impressive. But I think John Gruber was onto something when he wrote, "But while Microsoft was publishing research, Instagram was building an app that will soon be in the hands of hundreds of thousands of people."

My impression of Microsoft Research is that they are like PARC: doing a lot of impressive research that rarely gets turned into compelling products.

The Kinect was a big payoff from MSR. What are some others?


Kinect was something like the fastest selling accessory of all time, so you can't expect too many hits at that level. But there have certainly been many other consumer-visible successes, like Excel's flash fill or Skype's upcoming real-time translation. Then there are enterprise things that consumers may be unaware of, like SQL Server's Hekaton technology. And beyond that, there are lots of additional projects that meaningfully translate to value to Microsoft but that aren't really externally visible, like speech recognition improvements, search engine ranking techniques, datacenter networking improvements, efficient algorithms for traffic routing and prediction, etc.


Generics in the CLR are a direct result of MSR's work; .NET would not have the nice generics it has without them. It even came down to specific deadlines and code deliveries that MSR had to make.

The same team also delivered F#, which might have a large impact on keeping .NET competitive. Not only because features trickle down into C# (like async), but also because there may be lots of people staying on .NET only because it gives them a good language option. (That's just my own impression. I'd personally leave .NET without F#. I'm probably overestimating the impact.)


I don't follow their product releases very closely, but I was excited about Photosynth. I feel like the underlying research has been put to use in subtle ways in various products (possibly even Kinect?). I'm not sure if Deep Zoom went anywhere, but their initial implementation leaned on Silverlight.

As a research division, they are expected to focus on long plays by developing enabling technologies. Packaging that prototype in a simple mobile app for use by hundreds of thousands of people, for free, wouldn't move the needle very much for MS as an organization.

I'm not trying to defend MS/MSR, just trying to be pragmatic. It's up to MS leadership to incorporate cutting edge research into meaningful products.


Photosynth is still around. The current version is more limited than it used to be: it only supports straight lines and circular panoramas. But they're working on a more impressive update; here's a preview: http://photosynth.net/preview/view/df869f96-2765-4939-8eb3-2... You can zoom in and out, pan around, and hit 'c' to see a wireframe.

Edit: you can also make Deep Zoom images offline in the Image Composite Editor, but that's been around for a long time. https://research.microsoft.com/en-us/um/redmond/groups/ivm/I...


The TrueSkill ranking system was incorporated into Xbox Live: http://research.microsoft.com/en-us/projects/trueskill/


I have no doubt they're overall a very small research producer, but Facebook's publications[0] seem to be really heavily based on things they've actually built. I won't make a judgement on the kind of work MS produces, but there's definitely something to be said for building something awesome and then sharing the technical information.

[0] https://www.facebook.com/publications


Kinect was not an MSR product.[1]

[1] http://en.wikipedia.org/wiki/PrimeSense


Kinect v1.0 was not an MSR product, although MSR did assist with the software. Kinect v2.0, however, was developed wholly at Microsoft, and again MSR assisted with significant portions of the development.


The pose detection was from MSR and was technically very impressive at the time. Jamie Shotton and colleagues did the work: PrimeSense provided the RGBD camera, and Shotton et al. did the skeleton extraction.


Thanks. I was wondering where I got that impression, and it looks like MSR does take some credit for working with that technology.

http://research.microsoft.com/en-us/about/brochure-7.aspx


In the Microsoft method, it would be reasonably easy to approximate the structure-from-motion (SfM) step with information from the gyro sensor. From their paper, SfM seems to be one of the most expensive steps.


Here's Alex Karpenko's 2011 paper on the technology[1], which was the basis for Luma[2] (by Midnox, YC W11), which was then acquired by Instagram.

There's some additional interesting information in the paper's corresponding video overview[3]. For instance, it also does rolling-shutter correction by treating each row of a frame as captured at a slightly different time and computing a separate warp for each row (a sketch of the per-row idea follows the links below).

There's also an early version of the source code available on github: https://github.com/alexgo1/Video-Stabilization

[1] http://graphics.stanford.edu/papers/stabilization/karpenko_g...

[2] http://luma.io/

[3] https://www.youtube.com/watch?v=I54X4NRuB-Q
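
A minimal Swift sketch of that per-row rolling-shutter idea (an illustration only, not Karpenko's implementation; the function and parameter names are assumptions): each row of a frame gets its own capture time within the sensor readout window, and the gyro-derived yaw at that time determines a per-row horizontal shift.

  // Hypothetical sketch: rows are exposed top to bottom, so row r is captured
  // slightly later than row 0; sample the orientation at each row's time.
  func rowShifts(frameTimestamp: Double,
                 rowCount: Int,
                 readoutDuration: Double,       // time to read the full sensor (assumed known)
                 focalLengthPixels: Double,
                 yawAt: (Double) -> Double) -> [Double] {  // interpolated gyro yaw in radians
      let referenceYaw = yawAt(frameTimestamp)
      return (0..<rowCount).map { row in
          let t = frameTimestamp + (Double(row) / Double(rowCount)) * readoutDuration
          return (yawAt(t) - referenceYaw) * focalLengthPixels
      }
  }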


I mounted my iPhone 5 on my bike this morning and took a Hyperlapse recording of the 15-minute ride from home to the office.

When I reached the office and stopped the recording, the app just hung with the text "Processing". I left it processing for about 4 hours with no progress bar or anything like it. I ended up giving up, closing the application, and rebooting the iPhone. When I launched the app again, it showed an error similar to "something bad happened" with a couple of options.

I don't think I'll be using this program again.


That seems like a pretty extreme response. I know most apps only get one chance to fail before people uninstall them, but giving up on the whole app seems a bit much. It's new technology, and not trivial either. Why not give them a little more chance to improve before entirely shooting it down?


Yeah, because normally programs are perfect on launch, and the best way to drive progress is to quit using them forever after the first bug...

I'm not even sure what this response has to do with the hacker mentality...


I'm curious to know how Toronto got to be the demo city. Facebook has a non-engineering presence there but I was surprised to see it on their engineering blog!


Both Alex and I are from Toronto (we are the engineers on Hyperlapse). I shot it on a little visit at the start of the summer and it became our go-to demo. That water taxi is particularly windy, bumpy and beautiful.


Cool, you built Waterloo Course Qualifier! I remember using that. Glad to hear that Instagram has an engineering presence in Toronto.


(Too late to edit this, but Thomas clarified that the team is from Toronto but based in Menlo Park)


This tech was originally built by the Midnox[1] team from Waterloo/Toronto (YCW12), and later acquired by Instagram.

[1] http://blog.luma.io/


Just in case someone does not remember the discussion around http://steady.stupeflix.com/ (here: https://news.ycombinator.com/item?id=8048413), I am posting the links. I came away really impressed by their page, but couldn't try it out for lack of a device. I'd be eager to hear reactions to Hyperlapse from those guys.

I wonder how troublesome it is to keep the gyro stream and the frame stream in sync.


Hi srean! One of the authors of Steady speaking. The Hyperlapse tech looks a lot like Steady, because when you are using the same gyro data, you're quite bound to follow the same maths to compensate for camera shake. About the gyro stream / frame stream sync: it's very important to have it as precise as possible. We have seen that just 1 ms of sync error is already visible to the naked eye, so you have to build a very accurate model of the timing of the CMOS and gyro sensors, taking into account exposure, rolling shutter, and sometimes software/hardware glitches lower in the technology stack... Hope it gives you some clues about that kind of tech!
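
To illustrate the timing model described above, here is a rough Swift sketch (an editorial illustration with assumed names and offsets, not Steady's or Hyperlapse's code): shift each frame timestamp to mid-exposure, add a per-device calibrated offset, and linearly interpolate the gyro samples at that time.

  // Hypothetical sketch of gyro/frame synchronisation.
  struct GyroSample {
      var timestamp: Double   // seconds, same clock as the frame timestamps
      var yaw: Double         // radians
  }

  func yawAtFrame(frameTimestamp: Double,
                  exposureDuration: Double,
                  calibratedOffset: Double,   // fixed per device, found by calibration
                  samples: [GyroSample]) -> Double? {
      // Frame timestamps often mark the start of exposure; the effective capture
      // time is closer to the middle of the exposure window.
      let t = frameTimestamp + exposureDuration / 2 + calibratedOffset
      guard let i = samples.firstIndex(where: { $0.timestamp >= t }), i > 0 else {
          return samples.last?.yaw
      }
      let a = samples[i - 1], b = samples[i]
      let alpha = (t - a.timestamp) / (b.timestamp - a.timestamp)
      return a.yaw + alpha * (b.yaw - a.yaw)
  }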


Hi madlag, thanks for your answer, and wishing you all the best for Steady. I wonder if a tap or something like it at the beginning could be used to synchronize, much like the clapper in movies. On the flip side, that may not be good for usability.


You're welcome, and thanks for the encouragement! Actually, once you get the timings right you don't need any "runtime" adjustments anymore; everything is fixed in advance. You just have to get the calibration data for every device, of course. By the way, as you said, you can use a clap to synchronize two videos automatically, using just the audio track, but that's another story ;-)


I wonder which will come first: a general-purpose product like Microsoft's, or GoPro and others integrating gyroscopes into their products to take advantage of similar technology.

The MS presentation really resonated with me, since I take a lot of video on my mountain-biking trips but much of it is either uninteresting or hard to watch. I don't think I'll ever strap my iPhone to my chest, so I'd like an action-cam setup that can also deliver these kinds of videos.

In any case, this 'hyperlapse' processing is cool and I want to use it (just not on Instagram).


The biggest downside to the MSR method that I saw was the visual artifacts, where small details like rocks on the ground would pop in and out of existence. It's not that bad, but for footage that Instagram's method can smooth, I'd prefer to use it. I think a mixture of both methods would be most useful: MSR's method could be made less computationally intense by a good stream of IMU data giving it the camera path, and Instagram's method could use some of MSR's frame-selection algorithms.


Does anyone else find it strange that the IG engineering blog is on Tumblr?


Where else would they blog?


Since Facebook bought them, how about Facebook?


Facebook has a really bad 'blogging' interface. Every time I see an FB engineering post/blog, it is jarring and seems out of place.


Facebook.


On a side note, what are the current options for iOS video stabilisation? I assume Instagram's Cinema framework is proprietary, so are there any open-source alternatives available?


Cool, it's basically the same thing Google did last year with YouTube: they have the same basic algorithm and automatically applied it to videos, only YouTube had to do image analysis with a lot of CPU power, whereas Instagram can just use the gyroscope data.


The results are quite impressive. But why are they not combining the gyro data with that from the accelerometer? That should theoretically give superior results (i.e. sensor fusion). Or maybe they do, but simplified it for this write-up.


AFAIK, the gyro data on the iPhone is already sensor-fused with the accelerometer.


You can get fused attitude information from Core Motion, with some smoothing and therefore lower time resolution, but you can also get the raw data from the gyro / accelerometer.
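
For reference, both are exposed through Core Motion; a minimal sketch using standard CMMotionManager calls (the update rates here are arbitrary choices):

  import CoreMotion

  let motion = CMMotionManager()

  // Fused attitude from the device-motion interface (gyro + accelerometer,
  // combined by the OS).
  motion.deviceMotionUpdateInterval = 1.0 / 100.0
  motion.startDeviceMotionUpdates(to: .main) { data, _ in
      guard let attitude = data?.attitude else { return }
      print("fused yaw/pitch/roll:", attitude.yaw, attitude.pitch, attitude.roll)
  }

  // Raw gyroscope samples (angular velocity, no fusion or smoothing).
  motion.gyroUpdateInterval = 1.0 / 100.0
  motion.startGyroUpdates(to: .main) { data, _ in
      guard let rate = data?.rotationRate else { return }
      print("raw rotation rate:", rate.x, rate.y, rate.z)
  }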


I tried this indoors at my company, as I don't have an iDevice. It didn't work all that well and I was completely underwhelmed. The lighting was poor, but I don't understand how poor lighting can affect this algorithm.


Low lighting means long exposure times, which means motion-blur trails that look weird once we run them through our stabilizer. Hope you get the chance to try it in brighter lighting at some point.


Try watching some videos from normal users.

Also, why try it in poor lighting? 90% of video/photography is about having appropriate lighting.


Is there an open source version of either something like Cinema or what MSR did? I'd really like to be able to post-process some of my gopro footage.


So, what is the reason to use the gyroscope data? Can't frame-to-frame pixel comparisons get you the information about how the frame moves?


Partially, and that's what the MSR Hyperlapse program did. That method is extremely expensive computationally, so why do it when the device doing the recording has a good gyroscope available?


Yeah, at the cost of a huge delay to do the calculations on a mobile CPU.

This bypasses that.


Excellent idea for an app. I tried it out yesterday and it is really nice. Very simple user interface.


gyroLens did a similar thing (2010) but they never extended it to shooting video.

http://hunter.pairsite.com/mobile/gyrolens/help/index.html



