Before we repeat the same discussion[1] about Microsoft's recent hyperlapse prototype, note that the term hyperlapse was not coined by Microsoft[2].
Also note that they are using entirely different methods. Microsoft's approach analyzes the content of individual frames to model the camera's position and rotation in 3D space while Instagram uses the iPhone's gyroscope.
edit: On a side note, I'd like to point out that each approach has very interesting merits:
Microsoft's method can be applied to any video file, but requires substantially more CPU time. Since Instagram isn't analyzing video content, but instead cropping the input video based on gyroscope data, it requires much less processing time and can be done on the fly on a mobile device.
Microsoft's method can even render an entire scene from its 3D model (see their mountain climbing example where the helmet-mounted camera looks in an entirely different direction) while Instagram's method crops input video frames based on gyroscope data. This makes Microsoft's method more robust, but much less practical on a mobile device. Each method makes different trade-offs.
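To make that concrete, here is a rough sketch of what a gyro-driven stabilizer boils down to: low-pass the orientation track, then warp each frame toward the smoothed pose and crop away the exposed borders. This is only my own illustration of the idea (rotation-only camera model, crude averaging filter, made-up function names), not Instagram's actual pipeline:

    # Illustrative gyro-based stabilization sketch (NOT Instagram's code).
    # Assumes a rotation-only camera model and a known intrinsic matrix K.
    import numpy as np
    import cv2

    def smooth_orientations(rotations, window=15):
        """Low-pass a sequence of 3x3 rotation matrices by naive averaging,
        then project back onto the nearest rotation via SVD (a real system
        would work with quaternions)."""
        smoothed = []
        n = len(rotations)
        for i in range(n):
            lo, hi = max(0, i - window), min(n, i + window + 1)
            avg = np.mean(rotations[lo:hi], axis=0)
            u, _, vt = np.linalg.svd(avg)
            smoothed.append(u @ vt)
        return smoothed

    def stabilize(frames, rotations, K, crop=0.1):
        """Warp each frame as if it were shot along the smoothed camera path,
        then crop the borders exposed by the warp."""
        target = smooth_orientations(rotations)
        h, w = frames[0].shape[:2]
        out = []
        for frame, R_real, R_smooth in zip(frames, rotations, target):
            R_corr = R_smooth @ R_real.T           # rotation from real to smoothed pose
            H = K @ R_corr @ np.linalg.inv(K)      # pure-rotation homography
            warped = cv2.warpPerspective(frame, H, (w, h))
            dx, dy = int(w * crop), int(h * crop)
            out.append(warped[dy:h - dy, dx:w - dx])
        return out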
The Microsoft project is technically impressive. But I think John Gruber was onto something when he wrote, "But while Microsoft was publishing research, Instagram was building an app that will soon be in the hands of hundreds of thousands of people."
My impression of Microsoft Research is that they are like PARC: doing a lot of impressive research that rarely gets turned into compelling products.
The Kinect was a big payoff from MSR. What are some others?
Kinect was something like the fastest selling accessory of all time, so you can't expect too many hits at that level. But there have certainly been many other consumer-visible successes, like Excel's flash fill or Skype's upcoming real-time translation. Then there are enterprise things that consumers may be unaware of, like SQL Server's Hekaton technology. And beyond that, there are lots of additional projects that meaningfully translate to value for Microsoft but that aren't really externally visible, like speech recognition improvements, search engine ranking techniques, datacenter networking improvements, efficient algorithms for traffic routing and prediction, etc.
Generics in the CLR are a direct result of MSR's work. .Net would not have the nice generics it has without them. It even came down to specific deadlines and code deliveries MSR had to make.
The same team also delivered F#, which might have a large impact on keeping .Net competitive: not only because features trickle down into C# (like async), but also because there may be lots of people only staying on .Net due to having a good language option. (That's just my own impression. I'd personally leave .Net without F#. I'm probably overestimating the impact.)
I don't follow their product releases very closely, but I was excited about Photosynth. I feel like the underlying research has been put to use in subtle ways in various products (possibly even Kinect?). I'm not sure if Deep Zoom went anywhere, but their initial implementation leaned on Silverlight.
As a research division, they are expected to focus on long plays by developing enabling technologies. Packaging that prototype in a simple mobile app for use by hundreds of thousands of people, for free, wouldn't move the needle very much for MS as an organization.
I'm not trying to defend MS/MSR, just trying to be pragmatic. It's up to MS leadership to incorporate cutting edge research into meaningful products.
Photosynth is still around. The current version is more limited than it used to be; it only supports straight lines and circular panoramas. But they're working on a more impressive update. Here's a preview: http://photosynth.net/preview/view/df869f96-2765-4939-8eb3-2... You can zoom in and out, pan around, and hit 'c' to see a wireframe.
I have no doubt they're overall a very small research producer but Facebook's stuff[0] seems to be really heavily based on stuff they've actually done. I won't make a judgement on the kind of stuff MS produce but there's definitely something to be said for building something awesome and then sharing the technical information.
Kinect v1.0 was not an MSR product, although MSR did assist with the software. Kinect v2.0, however, was wholly developed at Microsoft, and again MSR assisted with significant portions of the development.
The pose detection was from MSR and was technically very impressive at the time. Jamie Shotton and colleagues did the work. PrimeSense provided the RGBD camera; Shotton et al. did the skeleton extraction.
In the Microsoft method, it would be reasonably easy to approximate their SFM step with info from the gyro sensor. From their paper, SFM seems to be one of the most expensive steps.
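Roughly, you would integrate the gyro's angular velocity into per-frame orientations and hand those to the rest of the pipeline in place of the rotations SFM would have estimated. A toy sketch of that integration step (ignoring gyro bias and translation entirely; the names are mine, not from either paper):

    # Integrate gyro samples into orientations, as a cheap stand-in for the
    # rotation part of structure-from-motion. Illustrative only.
    import numpy as np

    def rotation_from_gyro(omega, dt):
        """One angular-velocity sample (rad/s, length 3) over dt seconds,
        converted to a small rotation matrix via the Rodrigues formula."""
        theta = np.linalg.norm(omega) * dt
        if theta < 1e-9:
            return np.eye(3)
        axis = omega / np.linalg.norm(omega)
        K = np.array([[0, -axis[2], axis[1]],
                      [axis[2], 0, -axis[0]],
                      [-axis[1], axis[0], 0]])
        return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

    def integrate_gyro(gyro_samples, gyro_times):
        """Accumulate samples into an orientation at every sample time;
        look these up at each video frame's timestamp."""
        R = np.eye(3)
        orientations = [R]
        for i in range(1, len(gyro_samples)):
            dt = gyro_times[i] - gyro_times[i - 1]
            R = R @ rotation_from_gyro(gyro_samples[i - 1], dt)
            orientations.append(R)
        return orientations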
Here's Alex Karpenko's 2011 paper on the technology[1] which was the basis for Luma[2] (by Midnox YC W11) which was then acquired by Instagram.
There's some additional interesting information in the paper's corresponding video overview[3]. For instance, it also does rolling shutter correction by treating each row of the video frame as a separate time period and computing a separate warp for each.
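Conceptually that correction looks something like the sketch below: look up an orientation for each scanline's capture time and remap every row with its own small homography. This is just a toy illustration of the idea, not the paper's implementation, and the helper names (orientation_at, etc.) are hypothetical:

    # Toy row-wise rolling-shutter correction: each scanline is exposed at a
    # slightly different time, so each row gets its own orientation and warp.
    # Naive per-row loop, plenty slow, but it shows the idea.
    import numpy as np
    import cv2

    def correct_rolling_shutter(frame, frame_start, readout_time, K,
                                orientation_at):  # callable: time -> 3x3 rotation
        h, w = frame.shape[:2]
        R_ref = orientation_at(frame_start + readout_time / 2.0)  # mid-frame pose
        map_x = np.zeros((h, w), np.float32)
        map_y = np.zeros((h, w), np.float32)
        cols = np.arange(w, dtype=np.float32)
        for row in range(h):
            t_row = frame_start + readout_time * row / h   # when this row was read out
            R_row = orientation_at(t_row)
            H = K @ (R_ref @ R_row.T) @ np.linalg.inv(K)   # per-row homography
            # For each output pixel in this row, find its source in the input frame.
            pts = np.stack([cols, np.full(w, row, np.float32), np.ones(w, np.float32)])
            src = np.linalg.inv(H) @ pts
            map_x[row] = src[0] / src[2]
            map_y[row] = src[1] / src[2]
        return cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)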
I mounted my iPhone 5 on my bike this morning and took a Hyperlapse recording on the bike ride from home to the office (15 minutes).
When I reached the office and stopped the recording, the app would just hang with the text "Processing". I left it processing for about 4 hours, with no progress bar or anything like it. I ended up giving up, closing the application and rebooting the iPhone. When I launched the program again, it showed an error similar to "something bad happened" with a couple of options.
I don't think I'll be using this program again.
That seems like a pretty extreme response. I know most apps only get one chance to fail before people uninstall them, but giving up on the whole app seems a bit much. It's new technology, and not trivial either. Why not give them a little more chance to improve before entirely shooting it down?
I'm curious to know how Toronto got to be the demo city. Facebook has a non-engineering presence there but I was surprised to see it on their engineering blog!
Both Alex and I are from Toronto (we are the engineers on Hyperlapse). I shot it on a little visit at the start of the summer and it became our go-to demo. That water taxi ride is particularly windy, bumpy, and beautiful.
Just in case someone does not remember the discussion around http://steady.stupeflix.com/ (here: https://news.ycombinator.com/item?id=8048413), I am posting the links. I came away really impressed by their page, but couldn't try it out for lack of a device. I would be eager to hear reactions about Hyperlapse from those guys.
I wonder how troublesome it is to keep the gyro stream and the frame stream in sync.
Hi srean! One of the authors of Steady speaking.
Hyperlapse's tech looks a lot like Steady's, because when you are using the same gyro data, you're quite bound to follow the same math to compensate for the camera shake.
About the gyro stream / frame stream sync, it's very important to have it as precise as possible. We have seen that having just 1 ms of error in sync is already very visible to the naked eye.
So you have to build a very accurate model of the timing of the CMOS and gyro sensors, taking into account exposure, rolling shutter, and sometimes software/hardware glitches lower in the stack.
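To give a feel for it, here's a toy version of such a timing model. All of the numbers are made up for illustration; the real values come from per-device calibration:

    # Toy timing model for camera/gyro sync. Every constant here is
    # hypothetical; real values come from per-device calibration.
    # Why 1 ms matters: panning at 100 deg/s, a 1 ms error misattributes
    # 0.1 deg of rotation. With a ~60 deg horizontal FOV on a 1920-pixel-wide
    # frame, that's about 1920 / 60 * 0.1 = ~3 pixels of residual jitter,
    # which is easily visible.
    def row_capture_time(frame_timestamp, row, frame_height,
                         exposure=1 / 120.0,       # shutter time (s), made up
                         readout=0.030,            # rolling-shutter readout (s), made up
                         gyro_clock_offset=0.002): # camera-vs-gyro clock skew (s), made up
        """Estimate when a given scanline was (mid-)exposed, on the gyro's clock."""
        row_start = frame_timestamp + readout * row / frame_height
        mid_exposure = row_start + exposure / 2.0
        return mid_exposure + gyro_clock_offset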
Hope it gives you some clues about that kind of tech!
Hi madlag, thanks for your answer, and wishing you all the best for Steady. I wonder if a tap or something like that at the beginning could be used to synchronize, much like the clapper in movies. On the flip side, that may not be good for usability.
You're welcome and thanks for the encouragement !
Actually, once you get the timings right, you don't need any runtime adjustments anymore; everything is fixed in advance.
You just have to get the calibration data for every device, of course.
By the way, as you said, you can use a clap to automatically synchronize two videos using just the audio track, but that's another story ;-)
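For the curious, that clap trick is essentially a cross-correlation of the two audio tracks' onset energy. A minimal sketch, assuming both tracks are already decoded to mono numpy arrays at the same sample rate:

    # Estimate the time offset between two recordings from their audio tracks.
    # Illustrative sketch only; decoding the audio is out of scope.
    import numpy as np
    from scipy.signal import correlate

    def audio_offset_seconds(track_a, track_b, sample_rate):
        """Return how many seconds track_b lags track_a (negative = b is ahead)."""
        # Use onset energy rather than raw samples so a sharp clap dominates.
        env_a = np.abs(np.diff(track_a.astype(np.float64)))
        env_b = np.abs(np.diff(track_b.astype(np.float64)))
        corr = correlate(env_a, env_b, mode="full")
        lag = np.argmax(corr) - (len(env_b) - 1)
        return lag / sample_rate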
What I'd like to see: an MS-like product that is general purpose, or for GoPro and others to start integrating gyroscopes into their products to take advantage of similar technology.
The MS presentation really resonated with me since I take a lot of video on my mountain biking trips, but much of it is uninteresting or hard to watch. I don't think I'll ever strap my iPhone to my chest, so I'd like an action cam setup that can also deliver these types of videos.
In any case, this 'hyperlapse' processing is cool and I want to use it (just not on instagram).
The biggest downside to the MSR method that I saw was the visual artifacts, where small details like rocks on the ground would pop in and out of existence. It's not that bad, but for footage that Instagram's method can smooth, I'd prefer Instagram's. I think a mixture of both methods would be most useful: MSR's method could be made less computationally intensive with a good stream of IMU data giving it the camera path, and Instagram's method could use some of MSR's frame selection algorithms.
On a side note, what are the current options for iOS video stabilisation? I assume Instagram's Cinema framework is proprietary, so are there any open-source alternatives available?
Cool, it's basically the same thing Google did last year with YouTube: they have the same basic algorithm and automatically applied it to videos, only YouTube had to do image analysis and spend a lot of CPU power, whereas Instagram can just use the gyroscope data.
The results are quite impressive. But why are they not combining the gyro data with that from the accelerometer? That should theoretically give superior results (i.e. sensor fusion). Or maybe they do, but simplified it for this write-up.
You can get fused information from Core Motion, with some smoothing and thus lower time resolution, but you can also get raw data from the gyro / accelerometer.
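For what it's worth, the textbook form of that kind of gyro + accelerometer fusion is a complementary filter: integrate the gyro for responsive short-term motion and pull toward the accelerometer's gravity estimate to cancel drift. A minimal sketch for pitch and roll only; the blend factor and axis conventions are arbitrary illustrative choices, not anything from Instagram's write-up:

    # Complementary-filter sketch: gyro gives smooth short-term motion,
    # accelerometer (gravity direction) corrects long-term drift.
    import numpy as np

    def fuse_step(pitch, roll, gyro, accel, dt, alpha=0.98):
        """One filter update. gyro: rad/s (x, y, z); accel: m/s^2 (x, y, z).
        Axis-to-angle mapping here is an assumed convention."""
        # Integrate gyro rates (fast, but drifts over time).
        pitch_gyro = pitch + gyro[0] * dt
        roll_gyro = roll + gyro[1] * dt
        # Estimate angles from the gravity direction (noisy, but drift-free).
        pitch_acc = np.arctan2(accel[1], np.hypot(accel[0], accel[2]))
        roll_acc = np.arctan2(-accel[0], accel[2])
        # Blend the two estimates.
        return (alpha * pitch_gyro + (1 - alpha) * pitch_acc,
                alpha * roll_gyro + (1 - alpha) * roll_acc)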
I tried this indoors at my company, as I don't have an iDevice myself.
It didn't work all that well and I was completely underwhelmed. The lighting was poor but I don't understand how poor lighting can affect this algorithm.
Low lighting means long exposure times which means motion blur trails that look weird once we run them through our stabilizer. Hope you get the chance to try it in brighter lighting at some point.
Partially, and that's what the MSR Hyperlapse program did. That method is extremely expensive computationally, so why do it when the device doing the recording has a good gyroscope available?
[1] https://news.ycombinator.com/item?id=8227321
[2] http://en.m.wikipedia.org/wiki/Hyperlapse
via https://news.ycombinator.com/item?id=8227330