Disney's new image algorithm turns 2D photos into a 3D model (3ders.org)
201 points by makos on July 30, 2013 | 35 comments



Wow, tough crowd. This is actually extremely impressive if you're interested in the subject. The fineness and complexity of the reconstructed depth field is unparalleled.


I work in the field of computer-vision-based 3D reconstruction. Although this development is "nice", it is hardly capable of the potential applications cited. Notice any 3D reconstructions used anywhere other than as a novelty? That's because they are mostly useless. To create useful 3D reconstructions, the underlying reconstruction system needs to recognize the object and use previously acquired references to supply information obscured from view, and/or identify transformations (damage, wear) to the "original" objects and scene being analyzed. The goal is to supply one image of an object and receive back a correctly formed 3D geometry, rigged for correct motion, of nearly any common object. And it's coming.


I have worked in the field before, and I know 3D reconstructions get used for serious work - albeit rarely for the purpose of redisplay. Besides in an overworked applications section of a paper? I'm shocked and horrified :)

The work you're describing sounds impressive though.


Sometimes all you need is the geometry.


What if you can't separate the geometry you want from the geometry next to it, or the geometry it sits upon? That's the current public state of the art. Such systems become much more useful once they can identify objects, so they can tell when two items are separate things and when two sections of the same thing are in fact one object.


If you are making a robot that has to manipulate objects in its environment, fair enough. If you are making a vehicle that just needs to know about the free "drivable" space in its vicinity, the barrier is lower.


Looks very nice - and of great interest to me, since I was already playing with epipolar images for structure-from-motion, albeit for a very different application.
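
For anyone who hasn't played with this, the two-view core of structure-from-motion fits in a few lines of OpenCV (Python). This is a generic sketch, not Disney's algorithm - the intrinsics matrix K and the image filenames are made-up placeholders:

    # Minimal two-view structure-from-motion sketch: match features,
    # estimate the essential matrix (which encodes the epipolar geometry),
    # and recover the relative camera pose. K and the filenames are
    # placeholders, not values from the paper.
    import cv2
    import numpy as np

    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])

    img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

    # Detect and match ORB features between the two frames.
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # RANSAC rejects mismatched features while fitting E.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    print("relative rotation:\n", R, "\ntranslation direction:", t.ravel())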


Very impressive! I think this has the best quality I've seen.

Also, the shot on the boat makes me wonder if this could be used as steady-shot software.
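
For comparison, conventional 2D stabilization gets by without any depth information at all. A rough sketch of that classic approach (nothing to do with the paper's method; the filename and smoothing window are arbitrary):

    # Classic 2D video stabilization sketch: estimate frame-to-frame
    # motion with feature tracking, smooth the accumulated camera path,
    # and derive a per-frame correction. "shaky.mp4" is a placeholder.
    import cv2
    import numpy as np

    cap = cv2.VideoCapture("shaky.mp4")
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    transforms = []  # per-frame (dx, dy, rotation)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                     qualityLevel=0.01, minDistance=30)
        p1, st, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
        m, _ = cv2.estimateAffinePartial2D(p0[st == 1], p1[st == 1])
        transforms.append((m[0, 2], m[1, 2], np.arctan2(m[1, 0], m[0, 0])))
        prev_gray = gray

    # Smooth the cumulative path; each frame then gets warped by
    # (smoothed path - actual path).
    path = np.cumsum(np.array(transforms), axis=0)
    kernel = np.ones(31) / 31
    smooth = np.column_stack([np.convolve(path[:, i], kernel, mode="same")
                              for i in range(3)])
    corrections = smooth - path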


That's pretty awesome. I wonder if they will hook it up to Google Maps and start using real cities for game levels.

Also related, check out the work Japan has been doing for years

http://www.engadget.com/2012/08/23/live2d-drawing-technology...

http://www.engadget.com/2009/06/04/video-sonys-motionportrai...


This is pretty neat, although perhaps of limited use. The whole point of a 3D model is that it can be viewed from any angle (and at any scale), and the 2D scene lacks sufficient information to reconstruct the 3D scene. This can be seen with the toys on the sofa, when the back of the hippo becomes visible and there is no texture or model data. You could extrapolate by assuming a degree of symmetry in the object, but this would only work on a limited subset of objects.
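
(To make the symmetry idea concrete: given an estimated symmetry plane, you reflect the visible points across it to hallucinate the hidden side. A toy numpy sketch, with the plane hard-coded instead of estimated:)

    # Toy symmetry completion: guess the hidden back of an object by
    # mirroring its visible points across an assumed symmetry plane
    # (unit normal n through point p0). In practice the plane would
    # have to be estimated; here it is hard-coded for illustration.
    import numpy as np

    def mirror_points(points, n, p0):
        """Reflect Nx3 points across the plane with normal n through p0."""
        n = n / np.linalg.norm(n)
        d = (points - p0) @ n            # signed distance to the plane
        return points - 2.0 * d[:, None] * n

    visible = np.array([[0.1, 0.0, 1.2],   # stand-ins for reconstructed points
                        [0.3, 0.2, 1.1]])
    back = mirror_points(visible, n=np.array([1.0, 0.0, 0.0]),
                         p0=np.zeros(3))
    completed = np.vstack([visible, back])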


It looks like it could be a tremendous time saver for camera projection mapping for matte paintings, even if you don't circle all the way around the set or geometry. I suspect that was their motivation.


Maybe it was to make 3D versions of old 2D Disney movies.

3D Aristocats, anyone?


Makes sense, since camera projection is used to convert 2D to 3D (movies).


The title is a bit misleading: it's not from a single photo but from a video stream.


photos, not photo


The English language is ambiguous here. In computer graphics, this is commonly resolved by saying "image set", "video stream", "image sequence", etc.

Since this is a computer vision topic, the title said "photos", and this particular "2D images -> 3D" problem has been attacked double-digit times each year for at least the past 10 years (with so many research papers that you can choose between dozens of survey papers instead), I too felt the title was misleading.


How is it ambiguous? Photos is plural and model is singular. That would seem to be completely and utterly clear. Also, how does the fact that this area is heavily researched affect anything?

Could it possibly be that your desire for novelty over incremental improvement in an area heavily mined is what is colouring your expectations, rather than the headline itself being misleading?


I agree that it isn't a big deal but I was genuinely confused when I first read the article. That's why I made this comment.

Even now, I still think that it is a bit misleading, because they cannot reconstruct the scene from just any photo(s), only from a sequence of closely related photos, which is better described by the word 'video'.


The last paragraph in the article states otherwise though.

"The photos were captured along a linear path; this geometry provided structure that the researchers could leverage to make processing the data more efficient. However, the researchers also generalized their approach so that it can be applied even to a set of images taken with a hand-held camera."

Also, even the photos taken on the rail are not taken as video. You can tell this from the shots where someone is briefly in front of the camera. To get the best data, you want the camera to stop at each position on the rail while the shutter is open; otherwise you have to deal with motion blur or keep your exposure very short.
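
Back-of-envelope, for a pinhole camera the blur is roughly camera speed x exposure time x focal length (in pixels) / depth. A quick sanity check with made-up numbers:

    # Back-of-envelope motion blur for a translating pinhole camera.
    # All numbers are invented for illustration.
    v = 0.5        # camera speed along the rail, m/s
    t = 1 / 50     # exposure time, s
    f = 1000       # focal length, in pixels
    depth = 2.0    # distance to the subject, m

    blur_px = v * t * f / depth
    print(f"~{blur_px:.1f} px of blur")  # ~5.0 px - a clearly visible smear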


Hmm... I think it's ambiguous, because if the article were about a new method for cleverly deducing 3D information from a single 2D image (with clever assumptions -- object recognition, lighting analysis, etc.), you would still be able to say "Extracts 3D information from 2D images".

The ambiguity comes from A) Does each run of the algorithm require one image? -- "images". B) Does each run of the algorithm require multiple images? -- "images".

At least, that's what confused me. As to why this being heavily researched adds to my confusion: it's not to be disrespectful, or to diminish the value of the results or accomplishment. It's just that it is exactly what it is, incremental research. "Why does it appear on Hacker News all of a sudden? There must be something quite exceptional about this (i.e. something more exceptional than 'incremental research')."


I don't want to participate in this argument except to point out that the sentence is definitely ambiguous for at least some dialects (evidence: people, including myself, are getting different readings of the same sentence). The second reading that you aren't getting should be clear enough from context, so I won't spell it out. If I wanted to be really clear one way or the other, I would say:

1) Disney's new image algorithm turns a 2D photo into a 3D model.

or

2) Disney's new image algorithm turns a group of 2D photos into a 3D model.

If I just wanted to get the meaning of (1) across, I would probably phrase my title exactly how OP did. Definitely (not necessarily intentionally) misleading.



That one is pretty neat. I remember it had interactive demos, but I can't find them on that site (skimmed over it while at work).


Another option:

Videotrace http://punchcard.com.au/



Maybe I am missing the point, but I have never really found this style of 3D visually impressive or fun. I think it is because of the limited range of movement on display. Half the fun of this kind of 3D would be finding things initially hidden in the opening view. I can imagine my son tilting his iPad to see behind things, but the effect only shifts very slightly. I know that masses more information would be needed for such an effect, but to me at least that is what this 3D hints at, and I guess that's ultimately why I find it disappointing.


You're disappointed because it's not magic or strong AI?


Something like this, maybe? http://www.youtube.com/watch?v=qHepKd38pr0


Imagine capturing footage on your phone/Glass and reviewing it on a large display that tracks your head movements and adjusts the camera accordingly.

Think of the Minority Report scene when Cruise is watching pseudo-3D footage of his family.

The history of media shows a trend towards 100% faithfully recreated experiences. Photos > Film > Video w/ audio > 3D. This gets us one step closer.


This looks like it could be used as a replacement for a 3D laser scanner.


It turns a series of photos into a 3D model, which has been done to death.

Autodesk 123D (which is free) can create a 3D model from just two photos.
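
(For reference, the two-photo case is textbook two-view triangulation. A self-contained OpenCV sketch on synthetic data - this is the generic technique, not 123D's actual pipeline:)

    # Two-view triangulation on synthetic data: project known 3D points
    # into two cameras, then recover them with cv2.triangulatePoints.
    import cv2
    import numpy as np

    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])

    # Camera 1 at the origin; camera 2 shifted 0.2 m along x.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

    X = np.array([[0.0, 0.0, 4.0],          # ground-truth 3D points
                  [0.5, -0.3, 5.0],
                  [-0.4, 0.2, 6.0]])

    def project(P, X):
        Xh = np.hstack([X, np.ones((len(X), 1))])
        x = P @ Xh.T
        return x[:2] / x[2]                  # 2xN pixel coordinates

    pts1, pts2 = project(P1, X), project(P2, X)
    Xh = cv2.triangulatePoints(P1, P2, pts1, pts2)   # 4xN homogeneous
    X_rec = (Xh[:3] / Xh[3]).T
    print(np.allclose(X_rec, X, atol=1e-6))          # True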


Don't be daft; people don't stop developing in an area just because it has been approached before, and besides, this stuff is way ahead of the tech in Autodesk 123D.

You might as well complain that people are still developing video compression techniques.


Yes, very similar. However, there is still room for improvement. Maybe Disney found a way to make cleaner models, leading to a 3D-print-ready workflow. With 123D you need to spend time cleaning up the mesh.

PS: this seems to be an open-source version of 123D: http://insight3d.sourceforge.net/ - interesting, but I haven't used it yet.


Gahh! Nine times out of ten, software made in academia has cool features but awful code/structure/packaging/you-name-it. In this case, the Linux (!) tarball contains Windows binaries and a handwritten Makefile that can't even compile the first source file. What a pity.


Pity every photo you upload to that service gets made public.



