Perhaps I'm wrong, but judging by the wibbly-wobbly walls, this is well behind the state of the art when it comes to the preservation of spatial invariants.
By comparison, a lot of the more recent demos are not only capable of preserving structure across multiple frames, but also do a very good job of preserving invariants even for subjects that are moving and deforming (such as a speaking human).
I've got no idea where the most recent deepfake progress has come from (my google-fu wasn't up to the task), but in a lot of the more recent videos it seems they're actually modeling deformable 3D surfaces, not just deformations of 2D projections.
> enables conditional generation of 3D scenes from different modalities like text or RGB images.
Please help me with a few dumb questions I have.
- What exactly is used as input to generate such scenes? Is it just a few pictures, or even a text description?
- Is it able to generate data for something that was not in the input? Like if you have some common object in the corner of your photo, is it able to expand the picture as if you had the whole object in the frame in the first place?
- What is the end game of technologies like these? Could it one day be fed, let's say, every piece of data Google has about the world (every 360° picture, every book, article, video, movie, and so on), allowing you to take a picture of something and spawn an infinitely walkable world that looks and behaves like our reality? Similar to a procedurally generated video game map.
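For what it's worth, here's my rough mental model of what "conditional generation from text or RGB images" could mean mechanically, assuming a diffusion-style prior over scene latents. Every name below is invented; this is a toy sketch, not the paper's actual code:

```python
import numpy as np

def embed_condition(condition: str, dim: int = 64) -> np.ndarray:
    """Stand-in for a text/image encoder (think CLIP-like embeddings)."""
    rng = np.random.default_rng(abs(hash(condition)) % (2**32))
    return rng.standard_normal(dim)

def denoise_step(z: np.ndarray, cond: np.ndarray, t: int) -> np.ndarray:
    """Stand-in for the learned denoiser: nudge the latent toward the
    conditioning embedding. A real model would predict noise instead."""
    return z + (cond - z) * (1.0 / (t + 1))

def sample_scene_latent(condition: str, dim: int = 64, steps: int = 50) -> np.ndarray:
    z = np.random.default_rng(0).standard_normal(dim)  # start from pure noise
    cond = embed_condition(condition, dim)
    for t in range(steps, 0, -1):                      # iterative denoising
        z = denoise_step(z, cond, t)
    return z  # a real system would decode this into a renderable radiance field

latent = sample_scene_latent("a sunlit living room with a piano")
print(latent.shape)  # (64,)
```

The point is just the shape of the pipeline: embed the condition, sample a scene latent by iterative denoising, then decode that latent into something renderable.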
I think this takes a scene, pictures, or videos and reconstructs a 3D scene in which it recognizes entities.
I don't think so? It just reconstructs the space it sees, but it could absolutely expand to fill in the gaps, so to speak.
Robotic navigation and manipulation of the environment would be my immediate guess. It would be able to build a complete 3D version of the world and recognize objects. Your idea could become a reality here as well.
CVPR 2022 was a very interesting year for 3D scene reconstruction. One particular paper I recall reached into a database of CAD objects and simply replaced objects in the scene with database models that closely matched what was shown. It could mean that a robot armed with this type of computer vision could manipulate every single object it sees and know exactly how to interact with it without further examination.
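To make the retrieve-and-replace idea concrete, here's a toy sketch (invented data and names; the real systems also estimate each object's pose so the retrieved CAD model can be aligned in the scene):

```python
import numpy as np

# Toy "CAD database": model id -> shape embedding (3-D here for brevity).
cad_db = {
    "chair_042": np.array([0.9, 0.1, 0.0]),
    "table_007": np.array([0.1, 0.8, 0.2]),
    "lamp_013":  np.array([0.0, 0.2, 0.9]),
}

def retrieve_cad(obj_emb: np.ndarray) -> str:
    """Return the CAD model whose embedding is nearest by cosine similarity."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(cad_db, key=lambda name: cos(cad_db[name], obj_emb))

detected = np.array([0.85, 0.15, 0.05])  # embedding of an object seen in the scene
print(retrieve_cad(detected))            # -> "chair_042": swap it into the scene
```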
If this is an internal code name, OK, but this public post sounds more like a product name. How is it OK to hijack the name of a widely known artist, a name with no other meaning, for your commercial VR product?
"Antoni Gaudí i Cornet was a Catalan architect from Spain known as the greatest exponent of Catalan Modernism. Gaudí's works have a highly individualized, sui generis style. Most are located in Barcelona, including his main work, the church of the Sagrada Família"
There are DALL-E and Inception from other groups (OpenAI and Google). Also BigBird and BERT. It's basically okay as long as no one finds an issue or involves lawyers, at least for deep learning researchers.
Speaking of which, can you take DALL-E output, feed it in, and get 3D art? Or maybe someday go prompt-to-3D directly, although the right kind of training data might not be there yet.
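Purely speculative glue for that idea; neither function below is a real API, just stand-ins for the two stages of such a pipeline:

```python
def text_to_image(prompt: str) -> bytes:
    """Stand-in for a DALL-E-style text-to-image model."""
    return f"<image for: {prompt}>".encode()

def image_to_scene(image: bytes) -> dict:
    """Stand-in for an image-conditioned 3D scene generator."""
    return {"kind": "radiance_field", "conditioned_on": image}

scene = image_to_scene(text_to_image("a cathedral interior in the style of Gaudí"))
print(scene["kind"])
```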