Gaudi: A Neural Architect for Immersive 3D Scene Generation (github.com/apple)
111 points by andsoitis on Aug 4, 2022 | 27 comments



Perhaps I'm wrong, but judging by the wibbly-wobbly walls, this is well behind the state of the art when it comes to the preservation of spatial invariants.

By comparison a lot of the more recent demos are not only capable of polyframe structure preservation, but also do a very good job at preserving invariants even of subjects that are moving and deforming (such as a speaking human).


Cool! Any demos you can link to?


Honestly, as much as I hate them, NVIDIA does an amazing job with this sort of thing.

Camera motion through a static scene: https://blogs.nvidia.com/blog/2022/03/25/instant-nerf-resear...

I've got no idea where the most recent deepfake progress has come from - my Google-fu wasn't up to the task - but many of the more recent videos seem to be dealing with deformable 3D surfaces, and not just deformations of 2D projections.


> enables conditional generation of 3D scenes from different modalities like text or RGB images.

Please help me with a few dumb questions I have.

- What exactly is used as input to generate such scenes? Is it just a few pictures, or even a text description?

- Is it able to generate data for something that was not in the input? Say you have some common object in the corner of your photo: is it able to expand the picture as if you had had it in the frame in the first place?

- What is the end game of technologies like these? Could it one day be fed, let's say, every piece of data Google has about the world (every 360 picture, every book, article, video, movie and so on), allowing you to take a picture of something and spawn an infinitely walkable world that looks and behaves like our reality? Similar to a procedurally generated video game map.


I think this takes pictures or videos of a scene and reconstructs a 3D scene where it recognizes entities.

I don't think so? It just reconstructs the space it sees, but it could absolutely be extended to fill in the gaps, so to speak.

Robotic navigation and manipulation of the environment would be my immediate guess. It would be able to build a complete 3D version of the world and recognize objects. Your idea could become a reality here as well.

CVPR 2022 was a very interesting year for 3D scene reconstruction. One particular paper I recall reached into a database of CAD objects and simply replaced the objects in the scene with CAD models that closely fit what is shown. It could mean that a robot armed with this type of computer vision could manipulate every single object it sees and know exactly how to interact with it without further examination.


What's with the weird license? Where's the code? Looks cool, otherwise.


Code will be there in a few weeks.

And the license seems to be MIT/Apache style but making it very clear not to use their logo and that Apple doesn't grant you patent indemnification.


Are there intellectual property concerns with naming a framework after a person? I’m assuming it’s a reference to Antoni Gaudi.


Is that a Wolfenstein texture I see there?


ViZDoom dataset, yes

http://vizdoom.cs.put.edu.pl/


If this is an internal code name, OK, but this public post sounds more like a product name. How is it OK to hijack the name of a widely known artist, which has no other meaning, for your commercial VR product?

"Antoni Gaudí i Cornet was a Catalan architect from Spain known as the greatest exponent of Catalan Modernism. Gaudí's works have a highly individualized, sui generis style. Most are located in Barcelona, including his main work, the church of the Sagrada Família"


Wait until you hear what they did to McIntosh apples and Isaac Newton.


There are DALL-E and Inception from other groups (OpenAI and Google). Also Big Bird and BERT. It's basically okay as long as no one takes issue or involves lawyers, at least for deep learning researchers.


Dali -> DALL-E

Gaudi -> GAUD-E?


Speaking of which, can you take DALL-E output and feed it in and get 3D art? Or maybe someday prompt-to-3D direct, although the right kind of training data might not be there yet.


Seems more like an homage than a hijacking?


It's cute, but completely tone-deaf.


Why?


Among other points, there's the obvious one that his designs never seemed to work as designed and took decades longer than planned to finish.


Not to mention the (trademarked) Gaudi processor for ML/AI: https://habana.ai/training/gaudi2/

Wondering whether this Gaudi software can be ported to use the Gaudi SDK :-)


So what is the thought process here?

That Apple is going to name their next flagship device ML-Gaudi?

As opposed to this just being a playful name for an open-source project.


I am not OK with public social arts being rebranded for corporate PR.


It's just a homage to an architect.


No it's not, it's PR bullshit.


100% correct. Just borrowing unearned credibility.


Why is this down-voted? Fuck the name appropriation for corporate PR.


[flagged]


Complaining about voting gets you downvoted, pretty reliably. It's also against the community guidelines: https://news.ycombinator.com/newsguidelines.html

EDIT: plus, I guess your second comment is insinuating brigading, which is also against said guidelines. :D



