Magic3D looks like an improvement on DreamFusion [1], so it's sad to see that the code and models are not being made public.
What is public right now is StableDreamFusion [2]. It produces surprisingly good results on radially symmetrical organic objects like flowers and pineapples. You can run it on your own GPU or a colab.
Or, if you just want to type a prompt into a website and see something in 3D, try our demo at https://holovolo.tv
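If you want to try stable-dreamfusion locally, the whole thing is driven from a single script; a run looks roughly like this (flags quoted from memory of the README in [2], so treat them as approximate and check the repo for the current ones):

    # after cloning the repo and installing its requirements
    python main.py --text "a DSLR photo of a pineapple" --workspace trial_pineapple -O
    # once it has trained, render test views and export a textured mesh
    python main.py --workspace trial_pineapple -O --test --save_mesh

As far as I remember, -O just switches on the repo's recommended performance options.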
Looks like Magic3D doesn't depend on any additional model training beyond the pretrained diffusion priors, which means that open-source implementations like StableDreamFusion should be able to adopt this new method quite easily.
They use https://deepimagination.cc/eDiffi/ as the text-to-image diffusion model, which can be replaced with Stable Diffusion or something else.
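The reason the prior is swappable is that the 3D optimisation only talks to the text-to-image model through its noise prediction, via the score distillation sampling (SDS) loss from DreamFusion. Here is a minimal sketch of that interface using Stable Diffusion through diffusers; the model name, timestep range and weighting are illustrative assumptions (not Magic3D's actual two-stage setup), and the differentiable renderer (e.g. a NeRF) is left out:

    import torch
    from diffusers import StableDiffusionPipeline

    device = "cuda"
    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
    for m in (pipe.vae, pipe.unet, pipe.text_encoder):
        m.requires_grad_(False)                     # the 2D prior stays frozen

    prompt = "a DSLR photo of a pineapple"
    tok = pipe.tokenizer(["", prompt], padding="max_length",
                         max_length=pipe.tokenizer.model_max_length, return_tensors="pt")
    text_emb = pipe.text_encoder(tok.input_ids.to(device))[0]   # uncond + cond embeddings

    def sds_loss(rendered_rgb, guidance_scale=100.0):
        # rendered_rgb: (1, 3, 512, 512) in [0, 1], produced by a differentiable renderer
        latents = pipe.vae.encode(rendered_rgb * 2 - 1).latent_dist.sample()
        latents = latents * pipe.vae.config.scaling_factor

        # corrupt the rendering with noise at a random diffusion timestep
        t = torch.randint(50, 950, (1,), device=device)
        alpha_bar = pipe.scheduler.alphas_cumprod.to(device)[t].view(-1, 1, 1, 1)
        noise = torch.randn_like(latents)
        noisy = alpha_bar.sqrt() * latents + (1 - alpha_bar).sqrt() * noise

        # ask the frozen UNet what noise it thinks is there (classifier-free guidance)
        with torch.no_grad():
            pred = pipe.unet(torch.cat([noisy, noisy]), torch.cat([t, t]),
                             encoder_hidden_states=text_emb).sample
            uncond, cond = pred.chunk(2)
            pred = uncond + guidance_scale * (cond - uncond)

        # SDS gradient: (predicted noise - injected noise), skipping the UNet Jacobian
        grad = ((1 - alpha_bar) * (pred - noise)).detach()
        return (grad * latents).sum()               # backprop this into the renderer's parameters

Swapping in a different prior mostly just changes how the text embedding is computed and which network predicts the noise (a pixel-space model like eDiffi would skip the VAE encode); the 3D side stays the same.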
Many moons ago my job was banging out greebles and low-poly textured models for arch-viz (architectural visualisation) libraries. Think variations of "technological" shapes to provide detailing for a spaceship, chair models to be used in the background of a render of a fancy new office, or door-variation-number-26 type models. Yes, it was as boring as it sounds. A lot of those models have more recently ended up being used in games too, as GPUs have got better. I can certainly see these types of systems being good enough to replace that kind of drudge work in modelling pipelines within the next two years.
What I really want to see (now that my livelihood doesn't depend on it) is a system that can produce models from prompts with decent mesh topology suitable for rigging (or even auto-rigged), models that can be separated into component parts and even have physics applied to them.
My dream would be for the bunny to be able to hop off, the stack of pancakes to react realistically and the maple syrup to ooze down!
It certainly seems that where there is sufficient incentive, advances like this will be made... and tool-chains will pop up... probably more quickly than one would think.
Amazing stuff, I can't wait to use this in video. I've been having so much fun training custom Stable Diffusion models of people and testing them doing various things. The problem with these is finding the right text in latent space, so I'm still not sure text is the right medium to generate everything; it may be better as a tool to start prototyping.
As a Grasshopper-style node graph is effectively an abstract syntax tree (an AST, i.e. code), I imagine that something like GitHub Copilot and similar tools would actually be more appropriate (a toy sketch of that equivalence is below).
Current state of the art tends not to differentiate between gibberish and output of actual value, so that may be a bit of a downer.
But I definitely see it as plausible - I just don't know where you would get the training data though...
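To make the node-graph-as-AST point concrete, here is a toy Python sketch (the node names are made up): a wired-up graph is just a nested expression, which is exactly the kind of token sequence a Copilot-style model could be trained to emit.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        op: str                                      # e.g. "circle", "extrude", "array_polar"
        params: dict = field(default_factory=dict)
        inputs: list = field(default_factory=list)   # upstream nodes, i.e. the wires

    # "circle -> extrude -> polar array" as a graph...
    graph = Node("array_polar", {"count": 12},
                 [Node("extrude", {"height": 2.0},
                       [Node("circle", {"radius": 0.5})])])

    # ...and the same graph flattened into code-like text
    def to_code(n: Node) -> str:
        args = [to_code(i) for i in n.inputs] + [f"{k}={v}" for k, v in n.params.items()]
        return f"{n.op}({', '.join(args)})"

    print(to_code(graph))   # array_polar(extrude(circle(radius=0.5), height=2.0), count=12)

Training data is still the hard part, as you say, but any corpus of saved Grasshopper definitions could in principle be serialized this way.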
[1] https://dreamfusion3d.github.io/
[2] https://github.com/ashawkey/stable-dreamfusion
[3] https://holovolo.tv