
Blender files are dramatically more complex than any image format, which are basically all just 2D arrays of 3-value vectors. The Blender file format uses a weird DNA/RNA struct system that would probably require its own training run.

More on the Blender file format: https://fossies.org/linux/blender/doc/blender_file_format/my...
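
Even just walking the file is involved: a .blend is a series of binary blocks whose layouts are described by a DNA struct catalogue embedded in the file itself. A rough sketch of reading the block headers, based on that doc, assuming an uncompressed .blend (the path is just a placeholder):

    import struct

    def dump_blend_blocks(path="scene.blend"):
        with open(path, "rb") as f:
            head = f.read(12)                           # b"BLENDER" + pointer size + endianness + version
            assert head[:7] == b"BLENDER", "compressed or not a .blend file"
            ptr = "Q" if head[7:8] == b"-" else "I"     # '-' = 64-bit pointers, '_' = 32-bit
            endian = "<" if head[8:9] == b"v" else ">"  # 'v' = little-endian, 'V' = big-endian
            print("blender version", head[9:12].decode())

            # Each file block: 4-char code, payload length, old memory address,
            # index into the embedded DNA struct catalogue, element count.
            fmt = endian + "4sI" + ptr + "ii"
            size = struct.calcsize(fmt)
            while True:
                code, length, _old_addr, dna_index, count = struct.unpack(fmt, f.read(size))
                print(code, length, dna_index, count)
                if code == b"ENDB":                     # terminator block
                    break
                f.seek(length, 1)                       # skip the payload itself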




But surely you wouldn't try to emit that format directly, but rather some higher level scene description? Or even just a set of instructions for how to manipulate the UI to create the imagined scene?
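
Something like this, say, where the model only emits a small declarative description and a fixed importer script does the rest (the field names here are entirely made up for illustration):

    scene = {
        "objects": [
            {"type": "cube",   "location": [0, 0, 1],   "scale": [1, 1, 1]},
            {"type": "sphere", "location": [2, 0, 0.5], "radius": 0.5},
        ],
        "lights": [{"type": "sun", "rotation_deg": [40, 0, 30]}],
        "camera": {"location": [6, -6, 4], "look_at": [0, 0, 0]},
    }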


It sure feels weird to me as well that GenAI is always supposed to be end-to-end, with everything done inside an NN black box. No one seems to be doing image output as SVG or .ai.


Imo the thinking is that whenever humans have tried to pre-process or feature-engineer a solution, or tried to find clever priors in the past, massive self-supervised, coarsely architected, data-crunching NNs got better results in the end. So many researchers / industry data scientists may just be disinclined to put effort into something that is doomed to be irrelevant in a few years. (And, of course, with every abstraction you lose some information that may bear more importance than initially thought.)


The way that website builders using GenAI work is they have an LLM generate the copy, then find a template that matches it and fill it out. This basically means the "visual creativity" part is done by a human, as the templates are made and reviewed by a human.

LLMs are good at writing copy that sounds accurate and creative enough, and there are known techniques to improve that (such as generating an outline first, then generating each section separately). If you then give them a list of templates, with written examples of what each is used for, the LLM is able to pick one that's a suitable match. But this is all just probability; there's no real creativity here.
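
A rough sketch of that flow, with complete() standing in for whichever LLM API is actually used and the template list invented for illustration:

    def complete(prompt: str) -> str:
        raise NotImplementedError("call your LLM of choice here")

    templates = {
        "restaurant":   "hero image, menu section, booking form",
        "portfolio":    "project grid, about page, contact form",
        "saas-landing": "feature list, pricing table, signup call to action",
    }

    def build_site(brief: str) -> tuple[str, str]:
        # Outline first, then each section separately, as described above.
        outline = complete(f"Write an outline for a website about: {brief}")
        copy = "\n\n".join(
            complete(f"Write the '{section}' section for: {brief}")
            for section in outline.splitlines() if section.strip()
        )
        # Let the LLM pick a template given short descriptions of each.
        choices = "\n".join(f"- {name}: {desc}" for name, desc in templates.items())
        template = complete(
            "Pick the best template name for this site and reply with the name only.\n"
            f"Site copy:\n{copy}\n\nTemplates:\n{choices}"
        ).strip()
        return template, copy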

Earlier this year I played around with trying to have GPT-3 directly output an SVG given a prompt for a simple design task (a poster for a school sports day), and the results were pretty bad. It was able to generate a syntactically correct SVG, but the design was terrible. Think using #F00 and #0F0 as colours, placing elements outside the screen boundaries, and layering elements so they overlap.
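
Those failure modes are at least easy to lint for mechanically. A minimal check, assuming the generated SVG has a plain viewBox and simple absolute x/y/width/height attributes on its shapes:

    import xml.etree.ElementTree as ET

    def out_of_bounds(svg_text):
        root = ET.fromstring(svg_text)              # raises if not well-formed XML
        min_x, min_y, width, height = (float(v) for v in root.get("viewBox").split())
        bad = []
        for el in root.iter():
            try:
                x, y = float(el.get("x")), float(el.get("y"))
                w, h = float(el.get("width", 0)), float(el.get("height", 0))
            except (TypeError, ValueError):
                continue                            # not a simple positioned shape
            if not (min_x <= x and min_y <= y and
                    x + w <= min_x + width and y + h <= min_y + height):
                bad.append(el.tag)                  # element sits outside the canvas
        return bad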

This was before GPT-4, so it would be interesting to repeat that now. Given the success people are having with GPT-4V, I feel that it could just be a matter of needing to train a model to do this specific task.


There is a fundamental disconnect between industry and academia here.


Over the last 10 years of industry work, I'd say about 20% of my time has been format shifting, or parsing half-baked, undocumented formats that change when I'm not paying attention.

That pretty much matches my experience working with NNs and LLMs.


I've seen this, but producing Python scripts that you run in Blender, e.g. https://www.youtube.com/watch?v=x60zHw_z4NM (though I saw something marginally more impressive, not sure where!)


My god that is an irritating video style, "AI woweee!"


Yeah, I'd imagine that's the best way. Lots of LLMs can generate workable Python code too, so code that works with Blender's Python API doesn't seem like too much of a leap.

The only trick is that there has to be enough Blender Python code to train the LLM on.
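
For reference, this is roughly the kind of script it would need to emit. A hand-written example against the bpy API; it runs inside Blender (scripting tab, or blender --background --python file.py), not as plain Python:

    import bpy

    # Clear the default scene.
    bpy.ops.object.select_all(action='SELECT')
    bpy.ops.object.delete()

    # A cube sitting on a large plane, plus a sun lamp and a camera.
    bpy.ops.mesh.primitive_plane_add(size=20, location=(0, 0, 0))
    bpy.ops.mesh.primitive_cube_add(size=2, location=(0, 0, 1))
    bpy.ops.object.light_add(type='SUN', rotation=(0.6, 0.2, 0.0))
    bpy.ops.object.camera_add(location=(8, -8, 6), rotation=(1.0, 0.0, 0.8))
    bpy.context.scene.camera = bpy.context.object   # the just-added camera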


Maybe something like OpenSCAD is a good middle ground: a procedural, code-like format for specifying 3D objects that can then be converted and imported into Blender.
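
Rough sketch of that pipeline, assuming the openscad command-line binary is installed and on PATH (the .scad source here is hand-written, not model output):

    import subprocess
    from pathlib import Path

    scad_source = """
    difference() {
        cube([20, 20, 10], center = true);
        cylinder(h = 12, r = 6, center = true, $fn = 64);
    }
    """

    Path("part.scad").write_text(scad_source)
    # Render the OpenSCAD source to an STL mesh.
    subprocess.run(["openscad", "-o", "part.stl", "part.scad"], check=True)
    # part.stl can then be imported into Blender (File > Import > STL).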


I tried all the AI stuff that I could on OpenSCAD.

While it generates a lot of code that initially makes sense, when you use the code, you get a jumbled block.


This. I think the problem is that LLMs really struggle with 3D scene understanding, so what you would need to do is generate code that generates code.

But I also suspect there just isn't that much OpenSCAD code in the training data, and its semantics are different enough from Python, or any of the other well-represented languages, that it struggles.


Scene layouts, models, and their attributes are the result of user input (OK, and sometimes program output). One avenue to take there would be to train on the inputs that are expected to produce a given output, like teaching a model to draw instead of generating images... which in a sense we already did, by broadly painting out silhouettes and then rendering the details.
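
So a training example might pair a prompt with the sequence of editor actions rather than the finished file. Something like this, with every field name made up for illustration:

    example = {
        "prompt": "a low wooden table with four legs",
        "actions": [
            {"op": "add_cube", "scale": [1.0, 0.6, 0.05],  "location": [0, 0, 0.5]},
            {"op": "add_cube", "scale": [0.05, 0.05, 0.5], "location": [0.9, 0.5, 0.25]},
            {"op": "mirror",   "axes": ["x", "y"]},
            {"op": "assign_material", "name": "wood"},
        ],
    }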


Voxel files could be a simpler step for 3D images.
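
For example, a scene can just be an occupancy grid that standard tools can turn into a mesh. Here using numpy and scikit-image, which are my own choice of toolchain, not anything implied above:

    import numpy as np
    from skimage import measure

    # 64^3 occupancy grid containing a sphere of radius 20 at the centre.
    n = 64
    coords = np.indices((n, n, n)) - n // 2
    volume = (np.sqrt((coords ** 2).sum(axis=0)) < 20).astype(np.float32)

    # March the 0.5 iso-surface to get vertices and faces for export/import.
    verts, faces, normals, values = measure.marching_cubes(volume, level=0.5)
    print(verts.shape, faces.shape)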



