Hacker News

As with any generative model: trust, but verify. Try it yourself. Frankly, as a generative-modeling researcher myself, there are a lot of reasons not to trust what you see in papers and project pages.

They link a Hugging Face Space (great sign!): https://huggingface.co/spaces/tencent/Hunyuan3D-2

I tried to replicate the objects shown on their project page (https://3d-models.hunyuan.tencent.com/). The full prompts are there but displayed truncated, so you can just inspect the element and grab the text.

  Here's what I got
  Leaf
     PNG: https://0x0.st/8HDL.png
     GLB: https://0x0.st/8HD9.glb
  Guitar
     PNG: https://0x0.st/8HDf.png  other view: https://0x0.st/8HDO.png
     GLB: https://0x0.st/8HDV.glb
  Google Translate of Guitar:
     Prompt: A brown guitar is centered against a white background, creating a realistic photography style. This photo captures the culture of the instrument and conveys a tranquil atmosphere.
     PNG: https://0x0.st/8HDt.png   and  https://0x0.st/8HDv.png
     Note: Weird thing on top of the guitar. But at least this time the strings aren't fusing into the sound hole.
I haven't tested my own prompts or the Google translations of the Chinese prompts because I'm getting a usage-limit error (I'll edit this comment if I get them). That said, these look pretty good. The paper and page images definitely look better, but this isn't like Stable Diffusion 1 paper vs Stable Diffusion 1 reality.

But these are long, detailed prompts with lots of prompt engineering, which should raise some suspicion. The real world has higher variance, so let's get an idea of how hard this is to use. Let's try some simpler things :)

  Prompt: A guitar
    PNG: https://0x0.st/8HDg.png
    Note: Not bad! Definitely overfit, but does that matter here? A bit too thick for an electric guitar but too thin for an acoustic.
  Prompt: A Monstera leaf
    PNG: https://0x0.st/8HD6.png  
         https://0x0.st/8HDl.png  
         https://0x0.st/8HDU.png
    Note: A bit wonkier. I picked this because it looked like the leaf in the example but this one is doing some odd things. 
          It's definitely a leaf and monstera like but a bit of a mutant. 
  Prompt: Mario from Super Mario Bros
    PNG: https://0x0.st/8Hkq.png
    Note: Now I'm VERY suspicious....
  Prompt: Luigi from Super Mario Bros
    PNG: https://0x0.st/8Hkc.png
         https://0x0.st/8HkT.png  
         https://0x0.st/8HkA.png
    Note: Highly overfit[0]. This is what I suspected. Luigi isn't just a tall Mario. 
          Where is the tie coming from? The suspender buttons are all messed up. 
          It really went uncanny valley here. This suggests the model is quite brittle. 
  Prompt: Peach from Super Mario Bros
    PNG: https://0x0.st/8Hku.png  
         https://0x0.st/8HkM.png
    Note: I'm fucking dying over here this is so funny. It's just a peach with a cute face hahahahaha
  Prompt: Toad from Super Mario Bros
    PNG: https://0x0.st/8Hke.png 
         https://0x0.st/8Hk_.png
         https://0x0.st/8HkL.png
    Note: Lord have mercy on this toad, I think it is a mutated Squirtle.  
The paper can be found here (the arXiv badge on the page leads to a PDF in the repo, which GitHub is slow to render): https://arxiv.org/abs/2411.02293

(If you want to share images like I did all I'm doing is `curl -F'file=@foobar.png' https://0x0.st`)
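If you're uploading a batch of renders, a tiny wrapper saves retyping that command. A minimal sketch, assuming `curl` is installed and a 0x0.st-compatible host; `share` is just my own name for it:

```shell
# Upload each given file to 0x0.st and print "name: URL" per file.
# "share" is a hypothetical helper name, not part of any tool.
share() {
  for f in "$@"; do
    printf '%s: ' "$f"
    curl -sF "file=@$f" https://0x0.st
  done
}

# Usage: share render.png model.glb
```

The `-s` just silences curl's progress meter so the output is one clean line per file.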

[0] Overfit is a weird thing now. Maybe it doesn't generalize well, but sometimes that's not a problem. I think this is one of the bigger lessons we've learned from recent ML models. My viewpoint is: "Sometimes you want a database with a human-language interface. Sometimes you want to generalize." So we have to be more context-driven here. But there are certainly a lot of things we should be careful about when we're talking about generation. These things are trained on A LOT of data. If you're more "database-like," then there are certainly potential legal ramifications...

Edit: For context, by "look pretty good" I mean in comparison to other work I've seen. I think it is likely a long way from being useful in production. I'm not sure how much human labor would be required to fix the issues.




Oops, I ran out of edit time when posting my last two:

  Prompt: A hawk flying in the sky
    PNG: https://0x0.st/8Hkw.png
         https://0x0.st/8Hkx.png
         https://0x0.st/8Hk3.png
    Note: This looks like it would need more work. I tried a few other birds, and generic ones too. They all seem to have a similar form. 
  Prompt: A hawk with the head of a dragon flying in the sky and holding a snake
    PNG: https://0x0.st/8HkE.png
         https://0x0.st/8Hk6.png
         https://0x0.st/8HkI.png
         https://0x0.st/8Hkl.png
    Note: This one really isn't great. Just a normal hawk head. Not how a bird holds a snake either...
This last one is really key for judging where the tech is at, btw. Most of the generations are assets you could download freely from the internet, and you could probably get better ones from some artist on Fiverr or something. But the last example is closer to a realistic use case: something relatively reasonable, probably not in the set of easy-to-download assets, and plausibly something someone wants. Given the Chimera, and how similar a dragon is to a bird in the first place, it isn't too crazy an ask; this should be on the "easier" end. I'm sure you could prompt engineer your way into it, but then we have to have the discussion of what costs more: a prompt engineer or an artist? And do you need a prompt engineer who can also repair models? Because these look like they need repairs.

This can make it hard to really tell whether there's progress or not. It is really easy to make compelling images in a paper and beat benchmarks while not actually creating something that is __or will become__ a usable product. All the little details matter, and little errors quickly compound... That said, I do much more on generative imagery than generative 3D objects, so grain of salt here.

Keep in mind: generative models (of any kind) are incredibly difficult to evaluate. You really only have a good idea after you've generated hundreds or thousands of samples yourself and have looked at many of them with high scrutiny.
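One cheap way to force yourself into that regime is to script the generation and only judge outputs in bulk afterwards. A rough sketch; `gen3d` here is a hypothetical stand-in for whatever CLI or API call you actually drive, not a real tool:

```shell
# Generate one model per line of a prompt file, numbering outputs
# so you can scrutinize a large batch later.
# "gen3d" is a hypothetical stand-in command, not a real CLI.
eval_batch() {
  mkdir -p out
  n=0
  while IFS= read -r prompt; do
    n=$((n + 1))
    gen3d "$prompt" > "out/sample_$n.glb"
  done < "$1"
  echo "$n samples written to out/"
}

# Usage: eval_batch prompts.txt
```

The point isn't the script; it's that a fixed prompt list keeps you from unconsciously cherry-picking as you go.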


Yeah, this is absolutely light years off being useful in production.

People just see fancy demos and start crapping on about the future, but just look at Stable Diffusion. It's been around for how long, and what serious professional game developers are using it as a core part of their workflow? Maybe some concept artists? But consistent style is such an important thing for any half-decent game, and these generative tools shit the bed on consistency in a way that's difficult to paper over.

I've spent a lot of time thinking about game design and experimenting with SD/Flux, and the only thing I think I could even get close to production that I couldn't before is maybe an MTG style card game where gameplay is far more important than graphics, and flashy nice looking static artwork is far more important than consistency. That's a fucking small niche, and I don't see a lot of paths to generalisation.


Stable Diffusion, and AI in general, seem to be big in marketing at least. A friend decided to abandon engineering and move to marketing, and the entire social media part of his job is making a rough post, converting it to corporate marketing language via AI, and then generating an eye-catching piece of AI art to slap on top.

When video generation gets easy he'll probably move to making short, eye-catching GIFs.

When 3D models and AI in general improve, I can imagine him, for example, generating shitty little games to put in banners. I've been using an adblocker for so long I don't know what exists nowadays, but I remember there being banners with "shoot 5 ducks"-type games where the last duck kill opens the advertiser's website. That sounds feasible for an AI to implement reliably. If you can generate different games like that based on the interests of the person seeing the ad, you can probably milk some clicks.


> been around for how long, and what serious professional game developers are using it as a core part of their workflow?

Are you in the game industry? If you're not, how would you even know they haven't? As someone with some connections in the industry who may soon get more involved personally, I know at least one mobile gaming studio with quite a bit of funding and momentum that has started using a good deal of AI-generated assets that would have been handcrafted in the past.


Yeah, the big problem I have with my field is that there seem to be stronger incentives to chase benchmarks and make things look good than to actually solve the hard problems. There is a strong preference for "lazy evaluation," which depends too heavily on assuming high levels of ethical presentation and due diligence. I find it so problematic because this focus actually makes it hard for the people who are tackling these problems to publish. It makes the space even noisier (already incredibly noisy by the very nature of the subject), and then it becomes hard to talk about details if they're presumed solved.

I get that we gloss over details, but if there's anywhere you're allowed to be nuanced and argue over details, should it not be in academia?

(FWIW, I'm also very supportive of low bars to publication. If it's free of serious error and plagiarism, it's publishable, imo. No one can predict what will be important or impactful, so we shouldn't even play that game. Trying to decide whether something is "novel" or "good enough for <Venue>" is just idiotic and breeds collusion rings and bad actors.)


The first guitar has one of its strings ending at the sound hole, and six tuning knobs for five strings.

The second has similar problems: tuning knobs with missing winding posts, and five strings becoming four at the bridge. It also has a pickup under the fretboard.

Are these considered good capability examples?


I take back a fair amount of what I said.

It is pretty good with some of the easier assets that I suspect there are lots of samples of (and we're comparing to other generative models, not to what humans make; humans probably still win by a good margin). But when moving beyond obvious assets that we could easily find, I'm not seeing good performance at all. Probably a lot can be done with heavy prompt engineering, but that just makes things more complicated to evaluate.


Thanks for this. Having tried it myself, I find the results quite impressive.



