
This is just a guess, but I don't think there's such a deep lesson here: language models and image models have simply been developed by mostly separate groups of researchers who made different tradeoffs. In an alternate history it could easily have gone the other way.



I would disagree. We have image generation across a variety of architectures, and diffusion models aside, it still takes far fewer parameters to reach state-of-the-art image generation with transformers (e.g. Parti) than to reach state of the art in language modeling.

Simplifying a bit: mapping, which is essentially the main task of image generators (and transformer-based generators in particular), is just less complex than prediction.

It's like how bilingual LLMs can be much better translators than traditional map-this-sentence-to-that-sentence systems. https://github.com/ogkalu2/Human-parity-on-machine-translati...
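
To make the contrast concrete, here's a rough sketch of the two approaches (not from the linked repo; the model choices and prompt format are my own assumptions, using the Hugging Face transformers API):

    # Dedicated seq2seq translator: trained to map one sentence to one sentence.
    from transformers import pipeline

    mt = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
    print(mt("Le chat est sur la table.")[0]["translation_text"])

    # Bilingual causal LLM: translation falls out of next-token prediction.
    # (Model choice is illustrative; any multilingual causal LM works similarly.)
    lm = pipeline("text-generation", model="bigscience/bloom-560m")
    prompt = "French: Le chat est sur la table.\nEnglish:"
    print(lm(prompt, max_new_tokens=20)[0]["generated_text"])

The dedicated model can only do the sentence-to-sentence mapping it was trained on, while the LLM translates as a byproduct of prediction, which is why it can pick up on context the mapping model never sees.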



