
My guess is that, as is, they would be able to play, but not very well.

The main issue I see is that the spatial component is hard to describe in text. The new vision models make it easier, but I still imagine it's not trivial to fit all the mechanics plus the spatial component into the limited prompt space.
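To make that concrete, here is a minimal sketch (tile structure, terrain names and output format are all made up for illustration) of flattening a tile map into prompt text; even a modest map eats a lot of context before any mechanics are described:

  # Hypothetical sketch: serializing a tile map into prompt text for an LLM.
  from dataclasses import dataclass

  @dataclass
  class Tile:
      terrain: str      # e.g. "plains", "forest", "hills"
      unit: str | None  # occupying unit, if any
      city: str | None  # city name, if any

  def serialize_map(tiles: dict[tuple[int, int], Tile]) -> str:
      """Render each tile as one line of text; token count grows linearly
      with map size, which is the prompt-space problem described above."""
      lines = []
      for (x, y), t in sorted(tiles.items()):
          parts = [f"({x},{y}) {t.terrain}"]
          if t.unit:
              parts.append(f"unit={t.unit}")
          if t.city:
              parts.append(f"city={t.city}")
          lines.append(" ".join(parts))
      return "\n".join(lines)

  # A 40x40 map is already 1,600 lines of description before any game
  # mechanics, diplomacy or history are added to the prompt.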

I do think two things would help: 1) combining the LLM with the hand-crafted AI, and 2) an "LLM advisors" system where, for a given aspect (e.g. military), an "advisor" presents the options and tradeoffs to a "main AI", whose role is to weigh the tradeoffs between the options presented by the advisors.
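A rough sketch of what 2) could look like, assuming a generic ask_llm(prompt) helper (hypothetical, standing in for whatever model API is used):

  # Hypothetical sketch of the "LLM advisors" idea: domain advisors summarize
  # options and tradeoffs, and the main AI only weighs their summaries.
  def ask_llm(prompt: str) -> str:
      """Placeholder for a call to whatever LLM backend is used."""
      raise NotImplementedError

  ADVISOR_DOMAINS = ["military", "economy", "science", "diplomacy"]

  def advisor_report(domain: str, game_state: str) -> str:
      return ask_llm(
          f"You are the {domain} advisor. Given this game state:\n{game_state}\n"
          "List the 2-3 best options in your domain and their tradeoffs."
      )

  def main_ai_decision(game_state: str) -> str:
      reports = {d: advisor_report(d, game_state) for d in ADVISOR_DOMAINS}
      summary = "\n\n".join(f"{d.upper()} ADVISOR:\n{r}" for d, r in reports.items())
      return ask_llm(
          "You are the ruler. Weigh the tradeoffs between your advisors' options "
          f"and pick this turn's actions.\n\n{summary}"
      )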

And what I do know is that it could be so much more immersive than the current hard-coded AI!




>spatial component is hard to describe in text

Idefics, MiniGPT-4, NExT-GPT, and LLaVA are open-source multimodal LLMs that can read images.
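LLaVA, for instance, can be run locally through Hugging Face transformers; a minimal sketch (the screenshot path and question are hypothetical, the model id and prompt format are the ones published for llava-1.5):

  # Sketch: asking an open-source multimodal LLM about a game screenshot.
  # Requires the transformers and pillow packages; weights are several GB.
  from PIL import Image
  from transformers import AutoProcessor, LlavaForConditionalGeneration

  model_id = "llava-hf/llava-1.5-7b-hf"
  processor = AutoProcessor.from_pretrained(model_id)
  model = LlavaForConditionalGeneration.from_pretrained(model_id)

  image = Image.open("screenshot.png")  # hypothetical game screenshot
  prompt = "USER: <image>\nDescribe the units and terrain visible on the map. ASSISTANT:"

  inputs = processor(text=prompt, images=image, return_tensors="pt")
  output = model.generate(**inputs, max_new_tokens=200)
  print(processor.decode(output[0], skip_special_tokens=True))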


Yes, but do they get a vague idea of what's onscreen, or can they really see what's going on in each tile, keep track of all the stats, and use those to inform their decisions?



