WebSim, WorldSim and the Summer of Simulative AI

swyx · 2024-04-27T18:26:34

author here! I absolutely enjoyed interviewing Joscha Bach, who was gracious enough to give 30 minutes of his time with zero prep and no idea who I was. I am also in a unique position to report on the rise of both WorldSim and WebSim, as I literally saw them both happen up close. questions welcome!

if you liked the ChatGPT Virtual Machine story from 2022: https://news.ycombinator.com/item?id=33847479

you will like this.

if you enjoy behind the scenes, i live streamed the making of the video, audio, and essay last night with a few people on twitter/youtube https://x.com/swyx/status/1784110650777854148

comments and tough love welcome!

fjkdlsjflkds · 2024-04-27T20:13:42

A quick comment: The idea seems interesting/entertaining, but the requirement to log in with a Google account will keep some people (like me) from even trying it.

ClassicRob · 2024-04-27T20:43:23

Login with Google was just the quickest thing we could do to get auth. We'll roll out more ways to sign in soon. Thanks for the feedback!

mlb_hn · 2024-04-27T18:34:16

nice overview of progress over time. are there quant metrics for the sim capabilities or is it mostly vibes?

ClassicRob · 2024-04-27T19:05:59

Cofounder of Websim here. Right now it's not clear that there's any eval for a language model's simulation capabilities. Internally, we've (vibe) tested Llama 3, Command R+, WizardLM 8x22b, Mistral Large (first version of Websim came out of a Mistral hackathon) and GPT-4 Turbo and found them all lacking, due to either meh website outputs or mode collapse from reinforcement learning (lack of creativity and flexibility). That also may be a "skill issue" thing because our system prompt is very much optimized for Claude 3's "mind." We'll release functionality in the next week or two that lets users update the system prompt, in which case this may be less of an issue.

Claude 3 has a much broader latent space, and seems to "enjoy" imagining things. It hasn’t been banged into too specific of an assistant shape, and doesn’t suffer the same degree of “mode collapse” https://lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-m...

Even Sonnet produces mindblowingly good outputs (https://x.com/RobertHaisfield/status/1774579381132050696). Haiku is capable of producing full websites with insightful and creative content, even if it isn't as capable as Sonnet/Opus. For example, I found Curio, an esolang where every line of code is a living, sentient being with its own unique personality, memories, and goals, mostly by browsing around with Haiku (https://x.com/RobertHaisfield/status/1782586807261233620). That said, Haiku tends to perform better when it is few-shot prompted with outputs from Sonnet or Opus earlier in the "browser history."

smusamashah · 2024-04-27T20:44:21

https://websim.ai/ is the website of the project being discussed in the article

grfhjyffbnh · 2024-04-28T02:52:22

> exploring the latent space of multiverses adjacent to ours

Poe’s law strikes again. Are they serious? Is it satire? Pretty hilarious.