
The cost of the LLM isn't the only cost that matters, or even the most important one. Take the example of automating AI research: evaluating a move effectively means inventing a new architecture or modifying an existing one, launching a training run, and evaluating the new model on a suite of benchmarks. The ASI has to do this in a loop, gathering feedback and updating its priors - what people refer to as "grad student descent". The cost of running each train-eval iteration during the search is going to be significantly higher than the cost of generating the code for the next model.
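
A minimal sketch of that loop, just to show where the compute goes - propose_architecture and train_and_evaluate are hypothetical stand-ins, not any real API:

    import random

    def propose_architecture(history):
        """Stand-in for the LLM generating the next candidate architecture.
        Cheap relative to training: one inference call."""
        return {"depth": random.randint(2, 48),
                "width": random.choice([256, 512, 1024])}

    def train_and_evaluate(arch):
        """Stand-in for a full training run plus a benchmark suite.
        In reality this step costs orders of magnitude more compute
        than generating the candidate did."""
        # Toy score with diminishing returns in model size.
        return 1.0 - 1.0 / (arch["depth"] * arch["width"] / 1000.0)

    best = None
    history = []
    for step in range(10):                     # each pass = one expensive train-eval cycle
        arch = propose_architecture(history)   # cheap: LLM inference
        score = train_and_evaluate(arch)       # expensive: the actual bottleneck
        history.append((arch, score))          # feedback to update the searcher's priors
        if best is None or score > best[1]:
            best = (arch, score)

    print("best candidate:", best)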

You're talking about applying tree search as a form of network architecture search (NAS), which is different from applying it to LLM output sampling.

Automated NAS has been tried for (highly constrained) image classifier design, before simpler designs like ResNets won the day. Doing this for billion-parameter models would seem prohibitively expensive.
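
For contrast, here's roughly what tree search over LLM output sampling looks like - a toy beam search where sample_continuations and score are hypothetical stand-ins, not a real model API. The key difference is that expanding a node here is one cheap forward pass, not a training run:

    def sample_continuations(text, k=3):
        """Stand-in for sampling k continuations from an LLM."""
        return [text + c for c in "abc"[:k]]

    def score(text):
        """Stand-in for a cheap value model over partial outputs."""
        return -abs(len(text) - 6)  # toy: prefer outputs of length 6

    def beam_search(prompt, beam_width=2, depth=5):
        beam = [prompt]
        for _ in range(depth):
            candidates = [nxt for t in beam
                          for nxt in sample_continuations(t)]
            # each score() call is one cheap model evaluation, unlike
            # the full train-eval cycle NAS needs per node
            beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        return max(beam, key=score)

    print(beam_search("x"))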

I'm not following. How do you propose search is performed by the ASI designed for "AI research", as proposed by the article?

Fair enough - he discusses GPT-4 search halfway down the article, but by the end he's discussing self-improving AI.

Certainly compute to test ideas (at scale) is the limiting factor for LLM development (says Sholto @ Google), but if we're talking about moving beyond LLMs, not just tweaking them, then it seems we need more than architecture search anyway.

Well, people certainly are good at finding new ways to consume compute power, whether it’s mining bitcoins or training a million AI models at once to generate a “meta model” that we think could achieve escape velocity. What happens when it doesn’t? And Sam Altman and the author want to get the government to pay for this? Am I reading this right?
