The State of Generative Models (nrehiew.github.io)
103 points by qouteall 43 days ago | 6 comments



This article does a great job summarizing the rapid advancements in reasoning and AI agents. Models like OpenAI's o1/o3 and DeepSeek's r1 demonstrate how inference-time compute and structured Chain of Thought (CoT) are pushing LLM capabilities in STEM and coding tasks. The speculation about pivot words and backtracking behavior learned through reinforcement learning is particularly intriguing—it could be transformative for reasoning in domains with external verification.

The discussion on agents and moving beyond chat interfaces toward workflows like Cursor resonated with me. A shift in Human-AI interaction paradigms feels essential to unlock the full potential of autonomous agents. However, as the author notes, error rates and cost remain significant hurdles.

I've been experimenting with multi-agent systems in Python for the last year and find measuring performance and success one of the hardest parts. While today's LLM agents are still primitive, they already show immense potential. Even without advances in base models, creative agent design patterns could unlock more functionality, and with better reasoning and larger context windows, the possibilities expand even further.


This is comedy gold


Let him cook.


This was a really well-done overview of the evolution of AI in 2024, at a level deeper than just model releases, benchmarks, and agentic systems, but still digestible by someone who follows the field at that level.


Great post, but some of the advanced terminology in the nitty-gritty details of the architectures is not that straightforward. More references would be helpful.


Considering OpenAI has consumed almost all human-generated data, it's pretty much a dead end now, like anything ANN-based.



