Well, they buried the lede with this one. Using LLMs was better for some tasks and actually made performance worse for others.

The first task was a generalist task ("inside the frontier", as they refer to it), and I'm not surprised it improved performance, as it was purposely made to fall into an LLM's areas of strength: research into well-defined areas where you might not have strong domain knowledge. This is also the mainstay of early consultants' work, in which they are generalists in their early careers – usually as business analysts or similar – until they become more valuable and specialise later on.

LLMs are strong in this area of general research because they have generalised a lot of information. But this generalisation is also their weakness. A good way to think about it is that an LLM is like a journalist of research. If you've ever read a newspaper, you often think you're getting a lot of insight. However, as soon as you read an article in an area of your specialisation, you realise the analysis is full of flaws; the journalist doesn't understand your subject anywhere near the level you do.

The second task ("outside the frontier") required analysis of a spreadsheet, interviews, and a more deeply analytical take with evidence to back it up. These are all tasks that LLMs currently aren't strong at. Unsurprisingly, the non-LLM group scored 84.5%, while LLM users scored between 60% and 70.6%.

The takeaway should be that LLMs are great for generalised research but less good for specialist analytical tasks.




I was thinking about this last night. It’s a new version of Gell-Mann amnesia. I call it LLM-Mann amnesia.

When I ask a programming question, ChatGPT hallucinates something about 20% of the time, and I can only tell because I’m skilled enough to see it. For all the other domains I ask it questions about, I should assume at least as much hallucination and incorrect information.


I see this as drill-down thinking: moving from a broad concept to a specific one, AI seems to be helpful when supplementing specialist work. However, as you both mentioned, when more focused and integrated answers are needed, AI tends to hinder performance.

However, as the paper noted, when working within AI's areas of strength, it improved not only efficiency but also the quality of the work (accounting for the hallucinations). As you mentioned:

> When I ask a programming question, chat GPT hallucinates something about 20% of the time and I can only tell because I’m skilled enough to see it

This matches their Centaur approach, delineating between AI's skills and one's own for a task, which, with generalized work, seems to fare better than not using AI at all.


LLMs are broadly good at things that average knowledge workers are good at or can be trained to be good at reasonably quickly.


Comparing LLMs to journalists is a good insight.



