To your point: I find the 2+2=5 cases more interesting and would like to see more of those. When does it happen? When is ChatGPT most useful? When is it most deceptive?
The 80085 case is only interesting insofar as it reveals weaknesses in the tool, but it's so far from tool-use that it doesn't seem very relevant.
Considering that in its initial demo, on very anodyne and "normal" use cases like "plan me a Mexican vacation," it spat out more falsehoods than truths... this seems like a problem.
Agreed on the meta-point that deliberate tool mis-use, while amusing and sometimes concerning, isn't determinative of the fate of the technology.
But the failure rate without tool mis-use seems quite high anecdotally, which also comports with our understanding of LLMs: hallucinations are quite common once you stray even slightly outside of things that are heavily present in the training data. Height of the Eiffel Tower? High accuracy in recall. Is this arbitrary restaurant in Barcelona any good? Very low accuracy.
The question is how much of the useful search traffic is like the latter vs. the former. My suspicion is "a lot".
> But the failure rate without tool mis-use seems quite high anecdotally
The problem with your judgement is that you click on every "haw haw, ChatGPT dumb" post but don't read any of the articles that show how an LLM works, what it is quantitatively good and bad at, and how to improve performance on tasks using methods such as PAL, Toolformer, or other analytic augmentation approaches.
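For a concrete sense of what PAL-style augmentation looks like, here is a minimal sketch. `llm_complete` is a hypothetical stand-in for whatever completion API you use, not a real library call; the point is that the model emits a program and the interpreter does the computation:

    # Minimal sketch of the PAL (Program-Aided Language models) idea:
    # rather than asking the model for a final answer directly, ask it
    # to emit a short program, then let the interpreter compute it.
    # `llm_complete` is a hypothetical stand-in for a real LLM API.

    def llm_complete(prompt: str) -> str:
        # Stand-in: a real implementation would call an LLM here.
        # For illustration, return the kind of program PAL expects.
        return "result = (5 + 3) * 2"

    def pal_answer(question: str):
        prompt = (
            "Write Python that computes the answer and stores it in `result`.\n"
            f"# Question: {question}\n"
        )
        code = llm_complete(prompt)
        namespace = {}
        exec(code, namespace)  # the arithmetic is offloaded to the interpreter
        return namespace["result"]

    print(pal_answer("What is (5 + 3) * 2?"))  # -> 16

The appeal is that the model only has to get the program right, not the arithmetic, which sidesteps a well-documented class of LLM errors.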
Go read some objective studies and you won't be yet another servomechanism blindly spreading incorrect assumptions based on anecdotes from attention-starved bloggers.
Hi, I work on LLMs daily, alongside some intensely talented, skilled, and experienced machine learning engineers who also work on LLMs daily. My opinion is formed both by my own experience with LLMs and by the opinions of those experts.
Wanna try again? Alternatively, you can keep riding the hype train from techfluencers who keep promising the moon but failing to deliver, just like they did with crypto.
In my experience it happens pretty regularly if you ask one of these things to generate code (it will often come up with plausible library functions that don't exist, as in the sketch below) or to generate citations (it comes up with plausible articles that don't exist).
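A hypothetical illustration of that failure mode, not a real transcript: the call below looks idiomatic, but `requests` has no `get_json` function (the real pattern is `requests.get(url).json()`), so it fails at runtime:

    # The kind of plausible-but-nonexistent API call an LLM might emit.
    # `requests.get` is real; `requests.get_json` is not, so this raises
    # AttributeError instead of fetching anything.
    import requests

    try:
        data = requests.get_json("https://api.example.com/items")
    except AttributeError as err:
        print(err)  # module 'requests' has no attribute 'get_json'

The function name is a perfectly reasonable guess at what the library *could* offer, which is exactly what makes these errors easy to miss in review.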