
> Unlike students, LLMs can simultaneously process and synthesize insights from thousands of historical

They can't. LLMs gloss over anything multivariate and prioritize the flow of words over hard facts. That makes sense given that LLMs are language models, not thinking engines, but it doesn't make them useful for serious (above "second-year") intellectual tasks.

They don't have any such unique capabilities, other than that they come free of charge.




Kinda. Yes they have flaws, absolutely they do.

But it's not a mere coincidence that history contains the substring "story" (nor that in German, both "history" and "story" are "Geschichte") — these are tales of the past, narratives constructed based on evidence (usually), but still narratives.

Language models may well be superhuman at teasing apart the biases that are woven into the minds writing the narratives… At least in principle, though unfortunately RLHF means they're also likely to sycophantically add whatever set of biases they estimate the user has.


They're subhuman at debiasing, or at any analytical task, because they lack the reasoning engine we all have. They pick the most emotionally loaded narrative and go with it.

They can't handle counter-intuitive but absolutely logical cases, like how eggplants and potatoes belong to the same biological family (the nightshades) but radishes don't; instead they'll hallucinate and start gaslighting the user. That might be okay for "second-year" students, but it's going to be the root cause of some deadly gotcha in strategic decision-making.

They're language models. It's in the name. They work like one.


> They can't handle counter-intuitive but absolutely logical cases, like how eggplants and potatoes belong to the same biological family (the nightshades) but radishes don't

"Can't" you say. "Does", I say: https://chatgpt.com/c/6735b10c-4c28-8011-ab2d-602b51b59a3e

Not that it matters: this isn't a demonstration of reasoning, it's a demonstration of knowledge.

A better test would be whether it can be fooled by statistics with a political dimension, so I went with the recent Veritasium video on this. At least with my custom instructions, it goes off and does actual maths by calling out to the Python code interpreter, so that isn't going to demonstrate anything by itself: https://chatgpt.com/share/6735b727-f168-8011-94f7-a5ef8d3610...

But tool use then taints the "how would ${group member} respond to this?" test; if I convince it not to do real statistics and instead give me a purely word-based answer, you can see the same kinds of narratives that actual humans give when presented with this kind of info: https://chatgpt.com/share/6735b80f-ed50-8011-991f-bccf8e8b95...

> They're language models. It's in the name. They work like one.

Yes, they are.

Lojban is also a language.

Look, I'm not claiming they're fantastic at maths (at least when you stop them from using tools), but the biasing I'm talking about is part of language as it is used: the definition of "nurse" may not be gendered, but people are more likely to assume a nurse is a woman than a man, and that's absolutely a thing these models (and even their predecessors like Word2Vec) pick up on:

https://chanind.github.io/word2vec-gender-bias-explorer/#/qu...

(from: https://chanind.github.io/nlp/2021/06/10/word2vec-gender-bia...)

This is the kind of de-bias and re-bias I mean.
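For anyone curious what that probe looks like mechanically, here's a minimal sketch, assuming gensim and its downloadable Google News word2vec vectors. The word list and the single he→she projection axis are my own simplifications, not the explorer's exact method:

    # Minimal sketch: project words onto a crude he->she axis in word2vec.
    # Assumes gensim; the vectors are a ~1.6 GB download on first run.
    import numpy as np
    import gensim.downloader as api

    model = api.load("word2vec-google-news-300")

    # Crude "gender direction": the vector pointing from "he" to "she".
    axis = model["she"] - model["he"]
    axis /= np.linalg.norm(axis)

    def lean(word):
        # Positive means the word sits closer to "she", negative to "he".
        v = model[word]
        return float(np.dot(v / np.linalg.norm(v), axis))

    for w in ["nurse", "doctor", "engineer", "receptionist"]:
        print(f"{w:>12}: {lean(w):+.3f}")

With the standard vectors, "nurse" and "receptionist" land well on the "she" side and "engineer" on the "he" side, even though none of their definitions are gendered; that's usage bias soaked straight into the geometry.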


> "Can't" you say. "Does", I say:

Have you seriously never seen them make these kinds of grave mistakes? That's too much Kool-Aid you're drinking.


I literally gave you a link to a ChatGPT session where it did what you said it can't do.

And rather than use that as a basis for claiming it's reasoning, I'm also saying that the test you proposed, and which I falsified, wasn't actually about reasoning.

Not sure what that would even be in a Kool-Aid-themed metaphor in this case… "You said that drink was poisoned with something that would make our heads explode; Dave drank some and he's fine, but also poison doesn't do that, and if the real poison were α-amanitin we wouldn't even notice problems for about a day"?



