I continue to be impressed by Anthropic’s work and their dual commitment to scaling and safety.
HN often takes a very negative tone toward any of these developments, but I really do feel that Anthropic is trying to run a “race to the top” on alignment, though it doesn’t seem like the other major companies are doing enough to race with them.
Particularly frustrating on HN is the common syllogism of:
1. I believe anything that “thinks” must do X.
2. This LLM doesn’t do X.
3. Therefore, this LLM doesn’t think.
X is usually poorly justified as constitutive of thinking (it is usually constitutive of human thinking, but not of thinking writ large), and it is rarely explained why it matters whether the label of “thinking” applies to an LLM if the capabilities remain the same.
What is often frustrating, to me at least, is the arbitrary definition of "safety" and "ethics", forged by a small group of seemingly intellectually homogeneous individuals.
Yes, even if this is a mild improvement on 20 years ago, when it was an even more homogeneous group.
Given how often China comes up in the context of AI, I'm wondering: lots of people in the West treat China as mysterious and alien (e.g. Confucianism). I wonder how true that really is, or whether it ever was; perhaps it was true before industrialisation, which homogenises everyone regardless of origin.
This is an illustrative comment for meta reasons, I think. Karpathy's lecture almost certainly doesn't cover the superposition hypothesis (which hadn't been invented for ANNs 8 years ago), or sparse dictionary learning (whose application to ANNs is motivated by the superposition hypothesis). It certainly doesn't talk about actual specific features found in post-ChatGPT language models. What's happening here seems like a thing LLMs are often accused of dismissively - you're pattern-matching to certain associated words without really reasoning about what is or isn't new in this paper.
I worry this is going to come across as insulting, but that's not my intention. I do this too sometimes; I think everyone does. The point is we shouldn't define true reasoning so narrowly that we think no system capable of it would ever be caught doing what most of us are in fact doing most of the time.
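For anyone who hasn't seen it, the sparse dictionary learning idea is roughly: train an overcomplete autoencoder on a model's activations with a sparsity penalty, and hope the learned directions are interpretable features. A toy sketch of that recipe (my own stand-in, not Anthropic's actual setup; the dimensions and random "activations" are made up):

    # Toy sparse-dictionary-learning pass over activations (illustrative only).
    import torch
    import torch.nn as nn

    d_model, d_dict = 512, 4096          # overcomplete: more features than dimensions
    acts = torch.randn(10_000, d_model)  # stand-in for real residual-stream activations

    encoder = nn.Linear(d_model, d_dict)
    decoder = nn.Linear(d_dict, d_model, bias=False)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

    for step in range(1000):
        batch = acts[torch.randint(0, len(acts), (256,))]
        codes = torch.relu(encoder(batch))   # sparse, non-negative feature activations
        recon = decoder(codes)
        loss = (recon - batch).pow(2).mean() + 1e-3 * codes.abs().mean()  # reconstruction + L1 sparsity
        opt.zero_grad(); loss.backward(); opt.step()

    # Columns of decoder.weight are the candidate "feature" directions.

The superposition hypothesis is what justifies looking for more feature directions than the activation has dimensions in the first place.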
> I worry this is going to come across as insulting, but that's not my intention. I do this too sometimes; I think everyone does. The point is we shouldn't define true reasoning so narrowly that we think no system capable of it would ever be caught doing what most of us are in fact doing most of the time.
Indeed; to me LLMs pattern match (yes, I did spot the irony) to system-1 thinking, and they do a better job of that than we humans do.
Fortunately for all of us, they're no good at doing system-2 thinking themselves, and only mediocre at translating problems into a form which can be used by a formal logic system that excels at system-2 thinking.
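To make that division of labour concrete: the hard system-2 step can be handed to a solver, and the LLM's only job is the translation into constraints. A made-up example using the Z3 Python bindings (pip install z3-solver); the puzzle and variable names are mine, not from any particular system:

    from z3 import Int, Solver, Distinct, sat

    # "Alice is older than Bob; Carol is older than Alice; ranks are 1..3, all different."
    alice, bob, carol = Int("alice"), Int("bob"), Int("carol")
    s = Solver()
    s.add(Distinct(alice, bob, carol))
    s.add(alice > bob)
    s.add(carol > alice)
    for x in (alice, bob, carol):
        s.add(1 <= x, x <= 3)

    if s.check() == sat:
        print(s.model())   # e.g. bob = 1, alice = 2, carol = 3

Getting reliably from loose natural language to constraints like these is exactly the translation step I find them mediocre at.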
By that reasoning even humans are not thinking. But of course humans are always excluded from such research: if it's human, it's thinking by default, damn the reasoning. Then of course we have snails and dogs and apes; are they thinking? Were the Neanderthals thinking? By which definition? Moving goalposts is too weak a metaphor for what is going on here, where everybody distorts the reasoning for whatever point they're trying to make today. And because I can't shut up, I'll just add my user view: if it works like a duck and outputs like a duck, it's duck enough for any practical use, so let's move on and see what we do with it (use it, harness it, adopt it, or...).
> “By that reasoning even humans are not thinking”
I’m a neophyte, so take this as such. If we can agree that people's output is not always the product of thinking, then I’d be more willing to accept computational innovations as thought-like.
Neural probing has been around for a while, true - and this result is definitely building on past results. It’s basically a scaled-up version of their paper from a little while ago, anyhow.
But Karpathy was looking at very simple LSTMs of 1-3 layers, inspecting individual nodes/cells, and those results have thus far been difficult to replicate in large-scale transformers. Karpathy also doesn’t provide a recipe for doing this in his paper, which makes me think he was just guessing and checking various cells. The representations discovered are also very simple.
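For concreteness, the per-cell inspection in that line of work amounts to something like the following (my own toy sketch with an untrained model and a made-up "inside quotes" property; Karpathy's paper used trained char-LSTMs and found cells tracking things like quote state and line length):

    import torch
    import torch.nn as nn

    text = 'he said "hello there" and left "quickly" after'
    inside_quotes, inq = [], False
    for ch in text:
        if ch == '"':
            inq = not inq
        inside_quotes.append(1.0 if inq else 0.0)

    vocab = sorted(set(text))
    ids = torch.tensor([[vocab.index(c) for c in text]])

    emb = nn.Embedding(len(vocab), 16)
    lstm = nn.LSTM(16, 128, batch_first=True)   # stand-in; the real models were trained
    with torch.no_grad():
        out, _ = lstm(emb(ids))                 # (1, seq_len, 128) hidden states

    # Correlate each hidden channel with the "inside quotes" property.
    target = torch.tensor(inside_quotes)
    corrs = [torch.corrcoef(torch.stack([out[0, :, i], target]))[0, 1].item()
             for i in range(128)]
    best = max(range(128), key=lambda i: abs(corrs[i]))
    print(f"most quote-like cell: {best}, |corr| = {abs(corrs[best]):.2f}")

With a trained char-LSTM a handful of cells track a property like this almost perfectly; running the same per-neuron search on a large transformer mostly turns up polysemantic mush, which is why the scaled-up dictionary-learning approach is the interesting part.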