>Tasks that are instructed using conditional clauses also require a simple form ...

godelski · 2024-03-20T02:50:27 1710903027

There are also many issues with simply interpreting the results of the work.

> Our best models can perform a previously unseen task with an average performance of 83% correct based solely on linguistic instructions (that is, **zero-shot learning**).

They used GPT-2 from Hugging Face. I'm unsure what data this model was trained on. If it is the original GPT-2 checkpoint, then that data is unknown. I simply refuse to let anyone casually claim "zero-shot" when the training data is unknown. GPT-2 was trained on 40GB of text (which is A LOT! Its WebText corpus was scraped from roughly 45 million links and came to a little over 8 million documents after deduplication and cleaning). That may not be the enormous scale we see today, but even then the community was concerned about accurately stating what was in distribution and what was out of distribution. You can't know that if you don't know what the model was trained on AND how it was trained (since the mathematics of training can also put pressure on certain things that may not be apparent at first).
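If the training corpus were available, a first-pass contamination check could look something like the n-gram overlap sketch below: flag any evaluation instruction that shares a long word sequence with the training text. This is a minimal sketch under stated assumptions; the function names, the toy strings, and the default `n=8` are hypothetical choices for illustration. An overlap check also misses paraphrases and says nothing about pressures from the training objective, which is part of the point above.

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contamination_rate(eval_items: list[str], corpus: list[str], n: int = 8) -> float:
    """Fraction of eval items sharing at least one n-gram with the corpus.

    Hypothetical helper: n=8 is an arbitrary illustrative default.
    """
    corpus_grams: set[tuple[str, ...]] = set()
    for doc in corpus:
        corpus_grams |= ngrams(doc, n)
    flagged = sum(1 for item in eval_items if ngrams(item, n) & corpus_grams)
    return flagged / len(eval_items) if eval_items else 0.0


# Toy, made-up strings standing in for training text and task instructions.
corpus = ["if the stimulus appears in the first epoch respond in the direction of the stimulus"]
eval_items = ["respond in the direction of the stimulus", "go in the opposite direction"]
print(contamination_rate(eval_items, corpus, n=4))  # 0.5
```

The catch, of course, is that this only works when you can actually read the corpus.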

In addition, their approach appears to rely mainly on clustering techniques. CLIP itself is essentially a clustering method: its contrastive objective pulls matching image and text embeddings together and pushes mismatched ones apart. ANNs frequently perform clustering as well, though with some black-box character (they are not entirely opaque, either).
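To make the "clustering" point concrete, here is a minimal pure-Python sketch of a CLIP-style symmetric contrastive (InfoNCE) loss: the i-th image embedding should score highest against the i-th text embedding, and vice versa. This is the generic objective, not the paper's code; the toy 2-D embeddings and the temperature of 0.07 are illustrative assumptions.

```python
import math


def _normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _log_softmax(row: list[float]) -> list[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_style_loss(img: list[list[float]], txt: list[list[float]],
                    temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: pair i is (img[i], txt[i]); all other pairings are negatives."""
    img_n = [_normalize(v) for v in img]
    txt_n = [_normalize(v) for v in txt]
    n = len(img_n)
    # logits[i][j] = cosine(img_i, txt_j) / temperature
    logits = [[sum(a * b for a, b in zip(img_n[i], txt_n[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text: each row is a classification over the n texts.
    loss_i2t = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: each column is a classification over the n images.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(_log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Minimizing this loss is what groups matched pairs together in the shared embedding space, which is why reading "representations" off such a space is essentially reading off clusters.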

It is very hard to draw causal conclusions from either of these approaches. Not to mention that causal inference is difficult in itself, because different causal graphs can generate exactly the same observational data (they are Markov equivalent) and are therefore indistinguishable.
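A minimal illustration of that indistinguishability, assuming two binary variables with made-up parameters: model A says X causes Y, model B says Y causes X (its parameters obtained via Bayes' rule), and both produce the identical joint distribution, so no amount of purely observational data can tell the arrows apart.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model A: X -> Y, specified by P(X=1) and P(Y=1 | X).
p_x1 = Fraction(3, 5)
p_y1_given_x = {0: Fraction(1, 4), 1: Fraction(2, 3)}


def bern(p1: Fraction, v: int) -> Fraction:
    """Probability that a binary variable with P(=1) = p1 takes value v."""
    return p1 if v == 1 else 1 - p1


joint_a = {(x, y): bern(p_x1, x) * bern(p_y1_given_x[x], y)
           for x, y in product((0, 1), repeat=2)}

# Model B: Y -> X, with P(Y=1) and P(X=1 | Y) derived from model A via Bayes' rule.
p_y1 = sum(p for (x, y), p in joint_a.items() if y == 1)
p_x1_given_y = {y: joint_a[(1, y)] / (joint_a[(0, y)] + joint_a[(1, y)]) for y in (0, 1)}
joint_b = {(x, y): bern(p_y1, y) * bern(p_x1_given_y[y], x)
           for x, y in product((0, 1), repeat=2)}

# Opposite causal arrows, identical observable distribution.
print(joint_a == joint_b)  # True
```

Exact fractions are used so the equality check is not muddied by floating-point rounding.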

robwwilliams · 2024-03-19T22:00:14 1710885614

Yes, the word “represented” is too widely used and abused in neuroscience, to the point where a frog has “fly detector” neurons. Humberto Maturana pushed back against this pervasive idea. Chapter 4 of Terry Winograd and Fernando Flores’s Understanding Computers and Cognition has a good overview of common presumptions.

Given that the CNS is a 700-million-year hack, there will be lots of odd tricks used to generate effective behaviors.

ben_w · 2024-03-19T20:16:22 1710879382

> Or that humans can even do xor with a single neuron.

That's news to me.

I'm not hugely surprised, given I've heard a biological neuron is supposed to be equivalent to a small ANN, but this is still the first I've heard of that claim.

orbifold · 2024-03-19T20:21:39 1710879699

https://www.science.org/doi/full/10.1126/science.aax6239
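The linked paper reports dendritic calcium action potentials in human cortical neurons whose amplitude is largest for near-threshold input and shrinks for stronger input. A toy model of why that matters: a single unit whose response is non-monotonic in its summed input can compute XOR on its own. The Gaussian "bump" activation and its parameters below are assumptions chosen for illustration, not the paper's biophysical model.

```python
import math


def bump(drive: float, threshold: float = 1.0, width: float = 0.3) -> float:
    """Toy non-monotonic activation: peaks when drive is near threshold, falls off on both sides."""
    return math.exp(-((drive - threshold) / width) ** 2)


def single_unit_xor(a: int, b: int) -> int:
    # One unit with equal weights: the summed drive is 0, 1, or 2.
    drive = 1.0 * a + 1.0 * b
    # Only the near-threshold drive (exactly one input active) produces a strong response.
    return int(bump(drive) > 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, single_unit_xor(a, b))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```

With a monotonic activation (a step or sigmoid) the same one-unit setup cannot produce this truth table.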

ben_w · 2024-03-19T21:47:51 1710884871

Thanks :)

actionfromafar · 2024-03-20T10:37:56 1710931076

It's a similar situation to terminology for the atom.

In the original Greek, it meant "the smallest indivisible unit of matter".

Scientists then took the name and applied it to the basic units of the various elements (hydrogen, gold, etc.).

So, this is like when computing took the idea of a neuron as "the smallest indivisible unit of memory and calculation" and ran with it.

Fast-forward to now: we know that each "atom" has a bunch of smaller stuff inside, but by now it's too late to change the terminology.

And now we also know that a biological "neuron" is something more like an embedded CPU or FPGA in its own right, each with a bunch of computing and storage capability and modes.

canjobear · 2024-03-19T21:33:12 1710883992

There’s a long debate in neuroscience about whether information is encoded in timing of individual spikes or only their rates (where rate coding is a bit more similar to how ANNs work, but still different). It hasn’t been decided by any one paper, nor is it likely to be: it seems that different populations of neurons in different parts of the brain encode information through different means.
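A toy illustration of the distinction: two hypothetical spike trains with the same number of spikes in a 100 ms window look identical to a rate decoder but differ for a decoder that looks at which time bins contain spikes. The spike times, the 100 ms window, and the 10 ms bins are made-up values for illustration.

```python
def rate_code(spike_times_ms: list[float], window_ms: float) -> float:
    """Rate decoder: spikes per second over the window; ignores when the spikes occurred."""
    return len(spike_times_ms) * 1000.0 / window_ms


def timing_code(spike_times_ms: list[float], bin_ms: float, window_ms: float) -> tuple[int, ...]:
    """Timing decoder: binary pattern of which time bins contain at least one spike."""
    n_bins = int(window_ms // bin_ms)
    occupied = {int(t // bin_ms) for t in spike_times_ms if 0 <= t < window_ms}
    return tuple(int(i in occupied) for i in range(n_bins))


# Hypothetical trains: same spike count, different timing.
early = [2.0, 5.0, 9.0, 14.0]
late = [61.0, 70.0, 83.0, 97.0]

print(rate_code(early, 100), rate_code(late, 100))                # 40.0 40.0
print(timing_code(early, 10, 100) == timing_code(late, 10, 100))  # False
```

Real rate and temporal codes are estimated far more carefully than this, but the asymmetry is the same: a rate readout throws away exactly the information a timing readout keeps.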

robwwilliams · 2024-03-19T22:03:31 1710885811

Not either-or. It is both. Spike rate variation is way too slow for some types of low-level compute. Spike timing is critical for actions as “simple” as throwing a fastball into the strike zone.
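A back-of-the-envelope way to see why rate alone is slow, under the simplifying assumption that spiking is Poisson (real neurons are not exactly Poisson): the spike count N in a window of length T has mean and variance both equal to rT, so the rate estimate N/T has a coefficient of variation of 1/sqrt(rT). The 50 Hz firing rate below is a hypothetical value.

```python
import math


def rate_estimate_cv(rate_hz: float, window_s: float) -> float:
    """Coefficient of variation of the rate estimate N/T when N ~ Poisson(rate * T).

    Mean and variance of N are both rate*T, so std(N/T) / mean(N/T) = 1 / sqrt(rate*T).
    """
    return 1.0 / math.sqrt(rate_hz * window_s)


for window_ms in (10, 100, 1000):
    print(window_ms, round(rate_estimate_cv(50.0, window_ms / 1000), 2))
# 10 1.41
# 100 0.45
# 1000 0.14
```

At 50 Hz, a 10 ms window gives an estimate whose noise exceeds its mean, so a single neuron's rate is hard to read out quickly; pooling over many neurons helps, but precise timing carries information on much shorter time scales.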

sam0x17 · 2024-03-20T00:15:51 1710893751

> Or that humans can even do xor with a single neuron.

having a single neuron that has learned xor != understanding xor

Function approximation is trivial; understanding what those functions can do and when to use them is much harder (though that is arguably still function approximation).

nyrikki · 2024-03-20T01:47:30 1710899250

Well, XOR is not linearly separable, so a single perceptron cannot compute it.
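A quick way to see the single-perceptron limit: the classic perceptron learning rule reaches zero errors on AND, which is linearly separable, but never on XOR, because no line w1*x1 + w2*x2 + b = 0 separates {(0,1), (1,0)} from {(0,0), (1,1)}. The epoch cap and learning rate below are arbitrary illustrative choices.

```python
def train_perceptron(data, epochs: int = 100, lr: float = 1.0):
    """Classic perceptron rule with a step activation.

    Returns (weights, bias, number of mistakes in the final epoch).
    """
    w = [0.0, 0.0]
    b = 0.0
    errors = 0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in data:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            update = lr * (target - pred)
            if update:
                errors += 1
                w[0] += update * x1
                w[1] += update * x2
                b += update
        if errors == 0:  # a full pass with no mistakes: converged
            break
    return w, b, errors


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_perceptron(AND)[2])  # 0: converged
print(train_perceptron(XOR)[2] > 0)  # True: still making mistakes after 100 epochs
```

The perceptron convergence theorem guarantees the AND result; for XOR, no number of epochs helps, since no weights exist that would get all four cases right.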

> Our models by contrast make tractable predictions for what population and single-unit neural representations are required to support compositional generalization and can guide future experimental work examining the interplay of linguistic and sensorimotor skills in humans.

Do you see where that causes an issue with supervenience, especially when combined with STDP (spike-timing-dependent plasticity), which could change those representations even further?
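For readers unfamiliar with STDP, a common textbook form is the pair-based rule with an exponential window: if the presynaptic spike precedes the postsynaptic one the synapse strengthens, if it follows the synapse weakens, and the effect decays with the time difference. The amplitudes and time constants below are illustrative assumptions, not fitted values. The relevant point is that weights keep drifting with spike timing, so a "representation" read off at one moment need not stay fixed.

```python
import math


def stdp_dw(delta_t_ms: float, a_plus: float = 0.01, a_minus: float = 0.012,
            tau_plus_ms: float = 20.0, tau_minus_ms: float = 20.0) -> float:
    """Pair-based STDP weight change for one pre/post spike pair.

    delta_t_ms = t_post - t_pre. Pre-before-post (delta_t > 0) potentiates,
    post-before-pre (delta_t < 0) depresses; both decay exponentially with |delta_t|.
    """
    if delta_t_ms > 0:
        return a_plus * math.exp(-delta_t_ms / tau_plus_ms)
    if delta_t_ms < 0:
        return -a_minus * math.exp(delta_t_ms / tau_minus_ms)
    return 0.0


# Hypothetical scenario: the presynaptic neuron fires 5 ms before the postsynaptic one, 50 times.
w = 0.5
for _ in range(50):
    w += stdp_dw(5.0)
print(round(w, 3))  # 0.889
```

Real models usually also bound the weight and combine contributions from many spike pairs; this sketch leaves both out.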

It confuses the map with the territory, at least given how strongly they state their claim.

barrenko · 2024-03-20T08:01:52 1710921712

I'm early in biochemistry, so to speak, but could any of this relate to the level "below" neurons, i.e. to ion channels?