
Seems to me like the whole history of neural nets is basically crafting models with well-behaved gradients to make gradient descent work well. That, and models that can achieve high utilization of available hardware. The surprising thing is that models exist where the gradients are so well-behaved that we can learn GPT-4 level stuff.
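
To make "well-behaved" a little more concrete: the classic failure mode is the vanishing gradient, where the backward signal shrinks geometrically with depth. Here's a toy NumPy sketch (my own illustration, not from any particular paper) of how activation choice plus matched initialization is part of that crafting:

    import numpy as np

    rng = np.random.default_rng(0)
    n, depth = 64, 50

    def backprop_norm(weight_std, act_deriv):
        # Push a unit gradient back through `depth` random linear
        # layers, scaling by the activation's derivative each step.
        g = np.ones(n) / np.sqrt(n)
        for _ in range(depth):
            W = rng.normal(0, weight_std, size=(n, n))
            g = W.T @ (g * act_deriv())
        return np.linalg.norm(g)

    # Sigmoid: derivative is at most 0.25, so the gradient shrinks
    # geometrically no matter how the weights are scaled.
    sigmoid = lambda: np.full(n, 0.25)
    # ReLU with He init: roughly half the units are active, and the
    # 2/n weight variance is chosen to compensate for exactly that.
    relu = lambda: (rng.random(n) > 0.5).astype(float)

    print("sigmoid:", backprop_norm(1 / np.sqrt(n), sigmoid))  # ~1e-31
    print("relu+He:", backprop_norm(np.sqrt(2 / n), relu))     # stays O(1)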



There's plenty of interesting neural network designs out there, but they're being overshadowed by transformers due to their recent success. I personally think the main reason transformers work so well is that they actually step away from the multi-layer perceptron approach and introduce some structure and, in a way, sparsity.


Also, multi-head attention strikes me as being about as close to how language semantics seems to actually work in human brains as anything I've seen.

Lots of caveats there, of course. First off, I don't know much about the neurology; I just have an amateur interest in second language acquisition research that sometimes brings me into contact with this sort of thing. On the ANN side, which is closer to my actual wheelhouse, we have no real way of knowing whether the mechanism is all that close, and I'm guessing it isn't, since ANNs don't work that similarly to brains. Nor does it need to be. But, intuitively, there's still something promising about an ANN architecture that's vaguely capable of mimicking the behavior of modules in an existing system (human brains) that's well known to be capable of doing the job. I'm not super wild about the bidirectional recurrent layers, either, because they impose some restrictions that clearly aren't great, such as the hard limit on input size, et cetera. But it still strikes me as another big step in a good direction.
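
For anyone who hasn't looked under the hood, a single multi-head attention pass is just a handful of matrix multiplies. A bare-bones NumPy sketch of the forward pass (the shapes and names are mine, not any particular library's):

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
        # X: (seq_len, d_model). Each head attends over the whole
        # sequence with its own learned projections; the heads are
        # concatenated and mixed by the output projection Wo.
        seq, d = X.shape
        dh = d // n_heads
        split = lambda M: (X @ M).reshape(seq, n_heads, dh).transpose(1, 0, 2)
        Q, K, V = split(Wq), split(Wk), split(Wv)           # (heads, seq, dh)
        scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dh)     # (heads, seq, seq)
        out = softmax(scores) @ V                           # (heads, seq, dh)
        return out.transpose(1, 0, 2).reshape(seq, d) @ Wo  # (seq, d_model)

    rng = np.random.default_rng(0)
    d, seq, heads = 64, 10, 8
    Wq, Wk, Wv, Wo = (rng.normal(0, d ** -0.5, (d, d)) for _ in range(4))
    X = rng.normal(size=(seq, d))
    print(multi_head_attention(X, Wq, Wk, Wv, Wo, heads).shape)  # (10, 64)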


I'm currently working on a variation of a spiking neural network that learns by making and purging connections between neurons, which so far has been pretty interesting, though I'm having a hard time getting it to output anything more than the patterns it recognised. I did play around with adding its outputs to the input list, making it sort of recurrent, but it's practically impossible to decode anything that's going on inside the network. I'm thinking of tracing the inputs through to see what it's doing right now; it might be interesting to see it generate some sort of tree-like structure.
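
Roughly, the make-and-purge scheme looks something like this toy sketch (heavily simplified, not my actual code): co-spiking neurons reinforce or occasionally grow connections, everything decays, and edges that fall below a floor get purged.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100                    # neurons
    conn = {}                  # (pre, post) -> connection strength

    def step(spikes, grow_p=0.01, decay=0.95, floor=0.05):
        # Strengthen connections between co-spiking neurons, occasionally
        # grow a new one, decay everything, and purge edges below the floor.
        active = np.flatnonzero(spikes)
        for pre in active:
            for post in active:
                if pre == post:
                    continue
                if (pre, post) in conn:
                    conn[(pre, post)] += 0.1     # Hebbian-style reinforcement
                elif rng.random() < grow_p:
                    conn[(pre, post)] = 0.1      # make a new connection
        for edge in list(conn):
            conn[edge] *= decay
            if conn[edge] < floor:
                del conn[edge]                   # purge the weak edge

    for _ in range(200):
        step(rng.random(n) < 0.1)                # random ~10% spike pattern
    print(len(conn), "surviving connections")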


Are you familiar with the edge-popup algorithm introduced in "What's Hidden in a Randomly Weighted Neural Network?" https://arxiv.org/abs/1911.13299v2

Seems relevant to what you're working on. It starts with a randomly initialized, overparameterized neural net, but instead of training the weights with gradient descent, it learns by selecting which connection edges to keep and which to delete.
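
The core of it, per layer, looks roughly like this (my paraphrase of the paper: frozen random weights, trained per-edge scores, and a straight-through gradient for the top-k mask):

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_out, keep = 64, 32, 0.3

    W = rng.normal(0, np.sqrt(2 / n_in), (n_in, n_out))  # frozen random weights
    scores = rng.random((n_in, n_out))                   # the only trained params

    def forward(x):
        # Keep only the top-k% of edges by score; W itself never changes.
        k = int(keep * scores.size)
        threshold = np.partition(scores.ravel(), -k)[-k]
        return x @ (W * (scores >= threshold))

    def score_grad(x, grad_out):
        # Straight-through estimator: treat the top-k mask as identity on
        # the backward pass, so masked-out edges still get score updates.
        return np.outer(x, grad_out) * W

    x = rng.normal(size=n_in)
    grad_out = np.ones(n_out)                 # stand-in for dLoss/dOutput
    scores -= 0.1 * score_grad(x, grad_out)   # one toy SGD step on the scores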


I hadn't read it, thanks a lot! I'm probably going to use it in an essay I'm writing about the topic.


That's probably true for most kinds of NN architectures, including convolutional layers and older recurrent architectures (LSTMs, etc.). Fully connected networks do not seem to be a necessary, and certainly not an efficient, way to represent the mechanisms that operate in the "real world", so clever ways to make networks sparse are an important key.

But it's equally important to create architectures that allow efficient backpropagation of errors.
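
Residual connections are the textbook example: the identity path gives the gradient a direct route through the stack. A quick NumPy toy (deliberately tiny layer weights, to exaggerate the effect):

    import numpy as np

    rng = np.random.default_rng(0)
    n, depth = 64, 100

    def grad_norm_through_stack(residual):
        # Backprop a unit gradient through `depth` layers; each layer's
        # transform is a small random linear map. With the skip connection
        # the gradient also flows straight through the identity path.
        g = np.ones(n) / np.sqrt(n)
        for _ in range(depth):
            W = rng.normal(0, 0.1 / np.sqrt(n), size=(n, n))
            through_layer = W.T @ g
            g = g + through_layer if residual else through_layer
        return np.linalg.norm(g)

    print("plain stack:", grad_norm_through_stack(False))  # vanishes
    print("residual:   ", grad_norm_through_stack(True))   # stays O(1)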

It does seem like transformers are pretty good at both, already.

I kind of hope we're not getting something radically better anytime soon, because it seems like AGI is already approaching faster than we can prepare for.

Then again, I would expect that someone somewhere is already using transformer-based networks to develop some brand new architecture that does in fact provide such a leap.


>There's plenty of interesting neural network designs out there

Where could a person learn more about these?


It's less about enumerating the architectures that have been tried before, and more about recognizing the modularity of NN components and the different perspectives on what those modules might represent.


> gradients are so well-behaved that we can learn GPT-4 level stuff

What are "well-behaved" gradients?

What type of GPT-4 level stuff?



