OK. I'm imagining a correlation engine that treats code as a series of prompts, each used to generate more code that is statistically likely to follow, given the corpus.
And now I'm transforming that through the concept of taking a photograph and applying the clone tool via a light airbrush.
Repeat enough times, and you get uncompilable mud.
Saying they definitely won't and saying they definitely will are equally over-broad and premature.
I currently expect we'll need another architectural breakthrough; but also, back in 2009 I expected no-steering-wheel-included self-driving cars no later than 2018, and that the kind of output we actually saw from LLMs in 2023 would be the final problem to be solved on the path to AGI.
GPT4 does inference at 560 teraflops. Human brain goes 10,000 teraflops. NVIDIA just unveiled their latest Blackwell chip yesterday which goes 20,000 teraflops. If you buy an NVL72 rack of the things, it goes 1,400,000 teraflops. That's what Jensen Huang's GPT runs on I bet.
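For anyone checking the rack math, here's a quick sketch; it assumes an NVL72 rack holds 72 of those Blackwell chips, and otherwise just reuses the figures quoted above:

```python
# Back-of-the-envelope check of the rack figure, assuming an NVL72 rack
# holds 72 Blackwell GPUs; all other numbers are the ones quoted above.
gpt4_tflops = 560            # claimed GPT-4 inference throughput
brain_tflops = 10_000        # claimed human-brain figure
blackwell_tflops = 20_000    # 20 petaflops per chip (low precision)
gpus_per_rack = 72           # assumption: 72 GPUs in an NVL72

rack_tflops = blackwell_tflops * gpus_per_rack
print(f"rack: {rack_tflops:,} TFLOPS (~{rack_tflops / 1e6:.2f} exaflops)")
print(f"rack vs. quoted brain figure: {rack_tflops / brain_tflops:.0f}x")
print(f"rack vs. quoted GPT-4 figure: {rack_tflops / gpt4_tflops:.0f}x")
```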
> GPT4 does inference at 560 teraflops. Human brain goes 10,000 teraflops
AFAICT, both are guesses. The estimates I've seen for human brains range from ~162 GFLOPS[0] at the low end up to 10^28 FLOPS[1]; even the model size for GPT-4 isn't confirmed, merely human inference from public information combined with a rumour widely described as a "leak", and likewise the compute requirements.
They're not guesses. We know they use A100s and we know how fast an A100 goes. You can cut a brain open and see how many neurons it has and how often they fire. Kurzweil's 10 petaflops for the brain (100e9 neurons * 1000 connections * 200 calculations) is a bit high for me honestly. I don't think connections count as flops. If a neuron only fires 5-50 times a second then that'd put the human brain at .5 to 5 teraflops it seems to me. That would explain why GPT is so much smarter and faster than people. The other estimates like 1e28 are measuring different things.
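For concreteness, here's that low-end arithmetic spelled out; it's a sketch that takes the comment's 100e9 neuron count at face value and assumes one floating point operation per firing, which is exactly the assumption disputed below:

```python
# Sketch of the estimate above: neurons * firing rate * 1 flop per spike.
# Neuron count and firing-rate range are taken from the comment; "1 flop
# per firing" is the simplifying assumption being argued about.
neurons = 100e9
for firing_rate_hz in (5, 50):
    flops = neurons * firing_rate_hz * 1
    print(f"{firing_rate_hz:>2} Hz -> {flops / 1e12:.1f} TFLOPS")
```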
> They're not guesses. We know they use A100s and we know how fast an A100 goes.
And we don't know how many GPT-4 instances run on any single A100, or if it's the other way around and how many A100s are needed to run a single GPT-4 instance. We also don't know how many tokens/second any given instance produces, so multiple users may be (my guess is they are) queued on any given instance. We have a rough idea how many machines they have, but not how intensively they're being used.
> You can cut a brain open and see how many neurons it has and how often they fire. Kurzweil's 10 petaflops for the brain (100e9 neurons * 1000 connections * 200 calculations) is a bit high for me honestly. I don't think connections count as flops. If a neuron only fires 5-50 times a second then that'd put the human brain at .5 to 5 teraflops it seems to me.
You're double-counting. "If a neuron only fires 5-50 times a second" = maximum synapse firing rate * fraction of cells active at any given moment, and the 200 is what you get from assuming it could go at 1000/second (they can) but only 20% are active at any given moment (a bit on the high side, but not by much).
Total = neurons * synapses/neuron * maximum synapse firing rate * fraction of cells active at any given moment * operations per synapse firing
1e11 * 1e3 * 1e3 Hz * 10% (of your brain in use at any given moment, where the similarly phrased misconception comes from) * 1 floating point operation = 1e16/second = 10 PFLOP
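Spelled out as code, with the same numbers as above and nothing new added; the last factor is the contested one:

```python
# The five-factor estimate above, with the units made explicit.
neurons                = 1e11   # cells
synapses_per_neuron    = 1e3
max_firing_rate_hz     = 1e3    # upper bound on synapse firing rate
fraction_active        = 0.10   # share active at any given moment
ops_per_synapse_firing = 1      # the contested assumption

total = (neurons * synapses_per_neuron * max_firing_rate_hz
         * fraction_active * ops_per_synapse_firing)
print(f"{total:.0e} FLOPS = {total / 1e15:.0f} PFLOPS")   # 1e+16 FLOPS = 10 PFLOPS
```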
It currently looks like we need more than 1 floating point operation to simulate a synapse firing.
> The other estimates like 1e28 are measuring different things.
Things which may turn out to be important for e.g. Hebbian learning. We don't know what we don't know. Our brains are much more sample-efficient than our ANNs.
Synapses might be akin to transistor count, which is only roughly correlated with FLOPs on modern architectures.
I've also heard in a recent talk that the optic nerve carries about 20 Mbps of visual information. If we imagine a saturated task, such as the famous invisible-gorilla test where a gorilla walks through people passing a basketball, then we can put some limits on the conscious brain. This does not count the autonomic, sympathetic, and parasympathetic processes, of course, but those could in theory be fairly low bandwidth.
There is also the matter of the "slow" computation in the brain that happens through neurotransmitter release. It is analog and complex, but with a slow clock speed.
My hunch is that the brain is fairly low FLOPs but highly specialized, closer to an FPGA than a million GPUs running an LLM.
> I don't think connections count as flops. If a neuron only fires 5-50 times a second then that'd put the human brain at .5 to 5 teraflops it seems to me.
That assumes that you can represent all of the useful parts of the decision about whether to fire or not to fire in the equivalent of one floating point operation, which seems to be an optimistic assumption. It also assumes there's no useful information encoded into e.g. phase of firing.
Imagine that there's a little computer inside each neuron that decides when it needs to do work. Those computers are an implementation detail of the flops being provided by neurons, and counting them would mean counting the same work twice. For example, how would you measure the speed of a Game Boy emulator? Would you count all the instructions the emulator itself needs to run in order to simulate the Game Boy's instructions?
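Here's a toy version of that emulator point; the overhead factor is invented purely for illustration:

```python
# Toy version of the emulator analogy: rate the "machine" by the guest
# instructions it retires, not by the host work spent emulating each one.
# The overhead factor is made up for illustration.
HOST_OPS_PER_GUEST_OP = 40   # hypothetical cost of interpreting one guest op

def run_guest_program(n_guest_ops: int) -> tuple[int, int]:
    """Pretend to emulate n_guest_ops guest instructions."""
    guest_retired = n_guest_ops
    host_spent = n_guest_ops * HOST_OPS_PER_GUEST_OP
    return guest_retired, host_spent

guest, host = run_guest_program(1_000_000)
print(f"guest ops: {guest:,}   host ops: {host:,}")
# The emulator's advertised speed is the guest number; the host number is an
# implementation detail -- the same distinction being drawn for neurons here.
```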
> Imagine that there's a little computer inside each neuron that decides when it needs to do work
Yah, there's -bajillions- of floating point operation equivalents happening in a neuron deciding what to do. They're probably not all functional.
BUT, that's why I said the "useful parts" of the decision:
It may take more than the equivalent of one floating point operation to decide whether to fire. For instance, if you are weighting multiple inputs to the neuron differently to decide whether to fire now, that would require multiple multiplications of those inputs. If you consider whether you have fired recently, that's more work too.
Neurons do all of these things, and more, and these things are known to be functional-- not mere implementation details. A computer cannot make an equivalent choice in one floating point operation.
Of course, this doesn't mean that the brain is optimal-- perhaps you can do far less work. But if we're going to use it as a model to estimate scale, we have to consider what actual equivalent work is.
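To make the "more than one flop" point concrete, here's a very crude sketch of such a decision; the weights, inputs, and thresholds are invented, and real neurons are of course far messier than this:

```python
# Crude sketch of the point above: even a toy "fire or don't" decision with
# weighted inputs and a refractory check costs on the order of thousands of
# flops, not one. All values are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_inputs = 1000                        # synaptic inputs to one neuron
weights = rng.normal(size=n_inputs)    # per-synapse weighting
inputs = rng.random(n_inputs)          # incoming activity this tick
threshold = 5.0
ms_since_last_spike = 3.0
refractory_ms = 2.0

drive = float(weights @ inputs)        # ~1000 multiplies + ~1000 adds
fires = drive > threshold and ms_since_last_spike > refractory_ms
print(f"drive={drive:.2f}, fires={fires}")
# Counting that whole decision as one flop is the optimistic assumption
# being questioned above.
```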
Yes, but it probably doesn't tell the whole story.
There are basically a few axes you can view this on:
- Number of connections and complexity of connection structure: how much information is encoded about how to do the calculations.
- Mutability of those connections: these things are growing and changing -while doing the math on whether to fire-.
- How much calculation is really needed to do the computation encoded in the connection structure.
Basically, brains are doing a whole lot of math and working on a dense structure of information, but not very precisely because they're made out of meat. There's almost certainly different tradeoffs in how you'd build the system based on the precision, speed, energy, and storage that you have to work with.
That's based on old assumptions about neuron function.
Firstly, Kurzweil underestimates the number of connections by an order of magnitude.
Secondly, dendritic computation changes things. Individual dendrites, and the dendritic tree as a whole, can do multiple individual computations: logical operations, low-pass filtering, coincidence detection, ... One neuronal activation is potentially thousands of operations per neuron.
A single human neuron can be the equivalent of thousands of ANN neurons.
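A rough way to picture that difference; branch counts and the choice of nonlinearity here are illustrative, not measured:

```python
# Sketch of how dendritic subunits inflate the per-neuron operation count:
# a point-neuron model does one weighted sum, while a model with nonlinear
# dendritic branches does one per branch plus a combining stage at the soma.
# Branch counts and the tanh nonlinearity are illustrative, not measured.
import numpy as np

rng = np.random.default_rng(1)
n_branches, inputs_per_branch = 100, 50
x = rng.random((n_branches, inputs_per_branch))
w = rng.normal(size=(n_branches, inputs_per_branch))

point_output = float((w * x).sum())                 # point-neuron view
branch_outputs = np.tanh((w * x).sum(axis=1))       # per-branch nonlinearity
dendritic_output = float(branch_outputs.sum())      # soma combines the branches

print(f"point model: {point_output:.2f}, dendritic model: {dendritic_output:.2f}")
# Same inputs, but the dendritic version does far more work per activation --
# the "thousands of operations per neuron" claim above.
```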
They might generate improvements, but I'm not sure why people think those improvements would be unbounded. Think of it like improvements to jet engines or internal combustion engines: rapid gains followed by decades of very tiny improvements. We've gone from 32-bit LLM weights down to 16, then 8, then 4-bit weights, and then a lot of messy diminishing returns below that. Moore's Law is running on fumes for process improvements, so each new generation of chips that's twice as fast gets there by nearly doubling the silicon area and nearly doubling the power consumption. There's a lot of active research into pruning models down now, but mostly better models == bigger models, which is also hitting all kinds of practical limits. Really good engineering might get to the same endpoint a little faster than mediocre engineering, but they'll both probably wind up at the same point eventually. A super smart LLM isn't going to make sub-atomic transistors, or sub-bit weights, or eliminate power and cooling constraints, or eliminate any of the dozen other things that eventually limit you.
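The quantization part of that is easy to see in a toy experiment; the weights below are random stand-ins rather than real model weights, and the quantizer is the most naive symmetric scheme:

```python
# Toy illustration of the diminishing returns: naive symmetric quantization
# of the same weights at 16, 8, 4 and 2 bits. The weights are random
# stand-ins, not real model weights.
import numpy as np

rng = np.random.default_rng(42)
weights = rng.normal(scale=0.02, size=100_000).astype(np.float32)

def quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization to `bits` bits, then dequantize."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

for bits in (16, 8, 4, 2):
    err = np.abs(quantize(weights, bits) - weights).mean()
    print(f"{bits:>2}-bit weights: mean abs error {err:.2e}")
# Error roughly doubles with every bit you drop, while the memory saved by
# each further halving keeps shrinking -- hence the messy returns below 4 bits.
```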
Saying that AI hardware is near a dead end because Moore's Law is running out of steam is silly. Even GPUs are still very general-purpose; we can make a lot of progress in the hardware space via extreme specialization, approximate computing, and analog computing.
I'm mostly saying that unless a chip-designing AI model is an actual magical wizard, it's not going to have much of an advantage over teams of even mediocre human engineers. All of the stuff you're talking about becomes Moore's-Law-limited after 1-2 generations of wacky architectural improvements.
Bro, Jensen Huang just unveiled a chip yesterday that goes 20 petaflops. Intel's latest Raptor Lake CPU goes 800 gigaflops. Can you really explain 25,000x progress by the 2x larger die size? I'm sure reactionary America wanted Moore's Law to run out of steam, but the Taiwanese betrayal made up for all the lost Moore's Law progress and then some.
That speedup compared to Nvidia's previous generation came nearly entirely from: 1) a small process technology improvement from TSMC, 2) more silicon area, 3) more power consumption, and 4) moving to FP4 from FP8 (halving the precision). They aren't delivering the 'free lunch' between generations that we had for decades in terms of "the same operations faster and using less power." They're delivering increasingly exotic chips for increasingly crazy amounts of money.
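One way to see how those factors compound into a big headline number without any single free lunch; the individual multipliers below are illustrative guesses, not Nvidia's figures:

```python
# Rough decomposition of a generation-over-generation speedup into the four
# factors above. The multipliers are illustrative guesses, not Nvidia's
# numbers; the point is how they compound without any single free lunch.
factors = {
    "process node improvement": 1.15,  # hypothetical modest gain
    "more silicon area":        2.0,   # hypothetical: roughly doubled area
    "more power":               1.3,   # hypothetical bigger power budget
    "FP8 -> FP4":               2.0,   # halving precision doubles peak FLOPS
}

speedup = 1.0
for name, multiplier in factors.items():
    speedup *= multiplier
    print(f"{name:<26} x{multiplier:.2f}  (cumulative x{speedup:.2f})")
```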
Pro tip: If you want to know who is the king of AI chips, compare FLOPS (or TOPS) per chip area, not FLOPS/chip.
As long as the bottleneck is fab capacity in wafers per hour, the number of operations per second per unit of chip area determines who can produce the most compute at the best price. It's a good measure even across different technology nodes and superchips.
Nvidia is the leader for a reason.
If manufacturing capacity grows to match demand in the future, FLOPS or TOPS per watt may become the relevant metric, but for now it's fab capacity.
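A sketch of that wafer arithmetic, with made-up die sizes, yields, and per-chip FLOPS just to show the shape of the comparison:

```python
# Sketch of the wafers-as-bottleneck argument: compare compute per wafer,
# not per chip. Die areas, FLOPS and yields below are made-up placeholders.
import math

WAFER_AREA_MM2 = math.pi * (300 / 2) ** 2   # 300 mm wafer, ignoring edge loss

chips = {
    # name: (die area mm^2, peak TFLOPS per chip, yield) -- all assumed
    "chip_A": (800, 2000, 0.70),
    "chip_B": (400,  900, 0.85),
}

for name, (area, tflops, good_yield) in chips.items():
    dies_per_wafer = WAFER_AREA_MM2 / area * good_yield
    per_wafer = dies_per_wafer * tflops
    print(f"{name}: {tflops / area:.2f} TFLOPS/mm^2, "
          f"~{per_wafer:,.0f} TFLOPS per wafer")
# The bigger per-chip number doesn't automatically win; area efficiency and
# yield decide how much compute each wafer actually produces.
```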
LLMs are so much more than you are assuming… text, images, code are merely abstractions to represent reality. Accurate prediction requires no less than usefully generalizable models and deep understanding of the actual processes in the world that produced those representations.
I know they can provide creative new solutions to totally novel problems from firsthand experience… instead of assuming what they should be able to do, I experimented to see what they can actually do.
Focusing on the simple mechanics of training and prediction is to miss the forest for the trees. It's as absurd as asking how living things can have any intelligence when they're just bags of chemicals oxidizing carbon. True but irrelevant: it misses the deeper fact that solving almost any problem deeply requires understanding and modeling all of the connected problems, and so on, until you've pretty much encompassed everything.
Ultimately it doesn’t even matter what problem you’re training for- all predictive systems will converge on general intelligence as you keep improving predictive accuracy.
> And now I'm transforming that through the concept of taking a photograph and applying the clone tool via a light airbrush.
> Repeat enough times, and you get uncompilable mud.
LLMs are not going to generate improvements.