
Isn't it disconcerting that brains consume so little energy doing continuous high complexity tasks like visual processing, and even at rest produce vivid simulations, while numeric calculations require so much effort even from advanced brains? It's like the energy usage pattern is completely opposite between brains and computers. Shouldn't that tell us something about our approach to intelligence, namely that we may be using the wrong tool?



We don’t hyper-optimize computers for energy use the way evolution did. But even so, the brain uses about as much energy as four Apple M3 chips, which isn’t nothing in terms of computer AI. Multiply that by years of training time and it doesn’t seem that crazy a difference, IMO.


I think what's more impressive is that the energy supply system is efficient enough to run a day's worth of software generation off a ham sandwich.


Degrees of entropy, perhaps? CPUs are low-entropy in that they are highly ordered and organised, firing only when specifically expected to, whereas brains are high-entropy in that they get to a result by riding a fine line between randomness and order, and those connections are not inherently expensive.

That, and billions of years of optimising for low-energy switches.


Yeah, there could be something to that. Computers fight entropy to gain precision, so intelligence should benefit from fuzziness to get the "free energy meal". But that brings a sort of uncertainty principle: we might be able to grasp what intelligence is, but like a cloud of electrons, we will never be sure of its detailed state.


I think 'grasp' in 'grasp what intelligence is' might be load-bearing there.


You also have to consider what kind of intelligence is useful. We already have human brains in abundance, but computers can do certain things we can't.

I think LLMs are a bit of a red herring. We will discover powerful ways to use machine learning that are very different from human brains. The goal doesn’t need to be to replace them.


Well, the human approach to intelligence and its tools has had roughly 6 million years to improve and evolve, and it has created the reality we experience today, which could be really intelligent behavior or not. We don't have anything to compare it to.

Think of the size of the computers we had in the 1960s and their computing power; now you have that many times over on your wrist. The tools we use seem to get bigger and bigger, then smaller, then bigger again, until a new paradigm becomes available (e.g. quantum).


Computers aren't evolving in an energy-scarce environment (yet); we (currently) pretty much give computers as much energy as they want. Computers are evolving in an environment where a dependent species relies on them to cover that species' own weaknesses.


> brains consume so little energy doing continuous high complexity tasks

Do they? While I've read that there's not a lot of difference between just chilling and thinking of some hard problem (not sure if that's true, though; I haven't really dug in to verify), thinking is thinking, so I guess that makes sense.

But when, e.g., reaction time is important, I suspect it could be quite energy-hungry. I mean, my watch tells me I have an elevated heart rate when I'm playing any fast-paced videogame, and I guess all that blood pumping surely has some biological reason (supplying more oxygen or something) and isn't just for funsies. I'm not a biologist, so this is purely based on my own observations and stuff I've heard somewhere.


> While I've read that there's not a lot of difference between just chilling and thinking of some hard problem

I find that difficult to reconcile with chess grandmaster calorie consumption.

https://www.npr.org/2019/09/18/762046422/the-chess-grandmast...

> Polar, this company that tracks heart rates, monitored chess players during a tournament and found out that this 21-year-old Russian grandmaster, Mikhail Antipov, had burned 560 calories in two hours, which we found out was roughly what Roger Federer would burn in one hour of singles tennis.


I notice this in software engineering, too. Challenging problems drain brainpower quicker.

I was working on difficult SOTA stuff for the last few months; half of it was reading articles and half was implementing them in software. I'd crash by 3:30-4 pm every day unless I had an extra meal around then.

Lost 5 kilos, too. So it wasn’t all going into my waistline.

You can observe this elsewhere too, I think: drivers tend to make more mistakes if they don't eat well on long drives.


The right tool might be spiking neural networks, thanks to their sparse activation, event-driven computation, and temporal coding. That all depends on how good neuromorphic chips get.
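
To make that concrete, here's a minimal leaky integrate-and-fire neuron, the usual SNN building block. A toy sketch in Python; the parameters (tau, threshold, weight, the 10% input rate) are made up for illustration, not tied to any particular neuromorphic chip:

    import numpy as np

    # Leaky integrate-and-fire (LIF): the membrane potential leaks toward
    # rest, integrates weighted input spikes, and emits a spike only on a
    # threshold crossing. With sparse input, most timesteps do almost no work.
    def lif(input_spikes, tau=20.0, v_thresh=1.0, w=0.4, dt=1.0):
        v, out = 0.0, []
        for s in input_spikes:
            v += dt * (-v / tau) + w * s   # leak plus weighted input
            if v >= v_thresh:              # threshold crossing: fire
                out.append(1)
                v = 0.0                    # reset after the spike
            else:
                out.append(0)
        return out

    rng = np.random.default_rng(0)
    spikes = (rng.random(200) < 0.1).astype(int)   # sparse 10% input activity
    print(sum(lif(spikes)), "output spikes from", int(spikes.sum()), "inputs")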


Babies don't come with those brains pre-installed. They take a decade or two to pick up most skills that have any value, and after all that, the vast majority end up playing replaceable support npc roles. What's the energy consumption of that? To get from "why is the compass pointing north?" to electromagnetism took thousands of years and the activity of millions of brains. We see a highly misleading and compressed version of that computing process in our textbooks. As they say, knowledge is inherited wealth; people overestimate how much they deserve it or have earned it.


A child learns to talk on about 150M tokens; GPT-4 needs 15T tokens. Humanity needed the total speech of 110B people to evolve to current level, which comes to about 51,395,437,500,000,000,000 tokens: about 3.4 million times more than GPT-4's training run, and 342 billion times more than a child needs to learn.

That shows the relative cost of imitation versus discovery. Evolution is on the order of 0.3 trillion times slower than catching up by imitation.
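
Spelling the arithmetic out (all figures are the assumptions above; the per-person rate is just backed out of the total):

    child_tokens = 150e6          # tokens a child hears to learn to talk
    gpt4_tokens  = 15e12          # rough GPT-4 training corpus
    people       = 110e9          # estimate of all humans who ever lived
    total_tokens = 51_395_437_500_000_000_000

    print(total_tokens / people)        # ~4.7e8 tokens per person implied
    print(total_tokens / gpt4_tokens)   # ~3.4e6: humanity vs GPT-4
    print(total_tokens / child_tokens)  # ~3.4e11: humanity vs one child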


That's apples to oranges.

No human ingests that many tokens of speech; individually we learn from far fewer tokens.


No human brain is the near blank slate an untrained transformer is either.

"We only need x tokens of ingested speech" to learn language doesn't hit the same when you have billions of years of the brain baking in an oven to get to that point.

But I agree, it's not a direct comparison.


> No human brain is the near blank slate an untrained transformer is either.

Why can't we start a transformer out in a better state and then teach it language in a few tokens? Seems like a problem with the architecture.


Because we don't know or understand what a better state would be, so by far the most successful attempts use architectures with as clean a slate, as free of bias, as possible.

There's nothing wrong with blank slates. It's not a problem, whatever that means. It just is.


> Humanity needed the total speech of 110B people to evolve to current level

Seems a loooong stretch to say the least.


I find it interesting to estimate in tokens so we can meaningfully compare people with LLMs.


It's more the implied dependency tree ... did Isaac Newton rest on the shoulders of 110B people to conceive of calculus, just as Leibniz did?

How much input did the Kalahari bushmen of that time, and their ancestors, have into that infinitesimal leap forward?


The dependency tree is not made only of the winning "happy route"; it contains all sorts of abandoned branches, otherwise it wouldn't function as an evolutionary system.


> support npc roles

This might sound trite, but are your parents "replaceable"? Even in a strict mechanistic sense, even a "baseline" parent possesses innumerable qualities that we have absolutely no clue how to build in a robot. They have a well-tuned cost function that keeps them or anything else from killing the baby; they come to learn the child well enough to understand what they're thinking, to correct them & guide them; they themselves are reliable enough to broadly survive until the child reaches adulthood (can you imagine swapping out your robot parents every iPhone generation?). Even their fine motor skills alone are far beyond what we today understand how to build (to cradle, feed, wash, &c).

And that's not the only thing, just a convenient example of how the worldview you present -- one where the only value someone can provide is that which can be extracted by the owning class via their "job" -- is not only unkind but also incoherent. I don't even know what you mean by "deserve" or "earned": yes, none of us did anything to gain the attributes that we were born with, but so what? We are who we are, and who we are is defined in scope by who we were born to be. We are all given options and choose between them day by day. But nobody -- no thing -- chooses which options they are given.


Whenever anyone points out that what LLMs are doing is definitely not how a brain works, the AI people respond with "maybe it's going to be better!"


LLMs are reverse-engineering language from text. Language is as much an extension of the brain as the retina is. The words are arbitrary, but the deeper levels are closer to rubbing your neurons on a listener’s eardrums. Why would a brute-force simulation be anywhere near as efficient as the original hardware?


Human language seems to originate in grokking, in neural net terms. You first understand a concept, then you speak or write about it. When you hear or read something, it isn’t retained well unless you can formulate an understanding of it. So language is a communication interface, not thinking itself.

LLMs seem to create an illusion of understanding largely because we believe that understanding is a prerequisite for language — whoever uses this interface must be thinking! But that really seems to only be the case in our brains, not in LLMs.

Grokking in NNs is interesting, though. It’s going to be pretty wild times when we crack that nut open. If we could drive grokking on a large data set, it’s difficult to even imagine the possibilities. Who knows what we could grok about the universe, for example.

But that is still something only human brains can do regularly and reasonably well in heavily-parametrized situations. And even we get overwhelmed so easily by too many variables.

I do hope ML grokking becomes much better than human understanding over time. But I don’t think we are close, sadly. We have just barely scratched the surface in toy models.


I misread your statement as claiming that computers broadly were less energy-efficient at the same tasks, because I took umbrage at your claim that human brains use "so little energy" (citation needed???) and forgot how to reading comprehension. So I compared the training costs of a 7-year-old, and the inference costs of an adult, in energy terms against the figures quoted in the article for the various models. I'll address what you actually said below, but I'm keeping my original writeup because it's kind of neat.

I'm going to compare the energy cost of training a model with that of growing a brain, because they're analogous. TFA lists training costs in kWh, and I'll arrive at a comparable figure by estimating the total caloric intake of a 7-year-old and multiplying that by an estimate of the proportion of energy a child's brain uses.

I calculate the lifetime caloric intake of a 7-year-old at 3,139,000 dietary calories, based on a daily intake of 1,000 for the first three years and 1,400 for the next four [1]. Assuming an active child's brain uses roughly 60% of total calories [2], we end up with a lifetime total of 1,883,400 calories spent on the brain. This is 2,188 kWh.

>> the energy consumption of model training is somewhere in the range of 11,000 kWh to 52,000 kWh for moderate-sized LLMs, 40,000 kWh to 60,000 kWh for more expensive image generators, and up to 5,000,000 kWh for global-scale LLMs like Llama and ChatGPT.

The 7-year-old isn't even finished training, and frankly, needing the same training energy as 2,285 seven-year-olds to create ChatGPT comes across as a bargain.

Just for a bonus, let's compare inference costs. Assuming a human brain uses roughly 400 dietary calories a day, and counting all of them as productive work, gives us a daily budget of about 0.5 kWh (roughly 465 Wh). Quoting the article again for inference costs:

>> the energy consumption of an inference operation is somewhere in the range of 1.0 Wh to 7.0 Wh for standard operations, and 4 Wh to 7.5 Wh for more expensive jobs like image or video generation.

That buys hundreds of text operations, or 50-100 images, per day.

As a back of the napkin sketch it looks pretty close.

[1] https://www.healthychildren.org/English/healthy-living/nutri... [2] https://www.livescience.com/burn-calories-brain.html
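
If anyone wants to poke at the napkin, here it is as a few lines of Python (same assumptions as above, converting with 1 kcal = ~1.163 Wh; small rounding differences from the figures quoted):

    kcal_intake = 1000 * 3 * 365 + 1400 * 4 * 365  # lifetime intake: 3,139,000 kcal
    brain_kcal  = kcal_intake * 0.60               # ~60% to the brain: 1,883,400 kcal
    brain_kwh   = brain_kcal * 1.163 / 1000        # ~2,190 kWh (the 2,188 above)
    print(brain_kwh, 5_000_000 / brain_kwh)        # ~2,283 seven-year-olds per ChatGPT

    daily_wh = 400 * 1.163                         # adult brain budget: ~465 Wh/day
    print(daily_wh / 2.0, daily_wh / 7.0)          # ~230 text ops or ~66 images/day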

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You're sorta onto something, but it has to do with architecture: human brains are composed of a vast number of sparsely connected and individually poor computational units (neurons), while computers are completely connected and highly reliable, but their individual units are expensive. Neurons also communicate at a much slower pace than computers can perform calculations. The reason for our unreasonable effectiveness appears rooted in how very many neurons we have and how effectively parallelized most functions of the brain are.

Precision calculations are easy for computers because they are literally machines whose operations map directly to simple, highly accurate numerical calculations. Numeric calculations are not simple for human brains because nothing really maps to the process of a calculation. Individual neurons do 'integrate and fire', meaning that if they get excited enough by nearby neurons in a short enough time, they will fire and excite the neurons they touch. But it isn't a reliable process: if neuron A touches neuron B, and A activating is enough to cause B to activate this time, it may not be enough the next time A activates. And when we do math, those unreliable units are all we have to work with.
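
A toy illustration of that unreliability (all numbers invented; with these settings B needs 4 of its 5 synapses to transmit, which happens only about half the time):

    import random

    # Neuron B integrates input from 5 neighbours, but each synapse transmits
    # only with some probability, so the same input pattern fires B on some
    # trials and not on others.
    def b_fires(n_inputs=5, p_release=0.7, weight=0.3, threshold=1.0):
        potential = sum(weight for _ in range(n_inputs)
                        if random.random() < p_release)
        return potential >= threshold   # True means B fires this trial

    random.seed(1)
    trials = [b_fires() for _ in range(20)]
    print(trials.count(True), "fires out of 20 identical input patterns")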

There are investigations into neuromorphic hardware, but I'm not up to date on them anymore. The basic idea is to make computing hardware inspired by how the brain operates. Last I knew, there were some interesting results but nothing resembling anything practical.



